Changing the Data Access Conversation for Healthcare AI

The recent VentureBeat report on the “Major flaws found in machine learning for COVID-19 diagnosis” came as no surprise to the BeeKeeperAI team at UCSF’s Center for Digital Health Innovation. Healthcare AI has the potential to improve outcomes and reduce treatment costs, but access to real-world data remains a barrier: securing enough of it to produce a model that performs consistently in all healthcare environments is an extremely time-consuming and costly endeavor.

We were not surprised to learn that roughly half of the COVID and pneumonia detection models received no external validation. For an algorithm to be used in a clinical setting, it must perform in an ethnically, clinically, and geographically agnostic manner. In other words, the algorithm must be generalizable across any and all healthcare settings. Achieving generalizability requires that a model be trained and validated on ALL of the variables it is likely to encounter in the wild.
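To make the idea of external validation concrete, here is a minimal sketch that compares a model's accuracy on the site it was developed at with its accuracy on a site it has never seen. The site names, the records, and the helper functions `accuracy_by_site` and `generalization_gap` are all hypothetical, made up for illustration; they are not taken from the study or from BeeKeeperAI.

```python
from collections import defaultdict


def accuracy_by_site(records):
    """Return {site: accuracy} for (site, true_label, predicted_label) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for site, y_true, y_pred in records:
        total[site] += 1
        correct[site] += int(y_true == y_pred)
    return {site: correct[site] / total[site] for site in total}


def generalization_gap(per_site):
    """Difference between the best and worst per-site accuracy."""
    return max(per_site.values()) - min(per_site.values())


# Made-up predictions: one internal (development) site, one external site.
records = [
    ("internal", 1, 1), ("internal", 0, 0), ("internal", 1, 1), ("internal", 0, 0),
    ("external", 1, 0), ("external", 0, 0), ("external", 1, 1), ("external", 0, 1),
]

per_site = accuracy_by_site(records)
gap = generalization_gap(per_site)
print(per_site)  # {'internal': 1.0, 'external': 0.5}
print(gap)       # 0.5
```

A model that looks perfect on its own site but drops sharply on an outside one, as in this toy data, is exactly what a review without external validation would miss.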

We were not surprised by the creative measures (e.g., “Frankenstein” datasets) that developers had used to overcome their data access issues. It can take as long as 36 months to secure access to sufficiently diverse data to develop generalizable healthcare AI, with costs in excess of $2.5M per model. Increasing cyberattacks on the very data required to create robust, generalizable healthcare AI amplify the cost and complexity of the problem. Is it any wonder that people would be tempted to short-cut the process?

And we were not surprised that model owners were unwilling to share their code so that the performance of their models could be validated. If a company has invested capital to create a model, why would they want to share it? If a company is building a commercial entity on an AI model, all of the secret sauce is contained in a single table of model weights. Why would they want to risk compromising the security of that asset?

Sadly, the finding for the many models reviewed was that “none of the machine learning models . . . are likely candidates for clinical translation for the diagnosis/prognosis of COVID-19.” So, what’s to be done in an industry with a forecasted total addressable market of $45.2B*? Innovate!

BeeKeeperAI is changing the data access conversation by applying privacy-preserving analytics to multi-institutional sources of protected health data in a confidential computing environment in which:

  1. The data owner’s data never leaves their HIPAA-protected environment;

  2. The data owner’s data is never shared or exposed to attack; and

  3. The algorithm owner never has to expose their code base or model weights to a third party.

Miraculous? No! It’s called a zero-trust environment, and it’s enabled by new confidential computing technology. Through BeeKeeperAI, technology now exists to enable academic medical organizations to leverage their data to accelerate the pace of healthcare AI innovation. We look forward to announcing progress from our collaboration with Fortanix, Intel, and Microsoft in the days and months ahead.

* https://www.marketsandmarkets.com/Market-Reports/artificial-intelligence-healthcare-market-54679303.html