Daphne Koller is best known as the cofounder of Coursera, the online learning platform that launched in 2012. But before her work on Coursera, she was doing something quite different. In 2000, Koller started applying machine learning to biomedical datasets to understand gene activity across cancer types. She put that work on hold to nurture Coursera, which took many more years than she initially expected. She didn’t return to biology until 2016, when she joined Calico, Alphabet’s life science research and development arm.
Two years later, Koller started Insitro, a drug discovery and development company that combines biology with machine learning. “I’m actually coming back to this space,” she says.
There’s a lot of hope that artificial intelligence could help speed up the time it takes to make a drug and also increase the rate of success. Several startups have emerged to capitalize on this opportunity. But Insitro is a bit different from some of these other companies, which rely more heavily on machine learning than biology.
By contrast, Insitro has taken the time to build a cutting-edge laboratory, an expensive and time-consuming project. Still, having equal competency in lab-based science and computer science may prove to be the winning ticket. Though only two years old, Insitro has already caught the attention of old-guard pharmaceutical companies. Last year, the company struck a deal with pharmaceutical giant Gilead to develop tools, and hopefully new drug targets, to help stop the progression of nonalcoholic steatohepatitis (NASH), a severe form of fatty liver disease. The partnership netted Insitro $15 million up front, with the potential to earn up to $200 million.
I spoke with Koller to discuss what her company is doing differently and where machine learning may ultimately make a difference in drug development and discovery. This interview has been edited for publication.
Fast Company: What you’re doing is different from most artificial intelligence drug companies, which are using the existing knowledge base of articles and published studies to come up with drug targets. Instead, you’ve developed a drug company that uses artificial intelligence but also has a full lab for biologists. Why did you take this approach?
Daphne Koller: The other model is a much easier startup effort in the sense that there’s all this data out there and you can go and collect it. You can do it with a team of purely data-science folks. You don’t need to build up a wet lab, you just go and collect all those data and you put them in a big pile and then you let your machine learning people have at it.
What we’re doing is much more complicated and ambitious on a number of different dimensions. One is that we really did need to build up a high-throughput biology lab, which is beyond the frontier on multiple levels. That requires a much more expensive build. It also requires building up a team that’s really not been built before, which is taking some people who are at the cutting edge of their field, on the biology side, and putting them together in a single integrated team with some people who are at the cutting edge of machine learning and data science, and really telling them, “you speak different languages, but you’re going to work together as a single team.” And I think that’s really a very challenging cultural effort that most companies haven’t been willing or able to pursue.
FC: Why do that? What’s the benefit of having a drug company that gives biologists and data scientists and machine learning experts equal standing?
DK: When you look at the drug discovery process—which, if you’re lucky, is 15 years end-to-end with a 5% chance of success—there are multiple forks in the road where currently people are making decisions. “Do I go down path A or B or C or D?” And if you’re lucky, one path in 99 will lead you to success. If you go down the wrong one, then it’s years and tens of millions of dollars in wasted spend. So what if we could make better predictions on which fork to take?
It’s kind of more of a fail-fast model that Silicon Valley has really pioneered, but within the context of biology. I think part of the problem biopharma has had is that it’s really difficult to fail fast. You oftentimes make a 5-10 year investment in something before you realize that it’s not looking so good. And by that point, the sunk costs are so large that people are like, “Oh, okay. You know what? I’m just going to push this through to the clinic and hope for the best.” I think that’s one of the reasons we see the failure rates we do: people push things through that probably shouldn’t be pushed through because they feel—in many cases correctly—that they have no choice.
What we hope to be able to do, because we’re building these predictive models, is to be able to make the decisions faster.
The other piece is that machine learning has become pretty good at making accurate predictions across a broad spectrum of domains. It’s not been as effectively applied so far in life sciences broadly, and one of the main reasons for that is just the lack of high-quality data that we have [compared to] computer vision or natural language processing or logistics. At the same time, the bioengineering cell biology community has invented in the last few years a remarkable suite of tools that can really be put together in unique and interesting ways to generate massive amounts of data that can help feed those machine-learning algorithms.
If you put those two together, the high throughput biology piece and the machine learning piece, perhaps that provides a way in which we could build these predictive models that make better predictions in pharma research and development.
Why drugs fail
FC: What are the biggest reasons that drugs fail?
DK: We know from the statistics that most drugs [that go into trials] fail because of lack of efficacy in phase two or phase three. And it’s not because the drug wasn’t good. It was targeting the wrong target. Where the machine learning comes in is to look holistically at many, many different attributes of those cells and say which of them are the most predictive of human clinical outcome. And that is something that people are really not that good at, because cells are complex and there are many dimensions to putting all those pieces together to detect what oftentimes is a subtle signal. It’s not something that people excel at.
FC: So once you set up these apps, how can you use them?
DK: You can use those apps in a variety of ways. First of all, you could use them to identify targets by basically saying, “Hey, now we know what a sick cell looks like. Now we know what a healthy cell looks like.” What if I [use] CRISPR to perturb the cell to move from an active to an inactive state or vice versa? Well, if you do that, and the phenotype goes from an unhealthy to a healthy state, maybe that gene is a good target for a drug.
The other thing that the platform enables is the segmentation of what is often a heterogeneous patient population into subsets that are much more coherent. The analogy here is to think about what happened in precision oncology. About 15, 20 years ago, we used to think of breast cancer as one thing. But then as we started to get more molecular data about people whose cancers were different, we realized that there were very different subtypes of cancers. There are what are called HER2-positive cancers that were very well targeted by Herceptin. There are the BRCA1 cancers that are now targeted by PARP inhibitors. And so there are these subsets that are very distinctive from each other and are now treated much better by precision therapeutics.
People think that Alzheimer’s is one disease—almost certainly, that’s not true. People think that type 2 diabetes is one disease—also probably not true. For these diseases, we haven’t yet identified subtypes. We believe that by collecting enough data on enough different genetics at the molecular level, maybe those subtypes will emerge.
FC: Do you have any insight around the role that machine learning can play in helping come up with either a treatment or a vaccine for COVID-19?
DK: I think that there are opportunities. Right now, we’re looking at vaccine approaches that different companies have developed, and we’re putting them in with a bunch of viral protein and hoping for the best. As for predicting vaccine efficacy—the techniques just don’t exist, and there’s not going to be enough time to develop them. But I do think there’s some interesting work happening on the therapeutic side, where there’s been more application of machine learning to everything from the interpretation of cellular [gene expression] to designing new drugs and new drug combinations.
FC: You’re working with Gilead on better understanding nonalcoholic steatohepatitis (NASH). What’s difficult about NASH is that it can only be diagnosed and monitored through liver biopsy, which is brutal for the patient. You’ve said that you’ve had some success with machine learning apps being able to detect aspects of the disease that a human cannot otherwise detect, which holds a lot of promise for changing even just the way doctors track the disease in individuals. I’m curious: what other areas of human health are interesting to you?
DK: We feel like neuroscience is an area that’s about to burst wide open in finally understanding the very complex genetics of central nervous system diseases. The unmet need is huge, and the animal models are particularly untranslatable. So for some diseases you could say, “Well, the animal model is not great, but it’s acceptable.” The animal model for depression—and this is going to sound surreal, but I’m telling you, it’s not—is to take a mouse, put it in a bucket with water, and make it swim until it gets really tired and drowns. And if it swims longer, it’s less depressed. It’s called the forced swim test.
Now, the thing is, if you look at depression, it is a disease with significant genetic heritability, where we know that there are hundreds of genes implicated in very specific pathways—all of which is now starting to emerge from the genetics and single-cell analysis of brain tissue. None of that has anything to do with making a mouse swim longer. We think that in things like neurodegeneration and neuropsychiatry there’s a tremendous opportunity for a different set of tools to be applied. I guarantee you, they will not be perfect models of the disease. But they can’t be that much worse than making a mouse swim longer. Right?