Alex Zhavoronkov,
Founder and CEO,
Insilico Medicine
"You need to have a seamless pipeline, which identifies the targets, generates the molecules and runs those molecules through a large number of simulations in one seamless pipeline. That is what we are building and that is our holy grail."
What is the focus of Insilico Medicine and what AI/ML techniques are you using to develop software for drug discovery and development?
Insilico is focused primarily on the development and application of next-generation AI techniques to drug discovery, biomarker development, and prediction of clinical trial outcomes. We focus specifically on combining two machine learning techniques: generative adversarial networks and reinforcement learning. We use these techniques for a few purposes: identifying molecular targets, constructing biomarkers from multiple data types, generating new molecular structures, and predicting clinical trial outcomes. We were one of the first companies to generate novel molecules using generative adversarial networks and validate those molecules experimentally.
We try to reinvent everything from scratch and we develop our own AI and write our own software. In fact, we sell the advanced AI environment to the big pharmaceutical companies as well. You need to have a seamless pipeline, which identifies the targets, generates the molecules and runs those molecules through a large number of simulations in one seamless pipeline. That is what we are building and that is our holy grail.
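The pairing he describes, a generative model proposing molecules and reinforcement learning steering it toward desired properties, can be sketched in miniature. Everything below is a toy stand-in: the token "vocabulary", the scoring function, and the simple weight-based generator are illustrative assumptions, not Insilico's actual system, which uses deep generative adversarial networks.

```python
import random

# Toy "vocabulary" of SMILES-like fragments (illustrative stand-ins,
# not real medicinal chemistry).
VOCAB = ["C", "N", "O", "c1ccccc1", "=O", "F"]

class ToyGenerator:
    """Stands in for the generative model: samples token sequences
    from learned per-token weights."""

    def __init__(self):
        self.weights = {t: 1.0 for t in VOCAB}

    def sample(self, length=5, rng=random):
        w = [self.weights[t] for t in VOCAB]
        return rng.choices(VOCAB, weights=w, k=length)

def reward(tokens):
    """Hypothetical scoring oracle: favors aromatic rings, standing in
    for a property predictor (e.g. predicted activity)."""
    return tokens.count("c1ccccc1")

def reinforce_step(gen, n=64, lr=0.1, rng=random):
    """REINFORCE-style update: tokens appearing in above-average
    samples get upweighted, steering generation toward high reward."""
    samples = [gen.sample(rng=rng) for _ in range(n)]
    baseline = sum(reward(s) for s in samples) / n
    for s in samples:
        advantage = reward(s) - baseline
        for t in set(s):
            gen.weights[t] = max(1e-3, gen.weights[t] + lr * advantage * s.count(t))
```

After enough `reinforce_step` calls, the generator's samples drift toward the high-reward fragment, which is the essential feedback loop: generate, score, update, repeat.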
In what ways is AI able to boost efficiency?
If you go to the very early steps of the pipeline and start working on hypothesis generation and target identification, there are usually multiple paths to pursue. One path is to look at the literature and identify promising areas that have been uncovered by scientists in the past and published in peer-reviewed literature. Ideally, those targets have not already been implicated by somebody else in the disease you are looking at. AI can help you mine massive amounts of literature, and also other associated data types, to identify signals that certain targets might be implicated in a disease.

At Insilico we start with grant data. We look at biomedical grants and monitor US$1.7 trillion worth of grant money over the past 25 years. Then we look at how those grants progress into publications, into patents, into clinical trials and then into products on the market. We follow this from idea to money and from money to market. We also look at how money becomes data. Usually, when the government is supporting a certain study, the data needs to be deposited in a public repository, for other people to replicate it and also for the common good. We try to follow the money into data. If the data is not there, we try to contact the scientists and get the data from them, or encourage them to put the data into the public repository.
In the “omics” domain, we work primarily with gene and protein expression data. We look at how the level of expression of certain genes, or entire networks, changes from a healthy state to disease, and we deconvolute those changes, those signatures of disease, into individual targets, establish causality models, and identify what kind of proteins could be targeted with a small molecule. Then we go back into the prior art in the text and see if anybody has published anything that strengthens our hypothesis. If the signal is not there in the text, it does not necessarily mean that our hypothesis is wrong, because sometimes humans just could not associate a certain target with a disease using older methods. But it gives us more confidence if somebody has touched on this challenge before.
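The step he describes, turning expression changes between healthy and disease states into a ranked list of candidate targets, can be illustrated with a crude differential-expression score. The gene names and expression matrices below are fabricated, and the Welch-style t statistic is only one simple stand-in for the far richer signature deconvolution and causality modeling described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy expression matrix: 20 genes x 30 samples per condition.
genes = [f"GENE{i}" for i in range(20)]
healthy = rng.normal(0.0, 1.0, size=(20, 30))
disease = rng.normal(0.0, 1.0, size=(20, 30))
disease[3] += 2.0  # injected "disease signature": GENE3 is up-regulated

def t_scores(a, b):
    """Welch-style t statistic per gene: a crude score for how strongly
    each gene's expression separates the two states."""
    na, nb = a.shape[1], b.shape[1]
    se = np.sqrt(a.var(axis=1, ddof=1) / na + b.var(axis=1, ddof=1) / nb)
    return (b.mean(axis=1) - a.mean(axis=1)) / se

scores = t_scores(healthy, disease)
# Rank genes by how strongly they separate healthy from disease;
# the injected gene should surface at the top of the list.
ranking = [genes[i] for i in np.argsort(-np.abs(scores))]
```

The ranked list is the starting point, not the answer: as the interview notes, candidates are then checked against prior art and causality models before anything is called a target.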
We are aggregating enormous amounts of data, which is just not possible to process using human intelligence. And we are also aggregating and grooming those data types together. Sometimes those data types are completely incompatible and it is impossible to suture them together using standard tools. Therefore, it is necessary to train deep neural networks on several data types at the same time, so that they generalize and we can extract relevant features that are present across several data types. Some of the data types that we work with are completely incomprehensible to human intelligence: for example, gene expression, movement, cardiovascular activity, or ultrasound. We bring those data types together using AI, and then identify relevant targets that trigger certain conditions.
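A minimal sketch of training one model on several data types at once is "early fusion": put each modality on a common scale, concatenate, and fit a single model. Both modalities and the label below are made up, and a plain logistic regression stands in for the deep neural networks described above; the point is only that neither data type alone carries the whole signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two made-up modalities with incompatible scales and dimensions,
# stand-ins for e.g. gene expression and a cardiovascular signal.
n = 200
expression = rng.normal(0.0, 1.0, size=(n, 5))
cardio = rng.normal(50.0, 10.0, size=(n, 3))
# The hidden label depends on features from BOTH modalities.
signal = expression[:, 0] + 0.1 * (cardio[:, 0] - 50.0)
y = (signal + rng.normal(0.0, 0.3, n) > 0).astype(float)

def standardize(x):
    """Put each modality on a common scale before fusing."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Early fusion: standardized modalities concatenated into one input,
# so a single model is trained on several data types at the same time.
X = np.hstack([standardize(expression), standardize(cardio)])

def train_logreg(X, y, lr=0.5, steps=500):
    """Gradient-ascent logistic regression on the fused features."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w

w = train_logreg(X, y)
accuracy = ((X @ w > 0).astype(float) == y).mean()
```

Standardizing per modality before concatenation is the load-bearing step here: without it, the modality with the larger numeric range would dominate the gradients.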
What makes a great AI scientist and how do those skills translate in the field of biology? What are the implications from a staffing standpoint?
When you are looking at really great AI scientists, they are usually not great in biology or chemistry. They are good at math, and some are great at neuroscience and math. This is why some percentage of our company are just great mathematicians, who are developing novel methods for bridging chemistry and biology using deep learning and are focusing on AI theory. Another part of the company is specifically focused on applications of already existing techniques, like reinforcement learning, to existing problems in chemistry and biology. Those people are usually on the applied side; they know both chemistry and biology and they can talk to mathematicians. They can do some basic research in AI as well. We also have pure-play biologists and chemists, who are necessary in order to validate the results of our AI. For these reasons we have a large, diverse, international team. You really need to have those three areas covered: the methods, the applications, and the validation.
What obstacles prevent AI from being used from beginning to end?
Because failure rates in pharma in general are very high, there are very few success stories to train on, while there are hundreds of thousands of failures. And those success stories are very diverse. In some areas it is easy to check whether your algorithm is producing meaningful output, but in many cases you need to go and validate every step of the way. That is why, when you are building the pipeline, you need to validate every part of the process, both internally and with external partners. That is what we are trying to do.
In many areas, it is actually not possible to fully virtualize drug discovery without humans, because biology and medicine are so diverse that it is difficult to have one solution that fits all. That is why people are going primarily after cancer: it is a little bit easier to validate.
How would you compare the approach of Chinese Pharma companies with American companies when it comes to using AI and ML?
Most people have a misconception about Chinese pharmaceutical companies and about China in general. The country is not what it used to be 20 or even 10 years ago. It has made giant leaps in many areas and is now an example to follow. The government is now pushing the pharmaceutical industry to be much more innovative and to deliver cheaper and better drugs that work. So many companies that were focusing on generic drugs, traditional Chinese medicine, and ineffective herbs are now rushing to innovate. And some of them have hired amazingly smart people to develop “me too” or “me better” drugs going after known targets. But very few are going after novel targets and true first-in-class drugs, because it is very risky and very expensive. Those that do innovate are very actively looking for AI solutions. Unfortunately, despite high interest in AI-powered drug discovery, they are not willing to pay nearly as much as the big pharma companies for more validated technology, and they choose the local AI companies that are trying to replicate our systems. I am sure that this strategy will backfire, and after a few target failures or suboptimal compounds they will start looking for established solutions for the critical steps of drug discovery. I am very optimistic when it comes to the Chinese pharmaceutical industry. It is the future.
Why is IP a challenge for AI in drug discovery?
When we first proposed using generative AI for the generation of novel chemistry and biological data at conferences in 2015, the big pharma companies were laughing at us. Then in 2016 we published our first peer-reviewed paper in this area, the first peer-reviewed research paper on generative adversarial networks in drug discovery. And in 2017 we did the first successful experimental validation. But even then, both investors and pharma were sceptical. In 2018 the pharmaceutical industry realized the value of the technology, but with a few exceptions, instead of partnering with us, they started building their own AI teams, even when they understood that they were 2-3 years behind, or partnering with companies that claimed to do AI but were primarily using classical approaches. So it was very difficult, and we had to spend a lot of money on validation. But at that time we also filed for a lot of patents in this area, on the models, the methods, and the molecules. So now our technology is validated beyond any reasonable doubt, and we also have a strong IP portfolio. Now the biggest question is how to enforce IP protection, because it is impossible to know what companies use internally for drug discovery. The good thing is that the current legal opinion is that if a molecule is made using AI which infringes on our IP, we may have a claim to the molecule. That helps us a lot with partnering on AI software, where we provide full licenses to the pharmaceutical companies to use the IP and also to build on top of it. But I think we are just scratching the surface of the IP issues. In the future it will be a very important issue.
How do you weigh internal R&D versus collaboration with big pharma?
If you know that your AI works well and is validated beyond any reasonable doubt, it is always better to start developing your own therapeutic programs, because they bring the most value. So we are developing our own portfolio of therapeutic assets. But the problem with this approach is that these programs require a lot of resources for preclinical validation in experimental systems and in animals after the AI has done its job. You may be able to use your AI a dozen times to come up with the programs, and then you need several years and tens of millions of dollars to develop those programs into clinical-stage assets. And you do not want your AI to sit idle and collect dust. You want this validated AI to work for other companies and also to evolve. That is why it is so important to partner with big pharma and biotechnology companies.
Are we on the precipice of building multibillion dollar computational biology driven biotech companies? What might they look like and what breakthroughs are needed to get there?
The answer is yes. If you had asked this question a year ago, the answer might have been different, but now we see Schrödinger and Relay Therapeutics achieving multi-billion dollar valuations after IPO, and other companies raising hundreds of millions per company to grow their businesses. There is no doubt that we will see this grow into possibly a trillion dollar economy, as there is nothing more important and in demand by the people than health. Nowadays my biggest worry is the high valuations investors are giving companies in this field. Insilico Medicine is one of the pioneers and technology leaders in this industry, and we have a pretty good understanding of the competitive landscape. But when I see companies that do not have even half of the technology base and value, or even companies formed from scratch to go after an unproven idea, raise hundreds of millions of dollars at valuations that are many times higher than ours, it makes me worried. In the traditional pharmaceutical paradigm, around 90% of all projects fail. And I am worried that if this happens to many of the overvalued AI companies, it will affect the companies that worked very hard, invented new technologies, supported their claims with publications, and focused on rigorous experimental validation.
In terms of the breakthroughs, most of the technology we need to accelerate is already there. It is now up to clinical validation to show how well it works. In my opinion, the most exciting area for companies like ours is the combination of AI with robotics technologies in both chemistry and biology, and I am working a lot in this area. But this area is also very risky. We are partnering with many robotics companies to see which approach works best, and there is very rapid progress in this area. The problem with robots is that if you selected the wrong robotic system and someone comes up with a better one, you are stuck with old hardware that nobody needs. So we need to see more centers for the collective use of robotics technology, and we are very happy that we managed to engage with and help shape some of these centers.