Introducing AI/ML in the Life Sciences

Key Milestones Indicate Transformational Power of AI/ML

When MIT-trained mathematician Jim Simons launched Renaissance Technologies’ Medallion fund in 1988, it was unfathomable that computers driven by algorithms might outperform top Wall Street fund managers. Today, after more than thirty years of returns averaging 66% per annum before fees, the Renaissance model of applying machine learning techniques to invest in a manner that is automated, dispassionate and performed with minimal human intervention is widely accepted. Quantitative finance, as it is dubbed, has profoundly reshaped the financial industry. Pharma has historically expressed a similar skepticism about adopting artificial intelligence and machine learning for tasks such as molecular design, drug discovery and clinical trials. After all, lives are at stake, and Mark Zuckerberg’s idea that success requires startups to “move fast and break things” is the antithesis of the industry’s guiding principle to “do no harm.” Companies that make bioactive substances to put into patients, and perhaps even permanently engineer their genes, must therefore be very careful that the technologies they use are proven and safe. Despite this reluctance, there has been an explosion of promising startups created over the last five years that merge computer science with computational biology, chemistry and biophysics. These companies are seeking to disrupt the current paradigm, in which drug candidates fail to reach the market 90-95% of the time, a new drug costs US$2.6 billion to develop (compared to US$0.2 billion in the 1980s) and development timelines run to 10-15 years (McKinsey). Part of the reason for this inefficiency is that drug discovery in pharma comes from a heritage of trial and error: many molecules are tested before one is found that works and can be taken forward. This new wave of companies is attempting to make the process much more efficient.
BenchSci, for example, was founded on the thesis that a big pharmaceutical company will conduct tens of thousands of experiments per year to evaluate molecular targets. On average, around 7,000 non-clinical, non-human experiments are needed to get a drug into the clinic, and roughly 50-70% of those experiments do not scientifically advance the study of those targets. Improving the efficiency of this process would generate enormous value by lowering costs and shortening timelines. AI is a disinterested participant in the effort to understand biology and does not need the mental frameworks that humans rely upon. As a result, artificial intelligence can learn what drives biological systems in a much more comprehensive and non-hypothesis-driven way. Daphne Koller, founder of Insitro, refers to this as moving drug discovery from an artisanal to an engineered approach, in which many of the pieces are designed to be performed in a repeatable, reproducible manner. Although biology is extremely complex, one can draw parallels to the sentiment shift that occurred in finance and see that pharma could be going through a similar generational change. Because machine learning and AI are, at their heart, statistically rigorous ways to handle large amounts of information, and drug design is fundamentally a data science problem, the decades ahead could transform expectations around drug costs, time to market and how science is done.

"The first generation of biopharma started from chemistry as chemical compound companies. Then the biotech industry shifted that paradigm. Now I think we are entering into a new generation, where the computational component is going to be playing a big part in drug discovery. Big pharma is trying to adopt it. However, instead of trying to fit computation and AI into a system that is already very set in its way, sometimes it is just easier to start from scratch."

Pek Lum, Co-Founder and CEO, Auransa

“Our knowledge of disease is so much greater, and yet somehow, if you look at productivity output metrics on the pharma industry over the past 10 years it has declined from 10% IRR a decade ago to around 1% today. So, there is this paradox between the increase in knowledge and advancement in technology and yet a decrease in return on investment. Whilst any one technology could give us great new insights, what has not changed is how people running these projects understand how to integrate this information. That is where AI and machine learning make a fundamental difference.”

Andrew Hopkins, CEO, Exscientia


Trailblazing a New Model

Undoubtedly, AI and ML in pharma remain many years away from fulfilling their full promise. Compared to other industries, there is a longer incubation period to figure out the extent of what these methods are capable of delivering. Nevertheless, there are many companies working to catalyze change. In the category of AI applied to chemistry, the conventional standard being challenged often involves years of iterative and complex medicinal chemistry. Atomwise is using AI for structure-based small molecule drug discovery, removing the barriers of physical screening that have limited the success of traditional drug discovery methods. By predicting, in a matter of days, the binding of billions of small molecules to a protein of interest with a known disease association, the company can accelerate the earliest stages of drug design by several orders of magnitude. “Our technology is based on convolutional neural networks – the same AI technology that is used for image and speech recognition. If you have ever talked to Siri or Alexa, or uploaded a photo to Facebook and had it prompt you to tag certain friends, those are examples of convolutional neural networks at play. Atomwise was the first group to take what works in image and speech recognition and apply it to molecular recognition,” said co-founder and CEO Abraham Heifets.

"We are aggregating enormous amounts of data, which is not possible to process using human intelligence, and we are also grooming those data types together. Sometimes those data types are completely incompatible and it is impossible to suture them together using standard tools. Therefore, it is necessary to train deep neural networks on several data types at the same time in order for them to generalize for us to be able to extract relevant features that are present in several data types. Some of the data types that we work with are completely incomprehensible to human intelligence. We bring them together using AI and then identify relevant targets that trigger certain conditions."

Alex Zhavoronkov, Co-Founder & CEO, Insilico Medicine

Another leading company in the space is Insilico Medicine, whose approach is to use generative adversarial networks (GANs) to generate novel molecules. Referring to the better-known applications of GANs, Insilico founder and CEO Alex Zhavoronkov pointed out: “It may sound like a slightly illogical step to go from producing bird pictures and DeepFakes to creating ultra-precise designs for new molecules, but we have experienced considerable success.” The experimental validation of those molecules has been seen as a key milestone both for the company and for the broader field. Renowned computer scientist and venture capitalist Kai-Fu Lee commented that Insilico’s validation: “Substantially advances the efficiency of biochemistry implementation in drug discovery.”

In contrast to others, Cyclica’s drug discovery platform accelerates preclinical drug development by predicting the polypharmacological profile and medicinal properties of drug candidates. Polypharmacological approaches to drug discovery have been around since the 1950s, but faced many challenges arising from the complex nature of biological systems. Today, advancements in computer science, systems biology and bioinformatics are breathing new life into this approach. According to Cyclica’s co-founder and CEO Naheed Kurji: “Instead of going narrow on one target, we decided to flip the problem on its head and look at all the potential targets in the proteome and design and evaluate those targets for a given molecule.”


Designing Proteins

Another area in which AI could have a large impact is decreasing the cost of protein therapeutics (biologics). Today, seven of the top ten drugs are antibody protein drugs. They are the most valuable individual drugs and the fastest growing segment of the therapeutics market. The computational tools that are well established for small molecules do not work as well in the protein space. Furthermore, historical methods are less appropriate because pharma does not have comparable amounts of data for proteins and because the molecules are physically much larger. ProteinQure was founded to leverage some of the world’s most advanced computational tools to design and engineer novel protein therapeutics. “We take on challenging projects where partners have very little data available. For example, they might be exploring a new protein because it has non-natural amino acids or because it has a linker or some kind of special proprietary chemistry attached. Forget millions, in such cases they will not have even thousands of data points. The focus is around creating de novo chemical matter,” said CEO Lucas Siow.

“We have words that categorize proteins but they are just human conventions - a way that a human scientist might try to give structure to a world that does not necessarily align with the quantum or biophysical representation. I think the future is going to be about machine representations of what a protein actually is. This would allow us to define a protein in terms of its light spectra, atomistic coordinates and 3D movements. There are many different ways to use machine learning to try to learn a language of how to define what these biological objects and systems are, but historically companies have tried to take a human understanding and convert that into a machine. The big untapped space is identifying what the correct fingerprint for a protein is and how we should describe it. It may be the case that it is not human interpretable.”

Lucas Siow, Co-Founder & CEO, ProteinQure

AI in Humans

Perhaps one of the most important announcements in the field of AI and machine learning in drug development came from Exscientia at the start of 2020, when it revealed that a molecule created with AI was moving into Phase I human clinical trials for the treatment of obsessive-compulsive disorder, in partnership with Sumitomo Dainippon Pharma (DSP). Previously, the biggest criticism from sceptics was that there were no clinical assets for which a target was identified and a molecule was generated by AI. AI had been used for diagnosing patients and for analyzing patient data and scans, but this was the first direct use of ML in the creation of a new medicine. The project required less than 12 months to advance from target to identification of a development candidate, a fraction of the typical 4.5 years using conventional research techniques.


Data Cleaning

Companies like Novartis claim to spend years just cleaning data sets before they can begin to run algorithms. It can be a lengthy process, and it is easy to underestimate how little clean data exists and how hard data is to clean. The key point is that data on its own is useless - it must be effectively interpreted to advance the drug design process. This is where companies like Genialis are well positioned to thrive. The company developed a software platform for aggregating next-generation sequencing data, especially RNA sequencing data. They combine that data on their platform with data from thousands of historical experiments and other sources. This allows them to enforce good data management practices: the data is annotated consistently, processed consistently, and all data has its history tracked so they know how each analysis was derived. “The data management piece is the unsung hero that allows us to do the more exciting things in machine learning,” said Rafael Rosengarten, co-founder and CEO at Genialis.
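To make the data management practices described above more concrete, the following is a minimal Python sketch of consistent annotation and history tracking. It is an illustration only and does not represent Genialis's platform: the class names (Record, Registry), the required annotation fields ("organism", "assay") and the sample identifiers are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    """One dataset, raw or derived (hypothetical structure)."""
    record_id: str
    step: str                       # processing step that produced this data
    parents: Tuple[str, ...] = ()   # ids of the inputs it was derived from
    annotations: Dict[str, str] = field(default_factory=dict)


class Registry:
    """Keeps every record and refuses inconsistent entries."""

    # Assumed set of mandatory annotations; a real platform would define its own.
    REQUIRED = ("organism", "assay")

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def add(self, record_id: str, step: str,
            parents: Tuple[str, ...] = (), **annotations: str) -> Record:
        # Consistent annotation: every record must carry the same required fields.
        missing = [key for key in self.REQUIRED if key not in annotations]
        if missing:
            raise ValueError(f"missing annotations: {missing}")
        # Tracked history: a derived record may only point at known inputs.
        unknown = [p for p in parents if p not in self._records]
        if unknown:
            raise KeyError(f"unknown parent records: {unknown}")
        record = Record(record_id, step, tuple(parents), dict(annotations))
        self._records[record_id] = record
        return record

    def lineage(self, record_id: str) -> List[str]:
        """Return the ids of all ancestors, nearest first."""
        seen: List[str] = []
        queue = list(self._records[record_id].parents)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            queue.extend(self._records[current].parents)
        return seen


registry = Registry()
registry.add("raw_001", "upload", organism="human", assay="rna-seq")
registry.add("aligned_001", "alignment", parents=("raw_001",),
             organism="human", assay="rna-seq")
registry.add("counts_001", "quantification", parents=("aligned_001",),
             organism="human", assay="rna-seq")
print(registry.lineage("counts_001"))
```

In this sketch, a record that lacks a required annotation or that refers to an unregistered input is rejected when it is added, so any analysis built on the registry can be traced back to its raw inputs.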

Attracting Top Talent

One of the strongest indications that life science startups using AI and ML tools are being taken seriously is the influx of talented pharma industry executives moving into these AI-powered startups. Veterans like Andy Protter joined Auransa’s executive ranks and Mark Eller joined twoXAR’s. They join because these companies represent a different way to do things on the development side and offer fertile soil in which to establish huge platforms. It is also an exciting opportunity for talented university graduates with computer science and algorithm backgrounds who want to use their skills “in a more meaningful fashion.” The most sought-after graduates are those who are “bilingual” in computer science and biology, as it is rare to have expertise in both fields and all too often there are barriers in communication between employees from the different backgrounds.

“If you look at the Stanford bioinformatics alumni page, many graduates are working at companies like Google, Facebook, Netflix, and Twitter because they are offered wheelbarrows of cash, but also because those companies at their core are computer science based. In the pharmaceutical industry, the scientist or the personnel that leads the value in the company is the biologist researcher. Innovation disruption comes from companies like ours, where the computer science is leading the innovation and discovery. This is a completely different cultural shift.”

Andrew Radin, CEO, twoXAR


AI adoption in pharma is still in its infancy relative to other industries. However, the life sciences are advancing at a remarkable rate, perhaps faster than any other branch of science. The same can be said of deep learning: it is one of the most exciting, rapidly advancing areas of computer science. The combination of the two has the potential to change the world in dramatic, far-reaching ways. The effects are already starting to be felt, but they are small compared to what will likely happen over the next few decades. The union of deep learning with biology can do enormous good.