Genomics… and AI… then what?

Every data/AI-driven drug discovery company must eventually do real-world biology experiments. A new class of human in vitro experimental models is ripe to serve this need.

Saul Kato
8 min read · Aug 10, 2022

There is sustained enthusiasm for genomics-led strategies in the earliest phase of the dominant model of contemporary drug discovery: novel target discovery. Scores of biotech startups and big pharma R&D divisions are betting on this approach. Even Illumina, the big winner of the picks-and-shovels market in DNA sequencing, is getting into its customers’ business.

Cheap sequencing made it possible

Thanks to the Moore’s Law-on-steroids revolution in sequencing, we have now sequenced the genomes of millions of individuals, many of whom knowingly or unknowingly experience a disease. We sift through this data to detect correlations, across populations, between gene variants (the particular flavor of each gene that an individual carries) and the expression of disease. We have done scores of these genome-wide association studies (GWASs), at ever grander scale. GWASs have uncovered many gene variants that confer risk of particular diseases and, in some poignant cases, lone causal genes of rare disease (“monogenic disease”) that open the door to targeted cures. Publicly available GWASs abound, and some pharma companies tout their own proprietary GWASs as sources of differentiated value. However, GWASs can’t find smoking guns for most diseases. They can only winnow down the list of potential molecular players from many thousands of gene products to hundreds, and in their standard form they are poorly suited to detecting interactive (“epistatic” or “synthetic”, to use jargon) effects of two or more mutated/variant genes acting in concert, which is almost assuredly how many common diseases arise.
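To make the interaction problem concrete, here is a minimal sketch in Python. It is a toy, not how a production GWAS is run: real studies typically use regression models with covariates and strict genome-wide significance thresholds. The cohort, the variant names `v1` and `v2`, and the helper functions are all made up for illustration, and the disease rule (disease only when exactly one of the two variants is carried) is deliberately extreme so that the effect is easy to see.

```python
import math
from itertools import product


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) on the table [[a, b], [c, d]].

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 0.0, 1.0  # degenerate table: nothing to compare
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def carrier_test(people, key):
    """Test whether people with people[key] == True are cases more often than the rest."""
    a = sum(1 for p in people if p[key] and p["case"])
    b = sum(1 for p in people if p[key] and not p["case"])
    c = sum(1 for p in people if not p[key] and p["case"])
    d = sum(1 for p in people if not p[key] and not p["case"])
    return chi_square_2x2(a, b, c, d)


# Hypothetical cohort: 250 people for each combination of two variants.
# Assumed disease rule (illustration only): disease if exactly one variant is carried.
cohort = [
    {"v1": v1, "v2": v2, "case": v1 != v2}
    for v1, v2 in product([False, True], repeat=2)
    for _ in range(250)
]

# Single-variant scans, one variant at a time.
stat_v1, p_v1 = carrier_test(cohort, "v1")
stat_v2, p_v2 = carrier_test(cohort, "v2")

# A test that looks at the two variants jointly.
for person in cohort:
    person["exactly_one"] = person["v1"] != person["v2"]
stat_joint, p_joint = carrier_test(cohort, "exactly_one")

print(f"v1 alone:  chi2={stat_v1:.1f}  p={p_v1:.3g}")
print(f"v2 alone:  chi2={stat_v2:.1f}  p={p_v2:.3g}")
print(f"v1 and v2 jointly: chi2={stat_joint:.1f}  p={p_joint:.3g}")
```

In this contrived cohort each variant looks completely innocent when tested alone, while the pair considered together perfectly separates cases from controls. Real interactions are rarely this clean, and testing all pairs among millions of variants carries a heavy statistical and computational burden, which is part of why interaction effects tend to slip through.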

Shortcomings aside, we think that genomics is a sound starting point for target and drug discovery for many diseases. Empirically, many diseases have a strong genetic risk component. Promising disease-modifying targets are likely to be found in these scans-of-the-haystack. Some companies have taken a step further up the chain of causation in biology, going from genetics to gene expression by leveraging the advances in scaled transcriptomics, often mined from post-mortem tissue from patients. While transcriptomes are notoriously noisy and murky to interpret, we think this is also a sound approach to winnow down a list of potential targets to prosecute.

Target validation required

However, observational genomics approaches are only a starting point for a target-driven drug discovery program. In general, we cannot infer causality of genes, we cannot determine whether a target is druggable, and we cannot observe the temporal, dynamical aspects of disease. Most importantly, we cannot know whether target engagement will actually lead to repair of the disease state. Biology is simply too complex and uncharted. And despite the recent compelling successes of machine learning, we simply do not have adequate data to train an AI system to model all of biology (nor is it clear how we would even acquire this data). We must, unfortunately, do real-world experiments. Costly and messy biological experiments. These generally fall under the innocuously named exercise of target validation, which sounds like just ticking a checkbox, but is actually a costly, experts-only, scientific undertaking. Target validation requires reading out complex biological signals in animal models or human cell culture models, using a panoply of sophisticated experimental techniques.

Established pharma companies know the criticality of target validation: they have been burned many times pushing drugs into the clinic, at enormous cost, against targets that were insufficiently “validated” and prematurely claimed to be disease-curing. Case in point: 20 years and 50 billion dollars of failed trials later, amyloid beta plaques are no longer fingered by most as a cause of Alzheimer’s disease. More tragic than the dollar cost is the enormous opportunity cost that the field, and humanity, has suffered. It is in the long-term interest of pharma companies to acquire higher confidence that their proposed drugs will have an effect before entering clinical trials involving real humans: failed trials make up the majority of the massive cost of drug R&D, and the majority of failed late-stage trials, where costs really start piling up, fail due to lack of efficacy. However, short-term incentives often perversely win. There is severe pressure to move programs into clinical trials for marketing to investors. And once clinical trials begin, a sunk-cost fallacy (irrational) or pot-odds calculation (rational) gives incentive to push programs forward and keep investing in them, however dim the prospects may be, all the way to a Phase III failure or, worse yet, to a debacle like Aduhelm.

Enter functional genomics

What are the right experiments to do to find curative (or, to use the more modest industry phrase, disease-modifying) agents before heading into the clinic? For a common disease, we will first need to do target validation on hundreds of genes that came up as hits in genomics studies. Rather than worry about developing therapeutic molecules at this stage, we can take advantage of the incredible advances in genetic engineering to modulate expression of a gene in an experimental biological model (say, an animal that expresses some analogue of a human disease) and thereby infer the effect of modulating its gene product in a human. This is the exercise of functional genomics. Knock out a gene, or knock in a gene variant, and see what the system does, i.e. assess phenotypic changes. What do we measure out of the cornucopia of potential bioassays? We can measure specific biological readouts indicative of the disease in question and look for a shift to non-disease (a “rescue”), or we can measure general biological readouts indicative of “normal, healthy” function. Even better, we can compose a biologically rich, and therefore more believable, amalgam of readouts called a deep phenotype. As an alternative to genetic perturbation, we can screen a “target-biased” library of molecules with known target bindings, both to find phenotype-modifying targets and to jump ahead on the drug development path by finding molecules that can be starting points (“leads”) for molecule design and optimization efforts.
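One simple way to picture a “rescue” measured over several readouts at once is sketched below in Python. This is an assumption-laden illustration, not a description of any particular company’s pipeline: the readout names, the well values, and the `rescue_fraction` score are all hypothetical. Each readout is expressed as a z-score relative to healthy control wells, and a perturbation is scored by how much of the disease model’s overall deviation from healthy it removes.

```python
import math
from statistics import mean, stdev


def z_profile(sample, healthy_wells):
    """Express each readout of `sample` as a z-score relative to healthy control wells."""
    profile = {}
    for readout, value in sample.items():
        reference = [well[readout] for well in healthy_wells]
        profile[readout] = (value - mean(reference)) / stdev(reference)
    return profile


def deviation(profile):
    """Overall distance from the healthy state (Euclidean norm of the z-scores)."""
    return math.sqrt(sum(z * z for z in profile.values()))


def rescue_fraction(disease, perturbed, healthy_wells):
    """Fraction of the disease deviation removed by a perturbation.

    1.0 means fully back to healthy, 0.0 means no change, negative means worse.
    """
    before = deviation(z_profile(disease, healthy_wells))
    after = deviation(z_profile(perturbed, healthy_wells))
    return 1.0 - after / before


# Hypothetical readouts and values, for illustration only.
healthy_wells = [
    {"firing_rate": 9.0, "neurite_length": 100.0, "viability": 0.90},
    {"firing_rate": 10.0, "neurite_length": 110.0, "viability": 0.92},
    {"firing_rate": 11.0, "neurite_length": 120.0, "viability": 0.94},
]
disease_well = {"firing_rate": 4.0, "neurite_length": 70.0, "viability": 0.80}
knockout_well = {"firing_rate": 8.0, "neurite_length": 100.0, "viability": 0.90}

score = rescue_fraction(disease_well, knockout_well, healthy_wells)
print(f"rescue fraction for the hypothetical knockout: {score:.2f}")
```

Combining several readouts means a perturbation that merely fixes one number while leaving the rest of the biology sick scores poorly. A real deep phenotype would involve many more readouts, replicate wells, and a more careful treatment of correlated measurements than this sketch attempts.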

We’d like to do these functional genomics experiments in humans, but systematic perturbation of gene expression in humans for understanding disease is not ethically or practically feasible. We want to modulate or perturb several hundred gene products, one or a few at a time, and watch the resulting effects on our experimental model. And ideally, we want to do this in a model that exhibits features of the disease we are attacking. Historically, we have resorted to model animals such as the mouse. However, we have come to realize that mice are generally lousy models for human biology, and moreover, they are still large organisms that don’t lend themselves to experimental scale-up, and we need scale if we are to be doing comprehensive discovery. If not animals, then what?

Human cells for human disease modeling

In the mid-to-late 2000s we miraculously figured out how to grow cells of almost any type in the body, derived from human donor cells and therefore possessing the genetics of the donor. We can take skin cells, or now blood cells, and convert them into induced pluripotent stem cells (iPSCs), then steer them with signaling molecules to assume the identity of just about every cell in the human body. The thrilling implication is that these cell cultures can be used for disease studies in human, not animal, cells, closing the massive “translational gap”. Several companies were founded a decade ago to exploit this human-cellular approach for drug discovery. But it was hard going. Scaling up these exercises to the point that they can be used for screening experiments takes a ton of automated machinery, both hardware and software, which simply did not exist until recently. And these stem cells grow a bit funny, in unpredictable ways, increasing the trial-to-trial variability of many measurements and calling into question their relevance to natural, in vivo biology. But the field has matured and gained acceptance. Today, virtually every large pharma company is embracing the use of human cell culture for disease modeling and early-stage drug discovery. We have entered the era of human-derived in vitro drug discovery.

However, there is a thorny problem that renders these cell culture approaches dead on arrival for many biological studies and therefore useless for drug discovery. These traditional cell cultures lack something crucial: context. Most cells from humans and other animals really don’t like to be cultured alone, or in the relative isolation of a single layer spreading out on the surface of a glass dish. They get sick, they tend to die off, and they don’t generally display many of the functions or the appearance of cells in vivo. In a multi-cellular organism, cells rely on other cells, their community context, to thrive and function. From another perspective, it is miraculous that cells can survive, and even grow and divide, in these starkly un-biological conditions at all. And while there are diseases that arise completely inside a cell irrespective of any interactions with other cells (“cell-autonomous” in the parlance), the resulting organism-level disease states are often not easily observable without the surrounding context of other cells, i.e. tissue. We need a way to study disease biology at the multi-cellular level.

From iPSCs to complex in vitro models

Quietly and without fanfare in the 2000s, some of the same pioneers of induced pluripotent stem cells also realized that there were signs of self-emergent organization of cells in these cultures. Occasionally, cells in culture would form something that resembled embryonic tissue: elements of natural embryonic self-organization were furtively appearing. With some trial and error, culture conditions were found that encouraged this unfolding of the biological program into multicellular tissue. Around 2010, other researchers figured out how to do this more reliably and to push the recapitulation of natural development further, coaxing cell culture into tissue-like aggregates, dubbed organoids. Fast forward a decade, and everybody in academia is doing it. That is to say, if you were an in vitro disease modeler in the 2000s, you are now growing organoids as part of your research program. We have now figured out how to grow organoids of all types: brain, liver, intestine, kidney, heart, eye, skin, bone, lymph node, pick your favorite organ. While it is a revolution, a step function in biological realism, it is also just part of an inevitable march toward using more realistic, but scientifically prosecutable, models. Fait accompli.

A cerebral organoid used to study brain disease at Herophilus

Every genomics-driven drug discovery company, as well as any other drug discovery company starting with a data-driven or computationally driven innovation thesis, will face the necessity of doing biological experiments before heading into the clinic. And when they do, they should be compelled to use the most human-biologically relevant pre-clinical model, which, for most diseases, is an organoid. Perhaps contract research organizations (CROs) will successfully serve up this new, complex experimental need, or perhaps big pharma will be able to build up this kind of R&D capability in house, or perhaps there will be a new set of companies that run such scaled research engines and provide the basis for truly science-led pharma development.

We are in the early days of the development and refinement of these new complex in vitro models, and innovation is breakneck. There is still hesitancy, just as there was with iPSCs, that these cultures are just too weird and too unpredictable to be used for any real disease science work, even though they are more biologically realistic, rather than less, than traditional cell culture. But just about every month, a new culturing protocol or bioengineering innovation opens up the opportunity to study yet another disease using these complex in vitro systems. Many researchers in the field, like medieval alchemists, believe their particular innovation will be the essential invention that elevates these models to their fully appreciated potential, but the truth is, the age of complex in vitro models is already here.



Saul Kato

Alfred and Alice Werth Endowed Professor, Weill Institute for Neurosciences, University of California, San Francisco. Co-founder, Herophilus.