5 – How AI and genomics make multi-cancer early detection possible
February 15, 2023

From the early 20th century discovery that cancer is a disease of the genome to having sequenced more than one million humans’ genomes today – this is just some of the history Amoolya Singh, Senior Vice President of Data Science and Chief Scientific Officer at GRAIL, discusses when she shares the role of genomics and AI in cancer detection, what’s needed for population-scale screening tests, and what she’s excited about for the future.


Kim Thiboldeaux 00:08
Welcome to The Cancer Signal, a new podcast presented by GRAIL where we discuss the impact of early cancer detection, the science behind multi-cancer early detection and insight into how this approach has the potential to shift the cancer paradigm. I’m your host Kim Thiboldeaux. Today we will discuss the genomics behind multi-cancer early detection with Amoolya Singh, the Senior Vice President of Data Science and Chief Scientific Officer at GRAIL. Amoolya earned a PhD in Computational Biology and an MS in Computer Science at the University of California at Berkeley. She holds a BS degree in Biology and Computer Science from Carnegie Mellon. Amoolya completed her postdoctoral fellowship at the European Molecular Biology Lab in Heidelberg, Germany in 2008, and a Computational and Life Sciences fellowship at Emory University in 2011 in the fields of comparative genomics, metagenomics, population genetics, and experimental evolution. Wow, that’s a mouthful. Welcome to the show, Amoolya.

Amoolya Singh 01:08
Thank you, Kim. Thank you for that kind introduction. And thanks for having me.

Kim Thiboldeaux 01:05
Very impressive, very impressive. So I’d love to get to know you a little bit so tell us a little bit more about yourself. And tell us about your role at GRAIL. And really what drew you to this opportunity?

Amoolya Singh 01:23
Yeah, yeah, so I’m a computational biologist by training, as you said, in plain English, that means that I use math and computers to understand how living things work. And I’ve worked at sort of the interface of these two subjects for a couple of decades. At GRAIL, I have two roles. The first is to oversee the bioinformatics and data science group. This is a group of about 80 scientists who analyze all of our lab data, and they develop computer models that predict whether a blood sample might have a cancerous signal or not. And then my second role as Chief Scientific Officer, is to look after the overall innovation strategy for the whole company. And I’m particularly excited about that role, because, you know, the research of today is hopefully the product of tomorrow. And so part of our work is, you know, helping ideas come to life in the real world, which I very much enjoy.

Kim Thiboldeaux 02:22
Fantastic, Amoolya, there’s been certainly a lot of attention around the convergence of healthcare and technology. We completed sequencing the human genome in 2003. How has that changed or informed how we screened for cancer?

Amoolya Singh 02:39
Yeah, this is a very interesting area. And you’re right to call it a convergence. The sequencing of the human genome, which by the way, was originally a mosaic of about 20 actual humans, it gave us a way for the first time to read the entire library of instructions written in DNA, that are needed to create and maintain a human being. It’s kind of amazing, if you stop to think about it. What’s less well known is that, you know, this was a huge project, billion dollar project, multiple countries worked, you know, for 16 or 17 years to sequence the genome. One of the original justifications to do it, which was put forward in an essay almost 17 years earlier in 1986 in Science Magazine, was, in fact, to better understand the origins and progression of cancer. And so that essay was written by a famous Italian biologist named Renato Dulbecco. He had won the Nobel Prize already a decade earlier, for his work on using viruses to study cancer. It was, it turns out that there’s certain viruses that can cause cancer. And so he was a pioneer in that work. And later, when he wrote this essay in ‘86, he mentioned viruses and rats and mice in his essay, as some of the lab tools that were available at the time to understand cancer. And you can almost hear his weariness in the essay because he argues that if we’re to understand human cancer, then we need to learn all the genes in humans that affect the progression of cancer in different cells, right, rather than just doing it piecemeal, in different other organisms.

So the reason that the genome is the particular readout for cancer is that cancer, as you might know, is thought to be a disease of the genome. It’s often talked about that way. That is the machinery, where each cell has machinery, that protects and keeps that DNA blueprint pristine. And in cancer, that machinery starts to get corrupted. And then these glitchy cells overgrow and evade the body’s kind of natural cleanup and turn cancerous.

So Dulbecco, coming back to Dulbecco. He passed away in 2011, about eight years after the completion of the Human Genome Project. So if he were alive today, like how would he say that we did on you know, screening and treating cancer? So on the treatment part, you know, the FDA approves about 50 new cancer drugs a year. And I would venture to say that not a single one of them anymore could have been developed without both specific and detailed knowledge of the human genome. And then if we turn to screening, this is actually one of the biggest applications of sequencing, because since 2003, we’ve drastically improved our ability to read the DNA, both in space and in time. There’s also the side matter of writing DNA, which I won’t get into. But that’s a huge, that’s another big conversation that has big implications for cancer therapeutics. But for reading the, you know, the human genome has been continuously updated since 2003. And more than a million humans have been sequenced, starting with those original 20, who gave their samples. In fact, we’re currently on version 38 of the human genome. And it now incorporates many more differences based on ethnicity and ancestry. And now that sequencing has gotten cheap enough, we can sequence humans, not just once for every snapshot, but at different stages of their life, in sickness and in health, even after death, actually, so that we can form a baseline of what does a healthy individual’s DNA look like? What does aging look like? And then therefore, what does cancer look like?

Kim Thiboldeaux 06:23
Wow, we’re getting a crash course today. I want to continue with our crash course. This is fantastic. I want to just talk for a minute or two about AI, about artificial intelligence, while we’re talking about that convergence and intersection of healthcare and technology. Amoolya, how is AI enabling innovation in cancer screening? What is AI, first of all? And how is it empowering this innovation?

Amoolya Singh 06:50
Yeah, great. I love that you asked that question, because it’s a topic that’s near and dear to my heart. So first of all, I’ll say AI has a big role to play in cancer screening, and I’ll tell you why, but what is it? So AI, of course, stands for artificial intelligence. It’s a set of usually computer programs that can do various things that approximate the tasks that humans or you know, or other sort of cognitive creatures can do. And these include, for example, pattern recognition, learning, reasoning, planning, perception, inference, and the ability to kind of move and manipulate objects. So those are the, those are actually the main fields of AI, what I just listed. The place that it has in cancer screening, coming back to your question about the genome, if you print out a human genome on paper, that is you write out you know, all the A’s and C’s and T’s and G’s, one after another, it would fill 130 books, and it would take 95 years to read. And so it might also be kind of monotonous reading, honestly, because you only have four letters of the alphabet. And you’re reading them for sort of close to a century, let alone trying to find any patterns quickly in that…kind of boring. And let’s not even get started on all the paper we just wasted. And so you know, what AI can do is learn these kinds of patterns very quickly and get better at discriminating say, you know, healthy DNA versus disease DNA. That’s the kind of most obvious task for AI. But there’s a range of others as well, it can also figure out what’s the best way to store that biological information in DNA in a computer so that a computer can analyze it. This is called representation learning. AI can also reason about connections between seemingly unrelated observations and suggest these connections to human scientists, as hypotheses that then we can test or rule out in the lab. That’s called inference. Right, so here I described three of the principal subfields: learning, representation and inference. 
I also talked about planning and you know, the ability to move and manipulate objects. You hear a lot of this and you know, self-driving cars and, and autonomous vehicles and so on. But we use these aspects of AI as well in the lab, to run a very high throughput automated lab that can process 1000s of blood samples a day through a very complex workflow without human error or injury. So there’s a, you know, nuts and bolts role that AI plays, really in the lab itself.
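
To make the point about storing DNA "in a computer so that a computer can analyze it" concrete, here is a minimal sketch of one common hand-written encoding, one-hot vectors. This is an assumption chosen for illustration: it is a fixed encoding, not representation learning (where the encoding itself is learned from data), and it is not a description of GRAIL's pipeline. The names `BASES` and `one_hot` are hypothetical.

```python
from typing import Dict, List

BASES = "ACGT"  # the four DNA letters mentioned in the conversation
_INDEX: Dict[str, int] = {base: i for i, base in enumerate(BASES)}


def one_hot(sequence: str) -> List[List[int]]:
    """Encode each DNA letter as a 4-element vector containing a single 1."""
    encoded = []
    for base in sequence.upper():
        if base not in _INDEX:
            # Real data can contain other symbols; this toy sketch rejects them.
            raise ValueError(f"unexpected DNA letter: {base!r}")
        row = [0] * len(BASES)
        row[_INDEX[base]] = 1
        encoded.append(row)
    return encoded


# Illustrative input only: a made-up seven-letter sequence.
example = one_hot("GATTACA")
```

Each letter becomes a row of numbers, which is the kind of input a pattern-recognition model can work with; a learned representation would replace these fixed rows with values fitted to data.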

Kim Thiboldeaux 09:33
Terrific, Amoolya. So let’s take that sort of baseline that we’re talking about here and pivot to MCED, multi-cancer early detection tests. Obviously, GRAIL’s test Galleri is an MCED test. Tell our listeners, how does the MCED test or the Galleri test work generally to identify a cancer signal? You know, for example, how does it distinguish between DNA that might have come from a cancer cell versus a healthy cell? What’s the technology and science there?

Amoolya Singh 10:03
Yeah, so there’s…it’s complicated, but it comes down to two simple principles actually, that underlie our test, the Galleri test. One is DNA shedding, and the other is methylation. So what is DNA shedding? It turns out that all the cells in the body as they die, shed small amounts of DNA into the bloodstream. And this is happening all the time, actually. And cancerous cells, if they’re present, also release DNA. And so you might end up with a mixture of DNA in the blood. Now, how do you know if the DNA came from a cancer cell or a healthy cell, and that’s where the second concept methylation comes in. So methylation is a process by which all living cells actually all the way from bacteria, you know, I don’t know squids, Komodo dragons, human beings, pine trees – they all selectively mark their DNA at certain locations, which signals which sections of the DNA to turn on or off. And so you can think of this kind of like a highlighter in a page of text that suggests which bits you should read and which bits you can skip. And so methylation is that highlighter, and DNA is the text that you can read through a sequencer. So then what happens in cancerous cells is that that methylation becomes abnormal. It’s kind of like the highlighter has run amok. And so by sequencing the DNA, you can tell whether the DNA fragments came from a healthy or cancerous cell. And beyond this, even this process of methylation and abnormal methylation is considered one of the underlying enablers of what are called the hallmarks of cancer, which is, you know, six to eight different aspects of cancer that are common to all cancers, the ability to evade the immune system, the, you know, the instability of the genome, the reprogramming of your, the way your body uses sugars and energy to divert that energy towards cancerous cells. 
So there’s a number of these that are called the hallmarks of cancer and abnormal methylation, which falls into this larger category of kind of genome instability is one of those.
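
The highlighter analogy above can be sketched in a few lines of code. This is a toy illustration under stated assumptions, not the Galleri algorithm: the reference pattern, the example fragments, the 0.5 threshold, and the function names are all hypothetical. Each DNA fragment is reduced to a list of methylation marks (1 = marked, 0 = unmarked), and a fragment is flagged when too many marks differ from the pattern expected in healthy cells.

```python
from typing import List

# Hypothetical reference: expected methylation state at six sites of one
# genomic region in healthy cells (1 = methylated, 0 = unmethylated).
HEALTHY_REFERENCE = [1, 1, 0, 0, 1, 0]


def mismatch_fraction(fragment: List[int], reference: List[int]) -> float:
    """Fraction of sites whose methylation state differs from the reference."""
    if len(fragment) != len(reference):
        raise ValueError("fragment and reference must cover the same sites")
    mismatches = sum(f != r for f, r in zip(fragment, reference))
    return mismatches / len(reference)


def looks_abnormal(fragment: List[int], threshold: float = 0.5) -> bool:
    """Flag a fragment whose marks differ from the reference at >= threshold of sites."""
    return mismatch_fraction(fragment, HEALTHY_REFERENCE) >= threshold


# Made-up fragments standing in for DNA shed into the blood.
fragments = [
    [1, 1, 0, 0, 1, 0],  # identical to the healthy pattern
    [1, 1, 0, 1, 1, 0],  # one site differs
    [0, 0, 1, 1, 0, 1],  # every site differs: the "highlighter run amok"
]
flags = [looks_abnormal(f) for f in fragments]
```

In this sketch only the third fragment is flagged. A real test works from a mixture of fragments and learns what counts as abnormal from data rather than from a fixed threshold.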

Kim Thiboldeaux 12:21
And so with that hallmark being identified, does this mean that an MCED test can find any cancer, can find all cancers?

Amoolya Singh 12:30
Well, so two things: can MCEDs find any cancer? It depends on these two factors, right, the DNA shedding and the methylation. While methylation is a common feature of all cancers, the rate at which different cells in the body shed DNA is inherently different. And it also varies by the stage of the cancer. So out of the box, the MCED test tends to be, you know, more sensitive for certain cancers than others, and maybe for certain stages more than others, you know, even so it can detect over 50 types of cancers and many at earlier stages. And then to you, you know, you hinted at kind of how was this identified. I think it’s important for folks to know that, you know, GRAIL didn’t invent the idea of methylation, GRAIL didn’t discover that. In fact, the idea that cancer is a disease of the genome was put forward over a century ago by a German biologist named Theodor Boveri. This was in the early 1900s, I think, 1901 or 1902. And so Boveri studied of all things, sea urchin eggs. I don’t know how he came upon this, but anyway, and how they kind of get fertilized and divide. And he hypothesized that in a cancerous process, the genome starts to become unstable and can’t be kind of cleaned up and corrected by the usual error correction mechanisms. Then it took another few decades, actually, for other biologists to prove Boveri’s hypothesis. And there’s another famous name in here, who’s the American biologist, T.H. Morgan. He’s not exactly a household name to most folks. But he is to every biologist. Maybe one little interesting factoid about Morgan is that his great grandfather, I think, was Francis Scott Key who wrote the American National Anthem. Anyway, T.H. Morgan is considered the father of modern genetics. And so he won a Nobel Prize not only for this work, but you know, two of his students went on to win Nobel Prizes also for a kind of follow on work for how genes are regulated and how radiation affects DNA. All of this has a lot to do with cancer as well. 
So, once Morgan discovered this, then, you know, this was in the 19, about 1915 or 1920, around the First World War. Then there came various descriptive studies of what are the kinds of genomic instabilities, more descriptive studies of you know, the extent of noise in these measurements. And then about mid century started more predictive studies, which is, is genome instability a cause or a consequence of cancer. So all this by way of saying that, you know, this whole area has been a very rich area of inquiry in medicine for a long time. And we’re really sort of standing on the shoulders of giants here. Yeah, it’s not just GRAIL that’s been working on this.

Kim Thiboldeaux 15:26
Wow, that’s fascinating. And I certainly have great appreciation for any family who has strength in both the arts and the sciences. So that’s fantastic. That’s a great story, a great connection. Amoolya, I think many folks are used to our usual cancer screenings, things like mammography, things like colonoscopy. How are MCED tests different from single cancer screenings? And is the idea that these are going to replace single cancer screenings?

Amoolya Singh 15:54
Yeah, that’s a great question. They are different than single cancer tests. No, I don’t think they need to replace them. And then let me get into why. You know, as folks probably know, and you alluded to this in the US, there’s, you know, there are four major single cancer screens, right for breast, for lung, for cervical cancer, and colorectal cancer. And most of the screening tests here are looking for some type of abnormal cell, either visually through imaging or from a tissue sample. Right, so the mammogram or the CT scan are visual, they’re imaging modalities, and then the Pap smear and the colonoscopy are getting some tissue and then analyzing the tissue. So the MCED test, such as Galleri, is different in that it’s not looking for specific cells, you don’t have to kind of get, you don’t have to either look at, you know, cells through an imaging modality or take cells out of the body, it’s actually just looking for this abnormal DNA that’s shed in the blood, right. And because the blood circulates throughout the body, you can gather information from actually all over the body through the blood. So you don’t have to look at this particular organ or that particular organ. And then to your, you know, the question a little more about does it replace single cancer screen? No, absolutely, I don’t think so. The four cancers that I mentioned, have already, you know, well recommended screening guidelines, and they can and continue, you know, to be screened in that way. But this test detects another 46 cancers that these four screens don’t. Many of them are actually far more deadly than those four cancers. So one should really think of it as complementary, it’s a non-invasive way to start this, and it needs to then be followed up, you know, by a doctor with an actual diagnosis.

Kim Thiboldeaux 17:40
I know, Amoolya, that we’re still, in many ways in early days, I know there’s been a lot of time and investment in this technology, but we’re still in many ways in early days. But if you’re looking down the road into the future, if we want to imagine the idea that this would, you know, be a sort of a population scale screening, like you know, we as women every year we go get our, you know, mammography right. If we were thinking about sort of population scale screening for this kind of test, what are the important characteristics of an MCED test to enable that level of screening? Where are we going with this? I know, there are so many studies happening right now.

Amoolya Singh 18:16
Yeah, if we take a step back and think about this from, you know, and I can think about this from the perspective of a consumer rather than as a scientist, right? What, what does it mean for me to go and do this? At population scale, it needs to be cheap, right? It needs to be quick and painless for me. It needs to be highly accurate and sensitive so that I’m not worried about overdiagnosis or misdiagnosis, both of those are very, you know, could be points of real anguish in someone’s life. And it needs to be informative, so that the doctor can decide what to do next. And here’s one place where, you know, I want to call out a specific capability that the Galleri test has, which is it’s not just saying, Oh, look, we detected a cancer signal somewhere in the body. It’s actually able to pinpoint and localize where in the body that signal might be coming from. Is it from the thorax, you know, is it from the head and neck? Is it from the pancreas? Is it from the gut, and so on and so forth? And so that’s the informative piece that the doctor can then decide, okay, what, you know, how should I follow up on this? What shall I do next?

Kim Thiboldeaux 19:26
Tell me obviously, you’re very deeply engaged in this clinical development program there at GRAIL, how has Galleri changed over the past three or four years over time? You know, as the clinical program develops and advances, what are you learning? How’s the test changed or improved or become more refined?

Amoolya Singh 19:43
Yeah, terrific. And I love that you asked that because there’s a tie in actually to an earlier question, you asked about AI and machine learning. And I’ll try to bring those back together. So you know, the version of Galleri that’s on the market now went through pretty extensive testing in several large clinical studies of you know, tens of thousands of people before release. We’re very proud of that, and proud of the rigor and the transparency of that work. It has, at the moment, a low false positive rate. So the false positive rate means you, the test says, you have cancer when actually you don’t. And so you can see how harmful that could be. So you want an extremely, vanishingly low false positive rate there, so that you’re not over diagnosing or making people worry unduly. So our false positive rate is about half a percent. So that’s five in 1000. And it’s accurate in calling the location of the cancer 90% of the time. So again, if there’s a workup that’s necessary, the test is not going to send you on a wild goose chase through multiple practitioners to then localize and diagnose it. So we feel pretty good about that. What we’ve been continuing to work on behind the scenes is to drive down the cost of the test, right both by you can innovate in the lab workflows, there’s been innovations in sequencing technology, we’re continuously integrating all of those. And then of course, on the computational side, the efficiency and the scale of our computer models. We have gathered over 30 petabytes of data, that’s 30 followed by 15 zeros. And we publish our findings in both academic journals, as well as in the leading oncology conferences, and as I said, I’m proud of that level of transparency and rigor, because you need that to develop a safe, strong product in a kind of regulated marketplace. So then coming to the, you know, what are we learning? And what are we doing next? 
You know, in parallel to, as I said, there have been tens of thousands who have gone through the clinical studies. In total, if you include a big trial that we’re doing with the National Health Service in the UK, the NHS Galleri trial, that encompasses a total of almost 350,000 individuals in you know, seven or eight different clinical studies. We also have data now on almost 60,000 commercial cases since the launch of Galleri. And so we, in house, we analyze these data continuously, actually, every week, and we’ll use them to consider further internal improvements. So coming back to machine learning, one of the definitions actually, of machine learning, an operational definition, is that it’s a method that gets better with more data. The more data you have, the better the method gets. And anything that fulfills that definition is actually considered machine learning. And so that’s why we’re so excited about continuing to gather data. And we don’t sit on all this data ourselves, either, we have actually almost a half dozen partnerships with, you know, leading academic labs around the world, so that we can start to share these findings with them, and advance our collective understanding of cancer biology as well. So hopefully driving the research as well as the detection.
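
The two figures quoted in this answer ("half a percent ... five in 1000" and "30 followed by 15 zeros") can be checked with a short calculation. This is a minimal sketch: the counts of 5 and 995 are illustrative numbers chosen to match the quoted rate, not study data, and the function name is hypothetical. The petabyte is taken in its decimal (SI) sense of 10^15 bytes, which is what the "15 zeros" description implies.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Share of people without cancer whom a test wrongly flags."""
    return false_positives / (false_positives + true_negatives)


# "About half a percent": 5 wrongly flagged out of 1,000 people without cancer.
rate = false_positive_rate(false_positives=5, true_negatives=995)

BYTES_PER_PETABYTE = 10**15  # decimal (SI) petabyte, assumed here
total_bytes = 30 * BYTES_PER_PETABYTE

print(f"false positive rate: {rate:.1%}")
print(f"30 petabytes = {total_bytes:,} bytes")
```

Note that the rate is defined over people who do not have cancer, so it says nothing by itself about how many cancers the test misses.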

Kim Thiboldeaux 23:06
Amoolya, I’ve been hearing from folks at GRAIL about a study that you guys are doing in the UK with the NHS, the National Health Service there. Can you tell us a little bit about that study and what you’re hoping to learn from that?

Amoolya Singh 23:18
Yeah, this is a very exciting and very big trial that is, that we’re doing in collaboration, as you said, with the UK National Health Service, to essentially use a kind of randomized controlled setting, which is a gold standard for how you do a clinical trial where you have, you know, one group of folks in the trial receiving some intervention and the others not. So in this case, the intervention is the Galleri test, and the others then don’t get the Galleri test. And we have enrolled 140,000 people across the UK, in a very broad swath of both demographics and socio economics in this trial. The trial completed enrollment last summer, and in fact, set records in the world for actually one of the fastest enrolling trials, if not the fastest, because 140,000 people were enrolled in just 10 months. So there’s a huge logistics achievement actually, in sending the test out to where people were rather than expecting them to all come to a centralized location. So something to feel proud of there in terms of the reach and the access. With those 140,000 people what’s happening next is that they’re going to be followed for a period of three years, so that we can actually see in cases where the Galleri test predicted cancer, what does that look like over time? Do those develop into cases of cancer or not? What is the right interval in which we might recurrently screen people? And then the kind of outcome that we’re looking for is that by using this kind of tool, we can shift the detection of cancer to earlier and earlier stages, right? So you might know that cancer is typically staged in four stages. I won’t get into the details of it, but you know, stage one is the earliest and stage four is sort of the worst in terms of one’s prognosis. And so this notion of stage shift says that rather than detecting cancers at stages three and four, we can hopefully shift many of those into stage one and two detection, when there are more options to treat these folks. 
So that’s a whirlwind tour of the NHS Galleri study.

Kim Thiboldeaux 25:42
That’s great. We’ll look forward to hearing the results of that and seeing how it impacts some of your clinical program in other parts of the world. Amoolya, I can feel your passion for this work and feel your passion for this field, the science and technology in the field of MCED. As we come to the close of our conversation together, what are you most excited about when you think about the future of cancer screening as a scientist and as a citizen?

Amoolya Singh 26:08
Yeah, yeah, two things. So maybe I’ll start with the citizen piece first, actually, you know, equity, I think is the first one that I’m particularly excited about, right? Because it’s known, it’s well known that even when there are no underlying genetic differences between people, the outcomes for cancer are much worse for minorities and underserved communities. And yet, you know, if you look at how the clinical trials usually enroll research participants, it lags the kind of census or the population at large, and those studies are often not representative of the population. And so I’m particularly excited about the work that GRAIL is doing towards equity, including how the NHS trial has been designed. We have partnerships with the Veterans Association in America, the state of Louisiana, firefighters, various other risk populations, and so on. And so for me, in particular, on a personal note, you know, having grown up in a developing country, I find this aspect of equity incredibly inspiring, and I really hope we can continue to act on this, it means a lot to me.

And then, you know, as a scientist, I’m excited to detect cancer earlier, because that sheds light on the sort of the origins and the progression of the disease. You know, as I said, people have been thinking about this for over a century. But this will hopefully lead to kind of better and safer therapeutics. Before there were tests like this, cancer, you know, was typically diagnosed at much later stages. So that kind of limits what you can observe about the cancer, because now you’re at the moment of diagnosis or after, and your observations of that person or that patient may not generalize. And therefore the way that you treat it, and how well your treatment will work in other individuals may also not generalize. It’s sort of like, if you said, you know, you only ever look up at the sky in the evenings, then you’re limited to what you can see, you know, the light’s fading, there’s only certain birds that are flying at dusk, sometimes there’s a moon, sometimes there’s not. And you actually have no idea that there’s such a thing as a morning sky or an afternoon sky, it looks quite different. And so I’m very excited to start to see that whole progression, rather than kind of a narrow slice of it, because then we can learn more, and our learning will hopefully be richer. And then finally, very quickly, I’ll say that, you know, I, like many others, have had a family member who has been diagnosed with cancer. And last year passed away, actually. And I remember thinking to myself, when I was considering joining GRAIL, you know, if I can help even one family to not go through what my family just went through by finding that cancer earlier when it can be cured, like what better way to spend my time. So I’m very excited to be here and be able to contribute to this mission.

Kim Thiboldeaux 29:17
That’s fantastic. Amoolya Singh, Thank you so much for joining us for the show today. Amoolya Singh is the Senior Vice President of Data Science and Chief Scientific Officer at GRAIL and your insight and your passion really comes through in the conversation. So thank you for that. This is The Cancer Signal presented by GRAIL. I’m Kim Thiboldeaux. Tune in next time to learn more about the impact of early cancer detection.


Important Safety Information
The Galleri test is recommended for use in adults with an elevated risk for cancer, such as those aged 50 or older. The Galleri test does not detect all cancers and should be used in addition to routine cancer screening tests recommended by a healthcare provider. Galleri is intended to detect cancer signals and predict where in the body the cancer signal is located. Use of Galleri is not recommended in individuals who are pregnant, 21 years old or younger, or undergoing active cancer treatment.
Results should be interpreted by a healthcare provider in the context of medical history, clinical signs and symptoms. A test result of “No Cancer Signal Detected” does not rule out cancer. A test result of “Cancer Signal Detected” requires confirmatory diagnostic evaluation by medically established procedures (e.g. imaging) to confirm cancer.
If cancer is not confirmed with further testing, it could mean that cancer is not present or testing was insufficient to detect cancer, including due to the cancer being located in a different part of the body. False-positive (a cancer signal detected when cancer is not present) and false-negative (a cancer signal not detected when cancer is present) test results do occur. Rx only.

Laboratory/Test Information
GRAIL’s clinical laboratory is certified under the Clinical Laboratory Improvement Amendments of 1988 (CLIA) and accredited by the College of American Pathologists. GRAIL’s clinical laboratory is regulated under CLIA to perform high-complexity testing. The RUO test was developed, and its performance characteristics were determined by GRAIL. The RUO test is For Research Use Only, Not for Diagnostic Purposes. GRAIL’s current product offerings have not been cleared or approved by the Food and Drug Administration.