In a recent post, we discussed key considerations for designing a robust next-generation sequencing (NGS)-based lung cancer assay. Putting those plans into action in the development phase brings forth a new set of challenges. Through our experience developing NGS reference materials and the relationships we’ve built with assay developers of all stripes, we’ve identified those important factors and ways to navigate them. But before you begin designing and optimizing your assay, you should become very familiar with binomial and Poisson distributions and their use because the outcome of many analytical steps can be modeled and explained with them.
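As one illustration of why the binomial distribution matters here, the sketch below estimates the chance of seeing enough variant-supporting reads at a given depth. It assumes each read independently carries the variant with probability equal to the true variant frequency and ignores sequencing error; the function names and the five-read calling threshold are hypothetical choices for the example, not part of any particular pipeline.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_detect(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads variant reads) at a given depth and true VAF.

    Assumption: reads are independent draws and sequencing error is ignored.
    """
    return 1.0 - sum(binom_pmf(k, depth, vaf) for k in range(min_alt_reads))

# Hypothetical scenario: a 1% variant with a 5-read calling threshold.
for depth in (500, 1000, 5000):
    print(depth, round(prob_detect(depth, vaf=0.01, min_alt_reads=5), 3))
```

Under these assumptions, the same variant and threshold give very different detection probabilities at different depths, which is the kind of behavior worth understanding before choosing coverage targets.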
Comprehensive – More Than Just a Buzzword
A truly comprehensive assay should look for all diagnostic, prognostic, and predictive genetic markers related to the disease of interest. Some mutations that are often in so-called “hotspots” lead to the activation of signaling pathways and transcription factors that lead to cell growth. Amplifications of such molecules can also lead to cell growth. Abundant mutations at other sites can be indicative of cancers that have lost the ability to properly repair their DNA, which may be candidates for the use of checkpoint inhibitors.
Additionally, genomic rearrangements where an expressed protein is fused to a signaling molecule – so-called fusions – are also important. By incorporating the ability to detect a variety of mutations in your assay, you’re potentially increasing its clinical utility. And with the availability of biosynthetic reference materials bearing even rare mutations, you can be confident in your assay’s ability to detect those important markers.
Platforms and Elements
There are a variety of NGS platforms with different strengths and weaknesses. For developing our Seraseq® NGS reference materials, we often use a MiSeq for its relatively long reads, reasonable accuracy, and acceptable turnaround time. Accuracy is important and, for what we do, decreased accuracy is not an acceptable tradeoff for much faster turnaround times. But depending on your assay’s intended use, shorter turnaround times may take priority over peak accuracy. These are just two factors to consider in evaluating the most appropriate platform for your assay.
Whichever platform you choose, you want to thoroughly understand all elements of the assay and their failure modes. You could start with an off-the-shelf assay and an off-the-shelf data analysis pipeline, but you need to understand it and you will very likely need to (as was mentioned many times on American Idol) make it your own. There are many assays out there already, and improved diagnosis and treatment of cancer requires improved assays – not more people running existing assays. Unfortunately, improved diagnosis and treatment is not necessarily associated with higher payments from insurers… yet.
Sample Type Drives Important Decisions
Liquid biopsy assays have very different requirements from those interrogating solid tumor samples. With a median variant frequency for driver mutations below 1% and possible drug resistance mutations below 0.1%, liquid biopsies demand exceptional sensitivity and specificity. Additionally, the detection of fusions in cell-free DNA (cfDNA) generally requires the sequencing of entire introns. On the other hand, formalin fixation is known to damage DNA and introduce apparent low-frequency mutations, so solid tumor assays that use FFPE tissues as input generally require lower sensitivity so that they are not swamped by the noise. However, the expected variant frequencies are generally well above 1% – although dissection is sometimes required to enrich for tumor cells – and preserved RNA can be used to detect fusions without having to rely on sequencing genes in their entirety. While unfixed biopsies are generally not available, their analysis should allow for less noise.
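To make the sensitivity demands of low-frequency cfDNA variants more concrete, here is a minimal sketch that uses a Poisson model to ask whether any mutant molecules are in the tube at all. It assumes roughly 3.3 pg of DNA per haploid human genome and that every input molecule is available to the assay, which is optimistic; the function names and the 10 ng and 30 ng inputs are hypothetical.

```python
from math import exp

PG_PER_HAPLOID_GENOME = 3.3  # approximate mass of one haploid human genome (pg)

def genome_copies(input_ng):
    """Approximate number of haploid genome copies in input_ng of human DNA."""
    return input_ng * 1000.0 / PG_PER_HAPLOID_GENOME

def prob_mutant_present(input_ng, vaf, min_molecules=1):
    """P(at least min_molecules mutant copies in the input), Poisson model.

    Assumption: mutant copies are Poisson distributed with
    mean = genome copies * variant frequency, and none are lost in processing.
    """
    mean = genome_copies(input_ng) * vaf
    term = exp(-mean)  # P(X = 0)
    cdf = 0.0
    for k in range(min_molecules):
        cdf += term
        term *= mean / (k + 1)  # step from P(X = k) to P(X = k + 1)
    return 1.0 - cdf

# Hypothetical inputs: a 0.1% variant in 10 ng versus 30 ng of cfDNA.
for ng in (10, 30):
    print(ng, "ng:", round(prob_mutant_present(ng, vaf=0.001), 3))
```

In this simplified model, a 0.1% variant corresponds to only about three expected mutant copies in 10 ng of input, so no amount of sequencing depth can fully compensate for limited starting material.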
You may also want to consider a tumor/normal or tumor/normal/calibrator workflow. If you have access to non-cancerous tissue that is from the same person as the possible tumor and that has been processed similarly, then the detection of differences such as somatic mutations and amplifications is often more straightforward than when the normal tissue is not available. By including an additional calibrator of known sequence, it is further possible to determine how accurately your assay sequences various regions in a particular run, and the calibrator provides an additional sequence to compare against the tumor and normal samples. Even if a region in the calibrator is not sequenced accurately, if this region looks different in the other samples, then there may be something different in those samples.
Detecting All the Right Mutations
Mutations come in different flavors including SNVs, short INDELs, long INDELs, INDELs in homopolymers, etc., and some can be quite difficult to detect. Of these, changes to the lengths of longer homopolymers can be difficult to detect reliably but are clinically relevant when they lead to frameshift mutations that inactivate tumor suppressor genes. While different sequencing platforms differ in their ability to sequence homopolymers accurately, there is a reason why even Illumina has the R8 filter flag.
A sometimes-underappreciated aspect of detecting mutations is finding appropriate settings for aligners that allow for the detection of a broad range of mutations. When it comes to INDELs, the gap opening and gap extension penalties are important. If the penalties are too low in relation to the mismatch penalties, then there is a chance that sequencing errors and SNVs will lead to the appearance of INDELs. On the other hand, if they are too high, then there is a chance that INDELs at the ends of reads will be missed – especially if the aligner is allowed to clip bases from the ends. If INDELs are missed at the ends of reads, then their apparent frequencies will be biased low and may end up below reporting thresholds.
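The end-of-read effect described above can be sketched with a simple affine-gap score comparison. The scoring values below are illustrative only and are not taken from any specific aligner's documentation; the model assumes a gap of length k costs gap_open + k * gap_extend, and that the aligner keeps whichever option (aligning through the INDEL or clipping the bases beyond it) scores higher.

```python
def gapped_score(flank_len, gap_len, match=1, gap_open=6, gap_extend=1):
    """Net score for aligning through an INDEL to reach flank_len matching bases.

    Assumption: affine gap cost = gap_open + gap_len * gap_extend.
    """
    return flank_len * match - (gap_open + gap_len * gap_extend)

def indel_is_kept(flank_len, gap_len, clip_penalty=0, **scoring):
    """True if aligning through the INDEL beats clipping the read end."""
    return gapped_score(flank_len, gap_len, **scoring) > -clip_penalty

# Hypothetical case: a 3-bp deletion with varying numbers of bases after it.
for flank in (4, 8, 12):
    print(flank, indel_is_kept(flank, gap_len=3))
```

With these example values, a 3-bp deletion followed by only 4 bases is clipped away, while the same deletion followed by 12 bases is retained. Adding a clipping penalty or lowering the gap penalties shifts that boundary, which is why the settings deserve deliberate tuning rather than defaults taken on faith.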
The way an assay is designed can also impact its ability to detect mutations. For an amplicon-based assay, it is critical to design primers in regions of genes that are unlikely to contain SNPs. If an SNP prevents the binding of a primer, then a mutation that would normally be detected may be missed unless it can also be detected with a different amplicon; however, overlapping amplicons can pose challenges for PCR. With amplicon-based assays, it can be helpful to look at the relative amounts of different amplicons to see if some are below expected frequencies, but that does not guarantee that an SNP under a primer would get noticed. Similarly, mutations under primers can prevent the detection of nearby mutations. In fact, while designing NGS reference materials with multiple nearby mutations, we have encountered assays that cannot detect some of the mutations well because other mutations prevent the binding of primers. Because primers in multiplex reactions must also be compatible with one another, it is important to select primers that are mutually compatible, cover the regions of interest, and are unlikely to be affected by SNPs. Finally, amplicon-based assays can have difficulties when templates are short – such as is the case with circulating cell-free DNA – and amplicons are long.
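The idea of checking the relative amounts of different amplicons could be sketched as follows. This is a minimal example under stated assumptions: the amplicon names, read counts, expected fractions, and the 0.5 cutoff are all hypothetical, and in practice the expected fractions would come from your own historical runs.

```python
def flag_low_amplicons(read_counts, expected_fractions, min_ratio=0.5):
    """Return amplicons whose share of reads is below min_ratio of expectation.

    read_counts: dict of amplicon name -> reads observed in this run
    expected_fractions: dict of amplicon name -> expected share of total reads
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads to evaluate")
    flagged = {}
    for name, count in read_counts.items():
        ratio = (count / total) / expected_fractions[name]
        if ratio < min_ratio:
            flagged[name] = ratio
    return flagged

# Hypothetical four-amplicon panel in which one amplicon underperforms.
counts = {"amp_A": 1000, "amp_B": 1000, "amp_C": 200, "amp_D": 1000}
expected = {name: 0.25 for name in counts}
print(flag_low_amplicons(counts, expected))
```

As the text cautions, a check like this can miss a heterozygous SNP under a primer, since losing one allele may only reduce an amplicon's reads modestly relative to normal run-to-run variation.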
For a hybrid/capture-based assay, mutations likely represent mismatches to the probes used for capture, and probes with mismatches tend to hybridize less efficiently. This has the potential to bias the capture of DNA molecules toward those whose sequences are closest to the reference, but this can be mitigated by using longer probes and lowering the binding stringency. Hybrid/capture-based assays must also be able to separate sequences from homologous genes and pseudogenes because they will very likely end up in the data. An advantage over PCR is that it is generally easier to sequence genes in their entirety, and hybrid/capture-based assays are generally easier to scale to add additional regions for sequencing. Being able to sequence genes in their entirety is important if you intend to sequence tumor suppressor genes because there are a very large number of ways to inactivate a gene. At the same time, it is very hard to reactivate an inactivated gene, so such mutations may be more prognostic in nature. Being able to sequence genes in their entirety – or, at least certain introns – is also important if you intend to look for fusions at the DNA level.
While an assay must be designed with the intended performance in mind, it is critical to validate that the assay performs as intended. Additionally, lot-to-lot and run-to-run variability should be monitored to determine whether performance is always as intended. With a nearly infinite number of possible mutations, it is important to validate with reference materials that contain many different types of mutations at different locations at consistent frequencies. While integrating the performance results from many mutations is complex, doing so provides a better view of assay performance than relying on just a handful of variants. That’s why our Seraseq reference materials contain up to 40 different mutations and a variety of variant types, and our iQ NGS QC Management software enables robust tracking and trending of QC data with just a few clicks.
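One simple way to picture run-to-run monitoring is a control-limit check on the measured frequency of a reference-material variant. The sketch below is an assumption-laden illustration, not a description of any particular QC software: it uses mean plus or minus three sample standard deviations from prior runs, and the history values are invented for the example.

```python
from statistics import mean, stdev

def control_limits(history, n_sd=3.0):
    """Lower and upper limits as mean +/- n_sd sample standard deviations."""
    center = mean(history)
    spread = stdev(history)
    return center - n_sd * spread, center + n_sd * spread

def is_out_of_control(history, new_value, n_sd=3.0):
    """True if new_value falls outside the limits derived from history."""
    low, high = control_limits(history, n_sd)
    return not (low <= new_value <= high)

# Hypothetical measured frequencies (%) of one control variant in prior runs.
history = [5.0, 5.2, 4.8, 5.1, 4.9]
for vaf in (5.3, 6.0):
    print(vaf, is_out_of_control(history, vaf))
```

In practice the limits would be built from many more runs and tracked per variant and per reagent lot, but the principle of comparing each new run against an established baseline is the same.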
The Rising Importance of Detecting Fusions
In fusions, there are chromosomal rearrangements that link two different genes together. In those that are clinically relevant, one of the genes is often expressed highly and the other provides domains that are often involved in signaling or gene expression. It can also be the case that the fusion has a conformation that leads to constitutive signaling or gene expression. While some fusions such as EML4-ALK are relatively common and can even be diagnostic for some cancers, new fusions are constantly being reported.
Theoretically, detecting possible fusions at the mRNA level is relatively straightforward – especially when both ends of mRNA molecules can be sequenced. One simply has to look for sequences where the 5’ and 3’ ends do not come from the same gene and hope that it is not an artifact of the assay. However, cells contain many mRNA molecules and many common mRNA molecules make it difficult to obtain sufficient sequences from rarer mRNA molecules. Additionally, many mRNA molecules are so long that they are not amenable to paired-end sequencing. A solution is to sequence mRNA fragments (after converting them to cDNA, typically) and to either incorporate hybrid/capture approaches in order to enrich for domains that are known to be important in fusions or to use amplicons where one or both primers are in regions that are known to be important in fusions.
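The "look for ends that do not come from the same gene" step can be sketched as a simple tally. This example assumes each read pair has already been aligned and annotated with the gene hit by each end, which is a substantial step in its own right; the input format, function name, and the minimum-support threshold of three pairs are hypothetical.

```python
from collections import Counter

def fusion_candidates(pairs, min_support=3):
    """Count read pairs whose two ends map to different genes.

    pairs: iterable of (gene_of_5prime_end, gene_of_3prime_end); None = unmapped.
    Returns gene pairs supported by at least min_support read pairs.
    """
    counts = Counter()
    for gene5, gene3 in pairs:
        if gene5 and gene3 and gene5 != gene3:
            counts[(gene5, gene3)] += 1
    return {genes: n for genes, n in counts.items() if n >= min_support}

# Hypothetical annotated read pairs.
pairs = [("EML4", "ALK")] * 4 + [("ALK", "ALK")] * 10 + [("EGFR", "MET")]
print(fusion_candidates(pairs))
```

Requiring several supporting pairs is one crude guard against assay artifacts such as chimeric molecules; a real pipeline would also examine breakpoint positions, reading frame, and split reads.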
Detecting fusions at the DNA level is more difficult since the relevant translocations often involve introns. However, for assays that have to use DNA – such as those that look at cfDNA – it is necessary to sequence introns in order to look for fusions. Introns can be large and often far exceed the size of exons. This makes it difficult to design amplicon-based assays for the detection of fusions at the DNA level. Therefore, methods like hybrid/capture should be used, but fusions will behave like large INDELs and the capture efficiency of inserts that span the fusion may be lower than that of wild-type inserts. Attempting to detect fusions at the DNA level also means that a large share of the NGS reads will be devoted to sequencing introns, reads that could otherwise be used to look for somatic mutations.
Despite the challenges, fusions play an important role in cancer and a comprehensive assay should look for them. And having a sustainable source of reference material containing these sometimes-rare fusions is critical to evaluating your assay’s sensitivity and specificity for these targets. Check out our purified RNA and FFPE-based reference materials for detecting fusions at the mRNA level, and DNA and ctDNA reference materials containing fusions at the DNA level.
Don’t Forget About CNVs
In fusions, the promoters and enhancers of a well-expressed gene can lead to increased expression of domains that are often involved in signaling or gene expression. Gene amplifications – generally referred to as copy number variations (CNVs) or alterations (CNAs) – accomplish something similar by having more copies of a target gene. For example, the duplication of EGFR can lead to enhanced expression of EGFR and higher sensitivities to its ligands such as EGF and TGF-α.
To detect amplifications, it is necessary to establish how your assay’s data looks when amplifications are present and when they are not. In most NGS-based assays (both amplicon-based and hybrid/capture-based), amplifications manifest themselves as increased numbers of reads coming from the amplified regions in relation to other reads. Detecting such increases generally requires consistent assay performance, which is why carefully tracking and trending your NGS QC data is crucial. In the case of assays that look at cfDNA, those increases can be very, very subtle – yet another challenge unique to liquid biopsy assays. Additionally, the increase in reads may not be linear and it may be necessary to use calibrators if you want to relate increases in reads to extra copies. We have purified DNA reference materials for liquid biopsy and solid tumor assays with defined numbers of extra copies of genes such as EGFR, ERBB2, MET, MYC, and MYCN.
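A short calculation shows why amplification signals in cfDNA can be so subtle. The sketch assumes the share of reads from a region scales linearly with its average copy number, which, as noted above, may not hold for a real assay; the function names, tumor fractions, and copy numbers are hypothetical.

```python
def expected_read_ratio(tumor_fraction, tumor_copies, normal_copies=2):
    """Expected fold change in reads from a region versus a normal sample.

    Assumption: reads scale linearly with the average copy number of the mix.
    """
    mixed = tumor_fraction * tumor_copies + (1 - tumor_fraction) * normal_copies
    return mixed / normal_copies

def observed_read_ratio(target_reads, total_reads,
                        baseline_target_reads, baseline_total_reads):
    """Share of reads on target in a sample relative to a baseline sample."""
    sample_share = target_reads / total_reads
    baseline_share = baseline_target_reads / baseline_total_reads
    return sample_share / baseline_share

# Hypothetical 10-copy amplification in a solid tumor versus cfDNA.
print("50% tumor:", expected_read_ratio(0.50, tumor_copies=10))
print(" 1% tumor:", expected_read_ratio(0.01, tumor_copies=10))
```

Under this linear assumption, a 10-copy amplification triples the read share at 50% tumor content but raises it by only about 4% at 1% tumor content, so the assay's normal variation in coverage has to be smaller than that for the signal to be visible.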
What’s Next?
We have only scratched the surface in both these brief paragraphs and the first blog in this two-part series. Developing NGS-based assays is uniquely challenging, but with careful design, and the aid of ground-truth reference materials with which to perform thorough validation studies, you can bring the world one step closer to the promise of precision medicine.
You can dive deeper into some of the topics covered in this blog by downloading our free white paper on how to develop a clinical NGS assay without losing your mind (or your shirt).