A recurring question SeraCare has been asked is whether synthetic DNA sequences perform similarly to cell line-derived sequences. For example, it could be hypothesized that secondary structure and/or the GC content of the DNA around a particular amplicon might lead to a detection bias when only 400 bases on either side of the amplicon are present compared to when many kilobases are present.
To address this, we looked at the performance of our Seraseq™ Solid Tumor Mutation Mix-I AF10 and AF20 reference materials at three external laboratories that were running Ion Torrent AmpliSeq™-based assays on Ion Torrent PGM instruments from Thermo Fisher Scientific. AF10 and AF20 represent allele frequencies of 10 and 20 percent, respectively.
Each of these materials was analyzed repeatedly over the course of several months. One of the sites ran a custom AmpliSeq assay, while the other two ran the Ion Torrent AmpliSeq Cancer Hotspot v2 assay. FASTQ files, which should represent the platform’s best estimate of the base calls for each read, were obtained and analyzed using a custom algorithm that looks for the mutations and Internal Quality Marker (ALTs) found within the reference material, as well as the GM24385-derived germline reference sequences (REFs). (More information about the Internal Quality Marker (IQM) can be found in this technical note PDF.)
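The custom algorithm itself is not described here. As a rough illustration of the general idea only, the sketch below counts reads in a FASTQ stream that contain a short ALT or REF signature sequence. The function name, the signature sequences, and the toy reads are all hypothetical, and the sketch assumes four-line FASTQ records and ignores reverse-complement reads and sequencing errors.

```python
import io

def count_ref_alt(fastq_handle, ref_signature, alt_signature):
    """Count reads containing the REF or the ALT signature (hypothetical helper).

    Assumes standard 4-line FASTQ records; the sequence is line 2 of each record.
    """
    ref_reads = alt_reads = 0
    for line_index, line in enumerate(fastq_handle):
        if line_index % 4 != 1:  # skip header, '+' and quality lines
            continue
        sequence = line.strip().upper()
        if alt_signature in sequence:
            alt_reads += 1
        elif ref_signature in sequence:
            ref_reads += 1
    return ref_reads, alt_reads

# Toy FASTQ content with made-up reads (illustrative only).
toy_reads = ["TTACGTTGCAGG", "TTACGTAGCAGG", "GGGGCCCCGGGG", "CCACGTTGCATT"]
toy_fastq = "".join(
    f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(toy_reads)
)

ref_count, alt_count = count_ref_alt(
    io.StringIO(toy_fastq), ref_signature="ACGTTGCA", alt_signature="ACGTAGCA"
)
```

A production analysis would need to handle both strands and tolerate base-call errors, which simple substring matching does not.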
In NGS-based assays, not all regions of DNA are detected or amplified equally well, which results in different depths of coverage. As shown below in data for the Seraseq Solid Tumor Mutation Mix-I AF20, for a given site and assay, the depth of coverage for a given target in relation to that of other targets is relatively consistent between runs. The depth of coverage for a given target (sum of REF and ALT) in relation to the average depth for all targets in that run is shown log2 transformed and averaged across all runs on the horizontal axis. These data indicate that most coverage depths are within a factor of 2 of the average coverage depth. The values on a per-run basis are shown on the vertical axis and typically have tight distributions.
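The relative-depth calculation described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the figures; the function name, the dictionary layout, and the read counts are hypothetical. A value between -1 and +1 corresponds to a depth within a factor of 2 of the run average.

```python
import math

def relative_log2_depth(counts):
    """Return log2(depth / mean depth) per target for one run.

    counts maps a target name to (ref_reads, alt_reads); depth is REF + ALT.
    Assumes every target has at least one read (log2 of 0 is undefined).
    """
    depths = {target: ref + alt for target, (ref, alt) in counts.items()}
    mean_depth = sum(depths.values()) / len(depths)
    return {target: math.log2(depth / mean_depth) for target, depth in depths.items()}

# Made-up read counts for a single run (illustrative only).
run_counts = {
    "target_A": (1600, 400),  # depth 2000, equal to the run average
    "target_B": (800, 200),   # depth 1000, half the run average
    "target_C": (2400, 600),  # depth 3000
}
relative_depth = relative_log2_depth(run_counts)
```

Averaging these per-run values across runs for each target would give the quantity plotted on the horizontal axis.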
Such differences could arise if certain genomic regions are converted into amplicons with different efficiencies. In this case, a shorter synthetic DNA sequence might be amplified preferentially compared to a difficult long stretch of cell-line-derived genomic sequence.
For Ion Torrent AmpliSeq-based assays, there does not appear to be an amplification and/or detection bias for synthetic DNA sequences. As shown below in data for the AF20 sample, the ratio of ALT and REF reads for a given target does not appear to be influenced by the relative depth of coverage for that target. The horizontal axis shows the relative depth of coverage for a given AF20 target in a given run that was depicted on the vertical axis in the previous figure. The vertical axis shows the ratio of ALT and REF reads for that target and run, log2 transformed. This ratio is used instead of variant frequency since it does not compact close to 0% and 100%. For a 20% variant frequency, a value of log2(0.2/0.8) = -2 would be expected. Overall, the data for each site distributes horizontally, which would be expected if there were no detection bias. Thus, amplicons that are generated less efficiently (left on the horizontal axis) do not appear to show a trend towards preferential amplification and/or detection of the synthetic sequences (upward shift on the vertical axis). Assay 1 does appear to show that the most difficult to generate/detect amplicon has a higher variant frequency. However, the variant frequency of that particular target (FGFR3, a homopolymer variant) is consistent with the variant frequency observed at the other sites.
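The log2 ALT/REF ratio and its expected value can be worked through in a few lines. The sketch below also fits a simple least-squares slope of the ratio against relative depth; the article describes a visual assessment of the scatter, so the slope is an added assumption for illustration, and the data points are made up.

```python
import math

def log2_alt_ref_ratio(alt_reads, ref_reads):
    """log2 of ALT reads over REF reads for one target in one run."""
    return math.log2(alt_reads / ref_reads)

def expected_log2_ratio(variant_frequency):
    """Expected log2(ALT/REF) for a given variant frequency (0 < f < 1)."""
    return math.log2(variant_frequency / (1 - variant_frequency))

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

expected_af20 = expected_log2_ratio(0.20)  # log2(0.2 / 0.8) = -2
expected_af10 = expected_log2_ratio(0.10)  # log2(0.1 / 0.9), about -3.17

# Made-up (relative depth, ALT reads, REF reads) values for illustration.
observations = [(-1.0, 210, 790), (0.0, 405, 1595), (0.6, 590, 2410)]
depths = [depth for depth, _, _ in observations]
ratios = [log2_alt_ref_ratio(alt, ref) for _, alt, ref in observations]
trend = least_squares_slope(depths, ratios)
```

With no bias, the slope should be close to zero; a preferential detection of synthetic sequences in poorly covered amplicons would show up as a negative slope (higher ratios at lower relative depth).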
As shown below, the observed variant frequencies correlate well between sites for AF10 (blue) and AF20 (orange). When the ratio of ALT and REF reads (log2-transformed and averaged across all runs for a given site) is compared as before, the majority of values are found close to a 1:1 diagonal and are tightly clustered. There are several points outside of the clusters, which are typically targets involving homopolymers where aberrant amounts of REF or ALT alleles are observed due to sequencing errors in the FASTQ files. The FGFR3 data points are highlighted in red.
As shown below, unlike the observed variant frequencies, there is poor correlation between the relative coverage depths obtained with different assays (left panel). At the same time, there is good correlation between the relative coverage depths obtained with the same assay run at different sites (right panel). The results for AF10 (blue) and AF20 (orange) are similar. Since the variant frequencies correlate between different assays but the relative coverage depths do not, this further supports the conclusion that for AmpliSeq-based assays, in regions of DNA that are not detected or amplified well, there is no apparent bias in the detection of synthetic sequences.
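One way to put a number on these comparisons is a correlation coefficient computed over the per-target averages from two sites or assays. The sketch below uses the Pearson coefficient, which is an assumption (the article does not state which measure, if any, was computed), and all values are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up per-target averages (illustrative only), one entry per shared target.
log2_ratio_site1 = [-2.1, -1.9, -2.0, -2.3, -1.8]
log2_ratio_site2 = [-2.0, -1.8, -2.1, -2.2, -1.7]
rel_depth_assay1 = [0.5, -0.8, 0.1, -0.3, 0.9]
rel_depth_assay2 = [-0.4, 0.6, 0.2, -0.7, 0.1]

r_ratio = pearson(log2_ratio_site1, log2_ratio_site2)
r_depth = pearson(rel_depth_assay1, rel_depth_assay2)
```

In this toy data the ALT/REF ratios track each other across sites while the relative depths from different assays do not, which mirrors the pattern described above.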
Overall, the data suggest that for AmpliSeq-based assays there is no detection bias where synthetic DNA may be detected preferentially compared to genomic DNA when amplicons are generated and/or detected with relatively lower efficiencies.
This analysis cannot rule out a consistent bias that affects all targets equally. However, such a bias would be nearly impossible to detect and identify. For example, an alternative method like qPCR or dPCR itself relies on PCR amplification, which is also used in the AmpliSeq assay. Since the data for two different AmpliSeq-based assays are similar, qPCR and dPCR results would also be expected to be similar.
More information about our Seraseq Solid Tumor Mutation Mix-I (AF20) and (AF10) can be found here.