Conta: Methods for detecting trace amounts of contamination

Introduction

  • Next-generation sequencing (NGS) assays of cell-free DNA (cfDNA) must achieve high sensitivity and specificity to accurately detect circulating tumor DNA, enabling the early detection of cancer.
  • Contaminating DNA from adjacent samples in library preparation plates may compromise specificity, because rare single nucleotide polymorphisms (SNPs) from the contaminant may look like low-frequency somatic mutations. Methods that derive a signal from fragment size or methylation status may also be affected by a contaminating sample. Copy number variations (CNVs), pregnancy, and transplants may also generate contamination-like SNP signals in plasma.
  • Here we present conta, a package for detecting the presence of cross-contamination in NGS samples with high reliability. The package includes methods to call putative contamination events based on population minor allele frequencies (MAFs), as well as methods to identify the source of contamination among candidate samples.
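
The idea of calling contamination from population MAFs can be illustrated with a small likelihood-ratio sketch. This is not conta's implementation; it is a minimal example under stated assumptions: only SNP sites where the host is homozygous reference are used, the contaminant's genotype follows Hardy-Weinberg proportions at the population frequency, reads are binomially distributed, and sequencing error is a fixed rate. All function names, the error rate, and the search grid are hypothetical.

```python
import math


def binom_pmf(k, n, q):
    """Binomial probability of k successes in n trials, computed in log space."""
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(q) + (n - k) * math.log1p(-q))
    return math.exp(log_p)


def site_likelihood(alt, depth, maf, c, err=1e-3):
    """Likelihood of `alt` alternate reads at a host-homozygous-reference site.

    Assumption: the contaminant carries 0, 1, or 2 alternate alleles with
    Hardy-Weinberg probabilities at population frequency `maf`, and contributes
    a fraction `c` of the reads. `err` is an assumed fixed error rate.
    """
    priors = ((0, (1 - maf) ** 2), (1, 2 * maf * (1 - maf)), (2, maf ** 2))
    total = 0.0
    for dosage, prior in priors:
        q = min(c * dosage / 2.0 + err, 1.0 - 1e-12)
        total += prior * binom_pmf(alt, depth, q)
    return max(total, 1e-300)  # floor to keep the logarithm defined


def log_lr(sites, c, err=1e-3):
    """Log-likelihood ratio of contamination level c versus no contamination."""
    return sum(
        math.log(site_likelihood(a, d, p, c, err))
        - math.log(site_likelihood(a, d, p, 0.0, err))
        for a, d, p in sites
    )


def call_contamination(sites, grid=None, err=1e-3):
    """Grid-search the contamination level that maximizes the log-likelihood ratio."""
    grid = grid or [i / 1000 for i in range(1, 101)]  # 0.1% to 10%
    best = max(grid, key=lambda c: log_lr(sites, c, err))
    return best, log_lr(sites, best, err)


# Each site is (alt_reads, depth, population_maf); values are illustrative only.
clean = [(1, 1000, 0.3)] * 20
contaminated = [(1, 1000, 0.3)] * 10 + [(11, 1000, 0.3)] * 8 + [(21, 1000, 0.3)] * 2

clean_level, clean_llr = call_contamination(clean)
cont_level, cont_llr = call_contamination(contaminated)
```

In this toy data the contaminated sample carries extra alternate reads at a subset of sites, so its best-fitting contamination level has a much higher log-likelihood ratio than the clean sample's. A real caller would also need to handle host heterozygous sites, site-specific error rates, and the CNV, pregnancy, and transplant confounders noted above.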