Why “sequenced from the source” matters for data provenance and reproducibility
Cell line misidentification, cross-contamination, and genetic drift are pervasive challenges in the life sciences. Left unaddressed, they lead to irreproducible research, faulty conclusions, and wasted resources that ripple far beyond a single experiment. They compromise experimental validity and can also affect funding decisions and the credibility of published work. The risk also cascades: other researchers who build on flawed data can amplify its errors across the literature, potentially for years to come.
Given the pace of biomedical research, the quality and transparency of the omics data associated with every study are paramount. Ideally, data should be (1) derived from authenticated, characterized models; (2) generated under standardized conditions with documented workflows; and (3) distributed with complete provenance, linking datasets to physical materials that can be obtained and retested.
ATCC’s cell line omics datasets stand apart because they are “sequenced from the source”: they are produced directly from authenticated ATCC physical products by ATCC’s Sequencing & Bioinformatics Center, an ISO 9001–certified genomics laboratory and bioinformatics group. Pairing the physical materials with their data creates a closed loop of authenticity you can trust. Every dataset included in the ATCC Genome Portal is comprehensive, reproducible, and reliable, significantly reducing the uncertainty that comes with third-party or aggregated data.
Data generation and quality control: Reference-quality WES and RNA-seq methods
Our optimized methodology is designed to deliver high-quality whole-exome and transcriptome data through a robust, multi-step sequencing workflow. We employ next-generation sequencing platforms to generate comprehensive exome coverage and RNA profiles, ensuring accurate representation of coding regions and gene expression. Each stage of the process is accompanied by stringent quality control measures, including assessments of read depth, coverage uniformity, and variant-calling accuracy. Only the datasets meeting predefined quality benchmarks are published to the ATCC Genome Portal.
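To make the coverage checks named above concrete, here is a minimal Python sketch of how read depth and coverage uniformity could be summarized for a set of target bases. It is illustrative only and is not ATCC's pipeline: the function name, the uniformity definition (fraction of bases at or above 20% of the mean depth), and both thresholds are assumptions chosen for the example.

```python
from statistics import mean


def coverage_qc(depths, min_mean_depth=50.0, min_uniformity=0.9):
    """Summarize per-base depths over target regions.

    Both thresholds are hypothetical defaults for illustration,
    not published ATCC benchmarks.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = mean(depths)
    # Uniformity (assumed definition): fraction of target bases
    # covered at >= 20% of the mean depth.
    cutoff = 0.2 * mean_depth
    uniformity = sum(d >= cutoff for d in depths) / len(depths)
    return {
        "mean_depth": mean_depth,
        "uniformity": uniformity,
        "passed": mean_depth >= min_mean_depth and uniformity >= min_uniformity,
    }


# Toy input: nine well-covered bases and one poorly covered base.
report = coverage_qc([100, 100, 100, 100, 100, 100, 100, 100, 100, 10])
print(report)
```

In practice the depths would come from an alignment summary rather than a hand-written list, and each laboratory sets its own pass criteria.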
Whole-exome sequencing (WES): ATCC’s WES workflow captures relevant genetic variants, giving researchers a documented baseline for each cell line. We provide ClinVar annotations as well as the MSI score for over 900 human cell lines. In addition, we provide WES data and variant calls for over 50 mouse cell lines. These data give researchers confidence in their starting material and provide the groundwork for reproducible experiments.
RNA-seq data: Derived from multiple biological replicates (typically 5 per cell line) of cells grown under standardized conditions, ATCC’s RNA‑seq datasets provide raw and normalized read counts for annotated genes in each genome. ATCC currently provides transcriptome profiles for over 900 human and mouse cell lines, and more data are added quarterly.
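For readers unfamiliar with the difference between raw and normalized counts, the sketch below shows one common normalization, counts per million (CPM), in Python. The source does not state which normalization ATCC applies, so CPM is used here purely as an assumed example, and the gene names and counts are made up.

```python
def counts_per_million(raw_counts):
    """Scale raw read counts so they sum to one million per sample.

    CPM is one common library-size normalization; it is an assumption
    here, not necessarily the method used for ATCC's datasets.
    """
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: count * 1_000_000 / total for gene, count in raw_counts.items()}


# Hypothetical raw counts for a single replicate.
raw = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 8000}
cpm = counts_per_million(raw)
print(cpm)
```

Normalizing by library size makes replicates sequenced to different depths comparable, which is why both raw and normalized values are useful to have side by side.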
Quality control: Every dataset undergoes stringent quality checks throughout the sequencing and bioinformatics process. For more information on our QC steps, please read our technical document. Documented data provenance with linkage to authenticated cell lines eliminates ambiguity in experimental design and enables reproducibility.
Programmatic access: In addition to accessing data via the ATCC Genome Portal website, we provide Supporting Members with a secure REST-API for programmatic access to our data. Our fully documented API includes a detailed tutorial for getting started and enables search queries and direct downloads for RNA-seq data, WES data, and structured metadata (JSON files).
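As a rough illustration of what programmatic access over a REST API can look like, the Python sketch below builds an authenticated search request and extracts identifiers from a JSON response. Everything specific in it is hypothetical: the base URL is a placeholder, and the endpoint path, query parameters, bearer-token header, and `catalog_number` field are assumptions for the example. Consult the official API documentation and tutorial for the real interface.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder host; NOT the real ATCC Genome Portal API address.
BASE_URL = "https://example.invalid/api"


def build_search_request(api_key, text, data_type):
    """Build (but do not send) a search request.

    The "/search" path, the parameter names, and the bearer-token
    scheme are hypothetical.
    """
    query = urlencode({"text": text, "data_type": data_type})
    return Request(
        f"{BASE_URL}/search?{query}",
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
        method="GET",
    )


def catalog_numbers(response_body):
    """Pull a hypothetical 'catalog_number' field from a JSON list of records."""
    return [record["catalog_number"] for record in json.loads(response_body)]


req = build_search_request("MY_API_KEY", "HeLa", "rna-seq")
print(req.full_url)

# Made-up response body standing in for structured metadata.
sample_body = '[{"catalog_number": "EXAMPLE-0001", "data_type": "rna-seq"}]'
print(catalog_numbers(sample_body))
```

Keeping request construction separate from response parsing makes each piece easy to check without network access; sending the request would be a call to `urllib.request.urlopen(req)`.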
The ATCC Genome Portal is built for reproducibility
The ATCC Genome Portal isn’t just a data repository; it’s a scientific resource built to address the real challenges of modern research. By directly coupling our authenticated cell lines with their rigorously generated omics datasets, ATCC has established a complete, traceable system for designing, validating, and reproducing experiments. The ATCC Genome Portal supports predictive and translational research by providing standardized, well‑documented data linked to physical materials, thereby improving comparability across studies and strengthening the evidentiary basis for discovery.
Ready to experience the full value of authenticated, reference‑quality omics data? If you have already purchased ATCC cell lines, you can access their corresponding datasets directly in the ATCC Genome Portal. To explore the complete collection across all available models, and to gain programmatic access, exclusive features, and additional member benefits, consider becoming an ATCC Genome Portal Supporting Member. Empower your research with comprehensive, standardized datasets that drive reproducibility, comparability, and innovation.
Did you know?
The ATCC Genome Portal has over 6,500 genomes, 500 exomes, and 3,000 transcriptomes and is continuing to grow.
Meet the authors
Briana Benton, BS
Program Manager, ATCC
Briana Benton is a Program Manager for ATCC’s Sequencing and Bioinformatics department. Her current focus is on the ATCC Genome Portal and expanding the collection of published reference genomes. Briana previously worked on the development of mock microbial communities for microbiome research and synthetic molecular standards for molecular diagnostics assays. Prior to joining ATCC, she developed molecular diagnostic assays for the Henry M. Jackson Foundation.
Jonathan Jacobs, PhD
Senior Director of Bioinformatics, ATCC
Dr. Jonathan Jacobs leads ATCC’s Sequencing & Bioinformatics Center and the development of the ATCC Genome Portal. He has over 20 years of experience in molecular genetics, bioinformatics, and microbial genomics, and he has worked throughout his career at the interface of academia, government, and industry. He holds a joint Research Professor appointment at Syracuse University’s Forensic & National Security Sciences Institute in support of microbial forensics graduate student training and research, and he actively collaborates with several US public health laboratories involved in pathogen genomics research and surveillance. Dr. Jacobs is also certified in Product Management from Pragmatic Institute, and he has led successful commercial launches of several bioinformatics products into the market.
Explore our featured resources
Discover the ATCC Genome Portal
The ATCC Genome Portal is a rapidly growing ISO 9001–compliant database of high-quality reference genomes from authenticated microbial strains in the ATCC collection. Through this cloud-based platform, you can easily access and download meticulously curated whole-genome sequences from your browser or our secure API. With high-quality, annotated data at your fingertips, you can confidently perform bioinformatics analyses and make insightful correlations.
Discover ATCC's Cell Line Omics Data
Learn more about our standardized workflow for producing transcriptomics data from the authenticated cell lines within our collection.
ATCC Genome Portal: Meticulously curated 'omics data from our gold-standard cell lines
The ATCC Genome Portal is a rapidly growing ISO 9001–compliant database of whole-exome and RNA-seq datasets from authenticated cell lines in the ATCC collection.