Next-generation sequencing (NGS) data have transformed biomedical research by enabling the development of personalized medicine, revealing new biomarkers and therapeutic targets, providing unprecedented insights into the genetic basis of disease, expanding our understanding of basic science, and accelerating translational and clinical research initiatives [1-4]. However, data incompleteness, inconsistencies, and inaccuracies are major issues with publicly curated databases of NGS data [5]. Even with the best of intentions, differences in sequencing technologies, sample preparation techniques, and data processing pipelines can still introduce biases and artifacts into the data [6]. Differences in metadata standards, curation procedures, and data annotation across databases further complicate data integration and interpretation. These issues are a serious concern for scientists who rely on publicly managed databases for their studies, because they can significantly hamper reproducibility, data reuse, and the reliability of downstream analyses.
To address reproducibility concerns, ATCC has developed standardized sequencing, assembly, and annotation pipelines to generate NGS data from the authenticated biological materials within our biorepository [6]. To date, we have sequenced over 4,500 microbial strains and 400 cell lines and have made the resulting curated ’omics data available to researchers through the ATCC Genome Portal and ATCC Cell Line Land, respectively.
The ATCC Genome Portal is a rapidly growing ISO 9001–compliant database of reference-quality whole-genome sequences from authenticated microbial strains in the ATCC collection. This cloud-based platform, which is hosted with our partner One Codex®, enables users to easily access and download thousands of meticulously curated whole-genome sequences from their browser or our secure API. Further, through our new Supporting Membership opportunities, we now offer extended capabilities such as Discrepancy Reports, which enable users to align their sequencing data with our reference genomes to evaluate SNPs, indels, deletions, and other genetic differences.
For our human and mouse cell lines, we partnered with QIAGEN® to provide whole-transcriptome (RNA-seq) datasets through ATCC Cell Line Land, a product of QIAGEN® OMICSOFT. By combining QIAGEN's best-in-class bioinformatics solutions with ATCC's fully authenticated and characterized cell lines, we offer users unparalleled access to the accurate datasets and credible materials needed to streamline the discovery of gene signatures, biomarkers, and therapeutic targets.
Overall, our aim is to leverage technology to tackle the challenges of reproducibility in biomedical research. By providing data whose provenance is traceable, standardized, and authenticated to its original source, we are supporting scientific credibility and data reproducibility efforts. As we look toward the future, we will continue to update the ATCC Genome Portal and ATCC Cell Line Land to further broaden the utility of NGS data in basic and translational research.
Did you know?
We aim to deliver a minimum of 1,000 new authenticated datasets to both the ATCC Genome Portal and ATCC Cell Line Land each year.
Meet the author
Ajeet Singh, PhD
Senior Scientist, ATCC
Dr. Ajeet Singh is a Senior Scientist at ATCC, where he is focused on providing reference-grade whole-transcriptome data that are authenticated, standardized, and traceable to physical source materials available in ATCC’s biorepository. Prior to joining ATCC, Dr. Singh received his PhD in Agricultural Plant Pathology, for which he performed research focused on the epidemiology and integrated management of plant pests and diseases. He then performed postdoctoral research at the National Institute of Environmental Health Sciences and subsequently worked as a Senior Staff Scientist at the National Cancer Institute. Dr. Singh has extensive experience in biomedical research, with a research career spanning an array of interrelated disciplines exploring epigenetics, chromatin, and gene expression in reproductive developmental toxicology, stem cell biology, and cancer.
Explore our featured resources
Discover ATCC's Transcriptomics Data
Learn more about our standardized workflow for producing transcriptomics data from the authenticated cell lines within our collection.
References
1. Jerzy KK. Next Generation Sequencing - Advances, Applications and Challenges. InTech, 2016. doi: 10.5772/60489.
2. Schuster SC. Next-generation sequencing transforms today's biology. Nat Methods 5(1): 16-18, 2008.
3. Kalayinia S, et al. Next generation sequencing applications for cardiovascular disease. Ann Med 50(2): 91-109, 2018.
4. Parikh VN, Ashley EA. Next-Generation Sequencing in Cardiovascular Disease: Present Clinical Applications and the Horizon of Precision Medicine. Circulation 135(5): 406-409, 2017.
5. Cheng C, et al. Methods to improve the accuracy of next-generation sequencing. Front Bioeng Biotechnol 11: 982111, 2023.
6. Pfeifer JD, et al. Reference Samples to Compare Next-Generation Sequencing Test Performance for Oncology Therapeutics and Diagnostics. Am J Clin Pathol 157(4): 628-638, 2022.