Author: Ajeet Singh, PhD
It is projected that genomics research will generate between 2 and 40 exabytes of data by 2025.1 This anticipated surge in genomic data is primarily driven by the rapid expansion of large-scale collaborations and biomedical research initiatives, and it has resulted in an exponential increase in the amount of public data available. While this wealth of data has significantly enhanced the ability to extract valuable insights for real-world experiments, relying solely on public data can lead to erroneous conclusions. It also raises challenges in ensuring the accuracy and traceability of the data and the appropriate custodianship of the verified, accessible biological materials behind it.
The problem with public data
Over the past 25 years, there has been remarkable progress in characterizing the cellular transcriptome on a genome-wide scale. Initially, the field relied on microarray technology to study the cellular transcriptome, but it has since shifted toward next-generation sequencing (NGS) techniques to generate comprehensive transcriptome or genome profiles of cells. This advancement has resulted in an unprecedented accumulation of data, enabling a deeper understanding of genome functionality and its implications for human health and disease.1-4
NGS has revolutionized the precise measurement of gene expression levels, providing comprehensive insights into cellular transcriptomes and genomes. These datasets hold tremendous potential for unraveling the functional information encoded within DNA sequences.5 However, the use of genomic data is hampered by various problems, including misclassification of sequences, chimeric genome assemblies, sample contamination, sequencing errors, mislabeling or data errors, data omission, data obfuscation, intentional misconduct, and association with unverified or poor-quality biomaterials.3,6 These problems have raised serious concerns about the use of genomic data and have significantly affected crucial areas such as hypothesis generation in basic research, biodiversity and environmental sciences, diagnostics and epidemiology, forensics, food safety, biodefense, and numerous other fields.
Creating authenticated genomic data
The issue of irreproducibility in biomedical research is not a new problem, and various attempts have been made by different groups to address it.7 However, none of these efforts have focused on creating authenticated genomic data. So, what exactly is meant by “authenticated genomic data”?
Authenticated genomic data refers to data that meets the following criteria:6,8
- Traceability to physical materials: The data can be traced back to specific physical samples or materials, ensuring transparency and accountability.
- Produced with defined quality assurance metrics: The data generation process adheres to well-defined quality assurance metrics, ensuring reliability and accuracy.
- Reproducibility across multiple tests: The data can be reproduced consistently across multiple experiments or analyses, validating its robustness.
- Repeatable by independent researchers: The data can be independently replicated by different groups of researchers, further validating its authenticity.
By establishing standardized omics data derived from authenticated materials and using them as reference data, scientists can achieve greater credibility, reproducibility, and consistency in their research findings.
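To make the four criteria above concrete, they can be recorded as an explicit checklist attached to each dataset. The following is a minimal sketch only: the class name, field names, and methods are hypothetical and chosen for illustration, and they do not represent an ATCC or QIAGEN schema.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class AuthenticationRecord:
    """Hypothetical checklist mirroring the four criteria listed above."""

    traceable_to_physical_material: bool
    produced_with_defined_qa_metrics: bool
    reproducible_across_tests: bool
    repeatable_by_independent_researchers: bool

    def unmet_criteria(self) -> List[str]:
        # Names of the criteria that are not yet satisfied.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_authenticated(self) -> bool:
        # In this sketch, a dataset counts as authenticated only if all four hold.
        return not self.unmet_criteria()


# Illustrative record: everything is in place except independent replication.
record = AuthenticationRecord(
    traceable_to_physical_material=True,
    produced_with_defined_qa_metrics=True,
    reproducible_across_tests=True,
    repeatable_by_independent_researchers=False,
)
print(record.is_authenticated(), record.unmet_criteria())
```

Treating the criteria as an all-or-nothing check is a design choice made for this sketch; a real curation workflow might instead record supporting evidence for each criterion.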
Establishing accredited reference standards for precision therapeutics
The pursuit of scientific research heavily relies on the utilization of cell lines and genomic data to investigate diverse biological phenomena; however, ensuring reproducibility and reliability remains a critical challenge. To enhance scientific rigor and maximize reproducibility, ATCC and QIAGEN developed ATCC Cell Line Land. This resource combines ATCC’s fully authenticated and characterized cell lines with QIAGEN’s best-in-class bioinformatic solutions, providing researchers with unparalleled access to accurate transcriptomic and genomic datasets derived from credible biological materials.
With ATCC Cell Line Land, you can expect:
- Enhanced cell line authentication
Misidentification or contamination of cell lines poses a major obstacle to reproducible research. ATCC Cell Line Land offers an expanded cell line authentication program that employs gold-standard methods like DNA profiling and short tandem repeat (STR) analysis. By ensuring accurate identification of cell lines, researchers can have unwavering confidence in the integrity of their experiments and data.
- Rigorous quality control and comprehensive characterization
Through various techniques like karyotyping, mycoplasma testing, and phenotypic analysis, ATCC ensures the authenticity, purity, and functionality of cell lines. Researchers gain access to extensive characterization data, enabling them to select the most suitable cell lines for their experiments with confidence.
- Transparent and accessible data
Researchers can easily access detailed information about each cell line, including its origin, passage history, authentication data, and characterization results. This facilitates informed decision-making regarding the selection of cell lines and empowers researchers to replicate and build upon previous research with ease.
- Educational resources
As part of ATCC Cell Line Land, we offer a wealth of educational resources, including webinars, workshops, and online courses. These resources enable researchers to enhance their understanding of cell line authentication, quality control, and best practices in experimental design. By equipping researchers with the necessary knowledge and tools, ATCC empowers the scientific community to foster a culture of reproducibility.
- Collaborative network and knowledge sharing
The online platform serves as a hub where researchers can connect with experts, share protocols, and engage in discussions about best practices. This collaborative network not only enhances scientific rigor but also promotes the adoption of standardized methods, reducing variability in experimental outcomes and improving reproducibility across the scientific community.
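As an illustration of the STR-based authentication mentioned above, the sketch below compares two STR profiles using a percent-match score often attributed to Tanabe and colleagues: twice the number of shared alleles divided by the total number of alleles in both profiles. It is not a description of ATCC's internal procedure. The function name, the allele calls, and the 80% cut-off are assumptions made for the example.

```python
from typing import Dict, Set

# A profile maps each STR locus name to the set of alleles called at that locus.
Profile = Dict[str, Set[str]]


def percent_match(query: Profile, reference: Profile) -> float:
    """Return 2 * shared / (query alleles + reference alleles) * 100.

    Only loci typed in both profiles are compared.
    """
    common_loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in common_loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in common_loci)
    if total == 0:
        raise ValueError("profiles share no typed loci")
    return 200.0 * shared / total


# Hypothetical allele calls, invented for illustration only.
reference = {
    "TH01": {"6", "9.3"},
    "D5S818": {"11", "12"},
    "TPOX": {"8"},
    "vWA": {"16", "18"},
}
query = {
    "TH01": {"6", "9.3"},
    "D5S818": {"11", "13"},  # one allele differs from the reference
    "TPOX": {"8"},
    "vWA": {"16", "18"},
}

MATCH_THRESHOLD = 80.0  # assumed cut-off for this example

score = percent_match(query, reference)
print(f"{score:.1f}% match; meets threshold: {score >= MATCH_THRESHOLD}")
```

Restricting the comparison to loci typed in both profiles is a simplification. A production workflow would also need rules for missing loci, allele dropout, and mixed samples.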
ATCC Cell Line Land represents a significant leap forward in enhancing scientific rigor and maximizing reproducibility in research. Through its focus on expanded cell line authentication, rigorous quality control, transparent data, educational resources, and collaborative networks, ATCC provides researchers with the necessary support and resources to conduct robust and reproducible experiments. With ATCC Cell Line Land, researchers can confidently contribute to the advancement of scientific knowledge in a reliable and transparent manner, ensuring the integrity of their data and findings.
Learn more about ATCC Cell Line Land
This is a poster presented at Bio-IT World 2023 that describes our standardized laboratory and bioinformatics workflows for the whole-transcriptome analysis of over 200 ATCC cell lines.
Quickly find the cell models you need with curated ‘omics data traceable back to authenticated cell lines.
1. Stephens ZD, et al. Big Data: Astronomical or Genomical? PLoS Biol 13: e1002195, 2015.
2. Peng RD, Hicks SC. Reproducible Research: A Retrospective. Annu Rev Public Health 42: 79-93, 2021.
3. Freedman LP, Cockburn IM, Simcoe TS. The Economics of Reproducibility in Preclinical Research. PLoS Biol 13: e1002165, 2015.
4. Capes-Davis A, Neve RM. Authentication: A Standard Problem or a Problem of Standards? PLoS Biol 14: e1002477, 2016.
5. Costa V, Angelini C, De Feis I, Ciccodicola A. Uncovering the complexity of transcriptomes with RNA-Seq. J Biomed Biotechnol 2010: 853916, 2010.
6. Wallach JD, Boyack KW, Ioannidis JPA. Reproducible research practices, transparency, and open access data in the biomedical literature, 2015-2017. PLoS Biol 16: e2006930, 2018.
7. Raphael MP, Sheehan PE, Vora GJ. A controlled trial for reproducibility. Nature 579: 190-192, 2020.
8. Brito JJ, et al. Recommendations to enhance rigor and reproducibility in biomedical research. Gigascience 9, 2020.