The use of unauthenticated biological reference materials, missing source metadata, and frequent gaps in data provenance have raised concerns about the reliability and reproducibility of preclinical research data. To address this problem, we characterized the whole transcriptomes of over 70 human and mouse kidney cell lines frequently used as model systems in toxicology, drug development, viral and biologics production, and
cancer research. Because the resulting whole-transcriptome datasets are traceable to authenticated cell lines, they can serve as molecular reference controls for RNA profiling, enabling scientists to make well-informed decisions about cell line selection, study design, and result interpretation. In this study, we used whole transcriptome sequencing (WTS) to evaluate the baseline gene expression profiles of the different kidney cell lines. These analyses provide a comprehensive view of the transcriptome in each cell line, allowing for the accurate detection of known, novel, and rare transcripts; evaluation of long-read lengths for full-length transcripts; and the study of fusion genes and splice variants. Our initial survey of the data revealed transcriptional similarities among the kidney cell lines and their molecular traits. We determined the total number of genes expressed and their relative abundance in various cell types. We also compared the parental HEK-293 (ATCC CRL-1573) cell line with its derivative cell lines. Here, we used WTS to compare the RNA expression landscape of HEK-293 with that of its derivative, 293.STAT1 BAX KO. The 293.STAT1 BAX KO (ATCC CRL-1573-VHG) cell line was derived from HEK-293 by using CRISPR-Cas9 gene editing to create a STAT1 BAX double knockout that exhibits enhanced virus production capability compared to the parental cell line. Through our comparative analysis of these cell lines, we determined the total number of genes that are differentially expressed in the knockout cells relative to the parental cells. The high-quality transcriptomic datasets obtained in this study provide valuable molecular insights into different cell lines, saving researchers time and resources on cell line characterization while enabling better selection of cell lines for preclinical studies. Through key advances in cell line characterization, high-throughput RNA-seq data accompanied by related metadata improve scientific rigor and can help reduce the risk of irreproducibility.