
Generalizing Herpesvirus Genome Assembly: A Novel Bioinformatics Approach

Poster

SFAF 2025

Santa Fe, New Mexico, United States

May 20, 2025

Abstract

Herpesviruses, such as herpes simplex virus type 1 (HSV-1) and Epstein-Barr virus (EBV), are highly prevalent DNA viruses that can cause recurring, lifelong infections, and some are associated with various cancers. Because of their prevalence in humans and other species, it is vital to study these viruses to better understand how they infect and persist in their hosts. Herpesviruses were originally regarded as having low genomic diversity, but as more of them are sequenced at higher accuracy, that paradigm has begun to shift: new research shows that they can evolve faster than previously assumed.

However, these viruses are notoriously difficult to assemble because of their large genomes (125-240 kb), extremely high GC content, and numerous terminal and internal repeat regions. Short-read-only approaches often yield fragmented assemblies. Long-read-only approaches produce more complete genomes but often fail to capture the inverted terminal repeat (ITR) regions. Publicly available herpesvirus genomes tend to be assembled with a more manual approach that involves laboriously curating each contig. This manual intervention makes those assemblies more complete, but such methods cannot easily be adopted by high-throughput labs.
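To make two of these difficulties concrete, the following minimal Python sketch computes GC content and searches a sequence for inverted copies of its first k bases, the kind of repeat that can confuse an assembler. It is a toy illustration under simplifying assumptions: the function names and the toy sequence are hypothetical, and real herpesvirus repeat architectures (in HSV-1, for example, the terminal repeats are inverted copies of internal repeats flanking the unique long and unique short regions) are far longer and more complex than this exact-match search can handle.

```python
"""Toy illustration of two features that complicate herpesvirus assembly:
high GC content and inverted copies of terminal repeat segments."""
from typing import List

# Translation table for base complementation (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (output is uppercase)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among called bases (A/C/G/T); N bases are ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def inverted_copy_positions(seq: str, k: int) -> List[int]:
    """Start positions where the reverse complement of the first k bases occurs.

    Uses exact string matching only; real repeats contain sequencing errors
    and variation that this toy search would miss.
    """
    if k <= 0 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    seq = seq.upper()
    target = reverse_complement(seq[:k])
    positions = []
    start = seq.find(target)
    while start != -1:
        positions.append(start)
        start = seq.find(target, start + 1)  # allow overlapping matches
    return positions


if __name__ == "__main__":
    # Hypothetical toy "genome": terminal repeat + unique region + inverted copy.
    terminal = "GGCGCCGTACGGCG"
    unique = "ATTGCAGTCCA"
    genome = terminal + unique + reverse_complement(terminal) + "TTAGC"
    print(f"GC fraction: {gc_fraction(genome):.2f}")
    print("Inverted copies of terminal segment at:",
          inverted_copy_positions(genome, len(terminal)))
```

An exact-match search is used only because it is short and easy to verify; it shows why a read that lies entirely inside such a repeat cannot, by itself, be placed unambiguously in the genome.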

Here, we present an assembly pipeline that leverages long-read Oxford Nanopore Technologies (ONT) data and includes herpesvirus-specific pre-processing steps to generate complete herpesvirus genomes. Briefly, long reads are first trimmed according to ATCC's quality standards, and the resulting reads are then binned by taxonomy using Kraken2. Reads classified as viral move forward to assembly with the Flye assembler. Notably, this pipeline does not de-host the reads: because the human host genome contains sequences homologous to herpesviruses, de-hosting can remove essential herpesvirus reads. Genomes assembled using this pipeline were more complete and resolved the ITR regions. With this method, we were also able to assemble more complete genomes of other DNA viruses, such as adenoviruses.
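The read-selection steps described above could look roughly like the following Python sketch. It assumes Kraken2's standard per-read output (tab-separated: C/U classification flag, read ID, taxid, read length, k-mer mapping, with the taxid field written as "Name (taxid N)" when `--use-names` is set) and a child-to-parent taxid map such as one built from NCBI Taxonomy's nodes.dmp. The length and mean-quality thresholds are hypothetical placeholders, since ATCC's trimming standards are not specified here, the plain arithmetic mean of Phred scores is a simplification, and all function names are illustrative rather than part of ATCC's pipeline.

```python
"""Sketch: keep long reads that pass a simple length/quality filter and that
Kraken2 assigned to a taxon under Viruses (NCBI taxid 10239)."""
import re
from typing import Dict, Iterable, Set

VIRUSES_TAXID = 10239  # NCBI Taxonomy node for the Viruses superkingdom

# Hypothetical thresholds for illustration only; not ATCC's actual standards.
MIN_LENGTH = 1000
MIN_MEAN_QUALITY = 10.0

TAXID_IN_NAME = re.compile(r"\(taxid (\d+)\)")


def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of Phred scores from a FASTQ quality string (Phred+33)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def passes_filter(sequence: str, quality: str) -> bool:
    """Hypothetical read filter: minimum length and minimum mean quality."""
    return len(sequence) >= MIN_LENGTH and mean_phred(quality) >= MIN_MEAN_QUALITY


def parse_taxid(field: str) -> int:
    """Accept a bare taxid ('10298') or a --use-names field ('Name (taxid 10298)')."""
    match = TAXID_IN_NAME.search(field)
    return int(match.group(1) if match else field)


def descends_from(taxid: int, ancestor: int, parents: Dict[int, int]) -> bool:
    """Walk up the child-to-parent map until reaching `ancestor` or a cycle/root."""
    seen = set()
    while taxid not in seen:
        if taxid == ancestor:
            return True
        seen.add(taxid)
        parent = parents.get(taxid)
        if parent is None:
            return False  # taxid missing from the map
        taxid = parent
    return False  # reached the root (taxid 1 is its own parent) without a match


def viral_read_ids(kraken_lines: Iterable[str], parents: Dict[int, int]) -> Set[str]:
    """Return IDs of reads classified ('C') to any taxon under Viruses."""
    selected = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "C":
            continue  # skip unclassified reads and malformed lines
        if descends_from(parse_taxid(fields[2]), VIRUSES_TAXID, parents):
            selected.add(fields[1])
    return selected


if __name__ == "__main__":
    # Toy, collapsed lineage for illustration; a real map comes from nodes.dmp.
    toy_parents = {10298: VIRUSES_TAXID, VIRUSES_TAXID: 1, 9606: 1, 1: 1}
    toy_output = [
        "C\tread1\t10298\t1500\t10298:100",
        "C\tread2\t9606\t2000\t9606:150",
        "U\tread3\t0\t1200\t0:90",
        "C\tread4\tHuman alphaherpesvirus 1 (taxid 10298)\t1800\t10298:120",
    ]
    print(sorted(viral_read_ids(toy_output, toy_parents)))
```

Selecting by ancestry under taxid 10239, rather than by one exact taxid, keeps reads that Kraken2 assigned to any viral node, including higher ranks where a read's k-mers could not be resolved to a single species. The surviving read IDs would then be used to subset the trimmed reads before assembly with Flye.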

Download the poster to learn about our assembly pipeline for generating complete herpesvirus genomes.


Presenter


Nikhita Puthuveetil, MS

Senior Bioinformatician, Sequencing and Bioinformatics Center, ATCC

Nikhita Puthuveetil is a bioinformatician at ATCC who performs routine bioinformatics analysis on internal sequencing submissions, primarily SARS-CoV-2 samples as well as plasmid and bacterial samples. She also works with her team on the development of the ATCC Genome Portal. She first joined ATCC as a bioinformatics intern in 2019, when she created an internal sequencing dashboard in R to track and manage sequencing at the Sequencing and Bioinformatics Center. She has an MS in Bioinformatics from Virginia Commonwealth University.


Reference-quality sequences

Through the ATCC Genome Portal, you can easily search, access, and analyze thousands of reference-quality genome sequences. Our optimized methodology is designed to produce complete, circularized (when biologically appropriate), and contiguous genomic elements using short-read (virology collection) and hybrid (bacteriology, mycology, and protistology collections) assembly techniques. Each stage of the workflow is accompanied by rigorous quality control analyses, and only data that pass all quality control criteria are published to the ATCC Genome Portal. Visit the portal today to find the high-quality data you need for your research.
