Cluster of thin, pink, rod-shaped Mycobacterium tuberculosis.

Reclassification of the Mycobacterium tuberculosis Complex (MTBC) Species as Mycobacterium tuberculosis

April 26, 2018, at 12:00 PM ET


The species of the Mycobacterium tuberculosis Complex (MTBC), namely M. tuberculosis, M. africanum, M. bovis, M. caprae, M. microti, and M. pinnipedii, are very closely related. In this webinar, we will discuss the techniques used to examine the MTBC in order to unravel this taxonomic mystery. Using phylogenomic techniques to compare the type strains of these species, we discovered that all of these “species” are, in fact, M. tuberculosis. We further examined all the strains deposited in GenBank under those species names and found all of them to be strains of M. tuberculosis. All known strains of three other putative MTBC members (“M. canettii”, “M. mungi”, and “M. orygis”) were similarly shown to be strains of M. tuberculosis. We have recently published a paper in the International Journal of Systematic and Evolutionary Microbiology (IJSEM) officially unifying the previously separate MTBC species as M. tuberculosis.

Key Points

  • Using whole-genome sequencing (WGS) and phylogenomic analysis of the MTBC species type strains, we discovered that all of these “species” are, in fact, Mycobacterium tuberculosis.
  • By similarly analyzing all the MTBC non-type strain whole-genome sequences (>3,700) in GenBank, we determined that all of these strains similarly should be considered to be strains of Mycobacterium tuberculosis.
  • We recommend the use of the infrasubspecific term ‘variant’ and infrasubspecific designations that generally retain the historical nomenclature associated with the groups or otherwise convey such characteristics (e.g., M. tuberculosis variant bovis).
  • ATCC is currently in the process of updating the nomenclature used in our catalog to reflect this phylogenomically modernized taxonomy.



Marco Riojas, PhD

Scientist, ATCC and BEI Resources

Dr. Riojas has been with ATCC/BEI Resources for over 10 years. He has a Ph.D. in Biodefense, and his primary research interests include biodefense, bacterial systematics, phylogenomics, and genomic identification of species.

Questions and Answers

Does publication of this paper mean that the scientific community has to use the names you describe? For example, Mycobacterium tuberculosis variant africanum?

The short answer to the question is that it is not required. For a more complete answer, we have to consider the two interlinked main components of bacterial systematics separately. Bacterial systematics is the system of classifying bacteria and assigning them names. This breaks down into two main components. The first is taxonomy, which is the study of organizing different items (in this case, bacteria) into various groups based on similarities and differences. The second is nomenclature, which is the assignment of correct scientific names according to standard conventions. Taxonomy is what actually classifies bacteria as different or the same, and establishes their relationships to each other. Nomenclature is the names we give those groups to indicate that they are similar or different. They are interlinked in that we first use taxonomy to define taxonomic groups, then nomenclature to give those groups names. So they are not entirely independent.

Let’s look first at taxonomy. Taxonomy is to some degree a matter of scientific opinion. When new evidence is published showing a different taxonomy than was previously believed, not everyone is immediately accepting of it. Some scientists may disagree with the new taxonomic conclusions, often based on their own previous research or perceived shortcomings of the new research. For nomenclature, IJSEM is essentially the arbiter and gatekeeper of bacterial nomenclature. Publication of a name in IJSEM signifies approval of that particular nomenclature. Any name that has been published in IJSEM is considered valid. That means that as taxonomy becomes updated, several different equally valid names may be used according to the scientific opinion of the individual researchers.

A good example of this—and one that we cite in our paper—is what we call in our paper Mycobacterium tuberculosis variant caprae. When it was first isolated from goats, it was given the name Mycobacterium tuberculosis subspecies caprae. Then it was later found to be more closely related to Mycobacterium bovis and was given the name Mycobacterium bovis subspecies caprae. Then it was found to be different enough from Mycobacterium bovis to warrant being its own unique species, and it was given the name Mycobacterium caprae. What our research shows is that, in fact, none of the MTBC (including Mycobacterium caprae) are different enough from Mycobacterium tuberculosis to be considered as different species. So our nomenclatural conclusion is that it should be named Mycobacterium tuberculosis, specifically Mycobacterium tuberculosis variant caprae. Again, we use the term variant rather than the original term subspecies because we found that they aren’t even different enough to be considered subspecies.

If you scientifically agree with our conclusions, you should refer to this organism as Mycobacterium tuberculosis variant caprae. If you disagree with our conclusions, you should refer to this organism by the most appropriate validly published name that aligns with your taxonomic opinion, most likely Mycobacterium caprae. Overall, the ultimate answer to this question is simply that what you and the rest of the scientific community call these organisms is up to you and your individual scientific opinion. We believe that the variant-based scheme proposed in our paper is best, as it is based on a comparison of the whole-genome sequences of all the type strains. However, as long as you use a name that is validly published, it isn’t wrong.

If the M. tuberculosis Complex is actually a single species as your research shows, how is it that these were initially incorrectly identified as different species?

It is not entirely fair to say that they were incorrectly identified. It is important to think about these identifications in the context of scientific and technological history. Mycobacterium tuberculosis dates back to the late 1800s. Mycobacterium bovis was originally identified at the beginning of the 1900s. M. africanum was identified as its own species in the 1960s. Technology has clearly improved since those days. While we initially had to rely only on gross morphology as the basis of our taxonomic classification, we then moved to biochemical characterizations, then eventually to the early genetic tests such as 16S. So as technology has advanced, we’ve been able to take closer and more accurate looks at organisms and update our taxonomy accordingly.

Therefore, the previous taxonomic groups weren’t so much incorrectly identified, but they were identified within the limitations of the technology of the time. The most modern technology at this time is whole-genome sequencing, which allows us to use the entirety of an organism’s genome to make taxonomic decisions. And yes, it is possible that one day newer technology or taxonomic criteria may come along that overrides our nomenclatural scheme.

What are the repercussions of the new taxonomic changes for clinicians?

For clinicians or laboratory workers, this change should have no effect. A patient infected with any Mycobacterium tuberculosis variant requires the same treatment and precautions during management. The only thing that has changed is the nomenclature.

Does this mean that there are really no differences between these different taxa that were previously known as species?

Definitely not, and we made a point in our paper to emphasize that our work does not imply that there are no differences. Even in the previous species-based taxonomic scheme, some of the species could be further subdivided into lineages that showed specific characteristics that were stable and differentiable. For example, Mycobacterium africanum showed a very distinct differentiation into two different West African lineages based on some regions of differences and spoligotyping. These different lineages still exist as stable and coherent entities, and we aren’t claiming otherwise. We are simply saying that the differences between these lineages don’t rise to the level of different species, or for that matter even subspecies.

Is this type of analysis specific to Mycobacterium, or can it be applied more broadly to other species?

This type of analysis is not specific to Mycobacterium, and it can absolutely be applied to other species. If you recall during the description of DDH and dDDH, I described how it can be used to tell whether two genomes belong to the same species or not. That can be any species at all. The sort of analysis that we reported in this paper is to compare a particular isolate against a species type strain to determine whether they belong to the same species, independent of what that species is. There is one slight limitation to this type of analysis, and that is that the whole-genome sequence of the type strain must be available to compare to. But as long as that type strain—for example, E. coli—is available, you can take the whole-genome sequence of any isolate and determine whether it falls within the circumscription of the type strain of E. coli. Here at ATCC, we’re beginning to use this type of whole-genome analysis to identify some of our organisms. This allows us to apply the most modern, whole-genome methods to the characterization and authentication of our material.
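As a rough sketch of that decision, the snippet below classifies an isolate from a dDDH value computed against a species type strain. The 70% species cutoff is the conventional DDH threshold, and the 79% subspecies cutoff is the one proposed by Meier-Kolthoff et al. (2014). The function name and the example values are hypothetical, and the dDDH value itself is assumed to come from a tool such as the GGDC. This is an illustration only, not the procedure used in the paper.

```python
# Thresholds are in percent. Both are assumptions stated in the text above:
# 70% is the conventional species cutoff, 79% the proposed subspecies cutoff.
SPECIES_THRESHOLD = 70.0
SUBSPECIES_THRESHOLD = 79.0


def classify_against_type_strain(ddh_percent: float) -> str:
    """Interpret a dDDH value between an isolate and a species type strain."""
    if not 0.0 <= ddh_percent <= 100.0:
        raise ValueError("dDDH must be a percentage between 0 and 100")
    if ddh_percent < SPECIES_THRESHOLD:
        return "different species"
    if ddh_percent < SUBSPECIES_THRESHOLD:
        return "same species, different subspecies"
    return "same species, same subspecies"


# Hypothetical dDDH values, for illustration only.
for value in (97.3, 74.0, 45.2):
    print(f"{value:5.1f}% -> {classify_against_type_strain(value)}")
```

A result in the last category is what would support treating an isolate as falling below even the subspecies level relative to the type strain.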

What are the limitations of dDDH?

The biggest limitation of the dDDH technique is that it is an inherently pairwise analysis. That is, it can only compare a single genome against one other single genome. In our paper, we compared 11 strains (the 9 MTBC species and 2 outgroups) against each other. The GGDC allows the batching of genome submissions by using a single query genome with up to 75 reference genomes that the query genome will be compared against. In order to complete a full bidirectional pairwise analysis, we had to submit Sample 1 for comparison against Samples 1 – 11, then Sample 2 against Samples 1 – 11, then Sample 3 against Samples 1 – 11, and so on. This means that in the end, the number of dDDH values in your results is n², where n is the number of samples in your data set. In our case, that was 121 dDDH values. (We only show half of these because we combined two tables, the dDDH table and the ANI table, into a single one.) While 11 different GGDC submissions of 11 genomes each isn’t that terrible, the quadratic growth of such an analysis quickly becomes quite cumbersome as the number of samples increases.
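The submission arithmetic described above can be sketched in a few lines. This is a minimal illustration and not part of the GGDC itself. The function name is hypothetical, and the batching model (one query genome per submission, up to 75 reference genomes) is taken only from the description above.

```python
import math


def pairwise_workload(n_samples: int, max_refs_per_submission: int = 75) -> dict:
    """Count submissions and dDDH values for a full bidirectional comparison.

    Assumes one query genome per submission, compared against at most
    `max_refs_per_submission` reference genomes.
    """
    # Each query is compared against all n samples, split into batches if needed.
    batches_per_query = math.ceil(n_samples / max_refs_per_submission)
    return {
        "submissions": n_samples * batches_per_query,
        "ddh_values": n_samples ** 2,                      # full n x n matrix
        "unique_pairs": n_samples * (n_samples - 1) // 2,  # no self or mirrored pairs
    }


for n in (11, 100, 1000):
    print(n, pairwise_workload(n))
```

For the 11 genomes in the paper this gives 11 submissions and 121 dDDH values. At 1,000 genomes the same full matrix would hold one million values.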

Another limitation is that dDDH cannot be used to delineate beyond the species or subspecies level (e.g., at the genus level). A good description of the technical reasons for this can be found on the Frequently Asked Questions (FAQ) page of the GGDC website.

It is worth noting that these same two limitations exist for the wet-lab DDH technique as well. Each DDH experiment can only compare one genome against another, and no genus-level threshold has been established for DDH either.

What assurances do you have that the various "species" evaluated (i.e., bovis) were not cross-contaminated with H37Rv?

All of the items we sequenced in our research are ATCC items that are available for distribution from our catalog. As with all of our products, these items were produced to the highest quality standards that our customers have come to expect from ATCC material. Our strict quality control process includes testing the purity of our cultures, biochemical testing, and other assays to verify that the sample corresponds to the expected species.

Given the difficulty in differentiating the MTBC variants from each other, it is not unreasonable to question whether the ATCC items we sequenced could be contaminated with a closely related organism. One of the most critical process steps we take for protecting our products is that we only work with one item at a time during the production process. Working with two different cultures at the same time invites contamination, so we make sure to avoid processing, managing, or manipulating multiple items simultaneously. Between our proven process controls and the high QC standards that we ensure our material meets, we think it is highly unlikely that the five items we sequenced are contaminated with H37Rv.

However, keep in mind that we also compared the genomes of other strains besides the five ATCC type strains that we sequenced. In addition to the “type” strains of “M. canettii”, “M. mungi”, and “M. orygis”, we also compared over 3,700 MTBC strains from GenBank. The results we found for all the strains we compared were the same: they all fall within the circumscription of M. tuberculosis. Even if the five ATCC strains we sequenced were contaminated with H37Rv, that wouldn’t explain seeing virtually identical results from all the existing genomes we compared as part of our research.

Was this approach also used to reclassify M. avium vs. M. avium subsp. paratuberculosis?

A numerical taxonomic approach was used in the original research that reclassified M. paratuberculosis into M. avium and created the three subspecies avium, paratuberculosis, and silvaticum.1

We have since analyzed these genomes using dDDH and have validated these results using the same whole-genome techniques that we used in our MTBC paper. We are currently working on a publication that includes these results.

  1. Thorel MF, Krichevsky M, Levy-Frebault VV. Numerical taxonomy of mycobactin-dependent mycobacteria, emended description of Mycobacterium avium, and description of Mycobacterium avium subsp. avium subsp. nov., Mycobacterium avium subsp. paratuberculosis subsp. nov., and Mycobacterium avium subsp. silvaticum subsp. nov. Int J Syst Bacteriol 40: 254-260, 1990.

Are there any statistically significant differences between virulent and avirulent strains (e.g., bovis vs. BCG) or across M. tuberculosis and other MTBC species?

The virulence factors of the Mycobacterium tuberculosis variants are an active area of research, but they are outside the scope of the taxonomic research we conducted. If you’re interested in what makes some of the variants more pathogenic or attenuated than others, there is a good depth of literature on the topic. Here are two review articles that can help you get started:

  1. Orgeur M, Brosch R. Evolution of virulence in the Mycobacterium tuberculosis complex. Curr Opin Microbiol 41: 68-75, 2018.
  2. Forrellad MA, et al. Virulence factors of the Mycobacterium tuberculosis complex. Virulence 4: 3-66, 2013.

What is the definition of a variant?

We recommended in our paper that what were formerly known as species be considered as infrasubspecific subdivisions (i.e., a subdivision lower than subspecies) of M. tuberculosis.

Bacterial nomenclature is governed by the International Code of Nomenclature of Prokaryotes (ICNP). The ICNP officially only governs nomenclature down to the subspecies level. However, it does make some suggestions regarding infrasubspecific subdivisions in its Appendix 10:

“An infrasubspecific taxon is one strain or a set of strains showing the same or similar properties, and treated as a taxonomic group.”1

The infrasubspecific term is the word used to describe the infrasubspecific subdivision, and the term used is generally based upon the criteria for distinction. These can include terms like biovar (biochemical or physiological properties), morphovar (morphological characteristics), pathovar (pathogenic reactions in one or more hosts), serovar (antigenic characteristics), etc.1

As we describe in our paper, the kinds of differences between the strains vary. Some could be considered biovars, some morphovars, or even genomovars for those that are differentiable only according to some genomic features. As such, there doesn’t seem to be a single commonly used term that properly encompasses the differences between them. So we took a slightly different approach and intentionally did not incorporate the methodology into the infrasubspecific term. We isolated the “-var” suffix from the usual terms and made it its own term as the word “variant”.

By extension from the ICNP definition of an infrasubspecific taxon, we are essentially considering a variant as “one strain or a set of strains showing the same or similar properties, and treated as a taxonomic group.” The M. tuberculosis variants clade into distinct and differentiable lineages according to various properties. We think that “variant” is the most appropriate term.

1. Parker CT, Tindall BJ, Garrity GM. International Code of Nomenclature of Prokaryotes. Int J Syst Evol Microbiol 2015.

Have you used the gold standard approach (DNA-DNA hybridization) to support your data?

We have not. However, as I mentioned in the webinar, there are some significant challenges to performing DDH. The fact that it is so poorly reproducible means that you can run two samples against each other and get one answer, then you can run those same two samples against each other again and get a different answer. The scientific value of an assay like that is questionable. Most labs in industrialized nations with access to DNA sequencers would much rather sequence their strains and compare them via a sequence data–based technique such as dDDH or ANI. For more details describing dDDH and its calibration to DDH, you can check out a number of publications describing the Genome-to-Genome Distance Calculator (GGDC).1-4

  1. Meier-Kolthoff JP, et al. Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 14: 60, 2013.
  2. Meier-Kolthoff JP, et al. Complete genome sequence of DSM 30083T, the type strain (U5/41T) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy. Stand Genomic Sci 9: 2, 2014.
  3. Auch AF, et al. Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison. Stand Genomic Sci 2: 117-134, 2010.
  4. Auch AF, Klenk HP, Goker M. Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs. Stand Genomic Sci 2: 142-148, 2010.