Release notes
A fuller release history (including early draft versions) is available in the OTT documentation.
- Small improvements to treatment of containers (X incertae sedis, samples)
Note that SILVA precedes Hibbett in the load order
Open Tree of Life reference taxonomy version 3.0
Version 3.0 draft 6 was generated on 26 February 2017.
Major changes since OTT 2.10
- Updated to NCBI Taxonomy dated 9 November 2016.
- Updated to GBIF backbone dated 29 July 2016. Unfortunately, as a consequence, about 200 species, mostly birds, have been renamed with no known link to the previous name. (Example: Collocalia ocista, formerly known as Aerodramus ocistus.) This means occurrences of these species in previously curated studies will not be recognized in synthesis runs. It is hoped that this is a temporary problem.
- Cnidaria now gets its taxonomy from WoRMS instead of NCBI.
- Removed Viruses
- A number of taxa whose OTT id unnecessarily changed in the past have had their OTT ids restored, with the newer id as an id alias. This change should not have any ill effects for clients that are aware of id aliases (forwards.tsv).
- OTT identifiers ('taxa'): 3594550
- Visible: [TBD]
- Synonyms: 1842403 (big increase)
- Deprecated/hidden ids occurring in phylesystem: total 3686, but only 586 if lumped ids are excluded
- Deprecated/hidden ids occurring in studies in synthesis: total 1097, but only 118 if lumped ids are excluded
- Source taxa dissolved due to conflict (conflicts.tsv): 1171
Contents of download
All files use UTF-8 character encoding. For documentation about file formats, see the reference taxonomy wiki on GitHub.
taxonomy.tsv: The file that contains the taxonomy.
synonyms.tsv: The list of synonyms.
forwards.tsv: Aliases ('forwarding pointers') - a list of OTT ids that are retired and should be replaced by new ones (usually due to 'lumping')
conflicts.tsv: Report on taxa from input taxonomies that are hidden because they are paraphyletic with respect to a higher taxon from a higher priority input taxonomy. Since this file is mainly for debugging purposes, I change it from time to time without notice. The format used in 3.0 is very different from that used in 2.10.
deprecated.tsv: List of all taxon ids occurring in phylesystem studies that have been deprecated or suppressed since previous version. This includes ids that no longer identify any taxon, those that have been 'lumped' with other ids, and those for taxa that are suppressed in synthesis but weren't suppressed in the previous version.
version.tsv: The version of OTT.
transcript.out: Console debugging output generated during the taxonomy build process.
log.tsv: Debugging information related to homonym resolution.
otu_differences.tsv: List of differences with OTT 2.10, restricted to ids used in phylesystem.
There are some new .json files in the dump containing metrics, created for the purpose of the taxonomy method writeup. They are not documented as of this writing.
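As a rough illustration of how a client might honor the id aliases in forwards.tsv, the Python sketch below maps retired OTT ids to their replacements and follows chains of forwarding pointers. It assumes a tab-separated layout with a header row and two columns (retired id, then replacement id); the actual format is described in the reference taxonomy wiki and may differ. The function names and the sample ids are hypothetical.

```python
import csv
import io


def load_forwards(text):
    """Parse alias rows into a dict mapping retired id -> replacement id.

    Assumed layout (not confirmed by these notes): tab-separated,
    first column retired id, second column replacement id.
    """
    forwards = {}
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for row in reader:
        # Skip blank lines and any header row whose first field is not numeric.
        if len(row) < 2 or not row[0].strip().isdigit():
            continue
        forwards[int(row[0])] = int(row[1])
    return forwards


def resolve_id(ott_id, forwards):
    """Follow forwarding pointers until reaching an id that is not retired."""
    seen = set()
    while ott_id in forwards and ott_id not in seen:
        seen.add(ott_id)  # guards against an accidental cycle
        ott_id = forwards[ott_id]
    return ott_id


# Made-up ids for illustration only; they are not taken from the real file.
sample = "id\treplacement\n101\t202\n202\t303\n"
forwards = load_forwards(sample)
print(resolve_id(101, forwards))  # 303 (two hops)
print(resolve_id(999, forwards))  # 999 (not retired, returned unchanged)
```

For the real file, the text passed to load_forwards would be read from forwards.tsv with UTF-8 encoding, as noted above.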
The reference taxonomy is an algorithmic combination of several source taxonomies. For code, see the source code repository. Version 3.0 draft 6 was generated using commit fc5cb5c.
Any errors in OTT should be assumed to have been introduced by the Open Tree of Life project until confirmed as originating in the source taxonomy.
Download locations are for the particular versions used to construct OTT 3.0. For new work, current versions of these sources should be retrieved.
Curated additions from the Open Tree amendments-1 repository, commit bcafdea. These taxa are added during OTU mapping using the curator application.
Taxonomy from: DS Hibbett, M Binder, JF Bischoff, M Blackwell, et al. A higher-level phylogenetic classification of the Fungi. Mycological Research 111(5):509-547, 2007. Newick string with revisions archived at http://figshare.com/articles/Fungal_Classification_2015/1465038.
Download location: https://github.com/OpenTreeOfLife/reference-taxonomy/tree/ott2.10draft11/feed/h2007
Taxonomy from: SILVA 16S ribosomal RNA database, version 115. See: Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research 41 (D1): D590-D596. Web site: http://www.arb-silva.de/.
Download location: ftp://ftp.arb-silva.de/release_115/Exports/tax_ranks_ssu_115.csv.
Index Fungorum. Download location: derived from database query result files provided by Paul Kirk, 7 April 2014 (personal communication). Web site: http://www.indexfungorum.org/.
Download location (converted to OTT format): http://files.opentreeoflife.org/fung/fung-9/fung-9-ot.tgz.
Taxonomy from: Schäferhoff, B., Fleischmann, A., Fischer, E., Albach, D. C., Borsch, T., Heubl, G., and Müller, K. F. (2010). Towards resolving Lamiales relationships: insights from rapidly evolving chloroplast sequences. BMC Evolutionary Biology 10(1), 352. Manually transcribed from the paper and converted to OTT format.
Download location: http://purl.org/opentree/ott/ott2.8/inputs/lamiales-20140118.tsv
World Register of Marine Species (WoRMS) - harvested from web site using web API over several days ending around 1 October 2015. Download location: http://files.opentreeoflife.org/worms/worms-1/worms-1-ot.tgz
NCBI Taxonomy, from the US National Center for Biotechnology Information. Web site: http://www.ncbi.nlm.nih.gov/Taxonomy/.
We used a version dated 9 November 2016. Archived location: http://files.opentreeoflife.org/ncbi/ncbi-20151006/ncbi-20151006.tgz.
Current version download location: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
GBIF Backbone Taxonomy, from the Global Biodiversity Information Facility.
We used a version dated 2016-07-29. Download location: http://files.opentreeoflife.org/gbif/gbif-20160729/gbif-201609729.zip.
Current version download location (unverified): http://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c.
Interim Register of Marine and Nonmarine Genera (IRMNG).
We used a version dated 2014-01-31. Download location: http://purl.org/opentree/ott/ott2.8/inputs/IRMNG_DWC-2014-01-30.zip.
Taxon identifiers are carried over from OTT 2.10 when possible.
It has been requested that we relay the following statement:
The Open Tree Taxonomy does not reproduce its sources in their entirety or in their original form of expression, but only uses limited information expressed in them. See "Scientific names of organisms: attribution, rights, and licensing" (http://dx.doi.org/10.1186/1756-0500-7-79) regarding use of taxonomic information and attribution.
Where taxonomies conflict regarding taxon relationships, they are resolved in favor of the higher priority taxonomy. The priority ordering is as given above, with the following exceptions:
The non-Fungi content of Index Fungorum is separated from the Fungi content and given a priority lower than NCBI and GBIF.
The non-Malacostraca, non-Ctenophora content of WoRMS is separated from the Malacostraca and Ctenophora content and given a priority lower than NCBI but higher than GBIF.
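The effect of these exceptions on conflict resolution can be sketched in Python. The example below is only an illustration, not the code used to build OTT: it encodes a subset of the ordering limited to the relations stated in the text (WoRMS listed above NCBI, NCBI above GBIF, and the two exceptions), and the source labels are hypothetical names chosen for this sketch. Sources whose relative position the text does not pin down are left out.

```python
# Illustrative subset of the priority ordering, highest priority first.
# Only relations stated in these notes are encoded; the labels are
# hypothetical and do not come from the OTT build scripts.
PRIORITY = [
    "worms:malacostraca+ctenophora",  # WoRMS content kept at its listed position
    "ncbi",
    "worms:other",                    # lower than NCBI but higher than GBIF
    "gbif",
    "if:non-fungi",                   # lower than NCBI and GBIF
]
RANK = {name: position for position, name in enumerate(PRIORITY)}


def preferred(source_a, source_b):
    """Return the source that wins a conflict: the one listed earlier."""
    return min(source_a, source_b, key=RANK.__getitem__)


print(preferred("gbif", "worms:other"))   # worms:other
print(preferred("ncbi", "worms:other"))   # ncbi
print(preferred("if:non-fungi", "gbif"))  # gbif
```

Splitting a source into two labelled parts, as done here for WoRMS and Index Fungorum, is one simple way to give different portions of the same taxonomy different priorities.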