Open Tree of Life reference taxonomy version 2.10
Version 2.10 draft 11 was generated on 10 September 2016.
Major changes since OTT 2.9
- Updated to NCBI Taxonomy downloaded June 29, 2016.
- The basic taxon matching method has been rewritten. Among other improvements, it
now considers all synonym-to-synonym matches as well as matches based on ids.
- Names in IRMNG that are marked invalid (nomen nudum, etc.) do not become OTT taxa
(unless grandfathered because they are used in an OTU match). There are 365811 such names,
although the corresponding OTT taxa are not removed if they also come from another source.
- IRMNG-only taxa are annotated 'hidden' to reduce the number of problematic species.
There are 351271 of these.
- Build transcript is now in transcript.out.
- The download file synonyms.tsv contains information about where synonyms come from.
- Changes to improve the NCBI/SILVA mapping
- Improvements to preprocessing of NCBI Taxonomy, GBIF, and IRMNG
- All inputs are now archived, and most nondeterminism weeded out, to enhance reproducibility
- See below for taxonomy corrections
- OTT identifiers ('taxa'): 3453839 (74510 fewer, due to IRMNG trim)
- Visible: [TBD]
- Synonyms: 1001466 [contains some duplicates, get better number and show delta]
- Deprecated/hidden ids occurring in phylesystem: 531
- Deprecated/hidden ids occurring in studies in synthesis: 184
- Source taxa dissolved due to conflict (conflicts.tsv): 1167
Contents of download
All files use UTF-8 character encoding. For documentation about file formats, see the documentation in the reference taxonomy
taxonomy.tsv: The file that contains the taxonomy.
synonyms.tsv: The list of synonyms.
forwards.tsv: Forwarding pointers - a list of OTT ids that are
retired and should be replaced by new ones (usually due to
conflicts.tsv: Report on taxa from input taxonomies that are
hidden because they are paraphyletic with respect to a higher
taxon from a higher priority input taxonomy. The number in the first column
is the depth in the taxonomic tree of the nearest common ancestor of the taxon's children.
deprecated.tsv: List of all taxon ids occurring in phylesystem
studies that have been deprecated since previous version. This
includes ids that no longer identify any taxon, those that have been
'lumped' with other ids, and those for taxa that are suppressed in
synthesis but weren't suppressed in the previous version.
version.tsv: The version of OTT.
transcript.out: Console debugging output generated during the taxonomy build process.
log.tsv: Debugging information related to homonym resolution.
weaklog.csv: Internal debugging tool.
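
The forwarding pointers in forwards.tsv can be applied when reading older data that still carries retired OTT ids. The following is a minimal sketch, not the project's own code. It assumes that each row holds a retired id and its replacement separated by a single tab, and that any non-numeric first row is a header; check the actual file before relying on either assumption. The function names load_forwards and resolve_id are hypothetical.

```python
from typing import Dict, Iterable


def load_forwards(lines: Iterable[str]) -> Dict[str, str]:
    """Build a retired-id -> replacement-id map.

    Assumption: rows look like "<old id>\t<new id>"; rows whose first
    field is not numeric (for example a header) are skipped.
    """
    forwards: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0].strip().isdigit():
            continue
        forwards[fields[0].strip()] = fields[1].strip()
    return forwards


def resolve_id(ott_id: str, forwards: Dict[str, str]) -> str:
    """Follow forwarding pointers until an id with no forward is reached.

    A visited set guards against an accidental cycle in the data.
    """
    seen = set()
    while ott_id in forwards and ott_id not in seen:
        seen.add(ott_id)
        ott_id = forwards[ott_id]
    return ott_id


if __name__ == "__main__":
    # Path is an assumption: the file as unpacked from the download.
    with open("forwards.tsv", encoding="utf-8") as handle:
        table = load_forwards(handle)
    print(len(table), "forwarding pointers loaded")
```

Ids are kept as strings so that they compare equal to ids read from other files in the download without conversion.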
The reference taxonomy is an algorithmic combination of several
source taxonomies. For code,
source code repository.
Version 2.10 draft 11 was generated using
Any errors in OTT
should be assumed to have been introduced by the Open Tree of Life
project until confirmed as originating in the source taxonomy.
Download locations are for the particular versions used to construct
OTT 2.10. For new work, current versions of these sources should be
Curated additions from the Open Tree amendments-1 repository, commit 4b3ba1a. These taxa are added during OTU mapping using the curator application.
DS Hibbett, M Binder, JF Bischoff, M Blackwell, et al.
A higher-level phylogenetic classification of the Fungi.
Mycological Research 111(5):509-547, 2007.
Newick string with revisions
archived at http://figshare.com/articles/Fungal_Classification_2015/1465038.
Download location: https://github.com/OpenTreeOfLife/reference-taxonomy/tree/ott2.10draft11/feed/h2007
Taxonomy from: SILVA 16S ribosomal RNA database, version 115.
See: Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J,
Glöckner FO (2013) The SILVA ribosomal RNA gene database project:
improved data processing and web-based tools.
Nucleic Acids Research 41 (D1): D590-D596.
Web site: http://www.arb-silva.de/.
Download location: ftp://ftp.arb-silva.de/release_115/Exports/tax_ranks_ssu_115.csv.
Download location: derived from database query result files provided by Paul
Kirk, 7 April 2014 (personal communication).
Web site: http://www.indexfungorum.org/.
Download location (converted to OTT format): http://files.opentreeoflife.org/fung/fung-9/fung-9-ot.tgz.
Schäferhoff, B., Fleischmann, A., Fischer, E., Albach, D. C., Borsch,
T., Heubl, G., and Müller, K. F. (2010). Towards resolving Lamiales
relationships: insights from rapidly evolving chloroplast sequences.
BMC Evolutionary Biology 10(1), 352.
Manually transcribed from the paper and converted to OTT format.
Download location: http://purl.org/opentree/ott/ott2.8/inputs/lamiales-20140118.tsv
World Register of Marine Species (WoRMS), harvested from the web site via its web API over several days ending around 1 October 2015.
Download location: http://files.opentreeoflife.org/worms/worms-1/worms-1-ot.tgz
NCBI Taxonomy, from the
US National Center for Biotechnology Information.
Web site: http://www.ncbi.nlm.nih.gov/Taxonomy/.
For OTT 2.10 we used a version downloaded from NCBI on 29 June 2016.
Download location: http://files.opentreeoflife.org/ncbi/ncbi-20151006/ncbi-20151006.tgz.
Current version download location:
GBIF Backbone Taxonomy, from the
Global Biodiversity Information Facility.
We used a version dated 2013-07-02.
Download location: http://purl.org/opentree/gbif-backbone-2013-07-02.zip.
Current version download location:
Interim Register of Marine and Nonmarine Genera (IRMNG), from CSIRO.
We used a version dated 2014-01-31. Download location:
Taxon identifiers are carried over from OTT 2.9 when possible.
It has been requested that we relay the following statement:
REUSE OF IRMNG CONTENT:
The Open Tree Taxonomy does not reproduce its sources in their
entirety or in their original form of expression, but only uses
limited information expressed in them. See "Scientific names of
organisms: attribution, rights, and licensing" (http://dx.doi.org/10.1186/1756-0500-7-79)
regarding use of taxonomic information and attribution.
Where taxonomies conflict regarding taxon relationships, the conflicts are
resolved in favor of the higher priority taxonomy. The priority
ordering is as given above, with the following exceptions:
The non-Fungi content of Index Fungorum is separated from the Fungi
content and given a priority lower than NCBI but higher than GBIF.
The non-Malacostraca content of WoRMS is separated from the
Malacostraca content and given a priority lower than NCBI but higher
Taxonomy corrections (incomplete list)
- Two records have irregular source lists; the peculiar entries should be removed. The records are Eukaryota (304358) and SAR (5246039).
This is corrected in draft 12.
- There are a few redundant synonym records (45727 to be exact). Programs reading synonyms.tsv should keep only the first row for each combination of id and name and ignore any later rows with the same combination.
This is corrected in draft 12.
- The Makefile specifies the 20151006 version of NCBI taxonomy, but
actually the 20160629 version was used to build OTT 2.10.
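
The advice above about redundant rows in synonyms.tsv can be followed with a small filter that keeps the first row for each id and name and drops later repeats. This is a sketch under assumptions, not code from the OTT build. It assumes that fields are separated by a tab, a vertical bar, and a tab, with a trailing separator on each row, and that the name is in the first column and the id in the second. Confirm both against the header row of the file, and pass different column indexes if they differ. The names split_row and dedupe_synonyms are hypothetical.

```python
from typing import Iterable, Iterator, List


def split_row(line: str) -> List[str]:
    """Split one row on the assumed "\t|\t" separator.

    Trailing whitespace and a trailing "\t|" terminator are removed first,
    so a row ending in "\t|\t\n" does not yield a spurious empty field.
    """
    line = line.rstrip()
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def dedupe_synonyms(
    lines: Iterable[str], name_col: int = 0, id_col: int = 1
) -> Iterator[List[str]]:
    """Yield rows, skipping any whose (id, name) pair was already seen.

    The column positions are assumptions. A header row, if present, passes
    through unchanged because its (id, name) pair occurs only once.
    """
    seen = set()
    for line in lines:
        fields = split_row(line)
        if len(fields) <= max(name_col, id_col):
            continue  # skip blank or malformed rows
        key = (fields[id_col], fields[name_col])
        if key in seen:
            continue
        seen.add(key)
        yield fields


if __name__ == "__main__":
    # Path is an assumption: the file as unpacked from the download.
    with open("synonyms.tsv", encoding="utf-8") as handle:
        kept = sum(1 for _ in dedupe_synonyms(handle))
    print(kept, "rows kept after removing repeats")
```

The filter streams the file row by row, but the set of seen pairs grows with the number of distinct synonyms, so memory use is proportional to the size of the deduplicated output.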