ete3 has support for phyloxml, a format I use with the Archaeopteryx tree viewer for a lot of my day-to-day phylogenetics visualisation. My main reason for using phyloxml is convenience: I have a script that adds the proper organism name onto the tree, and I think Archaeopteryx is a really good basic tree viewer. I wanted to draw a tree from phyloxml in ete3 using my own style and have the proper organism name rendered. In my phyloxml file the organism name is stored as the scientific name for each leaf (see the phyloxml snippet below), so all I needed to do was make this the node name when rendering the tree.
<clade>
  <name>IMG_2526164742</name>
  <branch_length>0.19955</branch_length>
  <taxonomy>
    <scientific_name>Desulfobacterium anilini DSM 4660</scientific_name>
  </taxonomy>
</clade>
Easy, right? Wrong. I found that the interface for phyloxml was not the same as for newick formatted trees, and unfortunately the documentation for phyloxml in ete3 is a bit lacking: there wasn’t a complete listing of methods for each class. After much messing around, looking at the source code of ete3 and examining python objects using the builtin dir function, I was able to get what I wanted. It turns out that for each node/leaf I needed to access the phyloxml_clade attribute, which has an attribute taxonomy, which implements an iterable interface (I think it’s probably a list). I could then take its first element to access the scientific name and make that the name of the leaf for printing. It’s a little convoluted but easy when you know how.
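The kind of poking around with dir described here can be wrapped in a small helper. This is only a sketch: public_names and FakeClade are names made up for illustration, and FakeClade is a hypothetical stand-in so the example runs without ete3 or a phyloxml file.

```python
def public_names(obj):
    """Return the sorted public attribute and method names of obj."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


class FakeClade:
    """Hypothetical stand-in for the object stored in node.phyloxml_clade."""

    def __init__(self):
        self.taxonomy = []

    def get_taxonomy(self):
        return self.taxonomy


# List what the object exposes, skipping private and dunder names.
print(public_names(FakeClade()))
```

On a real leaf, the same helper could be pointed at node.phyloxml_clade, and then at one element of its taxonomy, to see which getters are available.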
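from ete3 import Phyloxml

project = Phyloxml()
# load the phyloxml file ("my_tree.xml" is a placeholder path)
project.build_from_file("my_tree.xml")

# iterate through the trees in the phyloxml file
for tree in project.get_phylogeny():
    # iterating over a tree yields its leaves
    for node in tree:
        # assign the leaf name from the scientific name in the phyloxml file
        node.name = node.phyloxml_clade.taxonomy[0].get_scientific_name()
    # open the interactive viewer with the renamed leaves
    tree.show()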
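One caveat with indexing taxonomy[0] directly: a leaf with no taxonomy entry would presumably raise an IndexError. Below is a minimal sketch of a more forgiving lookup, assuming taxonomy behaves like a list that may be empty. The helper name scientific_name_or_default is my own, and the SimpleNamespace objects are hypothetical stand-ins for ete3 leaves so that the example runs on its own.

```python
from types import SimpleNamespace


def scientific_name_or_default(node):
    """Return the first scientific name recorded for node, else node.name."""
    clade = getattr(node, "phyloxml_clade", None)
    # Assumption: taxonomy behaves like a list and may be empty or missing.
    taxonomy = getattr(clade, "taxonomy", None) or []
    for entry in taxonomy:
        name = entry.get_scientific_name()
        if name:
            return name
    # Nothing usable found, so keep whatever name the leaf already has.
    return node.name


# Hypothetical stand-ins for two leaves: one annotated, one without taxonomy.
annotated = SimpleNamespace(
    name="IMG_2526164742",
    phyloxml_clade=SimpleNamespace(
        taxonomy=[
            SimpleNamespace(
                get_scientific_name=lambda: "Desulfobacterium anilini DSM 4660"
            )
        ]
    ),
)
bare = SimpleNamespace(
    name="unlabelled_leaf",
    phyloxml_clade=SimpleNamespace(taxonomy=[]),
)

for leaf in (annotated, bare):
    leaf.name = scientific_name_or_default(leaf)
    print(leaf.name)
```

In the loop over leaves, the assignment would then become node.name = scientific_name_or_default(node), so leaves without a scientific name keep their original identifier instead of stopping the script.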