I use KEGG a lot to understand microbial metabolism. KEGG is one of the largest resources of enzymes, biochemical reactions, genes, and molecules, all cross-linked and organized into what’s called metabolic maps. These maps are well-constructed images of enzymes that functions together for the same overall purpose like amino acid synthesis, or the metabolism of glucose. One of the great things about the website is the ability to color on your data to their metabolic maps. Doing this in bulk though can be very tedious as you need to view and download individual maps. Luckily, there is a great tutorial on how to dynamically color KEGG maps using biopython that I’ve used in the past to create PDF images.
While this works well, the current biopython graphics module is based on reportlab, which is a more niche graphics system than matplotlib. This means that the images made in biopython can’t easily be combined with other plots generated with matplotlib and the biopython implementation of drawing KEGG maps is restricted to producing PDF documents.
I was recently working on a project where I needed to combine the
KEGG maps with other types of plots, generated via matplotlib and
didn’t want to do it manually in a separate program. Instead I
looked into converting the KEGG_vis
module of biopython from
reportlab to matplotlib. Thankfully, the code is quite short and
self-contained, thus making the conversion process easy.
Visualizing KEGG data in biopython is driven by KGML, an XML markup
which describes the placement of objects in a pathway map. Biopython
parses this file and draws on the graphics elements based on
this information. Converting the drawing code was pretty
straightforward as many of the concepts between reportlab and
matplotlib are the same. The bulk of the drawing code happens in
the __add_graphics
method, which is responsible for adding in the
lines, circles, and rectangles described in KGML. There is almost
a 1:1 mapping between the reportlab constructs and the equivalent
matplotlib Patch API. For example drawing a line path in the
original reportlab version looked like
p = self.drawing.beginPath()
x, y = graphics.coords[0]
p.moveTo(x, y)
for (x, y) in graphics.coords:
p.lineTo(x, y)
self.drawing.drawPath(p)
self.drawing.setLineWidth(1) # Return to default
Which starts a line at an xy-coordinate and then iterates through all of the remaining point in the path
using the p.lineTo
method. The translation to matplotlib results in very similar code
x, y = graphics.coords[0]
verts = [
(x, y), # left, bottom
]
codes = [
Path.MOVETO,
]
for (x, y) in graphics.coords:
codes.append(Path.LINETO)
verts.append((x,y))
path = Path(verts, codes)
patch = patches.PathPatch(path)
self.ax.add_patch(patch)
The only major difference was that reportlab seems to use a global
state to keep track of things like the line color and weight;
throughout the original code there are calls to set the font, color,
and line and then return to the original state after certain calls
to draw a graphics object have been made. Alternatively, in matplotlib,
these modifications are passed in directly when creating a new patch
object. Internally I made this change to the API by adding a
**kwargs
argument to the __add_graphics
method.
# The following code snippet demonstrates the change from reportlab
# which used calls like setStrokeColor to globally change the state
# and the new interface which passes these arguments into __add_graphics
# as kwargs, which will be applied to the matplotlib patches.
for ortholog in self.pathway.orthologs:
for g in ortholog.graphics:
#self.drawing.setStrokeColor(color_to_reportlab(g.fgcolor))
#self.drawing.setFillColor(color_to_reportlab(g.bgcolor))
self.__add_graphics(g, fc=g.bgcolor, ec=g.fgcolor)
This new version of the KEGG_vis
library lives on my fork of biopython for now.