Dynamic types in Python

I’m constantly learning new things about the Python language. I consider myself a pretty good Python programmer, but you rarely need every language feature when writing your own code. For example, one thing I’ve not used is the class factory pattern built on the type built-in function. I’ve been aware of class factories and read a few blog posts, but never grokked them until now. A class factory is a function or another class that creates classes at runtime, rather than you writing out the class definition in code.
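As a rough illustration of the idea, here is a minimal sketch of a class factory using the three-argument form of type. The names make_record_class, Point, and the field list are made up for this example and are not taken from the post itself.

```python
def make_record_class(name, field_names):
    """Build a simple class at runtime that stores the given fields."""

    def __init__(self, **kwargs):
        # Fields that are not passed in default to None.
        for field in field_names:
            setattr(self, field, kwargs.get(field))

    def __repr__(self):
        values = ", ".join(f"{f}={getattr(self, f)!r}" for f in field_names)
        return f"{name}({values})"

    # type(name, bases, namespace) returns a brand new class object.
    return type(name, (object,), {"__init__": __init__, "__repr__": __repr__})


# Hypothetical usage: no "class Point:" statement appears anywhere.
Point = make_record_class("Point", ["x", "y"])
p = Point(x=1, y=2)
print(p)  # Point(x=1, y=2)
```

The factory is an ordinary function, so the class name and its fields can come from data that is only known at runtime.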

Open Source Tax Software

Filing taxes in America sucks. Your options are to do it by hand, pay someone like Intuit, or, if you are below a certain income threshold, get some tax software for free. The kicker is that the free tax software comes from Intuit, who will try very hard either to make sure you don’t find it in the first place, or to get you to pay for it and upsell you on something that should be free.

Using Strings in Pharo Smalltalk

Smalltalk syntax can be a little confusing coming from other languages. Here I’ll show some comparisons between Python string operations and Smalltalk. Substrings / Slicing: Python strings use the slice notation where you can place up to three colon-separated values for the start, stop, and step. Python strings are 0-indexed and the stop argument is one past the final element that you want.

    s = 'abcdefg'
    s[1:]    # bcdefg
    s[:2]    # ab
    s[1:6]   # bcdef
    s[1:-1]  # bcdef

The slice notation in Python is compact and versatile enough to cover getting the beginning, middle, or end of a string.
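The third slice value, the step, can be seen in a short sketch like the one below. The variable names are only for illustration.

```python
s = 'abcdefg'

# A step of 2 takes every other character, starting from index 0.
every_other = s[::2]   # 'aceg'

# A negative step walks the string backwards.
backwards = s[::-1]    # 'gfedcba'

print(every_other, backwards)
```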

Using Dictionaries in Pharo Smalltalk

Starting out with Smalltalk can be a little jarring, as its syntax isn’t similar to that of languages that are more heavily inspired by C. Dictionaries are one kind of data structure where I noticed this the most, so I put together my notes on using them in Pharo with some comparisons to Python. In many other languages there is a subscript operator that allows you to access a value in a dictionary (and also a position in an array).
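For reference, the Python side of that comparison looks roughly like this. The dictionary contents and variable names are invented for the example.

```python
ages = {"alice": 31, "bob": 27}

# The subscript operator reads a value by key...
alice_age = ages["alice"]

# ...and the same operator on the left-hand side adds or replaces an entry.
ages["carol"] = 45

# Subscripting a missing key raises KeyError; get() returns a default instead.
dave_age = ages.get("dave", 0)

# The same square brackets index a position in a list.
letters = ["a", "b", "c"]
first = letters[0]

print(alice_age, dave_age, first)
```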

Combining properties from different nodes in Gremlin

One of the key differences between SQL databases and graph databases is the concept of joining information from different nodes. In a TinkerPop-enabled graph database, nodes have labels that define their type and properties that are part of that type. It’s natural to draw the comparison to a label being an SQL table and the properties of the nodes being the columns of that table. But trying to extend that analogy to joins in SQL gets a little murky.

Exploratory data analysis with Pharo Smalltalk

The first time I heard about Smalltalk was reading through the Wikipedia page for Ruby, which mentioned it as an influence. At the time I was just a few months into my transition from a wet-lab biologist into a bioinformatician and trying to decide between Perl, Python, and Ruby as a scripting language to learn. Python became my language of choice after a long battle with Perl (this was some years ago, when Perl was much more relevant).

A Make for URIs

Make has been one of the key tools in my arsenal for getting things done. Although it was developed for compiling code, its functionality can be generalized to any process that requires files to be generated based on dependencies. I recommend you look at these slides by Vince Buffalo as a good introduction to using make for scientific workflows. Make works by creating a dependency graph of files and their prerequisites, using each file’s last-modified time to determine whether it needs to be remade.
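The timestamp rule can be sketched in a few lines of Python. This is a simplified illustration of the idea, not how make itself is implemented: needs_rebuild is a made-up helper, and it assumes every prerequisite already exists on disk.

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Any prerequisite modified after the target makes the target stale.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


# Hypothetical usage with made-up file names:
#   if needs_rebuild("counts.tsv", ["reads.fastq", "count_reads.py"]):
#       ...regenerate counts.tsv...
```

Real make also walks the graph recursively, so a stale prerequisite is rebuilt first before the target is checked.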

Drawing KEGG pathway maps using biopython and matplotlib

I use KEGG a lot to understand microbial metabolism. KEGG is one of the largest resources of enzymes, biochemical reactions, genes, and molecules, all cross-linked and organized into what are called metabolic maps. These maps are well-constructed images of enzymes that function together for the same overall purpose, like amino acid synthesis or the metabolism of glucose. One of the great things about the website is the ability to color their metabolic maps with your own data.

Checking back in on CRISPRs

As part of my PhD thesis I studied an emerging field of bacterial adaptive immunity, known as CRISPR. At the time I was interested in tracking this type of immune system in bacterial communities to follow co-evolution between bacteria and their viruses. For anyone interested and brave enough, here is a link to my thesis. After I finished up writing, submitting, and ultimately obtaining my PhD, I realised that large portions of the literature review section would never be published in a scientific journal.

Using Amazon Neptune full text search

I’ve been trying out Amazon Neptune’s full text search feature. Overall it’s been a great experience, although there are a few caveats when searching that mean you’ll have to craft your queries carefully to make full use of the feature. The TinkerPop standard has some text searching features; however, it lacks advanced features such as searching using regular expressions or even case-insensitive searching. It’s left to different implementations to augment this text searching capability.