bio(logist | informatician)

Yet another blog about bioinformatics

about me

Hi, my name is Connor Skennerton and I’m a postdoctoral researcher at Caltech in the field of metagenomics. I received my PhD at UQ in the Australian Centre for Ecogenomics. I started out as a wet-lab guy, but over the course of my PhD I was sucked into the world of bioinformatics and I’m now pretty firmly a computer guy. I love looking at genomes and developing hypotheses on ecology and physiology based on genomic data. Luckily I’m still allowed back into the lab sometimes, which helps keep me sane in my new world of Perl, Python and C++.

Genome Bin Decontamination

posted in bioinformatics

Genome bins coming off automated pipelines can be contaminated with parts of other genomes. As part of my workflow I use CheckM (I’m biased since I’m a coauthor) to assess the contamination of genome bins using single-copy marker genes. If you’re lucky, the genome bins that you’re interested in will be relatively complete without much contamination. Unfortunately that isn’t always the case. In this blog post I’m going to run through some of the analyses that I did on a genome bin that was 90% complete but 70% contaminated. This is an exploratory analysis to see if I can manually improve the bin beyond what the automated tools can do.
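To make the "90% complete but 70% contaminated" numbers a bit more concrete, here is a rough sketch of how single-copy marker genes can be turned into those two percentages. This is a deliberately simplified illustration and not CheckM's actual calculation (CheckM groups markers into collocated sets before scoring); the function name `estimate_bin_quality` and the marker names are made up for the example.

```python
from collections import Counter


def estimate_bin_quality(marker_hits, expected_markers):
    """Return (completeness, contamination) as percentages.

    Simplified model (an assumption for illustration only):
    completeness is the fraction of expected markers seen at least once,
    contamination is the number of extra copies relative to the marker count.
    """
    counts = Counter(m for m in marker_hits if m in expected_markers)
    n_markers = len(expected_markers)
    found = sum(1 for m in expected_markers if counts[m] >= 1)
    extra_copies = sum(counts[m] - 1 for m in expected_markers if counts[m] > 1)
    return 100.0 * found / n_markers, 100.0 * extra_copies / n_markers


# Hypothetical bin: 10 expected markers, 9 of them present, 7 of those seen twice.
expected = {f"marker{i}" for i in range(10)}
hits = [f"marker{i}" for i in range(9)] + [f"marker{i}" for i in range(7)]

completeness, contamination = estimate_bin_quality(hits, expected)
print(f"completeness={completeness:.1f}% contamination={contamination:.1f}%")
```

A single-copy marker that shows up twice suggests a second genome is mixed into the bin, which is why a bin can be nearly complete and heavily contaminated at the same time.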

Testing Out Seqan’s Multipattern Search Implementations

posted in benchmarks, seqan

I recently discovered Seqan, a header-only C++ library for bioinformatics. I’ve been playing around with the toolkit to make some small programs just to see whether I want to use it in a larger project. So far I’ve written prepmate, an adaptor trimming program for Illumina’s Nextera mate-pair libraries; and fxtract, a grep-like program for extracting fasta/fastq records from large files. One thing that fxtract and another program I’ve written, crass, both need to do is search for multiple patterns simultaneously (in this case a number of different DNA motifs). Seqan implements a number of algorithms for multipattern matching (check out their tutorial page), but the documentation doesn’t give many clues as to why you would choose one algorithm over another. So I decided to take a few of these implementations out for a spin using fxtract.
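For readers who haven’t met multipattern search before, here is a minimal sketch of one classic approach, the Aho-Corasick automaton, written in plain Python. It is only meant to show the idea of scanning the text once for all motifs at the same time; it is not Seqan’s implementation or API, and not the code used in fxtract. The function names `build_automaton` and `find_all` are made up for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(pat)
    # Breadth-first pass: each node's failure link points to the longest
    # proper suffix of its path that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the suffix state as well.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every match, in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


motifs = ["ACG", "CGT", "GG"]
hits = find_all("ACGTACGGT", motifs)
print(hits)
```

The point of an automaton like this is that the text is read once no matter how many motifs there are, whereas running a single-pattern search per motif rereads the text each time. Which approach wins in practice depends on the number and length of the patterns, which is exactly the kind of thing a benchmark can answer.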