Crisprtools by ctSkennerton

What's it for?

Crisprtools was developed to parse the crispr file format, an XML markup for describing Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs). Crisprtools is written in C++ and uses libcrispr for all the heavy lifting.

What do you get?

Crisprtools is implemented as a single Unix command-line executable with a number of subcommands, each performing one type of query or manipulation on a crispr file. Currently the subcommands are:

stat
extract
filter
rm
merge
draw
sanitise

Installation

Crisprtools is built using the GNU build system (autoconf), so on most Unix systems you should be able to type the following commands:

[./autogen.sh]
./configure
make

Note that you must have libcrispr installed on your system first.

Authors and Contributors

Crisprtools was primarily written by Connor Skennerton (@ctSkennerton) with contributions from Mike Imelfort (@minillinim).

Found a bug?

Log an issue on GitHub and I'll get right on it.

Usage

 crisprtools <command> [<args>] <file.crispr>

Commands and Options:

 merge [-hso OUTFILE] file1.crispr file2.crispr [1,n]
      take two or more .crispr files and merge them together

      -h       Output help message
      -s       Sanitise the group names in the resulting output file
           so that all groups have consecutive identifiers, and
           that there are no clashes between group numbers
      -o OUTFILE
           Specify an output file for the merged .crispr file
           [default: crisprtools_merged.crispr ]
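
As a rough sketch of a typical merge, the following combines two files and sanitises the group names in the result. The file names are hypothetical, and the guard simply skips the call when crisprtools or the inputs are not available.

```shell
# Hypothetical file names; substitute your own .crispr files.
first="sample_a.crispr"
second="sample_b.crispr"
merged="combined.crispr"

# Run only when the tool and both inputs are present.
if command -v crisprtools >/dev/null 2>&1 && [ -f "$first" ] && [ -f "$second" ]; then
    # -s sanitises group names so IDs from the two inputs cannot clash;
    # -o overrides the default output name crisprtools_merged.crispr.
    crisprtools merge -s -o "$merged" "$first" "$second"
fi
```

Leaving out -o should write to crisprtools_merged.crispr in the current directory, as stated in the option list above.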

 sanitise [-ohcsdf] file.crispr
      change names and accession numbers of groups, spacers and
      flankers

      -h       Output help message
      -o OUTFILE
           Output file name, creates a sanitised copy of the input
           file  [default: sanitise input file in place]
      -s       Sanitise the spacers
      -c       Sanitise the contigs
      -d       Sanitise the direct repeats
      -f       Sanitise the flanking sequences
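
For instance, to sanitise only the spacers and direct repeats while keeping the original file untouched, a call might look like the sketch below. The file names are assumptions chosen for illustration.

```shell
# Hypothetical file names.
input="sample.crispr"
clean="sample.sanitised.crispr"

if command -v crisprtools >/dev/null 2>&1 && [ -f "$input" ]; then
    # -s and -d select spacers and direct repeats;
    # -o writes a sanitised copy instead of changing the input in place.
    crisprtools sanitise -s -d -o "$clean" "$input"
fi
```

Without -o the input file itself is rewritten, so working on a copy is the safer choice while experimenting.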

 extract [-ghyxsdfCoO] file.crispr
      get data out of a .crispr file

      -h       print this handy help message
      -o DIR   Output file directory  [default: .]
      -O STRING
           Give a custom prefix to each of the output files
           [default: ]
      -g INT[,n]
           A comma separated list of group IDs that you would like
           to extract data from.  Note that only the group number
           is needed, do not use prefixes like 'Group' or 'G',
           which are sometimes used in file names or in a .crispr
           file
      -s       Extract the spacers of the listed group
      -d       Extract the direct repeats of the listed group
      -f       Extract the flanking sequences of the listed group
      -C       Suppress coverage information when printing spacers
      -x       Split the results into different files for each group.
           If multiple types are set, e.g. -sd, then both the
           spacers and direct repeats from each group will be in
           the one file
      -y       Split the results into different files for each type of
           sequence from all selected groups.  Only has an effect
           if multiple types are set.
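
The options above can be combined as in the following sketch, which pulls spacers and direct repeats for two groups into one file per group. The input name, output directory, group numbers and prefix are all hypothetical, and the exact names of the files produced are not specified here.

```shell
# Hypothetical names and group IDs.
input="sample.crispr"
outdir="crispr_out"
groups="1,4"        # bare numbers only, no 'G' or 'Group' prefix

# Created up front in case crisprtools does not create it itself.
mkdir -p "$outdir"

if command -v crisprtools >/dev/null 2>&1 && [ -f "$input" ]; then
    # -s and -d pick spacers and direct repeats; -x writes one file per group;
    # -O prepends a custom prefix to every output file.
    crisprtools extract -g "$groups" -s -d -x -o "$outdir" -O "run1_" "$input"
fi
```

Swapping -x for -y should instead give one file per sequence type across the selected groups.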

 stat [-aghpst] [--header] file.crispr
      get some statistics of the CRISPRs described

      -a       print out aggregate summary, can be combined with -t -p
      -h       print this handy help message
      -H       print out column headers in tabular output
      -g INT[,n]
           a comma separated list of group IDs that you would like
           to see stats for.
      -p       pretty print
      -s       separator string for tabular output [default: '']
      -t       tabular output
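
Two common ways of calling stat are sketched below: a pretty-printed report for selected groups, and an aggregate summary in tabular form. The file name and group numbers are assumptions.

```shell
# Hypothetical input and group IDs.
input="sample.crispr"
groups="2,5"

if command -v crisprtools >/dev/null 2>&1 && [ -f "$input" ]; then
    # Pretty-printed statistics for groups 2 and 5 only.
    crisprtools stat -p -g "$groups" "$input"

    # Aggregate summary over the whole file, in tabular form.
    crisprtools stat -a -t "$input"
fi
```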

 rm [-ho] -g <groups> file.crispr
      remove a group

      -h       Output help message
      -g INT[,n]
           A comma separated list of group IDs that you would like
           to remove
      -o OUTFILE
           Output file name. Default behaviour changes the file
           in place
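
A minimal sketch of removing two groups while keeping the original file intact follows. The file names and group numbers are hypothetical.

```shell
# Hypothetical names and group IDs.
input="sample.crispr"
trimmed="sample.trimmed.crispr"
groups="3,7"

if command -v crisprtools >/dev/null 2>&1 && [ -f "$input" ]; then
    # -g lists the groups to drop; -o writes the result to a new file
    # rather than modifying the input in place.
    crisprtools rm -g "$groups" -o "$trimmed" "$input"
fi
```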

 draw [-ghyoaf] file.crispr
      render a graphviz image of some or all of the CRISPRs described
      in the file

      -h       print this handy help message
      -o DIR   output file directory  [default: .]
      -g INT[,n]
           A comma separated list of group IDs that you would like
           to draw.  Note that only the group number is needed, do
           not use prefixes like 'Group' or 'G', which are
           sometimes used in file names or in a .crispr file
      -a STRING
           The Graphviz layout algorithm to use [default: dot ]
      -f STRING
           The output format for the image, equivalent to the -T
           parameter of Graphviz executables [default: eps]
      -c COLOUR
           The colour scale to use for coverage information.  The
           available choices are:
               red-blue
               blue-red
               red-blue-green
               green-blue-red
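
As an illustration, the sketch below renders one group as a PNG with the red-blue colour scale. The file and directory names are hypothetical, png is assumed to be acceptable here because it is a standard Graphviz -T format, and a working Graphviz installation is presumably required. Note that -c appears in the option list but not in the synopsis line, so check `crisprtools draw -h` if it is rejected.

```shell
# Hypothetical names and group ID.
input="sample.crispr"
figdir="crispr_figures"
group="1"

# Created up front in case crisprtools does not create it itself.
mkdir -p "$figdir"

if command -v crisprtools >/dev/null 2>&1 && [ -f "$input" ]; then
    # -f sets the image format (default eps); -c picks the coverage colour scale;
    # the layout algorithm is left at its default, dot.
    crisprtools draw -g "$group" -f png -c red-blue -o "$figdir" "$input"
fi
```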

 filter [-ohsdf] file.crispr
      remove groups based on criteria

      -h       Print this handy help message
      -o FILE  Output file name, creates a filtered copy of the input
           file  [default: modify input file inplace]
      -s INT   Filter based on the number of spacers
      -d INT   Filter based on the direct repeats
      -f INT   Filter based on the flanking sequences
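
A hedged sketch of filtering on spacer count is shown below. The file names and the threshold value are hypothetical, and the option list does not say whether the number acts as a minimum or a maximum, so confirm the behaviour with `crisprtools filter -h` before relying on it.

```shell
# Hypothetical names and threshold.
input="sample.crispr"
filtered="sample.filtered.crispr"
spacer_threshold=5

if command -v crisprtools >/dev/null 2>&1 && [ -f "$input" ]; then
    # -s filters groups on their number of spacers;
    # -o writes a filtered copy instead of modifying the input in place.
    crisprtools filter -s "$spacer_threshold" -o "$filtered" "$input"
fi
```

Writing to a separate file with -o makes it easy to compare the result against the original, for example with `crisprtools stat`.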