small utilities for manipulating .crispr files
This project is maintained by ctSkennerton
Crisprtools was developed to parse the crispr file format, an XML markup for describing Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs). Crisprtools is written in C++ and uses libcrispr for all the heavy lifting.
Crisprtools is implemented as a single Unix command-line executable that contains a number of subcommands, each of which performs one type of query or manipulation on a crispr file. Currently the subcommands are: merge, sanitise, extract, stat, rm, draw and filter.
Crisprtools is built using the GNU build system (autoconf), so on most Unix systems you should be able to type the following commands:
[./autogen.sh]
./configure
make
Note that you must have libcrispr installed on your system first.
Crisprtools was primarily written by Connor Skennerton (@ctSkennerton) with contributions from Mike Imelfort (@minillinim)
Log an issue on GitHub and I'll get right on it.
crisprtools <command> [<args>] <file.crispr>
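The single-executable design, where the first argument names the subcommand, can be sketched as a dispatch table. This is a minimal hypothetical illustration in C++ (the project's language), not code taken from crisprtools or libcrispr; the `Handler` type and the `dispatch` function are invented for the example.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical handler type: receives the arguments after the subcommand name.
using Handler = std::function<int(const std::vector<std::string>&)>;

// Hypothetical sketch: look up the subcommand (argv[1]) in a table and pass
// the remaining options and input files to its handler.
int dispatch(const std::vector<std::string>& argv,
             const std::map<std::string, Handler>& commands) {
    if (argv.size() < 2) {
        std::cerr << "usage: crisprtools <command> [<args>] <file.crispr>\n";
        return 1;
    }
    auto it = commands.find(argv[1]);
    if (it == commands.end()) {
        std::cerr << "unknown command: " << argv[1] << '\n';
        return 1;
    }
    // Everything after the subcommand name goes to the handler.
    std::vector<std::string> rest(argv.begin() + 2, argv.end());
    return it->second(rest);
}
```

A table keyed by name keeps adding a subcommand to a one-line change, which suits a tool that bundles several small utilities.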
merge [-hso OUTFILE] file1.crispr file2.crispr [1,n]
take two or more .crispr files and merge them together
-h Output help message
-s Sanitise the group names in the resulting output file
so that all groups have consecutive identifiers, and
that there are no clashes between group numbers
-o OUTFILE
Specify an output file for the merged .crispr file
[default: crisprtools_merged.crispr ]
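One way to meet the -s guarantee (consecutive identifiers, no clashes between group numbers from different input files) is to renumber every (file, group) pair in input order. The sketch below is a hypothetical illustration of that idea, not the renumbering scheme crisprtools actually uses; `renumberGroups` and the assumption that numbering starts at 1 are invented for the example.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch: map (input file index, original group ID) to a new,
// consecutive group ID so groups from different files never clash.
std::map<std::pair<std::size_t, int>, int>
renumberGroups(const std::vector<std::vector<int>>& groupsPerFile) {
    std::map<std::pair<std::size_t, int>, int> newIds;
    int next = 1;  // assumption: merged numbering starts at 1
    for (std::size_t file = 0; file < groupsPerFile.size(); ++file) {
        for (int oldId : groupsPerFile[file]) {
            auto key = std::make_pair(file, oldId);
            // Assign a new ID only the first time this (file, group) pair is seen.
            if (newIds.find(key) == newIds.end()) {
                newIds[key] = next++;
            }
        }
    }
    return newIds;
}
```

For two files that both contain a group 1, the groups end up with distinct IDs in the merged output.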
sanitise [-ohcsdf] file.crispr
change names and accession numbers of groups, spacers and
flankers
-h Output help message
-o OUTFILE
Output file name, creates a sanitised copy of the input
file [default: sanitise input file inplace]
-s Sanitise the spacers
-c Sanitise the contigs
-d Sanitise the direct repeats
-f Sanitise the flanking sequences
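Sanitising names can be pictured as replacing arbitrary identifiers with short consecutive ones, where repeated occurrences of an original name keep the same new name. This is a hypothetical sketch only: the `sanitiseNames` function and the "SP" prefix used with it are assumptions, not the naming scheme crisprtools writes.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: rename spacers (or contigs, repeats, flankers) to
// prefix1, prefix2, ... in order of first appearance. The same original name
// always maps to the same new name.
std::vector<std::string> sanitiseNames(const std::vector<std::string>& names,
                                       const std::string& prefix) {
    std::unordered_map<std::string, std::string> renamed;
    std::vector<std::string> out;
    out.reserve(names.size());
    int next = 1;
    for (const auto& name : names) {
        auto it = renamed.find(name);
        if (it == renamed.end()) {
            it = renamed.emplace(name, prefix + std::to_string(next++)).first;
        }
        out.push_back(it->second);
    }
    return out;
}
```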
extract [-ghyxsdfCoO] file.crispr
get data out of a .crispr file
-h print this handy help message
-o DIR Output file directory [default: .]
-O STRING
Give a custom prefix to each of the output files
[default: ]
-g INT[,n]
A comma separated list of group IDs that you would like
to extract data from. Note that only the group number
is needed, do not use prefixes like 'Group' or 'G',
which are sometimes used in file names or in a .crispr
file
-s Extract the spacers of the listed group
-d Extract the direct repeats of the listed group
-f Extract the flanking sequences of the listed group
-C Suppress coverage information when printing spacers
-x Split the results into different files for each group.
If multiple types are set, e.g. -sd, then both the
spacers and direct repeats from each group will be in
the same file
-y Split the results into different files for each type of
sequence from all selected groups. Only has an effect
if multiple types are set.
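The -g INT[,n] argument (also used by stat, rm and draw) is a comma separated list of bare group numbers. The sketch below shows one way to parse it and reject prefixed IDs such as "G1", following the note in the help text. `parseGroupList` is a hypothetical name, and this is not the parser crisprtools ships.

```cpp
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: parse "1,5,12" into {1, 5, 12}. Tokens that are not
// bare numbers (e.g. "G1" or "Group1") raise std::invalid_argument.
std::vector<int> parseGroupList(const std::string& arg) {
    std::vector<int> ids;
    std::stringstream ss(arg);
    std::string token;
    while (std::getline(ss, token, ',')) {
        if (token.empty()) {
            throw std::invalid_argument("empty group ID in list");
        }
        for (unsigned char c : token) {
            if (!std::isdigit(c)) {
                throw std::invalid_argument("group IDs must be bare numbers: " + token);
            }
        }
        ids.push_back(std::stoi(token));
    }
    return ids;
}
```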
stat [-aghpst] [--header] file.crispr
get some statistics of the CRISPRs described
-a print out aggregate summary, can be combined with -t -p
-h print this handy help message
-H print out column headers in tabular output
-g INT[,n]
a comma separated list of group IDs that you would like
to see stats for.
-p pretty print
-s separator string for tabular output [default: '']
-t tabular output
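Tabular output can be thought of as joining each row's fields with the -s separator, with an optional header row first (-H). The sketch is hypothetical: `formatTable`, `joinFields` and any column names passed to them are illustrative, not the columns crisprtools prints.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Join one row's fields with the separator string.
std::string joinFields(const std::vector<std::string>& fields, const std::string& sep) {
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) line += sep;
        line += fields[i];
    }
    return line;
}

// Hypothetical sketch: emit an optional header row followed by the data rows,
// one line per row.
std::string formatTable(const std::vector<std::string>& header,
                        const std::vector<std::vector<std::string>>& rows,
                        const std::string& sep, bool printHeader) {
    std::string out;
    if (printHeader) out += joinFields(header, sep) + "\n";
    for (const auto& row : rows) out += joinFields(row, sep) + "\n";
    return out;
}
```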
rm [-ho] -g <groups> file.crispr
remove a group
-h Output help message
-g INT[,n]
A comma separated list of group IDs that you would like
to remove
-o OUTFILE
Output file name. Default behaviour changes file
inplace
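Removing groups by ID amounts to dropping every group listed in -g while keeping the rest in their original order. The following is a minimal hypothetical sketch that models a file as a plain list of group IDs; `removeGroups` is an invented name, not part of libcrispr.

```cpp
#include <set>
#include <vector>

// Hypothetical sketch of `rm -g`: keep every group ID not in toRemove,
// preserving the original order.
std::vector<int> removeGroups(const std::vector<int>& groupIds,
                              const std::vector<int>& toRemove) {
    const std::set<int> doomed(toRemove.begin(), toRemove.end());
    std::vector<int> kept;
    for (int id : groupIds) {
        if (doomed.count(id) == 0) kept.push_back(id);
    }
    return kept;
}
```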
draw [-ghyoaf] file.crispr
render a graphviz image of some or all of the CRISPRs described
in the file
-h print this handy help message
-o DIR output file directory [default: .]
-g INT[,n]
A comma separated list of group IDs that you would like
to render. Note that only the group number
is needed, do not use prefixes like 'Group' or 'G',
which are sometimes used in file names or in a .crispr
file
-a STRING
The Graphviz layout algorithm to use [default: dot ]
-f STRING
The output format for the image, equivalent to the -T
parameter of Graphviz executables [default: eps]
-c COLOUR
The colour scale to use for coverage information. The
available choices are:
red-blue
blue-red
red-blue-green
green-blue-red
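A coverage colour scale such as red-blue can be sketched as linear interpolation between two colours, written as a "#rrggbb" string (a colour format Graphviz accepts). This is a hypothetical illustration: the `redBlueColour` function, the use of min/max normalisation and the clamping are assumptions, not how crisprtools computes its colours.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical sketch of a "red-blue" scale: the lowest coverage maps to red,
// the highest to blue, with linear interpolation in between.
std::string redBlueColour(double coverage, double minCov, double maxCov) {
    double t = 0.0;
    if (maxCov > minCov) {
        t = (coverage - minCov) / (maxCov - minCov);
    }
    // Clamp so out-of-range values still give a valid colour.
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    unsigned red = static_cast<unsigned>(std::lround(255.0 * (1.0 - t)));
    unsigned blue = static_cast<unsigned>(std::lround(255.0 * t));
    char buf[8];  // "#rrggbb" plus the terminating NUL
    std::snprintf(buf, sizeof buf, "#%02x00%02x", red, blue);
    return std::string(buf);
}
```

The blue-red scale would be the same interpolation with the end points swapped.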
filter [-ohsdf] file.crispr
remove groups based on criteria
-h Print this handy help message
-o FILE Output file name, creates a filtered copy of the input
file [default: modify input file inplace]
-s INT Filter based on the number of spacers
-d INT Filter based on the number of direct repeats
-f INT Filter based on the number of flanking sequences
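Filtering on a count can be sketched as keeping only the groups that meet a threshold. The help text does not say whether INT is a minimum or a maximum, so this hypothetical sketch assumes `-s INT` keeps groups with at least INT spacers. The `GroupCounts` struct and `filterBySpacers` function are invented for illustration, and the real .crispr data model is richer.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical per-group counts used only for this illustration.
struct GroupCounts {
    int id;
    int spacers;
    int directRepeats;
    int flankers;
};

// Sketch of `filter -s INT`, assuming groups with fewer than minSpacers
// spacers are removed and the rest keep their order.
std::vector<GroupCounts> filterBySpacers(const std::vector<GroupCounts>& groups,
                                         int minSpacers) {
    std::vector<GroupCounts> kept;
    std::copy_if(groups.begin(), groups.end(), std::back_inserter(kept),
                 [minSpacers](const GroupCounts& g) { return g.spacers >= minSpacers; });
    return kept;
}
```

The -d and -f filters would follow the same pattern on the `directRepeats` and `flankers` fields.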