Amplishot

Amplicon-Shotgun

Project source on GitHub: ctSkennerton/Amplishot

Amplishot: Amplicon-Shotgun

Currently, microbial community profiling studies routinely use 454 pyrosequencing to generate 10,000 to 100,000 reads from particular variable regions of the 16S rRNA gene. Unfortunately, 454 pyrosequencing has a number of shortcomings, such as homopolymer errors. Furthermore, taxonomic resolution can be lost because only a small fragment of the 16S rRNA gene is analyzed. Amplishot combines amplification of the full-length 16S rRNA gene with de novo reconstruction of full-length 16S rRNA genes from specially constructed "Amplishot" Illumina sequencing libraries or from metagenomes.

Dependencies

OTU Clustering Dependencies

Assembly Dependencies

You must have one of the following:

Installation

You can either download the latest source code or a particular version from GitHub. Once downloaded, change into the Amplishot directory and run the following:

 [sudo] python setup.py install

Or, if you do not have sudo access on your computer, use the --prefix option to change the installation directory (for example, --prefix=$HOME/.local).

Command-line interface

Amplishot has a single executable called amplishot; running amplishot -h prints some basic help. The command-line options cover only a subset of the available settings, and most options are changed through a configuration file. The Amplishot configuration file is written in YAML, a simple, human-readable data format; before you modify the configuration file it may be helpful to read up on the YAML syntax.

Command-line options and a configuration file can be used in tandem. Any option specified on the command line overrides the corresponding value in the configuration file. If command-line options have changed the configuration, a new configuration file with a date-and-time stamp in its name is written to the global output directory, so that no previous configuration details are lost. No new configuration file is written if the current configuration is unchanged.
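The override rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Amplishot's actual code: the function name merge_overrides, the use of None for "option not given", and the amplishot_config_<timestamp>.yaml file name are all hypothetical, and the configuration is assumed to have already been parsed into a plain dictionary.

```python
from datetime import datetime


def merge_overrides(config, cli_options, now=None):
    """Return (merged_config, new_filename_or_None).

    Hypothetical sketch: command-line values override configuration-file
    values, and a timestamped file name is produced only when something
    actually changed.
    """
    merged = dict(config)  # copy so the original configuration is left untouched
    changed = False
    for key, value in cli_options.items():
        if value is None:
            continue  # assumption: None means the option was not given
        if merged.get(key) != value:
            merged[key] = value
            changed = True
    if not changed:
        return merged, None  # nothing changed, so no new file is written
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return merged, "amplishot_config_{}.yaml".format(stamp)  # hypothetical name


config = {"threads": 5, "log_level": "INFO"}
merged, filename = merge_overrides(
    config, {"threads": 8, "log_level": None}, now=datetime(2013, 1, 2, 3, 4, 5)
)
print(merged)    # {'threads': 8, 'log_level': 'INFO'}
print(filename)  # amplishot_config_20130102_030405.yaml
```

Passing the same value that is already in the file (for example threads set to 5 again) counts as no change, so no new file name is returned.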

Configuration options and their values

Program related blocks

Some of the underlying programs used in Amplishot can be controlled precisely by adding a block to the configuration file that contains options specific to that program. Each block is introduced by a key identical to the program name and contains program-specific key-value pairs.
The program-specific key-value pairs must be indented by 4 spaces (not tabs), and this indentation must be consistent throughout the entire configuration file. Currently, program-related blocks are available for both the assembly and the taxonomy assignment steps of Amplishot.
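YAML itself does not allow tab characters for indentation, so a mis-indented block usually fails to parse. The sketch below is a hypothetical pre-flight check (check_indentation is not part of Amplishot) that flags tabs and indentation that is not a multiple of 4 spaces. It does not verify nesting depth, only the two rules stated above.

```python
def check_indentation(text, width=4):
    """Return a list of (line_number, problem) tuples for config file text.

    Hypothetical helper: flags tabs in the leading whitespace and indentation
    that is not a multiple of `width` spaces. Blank and comment lines are skipped.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped or stripped.startswith("#"):
            continue  # nothing to check on blank or comment lines
        indent = line[: len(line) - len(stripped)]
        if "\t" in indent:
            problems.append((number, "tab used for indentation"))
        elif len(indent) % width != 0:
            problems.append(
                (number, "indent of {} spaces is not a multiple of {}".format(len(indent), width))
            )
    return problems


good = "phrap:\n    minscore: 350\n"
bad = "phrap:\n\tminscore: 350\n  penalty: -9\n"
print(check_indentation(good))  # []
print(check_indentation(bad))
# [(2, 'tab used for indentation'), (3, 'indent of 2 spaces is not a multiple of 4')]
```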

Assembly

Phrap

Specify extra options for phrap using the phrap key. Any of the command-line options available in phrap (see the phrap documentation) can be used as keys in the phrap block, but you must omit the leading dash (-) of each option. For example, to modify the stringency of the assembly, you could change the scoring matrix:

phrap:
    penalty: -9
    gap_ext: -11
    gap_init: -12
    minscore: 350

Only change these options if you know exactly what you are doing, or if you are experimenting because Amplishot is producing sub-standard results. The scoring matrix and other assembly parameters used in phrap have already been tuned to generate accurate 16S assemblies, so the default settings should work well.
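To picture how a dash-less phrap block relates to phrap's own command line, here is a sketch that adds the dash back to each key. It is an assumption about how such a translation could work, not Amplishot's implementation; phrap_arguments and the input file name reads.fasta are hypothetical.

```python
def phrap_arguments(reads_file, phrap_block):
    """Build a phrap command line from the options in a 'phrap' block.

    Hypothetical sketch: option names are written without a leading dash in
    the configuration file, so the dash is added back here.
    """
    args = ["phrap", reads_file]
    for key, value in phrap_block.items():
        if key.startswith("-"):
            # the configuration file must not contain the dash prefix
            raise ValueError("remove the leading dash from phrap option '{}'".format(key))
        args.extend(["-" + key, str(value)])
    return args


# The phrap block above, as it would look once parsed into a dictionary.
block = {"penalty": -9, "gap_ext": -11, "gap_init": -12, "minscore": 350}
print(" ".join(phrap_arguments("reads.fasta", block)))
# phrap reads.fasta -penalty -9 -gap_ext -11 -gap_init -12 -minscore 350
```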

Taxonomy Assignment

Taxonomy assignment is performed in Amplishot after the full-length 16S sequences have been reconstructed. Several taxonomic assignment methods are available, including some of those provided by QIIME 1.6.0. The method is selected with the assign_taxonomy_method key in the Amplishot configuration file; by default, the Bowtie2 taxonomy assigner is used. The valid values for each classifier are shown below:

Configuration File options

For every taxonomy assigner, an optional block of assigner-specific options can be given in the configuration file. The key of this block must be the same as the value of the assign_taxonomy_method key. For example, to use the BLAST taxon assigner, add the following to the configuration file:

assign_taxonomy_method: blast
blast:
    evalue: 1e-50
    blast_db: /full/path/to/blast/database
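The matching rule between assign_taxonomy_method and the options block can be sketched as a simple dictionary lookup. This is a minimal illustration on an already-parsed configuration, not Amplishot's code: taxonomy_options is a hypothetical name, the "bowtie" default spelling is an assumption, and the rdp option in the example is a hypothetical placeholder included only to show that blocks for unselected assigners are ignored.

```python
def taxonomy_options(config, default_method="bowtie"):
    """Return (method, options) for the configured taxonomy assigner.

    Hypothetical sketch: the options block is looked up under the key equal
    to the value of assign_taxonomy_method; blocks for other assigners are
    ignored. The default value "bowtie" is an assumption.
    """
    method = config.get("assign_taxonomy_method", default_method)
    block = config.get(method) or {}  # a missing or empty block means no extra options
    if not isinstance(block, dict):
        raise ValueError("the '{}' block must contain key-value pairs".format(method))
    return method, dict(block)


config = {
    "assign_taxonomy_method": "blast",
    "blast": {"evalue": 1e-50, "blast_db": "/full/path/to/blast/database"},
    "rdp": {"confidence": 0.8},  # hypothetical option; ignored because rdp is not selected
}
print(taxonomy_options(config))
```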

Options common to all taxon assigners

Bowtie
Blast
Mothur
RDP

Example Configuration file

---
threads: 5
log_level: INFO
minimum_pairtig_length: 350 # minimum length of the overlapped pairs
pair_overlap_length: 30 # minimum length of the overlap
mapper: bowtie # program used for read mapping 
mapping_similarity_cutoffs: [0.85, 0.90, 0.95, 0.98] # the sequence similarity required between the reference database and the reads
taxon_coverage: [2, 1000] # list of two numbers. The first is the minimum coverage, the second is the number of bases that need to be covered
assembly_method: ray # choose a genome assembler  
minimum_reconstruction_length: 1000 # minimum length of sequences that we define as 'full length'
otu_clustering_method: cdhit
otu_clustering_similarity: 0.97 # the similarity used for clustering full-length sequences from different samples into OTUs
read_mapping_percent: 0.90 # the percent identity that individual reads have to map with to be considered part of the reference
assign_taxonomy_method: blast
minimum_taxon_similarity: 0.90 # sequences that fall below this cutoff will be listed as no taxonomy
blast_db: '/srv/whitlam/bio/db/gg/from_www.secongenome.com/2012_10/gg_12_10_otus/rep_set/99_otus.fasta'
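The comment on taxon_coverage can be read as a filter on per-base read depth across a reference sequence. The sketch below encodes one plausible interpretation of [2, 1000]: a reference passes when at least 1,000 of its positions are covered by at least 2 reads. This reading is an assumption, not a description of Amplishot's code, and passes_taxon_coverage is a hypothetical name.

```python
def passes_taxon_coverage(per_base_coverage, taxon_coverage=(2, 1000)):
    """Hypothetical reading of taxon_coverage: [minimum coverage, bases covered].

    Counts the positions whose read depth is at least the minimum coverage
    and accepts the reference if that count reaches the required number of bases.
    """
    min_coverage, min_bases = taxon_coverage
    covered = sum(1 for depth in per_base_coverage if depth >= min_coverage)
    return covered >= min_bases


# A made-up 1,500 bp reference: 1,200 positions at depth 3, the rest at depth 1.
depths = [3] * 1200 + [1] * 300
print(passes_taxon_coverage(depths))                 # True
print(passes_taxon_coverage([3] * 900 + [1] * 600))  # False
```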

Tips for writing config files

Writing out full file paths in the configuration file can be a real pain. However, you can reduce the burden by taking advantage of some of the advanced features of the vim text editor: in INSERT mode, start typing a file path (like ~/) and then press CTRL-x CTRL-f to get a popup menu of matching file paths. You can use this to quickly add file names to your config file.