Tutorials
1. LotuS2 Tutorial (using 454 reads)
Note: before starting this tutorial, make sure the lotus.cfg configuration file has been set up, either manually or with the help of the autoinstaller script. You will also need a mapping file and two 454 runs for this tutorial, which can be downloaded here (114 MB).
OTU Creation Tutorial
In this tutorial we will create a genus abundance table from two 454 sequencer runs using the LotuS pipeline. A genus abundance table contains the counts of different genera in several samples: rows are the genera and columns are the samples. As a simple example, take a look at this table:
Taxon | bl10 | bl11 | bl12 | bl128 | bl13 |
Bacteria;?;?;? | 24 | 52 | 39 | 63 | 181 |
Bacteroides | 169 | 27 | 7 | 42 | 6 |
Porphyromonadaceae;? | 370 | 346 | 621 | 565 | 224 |
This example table tells us how often we observe unclassified Bacteria (phylum), Bacteroides (genus) and unclassified Porphyromonadaceae (family) in the 5 samples bl10, bl11, bl12, bl128 and bl13. Note that although this is what the genus abundance table will look like, the phylum and family rows are not genus-level taxa; rather, they collect all OTUs that could not be classified to genus level, but only down to the phylum and family level, respectively.
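To make the row and column layout concrete, the example table can be loaded into a small Python structure. This is only a sketch: the "|"-separated layout is copied from the example above, the function names are illustrative, and the actual files written by LotuS may be formatted differently.

```python
# Illustrative sketch only: the "|"-separated layout mirrors the example
# table in this tutorial; real LotuS output may be formatted differently.
EXAMPLE = """\
Bacteria;?;?;? | 24 | 52 | 39 | 63 | 181 |
Bacteroides | 169 | 27 | 7 | 42 | 6 |
Porphyromonadaceae;? | 370 | 346 | 621 | 565 | 224 |
"""
SAMPLES = ["bl10", "bl11", "bl12", "bl128", "bl13"]


def parse_abundance(text, samples):
    """Return {taxon: {sample: count}} for a '|'-separated table."""
    table = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|") if f.strip()]
        if not fields:
            continue  # skip blank lines
        taxon, counts = fields[0], [int(c) for c in fields[1:]]
        if len(counts) != len(samples):
            raise ValueError(f"{taxon}: expected {len(samples)} counts")
        table[taxon] = dict(zip(samples, counts))
    return table


def is_genus_level(taxon):
    """A '?' in the name marks a rank that could not be assigned."""
    return "?" not in taxon


def sample_totals(table, samples):
    """Sum the counts of all taxa for each sample (column sums)."""
    return {s: sum(row[s] for row in table.values()) for s in samples}


table = parse_abundance(EXAMPLE, SAMPLES)
genera = [t for t in table if is_genus_level(t)]
totals = sample_totals(table, SAMPLES)
print(genera)          # taxa that were classified down to genus level
print(totals["bl10"])  # total count in sample bl10
```

Here only Bacteroides counts as a true genus-level row; the other two rows carry a "?" and therefore stand for OTUs that stopped at a higher rank.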
A matrix like this will be used in the next tutorial on numerical ecology; in this tutorial we create it from raw sequence data. In a recent experiment, we sequenced 73 samples in two 454 runs. The raw fasta and quality files are in the archive linked at the beginning of this section; please download it now if you have not already done so.
Unpack the archive, change into the resulting directory and familiarize yourself with the files required to run LotuS.
The sequence files were multiplexed before sequencing: a short nucleotide sequence, the barcode, was attached to each read and is specific to each sample. A mapping file links each barcode to the name of its sample and is essential for demultiplexing the fasta files. In the archive, this mapping file is map.txt. Take a look at it and try to find out what each column means (some columns contain experimental data and are not required to process the data files). An excerpt of the mapping file /Users/Tomas/data/map.txt:
#SampleID BarcodeSequence LinkerPrimerSequence fnaFile qualFile Description
bl9 ACGAGTGCGT CCGTCAATTCMTTTRAGT run.1.fna run.1.qual FVB
bl10 ACGCTCGACA CCGTCAATTCMTTTRAGT run.1.fna run.1.qual FVB
bl11 AGACGCACTC CCGTCAATTCMTTTRAGT run.1.fna run.1.qual FVB
bl12 AGCACTGTAG CCGTCAATTCMTTTRAGT run.1.fna run.1.qual FVB
...
bl34 ACGAGTGCGT CCGTCAATTCMTTTRAGT run.2.fna run.2.qual FVB
bl35 ACGCTCGACA CCGTCAATTCMTTTRAGT run.2.fna run.2.qual FVB
bl36 AGACGCACTC CCGTCAATTCMTTTRAGT run.2.fna run.2.qual FVB
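The role of the mapping file in demultiplexing can be sketched in a few lines of Python. This is not how LotuS itself demultiplexes; it is a simplified illustration that assumes the barcode sits at the very start of each read and must match exactly. The function names are hypothetical, and the rows are copied from the excerpt above. Note that the same barcode is reused in both runs, so a barcode only identifies a sample together with its sequence file.

```python
# Simplified illustration, not the LotuS implementation.
# Assumption: the barcode is an exact prefix of the read.
MAP_EXCERPT = """\
#SampleID BarcodeSequence LinkerPrimerSequence fnaFile qualFile Description
bl9 ACGAGTGCGT CCGTCAATTCMTTTRAGT run.1.fna run.1.qual FVB
bl10 ACGCTCGACA CCGTCAATTCMTTTRAGT run.1.fna run.1.qual FVB
bl34 ACGAGTGCGT CCGTCAATTCMTTTRAGT run.2.fna run.2.qual FVB
bl35 ACGCTCGACA CCGTCAATTCMTTTRAGT run.2.fna run.2.qual FVB
"""


def parse_mapping(text):
    """Return one dict per sample, keyed by the header column names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].lstrip("#").split()
    # split() handles tabs and spaces, assuming no field contains whitespace
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def barcode_index(rows):
    """Map (sequence file, barcode) -> sample; reject ambiguous barcodes."""
    index = {}
    for row in rows:
        key = (row["fnaFile"], row["BarcodeSequence"])
        if key in index:
            raise ValueError(f"barcode {key[1]} used twice in {key[0]}")
        index[key] = row["SampleID"]
    return index


def assign_sample(index, fna_file, read):
    """Return the sample whose barcode starts the read, or None."""
    for (source, barcode), sample in index.items():
        if source == fna_file and read.startswith(barcode):
            return sample
    return None


index = barcode_index(parse_mapping(MAP_EXCERPT))
read = "ACGAGTGCGT" + "CCGTCAATTCATTTGAGT" + "ACGTACGT"  # made-up read
print(assign_sample(index, "run.1.fna", read))
print(assign_sample(index, "run.2.fna", read))
```

The same read is assigned to different samples depending on the run it came from, which is why the fnaFile and qualFile columns are part of the mapping file.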
Because we want the quality filtering of the input files to be strict, LotuS offers several quality filtering options that can complement each other. Open the sdm_454.txt file in your LotuS install directory. Reading the comments (lines starting with “#”) for each option will help you understand what that option is good for. As a guideline, an OTU is a cluster of similar sequences, with the aim of having one cluster for each species that was originally present in the samples. Keep in mind that sequencers make errors and that PCR amplification of the 16S rDNA introduces errors as well.
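To illustrate how several complementary filters combine, here is a conceptual Python sketch of a per-read quality check. The parameter names and threshold values are hypothetical and chosen only for illustration; they are not the option names or defaults used in sdm_454.txt, so consult the comments in that file for the real settings.

```python
# Conceptual sketch only: parameter names and thresholds are hypothetical
# and do not correspond to the actual options in sdm_454.txt.
def mean_quality(quals):
    """Average of the per-base quality scores of one read."""
    return sum(quals) / len(quals)


def passes_filter(seq, quals, min_length=200, min_avg_quality=25,
                  max_ambiguous=0):
    """Return True only if the read passes every filter."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if not seq or len(seq) < min_length:
        return False  # too short to classify reliably
    if seq.upper().count("N") > max_ambiguous:
        return False  # too many ambiguous bases
    return mean_quality(quals) >= min_avg_quality


good = passes_filter("ACGT" * 60, [30] * 240)
short = passes_filter("ACGT" * 10, [30] * 40)
noisy = passes_filter("ACGT" * 60, [20] * 240)
print(good, short, noisy)
```

Each check rejects a different kind of unreliable read, which is why stricter settings reduce the number of spurious OTUs created from sequencing and PCR errors.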
Once the quality filtering parameters have been saved, e.g. in sdm_options1.txt, LotuS is ready to process your files. All file paths given in the command can be absolute or relative.
In our experience, processing these two 454 runs with LotuS should take less than 5 minutes, which is considerably faster than comparable pipelines. Once LotuS has finished, go to the specified output folder and copy genus.txt from [output directory]/higherLvl to a separate directory for the next tutorial, where this matrix will be analyzed with the help of R.
The finished LotuS run can also be downloaded here; these files are required for the following R tutorial.
Hint: LotuS can also read gzipped fasta/qual/fastq files, which saves disk space in large experiments. Gzipped input is detected automatically if the sequence filename ends with ".gz".
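If you want to inspect such files in your own scripts, the same filename-based detection is easy to reproduce. The following Python sketch is an assumption about how one might do this, not the code LotuS uses; the helper names open_sequence_file and count_fasta_records are hypothetical.

```python
import gzip


def open_sequence_file(path):
    """Open a text file, decompressing on the fly if it ends with '.gz'."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")  # "rt" = read as text, not bytes
    return open(path, "r")


def count_fasta_records(path):
    """Count fasta header lines (lines starting with '>')."""
    with open_sequence_file(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

With this helper, count_fasta_records("run.1.fna") and count_fasta_records("run.1.fna.gz") should report the same number of reads for the same data.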
Data sources
All material provided in this tutorial comes from a metagenomic study of knockout mice. Further analysis of the data can be found in:
Hildebrand, F., Nguyen, A. T. L., Brinkman, B., Yunta, R. G., Cauwe, B., Vandenabeele, P., … Raes, J. (2013). Inflammation-associated enterotypes, host genotype, cage and inter-individual effects drive gut microbiota variation in common laboratory mice. Genome Biology, 14(1), R4. doi:10.1186/gb-2013-14-1-r4