Tutorials

1. LotuS2 Tutorial (using 454 reads)

Note: before starting this tutorial, make sure the lotus.cfg configuration file has been set up, either manually or with the help of the autoinstaller script. You will also need a mapping file and two 454 runs for this tutorial; they can be downloaded here (114 MB).

OTU Creation Tutorial

In this tutorial we will create a genus abundance table from two 454 sequencer runs using the LotuS pipeline. A genus abundance table contains the counts of different genera in several samples: rows are the genera and columns are the samples. As a simple example, take a look at this table:

Genus	bl10	bl11	bl12	bl128	bl13
Bacteria;?;?;?	24	52	39	63	181
Bacteroides	169	27	7	42	6
Porphyromonadaceae;?	370	346	621	565	224

This example table tells us how often we observe unclassified Bacteria (phylum), Bacteroides (genus) and unclassified Porphyromonadaceae (family) in the 5 samples bl10, bl11, bl12, bl128 and bl13. Note that although this is what the genus abundance table will look like, the phylum and family entries are not genus-level taxa; rather, they collect all OTUs that could not be classified to genus level but only to the phylum and family level, respectively. In this tutorial we create such a matrix from raw sequence data; it is then used in the next tutorial on numerical ecology.

In a recent experiment, we sequenced 73 samples in two 454 runs. The raw fasta and quality files are in the archive linked at the beginning of this section; please download them now if you have not done so already. Unpack the files, go to the resulting directory and familiarize yourself with the files required to use LotuS.

The samples were multiplexed before sequencing; that is, a short nucleotide sequence, the barcode, was attached to each read and is specific to each sample. A mapping file contains the link between a barcode sequence and the name of the sample, and is essential to demultiplex the fasta files. In the archive, this mapping file is map.txt. Take a look at it and try to find out what each column means (some columns contain experimental data and are not required to process the data files). An excerpt of the mapping file (here /Users/Tomas/data/map.txt):

#SampleID	BarcodeSequence	LinkerPrimerSequence	fnaFile	qualFile	Description
bl9	ACGAGTGCGT	CCGTCAATTCMTTTRAGT	run.1.fna	run.1.qual	FVB
bl10	ACGCTCGACA	CCGTCAATTCMTTTRAGT	run.1.fna	run.1.qual	FVB
bl11	AGACGCACTC	CCGTCAATTCMTTTRAGT	run.1.fna	run.1.qual	FVB
bl12	AGCACTGTAG	CCGTCAATTCMTTTRAGT	run.1.fna	run.1.qual	FVB
...
bl34	ACGAGTGCGT	CCGTCAATTCMTTTRAGT	run.2.fna	run.2.qual	FVB
bl35	ACGCTCGACA	CCGTCAATTCMTTTRAGT	run.2.fna	run.2.qual	FVB
bl36	AGACGCACTC	CCGTCAATTCMTTTRAGT	run.2.fna	run.2.qual	FVB
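
To illustrate what the mapping file is used for, here is a minimal Python sketch of barcode-based demultiplexing. It is not how LotuS implements this step; it only assumes that the barcode sits at the very start of each read and must match exactly. The function names and the example read are made up for illustration. Note that the same barcode is reused in both runs (bl9 and bl34 above), so the lookup key has to combine the sequence file name with the barcode.

```python
import csv
import io

# A few rows of map.txt as shown above (tab-separated); the real file lists more samples.
MAP_TXT = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tfnaFile\tqualFile\tDescription\n"
    "bl9\tACGAGTGCGT\tCCGTCAATTCMTTTRAGT\trun.1.fna\trun.1.qual\tFVB\n"
    "bl10\tACGCTCGACA\tCCGTCAATTCMTTTRAGT\trun.1.fna\trun.1.qual\tFVB\n"
    "bl34\tACGAGTGCGT\tCCGTCAATTCMTTTRAGT\trun.2.fna\trun.2.qual\tFVB\n"
)


def load_barcode_lookup(handle):
    """Map (fnaFile, barcode) to SampleID from a tab-separated mapping file."""
    reader = csv.DictReader(handle, delimiter="\t")
    lookup = {}
    for row in reader:
        key = (row["fnaFile"], row["BarcodeSequence"])
        if key in lookup:
            raise ValueError(f"duplicate barcode within one run: {key}")
        lookup[key] = row["#SampleID"]
    return lookup


def assign_sample(lookup, fna_file, read):
    """Return the sample whose barcode starts the read, or None if none matches."""
    for (run, barcode), sample in lookup.items():
        if run == fna_file and read.startswith(barcode):
            return sample
    return None


lookup = load_barcode_lookup(io.StringIO(MAP_TXT))

# Made-up read: barcode + primer-like stretch + a few bases (illustration only).
read = "ACGAGTGCGT" + "CCGTCAATTCATTTGAGT" + "ACGT"
print(assign_sample(lookup, "run.1.fna", read))  # bl9
print(assign_sample(lookup, "run.2.fna", read))  # bl34
```

The same read is assigned to a different sample depending on the run it came from, which is why the fnaFile column matters.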

Because the input reads should be quality filtered strictly, LotuS offers several quality filtering options that can complement each other. Open the sdm_454.txt file from your LotuS installation directory. Reading the comments (lines starting with “#”) for each option can help you understand what it is good for. As a guideline, an OTU is a cluster of similar sequences, with the aim of having one cluster for each species that was originally present in the samples; keep in mind that sequencers make errors and that PCR amplification of the 16S rDNA also introduces errors. After the quality filtering parameters have been saved, e.g. in sdmopt_1.txt, LotuS is ready to process your files using this command with absolute or relative file paths:

./[path_lotus_dir]/lotus2 -i [directory_with_unpacked_archive] -m [directory_with_unpacked_archive]/map.txt -s [path_to_your]/sdmopt_1.txt -o [choose a directory]
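
If you prefer to launch the run from a script, the command above could be assembled as in the following Python sketch. Only the -i, -m, -s and -o options shown above are used; the helper name and all four paths are hypothetical placeholders that you would replace with your own.

```python
import shlex
from pathlib import Path


def build_lotus2_command(lotus_dir, input_dir, sdm_options, output_dir):
    """Assemble the lotus2 call shown above as an argument list."""
    input_dir = Path(input_dir)
    return [
        str(Path(lotus_dir) / "lotus2"),
        "-i", str(input_dir),
        "-m", str(input_dir / "map.txt"),  # map.txt lies in the unpacked archive
        "-s", str(sdm_options),
        "-o", str(output_dir),
    ]


# All four paths are hypothetical placeholders; substitute your own.
cmd = build_lotus2_command(
    lotus_dir="/opt/lotus2",
    input_dir="/data/tutorial",
    sdm_options="/data/tutorial/sdmopt_1.txt",
    output_dir="/data/tutorial_out",
)

# Print the command for inspection before running it.
print(shlex.join(cmd))
# To actually start the run: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when a path contains spaces.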

In our experience, processing these two 454 runs with LotuS should take less than 5 min (which is a lot faster than comparable pipelines). Once LotuS has finished, go to the specified output folder and copy genus.txt from [output directory]/higherLvl to a separate directory for the next tutorial, where this matrix will be analyzed with the help of R.
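
Before moving on, a quick sanity check of the table can be useful. The Python sketch below assumes that genus.txt has the same tab-separated layout as the example table near the start of this tutorial (first column taxon name, remaining columns samples); the three example rows stand in for the real file, and the function names are made up for illustration.

```python
import csv
import io

# The example rows from the table near the start of this tutorial stand in for genus.txt.
GENUS_TXT = (
    "Genus\tbl10\tbl11\tbl12\tbl128\tbl13\n"
    "Bacteria;?;?;?\t24\t52\t39\t63\t181\n"
    "Bacteroides\t169\t27\t7\t42\t6\n"
    "Porphyromonadaceae;?\t370\t346\t621\t565\t224\n"
)


def read_abundance_table(handle):
    """Return (sample_names, {taxon: [count per sample]}) from a tab-separated table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip empty lines
        table[row[0]] = [float(value) for value in row[1:]]
    return samples, table


def sample_totals(samples, table):
    """Sum the counts of all taxa for each sample (column sums)."""
    totals = dict.fromkeys(samples, 0.0)
    for counts in table.values():
        for sample, count in zip(samples, counts):
            totals[sample] += count
    return totals


samples, table = read_abundance_table(io.StringIO(GENUS_TXT))
totals = sample_totals(samples, table)
print(totals)
```

To check the real output, replace io.StringIO(GENUS_TXT) with an open file handle on your copy of genus.txt. Samples with very low totals are worth a second look before the analysis in R.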

The finished LotuS run can also be downloaded here; these files are required for the following R tutorial.

Hint: LotuS can also read gzipped fasta/qual/fastq files, which saves hard disk space for large experiments. Compression is detected automatically if the sequence filename ends with ".gz".
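
The same suffix-based convention is easy to follow in your own helper scripts. The Python sketch below is only an illustration of the idea, not LotuS code; the function names and the two demo reads are made up.

```python
import gzip
import os
import tempfile


def open_maybe_gzipped(path):
    """Open a text file, decompressing on the fly when the name ends with '.gz'."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_fasta_records(path):
    """Count header lines (starting with '>') in a plain or gzipped fasta file."""
    with open_maybe_gzipped(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Self-contained demo: two made-up reads written to a temporary .gz file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fna.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write(">read1\nACGT\n>read2\nTTGA\n")
    n_records = count_fasta_records(demo_path)

print(n_records)  # 2
```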

Data sources

All material provided in this tutorial is from a metagenomic study on knockout mice. Further analysis of the data can be found in:
Hildebrand, F., Nguyen, A. T. L., Brinkman, B., Yunta, R. G., Cauwe, B., Vandenabeele, P., … Raes, J. (2013). Inflammation-associated enterotypes, host genotype, cage and inter-individual effects drive gut microbiota variation in common laboratory mice. Genome Biology, 14(1), R4. doi:10.1186/gb-2013-14-1-r4