Is there a Windows version?
No. The main problem is that some of the proprietary software used in LotuS2 is not Windows compatible. Therefore, we will not release a Windows version any time soon.
I need to upload my demultiplexed data to a public repository, how do I get the demultiplexed files?
Use the option "-saveDemultiplex 1".
What is the difference between sdm and LotuS2?
sdm is an integral part of LotuS2, responsible for quality filtering, demultiplexing, sequence format changes and seed extension. However, sdm was conceived as stand-alone software; for example, I personally use sdm to quality filter sequence files before assembling bacterial genomes. For more information on the sdm interface, execute the sdm binary without arguments ("./sdm") to display the help text.
Can I use gzip compressed files?
Yes. This is not supported for config and mapping files, but all sequence files can be compressed using gzip. Just make sure the file name ends in ".gz" and sdm will assume the file is compressed. Note that on some systems compiling sdm with the zlib library may fail; the autoinstaller attempts to detect this and then compiles sdm without zlib support.
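The file-ending rule can be illustrated with a minimal Python sketch. This is not sdm's code; the function name open_sequence_file is hypothetical and only shows the idea of deciding on compression from the ".gz" ending alone.

```python
import gzip


def open_sequence_file(path):
    """Open a sequence file as text, treating a '.gz' ending as gzip-compressed.

    Hypothetical helper for illustration: the decision is based purely on
    the file name, not on the file content.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")
```

Under this rule, a compressed file without the ".gz" ending would be read as plain text, which is why the file ending matters.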
What part of the sequence is cut?
Everything that is a remnant of technical processes is removed, if possible. E.g. if Barcodes are present in the sequence, all sequence upstream of the Barcodes is removed (including heterogeneity spacer and Illumina primer). If Fwd and Rev 16S amplification Primers are provided in the mapping file (and they are found in a read), everything upstream of these is removed (including Barcodes, het spacer etc.).
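As a rough illustration of this trimming logic, here is a simplified Python sketch. It is not how sdm is implemented: it assumes exact string matches (no mismatches, no ambiguous bases), only handles the forward end of a read, and assumes the Barcode or Primer itself is removed along with everything upstream of it. The function name trim_technical_prefix is hypothetical.

```python
def trim_technical_prefix(read, barcode=None, fwd_primer=None):
    """Remove everything up to and including the last technical sequence found.

    Simplified assumptions: exact matching only, forward end only.
    If neither sequence is found, the read is returned unchanged.
    """
    cut = 0
    for tag in (barcode, fwd_primer):
        if tag:
            pos = read.find(tag)
            if pos != -1:
                # Keep the cut point furthest into the read, so a primer
                # found downstream of the barcode also removes the barcode.
                cut = max(cut, pos + len(tag))
    return read[cut:]


# Layout of the example read: spacer + barcode + primer + amplicon
read = "NNN" + "ACGTAC" + "GTGCCAGC" + "TTTGGGAAA"
barcode_only = trim_technical_prefix(read, barcode="ACGTAC")
barcode_and_primer = trim_technical_prefix(read, "ACGTAC", "GTGCCAGC")
```

With only the Barcode given, the primer stays in the read; with both given, only the amplicon remains.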
Should I keep unclassified OTUs (-keepUnclassified option)?
In general we do not recommend this, as these sequences could be environmental sequences that are not 16S (e.g. eukaryotic genomes contain regions with distant homology to frequently used 16S primer pairs). If you expect new phyla in your sample, or species very distant from known organisms, you can leave this option activated, but I would still recommend cross-checking with e.g. NCBI Blast that an unknown OTU is not a random gene. This option is activated by default, because it was confusing when a large part of the reads went silently missing.
My Barcodes are reverse complemented, can I set an option to take care of this?
This should be detected automatically: sdm has a built-in algorithm that checks, in the first 5000 sequences, whether more reverse complemented BCs can be detected, and uses this information for the rest of the file. However, BCs have to be consistent in their direction, as the direction is assumed to be the same within each file.
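The idea behind this check can be sketched in Python as follows. This is an illustration, not sdm's algorithm: it assumes the Barcode sits at the very start of the read, uses exact matching, and interprets "more reverse complemented BCs" as more reverse complemented than forward matches. The function names are hypothetical.

```python
from itertools import islice

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def barcodes_are_reversed(reads, barcodes, sample_size=5000):
    """Guess the barcode direction from the first `sample_size` reads.

    Counts reads starting with a barcode as given versus reads starting
    with a reverse complemented barcode; returns True if the latter wins.
    """
    forward = tuple(barcodes)
    reverse = tuple(reverse_complement(b) for b in barcodes)
    fwd_hits = rev_hits = 0
    for read in islice(reads, sample_size):
        if read.startswith(forward):
            fwd_hits += 1
        elif read.startswith(reverse):
            rev_hits += 1
    return rev_hits > fwd_hits
```

Because the decision is made once from the first reads and then applied to the whole file, a file that mixes both directions would lose the reads in the minority direction.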
I do not want to use RDP assigned taxonomies, but use reference databases. Should I use the SILVA or Greengenes 16S ref databases?
Both databases include a large selection of taxa, though SILVA has a faster release cycle and is currently more up to date; the last Greengenes (GG) release was in 2013. Also, SILVA includes LSU and eukaryotic (18S/28S) sequences, whereas Greengenes can only be used for bacterial SSU (16S) sequences.
How to choose a good cutoff length of sequences?
Changing TruncateSequenceLength and minSeqLength is fine tuning to your data set - just remember to keep these two parameters equal. As a general rule of thumb, you want reads to be as long as possible, but every read shorter than the chosen length will be excluded. Further, the accumulated error has to be below e.g. 0.5 (parameter maxAccumulatedError), and longer reads mean more errors, so you have to find a good balance between read errors and sequence length. (All parameters are in the sdm_XXX.txt option files.)
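To make this trade-off concrete, here is a small Python sketch. It assumes that the accumulated error is the sum of the per-base error probabilities derived from Phred quality scores (p = 10^(-Q/10)); this is a common definition, but whether sdm computes it in exactly this way is an assumption. The function names are hypothetical, and the arguments only mirror the parameters named above.

```python
def accumulated_error(quals):
    """Sum of per-base error probabilities for a list of Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_filter(quals, truncate_length, max_accumulated_error=0.5):
    """Apply a length cutoff, then an accumulated-error cutoff.

    Reads shorter than `truncate_length` are rejected; longer reads are
    truncated to that length before the error is accumulated.
    """
    if len(quals) < truncate_length:
        return False
    return accumulated_error(quals[:truncate_length]) < max_accumulated_error


# 200 bases at Q30 accumulate about 0.2 expected errors,
# 200 bases at Q20 accumulate about 2.0 expected errors.
good_read = passes_filter([30] * 200, truncate_length=200)
noisy_read = passes_filter([20] * 200, truncate_length=200)
short_read = passes_filter([30] * 150, truncate_length=200)
```

In this sketch, raising the length cutoff rejects more short reads and also adds error probability to every remaining read, which is the balance described above.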
How to further optimize my LotuS2 run?
First of all, you need to balance the number of sequences you retain against the number of errors you allow to pass into OTU building. This is mainly done in the sdm_opt* files; the files I provide on the website are just general purpose suggestions.
Second, choose your clustering algorithm according to your needs. UPARSE is my general recommendation; some users have reported better clusterings in the usearch7x versions. SEED clusters are very sensitive, to the point where read errors could cluster into a new OTU, but if you need pseudo-strain resolution this might work for you. cd-hit clusters are very uniform, that is, there is no dynamic adaptation of identity depending on cluster size/shape as UPARSE and swarm do. These are plain "good old clusters".
Third, think about what taxonomic assignments you need and from which database. RDP often provides a very robust assignment at genus level, but Greengenes/SILVA can allow annotations at species level.
I'm testing a set of our samples and wondering why, no matter whether I use the GG or SLV database, not a single OTU is ever classified at species level?
For the record, with QIIME I get at least some OTUs assigned at species level.
This depends on the environment you work in. For gut environments, you should get a good fraction of OTUs assigned to species level; other environments, like Arctic samples, are often not well represented at species level.
In LotuS2 we avoid the best-hit-assignment (unless specified with option -useBestBlastHitOnly). LotuS2 has a least common ancestor algorithm that checks whether there are several hits of similar quality to different species of the same genus/family/class etc. It then goes to the node that captures 95% of hits (with some additional checks, e.g. whether reference sequences even have a species assignment). Further, if the identity of the hit is below a certain threshold (set in the lotus.cfg file, e.g. 95% similarity to the database hit), it will not assign species, genus, family etc. labels. This is to prevent falsely assigned species names, even if this means returning a lot of genera without species assignments.
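The least common ancestor idea can be sketched in Python as below. This is a simplified illustration and not the LotuS2 implementation: it only shows the "node that captures 95% of hits" step, leaves out the identity thresholds and the additional checks, and assumes each hit is given as a lineage tuple ordered from the highest to the lowest rank. The function name lca_assignment is hypothetical.

```python
from collections import Counter


def lca_assignment(hit_lineages, min_fraction=0.95):
    """Return the deepest lineage supported by at least `min_fraction` of hits.

    `hit_lineages` is a list of tuples such as
    ("Bacteria", "Firmicutes", "Lactobacillus", "Lactobacillus casei");
    missing ranks can be left out or given as empty strings.
    """
    total = len(hit_lineages)
    supporting = list(hit_lineages)
    assigned = []
    level = 0
    while supporting:
        labels = [lin[level] for lin in supporting
                  if len(lin) > level and lin[level]]
        if not labels:
            break  # no hit carries a label at this rank
        label, count = Counter(labels).most_common(1)[0]
        if count / total < min_fraction:
            break  # hits disagree too much at this rank
        assigned.append(label)
        # Only hits on the chosen branch vote at the next rank.
        supporting = [lin for lin in supporting
                      if len(lin) > level and lin[level] == label]
        level += 1
    return assigned


hits = [
    ("Bacteria", "Firmicutes", "Lactobacillus", "Lactobacillus casei"),
    ("Bacteria", "Firmicutes", "Lactobacillus", "Lactobacillus paracasei"),
]
consensus = lca_assignment(hits)
```

In this example the two hits agree down to genus but split at species, so the sketch stops at the genus, which is the behaviour that leads to genera without species assignments.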
Where exactly are reads removed during the LotuS2 run?
1st) during quality filtering and dereplication (these reads are later counted into the OTU matrix by similarity comparison, but are not used for OTU construction). 2nd) during the chimera detection steps. 3rd) (optional) unassigned OTUs and all associated reads can be removed (option -keepUnclassified 0).
How can I cite LotuS2 in my work?
The citation is Hildebrand F, Tadeo RY, Voigt AY, Bork P, Raes J. 2014. LotuS2: an efficient and user-friendly OTU processing pipeline. Microbiome 2: 30. and the paper is available here.