To use LotuS2, either install it manually from GitHub or install it via conda or Docker. If you opt for conda or Docker, you can skip to Setup.
To install LotuS2 manually, download the latest release from GitHub.
LotuS2 comes with an autoinstaller script that sets up LotuS2 and downloads the required databases. This step is only necessary when installing LotuS2 from GitHub. Change directory into the previously extracted lotus folder and run the following command to start the autoinstall process. No sudo permissions are required. The autoinstall script will guide you through the installation.
If usearch is not already installed and in your path, autoinstall will ask you to specify an absolute path to the usearch binary. You can skip this step and set up usearch later with one of the following commands:
LotuS2 can be installed via conda. To do so, first add the channels bioconda and conda-forge. As LotuS2 has many dependencies, we advise installing it in a separate conda environment (as shown in the following).
The Docker setup is coming soon!
LotuS2 is frequently updated to improve the user experience and to add new functionality. To update LotuS2 after a manual installation, follow the steps from the download section to download and extract the latest release from GitHub (Downloads). Then call the following command to copy the new files over the old version. That way, the downloaded databases will still be available and you do not have to run autoinstall again.
You can also perform a fresh install, but then you have to rerun the autoinstall script.
With conda, you only need to run:
Please be aware that if you relocate the LotuS2 folder, LotuS2 will fail to find the third-party software it uses. This is because LotuS2 stores absolute paths to the programs in the lOTUs.cfg file. If you want to move your folder, you can change the paths in the config file with the following command:
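Independent of the command itself, the idea behind such an update is to swap the old folder prefix for the new one in every stored absolute path. The following is a minimal Python sketch of that idea, not part of LotuS2. The function name `relocate_paths`, the folder names and the example config lines are invented for illustration, and the real lOTUs.cfg layout may differ.

```python
def relocate_paths(config_text: str, old_root: str, new_root: str) -> str:
    """Return config_text with the old_root path prefix replaced by new_root.

    Only paths that live below old_root are touched; paths to programs
    installed elsewhere on the system stay as they are.
    """
    old = old_root.rstrip("/") + "/"
    new = new_root.rstrip("/") + "/"
    return "\n".join(line.replace(old, new) for line in config_text.splitlines())


# Hypothetical config content: one program name and one absolute path per line.
example_cfg = "usearch /home/me/lotus2/bin/usearch\nblastn /usr/bin/blastn"

print(relocate_paths(example_cfg, "/home/me/lotus2", "/opt/lotus2"))
```

The sketch works on text in memory only. If you adapt it to a real file, keep a backup copy of the config before writing any changes.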
If you would like to use dada2, LotuS2 will attempt to install it via autoInstall.pl. However, in our experience this is unstable, and we recommend installing dada2 manually in your global R distribution before starting the LotuS2 install. LotuS2 requires Rscript to be accessible from your bash shell.
If fastqFile, or fnaFile and qualFile, were specified in the mapping file, this has to be the directory containing the input files.
Warning: The output directory will be completely removed at the beginning of the LotuS2 run. Please ensure this is a new directory or contains only disposable data!
input qual file (not defined in case of fastq or input directory)
Filepath to a fastq formatted file with barcodes (this is a processed mi/hiSeq format). The complementary option in a mapping file would be the column "MIDfqFile".
SDM option file, defaults to "configs/sdm_miSeq.txt" in current dir
LotuS.cfg, config file with program paths
sequencing platform: PacBio, 454, miSeq or hiSeq
number of threads to be used
temporary directory used to save intermediate results
give the forward primer used to amplify DNA region (e.g. 16S primer fwd)
give the reverse primer used to amplify DNA region (e.g. 16S primer rev)
Level of verbosity from printing all program calls and program output (3) to not even printing errors (0). Default: 1
(1) Saves all demultiplexed reads (unfiltered) in the [outputdir]/demultiplexed folder, for easier data upload. (2) Only saves quality filtered demultiplexed reads and continues LotuS2 run subsequently. (3) Saves demultiplexed file into a single fq, saving sample ID in fastq/a header. (0) No demultiplexed reads are saved. (Default: 0)
Skip most of the lotus pipeline and only run a taxonomic classification on a fasta file. E.g. lotus2 -taxOnly <some16S.fna> -refDB SLV
(1) Only redo the taxonomic assignments (useful for replacing a DB used on a finished lotus run). (0) Normal lotus run. (Default: 0)
Remove likely contaminant OTUs/ASVs based on alignment to provided fasta. This option is useful for low-bacterial biomass samples, to remove possible host genome contaminations (e.g. human/mouse genome)
Keep offtarget hits against offtargetDB in output fasta and OTU matrix. (Default: 0)
(1) save extra tmp files like chimeric OTUs or the raw blast output in extra dir. (0) do not save these. (Default: 0)
(1) Includes unclassified OTUs (i.e. no match in RDP/Blast database) in OTU and taxa abundance matrix calculations. (0) does not take these OTUs into account. (Default: 1)
(1) Continue reading fastq files, even if single entries are incomplete (e.g. half of qual values missing). (0) Abort lotus run, if fastq file is corrupt. (Default: 0)
(SLV) Silva LSU (23/28S) or SSU (16/18S), (GG) greengenes (only SSU available), (HITdb) SSU, human gut specific, (PR2) LSU, specialized on ocean environments, (UNITE) ITS, fungi specific, (beetax) bee gut specific database and tax names. Decide which reference DB will be used for a similarity-based taxonomy annotation. Databases can be combined, with the first having the highest priority. E.g. "PR2,SLV" would first use PR2 to assign OTUs, and all unassigned OTUs would then be searched against SILVA, given that "-amplicon_type LSU" was set. Can also be a custom fasta formatted database: in this case, provide the path to the fasta file as well as the path to the taxonomy for the sequences using -tax4refDB. See also the online help on how to create a custom DB. (Default: GG)
In conjunction with a custom fasta file provided to argument -refDB, this file contains for each fasta entry in the reference DB a taxonomic annotation string, with the same number of taxonomic levels for each, tab separated.
(LSU) large subunit (23S/28S) or (SSU) small subunit (16S/18S). (Default: SSU)
(bacteria) bacterial 16S rDNA annotation, (fungi) fungal 18S/23S/ITS annotation. (Default: bacteria)
Confidence threshold for RDP. (Default: 0.8)
Confidence threshold for UTAX. (Default: 0.8)
Previously doBlast. (0) deactivated (just use RDP); (1) or (blast) use Blast; (2) or (lambda) use LAMBDA to search against a 16S reference database for taxonomic profiling of OTUs; (3) or (utax): use UTAX with custom databases; (4) or (vsearch) use VSEARCH to align OTUs to custom databases; (5) or (usearch) use USEARCH to align OTUs to custom databases. (Default: 0)
(1) do not use LCA (lowest common ancestor) to determine most likely taxonomic level (not recommended), instead just use the best blast hit. (0) LCA algorithm. (Default: 0)
Min horizontal coverage of an OTU sequence against ref DB. (Default: 0.5)
Min fraction of reads with identical taxonomy. (Default: 0.9)
(1) Create greengenes output labels instead of OTU (to be used with greengenes specific programs such as BugBase). (Default: 0)
(1) use ITSx to only retain OTUs fitting to ITS1/ITS2 hmm models; (0) deactivate. (Default: 1)
Parameters for ITSx to extract partial (%) ITS regions as well. (0) deactivate. (Default: 0)
(1) use LULU (https://github.com/tobiasgf/lulu) to merge OTUs based on their occurrence. (Default: 1)
(0) do not build OTU phylogeny; (1) use fasttree2; (2) use iqtree2. (Default: 1)
Sequence clustering algorithm: (1) UPARSE, (2) swarm, (3) cd-hit, (6) unoise3, (7) dada2. Short keyword or number can be used to indicate clustering (Default: uparse)
Clustering threshold for OTUs. (Default: 0.97)
Clustering distance for OTUs when using swarm clustering. (Default: 1)
Skew in chimeric fragment abundance (uchime option). (Default: 2)
Add chimeras to count up OTUs/ASVs. (Default: F)
(0) do OTU chimera checks. (1) no chimera check at all. (2) Deactivate deNovo chimera check. (3) Deactivate ref based chimera check. (Default: 0)
Minimum size of dereplicated clusters, one form of noise removal. Can also have a more complex syntax, see examples. (Default: 1)
Maximum number of base pairs by which two reads overlap. (Default: 300)
Custom flash parameters. Since this contains spaces, the value needs to be in quotes, e.g. -flash_param "-r 150 -s 20". Note that this option completely replaces the default -m and -M flash options (i.e. they need to be reinserted, if wanted).
DNA sequence, usually reverse primer or reverse adaptor; all sequence beyond this point will be removed from OTUs. This is redundant with the "ReversePrimer" option from the mapping file, but gives more control (e.g. there is a problem with adaptors in the OTU output). (Default: "")
(1) check for crosstalk. Note that this requires in most cases 64bit usearch. (Default: 0)
Print LotuS2 version
Mapping_file: only checks the mapping file and exits.
mapping_file: creates a new mapping file at location, based on already demultiplexed input (-i) dir. E.g. lotus2 -create_map mymap.txt -i /home/dir_with_demultiplex_fastq
Provide the absolute path to your local usearch binary file; it will be installed so that it is usable with LotuS2 in the future.
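The database priority described for -refDB (the first database has the highest priority, and only OTUs left unassigned are passed on to the next one) can be pictured with a small Python sketch. This is an illustration of the fallback logic only, not LotuS2's implementation. The function `assign_with_priority`, the toy lookup tables and the taxonomy strings are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A "database" here is any function mapping an OTU name to a taxonomy
# string, or to None when it has no match.
Lookup = Callable[[str], Optional[str]]


def assign_with_priority(otus: List[str], databases: List[Lookup]) -> Dict[str, Optional[str]]:
    """Assign each OTU from the first database in the list that has a match."""
    assignments: Dict[str, Optional[str]] = {otu: None for otu in otus}
    for lookup in databases:  # earlier databases take precedence
        for otu in otus:
            if assignments[otu] is None:  # only still-unassigned OTUs are searched
                assignments[otu] = lookup(otu)
    return assignments


# Toy stand-ins for two reference databases given in priority order.
first_db = {"OTU_1": "Eukaryota;Alveolata"}.get
second_db = {"OTU_1": "Eukaryota", "OTU_2": "Bacteria;Firmicutes"}.get

result = assign_with_priority(["OTU_1", "OTU_2", "OTU_3"], [first_db, second_db])
print(result)
```

In this toy run, OTU_1 keeps the assignment from the first database even though the second one also has a match, OTU_2 falls through to the second database, and OTU_3 stays unassigned.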
The mapping file, specified via the "-map" or "-m" argument to LotuS2, is used within the pipeline to demultiplex fasta + qual or fastq files, by using the program sdm (included in LotuS2). The first line in the mapping file is the header and has to start with "#". Entries have to be tab separated and the mapping file should be stored as a text file. The number and names of columns are not limited (these will be exported to the .biom file), and only columns with names listed in the table below are used for processing the read files. Thus, the desired subparts of the pipeline are activated by having these column names in the header of the mapping file (see also Usage examples).
"automap.pl" automatically creates LotuS2 compatible maps from input dirs, if the run is already demultiplexed. Use perl automap.pl to get instructions on how to use this helper script.
|SampleID||Sample Identifier, has to be unique for each Barcode.|
|BarcodeSequence||The Barcode (MID) tag assigned to each sample.|
|Barcode2ndPair||in case of dual indexed reads, use this column to specify the BC on the 2nd read pair|
|ForwardPrimer||Sequence used for 16S amplification, usually located after the Barcode (unless in paired end mode). Can contain IUPAC redundant nucleotides. The old tag LinkerPrimerSequence also works for this option.|
|ReversePrimer||Reverse Primer Sequence (IUPAC code)|
|fastqFile||Gives relative location of fastq file, such that [-i][fastqFile] gives the absolute path to fastq file|
|fnaFile||see fastqFile. However, fasta formatted file instead of fastq format|
|qualFile||see fastqFile. However, quality file corresponding to fasta file instead of fastq format|
|SampleIDinHead||ID in header of fasta/fastq file, that identifies Sample. This replaces Barcode (MID) scanning.|
|MIDfqFile||extra fastq file that contains only MIDs (equivalent to LotuS2 script option "-barcode"). Requires paired reads.|
|CombineSamples||Option to combine samples that may be distributed across several files. A new tag is set here and all samples with this specific tag will be merged into a new sample with the tag as SampleID. Note that the SampleIDs themselves still have to be unique.|
|SequencingRun||If you use dada2 with multiple sequencing runs, in principle you need to run dada2 separately for each run. If so, please include this information in the mapping file under the "SequencingRun" column or store the input fastq for each sequencing run under different folders.|
Typically, SampleID, BarcodeSequence and ForwardPrimer (or its old tag LinkerPrimerSequence) are required. If the reverse primer should be identified and removed, ReversePrimer is required. If more than 1 sequencing run was used, it is often easier to demultiplex all fasta / fastq sequences at once. For this purpose LotuS2/sdm offers either the fastqFile column, or in case of fasta+qual input files, the two columns fnaFile and qualFile. Within these columns the relative path towards input files can be specified. Note that this format is very similar to the Qiime mapping file format, except for additional options exclusive to LotuS2.
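The formatting rules above (a header line starting with "#", tab-separated entries, unique SampleIDs) can be checked before a run with a few lines of Python. This is a minimal sketch and not a replacement for LotuS2's own mapping file check. The helper names `write_mapping` and `check_mapping`, the sample names, barcodes, primer and file names are placeholders chosen for illustration.

```python
from typing import Dict, List


def write_mapping(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Return mapping file text: '#'-prefixed header, tab-separated entries."""
    lines = ["#" + "\t".join(columns)]
    for row in rows:
        lines.append("\t".join(row.get(col, "") for col in columns))
    return "\n".join(lines) + "\n"


def check_mapping(text: str) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("#"):
        return ["header line must start with '#'"]
    header = lines[0][1:].split("\t")
    if "SampleID" not in header:
        return ["no SampleID column in header"]
    problems = []
    id_col = header.index("SampleID")
    seen = set()
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != len(header):
            problems.append(f"line {number}: expected {len(header)} tab-separated fields")
            continue
        if fields[id_col] in seen:
            problems.append(f"line {number}: duplicate SampleID {fields[id_col]}")
        seen.add(fields[id_col])
    return problems


# Placeholder samples; replace barcodes, primer and file names with your own.
columns = ["SampleID", "BarcodeSequence", "ForwardPrimer", "fastqFile"]
rows = [
    {"SampleID": "S1", "BarcodeSequence": "ACGTACGT", "ForwardPrimer": "GTGCCAGCMGCCGCGGTAA", "fastqFile": "run1/reads.fq"},
    {"SampleID": "S2", "BarcodeSequence": "TGCATGCA", "ForwardPrimer": "GTGCCAGCMGCCGCGGTAA", "fastqFile": "run1/reads.fq"},
]
mapping_text = write_mapping(rows, columns)
print(mapping_text)
print(check_mapping(mapping_text))
```

Extra columns can simply be appended to `columns`, since the number and names of columns are not limited.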
After the LotuS2 run has finished, the output folder specified via the -o option contains the following files and subfolders:
|OTU.txt||OTU abundance matrix|
|OTU.fna||Fasta formatted extended OTU seed sequences|
|OTUphylo.nwk||Newick formatted phylogenetic tree between sequences|
|hiera_BLAST.txt||OTU taxonomy assignments based on Blastn|
|hiera_RDP.txt||OTU taxonomy assignments based on RDP classifier|
|phyloseq.Rdata||Phyloseq object ready to be loaded in R|
|higherLvl/||Directory with Species, Genus, Family, Order, Class and Phylum abundance matrices|
|primary/||Directory with copies of sdm options and (sometimes automatically modified) mapping file|
|LotusLogs/||Directory with log files from the pipeline runs, various statistics to demultiplexing, clustering, taxonomy assignments and quality assurance steps.|
|LotusLogs/LotuS_runlog.txt||This file tracks overall progression and reports the most basic stats of the single processing steps.|
|LotusLogs/LotuS_progout.txt||The concatenated output of all programs run by the lotus pipelines, useful for finding errors in case the pipeline aborts, or just for curiosity.|
|LotusLogs/LotuS_cmds.txt||All commands executed by the LotuS2 pipeline, for reproducibility.|
|LotusLogs/citations.txt||Citations to all programs used by each specific LotuS2 run, please try to acknowledge this software in case of a publication using LotuS2|
|LotusLogs/demulti.log.*||Log File for sdm (includes a log file for each single fna/fastq file)|
|LotusLogs/dada2_p_errF.pdf||Error profile plots for the forward reads (dada2 option)|
|ExtraFiles/OTU.MSA.fna||Multiple alignment of OTUs|
|LotusLogs/demulti.acceptsPerFile.log||Percentage of reads that were accepted per input file - can be used to find sequencer output files that consistently had reduced read quality.|
|LotusLogs/demulti.acceptsPerSample||Number and percentage of reads that were accepted per sample - can be used to find samples that consistently had reduced read quality.|
|LotusLogs/demulti_lenHist.txt||Length histogram of sequences that passed sdm|
|LotusLogs/demulti_qualHist.txt||Quality histogram of sequences that passed sdm|
|LotusLogs/SeedExtensionStats.txt||Min/Avg/Max stats on Seed sequence length, quality accumulated error and similarity to OTU consensus sequence|
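As a quick check of a finished run, the OTU abundance matrix can be summarised per sample. The sketch below is a minimal Python example that assumes OTU.txt is a tab-separated table with sample IDs in the header row and one row per OTU; please verify this layout against your own output before relying on it. The function names and the example values are illustrative only.

```python
from typing import Dict, List, Tuple


def parse_abundance_matrix(text: str) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse a tab-separated matrix: header row of sample IDs, one row per OTU."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    samples = lines[0].split("\t")[1:]  # first header cell labels the OTU column
    counts: Dict[str, List[float]] = {}
    for line in lines[1:]:
        fields = line.split("\t")
        counts[fields[0]] = [float(value) for value in fields[1:]]
    return samples, counts


def reads_per_sample(samples: List[str], counts: Dict[str, List[float]]) -> Dict[str, float]:
    """Sum the abundances of all OTUs for each sample."""
    totals = {sample: 0.0 for sample in samples}
    for values in counts.values():
        for sample, value in zip(samples, values):
            totals[sample] += value
    return totals


# Made-up matrix with two samples and two OTUs, for illustration.
example = "OTU\tS1\tS2\nOTU_1\t10\t0\nOTU_2\t5\t7\n"
samples, counts = parse_abundance_matrix(example)
totals = reads_per_sample(samples, counts)
print(totals)
```

To use it on a real run, read the file content first, for example with `open("OTU.txt").read()` inside the output folder given via -o.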