Tutorial

Minimal Example

Prerequisites

Installing

Downloading Databases

Setting Persistent Database Path

Running Analysis

Slurm Workload Manager

Minified Test-Run Dataset

mothulity is simple to use. Nevertheless, a brief usage example won't hurt.

Minimal Example

Below you can find a minimal example of installation, setup and usage. It should be self-explanatory. If not, each step is explained in the following sections.

mkdir databases_directory
pip install --user mothulity
mothulity_dbaser databases_directory --silva-119
mothulity --set-align-database-path databases_directory/silva.nr_v119.align
mothulity --set-taxonomy-database-path databases_directory/silva.nr_v119.tax
mothulity project/fastq/directory -r bash -n my_first_mothulity_project

Prerequisites

In this brief tutorial we assume you are working in your $HOME directory.
Create a directory for storing databases. In this example it is ~/databases_directory.
Download the test dataset to your home directory and unzip it, so that the fastq files end up in ~/MiSeq_SOP.
mothulity uses a simple naming convention for input fastq files. Have a look at two pairs of files inside ~/MiSeq_SOP:

F3D0_S188_L001_R1_001.fastq
F3D0_S188_L001_R2_001.fastq
F3D1_S189_L001_R1_001.fastq
F3D1_S189_L001_R2_001.fastq

This is how mothulity interprets these names:

Sample name	Direction	Extension
F3D0	R1	fastq
F3D0	R2	fastq
F3D1	R1	fastq
F3D1	R2	fastq

The separator is the underscore (_).
The sample name is the first _-separated field of the file name.
R1 marks forward reads and R2 marks reverse reads.
The fastq extension marks a file as valid input.
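To make the convention concrete, here is a small POSIX sh sketch that splits a file name the same way the table above does. It only illustrates the rules listed here and is not mothulity's own parsing code; parse_fastq_name is a hypothetical helper name.

```shell
# Illustrative sketch of the naming convention above; this is NOT mothulity's own parser.
# parse_fastq_name is a hypothetical helper written for this tutorial (POSIX sh).
parse_fastq_name() {
    file=$(basename "$1")
    ext=${file##*.}        # text after the last dot, e.g. "fastq"
    stem=${file%.*}        # file name without the extension
    sample=${stem%%_*}     # first "_"-separated field is the sample name
    # Pad with "_" so that R1/R2 is also matched at the very start or end of the stem.
    case "_${stem}_" in
        *_R1_*) direction=R1 ;;       # forward reads
        *_R2_*) direction=R2 ;;       # reverse reads
        *)      direction=unknown ;;
    esac
    printf '%s\t%s\t%s\n' "$sample" "$direction" "$ext"
}

for f in F3D0_S188_L001_R1_001.fastq F3D0_S188_L001_R2_001.fastq \
         F3D1_S189_L001_R1_001.fastq F3D1_S189_L001_R2_001.fastq; do
    parse_fastq_name "$f"
done
```

Run on the four example files, it prints the same four rows as the table above, tab-separated.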

Installing

mothulity is available as a Python package. It can be installed with pip:

pip install --user mothulity

mothulity comes bundled with Mothur. If you are fine with that, you can install it system-wide. Nevertheless, it is good practice to install software in a separate virtual environment. Moreover, there is an ongoing debate over where pip should store data files, on which mothulity depends heavily, and pip's behaviour in this respect varies between distributions. If you install outside a virtualenv, it is therefore highly recommended to use the --user option.

Downloading Databases

There would be no 16S/ITS analysis without a database. mothulity_dbaser can help with that: give it the path where the files should be downloaded and the type of database.

Example

mothulity_dbaser ~/databases_directory --silva-119
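After the download, a quick sanity check can confirm that the files used in the rest of this tutorial are in place. The sketch below assumes the SILVA v119 download produces silva.nr_v119.align and silva.nr_v119.tax (the names used in the following sections); check_silva_files is a hypothetical helper, not part of mothulity.

```shell
# Sketch: check that the SILVA v119 files used later in this tutorial exist and are
# non-empty. check_silva_files is a hypothetical helper; it ASSUMES the files were saved
# as silva.nr_v119.align and silva.nr_v119.tax in the given directory.
check_silva_files() {
    db_dir=$1
    status=0
    for ext in align tax; do
        f="$db_dir/silva.nr_v119.$ext"
        if [ -s "$f" ]; then              # -s: file exists and is not empty
            echo "ok: $f"
        else
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, run check_silva_files ~/databases_directory once mothulity_dbaser has finished; a non-zero exit status means at least one file is missing or empty.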

Setting Persistent Database Path

mothulity needs to know where the databases live. You can specify the paths each time you run the analysis, using the arguments:

--align-database ~/databases_directory/silva.nr_v119.align

and

--taxonomy-database ~/databases_directory/silva.nr_v119.tax

so an example run would look like this:

mothulity project/fastq/directory -r bash -n my_first_mothulity_project --align-database ~/databases_directory/silva.nr_v119.align --taxonomy-database ~/databases_directory/silva.nr_v119.tax
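If you prefer passing the paths on every run, keeping them in shell variables makes the long command easier to read and edit. This is a minimal sketch: the variable names are illustrative, and the leading echo makes it a dry run that only prints the command.

```shell
# Sketch: keep the database paths in shell variables. DB_DIR, ALIGN_DB and TAX_DB are
# illustrative names; only the mothulity options come from this tutorial.
DB_DIR="$HOME/databases_directory"
ALIGN_DB="$DB_DIR/silva.nr_v119.align"
TAX_DB="$DB_DIR/silva.nr_v119.tax"

# "echo" prints the command instead of running it (dry run); remove it to start the analysis.
echo mothulity project/fastq/directory -r bash -n my_first_mothulity_project \
    --align-database "$ALIGN_DB" --taxonomy-database "$TAX_DB"
```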

Alternatively, you can set the paths persistently with the commands:

mothulity --set-align-database-path ~/databases_directory/silva.nr_v119.align

and

mothulity --set-taxonomy-database-path ~/databases_directory/silva.nr_v119.tax

Running Analysis

Once the database paths are set up, you can easily run your analysis:

mothulity ~/MiSeq_SOP -r bash -n my_first_mothulity_project

~/MiSeq_SOP is where your fastq files are.

-r bash indicates the shell used to run the analysis. If you use a different shell, pass its name here. If you use a workload manager, pass the command that submits a job instead; for Slurm this is sbatch.

-n my_first_mothulity_project is used to name files and directories and to give the final output a title.
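Because mothulity relies on the R1/R2 naming convention, it can be worth checking that every forward-read file has its reverse-read partner before starting a long analysis. The following POSIX sh sketch does that; check_pairs is a hypothetical helper and not part of mothulity.

```shell
# Sketch: make sure every forward-read (R1) fastq file in a directory has a matching
# reverse-read (R2) file. check_pairs is a hypothetical helper, not part of mothulity;
# it relies only on the _R1_/_R2_ naming convention.
check_pairs() {
    dir=$1
    found=0
    status=0
    for r1 in "$dir"/*_R1_*.fastq; do
        [ -e "$r1" ] || continue                  # the glob matched nothing
        found=$((found + 1))
        base=${r1##*/}                            # file name without the directory
        r2="$dir/$(printf '%s\n' "$base" | sed 's/_R1_/_R2_/')"
        if [ ! -e "$r2" ]; then
            echo "no reverse-read partner for: $r1" >&2
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no *_R1_*.fastq files found in: $dir" >&2
        return 1
    fi
    return "$status"
}
```

For example: check_pairs ~/MiSeq_SOP && mothulity ~/MiSeq_SOP -r bash -n my_first_mothulity_project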

The final output is placed in ~/MiSeq_SOP/analysis/OTU/analysis_my_first_mothulity_project.html and should look like this:

Slurm Workload Manager

mothulity can conveniently be used with the Slurm Workload Manager, which makes it a good fit for HPC clusters and other computing facilities. Using it requires two steps:

Configuring your queues/jobs.
Specifying sbatch as the shell to run.

mothulity provides three options for managing Slurm settings:

--add-slurm-setting
--list-slurm-settings
--use-slurm-setting

You are free to use any configuration. A real-life example might be:

mothulity --add-slurm-setting "name=big_queue partition=long processors=32 exclusive"

The options specified here are used as SBATCH flags. The name keyword is reserved: it identifies the setting so that you can refer to it later. Settings are stored persistently, and adding another setting with the same name overwrites the previous one!

mothulity ~/MiSeq_SOP -n my_first_mothulity_project -r sbatch --use-slurm-setting big_queue

This tells mothulity to run the analysis through Slurm with the previously defined big_queue setting. It renders:

#SBATCH --job-name="my_first_mothulity_project"
#SBATCH --partition=long
#SBATCH --exclusive

and places it before the rest of the script, which runs Mothur.
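The mapping from a stored setting to the header can be sketched roughly as follows. This is an approximation under stated assumptions, not mothulity's implementation: render_sbatch_header is hypothetical, it skips the reserved name key, and it prefixes every other token with --. Because mothulity may translate some keys (such as processors) differently, the example setting here uses only partition and exclusive.

```shell
# Sketch only, NOT mothulity's implementation: turn a stored setting string into #SBATCH
# header lines. render_sbatch_header is a hypothetical helper. ASSUMPTIONS: the reserved
# "name" key only identifies the setting and is skipped; every other token (key=value or
# a bare flag) is prefixed with "--". The job name comes from the project name (-n).
render_sbatch_header() {
    job_name=$1
    shift
    printf '#SBATCH --job-name="%s"\n' "$job_name"
    for token in "$@"; do
        case "$token" in
            name=*) ;;                                # reserved key: not an sbatch option
            *) printf '#SBATCH --%s\n' "$token" ;;    # partition=long -> --partition=long
        esac
    done
}

setting="name=big_queue partition=long exclusive"
# $setting is left unquoted on purpose so that it splits into separate tokens.
render_sbatch_header my_first_mothulity_project $setting
```

With the setting shown, it prints a job-name line, a partition line and an exclusive line, in the same form as the header above.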

To see which settings are already saved, type:

mothulity --list-slurm-settings

Minified Test-Run Dataset

The example above uses real data (real in the sense that its interpretation makes sense in the real world) and real databases. If you just want to check that everything works as expected and save yourself time and RAM, you can use mothulity's built-in test-run dataset. If mothulity is correctly installed, you do not need to download anything. Just type:

mothulity --get-test-run-data

It copies the test_run_database and test_run_samples directories into your current working directory. The procedure works just as described above; the files are just much, much smaller.
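To run the analysis on the test-run data, point mothulity at test_run_samples and at the database files inside test_run_database. Their exact file names are not listed here, so the dry-run sketch below assumes the directory contains one *.align and one *.tax file; test_run_command and the project name test_run are hypothetical.

```shell
# Sketch (dry run): print a mothulity command for the data copied by --get-test-run-data.
# test_run_command is a hypothetical helper. ASSUMPTIONS: test_run_database holds one
# *.align and one *.tax file, and test_run_samples holds the fastq files.
test_run_command() {
    base=${1:-.}                                  # directory containing the copied data
    align_db=$(ls "$base"/test_run_database/*.align 2>/dev/null | head -n 1)
    tax_db=$(ls "$base"/test_run_database/*.tax 2>/dev/null | head -n 1)
    if [ -z "$align_db" ] || [ -z "$tax_db" ]; then
        echo "no *.align or *.tax file found in $base/test_run_database" >&2
        return 1
    fi
    echo mothulity "$base/test_run_samples" -r bash -n test_run \
        --align-database "$align_db" --taxonomy-database "$tax_db"
}
```

Run test_run_command in the directory where --get-test-run-data copied the files, check the printed command, and then run it.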
