Extension: Denoising by sample¶
Introduction¶
Here we will denoise by each individual sample, rather than on the entire dataset. This requires us to go back to the data generated in an earlier tutorial. We strongly recommend making a separate directory to perform this experimentation in, and using clear file names so you don’t get the data mixed up!
Data and software
The input data for this tutorial is a directory containing one FASTQ or FASTA file per sample. If you’ve been following along step-by-step, this is the data produced in the pair merging tutorial in the previous section. Alternatively, the 3_merged
directory within the sectionA archive can be used as example data.
This extension uses the VSEARCH software.
Denoising by sample¶
Exercise
- If your input data is FASTQ files, quality filter each FASTA file using the
vsearch --fastx_filter...
command from the quality filtering tutorial, but running individually on each sample one at a time using a bash loop (hint: there’s a loop command at the end of that tutorial). - Run the dereplication command (
vsearch --derep_fulllength...
) from the dereplication tutorial on each sample individually using a bash loop - Run the denoising command (
vsearch --cluster_unoise...
) from the denoising tutorial on each sample individually using a bash loop - Concatenate the result files using
sed
in the same way as in the data concatenation tutorial. - Re-dereplicate the output file using the following command:
vsearch --derep_fulllength input.fasta --sizein --sizeout --relabel uniq --output output.fasta
- Compare the total unique read numbers and size distributions of your output to the version produced by whole-dataset denoising