vignettes/01_preprocessing.Rmd
To filter and trim the raw reads we usually rely on the DADA2 functions, but wrap them in a reproducible workflow step.
library(mbtools)
## Also loading:
## - dada2=1.12.1
## - data.table=1.12.6
## - ggplot2=3.2.1
## - magrittr=1.5
## - phyloseq=1.28.0
## - ShortRead=1.42.0
## - yaml=2.2.0
## Found tools:
## - minimap2=2.17-r941
## - slimm=0.3.4
## - samtools=1.9
##
## Attaching package: 'mbtools'
## The following object is masked _by_ 'package:BiocGenerics':
##
## normalize
## The following object is masked from 'package:graphics':
##
## layout
We will again use our helper function to get a list of sequencing files.
path <- system.file("extdata/16S", package = "mbtools")
files <- find_read_files(path)
print(files)
## forward
## 1: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D0_S188_L001_R1_001.fastq.gz
## 2: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D1_S189_L001_R1_001.fastq.gz
## 3: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D2_S190_L001_R1_001.fastq.gz
## 4: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D3_S191_L001_R1_001.fastq.gz
## 5: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/Mock_S280_L001_R1_001.fastq.gz
## reverse
## 1: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D0_S188_L001_R2_001.fastq.gz
## 2: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D1_S189_L001_R2_001.fastq.gz
## 3: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D2_S190_L001_R2_001.fastq.gz
## 4: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/F3D3_S191_L001_R2_001.fastq.gz
## 5: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/16S/Mock_S280_L001_R2_001.fastq.gz
##      id injection_order lane
## 1: F3D0             188    1
## 2: F3D1             189    1
## 3: F3D2             190    1
## 4: F3D3             191    1
## 5: Mock             280    1
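The id, injection_order and lane columns line up with the parts of the Illumina-style file names listed above (for example F3D0_S188_L001_R1_001.fastq.gz). As a rough illustration of how such annotations can be derived from a file name, here is a minimal base R sketch. `parse_read_names` is a hypothetical helper written for this example; it is an assumption about the naming pattern and not necessarily how `find_read_files` is implemented in mbtools.

```r
# Hypothetical helper (not part of mbtools): split Illumina-style file names
# of the form <id>_S<order>_L<lane>_R<1|2>_001.fastq.gz into annotations.
parse_read_names <- function(file_names) {
  pattern <- "^(.+)_S([0-9]+)_L([0-9]+)_R([12])_001\\.fastq\\.gz$"
  matches <- regmatches(file_names, regexec(pattern, file_names))

  # Each successful match has 5 elements: the full match plus 4 groups.
  if (!all(lengths(matches) == 5)) {
    stop("Some file names do not follow the assumed naming pattern.")
  }

  parts <- do.call(rbind, matches)
  data.frame(
    id = parts[, 2],
    injection_order = as.integer(parts[, 3]),
    lane = as.integer(parts[, 4]),
    direction = ifelse(parts[, 5] == "1", "forward", "reverse"),
    stringsAsFactors = FALSE
  )
}

example_names <- c(
  "F3D0_S188_L001_R1_001.fastq.gz",
  "Mock_S280_L001_R2_001.fastq.gz"
)
annotations <- parse_read_names(basename(example_names))
print(annotations)
```

The sketch stops with an error on names that do not fit the pattern, rather than silently dropping them, so that a misnamed file is noticed early.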
All mbtools workflow steps come with a corresponding config_* function that returns an example/default configuration. Changes can be made afterwards or by passing the parameters in directly. We will specify a temporary directory as the storage location for the preprocessed data and truncate the forward reads to 240 bp and the reverse reads to 200 bp (based on our previous quality assessment).
config <- config_preprocess(out_dir = tempdir(), truncLen = c(240, 200))
config
## $threads
## [1] 1
##
## $out_dir
## [1] "/var/folders/55/dv0p21y96g1cq84sr1zd3kym0000gr/T//RtmpUvWuNE"
##
## $trimLeft
## [1] 10
##
## $truncLen
## [1] 240 200
##
## $maxEE
## [1] 2
##
## $truncQ
## [1] 2
##
## $maxN
## [1] 0
##
## attr(,"class")
## [1] "config"
The printed configuration shows that there are further parameters we could specify, such as trimLeft, maxEE, truncQ and maxN.
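Changing a configuration after it has been created can be sketched as follows. The example assumes that the config object behaves like a plain named list with a class attribute, as the printed output suggests. To stay self-contained it builds a stand-in list with the values shown above instead of calling `config_preprocess`; the new values for threads and maxEE are purely illustrative.

```r
# Stand-in for the object returned by config_preprocess(); the values are
# copied from the printed configuration. This is an assumption about its
# structure, not the mbtools implementation.
config <- structure(
  list(
    threads = 1,
    out_dir = tempdir(),
    trimLeft = 10,
    truncLen = c(240, 200),
    maxEE = 2,
    truncQ = 2,
    maxN = 0
  ),
  class = "config"
)

# Adjust individual entries after the fact (illustrative values).
config$threads <- 4  # use more cores
config$maxEE <- 1    # stricter cutoff on expected errors per read

# Entries that were not touched keep their previous values.
str(unclass(config))
```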
We can now run our preprocessing step.
filtered <- preprocess(files, config)
## INFO [2019-11-05 08:11:52] Preprocessing reads for 5 paired-end samples...
## INFO [2019-11-05 08:11:59] 4.03e+04/4.48e+04 (89.75%) reads passed preprocessing.
This reports the percentage of reads that passed on the logging interface, but you can also inspect the per-sample counts in detail:
##      raw preprocessed   id
## 1:  7793         6992 F3D0
## 2:  5869         5210 F3D1
## 3: 19620        17706 F3D2
## 4:  6758         6114 F3D3
## 5:  4779         4280 Mock
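To relate these counts to the percentage in the log message, the pass rate can be computed per sample. The sketch below re-enters the counts from the table by hand in a base R data frame (rather than using the object returned by `preprocess`) and compares two summaries. The mean of the per-sample rates reproduces the logged 89.75%, which suggests, although the source does not state it, that the log reports the per-sample average and not the pooled rate.

```r
# Counts copied from the table above.
counts <- data.frame(
  id = c("F3D0", "F3D1", "F3D2", "F3D3", "Mock"),
  raw = c(7793, 5869, 19620, 6758, 4779),
  preprocessed = c(6992, 5210, 17706, 6114, 4280)
)

# Percentage of reads that survived filtering, per sample.
counts$percent_passed <- 100 * counts$preprocessed / counts$raw

# Two ways to summarize across samples.
mean_rate <- mean(counts$percent_passed)                     # average of sample rates
pooled_rate <- 100 * sum(counts$preprocessed) / sum(counts$raw)  # all reads pooled

print(counts)
round(mean_rate, 2)    # 89.75
round(pooled_rate, 2)  # 89.92
```

The two summaries differ because the pooled rate weights each sample by its read count, so the large F3D2 sample dominates it.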