mbtools
has support for aligning long metagenomic reads for Oxford Nanopore data. Before proceeding we recommend you preprocess the reads first as described in an earlier vignette.
library(mbtools)
We use minimap2 for everything since it performs as good as other aligners but does not require explicit building of the reference. This way your reference database can just be a (compressed) fasta file.
As example data we will use 3 samples which are subsampled data from MinION sequencing for a list of 10 reference genomes in equal abundances.
Let’s create our file list for the example data and reference database:
fi <- system.file("extdata/nanopore", package = "mbtools") %>%
find_read_files()
ref <- system.file("extdata/genomes/zymo_mock.fna.gz",
package = "mbtools")
Which are 3 single-end files.
As always we will need A config object.
config <- config_align(
reference = ref,
threads = 3,
use_existing = FALSE
)
config
## $reference
## [1] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/genomes/zymo_mock.fna.gz"
##
## $build_index
## [1] FALSE
##
## $threads
## [1] 3
##
## $alignment_dir
## [1] "alignments"
##
## $max_hits
## [1] 100
##
## $use_existing
## [1] FALSE
##
## $limited_memory
## [1] FALSE
##
## attr(,"class")
## [1] "config"
This will be sufficient to align reads. As always the first argument can also be an artifact from quality_control
or preprocess
.
alns <- align_long_reads(fi, config)
## INFO [2019-11-05 08:20:12] Aligning 3 samples on 3 threads. Keeping up to 100 secondary alignments.
## INFO [2019-11-05 08:20:15] Finished aligning even1.
## INFO [2019-11-05 08:20:18] Finished aligning even2.
## INFO [2019-11-05 08:20:21] Finished aligning even3.
You will get an output artifact that logs the created alignments…
## id alignment success
## 1: even1 alignments/even1.bam TRUE
## 2: even2 alignments/even2.bam TRUE
## 3: even3 alignments/even3.bam TRUE
…the size of all the alignments on disk…
print(alns$disk_size, unit = "auto")
## 12.4 Mb
…and the logs in case something goes wrong.
cat(alns$logs[[1]])
## [M::mm_idx_gen::1.672*1.18] collected minimizers
## [M::mm_idx_gen::1.926*1.42] sorted minimizers
## [M::main::1.926*1.42] loaded/built the index for 10 target sequence(s)
## [M::mm_mapopt_update::2.044*1.39] mid_occ = 25
## [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 10
## [M::mm_idx_stat::2.126*1.38] distinct minimizers: 10418059 (91.27% are singletons); average occurrences: 1.132; average spacing: 5.347
## [M::worker_pipeline::2.866*1.62] mapped 1000 sequences
## [M::main] Version: 2.17-r941
## [M::main] CMD: minimap2 -acx map-ont -t 3 --secondary=yes -N 100 -I 100G /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/genomes/zymo_mock.fna.gz /Library/Frameworks/R.framework/Versions/3.6/Resources/library/mbtools/extdata/nanopore/even1_S1_L001_R1_001.fastq.gz
## [M::main] Real time: 2.949 sec; CPU: 4.734 sec; Peak RSS: 0.642 GB