This function can be used for a variety of applications. The most common ones are:
- removing sequences from the host
- removing ribosomal sequences
- removing contaminants
This function uses minimap2 to align and identify hits and does not require a prebuilt index.
remove_reference(reads, out, reference, alignments = NA, threads = 3)
| Argument | Description |
|---|---|
| reads | A character vector containing the read files in fastq format. Can be generated using |
| out | A folder to which to save the filtered fastq files. |
| reference | Path to a fasta file (can be gzipped) that contains the sequences to filter. Can be a genome or transcripts. |
| alignments | Whether to keep the alignments. If not NA, this should be a string giving the path to the output BAM file. |
| threads | How many threads to use for mapping. |
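
The arguments above might be combined as in the following minimal sketch. All file and folder names are hypothetical placeholders, and the call is guarded so it only runs when `remove_reference()` is available in the session and the input files exist. It assumes that minimap2 is installed and that leaving `alignments = NA` means no BAM file is written.

```r
# Hypothetical input paths; replace them with your own files.
reads <- c("data/sample1.fastq", "data/sample2.fastq")
reference <- "refs/host_genome.fna.gz"  # gzipped fasta is allowed per the argument table
out <- "filtered"                       # folder for the filtered fastq files

# Only attempt the call if the function is loaded and the inputs exist.
can_run <- exists("remove_reference") &&
    all(file.exists(c(reads, reference)))

if (can_run) {
    counts <- remove_reference(
        reads,
        out = out,
        reference = reference,
        alignments = NA,  # assumption: NA means alignments are not kept
        threads = 3
    )
    print(counts)
}
```

To keep the alignments, `alignments` would instead be a path to the output BAM file, for example a hypothetical `"filtered/host.bam"`.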
A numeric vector with two entries: the number of sequences remaining after filtering (non-mapped) and the number of removed sequences (mapped).
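
The returned vector can be summarized as in the sketch below. The counts are made-up illustrative values, not real results, and the entries are accessed by position because the documentation does not state whether they are named.

```r
# Illustrative values only: c(non-mapped, mapped), in the documented order.
counts <- c(9500, 500)

kept <- counts[1]     # sequences remaining after filtering
removed <- counts[2]  # sequences that mapped to the reference
total <- kept + removed

# Fraction of all input sequences that were removed.
fraction_removed <- removed / total

cat(sprintf(
    "%.0f of %.0f sequences removed (%.1f%%)\n",
    removed, total, 100 * fraction_removed
))
```

A high removed fraction is expected for host-rich samples, while a near-zero fraction may indicate that the reference does not match the data.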