WipeReads

Introduction

WipeReads is a tool for removing reads from indexed BAM files. It respects pairing information and, by default, also removes reads in the target regions that are additionally mapped outside of them (multiple-mapped reads). The main use case is removing reads that map to known ribosomal RNA regions, supplied as a BED file containing the intervals for these regions.

Example

To open the help menu:

java -jar Biopet-0.2.0.jar tool WipeReads -h

WipeReads - Region-based reads removal from an indexed BAM file

Usage: WipeReads [options]

  -l <value> | --log_level <value>
        Log level
  -h | --help
        Print usage
  -v | --version
        Print version
  -I <bam> | --input_file <bam>
        Input BAM file
  -r <bed/gtf/refflat> | --interval_file <bed/gtf/refflat>
        Interval BED file
  -o <bam> | --output_file <bam>
        Output BAM file
  -f <bam> | --discarded_file <bam>
        Discarded reads BAM file (default: none)
  -Q <value> | --min_mapq <value>
        Minimum MAPQ of reads in target region to remove (default: 0)
  -G <rgid> | --read_group <rgid>
        Read group IDs to be removed (default: remove reads from all read groups)
  --limit_removal
        Whether to remove multiple-mapped reads outside the target regions (default: yes)
  --no_make_index
        Whether to index output BAM file or not (default: yes)

GTF-only options:
  -t <gtf_feature_type> | --feature_type <gtf_feature_type>
        GTF feature containing intervals (default: exon)

Advanced options:
  --bloom_size <value>
        Expected maximum number of reads in target regions (default: 7e7)
  --false_positive <value>
        False positive rate (default: 4e-7)

This tool will remove BAM records that overlaps a set of given regions.
By default, if the removed reads are also mapped to other regions outside
the given ones, they will also be removed.
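
The two advanced options suggest that reads in the target regions are tracked with a Bloom filter, although the help text does not describe the implementation. Under that assumption, the textbook sizing formula gives a rough idea of the memory involved: about 31 bits per expected read at the default false positive rate. The function name `bloom_bits` below is made up for this sketch, and WipeReads may size its filter differently.

```shell
#!/usr/bin/env bash
# Sketch: textbook Bloom filter sizing, m = -n * ln(p) / (ln 2)^2.
# Assumption: WipeReads may size its filter differently; this is only an estimate.

# bloom_bits N P: print the estimated number of bits needed for N items
# at false positive rate P (hypothetical helper, not part of Biopet).
bloom_bits() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%.0f\n", -n * log(p) / (log(2) ^ 2) }'
}

bits=$(bloom_bits 7e7 4e-7)   # the defaults listed in the help text
echo "estimated filter size: ${bits} bits ($(( bits / 8 / 1024 / 1024 )) MiB)"
```

Raising --bloom_size or lowering --false_positive increases this estimate, so the defaults are probably only worth changing when the target regions are expected to contain far more reads than 7e7.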

To run the tool:

java -jar Biopet-0.2.0.jar tool WipeReads --input_file myBam.bam \
--interval_file myRibosomal_regions.bed --output_file myFilteredBam.bam
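
A variation of the command above, sketched from the options listed in the help text: it keeps the removed reads in a separate file and sets a minimum MAPQ for removal. The file names and the MAPQ value of 20 are placeholders, not recommendations. The script prints the command by default so it can be inspected first; setting RUN=1 executes it, assuming Biopet-0.2.0.jar is in the working directory.

```shell
#!/usr/bin/env bash
# Sketch: build a WipeReads command from options listed in the help text.
# All file names and the MAPQ value below are placeholders (assumptions).
set -euo pipefail

cmd=(
  java -jar Biopet-0.2.0.jar tool WipeReads
  --input_file myBam.bam
  --interval_file myRibosomal_regions.bed
  --output_file myFilteredBam.bam
  --discarded_file myDiscardedReads.bam   # keep the removed reads for inspection
  --min_mapq 20                           # minimum MAPQ of reads to remove (placeholder)
)

if [[ "${RUN:-0}" == "1" ]]; then
  "${cmd[@]}"                   # actually run WipeReads
else
  printf '%s\n' "${cmd[*]}"     # dry run: print the command on one line
fi
```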

Output

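This tool outputs a BAM file containing all reads that were not removed, which in the main use case means all reads not inside a ribosomal region. Optionally, with --discarded_file, it also writes a BAM file containing only the removed reads.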
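
When a discarded reads file is written, one possible sanity check is to compare record counts. The sketch below assumes that samtools is installed, that the placeholder file names match an actual run, and that the two output files together account for every input record, which this document does not state outright. The function name `check_split` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: check that kept + discarded record counts add up to the input count.
# Assumption: the two output files together account for every input record.

# check_split TOTAL KEPT DISCARDED: compare three record counts.
check_split() {
  local total=$1 kept=$2 discarded=$3
  if (( kept + discarded == total )); then
    echo "OK: $kept kept + $discarded discarded = $total"
  else
    echo "MISMATCH: $kept kept + $discarded discarded != $total" >&2
    return 1
  fi
}

# Only touch the BAM files if samtools and the (placeholder) files are present.
if command -v samtools >/dev/null 2>&1 &&
   [[ -f myBam.bam && -f myFilteredBam.bam && -f myDiscardedReads.bam ]]; then
  check_split \
    "$(samtools view -c myBam.bam)" \
    "$(samtools view -c myFilteredBam.bam)" \
    "$(samtools view -c myDiscardedReads.bam)"
fi
```

A mismatch does not necessarily indicate a problem; it may simply mean the assumption above does not hold for the options used.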