Shiva

Introduction

This pipeline is built for variant calling on NGS data (preferably Illumina data). It follows the GATK best practices in its approach to variant calling. The pipeline accepts .fastq and .bam files as input.


Tools for this pipeline


Example

Note that one should first create the appropriate configs.

Sample input extensions

Please refer to our mapping pipeline for information about how the input samples should be handled.

Shiva is a special pipeline in the sense that it can also start directly from bam files. To do so, replace the R1 field in the sample config with a bam field.
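As a rough illustration of the difference between a bam library and a fastq library, the sketch below builds a sample config as a Python dictionary and prints it as JSON. The sample, library and file names are placeholders, and `library_entry` is a hypothetical helper that is not part of Biopet. That a library carries either a bam field or R1/R2 fields, never both, is an assumption made for this sketch.

```python
import json


def library_entry(bam=None, r1=None, r2=None):
    """Return the config entry for one library (hypothetical helper).

    Assumption: a library starts either from a bam file or from
    fastq file(s), so the two kinds of fields are not mixed.
    """
    if bam is not None:
        if r1 is not None or r2 is not None:
            raise ValueError("give either bam or R1/R2, not both")
        return {"bam": bam}
    if r1 is None:
        raise ValueError("a fastq library needs at least R1")
    entry = {"R1": r1}
    if r2 is not None:
        entry["R2"] = r2
    return entry


# One sample with a bam library and a paired-end fastq library.
sample_config = {
    "samples": {
        "SampleID": {
            "libraries": {
                "lib_id_1": library_entry(bam="YourBam.bam"),
                "lib_id_2": library_entry(r1="file_R1.fq.gz", r2="file_R2.fq.gz"),
            }
        }
    }
}

print(json.dumps(sample_config, indent=4))
```

The printed JSON could be saved to a file and passed to the pipeline with -config.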

Full pipeline

The full pipeline can start from fastq or bam files. It includes pre-processing steps for the bam files.

To view the help menu, execute:

biopet pipeline shiva -h

Arguments for Shiva:
 -sample,--onlysample <onlysample>               Only Sample
 -config,--config_file <config_file>             JSON config file(s)
 -DSC,--disablescatterdefault                    Disable all scatters

To run the pipeline:

biopet pipeline shiva -config MySamples.json -config MySettings.json -run

A dry run can be performed by simply removing the -run flag from the command line call.
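The difference between a dry run and a real run can be sketched with a small command builder. This is only an illustration: `shiva_command` is a hypothetical helper, and it merely assembles the command line shown above without executing anything.

```python
import shlex


def shiva_command(config_files, run=False):
    """Build the argument list for a Shiva invocation (hypothetical helper).

    Each config file gets its own -config flag; -run is appended only
    when the pipeline should actually execute. Without it the call is
    a dry run.
    """
    args = ["biopet", "pipeline", "shiva"]
    for path in config_files:
        args += ["-config", path]
    if run:
        args.append("-run")
    return args


dry = shiva_command(["MySamples.json", "MySettings.json"])
real = shiva_command(["MySamples.json", "MySettings.json"], run=True)

print(shlex.join(dry))   # dry run: no -run flag
print(shlex.join(real))  # real run: ends with -run
```

Returning a list rather than a string keeps the arguments ready for `subprocess.run` should one want to launch the pipeline from a script.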

Only variant calling

It is possible to run Shiva while only performing its variant calling steps. This has been separated into its own pipeline named shivavariantcalling. As this calling pipeline starts from BAM files, it will not perform any pre-processing steps.

To view the help menu, execute:

java -jar </path/to/biopet.jar> pipeline shivavariantcalling -h

Arguments for ShivaVariantcalling:
 -BAM,--inputbams <inputbams>          Bam files (should be deduped bams)
 -sample,--sampleid <sampleid>         Sample ID (only effects summary and not required)
 -library,--libid <libid>              Library ID (only effects summary and not required)
 -config,--config_file <config_file>   JSON config file(s)
 -DSC,--disablescatter                 Disable all scatters

To run the pipeline:

biopet pipeline shivavariantcalling -config MySettings.json -run

A dry run can be performed by simply removing the -run flag from the command line call.


Variant caller

At this moment the following variant callers can be used

Config options

To view all possible config options, please see the Config page on our GitLab wiki.

Required settings

Config namespace  Name            Type          Default  Function
-                 output_dir      String        (none)   Path to output directory
shiva             variantcallers  List[String]  (none)   Which variant callers to use
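Before starting a run, it can help to check that the required settings are present. The following is a minimal sketch, assuming the merged config is available as a Python dictionary. `missing_required` is a hypothetical helper, and accepting variantcallers either at the root or under the shiva namespace is an assumption based on the example config in this document.

```python
def missing_required(config):
    """Return the names of required settings absent from a config dict."""
    missing = []
    if "output_dir" not in config:
        missing.append("output_dir")
    # Assumption: variantcallers may sit at the root or under "shiva".
    in_root = "variantcallers" in config
    in_namespace = "variantcallers" in config.get("shiva", {})
    if not (in_root or in_namespace):
        missing.append("variantcallers")
    return missing


complete = {"output_dir": "out", "shiva": {"variantcallers": ["haplotypecaller"]}}
print(missing_required(complete))
print(missing_required({"variantcallers": ["unifiedgenotyper"]}))
```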

Config options

Config namespace  Name                    Type            Default                 Function
shiva             species                 String          unknown_species         Name of species, like H.sapiens
shiva             reference_name          String          unknown_reference_name  Name of reference, like hg19
shiva             reference_fasta         String          (none)                  Reference to align to
shiva             dbsnp                   String          (none)                  VCF file of dbSNP records
shiva             variantcallers          List[String]    (none)                  Variant callers to use, see list
shiva             use_indel_realigner     Boolean         true                    Realign indels
shiva             use_base_recalibration  Boolean         true                    Base recalibrate
shiva             use_analyze_covariates  Boolean         false                   Analyze covariates during base recalibration step
shiva             bam_to_fastq            Boolean         false                   Convert bam files to fastq files
shiva             correct_readgroups      Boolean         false                   Attempt to correct read groups
shiva             amplicon_bed            Path            (none)                  Path to target bed file
shiva             regions_of_interest     Array of paths  (none)                  Array of paths to region of interest (e.g. gene panels) bed files
vcffilter         min_sample_depth        Integer         8                       Keep only variants with at least x coverage
vcffilter         min_alternate_depth     Integer         2                       Keep only variants with at least x depth on the alternate allele
vcffilter         min_samples_pass        Integer         1                       Minimum number of samples which pass custom filter (requires additional flags)
vcffilter         filter_ref_calls        Boolean         true                    Remove reference calls
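To make the three depth-related vcffilter settings more concrete, here is a small sketch of how they might interact. This is an assumption for illustration, not a description of the actual filter implementation: it assumes each threshold is checked per sample and that a variant is kept when at least min_samples_pass samples meet each threshold. `passes_depth_filters` and its input format are hypothetical.

```python
def passes_depth_filters(samples, min_sample_depth=8, min_alternate_depth=2,
                         min_samples_pass=1):
    """Decide whether one variant passes the depth filters (assumed logic).

    samples: list of (total_depth, alternate_allele_depth) tuples,
             one per sample at this variant position.
    """
    # Count the samples that meet each threshold separately.
    enough_depth = sum(1 for dp, _ in samples if dp >= min_sample_depth)
    enough_alt = sum(1 for _, ad in samples if ad >= min_alternate_depth)
    return enough_depth >= min_samples_pass and enough_alt >= min_samples_pass


# Two samples; the first one meets both default thresholds.
print(passes_depth_filters([(10, 3), (4, 0)]))
# A single sample with coverage 7 falls below min_sample_depth = 8.
print(passes_depth_filters([(7, 3)]))
```

The defaults in the signature mirror the default values listed in the table above.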

Since Shiva uses the Mapping pipeline internally, mapping config values can be specified as well. For all the options, please see the corresponding documentation for the mapping pipeline.

Exome variant calling

If one calls variants with Shiva on exome samples and an amplicon_bed file is available, this file can be added to the config file. When the file is given, the coverage over the positions in the bed file will be calculated, plus the number of variants on each position. If there is an interest in specific regions of the genome/exome, one can supply multiple regionOfInterest.bed files with the option regions_of_interest (in list/array format).

A short recap: the option amplicon_bed can be given only once and should correspond to the amplicon kit used to obtain the exome data. The option regions_of_interest can contain multiple bed files in list format and can contain any region a user wants. If multiple regions are given, the pipeline will make a coverage plot over each bed file separately.
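The single-path versus list distinction can be sketched as follows. `exome_settings` is a hypothetical helper and the bed file names are placeholders. Placing both options at the root of the config is an assumption that follows the XHMM example in this document, where amplicon_bed appears at the root.

```python
import json


def exome_settings(amplicon_bed, regions_of_interest=()):
    """Build the exome-related settings (hypothetical helper).

    amplicon_bed is a single path; regions_of_interest is a list of paths.
    """
    if not isinstance(amplicon_bed, str):
        raise TypeError("amplicon_bed takes exactly one path")
    return {
        "amplicon_bed": amplicon_bed,
        "regions_of_interest": list(regions_of_interest),
    }


settings = exome_settings("exome_kit.bed", ["panel_a.bed", "panel_b.bed"])
print(json.dumps(settings, indent=4))
```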

Modes

Shiva furthermore supports three modes. The default and recommended option is multisample_variantcalling. In this mode, all bam files are called simultaneously into one big VCF file. It will work with any number of samples.

On top of that, Shiva provides two separate modes that only work with a single sample. Those are not recommended, but may be useful to those who need to validate replicates.

Mode single_sample_variantcalling calls a single sample as a merged bam file. That is, it merges all libraries into one bam file, then calls variants on that.

The other mode, library_variantcalling, will simultaneously call all library bam files.

The config for these therefore is:

Config namespace  Name                          Type     Default  Function
shiva             multisample_variantcalling    Boolean  true     Default, multisample calling
shiva             single_sample_variantcalling  Boolean  false    Not-recommended, single sample, merged bam
shiva             library_variantcalling        Boolean  false    Not-recommended, single sample, per library
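As a small illustration of switching modes, the sketch below starts from the defaults listed above and overrides individual flags. `mode_settings` is a hypothetical helper, and nesting the flags under the shiva namespace follows the namespace column of the table.

```python
# Default values as listed in the table above.
MODE_DEFAULTS = {
    "multisample_variantcalling": True,
    "single_sample_variantcalling": False,
    "library_variantcalling": False,
}


def mode_settings(**overrides):
    """Return the mode flags under the shiva namespace (hypothetical helper)."""
    unknown = set(overrides) - set(MODE_DEFAULTS)
    if unknown:
        raise KeyError("unknown mode(s): " + ", ".join(sorted(unknown)))
    settings = dict(MODE_DEFAULTS)
    settings.update(overrides)
    return {"shiva": settings}


# Enable per-sample calling in addition to the default mode.
print(mode_settings(single_sample_variantcalling=True))
```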

CNV calling

In addition to standard variant calling, Shiva also supports CNV calling. One can enable this option by setting the cnv_calling config option to true.

For CNV calling, Shiva uses Kopisu as a module. Please see the documentation for Kopisu.

Example configs

Config example

samples:
    SampleID:
        libraries:
            lib_id_1:
                bam: YourBam.bam
            lib_id_2:
                R1: file_R1.fq.gz
                R2: file_R2.fq.gz
dbsnp: <dbsnp.vcf.gz>
vcffilter:
    min_alternate_depth: 1
output_dir: <output directory>
variantcallers:
    - haplotypecaller
    - unifiedgenotyper
    - haplotypecaller_gvcf

Additional XHMM CNV calling example

shiva:
    cnv_calling: true
kopisu:
    use_cnmops_method: false
    use_freec_method: false
    use_xhmm_method: true
amplicon_bed: <path_to_bed>
xhmm:
    discover_params: <path_to_file>
    exe: <path_to_executable>


Getting Help

If you have any questions on running Shiva, suggestions on how to improve the overall flow, or requests for your favorite variant calling related program to be added, feel free to post an issue to our issue tracker at GitHub, or contact us directly via the SASC email.