This script is based on the GATK best practices workflow for RNA-Seq short variant discovery (SNPs and indels): https://gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels-
Script developed by Aditya Singh (https://github.com/aditya-88).
The script takes the following arguments, in this order:
- Path to the reference genome
- Path to the input BAM file, aligned using STAR in two-pass mode
- Path to the output directory
- Number of threads
- Memory in GB
- Known sites for GATK BaseRecalibrator, as a BED file
- Path to the GATK 4.x executable
The script performs the following steps:
- Mark duplicates
- Split'N'Trim and reassign mapping qualities
- Base quality score recalibration
- Variant calling
- Variant filtering
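The steps above can be sketched with GATK 4.x commands as follows. This is a minimal sketch that follows the general GATK RNA-Seq recommendations, not the contents of `rnavc.sh` itself: the function names `run` and `rnaseq_variant_steps`, the `DRY_RUN` and `GATK` variables, the intermediate file names, and the filter thresholds are all assumptions made for illustration. By default it only prints the commands it would run.

```bash
#!/usr/bin/env bash
# Sketch only, NOT the actual rnavc.sh. Output file names are hypothetical.

# Print the command when DRY_RUN=1 (the default here); run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: rnaseq_variant_steps <reference> <input BAM> <output dir> <known sites>
rnaseq_variant_steps() {
    local ref=$1 bam=$2 out=$3 known=$4
    local gatk=${GATK:-gatk}   # GATK 4.x launcher; override via $GATK

    # 1. Mark duplicates (Picard tool bundled with GATK 4.x)
    run "$gatk" MarkDuplicates --INPUT "$bam" --OUTPUT "$out/dedup.bam" \
        --METRICS_FILE "$out/dedup.metrics.txt" --CREATE_INDEX true || return

    # 2. Split'N'Trim: split reads that span splice junctions (N in CIGAR)
    run "$gatk" SplitNCigarReads -R "$ref" -I "$out/dedup.bam" \
        -O "$out/split.bam" || return

    # 3. Base quality score recalibration: build the model, then apply it
    run "$gatk" BaseRecalibrator -R "$ref" -I "$out/split.bam" \
        --known-sites "$known" -O "$out/recal.table" || return
    run "$gatk" ApplyBQSR -R "$ref" -I "$out/split.bam" \
        --bqsr-recal-file "$out/recal.table" -O "$out/recal.bam" || return

    # 4. Variant calling
    run "$gatk" HaplotypeCaller -R "$ref" -I "$out/recal.bam" \
        -O "$out/raw.vcf.gz" --dont-use-soft-clipped-bases \
        --standard-min-confidence-threshold-for-calling 20 || return

    # 5. Variant filtering (thresholds are illustrative)
    run "$gatk" VariantFiltration -R "$ref" -V "$out/raw.vcf.gz" \
        --cluster-window-size 35 --cluster-size 3 \
        --filter-name FS --filter-expression "FS > 30.0" \
        --filter-name QD --filter-expression "QD < 2.0" \
        -O "$out/filtered.vcf.gz"
}
```

Calling `rnaseq_variant_steps ref.fa sample.bam out known_sites.vcf.gz` prints the six commands in order; setting `DRY_RUN=0` would execute them instead, stopping at the first step that fails.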
The script requires the following tools to be installed:
- GATK
- Picard tools (for MarkDuplicates; bundled with GATK 4.x)
- STAR (used beforehand to produce the input BAM file; not called by this script)
The script is run as follows:
bash rnavc.sh <reference genome> <input BAM file> <output directory> <threads> <memory in GB> <BED file> <GATK 4.x executable>
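As an illustration of the argument order, a call might look like the sketch below. Every path and value here is a hypothetical placeholder, not a default of the script; substitute your own. The seven array entries map one-to-one onto the seven positional arguments in the usage line.

```bash
#!/usr/bin/env bash
# All paths and values below are hypothetical placeholders.
args=(
    /data/ref/genome.fa          # <reference genome>
    /data/star/sample1.bam       # <input BAM file>
    /data/results/sample1        # <output directory>
    8                            # <threads>
    32                           # <memory in GB>
    /data/ref/sites.bed          # <BED file>
    /opt/gatk/gatk               # <GATK 4.x executable>
)

if [ -f rnavc.sh ]; then
    bash rnavc.sh "${args[@]}"
else
    # Script not in the current directory: just show the command line.
    echo "bash rnavc.sh ${args[*]}"
fi
```

Keeping the arguments in an array makes it harder to drop or swap a positional argument by accident, and each entry can carry a comment naming the slot it fills.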
System requirements:
- Unix/Linux system with Bash or Zsh
- SAMtools
- GATK 4.x
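Before running the script, it can help to confirm that the tools listed above are available. The sketch below is one way to do that; the helper name `missing_tools` is hypothetical, and it assumes `samtools` and `gatk` are expected on `PATH` under those names. If GATK is instead passed to the script as an explicit path, only the `samtools` check applies.

```bash
#!/usr/bin/env bash
# Print the name of every tool given as an argument that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools samtools gatk)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing >&2
else
    echo "All required tools found."
fi
```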