Walkthrough of the Oct 2013 GATK workshop hands-on session

Posted by Geraldine_VdAuwera (Administrator, GATK Developer) in Tutorials; edited September 1

Note: the exact data files we used in this tutorial are no longer available. However, you can use the files in the resource bundle to work through this tutorial. You may need to adapt the filenames accordingly.


Map and mark duplicates

http://gatkforums.broadinstitute.org/discussion/2799/howto-map-and-mark-duplicates

To save time, we start from reads that have already been aligned (mapped) and deduplicated (dedupped), provided as a .bam file.

- Generate index

Create an index file to enable fast seeking through the file.

java -jar BuildBamIndex.jar I=dedupped_20.bam

- Prepare reference to work with GATK

http://gatkforums.broadinstitute.org/discussion/2798/howto-prepare-a-reference-for-use-with-bwa-and-gatk

Create a dictionary file and index for the reference.

java -jar CreateSequenceDictionary.jar R=human_b37_20.fasta O=human_b37_20.dict

samtools faidx human_b37_20.fasta 
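
Before moving on, it can be worth confirming that both companion files ended up next to the FASTA. The following is a minimal sketch, assuming the usual naming (the .dict replaces the .fasta extension, the .fai is appended to the full file name). check_reference is a hypothetical helper name introduced here; it is not part of GATK, Picard or samtools.

```shell
# check_reference FASTA
# Hypothetical helper: reports whether the reference and its two companion
# files exist. Returns 0 if all three are present, 1 otherwise.
check_reference() {
    ref_fasta="$1"
    ref_status=0
    # e.g. human_b37_20.fasta -> human_b37_20.dict and human_b37_20.fasta.fai
    for ref_file in "$ref_fasta" "${ref_fasta%.*}.dict" "${ref_fasta}.fai"; do
        if [ -e "$ref_file" ]; then
            echo "found:   $ref_file"
        else
            echo "missing: $ref_file"
            ref_status=1
        fi
    done
    return "$ref_status"
}

# Usage: check_reference human_b37_20.fasta
```

The function only tests that the files exist; it does not verify that their contents match the FASTA.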

Getting to know GATK

- Run a simple walker: CountReads

Use this run to identify the basic command syntax and the main parts of the console output: the version, the command recap line, the progress estimates, and the result (if applicable).

java -jar GenomeAnalysisTK.jar -T CountReads -R human_b37_20.fasta -I dedupped_20.bam -L 20
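
Every GATK command in this walkthrough shares the same skeleton: the jar, a tool name given with -T, the reference given with -R, then tool-specific arguments. As a rough sketch of that pattern, the helper below assembles a command line and prints it so you can inspect it before running it. GATK_JAR, REF and gatk_cmd are names made up for this illustration, not anything GATK provides.

```shell
# Assumed locations; adapt to your own setup.
GATK_JAR="GenomeAnalysisTK.jar"
REF="human_b37_20.fasta"

# gatk_cmd TOOL [ARGS...]
# Hypothetical helper: prints (does not run) the full command line for a tool.
gatk_cmd() {
    gatk_tool="$1"
    shift
    echo "java -jar $GATK_JAR -T $gatk_tool -R $REF $*"
}

# Usage: print the CountReads command shown above
gatk_cmd CountReads -I dedupped_20.bam -L 20
```

Printing instead of executing keeps the sketch safe to try without the jar or the data files in place.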

- Add a filter to count how many duplicates were marked

Look at the filtering summary in the console output; it reports how many reads each filter removed.

java -jar GenomeAnalysisTK.jar -T CountReads -R human_b37_20.fasta -I dedupped_20.bam -L 20 -rf DuplicateRead
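
Once you have read the number of reads removed by the DuplicateRead filter and the total number of reads traversed off the filtering summary, the duplication rate is simple arithmetic. A small sketch using awk; dup_percent is a hypothetical helper, and the counts in the usage line are invented for illustration, not results from the workshop data.

```shell
# dup_percent TOTAL_READS DUPLICATE_READS
# Hypothetical helper: prints the percentage of reads flagged as duplicates.
dup_percent() {
    awk -v total="$1" -v dups="$2" 'BEGIN {
        # Guard against division by zero on an empty input.
        if (total <= 0) { print "NA"; exit }
        printf "%.2f\n", 100 * dups / total
    }'
}

# Usage with made-up counts:
# dup_percent 1000 50
```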

- Demonstrate how to select a subset of read data

This can come in handy for bug reports.

java -jar GenomeAnalysisTK.jar -T PrintReads -R human_b37_20.fasta -I dedupped_20.bam -L 20:10000000-11000000 -o snippet.bam
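
The -L argument here uses the contig:start-stop form, and GATK treats those coordinates as 1-based and inclusive. To make that concrete, the sketch below computes how many positions such an interval spans. interval_length is a hypothetical helper written for this note; it only handles the contig:start-stop form and rejects anything else.

```shell
# interval_length CONTIG:START-STOP
# Hypothetical helper: prints the number of positions in a 1-based,
# inclusive interval string.
interval_length() {
    case "$1" in
        *:*-*) ;;
        *) echo "expected CONTIG:START-STOP, got: $1" >&2; return 1 ;;
    esac
    iv_region="${1#*:}"         # "20:10000000-11000000" -> "10000000-11000000"
    iv_start="${iv_region%-*}"  # -> "10000000"
    iv_stop="${iv_region#*-}"   # -> "11000000"
    # Inclusive on both ends, hence the +1.
    echo $(( $iv_stop - $iv_start + 1 ))
}

# Usage:
interval_length 20:10000000-11000000   # prints 1000001
```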

- Demonstrate the equivalent for variant calls

SelectVariants has many other capabilities, from selecting by sample name up to complex queries; refer to the documentation for details.

java -jar GenomeAnalysisTK.jar -T SelectVariants -R human_b37_20.fasta -V dbsnp_b37_20.vcf -o snippet.vcf -L 20:10000000-11000000

Back to data processing

- Realign around Indels

http://gatkforums.broadinstitute.org/discussion/2800/howto-perform-local-realignment-around-indels

java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R human_b37_20.fasta -I dedupped_20.bam -known indels_b37_20.vcf -o target_intervals.list -L 20 

java -jar GenomeAnalysisTK.jar -T IndelRealigner -R human_b37_20.fasta -I dedupped_20.bam -known indels_b37_20.vcf -targetIntervals target_intervals.list -o realigned_20.bam -L 20 

- Base recalibration

http://gatkforums.broadinstitute.org/discussion/2801/howto-recalibrate-base-quality-scores-run-bqsr

java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R human_b37_20.fasta -I realigned_20.bam -knownSites dbsnp_b37_20.vcf -knownSites indels_b37_20.vcf -o recal_20.table -L 20

java -jar GenomeAnalysisTK.jar -T PrintReads -R human_b37_20.fasta -I realigned_20.bam -BQSR recal_20.table -o recalibrated_20.bam -L 20

java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R human_b37_20.fasta -I recalibrated_20.bam -knownSites dbsnp_b37_20.vcf -knownSites indels_b37_20.vcf -o post_recal_20.table -L 20

java -jar GenomeAnalysisTK.jar -T AnalyzeCovariates -R human_b37_20.fasta -before recal_20.table -after post_recal_20.table -plots recalibration_plots.pdf -L 20 

- ReduceReads

http://gatkforums.broadinstitute.org/discussion/2802/howto-compress-read-data-with-reducereads

java -jar GenomeAnalysisTK.jar -T ReduceReads -R human_b37_20.fasta -I recalibrated_20.bam -o reduced_20.bam -L 20 

- HaplotypeCaller

http://gatkforums.broadinstitute.org/discussion/2803/howto-call-variants-on-a-diploid-genome-with-the-haplotypecaller

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R human_b37_20.fasta -I reduced_20.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o variants_20.vcf -L 20 
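
The two confidence thresholds in this command work together: sites scoring below -stand_emit_conf are not written to the output, sites between the two thresholds are written but flagged LowQual, and sites at or above -stand_call_conf are reported as confident calls. The sketch below classifies a QUAL value under that scheme. classify_qual is a hypothetical helper, and treating a value exactly equal to a threshold as passing it is an assumption of this sketch.

```shell
# classify_qual QUAL [EMIT_THRESHOLD] [CALL_THRESHOLD]
# Hypothetical helper: thresholds default to the values used above (10, 30).
# Assumption: a QUAL exactly equal to a threshold counts as passing it.
classify_qual() {
    awk -v qual="$1" -v emit="${2:-10}" -v call="${3:-30}" 'BEGIN {
        # "+ 0" forces numeric comparison.
        if (qual + 0 >= call + 0)      print "called"
        else if (qual + 0 >= emit + 0) print "emitted as LowQual"
        else                           print "not emitted"
    }'
}

# Usage:
# classify_qual 45.5
# classify_qual 20
# classify_qual 5
```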

Geraldine Van der Auwera, PhD
