Getting a very large VCF file as output from variant calling

saketkc Posts: 2 Member
edited November 2012 in Ask the GATK team

Hi,

I am using the following set of commands with GATK 2.1.13 to generate a VCF file:

echo `java -Xmx20g -jar /usr/bin/GenomeAnalysisTK.jar -I B2_with_ReadGroup.ddup.sorted.bam -R human_g1k_v37.fasta -T RealignerTargetCreator  -o my.intervals -et NO_ET -K /root/sandbox/saket.kumar_iitb.ac.in.key`
echo "Realignment Done at `date`"
echo "Starting IndelRealigner at `date`"

echo `java -Xmx20g -jar /usr/bin/GenomeAnalysisTK.jar -I B2_with_ReadGroup.ddup.sorted.bam -R human_g1k_v37.fasta -T IndelRealigner -targetIntervals my.intervals -o myrealignedBam.bam  -et NO_ET -K /root/sandbox/saket.kumar_iitb.ac.in.key`
echo "Realignment done at `date`"
echo "Starting UnifiedGenotyper at `date`"
echo `java -Xmx20g -jar /usr/bin/GenomeAnalysisTK.jar -l INFO -R human_g1k_v37.fasta -T UnifiedGenotyper    -I myrealignedBam.bam    -o mygatk_vcf.vcf    --output_mode EMIT_ALL_SITES -et NO_ET -K /root/sandbox/saket.kumar_iitb.ac.in.key`
echo "Genotyping complete at `date`"

When I run 'mpileup' on B2_with_ReadGroup.ddup.sorted.bam, I get a decent 10 MB VCF file. But at the last step of the above pipeline, my "mygatk_vcf.vcf" grows to 81 GB!

Do you know what is wrong?


Answers

  • ebanksebanks Posts: 683GATK Developer mod

    You are asking the Unified Genotyper to do something very different from mpileup, so it's no surprise that the outputs don't match up. Have you looked at our Best Practices documentation? I would note that you aren't following it. Have you also looked at the documentation for EMIT_ALL_SITES?

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
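
For anyone hitting the same size blow-up: as far as the GATK documentation describes it, `--output_mode EMIT_ALL_SITES` asks the Unified Genotyper to write a record for every callable site, variant or not, while the default mode (`EMIT_VARIANTS_ONLY`) writes only variant sites. One rough way to confirm this is the cause is to count how many records have `.` in the ALT column. The sketch below is only an illustration: `sample.vcf` and its three records are invented stand-ins for `mygatk_vcf.vcf`, and `count_sites` is a hypothetical helper name, not part of GATK or samtools.

```shell
#!/bin/sh
# Build a tiny, made-up VCF (tab-separated) standing in for the real output file.
{
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '1\t100\t.\tA\t.\t30\t.\tDP=10\n'
    printf '1\t101\t.\tC\tT\t50\t.\tDP=12\n'
    printf '1\t102\t.\tG\t.\t28\t.\tDP=11\n'
} > sample.vcf

# Hypothetical helper: skip header lines, then count records whose ALT
# column (field 5) is "." (non-variant) versus any other value (variant).
count_sites() {
    awk -F'\t' '!/^#/ { if ($5 == ".") ref++; else var++ }
        END { printf "variant=%d non_variant=%d\n", var, ref }' "$1"
}

count_sites sample.vcf
```

If nearly all records in the real file turn out to be non-variant, dropping `--output_mode EMIT_ALL_SITES` from the UnifiedGenotyper command (or setting it to `EMIT_VARIANTS_ONLY`) should bring the output much closer to a variants-only call set.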
