VariantRecalibrator won't recognize my #CHROM header line

Hi, I am trying to run VariantRecalibrator on a VCF file that was generated by GATK UnifiedGenotyper.
My input file clearly has a valid VCF format, since it is the output of another GATK analysis (with 2 input samples). Originally the file had a .tsv extension. I then renamed it to .vcf, and finally told GATK explicitly that it is a VCF file with -input:name,VCF.
Why am I getting "ERROR MESSAGE: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file"?
My file looks like:
...
##contig=<ID=GL000223.1,length=180455>
##contig=<ID=GL000195.1,length=182896>
##contig=<ID=GL000212.1,length=186858>
##contig=<ID=GL000222.1,length=186861>
##contig=<ID=GL000200.1,length=187035>
##contig=<ID=GL000193.1,length=189789>
##contig=<ID=GL000194.1,length=191469>
##contig=<ID=GL000225.1,length=211173>
##contig=<ID=GL000192.1,length=547496>
##reference=file:///genesis/scratch/sohrab_temp/kareys_tmp/pipeline/reference/GRCh37-lite.fa
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT RD20133 RD9116
1 10109 . A T 469.44 . AC=2;AF=0.500;AN=4;BaseQRankSum=-0.327;DP=355;DS;Dels=0.01;FS=4.955;HaplotypeScore=31.4598;MLEAC=2;MLEAF=0.500;MQ=20.83;MQ0=35;MQRankSum=-1.810;QD=1.32;ReadPosRankSum=-0.645 GT:AD:DP:GQ:PL 0/1:142,40:164:99:360,0,473 0/1:125,30:139:99:138,0,440
1 10177 . A C 78.50 . AC=2;AF=0.500;AN=4;BaseQRankSum=6.189;DP=351;Dels=0.02;FS=0.000;HaplotypeScore=178.7560;MLEAC=2;MLEAF=0.500;MQ=20.22;MQ0=21;MQRankSum=0.359;QD=0.22;ReadPosRankSum=-0.218 GT:AD:DP:GQ:PL 0/1:155,43:178:90:90,0,938
...
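
The error message suggests that GATK stopped reading the header at a line that does not begin with # before it found the #CHROM line. A blank line inside the header, a header line that lost its leading #, or a data line ahead of the header could all produce that situation. The sketch below walks the header in a similar line-by-line way and reports where it stops. It assumes an uncompressed VCF and a POSIX awk; check_vcf_header is a hypothetical helper, not a GATK tool.

```shell
# check_vcf_header: hypothetical helper, not part of GATK.
# Walks the header of a plain-text (uncompressed) VCF line by line and
# reports what it finds; returns 0 only if a #CHROM line was seen.
check_vcf_header() {
  awk '
    # Meta-information lines start with two hashes.
    /^##/ { meta++; next }
    # Column header line: a single hash followed by CHROM.
    /^#CHROM/ { chrom = NR; next }
    # Any other line that starts with a single hash.
    /^#/ { other = NR; next }
    # First line without a leading hash: the header ends here.
    {
      stop = NR
      stop_text = (length($0) ? substr($0, 1, 40) : "(empty line)")
      exit
    }
    END {
      printf "meta-information (##) lines: %d\n", meta
      if (chrom) printf "#CHROM header line: line %d\n", chrom
      else print "#CHROM header line: NOT FOUND"
      if (other) printf "other single-# line: line %d\n", other
      if (stop) printf "header ends before line %d: %s\n", stop, stop_text
      exit (chrom ? 0 : 1)
    }
  ' "$1"
}

# Example: check_vcf_header GATK_F1_A22267_A22268_realign.vcf
```

If the report shows "NOT FOUND" together with an early "header ends before line N", the line it prints is the one worth inspecting (for example with head or cat -A).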

The command I used is:
java -Xmx2g -XX:MaxPermSize=512m -jar GenomeAnalysisTK.jar -T VariantRecalibrator \
    -R GRCh37-lite.fa \
    -input:name,VCF GATK_F1_A22267_A22268_realign.vcf \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.sites.nochr.vcf \
    -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.hg19.sites.nochr.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 dbsnp_132_nochr.hg19.vcf \
    -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an HRun \
    -mode SNP \
    -recalFile GATK_F1_A22267_A22268_realign.tsv.recal \
    -tranchesFile GATK_F1_A22267_A22268_realign.tsv.tranches
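
Separately from the header problem, VariantRecalibrator reads each annotation named with -an from the INFO field of the input VCF, so every one of those names needs to be present there. In the two records of the file excerpt, for instance, HRun does not appear in either INFO field. A minimal sketch for checking this, assuming an uncompressed, tab-delimited VCF; check_info_keys is a hypothetical helper, not part of GATK.

```shell
# check_info_keys: hypothetical helper, not part of GATK.
# Usage: check_info_keys FILE.vcf ANNOTATION...
# Prints "<name>: present" or "<name>: MISSING" for each annotation name,
# depending on whether it occurs as a key in column 8 (INFO) of any data line.
check_info_keys() {
  vcf="$1"
  shift
  awk -F'\t' -v keys="$*" '
    BEGIN { nk = split(keys, want, " ") }
    /^#/ { next }
    {
      # INFO is a ;-separated list of KEY=VALUE entries or bare flags (e.g. DS).
      n = split($8, fields, ";")
      for (i = 1; i <= n; i++) {
        split(fields[i], kv, "=")
        seen[kv[1]] = 1
      }
    }
    END {
      for (j = 1; j <= nk; j++)
        printf "%s: %s\n", want[j], ((want[j] in seen) ? "present" : "MISSING")
    }
  ' "$vcf"
}

# Example, using the annotations from the command above:
# check_info_keys GATK_F1_A22267_A22268_realign.vcf QD HaplotypeScore MQRankSum ReadPosRankSum FS MQ HRun
```

The check scans the whole file in a single awk pass, so it reports a name as present if it occurs in at least one record.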

My resource files say "nochr" because I am using a reference whose contig names have no "chr" prefix. I have used this reference successfully with other GATK tools (e.g., UnifiedGenotyper).
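
Since the reference and the "nochr" resource files must agree on contig naming, one hedged sanity check is to confirm that every contig name used in the input and resource VCFs also appears in the reference's FASTA index. The file name GRCh37-lite.fa.fai below is an assumption based on the usual samtools faidx naming, and contigs_not_in_reference is a hypothetical helper, not part of GATK.

```shell
# contigs_not_in_reference: hypothetical helper, not part of GATK.
# Usage: contigs_not_in_reference REFERENCE.fa.fai FILE.vcf
# Prints each CHROM value used in FILE.vcf that is not listed in the first
# column of the FASTA index; no output means every contig name matched.
# Assumes a non-empty .fai file and an uncompressed VCF.
contigs_not_in_reference() {
  awk -F'\t' '
    # First file (.fai): remember the reference contig names.
    NR == FNR { ref[$1] = 1; next }
    # Second file (VCF): skip header lines.
    /^#/ { next }
    # Report each unknown contig name once.
    !($1 in ref) && !($1 in reported) { print $1; reported[$1] = 1 }
  ' "$1" "$2"
}

# Example: contigs_not_in_reference GRCh37-lite.fa.fai hapmap_3.3.hg19.sites.nochr.vcf
```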

Thanks!
karey
