
VariantRecalibrator won't recognize my #CHROM header line

Hi, I am attempting to use VariantRecalibrator with an output file that was generated using GATK UnifiedGenotyper.
My input file should be in the correct VCF format, since it is the output of another GATK analysis run on 2 input samples. I originally gave it the file extension .tsv, then renamed it to .vcf, and finally told GATK explicitly that it is a VCF file by passing -input:name,VCF.
Why is it telling me "ERROR MESSAGE: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file"?
My file looks like:
...

contig=<ID=GL000223.1,length=180455>

contig=<ID=GL000195.1,length=182896>

contig=<ID=GL000212.1,length=186858>

contig=<ID=GL000222.1,length=186861>

contig=<ID=GL000200.1,length=187035>

contig=<ID=GL000193.1,length=189789>

contig=<ID=GL000194.1,length=191469>

contig=<ID=GL000225.1,length=211173>

contig=<ID=GL000192.1,length=547496>

reference=file:///genesis/scratch/sohrab_temp/kareys_tmp/pipeline/reference/GRCh37-lite.fa

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT RD20133 RD9116

1 10109 . A T 469.44 . AC=2;AF=0.500;AN=4;BaseQRankSum=-0.327;DP=355;DS;Dels=0.01;FS=4.955;HaplotypeScore=31.4598;MLEAC=2;MLEAF=0.500;MQ=20.83;MQ0=35;MQRankSum=-1.810;QD=1.32;ReadPosRankSum=-0.645 GT:AD:DP:GQ:PL 0/1:142,40:164:99:360,0,473 0/1:125,30:139:99:138,0,440
1 10177 . A C 78.50 . AC=2;AF=0.500;AN=4;BaseQRankSum=6.189;DP=351;Dels=0.02;FS=0.000;HaplotypeScore=178.7560;MLEAC=2;MLEAF=0.500;MQ=20.22;MQ0=21;MQRankSum=0.359;QD=0.22;ReadPosRankSum=-0.218
GT:AD:DP:GQ:PL 0/1:155,43:178:90:90,0,938
...

The command I used is:
java -Xmx2g -XX:MaxPermSize=512m -jar GenomeAnalysisTK.jar \
  -T VariantRecalibrator \
  -R GRCh37-lite.fa \
  -input:name,VCF GATK_F1_A22267_A22268_realign.vcf \
  -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.sites.nochr.vcf \
  -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.hg19.sites.nochr.vcf \
  -resource:dbsnp,known=true,training=false,truth=false,prior=8.0 dbsnp_132_nochr.hg19.vcf \
  -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an HRun \
  -mode SNP \
  -recalFile GATK_F1_A22267_A22268_realign.tsv.recal \
  -tranchesFile GATK_F1_A22267_A22268_realign.tsv.tranches

My resource files say "nochr" because I am using a reference whose contig names have no "chr" prefix, and which I have used successfully for other analyses in the GATK pipeline (e.g. UnifiedGenotyper).

Thanks!
karey

Answers
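
The error says GATK never saw a line starting with "#CHROM". In the excerpt above, the meta lines and the CHROM line appear without their leading "#" characters. The forum display may have dropped them, so it may be worth checking the file on disk directly. The following is only a minimal sketch, not part of GATK. The function name first_header_problem is hypothetical, and the check is much simpler than GATK's own parser: it reports the first non-blank line that appears before any "#CHROM" line and does not start with "#".

```python
def first_header_problem(lines):
    """Scan VCF text lines for the '#CHROM' column header.

    Returns None if a line starting with '#CHROM' is reached.
    Otherwise returns (line_number, text) for the first non-blank line
    that does not start with '#', or (0, "") if the input ends without
    either being found.
    """
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("#CHROM"):
            return None  # column header found, header looks terminated
        if line.startswith("#") or not line.strip():
            continue  # meta-information line or blank line, keep scanning
        return (number, line)  # first line that lacks a leading '#'
    return (0, "")
```

To try it on the file from the question, open GATK_F1_A22267_A22268_realign.vcf in text mode and pass the file handle as lines. A result of None means the "#CHROM" line is present. A tuple gives the line number and text of the first offending line.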
