Can't get calls for multiple records at same position in the alleles input file; UnifiedGenotyper

Is there a way, using UnifiedGenotyper in GENOTYPE_GIVEN_ALLELES mode, to generate an output VCF line for each of multiple records at the same position in the alleles input VCF file?

Thanks

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I'm not sure I understand what you mean by multiple records. Can you please clarify by giving an example?

  • durtschi (Member)
    edited March 2013

    I'd like to use UnifiedGenotyper to evaluate more than one allele at the same position and get an output line for each one.

    Here is the limitation I've run into...

    Here is my example command:

    java -Xmx4g -jar $GATK -T UnifiedGenotyper -R $REF_GENOME --logging_level INFO -L $REFERENCE_VCF -I $SAMPLE_BAM -o $OUTPUT_VCF --genotype_likelihoods_model BOTH -stand_call_conf 0.0 -stand_emit_conf 0.0 -dcov 1000 --max_deletion_fraction 1.0 --min_base_quality_score 17 --genotyping_mode GENOTYPE_GIVEN_ALLELES --alleles $REFERENCE_VCF --output_mode EMIT_ALL_SITES --max_alternate_alleles 6
    

    Here are entries in my example vcf file used for --alleles parameter ($REFERENCE_VCF):
    (Note multiple entries at each chromosomal position)

    1 237675150 rs57830998 A G 93.33 gatkOCT2011SNP AC=1;AF=0.50;AN=2;BaseQRankSum=2.662;DB;DP=50;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=13.9877;MQ=49.41;MQ0=0;MQRankSum=2.406;QD=1.87;ReadPosRankSum=0.639 GT:AD:DP:GQ:PL 0/1:38,12:50:99:123,0,542

    1 237675150 rs57409517 A AG 458.59 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=-1.753;DB;DP=51;FS=0.000;HRun=0;HaplotypeScore=511.5992;MQ=49.64;MQ0=0;MQRankSum=1.472;QD=8.99;ReadPosRankSum=1.152 GT:AD:DP:GQ:PL 0/1:42,6:51:99:498,0,534

    1 247597365 rs10925022 G A 708.46 gatkOCT2011SNP AC=2;AF=1.00;AN=2;DB;DP=45;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=35.7293;MQ=43.94;MQ0=0;QD=15.74 GT:AD:DP:GQ:PL 1/1:1,44:45:60.19:741,60,0

    1 247597365 rs74163771 G GTGTGTTCTGAGGCCTTCTCTATTCCAGAGCTCTCTGGTCAGA 11375.48 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=-2.503;DB;DP=70;FS=0.000;HRun=0;HaplotypeScore=3408.9089;MQ=48.73;MQ0=0;MQRankSum=-3.727;QD=162.51;ReadPosRankSum=4.353;SB=-0.00 GT:AD:DP:GQ:PL 0/1:6,22:70:99:11375,0,3360

    15 38641774 . A T 97.37 gatkOCT2011SNP AC=1;AF=0.50;AN=2;BaseQRankSum=0.000;DP=36;Dels=0.03;FS=0.000;HRun=1;HaplotypeScore=29.9528;MQ=37.57;MQ0=0;MQRankSum=2.307;QD=2.63;ReadPosRankSum=-2.672;SB=-19.45 GT:AD:DP:GQ:PL 0/1:31,3:36:48.23:97,0,48

    15 38641774 rs34352434 A AAAT 1823.71 PASS AC=2;AF=1.00;AN=2;DB;DP=31;FS=0.000;HRun=0;HaplotypeScore=400.0079;MQ=35.79;MQ0=0;QD=58.83;SB=-66.89 GT:AD:DP:GQ:PL 1/1:4,27:31:90.31:1866,90,0

    And here are the warning messages I get from both a GATK 1 and a GATK 2 version:

    WARN 10:48:57,545 UnifiedGenotyperEngine - Multiple valid VCF records detected in the alleles input file at site 1:237675150, only considering the first record

    WARN 10:48:57,807 UnifiedGenotyperEngine - Multiple valid VCF records detected in the alleles input file at site 1:247597365, only considering the first record

    WARN 10:48:57,944 UnifiedGenotyperEngine - Multiple valid VCF records detected in the alleles input file at site 15:38641774, only considering the first record

    What I would like is for each VCF record to be considered even though there are multiple records at some sites. Is there any workaround other than splitting up my --alleles input file to avoid "Multiple valid VCF records" at the same chromosomal coordinates?

    thanks again
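
For reference, the splitting workaround mentioned in the question could be sketched roughly as follows. This is only an illustration, not part of GATK: the function name `split_by_site_rank` is hypothetical, and it assumes the alleles file is already sorted and that CHROM and POS are the first two whitespace-separated columns. The k-th record seen at a site goes into the k-th output, so no output contains two records at the same position.

```python
from collections import defaultdict


def split_by_site_rank(vcf_lines):
    """Distribute VCF records so that no output holds two records at one site.

    The k-th record seen at a given (CHROM, POS) goes to the k-th output.
    Header lines (starting with '#') are copied to every output.
    Hypothetical helper for illustration; not a GATK function.
    """
    header = []
    seen = defaultdict(int)  # (chrom, pos) -> records seen so far at that site
    outputs = []             # outputs[k] holds the (k+1)-th record at each site
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
            continue
        if not line.strip():
            continue  # skip blank lines
        chrom, pos = line.split()[:2]
        rank = seen[(chrom, pos)]
        seen[(chrom, pos)] += 1
        if rank == len(outputs):
            outputs.append([])  # first time any site has this many records
        outputs[rank].append(line)
    # Input order is preserved within each output, so sorted input stays sorted.
    return [header + records for records in outputs]
```

Each returned list could then be written to its own file and passed as `--alleles` (and `-L`) in a separate UnifiedGenotyper run, which should avoid the "Multiple valid VCF records" warning because each file has at most one record per site.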

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Ah, I see -- you need to merge your variants using CombineVariants. Be sure to read about all the different options in the doc.

  • heskett (Portland, Oregon, USA; Member)

    @Geraldine_VdAuwera said:
    Ah, I see -- you need to merge your variants using CombineVariants. Be sure to read about all the different options in the doc.

    I don't believe this solves the problem.

    dbSNP VCF has multiple VCF records at many positions, but GATK gives a warning and ultimately an error for this. How does one get around it?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    One way is to separate out the SNPs and indels then re-merge them.
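
The separation step suggested here could be sketched as below. This is a simplified illustration under stated assumptions, not a GATK tool: the function name `split_snps_and_indels` is hypothetical, a record counts as a SNP only when REF and every ALT allele are a single character, and everything else (including multi-base substitutions and symbolic alleles) is lumped in with the indels. The re-merge step is not shown.

```python
def split_snps_and_indels(vcf_lines):
    """Separate VCF records into SNP and non-SNP (treated as indel) groups.

    Assumption for this sketch: a record is a SNP when REF and all ALT
    alleles are exactly one character long. A record with ALT '.' therefore
    also lands in the SNP group. Header lines are copied to both outputs.
    Hypothetical helper for illustration; not a GATK function.
    """
    header, snps, indels = [], [], []
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        ref = fields[3]
        alts = fields[4].split(",")
        is_snp = len(ref) == 1 and all(len(alt) == 1 for alt in alts)
        (snps if is_snp else indels).append(line)
    return header + snps, header + indels
```

With the example records from this thread, the rs57830998 (A to G) record would go to the SNP group and the rs57409517 (A to AG) record to the indel group, so neither group has two records at 1:237675150.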
