Bug in PhaseByTransmission: incorrect parsing / interpretation of PED file

TimHughes, Member
edited October 2013 in Ask the GATK team

PhaseByTransmission does not seem to correctly parse the PED file:

I have a VCF file with the following samples:

grep "#CHR" trioPE06.variantsOnly.targetsPad30.ug.vcf
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Holmgren-excap2-PE-trio-D05-2963    Holmgren-excap2-PE-trio-Fre12-053   Holmgren-excap2-PE-trio-Fre12-055

And a PED file that looks like this:

% cat progressive_encephalopathies_run2.ped
PE06 Holmgren-excap2-PE-trio-D05-2963 0 0 2 2
PE06 Holmgren-excap2-PE-trio-Fre12-053 1 0 1 1
PE06 Holmgren-excap2-PE-trio-Fre12-055 0 1 2 1
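
One quick sanity check at this point is whether the sample names on the #CHROM line match the individual IDs in the second column of the PED file. The following is only a minimal sketch: `vcf_samples` and `ped_individuals` are hypothetical helper names, the header and PED text are copied from the listings above, and it assumes whitespace-separated fields with no spaces inside sample names.

```python
def vcf_samples(header_line):
    """Return the sample names from a VCF #CHROM header line."""
    fields = header_line.split()
    if not fields or fields[0] != "#CHROM":
        raise ValueError("not a #CHROM header line")
    # The first nine columns are CHROM..FORMAT; samples follow.
    return fields[9:]


def ped_individuals(ped_text):
    """Return the individual IDs (second column) of a PED file."""
    ids = []
    for line in ped_text.splitlines():
        line = line.strip()
        # Skip blank lines and lines commented out with '#'.
        if line and not line.startswith("#"):
            ids.append(line.split()[1])
    return ids


HEADER = (
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT "
    "Holmgren-excap2-PE-trio-D05-2963 "
    "Holmgren-excap2-PE-trio-Fre12-053 "
    "Holmgren-excap2-PE-trio-Fre12-055"
)

PED_TEXT = """\
PE06 Holmgren-excap2-PE-trio-D05-2963 0 0 2 2
PE06 Holmgren-excap2-PE-trio-Fre12-053 1 0 1 1
PE06 Holmgren-excap2-PE-trio-Fre12-055 0 1 2 1
"""

samples = set(vcf_samples(HEADER))
individuals = set(ped_individuals(PED_TEXT))

# Names present in one file but not the other.
print(sorted(samples - individuals))
print(sorted(individuals - samples))
```

Both lists come out empty for these two files, so the sample names themselves agree between the VCF and the PED file.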

When I try to phase by transmission, I get the following message, in which GATK has incorrectly inferred that there are 4 members in family PE06 when there are just 3:

INFO  11:49:41,527 PhaseByTransmission - Caution: Family PE06 has 4 members; At the moment Phase By Transmission only supports trios and parent/child pairs. Family skipped.

Any help with resolving this issue is much appreciated. It may have something to do with the formatting of my PED file, but I have been over it many times, checking for invisible characters and more. The same file actually works if I comment out one of the parents and run with just a parent/child pair, but as soon as all three lines are in the PED file, it throws the same error.
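
For readers debugging a similar message: a PED line has six columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype), and the two parent columns hold either 0 for "unknown" or the individual ID of another line. The sketch below counts every distinct ID mentioned in a family and lists parent IDs that have no line of their own. It rests on the assumption that a tool would count such an undefined parent ID as an extra family member, which is a guess about the behaviour and not taken from the GATK source. The function names are hypothetical.

```python
from collections import defaultdict

MISSING_PARENT = "0"  # PED convention for an unknown parent


def parse_ped(text):
    """Return (family, individual, father, mother) tuples from PED text."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and lines commented out with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 columns, got: {line!r}")
        rows.append(tuple(fields[:4]))
    return rows


def family_members(rows):
    """Map family ID to the set of every ID mentioned in that family."""
    members = defaultdict(set)
    for family, individual, father, mother in rows:
        members[family].add(individual)
        for parent in (father, mother):
            if parent != MISSING_PARENT:
                members[family].add(parent)
    return dict(members)


def undefined_parents(rows):
    """Return sorted (family, parent ID) pairs that have no line of their own."""
    defined = {(family, individual) for family, individual, _, _ in rows}
    missing = set()
    for family, _, father, mother in rows:
        for parent in (father, mother):
            if parent != MISSING_PARENT and (family, parent) not in defined:
                missing.add((family, parent))
    return sorted(missing)


PED_TEXT = """\
PE06 Holmgren-excap2-PE-trio-D05-2963 0 0 2 2
PE06 Holmgren-excap2-PE-trio-Fre12-053 1 0 1 1
PE06 Holmgren-excap2-PE-trio-Fre12-055 0 1 2 1
"""

rows = parse_ped(PED_TEXT)
print(len(family_members(rows)["PE06"]))  # distinct IDs mentioned in PE06
print(undefined_parents(rows))            # parent IDs with no matching line
```

On the PED file above this reports four distinct IDs for PE06: the three sample names plus the ID `1`, which appears in the parent columns but on no line of its own. That matches the count in the message, although whether the tool arrives at it the same way is not confirmed here.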

The full output is as follows:

[email protected]:/Volumes/jbod/proj_eirik_encephalon/130808_D00132_0045_BC272JACXX/Unaligned/130808_D00132.B.Project_Holmgren-excap2-2013-07-05
% java -Xmx2g -jar ${gatkDir} -T PhaseByTransmission -R ${ref} -V trio${trioName}.variantsOnly.targetsPad${padding}.ug.vcf -ped progressive_encephalopathies_run2.ped -o trio${trioName}.variantsOnly.targetsPad${padding}.ug.phased.vcf --MendelianViolationsFile trio${trioName}.variantsOnly.targetsPad${padding}.ug.phased.vcf_mendelianViolations
INFO  11:49:40,864 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  11:49:40,867 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-gf57256b, Compiled 2013/05/01 09:27:02 
INFO  11:49:40,867 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  11:49:40,867 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  11:49:40,872 HelpFormatter - Program Args: -T PhaseByTransmission -R /Users/tim/home/PLATFORM/refData/b37/genomic/gatkBundle_2.5/human_g1k_v37_decoy.fasta -V trioPE06.variantsOnly.targetsPad30.ug.vcf -ped progressive_encephalopathies_run2.ped -o trioPE06.variantsOnly.targetsPad30.ug.phased.vcf --MendelianViolationsFile trioPE06.variantsOnly.targetsPad30.ug.phased.vcf_mendelianViolations 
INFO  11:49:40,873 HelpFormatter - Date/Time: 2013/10/02 11:49:40 
INFO  11:49:40,873 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  11:49:40,873 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  11:49:40,880 ArgumentTypeDescriptor - Dynamically determined type of trioPE06.variantsOnly.targetsPad30.ug.vcf to be VCF 
INFO  11:49:41,056 GenomeAnalysisEngine - Strictness is SILENT 
INFO  11:49:41,228 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
INFO  11:49:41,254 RMDTrackBuilder - Loading Tribble index from disk for file trioPE06.variantsOnly.targetsPad30.ug.vcf 
INFO  11:49:41,368 PedReader - Reading PED file progressive_encephalopathies_run2.ped with missing fields: [] 
INFO  11:49:41,441 PedReader - Phenotype is other? false 
INFO  11:49:41,507 GenomeAnalysisEngine - Creating shard strategy for 0 BAM files 
INFO  11:49:41,522 GenomeAnalysisEngine - Done creating shard strategy 
INFO  11:49:41,523 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  11:49:41,523 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
INFO  11:49:41,527 PhaseByTransmission - Caution: Family PE06 has 4 members; At the moment Phase By Transmission only supports trios and parent/child pairs. Family skipped. 
INFO  11:49:42,802 GATKRunReport - Uploaded run statistics report to AWS S3 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.5-2-gf57256b): 
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Bad input: No PED file passed or no trios found in PED file. Aborted.
##### ERROR ------------------------------------------------------------------------------------------

Answers

  • TimHughes, Member

    Great. Thanks a lot :) that solved it for me.

    As for my incorrect PED file still running when I comment out one of the parents, here are the details in case they are of interest.

    Commenting out one of the parents in my incorrect file:

    cat progressive_encephalopathies_run2.ped
    PE06 Holmgren-excap2-PE-trio-D05-2963 0 0 2 2
    #PE06 Holmgren-excap2-PE-trio-Fre12-053 1 0 1 1
    PE06 Holmgren-excap2-PE-trio-Fre12-055 0 1 2 1
    

    But of course the run gives useless results (every counter is 0):

    java -Xmx2g -jar ${gatkDir} -T PhaseByTransmission -R ${ref} -V trio${trioName}.variantsOnly.targetsPad${padding}.ug.vcf -ped progressive_encephalopathies_run2.ped -o trio${trioName}.variantsOnly.targetsPad${padding}.ug.phased.vcf --MendelianViolationsFile trio${trioName}.variantsOnly.targetsPad${padding}.ug.phased.vcf_mendelianViolations --pedigreeValidationType SILENT
    INFO  10:31:28,635 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  10:31:28,638 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-gf57256b, Compiled 2013/05/01 09:27:02 
    INFO  10:31:28,638 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  10:31:28,638 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    INFO  10:31:28,643 HelpFormatter - Program Args: -T PhaseByTransmission -R /Users/tim/home/PLATFORM/refData/b37/genomic/gatkBundle_2.5/human_g1k_v37_decoy.fasta -V trioPE06.variantsOnly.targetsPad30.ug.vcf -ped progressive_encephalopathies_run2.ped -o trioPE06.variantsOnly.targetsPad30.ug.phased.vcf --MendelianViolationsFile trioPE06.variantsOnly.targetsPad30.ug.phased.vcf_mendelianViolations --pedigreeValidationType SILENT 
    INFO  10:31:28,643 HelpFormatter - Date/Time: 2013/10/03 10:31:28 
    INFO  10:31:28,644 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  10:31:28,644 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  10:31:28,651 ArgumentTypeDescriptor - Dynamically determined type of trioPE06.variantsOnly.targetsPad30.ug.vcf to be VCF 
    INFO  10:31:28,825 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  10:31:29,006 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
    INFO  10:31:29,031 RMDTrackBuilder - Loading Tribble index from disk for file trioPE06.variantsOnly.targetsPad30.ug.vcf 
    INFO  10:31:29,148 PedReader - Reading PED file progressive_encephalopathies_run2.ped with missing fields: [] 
    INFO  10:31:29,221 PedReader - Phenotype is other? false 
    INFO  10:31:29,287 GenomeAnalysisEngine - Creating shard strategy for 0 BAM files 
    INFO  10:31:29,302 GenomeAnalysisEngine - Done creating shard strategy 
    INFO  10:31:29,303 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  10:31:29,303 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
    INFO  10:31:50,843 PhaseByTransmission - Number of complete trio-genotypes: 0 
    INFO  10:31:50,843 PhaseByTransmission - Number of trio-genotypes containing no call(s): 0 
    INFO  10:31:50,843 PhaseByTransmission - Number of trio-genotypes phased: 0 
    INFO  10:31:50,843 PhaseByTransmission - Number of resulting Het/Het/Het trios: 0 
    INFO  10:31:50,843 PhaseByTransmission - Number of remaining single mendelian violations in trios: 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of remaining double mendelian violations in trios: 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of complete pair-genotypes: 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of pair-genotypes containing no call(s): 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of pair-genotypes phased: 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of resulting Het/Het pairs: 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of remaining mendelian violations in pairs: 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of genotypes updated: 0 
    INFO  10:31:53,619 ProgressMeter -            done        1.12e+05   24.0 s        3.6 m     97.6%        24.0 s     0.0 s 
    INFO  10:31:53,619 ProgressMeter - Total runtime 24.32 secs, 0.41 min, 0.01 hours 
    INFO  10:31:55,003 GATKRunReport - Uploaded run statistics report to AWS S3 
    