Bug in PhaseByTransmission: incorrect parsing / interpretation of PED file

TimHughes Member
edited October 2013 in Ask the GATK team

PhaseByTransmission does not seem to correctly parse the PED file:

I have a vcf file with the following samples

grep "#CHR" trioPE06.variantsOnly.targetsPad30.ug.vcf
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Holmgren-excap2-PE-trio-D05-2963    Holmgren-excap2-PE-trio-Fre12-053   Holmgren-excap2-PE-trio-Fre12-055

And a PED file that looks like this:

% cat progressive_encephalopathies_run2.ped
PE06 Holmgren-excap2-PE-trio-D05-2963 0 0 2 2
PE06 Holmgren-excap2-PE-trio-Fre12-053 1 0 1 1
PE06 Holmgren-excap2-PE-trio-Fre12-055 0 1 2 1

When I try to phase by transmission, I get the following message, in which GATK has incorrectly inferred that family PE06 has 4 members when there are only 3:

INFO  11:49:41,527 PhaseByTransmission - Caution: Family PE06 has 4 members; At the moment Phase By Transmission only supports trios and parent/child pairs. Family skipped.

Any help with resolving this issue is much appreciated. I suspect the problem lies in the formatting of my PED file, but I have been over it many times, checking for invisible characters and more. The same file actually works if I comment out one of the parents and run with just a parent/child pair, but as soon as all three lines are present in the PED file, it throws the same error.

The full output is as follows

[email protected]:/Volumes/jbod/proj_eirik_encephalon/130808_D00132_0045_BC272JACXX/Unaligned/130808_D00132.B.Project_Holmgren-excap2-2013-07-05
% java -Xmx2g -jar ${gatkDir} -T PhaseByTransmission -R ${ref} -V trio${trioName}.variantsOnly.targetsPad${padding}.ug.vcf -ped progressive_encephalopathies_run2.ped -o trio${trioName}.variantsOnly.targetsPad${padding}.ug.phased.vcf --MendelianViolationsFile trio${trioName}.variantsOnly.targetsPad${padding}.ug.phased.vcf_mendelianViolations
INFO  11:49:40,864 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  11:49:40,867 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-gf57256b, Compiled 2013/05/01 09:27:02 
INFO  11:49:40,867 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  11:49:40,867 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  11:49:40,872 HelpFormatter - Program Args: -T PhaseByTransmission -R /Users/tim/home/PLATFORM/refData/b37/genomic/gatkBundle_2.5/human_g1k_v37_decoy.fasta -V trioPE06.variantsOnly.targetsPad30.ug.vcf -ped progressive_encephalopathies_run2.ped -o trioPE06.variantsOnly.targetsPad30.ug.phased.vcf --MendelianViolationsFile trioPE06.variantsOnly.targetsPad30.ug.phased.vcf_mendelianViolations 
INFO  11:49:40,873 HelpFormatter - Date/Time: 2013/10/02 11:49:40 
INFO  11:49:40,873 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  11:49:40,873 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  11:49:40,880 ArgumentTypeDescriptor - Dynamically determined type of trioPE06.variantsOnly.targetsPad30.ug.vcf to be VCF 
INFO  11:49:41,056 GenomeAnalysisEngine - Strictness is SILENT 
INFO  11:49:41,228 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
INFO  11:49:41,254 RMDTrackBuilder - Loading Tribble index from disk for file trioPE06.variantsOnly.targetsPad30.ug.vcf 
INFO  11:49:41,368 PedReader - Reading PED file progressive_encephalopathies_run2.ped with missing fields: [] 
INFO  11:49:41,441 PedReader - Phenotype is other? false 
INFO  11:49:41,507 GenomeAnalysisEngine - Creating shard strategy for 0 BAM files 
INFO  11:49:41,522 GenomeAnalysisEngine - Done creating shard strategy 
INFO  11:49:41,523 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO  11:49:41,523 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
INFO  11:49:41,527 PhaseByTransmission - Caution: Family PE06 has 4 members; At the moment Phase By Transmission only supports trios and parent/child pairs. Family skipped. 
INFO  11:49:42,802 GATKRunReport - Uploaded run statistics report to AWS S3 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.5-2-gf57256b): 
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Bad input: No PED file passed or no trios found in PED file. Aborted.
##### ERROR ------------------------------------------------------------------------------------------
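
A quick sanity check on the PED file may help narrow this down. In the standard six-column PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype), columns 3 and 4 hold the individual IDs of the father and mother, or 0 when the parent is unknown. The sketch below, in Python, flags parent IDs that do not match any individual in the same family and counts the distinct IDs seen per family. The function name `check_ped` is hypothetical, and treating an unmatched parent ID as an extra family member is only an assumption that happens to be consistent with the "4 members" message; it is not taken from the GATK source.

```python
def check_ped(text):
    """Return (members_by_family, problems) for whitespace-delimited PED text."""
    individuals = {}  # family ID -> individual IDs declared in column 2
    referenced = {}   # family ID -> non-zero parent IDs found in columns 3 and 4
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out records
        fam, ind, father, mother = line.split()[:4]
        individuals.setdefault(fam, set()).add(ind)
        for parent in (father, mother):
            if parent != "0":
                referenced.setdefault(fam, set()).add(parent)

    members, problems = {}, []
    for fam, inds in individuals.items():
        refs = referenced.get(fam, set())
        # Assumption: every distinct ID, declared or only referenced, counts once.
        members[fam] = len(inds | refs)
        for parent in sorted(refs - inds):
            problems.append(
                f"{fam}: parent ID {parent!r} is not an individual in this family"
            )
    return members, problems


# The PED file from the post above.
PED = """\
PE06 Holmgren-excap2-PE-trio-D05-2963 0 0 2 2
PE06 Holmgren-excap2-PE-trio-Fre12-053 1 0 1 1
PE06 Holmgren-excap2-PE-trio-Fre12-055 0 1 2 1
"""

members, problems = check_ped(PED)
print(members)
for problem in problems:
    print(problem)
```

For the file in the post, the check reports that the parent ID `1` does not correspond to any listed individual, which suggests that columns 3 and 4 should carry sample names rather than 0/1 flags.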

Best Answer

Answers

  • TimHughes Member

    Great. Thanks a lot :) that solved it for me.

    Regarding the fact that my incorrect PED file will run if I comment out one of the parents, here is the detail in case it is of interest.

    Commenting out one of the parents in my incorrect file:

    cat progressive_encephalopathies_run2.ped
    PE06 Holmgren-excap2-PE-trio-D05-2963 0 0 2 2
    #PE06 Holmgren-excap2-PE-trio-Fre12-053 1 0 1 1
    PE06 Holmgren-excap2-PE-trio-Fre12-055 0 1 2 1
    

    But of course the run gives useless results:

    java -Xmx2g -jar ${gatkDir} -T PhaseByTransmission -R ${ref} -V trio${trioName}.variantsOnly.targetsPad${padding}.ug.vcf -ped progressive_encephalopathies_run2.ped -o trio${trioName}.variantsOnly.targetsPad${padding}.ug.phased.vcf --MendelianViolationsFile trio${trioName}.variantsOnly.targetsPad${padding}.ug.phased.vcf_mendelianViolations --pedigreeValidationType SILENT
    INFO  10:31:28,635 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  10:31:28,638 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-gf57256b, Compiled 2013/05/01 09:27:02 
    INFO  10:31:28,638 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  10:31:28,638 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    INFO  10:31:28,643 HelpFormatter - Program Args: -T PhaseByTransmission -R /Users/tim/home/PLATFORM/refData/b37/genomic/gatkBundle_2.5/human_g1k_v37_decoy.fasta -V trioPE06.variantsOnly.targetsPad30.ug.vcf -ped progressive_encephalopathies_run2.ped -o trioPE06.variantsOnly.targetsPad30.ug.phased.vcf --MendelianViolationsFile trioPE06.variantsOnly.targetsPad30.ug.phased.vcf_mendelianViolations --pedigreeValidationType SILENT 
    INFO  10:31:28,643 HelpFormatter - Date/Time: 2013/10/03 10:31:28 
    INFO  10:31:28,644 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  10:31:28,644 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  10:31:28,651 ArgumentTypeDescriptor - Dynamically determined type of trioPE06.variantsOnly.targetsPad30.ug.vcf to be VCF 
    INFO  10:31:28,825 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  10:31:29,006 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
    INFO  10:31:29,031 RMDTrackBuilder - Loading Tribble index from disk for file trioPE06.variantsOnly.targetsPad30.ug.vcf 
    INFO  10:31:29,148 PedReader - Reading PED file progressive_encephalopathies_run2.ped with missing fields: [] 
    INFO  10:31:29,221 PedReader - Phenotype is other? false 
    INFO  10:31:29,287 GenomeAnalysisEngine - Creating shard strategy for 0 BAM files 
    INFO  10:31:29,302 GenomeAnalysisEngine - Done creating shard strategy 
    INFO  10:31:29,303 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  10:31:29,303 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
    INFO  10:31:50,843 PhaseByTransmission - Number of complete trio-genotypes: 0 
    INFO  10:31:50,843 PhaseByTransmission - Number of trio-genotypes containing no call(s): 0 
    INFO  10:31:50,843 PhaseByTransmission - Number of trio-genotypes phased: 0 
    INFO  10:31:50,843 PhaseByTransmission - Number of resulting Het/Het/Het trios: 0 
    INFO  10:31:50,843 PhaseByTransmission - Number of remaining single mendelian violations in trios: 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of remaining double mendelian violations in trios: 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of complete pair-genotypes: 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of pair-genotypes containing no call(s): 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of pair-genotypes phased: 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of resulting Het/Het pairs: 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of remaining mendelian violations in pairs: 0 
    INFO  10:31:50,844 PhaseByTransmission - Number of genotypes updated: 0 
    INFO  10:31:53,619 ProgressMeter -            done        1.12e+05   24.0 s        3.6 m     97.6%        24.0 s     0.0 s 
    INFO  10:31:53,619 ProgressMeter - Total runtime 24.32 secs, 0.41 min, 0.01 hours 
    INFO  10:31:55,003 GATKRunReport - Uploaded run statistics report to AWS S3 
    