Wrong number of fields in PED files in PhaseByTransmission

huboqiang Posts: 5 Member
edited January 2013 in Ask the GATK team

Hello all,

While using the PhaseByTransmission walker, I always get this error:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.1-12-ga99c19d): 
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: File associated with name java.io.FileReader@5cf7c5b5 is malformed: Bad PED line 1: wrong number of fields
##### ERROR ------------------------------------------------------------------------------------------

My command is:

java -jar GenomeAnalysisTK-2.1-12-ga99c19d/GenomeAnalysisTK.jar -T  PhaseByTransmission -R GRCh37.fasta -V trios_457.chr22.vcf -ped trios_457.chr22.ped -pedValidationType SILENT -o o1.vcf

And my PED file looks like this:

fam1    s_4     0       0       1       1       C       C       C       C       G       G
fam1    s_5     0       0       2       2       T       T       T       T       G       G
fam1    s_7     s_4     s_5     2       2       C       T       C       T       G       G

I counted the fields and records in my VCF, PED and MAP files, and the results are:

-bash-4.1$ head -1 trios_457.chr22.ped |wc -w
1892         #( 6 columns for info + 943*2 columns for alleles )
-bash-4.1$ wc -l trios_457.chr22.map 
943 
-bash-4.1$ grep -v "#" trios_457.chr22.vcf | wc -l
943

My question is: what's wrong with my PED line?
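
The count above follows the PLINK layout: 6 leading columns plus 2 allele columns per marker, so 943 markers give 6 + 943*2 = 1892 fields. A minimal Python sketch of that arithmetic, for illustration only; the function names are hypothetical and are not part of GATK, PLINK or vcftools:

```python
def expected_plink_ped_fields(n_markers: int) -> int:
    """Six pedigree/phenotype columns plus two allele columns per marker."""
    return 6 + 2 * n_markers


def has_plink_field_count(line: str, n_markers: int) -> bool:
    """True if a whitespace-delimited PED line has the PLINK field count."""
    return len(line.split()) == expected_plink_ped_fields(n_markers)


# 943 markers, as reported by `wc -l` on the MAP file in this post.
print(expected_plink_ped_fields(943))
```

This only confirms that the file is a consistent PLINK PED file; it says nothing about whether a given tool accepts that layout.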

Best Answers

  • huboqiang Posts: 5 Member
    edited January 2013 Answer ✓

    Thank you, Geraldine! I've solved the problem. For the first question, I checked the source code on GitHub and found that the PED file GATK expects is not the same as a PLINK PED file. It contains only the first 6 columns of a PLINK-format PED file and no alleles, like a PLINK FAM file. I suggest the team make that clear in the workflows.
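
As a rough illustration of the fix described above, the following Python sketch keeps only the first six whitespace-delimited fields of each PLINK-style PED line. The function name `to_six_column_ped` is made up for this example, and the sample rows are the three lines from the original question; writing the result to a file for use with -ped is left out.

```python
def to_six_column_ped(lines):
    """Keep the first six whitespace-delimited fields of each PED line."""
    trimmed = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 fields, got {len(fields)}: {line!r}")
        trimmed.append("\t".join(fields[:6]))
    return trimmed


plink_lines = [
    "fam1 s_4 0 0 1 1 C C C C G G",
    "fam1 s_5 0 0 2 2 T T T T G G",
    "fam1 s_7 s_4 s_5 2 2 C T C T G G",
]

for row in to_six_column_ped(plink_lines):
    print(row)
```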

    As for my second question, the cause was a mismatch between the @RG lines in my BAM header and the RG tags in the records. Thanks for your help!
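
A mismatch of this kind can be looked for by comparing the IDs declared on @RG header lines with the RG:Z: tags carried by the records. The sketch below assumes the BAM has already been converted to SAM text including its header (for example with `samtools view -h`). The function name and the sample reads are hypothetical, and this is not how GATK itself performs the check.

```python
def read_group_mismatches(sam_lines):
    """Return (undeclared_ids, n_missing) for SAM text lines.

    undeclared_ids: RG IDs used by records but absent from @RG header lines.
    n_missing: number of records that carry no RG:Z: tag at all.
    """
    declared = set()
    undeclared = set()
    n_missing = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header section: collect the ID of every @RG line.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        declared.add(field[3:])
            continue
        # Alignment record: optional tags start at the 12th column.
        tags = [f[5:] for f in fields[11:] if f.startswith("RG:Z:")]
        if not tags:
            n_missing += 1
        elif tags[0] not in declared:
            undeclared.add(tags[0])
    return undeclared, n_missing


example = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:lane1\tSM:s_4",
    "r1\t0\t22\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:lane1",
    "r2\t0\t22\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:lane2",
    "r3\t0\t22\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
print(read_group_mismatches(example))
```

The sketch relies on the SAM convention that header lines come before alignment records, so the set of declared IDs is complete by the time the first record is read.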

Answers

  • huboqiang Posts: 5 Member
    edited January 2013

    In addition, I created my PED file using vcftools:

    vcftools --vcf trios_457.chr22.vcf --plink --out trios_457.chr22.ped

    I changed some columns, so the PED file looks like this:

    fam1 s_4 0 0 1 1 C C C C G G ……

    fam1 s_5 0 0 2 2 T T T T G G .......

    fam1 s_7 s_4 s_5 2 2 C T C T G G ......

  • Geraldine_VdAuwera Posts: 6,682 Administrator, GATK Developer

    Hi there, could you please try again with the latest version of GATK? This may be a bug that was fixed since 2.1.

    Geraldine Van der Auwera, PhD

  • huboqiang Posts: 5 Member

    I tried again with 2.3-5, and the error message is the same. Is there anything wrong with my PED file? Can I generate it using vcftools?

  • huboqiang Posts: 5 Member

    In addition, Geraldine, I ran into another problem using ReadBackedPhasing. Does it mean that there is a problem with my VCF file?

    java -jar GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReadBackedPhasing -R GRCh37.fasta -I 457.sort.bam --variant trios_457.all.vcf -L trios_457.all.vcf -o RBPphased_all.vcf --phaseQualityThresh 20.0

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NoClassDefFoundError: com/sun/javadoc/ProgramElementDoc
        at org.broadinstitute.sting.utils.exceptions.UserException$ReadMissingReadGroup.<init>(UserException.java:281)
        at org.broadinstitute.sting.gatk.filters.MalformedReadFilter.checkHasReadGroup(MalformedReadFilter.java:63)
        at org.broadinstitute.sting.gatk.filters.MalformedReadFilter.filterOut(MalformedReadFilter.java:53)
        at org.broadinstitute.sting.gatk.filters.CountingFilteringIterator.getNextRecord(CountingFilteringIterator.java:110)
        at org.broadinstitute.sting.gatk.filters.CountingFilteringIterator.<init>(CountingFilteringIterator.java:58)
        at org.broadinstitute.sting.gatk.datasources.reads.SAMDataSource.applyDecoratingIterators(SAMDataSource.java:718)
        at org.broadinstitute.sting.gatk.datasources.reads.SAMDataSource.getIterator(SAMDataSource.java:633)
        at org.broadinstitute.sting.gatk.datasources.reads.SAMDataSource.getIterator(SAMDataSource.java:587)
        at org.broadinstitute.sting.gatk.datasources.reads.SAMDataSource.seek(SAMDataSource.java:532)
        at org.broadinstitute.sting.gatk.executive.MicroScheduler.getReadIterator(MicroScheduler.java:315)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:70)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
    Caused by: java.lang.ClassNotFoundException: com.sun.javadoc.ProgramElementDoc
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        ... 16 more

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.3-5-g49ed93c):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: com/sun/javadoc/ProgramElementDoc
    ERROR ------------------------------------------------------------------------------------------

    Thanks!

  • Geraldine_VdAuwera Posts: 6,682 Administrator, GATK Developer

    Thanks for reporting your solution; I'll update the documentation accordingly.

    Geraldine Van der Auwera, PhD

  • Murthi South Korea Posts: 2 Member

    Hi all,

    I got the same error message. I tried the solutions above, but I still get the error.

    As described above, I created the PED file with VCFtools and used it.

    The error is still the same.

    Are there any alternative solutions?

  • Geraldine_VdAuwera Posts: 6,682 Administrator, GATK Developer

    Hi Murthi,

    Can you please post the contents of your PED file?

    Geraldine Van der Auwera, PhD

  • Murthi South Korea Posts: 2 Member
    edited September 2013

    M41 G6 0 0 1 1 Korean
    M41 G5 0 0 2 1 Korean
    M41 G4 G6 G5 1 2 Korean

    The file created by VCFtools:

    G4.variant G4.variant 0 0 0 0 0 0 G A C T A G A G 0 0......
    G5.variant2 G5.variant2 0 0 0 0 0 0 0 0 C T 0 0 0 0 AC A ......
    G6.variant3 G6.variant3 0 0 0 0 C C 0 0 C T 0 0 A G 0 0........

  • Geraldine_VdAuwera Posts: 6,682 Administrator, GATK Developer

    Hi Murthi,

    Actually, the PED file to use here is not exactly the same format as the PLINK PED file. For GATK, the PED file should contain only the first 6 columns of a PLINK-format PED file and no alleles, like a FAM file in PLINK. See @huboqiang's posts earlier in this thread.

    Geraldine Van der Auwera, PhD
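
One way to spot lines that would break the six-column rule stated above is to count fields per line before running the walker. This is a small Python sketch under the assumption that exactly six whitespace-delimited fields are expected on every non-blank line; the function name `bad_ped_lines` is hypothetical, and the sample rows are taken from the listings in this thread.

```python
def bad_ped_lines(lines, n_expected=6):
    """Return (line_number, field_count) for non-blank lines with a wrong field count."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if fields and len(fields) != n_expected:
            bad.append((number, len(fields)))
    return bad


sample = [
    "fam1 s_4 0 0 1 1",
    "M41 G6 0 0 1 1 Korean",
    "fam1 s_7 s_4 s_5 2 2 C T C T G G",
]
for number, count in bad_ped_lines(sample):
    print(f"line {number}: {count} fields")
```

Under that assumption, both an extra seventh column and trailing allele columns would be reported, while a plain six-column line passes.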
