# Wrong number of fields in PED files in PhaseByTransmission

Hello all,

While using the walker PhaseByTransmission I always get this error:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.1-12-ga99c19d):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR
##### ERROR MESSAGE: File associated with name java.io.FileReader@5cf7c5b5 is malformed: Bad PED line 1: wrong number of fields
##### ERROR ------------------------------------------------------------------------------------------


My command is:

java -jar GenomeAnalysisTK-2.1-12-ga99c19d/GenomeAnalysisTK.jar -T PhaseByTransmission -R GRCh37.fasta -V trios_457.chr22.vcf -ped trios_457.chr22.ped -pedValidationType SILENT -o o1.vcf


And my PED file looks like this:

fam1    s_4     0       0       1       1       C       C       C       C       G       G
fam1    s_5     0       0       2       2       T       T       T       T       G       G
fam1    s_7     s_4     s_5     2       2       C       T       C       T       G       G


I counted the fields and lines in my VCF, PED and MAP files, and the result is:

-bash-4.1$ head -1 trios_457.chr22.ped | wc -w
1892    # (6 columns for info + 943*2 columns for alleles)
-bash-4.1$ wc -l trios_457.chr22.map
943
-bash-4.1$ grep -v "#" trios_457.chr22.vcf | wc -l
943

My question is: what's wrong with my PED line?

## Answers

(edited January 2013)

Besides, I created my PED file using vcftools:

vcftools --vcf trios_457.chr22.vcf --plink --out trios_457.chr22.ped

I changed some columns so that the PED file looks like this:

fam1 s_4 0 0 1 1 C C C C G G ……
fam1 s_5 0 0 2 2 T T T T G G .......
fam1 s_7 s_4 s_5 2 2 C T C T G G ......

Hi there, could you please try again with the latest version of GATK? This may be a bug that was fixed since 2.1.

Geraldine Van der Auwera, PhD

Oh, I tried again with 2.3-5, and the error message is the same. Is there anything wrong with my PED file, and can I generate it using vcftools?

In addition, Geraldine, I ran into another problem using ReadBackedPhasing. Does it mean that there are some problems in my VCF file?

java -jar GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReadBackedPhasing -R GRCh37.fasta -I 457.sort.bam --variant trios_457.all.vcf -L trios_457.all.vcf -o RBPphased_all.vcf --phaseQualityThresh 20.0

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.NoClassDefFoundError: com/sun/javadoc/ProgramElementDoc
at org.broadinstitute.sting.utils.exceptions.UserException$ReadMissingReadGroup.(UserException.java:281)
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
... 16 more

##### ERROR ------------------------------------------------------------------------------------------



Thank you, Geraldine! I've solved the problem.
For the first question, I checked the source code on GitHub and found that the PED file GATK expects is not the PLINK PED file. It contains only the first 6 columns of a PLINK-format PED file and no alleles, like a FAM file in PLINK. So I suggest the team make that clear in the workflows.
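
For readers hitting the same error, the fix described above could be sketched roughly as follows: keep only the first six whitespace-separated columns of the PLINK-style file and drop the allele columns. This is a minimal sketch, not an official GATK recipe; the file names `plink.ped` and `pedigree.ped` are hypothetical, and the sample rows are the three lines from the question.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)

# A small PLINK-style PED file: 6 pedigree columns followed by allele columns.
printf '%s\n' \
  'fam1 s_4 0 0 1 1 C C C C G G' \
  'fam1 s_5 0 0 2 2 T T T T G G' \
  'fam1 s_7 s_4 s_5 2 2 C T C T G G' > "$workdir/plink.ped"

# Keep only family ID, individual ID, father, mother, sex and phenotype.
awk -v OFS='\t' '{ print $1, $2, $3, $4, $5, $6 }' \
  "$workdir/plink.ped" > "$workdir/pedigree.ped"

# Show the resulting 6-column pedigree file.
cat "$workdir/pedigree.ped"
```

The resulting file (here `pedigree.ped`) is what would then be passed to `-ped` in place of the full PLINK file.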

For my second question, the cause was a mismatch between the @RG lines in my BAM header and the RG tags in the records. Thanks for your help!
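
A read-group mismatch like the one just described can be checked by comparing the IDs declared on `@RG` header lines with the `RG:Z:` tags carried by the records. The sketch below does this on a tiny, made-up SAM text file (`toy.sam`, with hypothetical read groups `rg1` and `rg2`); for a real BAM one would presumably feed in the output of `samtools view -h` instead.

```shell
workdir=$(mktemp -d)

# Toy SAM file (hypothetical data): the header declares read group rg1,
# but the second record is tagged with rg2.
printf '@HD\tVN:1.4\n@RG\tID:rg1\tSM:s_7\nread1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tRG:Z:rg1\nread2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tRG:Z:rg2\n' > "$workdir/toy.sam"

# Read-group IDs declared in the header (@RG lines, ID: field).
awk -F'\t' '/^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^ID:/) print substr($i, 4) }' \
  "$workdir/toy.sam" | sort -u > "$workdir/header_ids.txt"

# Read-group IDs used by the records (optional fields start at column 12).
awk -F'\t' '!/^@/ { for (i = 12; i <= NF; i++) if ($i ~ /^RG:Z:/) print substr($i, 6) }' \
  "$workdir/toy.sam" | sort -u > "$workdir/record_ids.txt"

# IDs that appear in records but are missing from the header.
comm -13 "$workdir/header_ids.txt" "$workdir/record_ids.txt"
# prints: rg2
```

Any ID printed by the last command is used by at least one read without a matching `@RG` header line.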

Thanks for reporting your solution; I'll update the documentation accordingly.

Geraldine Van der Auwera, PhD

Hi all,

I got the same error message. I tried the solutions above, but I still get the error.

As described above, I created the PED file with VCFtools, and I used it in the same way.

The error is still the same.

Is there an alternative solution?

Hi Murthi,

Geraldine Van der Auwera, PhD

M41 G6 0 0 1 1 Korean
M41 G5 0 0 2 1 Korean
M41 G4 G6 G5 1 2 Korean

The file created by VCFtools

G4.variant G4.variant 0 0 0 0 0 0 G A C T A G A G 0 0......
G5.variant2 G5.variant2 0 0 0 0 0 0 0 0 C T 0 0 0 0 AC A ......
G6.variant3 G6.variant3 0 0 0 0 C C 0 0 C T 0 0 A G 0 0........
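
Since the VCFtools output above carries allele columns after the pedigree columns, a quick field-count check may help locate the lines behind "wrong number of fields". The sketch below assumes, as the solution earlier in the thread reports, that six columns per line are expected; the file name `check.ped` and its contents are made up for illustration.

```shell
workdir=$(mktemp -d)

# Hypothetical test file: line 2 still carries allele columns.
printf '%s\n' \
  'fam1 s_4 0 0 1 1' \
  'fam1 s_5 0 0 2 2 T T T T G G' \
  'fam1 s_7 s_4 s_5 2 2' > "$workdir/check.ped"

# Report every line whose whitespace-separated field count is not 6.
awk 'NF != 6 { printf "line %d: %d fields\n", NR, NF }' \
  "$workdir/check.ped" > "$workdir/report.txt"

cat "$workdir/report.txt"
# prints: line 2: 12 fields
```

An empty report means every line has exactly six fields.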

