
VariantsToBinaryPed and metaData file format?

M_Trieb Member
edited August 2012 in Ask the GATK team

I have had trouble getting the VariantsToBinaryPed tool to work. Below is the error I have been getting. I am using a .vcf file produced by the UnifiedGenotyper, which I subset with SelectVariants to the variants present in at least one of the 11 people in my test file. I checked the test file with VariantValidator and there were no errors.

WARN 14:41:13,090 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has UNBOUNDED but standard is A
INFO 14:41:13,808 GATKRunReport - Uploaded run statistics report to AWS S3
ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 1
at org.broadinstitute.sting.gatk.walkers.variantutils.VariantsToBinaryPed.initialize(VariantsToBinaryPed.java:116)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:47)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:269)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.0-36-gf5c1c1a):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: 1
ERROR ------------------------------------------------------------------------------------------

Also, can you post an example metaData file? I have tried reformatting my .fam file in a number of ways to see whether that is the problem, but I haven't been able to overcome this array-out-of-bounds error.

Thank you

Answers

  • chartl Member

    Hi there,

    Would you mind posting your command line, and the first few lines of your fam file?

    Thank you

  • Hello,

    The arrayOutOfBounds error has gone away. I changed the suffix on the file and this seems to have fixed it.

    However, I am still not able to use the command! Below is the error output and attached are two different metaData files. I have noticed that it always gives me an error on the first sample from the vcf. In the GATK parsing report, this sample also always has the first character of the family ID column replaced with a different character. I have pasted this output below. Look at column 1 for HapMap_A: it is displayed as lapMap_A. I have tried this with different subsets of individuals and the first character is always misformatted, often with non-ASCII characters. The command is at the top. The formatting of rows 1 and 6 of the metaData is OK; they just pasted incorrectly.

    java -jar ~/Genomic_Tools/GenomeAnalysisTK-2.0-39-gd091f72/GenomeAnalysisTK.jar -T VariantsToBinaryPed -R ~/Genomic_Tools/Human_References/human_g1k_v37.fasta --variant HapMap_s.only.BIALLELIC.Ann.vcf --metaData HapMap_only_both.fam -mgq 0

    INFO 15:27:11,250 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 15:27:11,251 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.0-39-gd091f72, Compiled 2012/08/10 15:55:35
    INFO 15:27:11,251 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 15:27:11,251 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 15:27:11,252 HelpFormatter - Program Args: -T VariantsToBinaryPed -R /home/m-triebwasser/Genomic_Tools/Human_References/human_g1k_v37.fasta --variant HapMap_s.only.BIALLELIC.Ann.vcf --metaData HapMap_only_both.fam -mgq 0
    INFO 15:27:11,252 HelpFormatter - Date/Time: 2012/08/20 15:27:11
    INFO 15:27:11,252 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 15:27:11,252 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 15:27:11,256 ArgumentTypeDescriptor - Dynamically determined type of HapMap_s.only.BIALLELIC.Ann.vcf to be VCF
    INFO 15:27:11,262 GenomeAnalysisEngine - Strictness is SILENT
    INFO 15:27:11,315 RMDTrackBuilder - Loading Tribble index from disk for file HapMap_s.only.BIALLELIC.Ann.vcf
    lapMap_A HapMap_A 0 0 0 2
    HapMap_B HapMap_B 0 0 0 2
    HapMap_C HapMap_C 0 0 0 2
    HapMap_D HapMap_D 0 0 0 2
    HapMap_E HapMap_E 0 0 0 1
    HapMap_F HapMap_F 0 0 0 1
    HapMap_G HapMap_G 0 0 0 1
    HapMap_H HapMap_H 0 0 0 1
    INFO 15:27:11,965 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 2.0-39-gd091f72):
    ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ERROR Please do not post this error to the GATK forum
    ERROR
    ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: No metadata provided for sample HapMap_A
    ERROR ------------------------------------------------------------------------------------------
  • Hello, I thought I had attached this file to the previous comment, but I had pasted it instead. The file is named HapMap_only_both.fam:

    HapMap_A HapMap_A 0 0 0 2
    HapMap_B HapMap_B 0 0 0 2
    HapMap_C HapMap_C 0 0 0 2
    HapMap_D HapMap_D 0 0 0 2
    HapMap_E HapMap_E 0 0 0 1
    HapMap_F HapMap_F 0 0 0 1
    HapMap_G HapMap_G 0 0 0 1
    HapMap_H HapMap_H 0 0 0 1

    It is as it appears in the above comment, except here the first character is an H versus an l.

    Thank you
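The thread does not establish why the first character of the first line changes when GATK echoes the metaData file. One thing that may be worth ruling out is a byte-order mark or other stray non-ASCII byte at the very start of the file, which text editors usually do not display. The following is a minimal sketch for inspecting the raw bytes; the function name `describe_leading_bytes` is hypothetical, and the BOM explanation is an assumption rather than a confirmed diagnosis.

```python
import codecs

# Byte-order marks that sometimes end up at the start of text files
# saved by spreadsheets or some editors (assumed candidates, not exhaustive).
KNOWN_BOMS = {
    "UTF-8": codecs.BOM_UTF8,
    "UTF-16 LE": codecs.BOM_UTF16_LE,
    "UTF-16 BE": codecs.BOM_UTF16_BE,
}


def describe_leading_bytes(data: bytes) -> str:
    """Report whether raw file contents begin with a BOM or another non-ASCII byte."""
    if not data:
        return "is empty"
    for name, bom in KNOWN_BOMS.items():
        if data.startswith(bom):
            return f"starts with a {name} byte-order mark"
    if data[0] > 0x7F:
        return f"starts with non-ASCII byte 0x{data[0]:02x}"
    return "starts with plain ASCII"
```

To use it on a real file, read the file in binary mode, for example `describe_leading_bytes(open("HapMap_only_both.fam", "rb").read())`. If a mark is reported, re-saving the file as plain ASCII or UTF-8 without a BOM would be the obvious thing to try.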

  • chartl Member

    Yes. That's the error I was expecting. I've pushed a fix into the walker and added a test for it.

    Best,

    Chris

  • Hello,

    I have tried 2.1-1 and 2.1-2. I am still unable to generate the binary ped file set using the VariantsToBinaryPed command. I am using the same files I referred to above and am getting the same error. The command and GATK output are here and the metaData file provided is below. The same error is returned if the HapMap name is in the first column, the second column (with a different number in each row for the first column), or in both the first and second columns. One thing I noticed is that it no longer reprints the metaData file when initializing.

    Also, all older vcf files (including some of the ones in the GATK Resource Bundle) yield a warning about the VCF header that says "Repairing standard header line for field AF because -- count types disagree...". Is this normal?

    ######command

    java -jar ~/Genomic_Tools/GenomeAnalysisTK-2.1-2-g916702e/GenomeAnalysisTK.jar -T VariantsToBinaryPed -R ~/Genomic_Tools/Human_References/human_g1k_v37.fasta --variant HapMap_s.only.BIALLELIC.Ann.vcf --minGenotypeQuality 10 --metaData HapMap_only_famids.fam --dbsnp ~/Genomic_Tools/Human_References/b37_SNPs/dbsnp_135.b37.vcf --bed GATK_plink_files/HapMap.bed --bim GATK_plink_files/HapMap.bim --fam GATK_plink_files/HapMap.fam

    #####output

    INFO 14:43:55,848 HelpFormatter - --------------------------------------------------------------------------------
    INFO 14:43:55,850 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.1-2-g916702e, Compiled 2012/08/22 05:20:02
    INFO 14:43:55,850 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 14:43:55,850 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 14:43:55,850 HelpFormatter - Program Args: -T VariantsToBinaryPed -R /home/m-triebwasser/Genomic_Tools/Human_References/human_g1k_v37.fasta --variant HapMap_s.only.BIALLELIC.Ann.vcf --minGenotypeQuality 10 --metaData HapMap_only_famids.fam --dbsnp /home/m-triebwasser/Genomic_Tools/Human_References/b37_SNPs/dbsnp_135.b37.vcf --bed GATK_plink_files/HapMap.bed --bim GATK_plink_files/HapMap.bim --fam GATK_plink_files/HapMap.fam
    INFO 14:43:55,850 HelpFormatter - Date/Time: 2012/08/22 14:43:55
    INFO 14:43:55,850 HelpFormatter - --------------------------------------------------------------------------------
    INFO 14:43:55,851 HelpFormatter - --------------------------------------------------------------------------------
    INFO 14:43:55,854 ArgumentTypeDescriptor - Dynamically determined type of HapMap_s.only.BIALLELIC.Ann.vcf to be VCF
    INFO 14:43:55,856 ArgumentTypeDescriptor - Dynamically determined type of /home/m-triebwasser/Genomic_Tools/Human_References/b37_SNPs/dbsnp_135.b37.vcf to be VCF
    INFO 14:43:55,861 GenomeAnalysisEngine - Strictness is SILENT
    INFO 14:43:55,915 RMDTrackBuilder - Loading Tribble index from disk for file HapMap_s.only.BIALLELIC.Ann.vcf
    INFO 14:43:55,947 RMDTrackBuilder - Loading Tribble index from disk for file /home/m-triebwasser/Genomic_Tools/Human_References/b37_SNPs/dbsnp_135.b37.vcf
    WARN 14:43:56,009 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has UNBOUNDED but standard is A
    INFO 14:43:56,754 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 2.1-2-g916702e):
    ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ERROR Please do not post this error to the GATK forum
    ERROR
    ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: No metadata provided for sample HapMap_A
    ERROR ------------------------------------------------------------------------------------------
    #####metaData file
  • chartl Member
    edited August 2012

    Hi,

    Unfortunately I cannot reproduce the error. I pasted the .fam file into a file of the same name, and used an empty VCF file with

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HapMap_A HapMap_B HapMap_C HapMap_D HapMap_E HapMap_F HapMap_G HapMap_H

    as the last line. My run output is:

    INFO 14:11:49,140 HelpFormatter - --------------------------------------------------------------------------------
    INFO 14:11:49,143 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.1-4-ga58e2fb, Compiled 2012/08/22 10:14:23
    INFO 14:11:49,143 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 14:11:49,144 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 14:11:49,144 HelpFormatter - Program Args: -T VariantsToBinaryPed -mgq 10 -R /humgen/1kg/reference/human_g1k_v37.fasta -V HapMap.test.vcf -m HapMap_only_famids.fam -fam foo.fam -bed foo.bed -bim foo.bim
    INFO 14:11:49,144 HelpFormatter - Date/Time: 2012/08/23 14:11:49
    INFO 14:11:49,145 HelpFormatter - --------------------------------------------------------------------------------
    INFO 14:11:49,145 HelpFormatter - --------------------------------------------------------------------------------
    INFO 14:11:49,612 ArgumentTypeDescriptor - Dynamically determined type of HapMap.test.vcf to be VCF
    INFO 14:11:49,624 GenomeAnalysisEngine - Strictness is SILENT
    INFO 14:11:49,741 RMDTrackBuilder - Loading Tribble index from disk for file HapMap.test.vcf
    WARN 14:11:49,756 RMDTrackBuilder - Index file /humgen/gsa-hphome1/chartl/projects/gatk_testing/VariantsToBinaryPed/HapMap.test.vcf.idx is out of date (index older than input file), deleting and updating the index file
    INFO 14:11:49,758 RMDTrackBuilder - Creating Tribble index in memory for file HapMap.test.vcf
    INFO 14:11:49,802 IndexDictionaryUtils - Track /humgen/gsa-hphome1/chartl/projects/gatk_testing/VariantsToBinaryPed/HapMap.test.vcf doesn't have a sequence dictionary built in, skipping dictionary validation
    INFO 14:11:49,805 RMDTrackBuilder - Writing Tribble index to disk for file /humgen/gsa-hphome1/chartl/projects/gatk_testing/VariantsToBinaryPed/HapMap.test.vcf.idx
    INFO 14:11:50,328 VariantsToBinaryPed - 0 sites processed!
    INFO 14:11:50,328 VariantsToBinaryPed - Merging genotypes for HapMap_A
    INFO 14:11:50,329 VariantsToBinaryPed - Merging genotypes for HapMap_B
    INFO 14:11:50,329 VariantsToBinaryPed - Merging genotypes for HapMap_C
    INFO 14:11:50,329 VariantsToBinaryPed - Merging genotypes for HapMap_D
    INFO 14:11:50,330 VariantsToBinaryPed - Merging genotypes for HapMap_E
    INFO 14:11:50,330 VariantsToBinaryPed - Merging genotypes for HapMap_F
    INFO 14:11:50,330 VariantsToBinaryPed - Merging genotypes for HapMap_G
    INFO 14:11:50,330 VariantsToBinaryPed - Merging genotypes for HapMap_H
    INFO 14:11:50,332 TraversalEngine - Total runtime 0.39 secs, 0.01 min, 0.00 hours

    So I would suggest using the most recent unstable version. Sorry!

    Chris
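For anyone who wants to repeat the reproduction attempt described above, here is a minimal sketch that writes a header-only VCF whose last line is the `#CHROM` row with the sample columns. The function name `header_only_vcf` and the `##fileformat=VCFv4.1` line are assumptions for illustration; the post does not say which header lines the actual test file contained.

```python
# The nine fixed columns that precede the per-sample columns in a VCF.
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def header_only_vcf(sample_names, fileformat="VCFv4.1"):
    """Return the text of a VCF with a header and no variant records.

    The fileformat value is an assumed default, not taken from the thread.
    """
    lines = [
        f"##fileformat={fileformat}",
        "\t".join(FIXED_COLUMNS + list(sample_names)),  # VCF columns are tab-separated
    ]
    return "\n".join(lines) + "\n"
```

Writing `header_only_vcf(["HapMap_A", "HapMap_B"])` to a file gives a small input that exercises only the metaData parsing step, since there are no sites to process.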

  • I am having the same problem as @M_Trieb:

    ERROR MESSAGE: No metadata provided for sample 1

    I downloaded the latest version of GATK, version 2.1-8-g5efb575, and am still having the same issue. Because PLINK allows either tab- or space-delimited files, I also tried converting my metaData file to space-delimited. When I did that, I got the array index out of bounds exception that was also mentioned above.

    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace 
    java.lang.ArrayIndexOutOfBoundsException: 1
          at org.broadinstitute.sting.gatk.walkers.variantutils.VariantsToBinaryPed.initialize(VariantsToBinaryPed.java:100)
          at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:47)
          at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
          at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
          at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
          at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
          at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)
    

    To try to isolate the problem I created very simple test files for input. Here is the command line I used:

    java -Xmx8g -jar ~/bin/gatk-2.1-8/GenomeAnalysisTK.jar -R gatk-bundle-1.5-hg19/ucsc.hg19.fasta -T VariantsToBinaryPed --variant test.in.vcf --bed test.out.bed --bim test.out.bim --fam test.out.fam --metaData test.in.fam --dbsnp gatk-bundle-1.5-hg19/dbsnp_135.hg19.vcf --minGenotypeQuality 30 --majorAlleleFirst

    And here is the error I got:

    ERROR MESSAGE: No metadata provided for sample 1

    And attached are the test.in.vcf and test.in.fam files.
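Both errors in this thread appear while the metaData file is being read, and which one appears seems to depend on the delimiter. Before rerunning GATK, it may help to check the inputs independently. The sketch below assumes the usual six-column PLINK .fam layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) and reports lines with the wrong number of fields, mixed delimiters, and VCF samples that do not appear in the file. The thread does not say which column GATK matches sample names against, so the sketch accepts a match in either of the first two columns; that choice and all function names are assumptions, and the code does not reproduce GATK's own parsing.

```python
def parse_fam(text):
    """Split .fam text into (line number, delimiter, fields) for each non-blank line."""
    rows = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        delimiter = "tab" if "\t" in line else "space"
        rows.append((number, delimiter, line.split()))
    return rows


def fam_problems(text):
    """Return human-readable descriptions of layout problems in .fam text."""
    rows = parse_fam(text)
    problems = []
    for number, _delimiter, fields in rows:
        if len(fields) != 6:
            problems.append(f"line {number}: expected 6 fields, found {len(fields)}")
    if len({delimiter for _number, delimiter, _fields in rows}) > 1:
        problems.append("file mixes tab- and space-delimited lines")
    return problems


def vcf_sample_names(chrom_line):
    """Sample names are the tab-separated columns after FORMAT on the #CHROM line."""
    return chrom_line.rstrip("\n").split("\t")[9:]


def samples_without_metadata(chrom_line, fam_text):
    """List VCF samples that appear in neither of the first two .fam columns."""
    known_ids = set()
    for _number, _delimiter, fields in parse_fam(fam_text):
        known_ids.update(fields[:2])
    return [name for name in vcf_sample_names(chrom_line) if name not in known_ids]
```

An empty result from both `fam_problems` and `samples_without_metadata` would suggest the file layout itself is not the issue, which points back at the tool version or at invisible characters in the file.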

  • I am also receiving a similar error. Originally I received the "No metadata provided" error, but I changed the input metadata file to be space-delimited and that seemed to fix that issue. Now I am receiving the ArrayIndexOutOfBoundsException.

    Here is the command I am running:

    java -Xmx2g -jar /data/homes/kamensh/GenomeAnalysisTK-2.1-8-g5efb575/GenomeAnalysisTK.jar \
    -R /data/phil/Ehringer_collab/genome/hg19.fasta \
    -T VariantsToBinaryPed \
    --variant UCB_1.priortarget.bwa.fixed.extracted.vcf \
    --bed UCB_1.ped \
    --bim UCB_1.map \
    --fam UCB_1.fam \
    --metaData Trial.fam \
    --minGenotypeQuality 0 \
    --majorAlleleFirst

    Here is the error information:

    INFO 15:26:28,641 HelpFormatter - --------------------------------------------------------------------------------
    INFO 15:26:28,642 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.1-8-g5efb575, Compiled 2012/08/30 14:22:17
    INFO 15:26:28,642 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 15:26:28,642 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 15:26:28,643 HelpFormatter - Program Args: -R /data/phil/Ehringer_collab/genome/hg19.fasta -T VariantsToBinaryPed --variant UCB_1.priortarget.bwa.fixed.extracted.vcf --bed UCB_1.ped --bim UCB_1.map --fam UCB_1.fam --metaData Trial.fam --minGenotypeQuality 0 --majorAlleleFirst
    INFO 15:26:28,643 HelpFormatter - Date/Time: 2012/09/19 15:26:28
    INFO 15:26:28,643 HelpFormatter - --------------------------------------------------------------------------------
    INFO 15:26:28,643 HelpFormatter - --------------------------------------------------------------------------------
    INFO 15:26:28,647 ArgumentTypeDescriptor - Dynamically determined type of UCB_1.priortarget.bwa.fixed.extracted.vcf to be VCF
    INFO 15:26:28,655 GenomeAnalysisEngine - Strictness is SILENT
    INFO 15:26:28,780 RMDTrackBuilder - Loading Tribble index from disk for file UCB_1.priortarget.bwa.fixed.extracted.vcf
    WARN 15:26:28,819 VCFStandardHeaderLines$Standards - Repairing standard header line for field GQ because -- type disagree; header has Float but standard is Integer
    INFO 15:26:30,476 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.ArrayIndexOutOfBoundsException: 1
    at org.broadinstitute.sting.gatk.walkers.variantutils.VariantsToBinaryPed.initialize(VariantsToBinaryPed.java:100)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:47)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.1-8-g5efb575):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: 1
    ERROR ------------------------------------------------------------------------------------------

    Here is my metadata file:
    UCB_1 UCB_1 0 0 1 1

    Any help would be greatly appreciated!
