GenotypeGVCFs with gzipped gVCF files

Hi, can I combine gzipped gVCF files using GenotypeGVCFs? I believe it produces an error, but I cannot use uncompressed files because of limited disk space.
Is there any other method I can use?
I made one VCF file per sample and tried to merge those VCF files using vcftools, but that failed.

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    It is technically feasible but the files have to be properly gzipped and tabix-indexed. What exactly did you try? Can you post the commands?

  • Kelly135 (Korea; Member)
    edited January 2016

    I bgzipped the gVCF files and then indexed them with tabix:
    bgzip A.g.vcf && tabix A.g.vcf.gz
    bgzip B.g.vcf && tabix B.g.vcf.gz

    Then I ran GenotypeGVCFs with the following command:
    java -Xmx240g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R genome.fa --variant gVCFlist.list --dbsnp dbsnp_132.hg19.vcf -o out.vcf

    And here is the error message:

    INFO 09:25:22,284 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NumberFormatException: For input string: "G"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:492)
    at java.lang.Integer.parseInt(Integer.java:527)
    at htsjdk.tribble.readers.TabixReader.getIntv(TabixReader.java:292)
    at htsjdk.tribble.readers.TabixReader.access$400(TabixReader.java:46)
    at htsjdk.tribble.readers.TabixReader$IteratorImpl.next(TabixReader.java:394)
    at htsjdk.tribble.readers.TabixIteratorLineReader.readLine(TabixIteratorLineReader.java:43)
    at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:161)
    at htsjdk.tribble.TabixFeatureReader$FeatureIterator.<init>(TabixFeatureReader.java:149)
    at htsjdk.tribble.TabixFeatureReader.query(TabixFeatureReader.java:124)
    at org.broadinstitute.gatk.engine.refdata.tracks.RMDTrack.query(RMDTrack.java:119)
    at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.createIteratorFromResource(ReferenceOrderedDataSource.java:241)
    at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.createIteratorFromResource(ReferenceOrderedDataSource.java:185)
    at org.broadinstitute.gatk.engine.datasources.rmd.ResourcePool.iterator(ResourcePool.java:93)
    at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.seek(ReferenceOrderedDataSource.java:168)
    at org.broadinstitute.gatk.engine.datasources.providers.RodLocusView.<init>(RodLocusView.java:83)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.getLocusView(TraverseLociNano.java:129)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:80)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:319)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: For input string: "G"
    ERROR ------------------------------------------------------------------------------------------

    I found that this error has been discussed before, and using uncompressed gVCF files instead of gvcf.gz was recommended. But because of the storage limit, I cannot use uncompressed gVCFs. So I wonder if there is another method I could try. Thanks!
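
One way to check the "properly gzipped and tabix-indexed" condition from the answer above is to inspect the files directly. The following is a minimal Python sketch, not part of GATK or htslib: the function names `is_bgzf` and `read_tbi_header` are hypothetical, and the byte layouts are taken from the BGZF and tabix index format descriptions. It also rests on an unconfirmed assumption about the error: the stack trace fails in `TabixReader.getIntv` while parsing an integer, which would be consistent with an index whose begin/end columns point at a non-numeric VCF column (for example REF or ALT), but the thread does not establish this, nor which input file triggered it.

```python
import gzip
import struct


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header (as written by bgzip)."""
    with open(path, "rb") as fh:
        head = fh.read(12)
        # BGZF blocks are gzip members with FLG = 4 (FEXTRA set).
        if len(head) < 12 or head[:4] != b"\x1f\x8b\x08\x04":
            return False
        xlen = struct.unpack("<H", head[10:12])[0]
        extra = fh.read(xlen)
    # Walk the gzip extra subfields looking for the 'B','C' subfield of length 2.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (66, 67, 2):
            return True
        pos += 4 + slen
    return False


def read_tbi_header(path):
    """Read the fixed-size header of a tabix (.tbi) index and return its column settings."""
    # A .tbi file is itself BGZF-compressed, so the gzip module can read it.
    with gzip.open(path, "rb") as fh:
        raw = fh.read(36)
    if len(raw) < 36 or raw[:4] != b"TBI\x01":
        raise ValueError("not a tabix index: " + path)
    n_ref, fmt, col_seq, col_beg, col_end, meta, skip, l_nm = struct.unpack(
        "<8i", raw[4:36]
    )
    return {
        "n_ref": n_ref,
        "format": fmt,
        "col_seq": col_seq,
        "col_beg": col_beg,
        "col_end": col_end,
        # In the tabix index format, format code 2 marks the VCF preset.
        "vcf_preset": (fmt & 0xFFFF) == 2,
    }
```

Under those assumptions, `is_bgzf("A.g.vcf.gz")` should return True for a file written by bgzip, and `read_tbi_header("A.g.vcf.gz.tbi")["vcf_preset"]` should be True for an index built with the VCF preset (`tabix -p vcf`). If the header instead reports begin/end columns such as 4 and 5, the index was built with different column settings, and rebuilding it with `tabix -p vcf` may be worth trying. The same check applies to any compressed file passed to `--dbsnp`.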
