Include VCF filename in the error message when VCF file format is not recognized

mmokrejs, Czech Republic (Member)

Hi,
I ran RealignerTargetCreator with multiple '-known file.vcf' arguments. One of the files causes a problem, and the error message is not very helpful because it does not say which file it is:

GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T RealignerTargetCreator --num_threads 4 --num_cpu_threads_per_data_thread 4 -R ftp.broadinstitute.org/bundle/hg38/hg38bundle/Homo_sapiens_assembly38.fasta -I normal.bam -I cancer.bam -known ftp.broadinstitute.org/bundle/hg38/hg38bundle/Homo_sapiens_assembly38.known_indels.vcf -known ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/other_mapping_resources/ALL_20141222.dbSNP142_human_GRCh38.snps.vcf -known ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/other_mapping_resources/Mills_and_1000G_gold_standard.indels.b38.primary_assembly.vcf -known ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/other_mapping_resources/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf -known ftp.ncbi.nih.gov/snp/organisms/human_9606_b147_GRCh38p2/VCF/GATK/common_all_20160527.vcf -known ussd-ftp.illumina.com/2016-1.0/hg38/small_variants/NA12877/NA12877.vcf -known ussd-ftp.illumina.com/2016-1.0/hg38/small_variants/NA12878/NA12878.vcf -o sample.forIndelRealigner.intervals

...

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.6-0-g89b7209):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Your input file has a malformed header: Unexpected tag POS in line <ID=POS=POS-1,Number=0,Type=Flag,Description="POS has been adjusted due to missing REF in NCBI VCF file">
ERROR ------------------------------------------------------------------------------------------
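Until the message names the file, one way to narrow it down is to scan the header of each VCF yourself. Below is a minimal Python sketch, not GATK's or htsjdk's actual parser: it assumes the offending line is an INFO or FORMAT meta line whose ID is malformed (as in `ID=POS=POS-1` above), and it checks IDs against a pattern modelled on the key pattern in the VCF 4.3 specification. The function names are hypothetical.

```python
import gzip
import re

# Structured meta lines such as ##INFO=<ID=...,Number=...,Type=...,...>
STRUCTURED = re.compile(r"^##(INFO|FORMAT)=<(.*)>\s*$")
# First ID=... field inside the angle brackets (value runs up to the next comma)
ID_FIELD = re.compile(r"(?:^|,)ID=([^,]*)")
# Assumption: key pattern modelled on the VCF 4.3 specification
VALID_ID = re.compile(r"^([A-Za-z_][0-9A-Za-z_.]*|1000G)$")


def open_text(path):
    """Open a plain or gzip-compressed VCF as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "rt", encoding="utf-8", errors="replace")


def suspicious_header_lines(path):
    """Return (line_number, line) pairs whose ID is missing or looks malformed."""
    problems = []
    with open_text(path) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.startswith("##"):
                break  # meta lines end at the #CHROM line
            structured = STRUCTURED.match(line)
            if not structured:
                continue  # e.g. ##fileformat=..., not checked here
            found = ID_FIELD.search(structured.group(2))
            if found is None or not VALID_ID.match(found.group(1)):
                problems.append((number, line.rstrip("\n")))
    return problems


def report(paths):
    """Map each file that has suspicious header lines to those lines."""
    result = {}
    for path in paths:
        problems = suspicious_header_lines(path)
        if problems:
            result[str(path)] = problems
    return result
```

Calling `report([...])` with the paths given to `-known` returns only the files that contain a suspicious line, together with the line numbers. This is a heuristic: a file that passes may still be rejected by GATK for other reasons.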

Further, the documentation at https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_indels_IndelRealigner.php does not say that GVF files cannot be used. I thought I could include them as well, since they also contain annotated indels.

For GRCh38 I am thinking of:
ftp.ensembl.org/pub/release-85/variation/gvf/homo_sapiens/Homo_sapiens_structural_variations.gvf.gz
ftp.ensembl.org/pub/release-85/variation/gvf/homo_sapiens/Homo_sapiens.gvf.gz

For hg19 I am thinking of:
ftp.ensembl.org/pub/release-75/variation/gvf/homo_sapiens/1000GENOMES-phase_1_EUR.gvf.gz
ftp.ensembl.org/pub/release-75/variation/gvf/homo_sapiens/Homo_sapiens_structural_variations.gvf.gz
ftp.ensembl.org/pub/release-75/variation/gvf/homo_sapiens/Homo_sapiens.gvf.gz

I admit I am not familiar with their contents, but I thought GATK's RealignerTargetCreator, IndelRealigner and BaseRecalibrator would cope with any redundancy in their contents and pick the lines each of them needs.

Thank you for your thoughts,
Martin

Answers

  • mmokrejs, Czech Republic (Member)
    edited August 2016

    In case GVF or gVCF files are supported, maybe the code that generates this error message should be updated as well?

    java -jar GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T RealignerTargetCreator ...

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 3.6-0-g89b7209):
    ERROR
    ERROR This means that one or more arguments or inputs in your command are incorrect.
    ERROR The error message below tells you what is the problem.
    ERROR
    ERROR If the problem is an invalid argument, please check the online documentation guide
    ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ERROR
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://www.broadinstitute.org/gatk
    ERROR
    ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ERROR
    ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file 'ftp.ensembl.org/pub/release-85/variation/gvf/homo_sapiens/Homo_sapiens_structural_variations.gvf.gz' could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:
    ERROR Name FeatureType Documentation
    ERROR BCF2 VariantContext (this is an external codec and is not documented within GATK)
    ERROR VCF VariantContext (this is an external codec and is not documented within GATK)
    ERROR VCF3 VariantContext (this is an external codec and is not documented within GATK)
    ERROR ------------------------------------------------------------------------------------------
  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)
    Sorry, GVF is not supported.
  • mmokrejs, Czech Republic (Member)

    Here are the files that cause the error reported initially, probably because they follow the GRCh naming convention?

    --knownSites ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/other_mapping_resources/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf

    --knownSites ftp.broadinstitute.org/bundle/hg38/hg38bundle/dbsnp_144.hg38.vcf
    --knownSites ussd-ftp.illumina.com/2016-1.0/hg38/small_variants/NA12877/NA12877.vcf
    --knownSites ussd-ftp.illumina.com/2016-1.0/hg38/small_variants/NA12878/NA12878.vcf
    --knownSites ftp.ensembl.org/pub/release-85/variation/gvf/homo_sapiens/Homo_sapiens.gvf.gz
    --knownSites ftp.ensembl.org/pub/release-85/variation/gvf/homo_sapiens/Homo_sapiens_structural_variations.gvf.gz

    FYI, here is what works:

    reference_flatfile="ftp.broadinstitute.org/bundle/hg38/hg38bundle/Homo_sapiens_assembly38.fasta"

    known_indel_calls="-known ftp.broadinstitute.org/bundle/hg38/hg38bundle/Homo_sapiens_assembly38.known_indels.vcf -known ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/other_mapping_resources/Mills_and_1000G_gold_standard.indels.b38.primary_assembly.vcf"

    known_sites="--knownSites ftp.broadinstitute.org/bundle/hg38/hg38bundle/dbsnp_144.hg38.Broad-style.vcf --knownSites ussd-ftp.illumina.com/2016-1.0/hg38/small_variants/NA12877/NA12877.vcf --knownSites ussd-ftp.illumina.com/2016-1.0/hg38/small_variants/NA12878/NA12878.vcf --knownSites ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/other_mapping_resources/Mills_and_1000G_gold_standard.indels.b38.primary_assembly.vcf"
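The naming hypothesis raised in the thread (GRCh-style contig names such as `1` versus hg38-style names such as `chr1`) can be checked directly, without running GATK. The following is a rough Python sketch under stated assumptions: the reference has a samtools-style `.fai` index next to it, and the VCF is plain text or gzip-compressed. The function names are hypothetical, and the check says nothing about malformed headers.

```python
import gzip
import re

# ID value of a ##contig=<ID=...,length=...> header line
CONTIG_ID = re.compile(r"^##contig=<.*?\bID=([^,>]+)")


def reference_contigs(fai_path):
    """Contig names from a .fai index (first tab-separated column)."""
    names = set()
    with open(fai_path, "rt") as handle:
        for line in handle:
            if line.strip():
                names.add(line.split("\t", 1)[0].strip())
    return names


def vcf_contigs(vcf_path):
    """Contig names declared in ##contig lines or used in the CHROM column."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    names = set()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                declared = CONTIG_ID.match(line)
                if declared:
                    names.add(declared.group(1))
            elif not line.startswith("#") and line.strip():
                # Data record: CHROM is the first tab-separated field
                names.add(line.split("\t", 1)[0].strip())
    return names


def contigs_missing_from_reference(vcf_path, fai_path):
    """Sorted contig names that the VCF uses but the reference index lacks."""
    return sorted(vcf_contigs(vcf_path) - reference_contigs(fai_path))
```

An empty result means every contig name in the VCF also appears in the reference index. A result such as `['1', '2', ...]` would point to a naming mismatch. Note that the sketch reads the whole VCF, so it can be slow on large files.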
