GenotypeGVCFs Error:

Chayan, Lund University, Sweden (Member)
edited January 2017 in Ask the GATK team

Hi, I was trying to run GenotypeGVCFs on my data as follows (GATK version 3.7):

java -jar /var/bin/GenomeAnalysisTK.jar -T GenotypeGVCFs -R RefGenome.fa --variant a-45.g.vcf --variant a-8.g.vcf --variant e-18.g.vcf --variant e-55.g.vcf --variant e-69.g.vcf --variant e-854.g.vcf --variant e-98.g.vcf -stand_call_conf 30 -o 7samples.vcf

And I got an error like this:

The list of input alleles must contain <NON_REF> as an allele but that is not the case at position 4988; please use the Haplotype Caller with gVCF output to generate appropriate records

But I produced the *.g.vcf files using the Haplotype Caller. Kindly correct me if I am wrong, or suggest a solution. Thank you.

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    What was the command line you used to produce those files? Did you use the -ERC argument?
  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    Ah, there's your problem. Have a look at our best practices documentation; you'll see it specifies that you need to invoke HaplotypeCaller in "GVCF mode", with "-ERC GVCF". Otherwise what you produce is just a regular vcf.
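
    A quick way to tell the two kinds of file apart: in a GVCF written by HaplotypeCaller with -ERC GVCF, every record lists <NON_REF> in the ALT column, whereas a regular VCF does not. The following is a minimal sketch, not part of GATK; the function name is_gvcf is hypothetical, and it assumes an uncompressed, tab-delimited file.

    ```bash
    # Hypothetical helper, not a GATK tool: succeeds only if the file has at least
    # one record and every record lists <NON_REF> in the ALT column (column 5).
    # Assumes an uncompressed, tab-delimited VCF.
    is_gvcf() {
        awk -F'\t' '
            /^#/ { next }                          # skip header lines
            { n++; if ($5 !~ /<NON_REF>/) bad++ }  # count records lacking <NON_REF>
            END { exit !(n > 0 && bad == 0) }
        ' "$1"
    }
    ```

    For example, `is_gvcf a-45.g.vcf || echo "regular VCF"` would flag an input that needs to be regenerated with -ERC GVCF before running GenotypeGVCFs.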
  • Chayan, Lund University, Sweden (Member)

    Thank you ... :) I will try your suggestion.

  • evetc (Member)
    Hello @Geraldine_VdAuwera ,

    I am very new to GATK and I seem to be receiving this same error, although I have specified "-ERC GVCF" in my HaplotypeCaller command as below (the --annotation part is commented out because it causes another error):
    ```
    for f in *.bam;
    do
    samtools index $f
    name=$(echo "$f" | cut -f 1 -d '.');
    #echo $name.g.cg.gz
    time ~/bin/gatk/gatk --java-options "-Xmx4g" HaplotypeCaller\
    -R ~/Pararge_aegeria/PA_genome_CW/index/Pararge_aegeria_v2.softmasked.fa\
    -I $f\
    -O $name.g.vcf \
    --create-output-variant-index\
    #--annotation MappingQualityRankSumTest \
    #--annotation StrandBiasBySample \
    #--annotation QualByDepth \
    #--annotation ReadPosRankSumTest \
    #--annotation RMSMappingQuality \
    #--annotation FisherStrand \
    #--annotation Coverage \
    -ERC GVCF
    done
    ```
    I then ran GenomicsDBImport like this:
    ```
    samples=$(find . | sed 's/.\///' | grep -E 'g.vcf$' | sed 's/^/--variant /')
    ~/bin/gatk/gatk --java-options "-Xmx36G" GenomicsDBImport \
    $(echo $samples)\
    --genomicsdb-workspace-path my_database1 \
    --intervals ~/Pararge_aegeria/PA_genome_CW/index/Pararge_aegeria_v2.softmasked.fa.bed
    ```

    When I run:
    ```
    ~/bin/gatk/gatk GenotypeGVCFs \
    -R ~/Pararge_aegeria/PA_genome_CW/index/Pararge_aegeria_v2.softmasked.fa \
    -V gendb://my_database1 \
    -G StandardAnnotation -new-qual \
    -O test_output.vcf
    ```

    It outputs this error, along with many warnings like these:
    ```
    WARNING: No valid combination operation found for INFO field AC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field AF - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field AN - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field FS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field QD - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field SOR - the field will NOT be part of INFO fields in the generated VCF records
    ***********************************************************************

    A USER ERROR has occurred: The list of input alleles must contain <NON_REF> as an allele but that is not the case at position 1182; please use the Haplotype Caller with gVCF output to generate appropriate records

    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
    ```

    Any ideas??
    Thanks,
    Eve
  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)
    edited June 24

    Hi @evetc

    Can you please try to use -ERC GVCF before the commented --annotation arguments and see if you get the same error?
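
    A likely reason this helps: in the shell, a line that starts with # is a comment, and a backslash at the end of a comment does not continue the command. The command therefore ends at the first commented --annotation line, so HaplotypeCaller never receives -ERC GVCF, and the shell then tries to run -ERC as a separate command. The sketch below reproduces this without GATK; show_args is a hypothetical stand-in for the gatk launcher that only prints its arguments, and sample.g.vcf is a placeholder name.

    ```bash
    # Hypothetical stand-in for the gatk launcher: prints the arguments it receives.
    show_args() { printf '%s\n' "$*"; }

    # Broken layout: the command ends at the '#' line, so -ERC GVCF is run as a
    # separate (nonexistent) command instead of being passed to the tool.
    broken_out=$(
        show_args HaplotypeCaller \
            -O sample.g.vcf \
            #--annotation Coverage \
            -ERC GVCF 2>/dev/null
    ) || true

    # Suggested layout: every real argument comes before the commented-out lines.
    fixed_out=$(
        show_args HaplotypeCaller \
            -O sample.g.vcf \
            -ERC GVCF
            #--annotation Coverage
    )

    echo "broken: $broken_out"
    echo "fixed:  $fixed_out"
    ```

    With the broken layout the stand-in sees only `HaplotypeCaller -O sample.g.vcf`; with the suggested layout it also sees `-ERC GVCF`.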

  • evetc (Member)
    Hi @bhanuGandham

    OK, I will give that a go now and let you know.
    Thank you
    Eve
  • evetc (Member)
    Apologies for my delayed response. I have been doing fieldwork with limited internet.
    I have run:
    ```
    for f in *.bam;
    do
    samtools index $f
    name=$(echo "$f" | cut -f 1 -d '.');
    #echo $name.g.cg.gz
    time ~/bin/gatk/gatk --java-options "-Xmx4g" HaplotypeCaller\
    -R ~/Pararge_aegeria/PA_genome_CW/index/Pararge_aegeria_v2.softmasked.fa\
    -I $f\
    -O $name.g.vcf \
    --create-output-variant-index\
    -ERC GVCF
    done
    ```

    However, while it is running I receive a lot of these notifications:
    ```WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null```

    What does this mean? Sometimes I receive this notification when I run my test samples, but other times I do not.

    I carried on regardless to see whether I would get the same error as before.
    I ran GenomicsDBImport using the same code as before (although this time calling the database "my_database2").

    GenotypeGVCFs still produced these warnings:
    ```
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    ```

    At the end, the output is as follows, without the error I had previously:
    ```
    18:37:30.357 INFO ProgressMeter - Traversal complete. Processed 1911228 total variants in 466.0 minutes.
    18:37:30.511 INFO GenotypeGVCFs - Shutting down engine
    [30 June 2019 18:37:30 BST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 510.00 minutes.
    ```
    What do all these warnings mean?
    Thank you,
    Eve
  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)
    edited July 1

    Hi @evetc

    The warnings just inform you that, for some variants, a particular annotation cannot be calculated and may be missing. This is not a cause for concern, and you can ignore those WARN messages. As long as the tool ran to completion, you should be fine.

  • evetc (Member)
    OK, thank you @bhanuGandham. Do you know why they could not be calculated? Is it due to the low number of individuals (currently 3) I am testing it on?