GATK GenomicsDBImport error : Duplicate field name AF found in vid attribute "fields"

emixaM (France, Member)

Hello GATK team!

I am currently following your best practices for Mutect2 somatic calling. While creating a PoN, my normal samples' gVCFs were generated without any problem.

However, at the step before CreateSomaticPanelOfNormals, I need to use GenomicsDBImport.

This step is not working and I am running out of ideas.

This is the command I used:

gatk --java-options "-XX:+UseParallelGC -XX:ParallelGCThreads=4 -Xmx14g -Xms14g" GenomicsDBImport \
   -R Homo_sapiens_assembly38.fasta \
   -V 1.vcf.gz -V 2.vcf.gz -V 3.vcf.gz -V 4.vcf.gz -V 5.vcf.gz -V 6.vcf.gz -V 7.vcf.gz -V 8.vcf.gz -V 9.vcf.gz -V 10.vcf.gz -V 11.vcf.gz -V 12.vcf.gz -V 13.vcf.gz -V 14.vcf.gz -V 15.vcf.gz -V 16.vcf.gz -V 17.vcf.gz -V 18.vcf.gz -V 19.vcf.gz -V 20.vcf.gz -V 21.vcf.gz -V 22.vcf.gz -V 23.vcf.gz -V 24.vcf.gz \
   -L wgs_calling_regions.hg38.interval_list \
   --genomicsdb-workspace-path pon_db

And the last lines of the log are:

16:04:31.323 INFO  ProgressMeter - Starting traversal
16:04:31.323 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Batches Processed   Batches/Minute
16:04:33.428 INFO  GenomicsDBImport - Importing batch 1 with 24 samples
Duplicate field name AF found in vid attribute "fields"
terminate called after throwing an instance of 'FileBasedVidMapperException'
  what():  FileBasedVidMapperException : Duplicate fields exist in vid attribute "fields"

It is true that if I look for occurrences of AF in the header of one of the VCFs, I find:

$ zgrep "AF" 1.vcf.gz
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="negative-log-10 population allele frequencies of alt alleles">

What do you think the issue is?
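The zgrep above matches any header line containing "AF", including POPAF. A stricter check is to list only the IDs that are declared in both an INFO line and a FORMAT line. The following is a minimal sketch of that check; the function name `shared_info_format_ids` is made up for illustration, and it only reports the overlap without saying anything about whether that overlap is what GenomicsDBImport rejects.

```shell
# Print every ID that is declared in both an INFO and a FORMAT header line.
# Reads VCF text on stdin; lines that are not INFO/FORMAT declarations are ignored.
shared_info_format_ids() {
  grep -E '^##(INFO|FORMAT)=<ID=' |
    # Reduce each declaration to "<ID> <INFO|FORMAT>".
    sed -E 's/^##(INFO|FORMAT)=<ID=([^,>]+).*/\2 \1/' |
    # Collapse repeated declarations of the same ID within the same section.
    sort -u |
    # An ID that is left with two entries appears in both sections.
    awk '{ seen[$1]++ } END { for (id in seen) if (seen[id] > 1) print id }' |
    sort
}

# Usage with the file name from the post:
#   gzip -dc 1.vcf.gz | shared_info_format_ids
```

On the three header lines shown above, this prints AF and leaves POPAF out, since POPAF is declared only as an INFO field.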

  • emixaM (France, Member)

    Thanks a lot @amjadd !

    I experienced exactly the same thing: GenomicsDBImport complained about MNPs, and I removed them with SelectVariants. So I dropped the SelectVariants step and instead used --max-mnp-distance 0 when calling the normals for the PoN, and it worked beautifully.
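For readers looking for the shape of the commands, the workaround described above can be sketched roughly as follows. This is not the exact command line anyone in the thread ran: the file names (reference.fasta, intervals.interval_list, normal1.bam, pon.vcf.gz), the helper names `run` and `pon_workflow`, and the DRY_RUN switch are all placeholders, and it assumes a GATK release in which Mutect2 accepts a single normal BAM without a sample-name argument. By default the sketch only prints the commands.

```shell
# Print the command instead of running it unless DRY_RUN=0 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# pon_workflow REF INTERVALS SAMPLE... (hypothetical helper; expects SAMPLE.bam files)
pon_workflow() {
  ref="$1"; intervals="$2"; shift 2
  vcf_args=""
  # 1. Call each normal with MNP merging disabled, so no SelectVariants step is needed.
  for sample in "$@"; do
    run gatk Mutect2 -R "$ref" -I "$sample.bam" --max-mnp-distance 0 -O "$sample.vcf.gz"
    vcf_args="$vcf_args -V $sample.vcf.gz"
  done
  # 2. Import the per-normal VCFs directly ($vcf_args is left unquoted so it splits into words).
  run gatk GenomicsDBImport -R "$ref" -L "$intervals" \
    --genomicsdb-workspace-path pon_db $vcf_args
  # 3. Build the panel of normals from the GenomicsDB workspace.
  run gatk CreateSomaticPanelOfNormals -R "$ref" -V gendb://pon_db -O pon.vcf.gz
}

# Usage (prints four gatk command lines):
#   pon_workflow reference.fasta intervals.interval_list normal1 normal2
```

Printing by default is a deliberate choice here: it lets you check the generated command lines against your own paths before spending hours of compute on them.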


  • Rishabh (India, Member)
    > @amjadd said:
    > I got the same error when I used GenomicsDBImport on the output of SelectVariants. In the beginning, GenomicsDBImport was complaining about MNPs, so I removed them with SelectVariants and got this error. I finally got around it by using --max-mnp-distance 0 in Mutect2 while creating the panel

    Can you please tell me the command you used? My problem is still not resolved.
  • emixaM (France, Member)

    @Rishabh see my comment above, it answers your question.

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    @Rishabh let me know if that resolved the issue.

  • Rishabh (India, Member)
    It's still not resolved. A new error popped up.

    java -jar /PathToGatk GenomicsDBImport -R Hg19_Reference.fa -L V6.bed --genomicsdb-workspace-path pon_db -V normal1.vcf.gz -V normal2.vcf.gz -V normal3.vcf.gz -V normal4.vcf.gz -V normal5.vcf.gz

    I ran this command. It ran for 9 hours and then failed with the error pasted below, and the process used up my 500 GB of disk space with tmp files. Can you help me figure out where I went wrong?

    terminate called after throwing an instance of 'GenomicsDBConfigException'
    what(): GenomicsDBConfigException : Syntax error in JSON file /tmp/loader_1078290248841257395.json
    Aborted (core dumped)

    real 506m47.078s
    user 363m31.020s
    sys 29m9.088s
  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @Rishabh

    Can you please post the entire error log?
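One way to capture the entire log for a post like this is to send both stdout and stderr of the run to a file. The sketch below is a generic shell wrapper, not a GATK feature; the name `run_logged` and the log file name are made up for illustration.

```shell
# Run a command, saving everything it prints (stdout and stderr) to a log file.
# The command's exit status is appended to the log and returned to the caller.
run_logged() {
  log="$1"; shift
  "$@" > "$log" 2>&1 && status=0 || status=$?
  echo "exit status: $status" >> "$log"
  return "$status"
}

# Usage with the command from the post above:
#   run_logged genomicsdbimport.log java -jar /PathToGatk GenomicsDBImport \
#     -R Hg19_Reference.fa -L V6.bed --genomicsdb-workspace-path pon_db \
#     -V normal1.vcf.gz -V normal2.vcf.gz -V normal3.vcf.gz -V normal4.vcf.gz -V normal5.vcf.gz
```

The exit status is recorded because messages such as "Aborted (core dumped)" are printed by the calling shell rather than by the program, so they would not otherwise end up in the file.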

