Picard SortVcf Error

Hello.

I am using GATK version 3.6 and picard-2.8.2.jar.

I downloaded hapmap_3.3.hg38.vcf from the GATK resource bundle. I then used the command below to remove the "chr" prefix from the chromosome names:
awk '{gsub(/^chr/,""); print}' hapmap_3.3.hg38.vcf > no_chr_hapmap_3.3.hg38.vcf.vcf
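The regex `^chr` in that awk command only matches lines that begin with "chr", so it rewrites the data records but not the `##contig=<ID=chr1,...>` header lines. Picard builds the VCF's sequence dictionary from those header lines, which is consistent with the error below still reporting `name=chr1`. A minimal bash sketch that strips the prefix in both places is shown here. `strip_chr_prefix` is a hypothetical helper, and it assumes an uncompressed, tab-delimited VCF on stdin. It does not rename chrM, which some "no chr" references call MT.

```shell
# Hypothetical helper: strip a leading "chr" from the CHROM column of every
# record AND from the ID of every ##contig header line. Reads an uncompressed,
# tab-delimited VCF on stdin and writes the result to stdout.
strip_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^##contig=<ID=chr/ { sub(/<ID=chr/, "<ID="); print; next }  # header contig lines
       /^#/                { print; next }                           # other header lines untouched
                           { sub(/^chr/, "", $1); print }            # data records: CHROM column
  '
}

# Example (file names as in the post):
#   strip_chr_prefix < hapmap_3.3.hg38.vcf > no_chr_hapmap_3.3.hg38.vcf.vcf
```

Any existing index for the output file would then be stale and need regenerating.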

Before (hapmap_3.3.hg38.vcf)
chr1 2242065 rs263526 T C . PASS AC=724;AF=0.259;AN=2792
chr1 2242417 rs16824926 C . . PASS AN=530
chr1 2242880 rs11581436 A . . PASS AN=540

After (no_chr_hapmap_3.3.hg38.vcf.vcf)
1 6421563 rs4908891 G A . PASS AC=1086;AF=0.389;AN=2792
1 6421782 rs4908892 A G . PASS AC=1692;AF=0.606;AN=2792
1 6421856 rs12078257 T C . PASS AC=368;AF=0.132;AN=2790

Then I used Picard SortVcf to sort no_chr_hapmap_3.3.hg38.vcf.vcf:
java -jar picard-2.8.2.jar SortVcf I=removedChr_HapMap.vcf O=sortedHapMap.vcf SEQUENCE_DICTIONARY=hg38.dict

hg38.dict
@SQ SN:1 LN:248956422 UR:file:/media/ubuntu/Elements/TOOL/hg38.fa M5:2648ae1bacce4ec4b6cf337dcae37816
@SQ SN:10 LN:133797422 UR:file:/media/ubuntu/Elements/TOOL/hg38.fa M5:907112d17fcb73bcab1ed1c72b97ce68
@SQ SN:11 LN:135086622 UR:file:/media/ubuntu/Elements/TOOL/hg38.fa M5:1511375dc2dd1b633af8cf439ae90cec
@SQ SN:12 LN:133275309 UR:file:/media/ubuntu/Elements/TOOL/hg38.fa M5:e81e16d3f44337034695a29b97708fce

I then encountered this error:

Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=20) was found when SAMSequenceRecord(name=1,length=248956422,dict_index=0,assembly=null) was expected.
at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:126)
at picard.vcf.SortVcf.doWork(SortVcf.java:95)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=20) was found when SAMSequenceRecord(name=1,length=248956422,dict_index=0,assembly=null) was expected.
at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:170)
at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:124)
... 4 more
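The trace shows htsjdk's `SAMSequenceDictionary.assertSameDictionary` comparing the dictionary taken from the VCF header with hg38.dict entry by entry, so it already fails at the first contig (chr1 versus 1). To find the first offending entry yourself, a rough bash approximation of that comparison might look like this. `first_dict_mismatch` is a hypothetical helper; it compares only names and lengths (htsjdk may also compare MD5s when both sides carry them) and assumes `##contig` attribute values contain no commas.

```shell
# Hypothetical helper: compare the ##contig lines of a VCF header with the
# @SQ lines of a .dict, index by index (names and lengths only), and print the
# first difference. Exit status 0 means no difference was found.
# Usage: first_dict_mismatch in.vcf ref.dict
first_dict_mismatch() {
  awk 'BEGIN { nd = 0; nv = 0; bad = 0 }
       FNR == NR {                                   # first file: the .dict
         if ($1 == "@SQ") {
           sn = ""; ln = ""
           for (i = 2; i <= NF; i++) {
             if ($i ~ /^SN:/) sn = substr($i, 4)
             else if ($i ~ /^LN:/) ln = substr($i, 4)
           }
           dname[nd] = sn; dlen[nd] = ln; nd++
         }
         next
       }
       /^##contig=</ {                               # second file: the VCF header
         s = $0; sub(/^##contig=</, "", s); sub(/>$/, "", s)
         n = split(s, kv, ",")
         name = ""; len = ""
         for (i = 1; i <= n; i++) {
           if (kv[i] ~ /^ID=/) name = substr(kv[i], 4)
           else if (kv[i] ~ /^length=/) len = substr(kv[i], 8)
         }
         if (nv >= nd || name != dname[nv] || len != dlen[nv]) {
           printf "index %d: VCF has %s (length %s), dict has %s (length %s)\n", nv, name, len, (nv < nd ? dname[nv] : "nothing"), (nv < nd ? dlen[nv] : "-")
           bad = 1; exit 1
         }
         nv++
       }
       /^#CHROM/ { exit }                            # end of the VCF header
       END {
         if (!bad && nv != nd) { printf "VCF header has %d contigs, dict has %d\n", nv, nd; exit 1 }
       }' "$2" "$1"
}
```

For example, `first_dict_mismatch removedChr_HapMap.vcf hg38.dict` should name the first index where the two disagree.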

I have tried many times but keep getting the same error. Could you please advise how I can solve this problem?

Afterwards, I would like to run SelectVariants to extract variants that are present in my dataset but missing from HapMap.

Thank you so much in advance.

Cheers,
Moon

GitHub issue 1678, by Sheila (state: closed; closed by chandrans)

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)
    Did you update the index file of your VCF after editing it to remove "chr"? If not, please do so and see whether the error persists.
  • ymoon (Malaysia; Member)

    @Geraldine_VdAuwera Good day, and thanks very much for the suggestion.

    I used IGVtools to index removedChr_HapMap, which generated a new vcf.idx. However, the error persists:

    Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=20) was found when SAMSequenceRecord(name=1,length=248956422,dict_index=0,assembly=null) was expected.
    at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:126)
    at picard.vcf.SortVcf.doWork(SortVcf.java:95)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
    Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chr1,length=248956422,dict_index=0,assembly=20) was found when SAMSequenceRecord(name=1,length=248956422,dict_index=0,assembly=null) was expected.
    at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:170)
    at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:124)
    ... 4 more
    Thank you.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Ah, the assembly property seems to be different. You'll need to edit it to match (replace null with 20 in the header).
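The `name`, `length` and `assembly` values in the error come from the `##contig` lines of the VCF header on one side and the `@SQ` lines of the .dict on the other. To see exactly what the VCF header declares, a small bash sketch like the following may help. `contig_attrs` is a hypothetical helper that prints one tab-separated line per contig, using `null` when no assembly attribute is present (mirroring how the error message prints it), and it assumes attribute values contain no commas.

```shell
# Hypothetical helper: print name, length and assembly (or "null" if absent)
# for each ##contig line of a VCF header, tab-separated.
# Assumes attribute values contain no commas.
# Usage: contig_attrs in.vcf | head
contig_attrs() {
  awk '/^##contig=</ {
         s = $0; sub(/^##contig=</, "", s); sub(/>$/, "", s)
         n = split(s, kv, ",")
         id = ""; len = ""; assembly = "null"
         for (i = 1; i <= n; i++) {
           if (kv[i] ~ /^ID=/)            id       = substr(kv[i], 4)
           else if (kv[i] ~ /^length=/)   len      = substr(kv[i], 8)
           else if (kv[i] ~ /^assembly=/) assembly = substr(kv[i], 10)
         }
         print id "\t" len "\t" assembly
       }
       /^#CHROM/ { exit }' "$1"
}
```

Comparing this output with the SN:/LN: columns of hg38.dict shows whether the names, and not just the assembly, line up.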

  • rturba (Member; edited February 13)

    Hello there! I am getting a similar error and don't know how to fix it. I got the VCF file from a collaborator, and I'm using a reference that I downloaded and indexed myself; the two are supposed to be exactly the same reference.

        Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chrUn,length=62550211,dict_index=5,assembly=null) was found when SAMSequenceRecord(name=chrM,length=15742,dict_index=5,assembly=null) was expected.
                at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:127)
                at picard.vcf.SortVcf.doWork(SortVcf.java:96)
                at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
                at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
                at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
        Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chrUn,length=62550211,dict_index=5,assembly=null) was found when SAMSequenceRecord(name=chrM,length=15742,dict_index=5,assembly=null) was expected.
                at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:169)
                at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:125)
                ... 4 more
    

    Isn't the purpose of SortVcf to reorder the VCF file according to the reference? Why is it complaining that my dictionaries are different? (Apologies in advance for the very basic question.)

    I first tried to run SelectVariants on a vcf.gz file and got the following warning:

        IndexDictionaryUtils - Track variant doesn't have a sequence dictionary built in, skipping dictionary validation!
    

    I then ran it again on an uncompressed VCF file to see whether I would get the same message, but got a different error:

            Input files /u/flashscratch/flashscratch2/r/rturba/stickleback/sorel_data/first12/./stickleback12.filtered.vcf and reference have incompatible contigs. Please see https://software.broadinstitute.org/gatk/documentation/article?id=63 for more information. Error details: The contig order in /u/flashscratch/flashscratch2/r/rturba/stickleback/sorel_data/first12/./stickleback12.filtered.vcf and reference is not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328),  which describes reordering contigs in BAM and VCF files..
            ##### ERROR   /u/flashscratch/flashscratch2/r/rturba/stickleback/sorel_data/first12/./stickleback12.filtered.vcf contigs = [chrI, chrII, chrIII, chrIV, chrIX, chrUn, chrV, chrVI, chrVII, chrVIII, chrX, chrXI, chrXII, chrXIII, chrXIV, chrXIX, chrXV, chrXVI, chrXVII, chrXVIII, chrXX, chrXXI, chrM]
            ##### ERROR   reference contigs = [chrI, chrII, chrIII, chrIV, chrIX, chrM, chrUn, chrV, chrVI, chrVII, chrVIII, chrX, chrXI, chrXII, chrXIII, chrXIV, chrXIX, chrXV, chrXVI, chrXVII, chrXVIII, chrXX, chrXXI]
    

    Sorry this is so long; I wanted to give some context in case the problem comes from something I did wrong in a previous step.
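As for why SortVcf complains instead of reordering: judging from the stack traces in this thread, SortVcf checks (in `collectFileReadersAndHeaders`, via `assertSameDictionary`) that each input VCF's header dictionary matches SEQUENCE_DICTIONARY before it sorts anything, so a header listing the right contigs in a different order is rejected up front; the sorting applies only to the records. Here, index 5 is chrUn in one dictionary and chrM in the other, and the two lists in the GATK error contain the same names in a different order. A quick bash check for "same names, different order" is sketched below; `contig_order_only_differs` is a hypothetical helper that compares names only.

```shell
# Hypothetical helper: succeed (exit 0) only if the VCF header and the .dict
# list exactly the same contig names but in a different order.
# Usage: contig_order_only_differs in.vcf ref.dict && echo "only the order differs"
contig_order_only_differs() {
  local vcf_names dict_names
  # Contig IDs from the VCF header, in header order.
  vcf_names=$(awk '/^##contig=<ID=/ { s = $0; sub(/^##contig=<ID=/, "", s); sub(/[,>].*$/, "", s); print s }
                   /^#CHROM/ { exit }' "$1")
  # SN: values from the .dict, in dictionary order.
  dict_names=$(awk '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' "$2")
  # Different as ordered lists, identical once both are sorted.
  [ "$vcf_names" != "$dict_names" ] &&
    [ "$(printf '%s\n' "$vcf_names" | sort)" = "$(printf '%s\n' "$dict_names" | sort)" ]
}
```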

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @rturba
    Hi,

    Are you using the latest version of Picard? Can you also try deleting the VCF index and re-generating it?

    Thanks,
    Sheila

  • I'm using Picard v2.13.2. I tried creating a new index with IGV and also running on the uncompressed version of the file, and I still get the same type of error.

  • I downloaded the newer version and I'm still getting the same error :(

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @rturba
    Hi,

    Can you post the VCF header that shows the contigs, and the FASTA .dict file? It looks like a slight difference in contig lengths may be causing this issue.

    -Sheila

  • rturba (Member)

    Sure! Thank you - and sorry for the late reply.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @rturba
    Hi,

    I am not sure what is going on here. The lengths are all the same, so that was not the issue. Can you submit a bug report? Instructions are here.

    Thanks,
    Sheila

  • rturba (Member)

    I just thought of something: I'm trying to sort a joint VCF file, the one with all the samples combined. Could this be the problem? Should I use SortVcf only on single-sample VCF files?

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @rturba
    Hi,

    I don't think that should make a difference, but you can try and get back to us.

    Thanks,
    Sheila

  • rturba (Member)

    Just an update: I finally got my collaborator's reference file and managed to split the samples from the joint-genotyped file. I have not yet tried it on the single-sample files, though.

  • I received this same error using Picard 2.18.2:

    Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=1,length=60348388,dict_index=0,assembly=null) was found when SAMSequenceRecord(name=10,length=46591166,dict_index=0,assembly=null) was expected.
            at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:127)
            at picard.vcf.SortVcf.doWork(SortVcf.java:96)
            at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:282)
            at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
            at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
    Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=1,length=60348388,dict_index=0,assembly=null) was found when SAMSequenceRecord(name=10,length=46591166,dict_index=0,assembly=null) was expected.
            at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:169)
            at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:125)
            ... 4 more
    

    I checked the contig lengths, and they are the same in the FASTA/.dict file and in my VCF.

    I do know that the VCF was generated with a version of the reference that included all the scaffolding contigs, and the reference dictionary that I tried to re-sort with is a truncated version of the reference (only the proper chromosomes 1-25 and MT).

    However, it seems strange to me that the error reports proper chromosomes that exist in both files and should presumably just be reordered according to the reference dictionary. If there was going to be a problem, I expected it to be with the 'extra' contigs listed in the VCF but not in the reference dictionary.
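Because the comparison runs index by index, these dictionaries already disagree at index 0 (1 versus 10, with different lengths), before any scaffold is reached; a truncated .dict also has fewer entries than the full header. To list the contigs that the VCF header declares but the truncated .dict lacks, a bash sketch like this may help. `contigs_missing_from_dict` is a hypothetical helper and assumes simple `##contig=<ID=...,...>` lines.

```shell
# Hypothetical helper: list contig IDs declared in the VCF header that do not
# appear as SN: entries in the .dict (e.g. scaffolds dropped from a trimmed
# reference).
# Usage: contigs_missing_from_dict in.vcf ref.dict
contigs_missing_from_dict() {
  awk 'FNR == NR {                                   # first file: the .dict
         if ($1 == "@SQ")
           for (i = 2; i <= NF; i++)
             if ($i ~ /^SN:/) in_dict[substr($i, 4)] = 1
         next
       }
       /^##contig=<ID=/ {                            # second file: the VCF header
         s = $0; sub(/^##contig=<ID=/, "", s); sub(/[,>].*$/, "", s)
         if (!(s in in_dict)) print s
       }
       /^#CHROM/ { exit }' "$2" "$1"
}
```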

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Neurogrok
    Hi,

    I do know that the VCF was generated with a version of the reference that included all the scaffolding contigs, and the reference dictionary that I tried to resort with is a truncated version of the reference (only the proper chromosomes 1-25 and MT).

    Any chance you can try with the original reference, or manually add those extra scaffolding contigs?

    -Sheila

  • Amir (Member; edited August 24)

    Hello GATK team,

    I hate to post a redundant question, but I am having a similar problem. Here is what I get from Picard SortVcf:

    Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chr2,length=242193529,dict_index=1,assembly=20) was found when SAMSequenceRecord(name=chr10,length=133797422,dict_index=1,assembly=null) was expected.
            at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:127)
            at picard.vcf.SortVcf.doWork(SortVcf.java:96)
            at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
            at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
            at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
    Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chr2,length=242193529,dict_index=1,assembly=20) was found when SAMSequenceRecord(name=chr10,length=133797422,dict_index=1,assembly=null) was expected.
            at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:169)
            at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:125)
            ... 4 more

    I have also attached my hg38 .dict file and the header of my VCF file. There doesn't seem to be any discordance in chromosome lengths; only the order of the chromosomes differs, which I was expecting Picard to fix.
    My Picard version is 2.17.0. My VCF file doesn't have an index, and I cannot run GATK on it because of the contig issue; that's what brought me to Picard to fix the VCF for GATK. I built an index file for the VCF using IGVtools, but it still didn't work.

    I tried to provide as much information as I could, but please let me know if you need any other information.

    Thanks
    -Amir
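When names and lengths agree and only the order differs, one workaround is to rebuild the VCF header's `##contig` lines from the .dict before running SortVcf. The bash sketch below uses a hypothetical helper, `replace_contig_header`: it drops the existing `##contig` lines (including any `assembly=` attributes) and writes new ones, in .dict order, just before `#CHROM`. It assumes every contig used by the records is present in the .dict, and it leaves the records untouched, so sorting them is still SortVcf's job. Picard also provides an UpdateVcfSequenceDictionary tool aimed at this kind of header replacement, which may be preferable if your Picard version includes it.

```shell
# Hypothetical helper: rebuild the ##contig header lines of a VCF from a .dict,
# so the header lists contigs with the same names, lengths and order as the
# reference. Records are passed through unchanged (sorting them is still
# SortVcf's job), and attributes such as assembly= are dropped.
# Usage: replace_contig_header in.vcf ref.dict > reheadered.vcf
replace_contig_header() {
  awk 'FNR == NR {                                   # first file: the .dict
         if ($1 == "@SQ") {
           sn = ""; ln = ""
           for (i = 2; i <= NF; i++) {
             if ($i ~ /^SN:/) sn = substr($i, 4)
             else if ($i ~ /^LN:/) ln = substr($i, 4)
           }
           hdr = hdr "##contig=<ID=" sn ",length=" ln ">\n"
         }
         next
       }
       /^##contig=/ { next }                         # drop the old contig lines
       /^#CHROM/    { printf "%s", hdr }             # new contig lines go right before #CHROM
       { print }' "$2" "$1"
}
```

The rewritten file would then go through SortVcf with the same SEQUENCE_DICTIONARY as before.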

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Amir
    Hi Amir,

    Perhaps some of the tips in this thread will help.

    -Sheila
