Trouble with GenotypeConcordance on hg19: Sequence dictionaries are not the same size (93, 25)

Rosmaninho Member
edited May 2018 in Ask the GATK team

Hello everyone,

I'm using the Best Practice workflow for Germline short variant discovery (SNPs + Indels) to analyze raw files of 9 whole-exome sequencing samples. I've mostly managed to follow all the steps until the callset evaluation.
However, I've hit a big stumbling block at the Picard GenotypeConcordance step.

I am trying to run the following command:
srun shifter --image=broadinstitute/gatk:latest gatk --java-options '-Xmx4G' GenotypeConcordance -CV=$vcf_files/cohort_output.vcf.gz -CS=sample1 -TV=$truth/NA12878.knowledgebase.snapshot.20131119.hg19.vcf.gz -O=$workdir/cohort_output_SNP.vcf

  • My input is cohort_output.vcf.gz.
  • The sample name is sample1. (When I added read groups to all samples I named them all sample1; I'm not sure whether this will come back to bite me as well.)
  • For the truth vcf I am using the NA12878.knowledgebase.snapshot.20131119.hg19.vcf.gz that was available in the resource bundle.

However, even though I aligned my samples and did everything with the hg19 annotation I am getting this error:
htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequence dictionaries are not the same size (93, 25)

I double-checked and I've aligned my samples to an hg19 fasta file, however its dictionary is different than the one present on the Broad Resource bundle. This did not give me any problem with any of the other bundled sets. But right now I've hit a snag. I really don't know how to overcome this. I'm well aware that I might have to redo everything from the start.
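
One way to see the mismatch before running the tool is to list the `##contig` lines in each VCF header. The following is a minimal sketch, not part of GATK or Picard; the function name `read_contig_ids` is hypothetical, and it assumes the VCFs carry their sequence dictionary as `##contig` header lines.

```python
import gzip
import re

# Matches header lines such as "##contig=<ID=chr1,length=249250621>".
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def read_contig_ids(vcf_path):
    """Return the contig IDs declared in a VCF header, in file order.

    Hypothetical helper: handles plain-text and gzip/bgzip-compressed VCFs
    and stops reading at the first line that is not a "##" meta line.
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    ids = []
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break
            match = CONTIG_RE.match(line)
            if match:
                ids.append(match.group(1))
    return ids


# Example use (paths are placeholders):
#   calls = read_contig_ids("cohort_output.vcf.gz")
#   truth = read_contig_ids("NA12878.knowledgebase.snapshot.20131119.hg19.vcf.gz")
#   print(len(calls), len(truth))
```

If the two counts differ, the names that appear in only one list show which reference build each file was made against.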

**Is there any way to make this work?

Is the Picard GenotypeConcordance tool still the recommended way to perform variant evaluation in GATK4?

What's the difference between GenotypeConcordance and CollectVariantCallingMetrics? Why is this one removed from GATK4?

Is it recommended to use the Beta Concordance tool instead of GenotypeConcordance?**

This is the full output:
15:52:56.974 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/build/libs/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed May 02 15:52:56 UTC 2018] GenotypeConcordance --TRUTH_VCF /mnt/nfs/lobo/PRECISE-NFS/pedro.rosmaninho/VariantRecalibrator_files/NA12878.knowledgebase.snapshot.20131119.hg19.vcf.gz --CALL_VCF /mnt/beegfs/scratch/PRECISE/pedro.rosmaninho/recalibrated_variant_snp_g4_applied/cohort_output.vcf.gz --OUTPUT /mnt/beegfs/scratch/PRECISE/pedro.rosmaninho/genotype_concordance/cohort_output_SNP.vcf --CALL_SAMPLE sample1 --OUTPUT_VCF false --INTERSECT_INTERVALS true --MIN_GQ 0 --MIN_DP 0 --OUTPUT_ALL_ROWS false --USE_VCF_INDEX false --MISSING_SITES_HOM_REF false --IGNORE_FILTER_STATUS false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Wed May 02 15:52:56 UTC 2018] Executing as [email protected] on Linux 4.4.0-104-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11; Deflater: Intel; Inflater: Intel; Picard version: Version:4.0.3.0
[Wed May 02 15:52:57 UTC 2018] picard.vcf.GenotypeConcordance done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1539833856
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequence dictionaries are not the same size (93, 25)
at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:237)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:320)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:306)
at picard.vcf.GenotypeConcordance.doWork(GenotypeConcordance.java:350)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /gatk/build/libs/gatk-package-4.0.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4G -jar /gatk/build/libs/gatk-package-4.0.3.0-local.jar GenotypeConcordance -CV=/mnt/beegfs/scratch/PRECISE/pedro.rosmaninho/recalibrated_variant_snp_g4_applied/cohort_output.vcf.gz -CS=sample1 -TV=/mnt/nfs/lobo/PRECISE-NFS/pedro.rosmaninho/VariantRecalibrator_files/NA12878.knowledgebase.snapshot.20131119.hg19.vcf.gz -O=/mnt/beegfs/scratch/PRECISE/pedro.rosmaninho/genotype_concordance/cohort_output_SNP.vcf
srun: error: compute-1: task 0: Exited with exit code 3
srun: Terminating job step 248876.0

Answers

  • Rosmaninho Member

    So redo everything from the alignment?
    I think that's the case. I used a version of hg19 present on my institution's cluster. I only downloaded the Broad resource bundle afterwards, and indeed the hg19 fasta version and dictionary in the bundle are different. I was just hopeful that, since it's still hg19, there might be a way to avoid redoing everything.

    By the way, is that the correct pick to use as the gold standard genome?

  • SkyWarrior Turkey Member ✭✭✭

    As far as I remember, human_g1k_b37 was used as the reference genome for GRCh37, and for GRCh38 the current hg38 analysis set with decoys and HLA was used as the reference.

    There may be a version of the samples mapped to the hg19 analysis set instead of b37, but I am not sure whether they are readily available. I would pick the ones used by Broad and 1000G for GIAB comparisons and workflow validations.

  • Rosmaninho Member

    Could you point me to the gold standard in the 1000genomes website? I can't find it in their ftp.

  • Rosmaninho Member

    Ok, I found the issue. I did my alignment with Bowtie2 and ran into exactly the same problem as this user.

    https://gatkforums.broadinstitute.org/gatk/discussion/11358/there-is-a-bug-in-collectsequencingartifactmetrics-in-gatk-4-0

    It should be perfectly clear that the alignment should be solely done with bwa.

  • Rosmaninho Member

    Well, I still have the same issue. :/
    And using bwa to generate the dict file with the hg19 in the Broad resource bundle is not working for me.

  • Sheila Broad Institute Member, Broadie, Moderator admin
    edited June 2018

    @Rosmaninho
    Hi,

    If you are just trying to compare your VCFs, and the entire Best Practice pipeline has worked so far, you may consider using Concordance with --disable-sequence-dictionary-validation. Let me know if that works.

    -Sheila
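
A rough sketch of how the Concordance call suggested above might be assembled from a Python wrapper. The file names are placeholders and the helper `build_concordance_cmd` is hypothetical; the arguments used (`-R`, `-eval`, `--truth`, `--summary`, `--disable-sequence-dictionary-validation`) are the ones documented for GATK4 Concordance and the GATK engine, but check `gatk Concordance --help` for the version in your container.

```python
import shlex


def build_concordance_cmd(reference, eval_vcf, truth_vcf, summary_tsv,
                          skip_dict_validation=True):
    """Assemble the argument list for a GATK4 Concordance run.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = [
        "gatk", "Concordance",
        "-R", reference,         # reference fasta the calls were made against
        "-eval", eval_vcf,       # callset under evaluation
        "--truth", truth_vcf,    # truth callset
        "--summary", summary_tsv,
    ]
    if skip_dict_validation:
        # Skips the dictionary comparison only; contig names and coordinates
        # in the two VCFs still need to agree for the result to mean anything.
        cmd.append("--disable-sequence-dictionary-validation")
    return cmd


cmd = build_concordance_cmd(
    "reference.fasta",            # placeholder paths
    "cohort_output.vcf.gz",
    "truth.vcf.gz",
    "concordance_summary.tsv",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
# To execute: subprocess.run(cmd, check=True)
```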

  • Rosmaninho Member

    Thank you so much Sheila. I decided to redo everything anyway. The hg19 annotation in my institution is slightly different than the one in the Resource Bundle so I decided to start over again. I'll use that parameter if I find any problem.

  • @Sheila
    I managed to overcome this dictionary issue. However, I stumbled into another problem:

    When running the following command:

    GenotypeConcordance --TRUTH_VCF /mnt/nfs/lobo/PRECISE-NFS/pedro.rosmaninho/gatk_resource_bundle/NA12878.knowledgebase.snapshot.20131119.hg19.vcf --CALL_VCF /mnt/beegfs/scratch/PRECISE/pedro.rosmaninho/recalibrated_variant_indels_applied/rec_cohort_output_snp_indels.vcf.gz --OUTPUT /mnt/beegfs/scratch/PRECISE/pedro.rosmaninho/genotype_concordance/EX425_cohort_output_snp_indels_concordance --CALL_SAMPLE EX425_sm --INTERVALS /mnt/nfs/lobo/PRECISE-NFS/pedro.rosmaninho/SeqCapEZ_Exome_v3.0_Design_Annotation_files/SeqCap_EZ_Exome_v3_hg19_primary_targets.interval_list --TMP_DIR /mnt/beegfs/scratch/PRECISE/pedro.rosmaninho/tmpdir --OUTPUT_VCF false --INTERSECT_INTERVALS true --MIN_GQ 0 --MIN_DP 0 --OUTPUT_ALL_ROWS false --USE_VCF_INDEX false --MISSING_SITES_HOM_REF false --IGNORE_FILTER_STATUS false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false

    The resulting genotype_concordance_summary_metrics file reports extremely low genotype concordance against the truth set (NA12878.knowledgebase.snapshot.20131119.hg19.vcf) within the specified interval list (SeqCap_EZ_Exome_v3_hg19_primary_targets.interval_list).

    VARIANT_TYPE TRUTH_SAMPLE CALL_SAMPLE GENOTYPE_CONCORDANCE
    SNP NA12878 EX425_sm 0.417713
    INDEL NA12878 EX425_sm 0.255236

    I also used the mapped NA12878 bam files available in the 1000genomes ftp (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase3/data/NA12878/exome_alignment/) with my pipeline and when analyzing the genotype concordance with NA12878.knowledgebase.snapshot.20131119.hg19.vcf I obtained the following values:

    VARIANT_TYPE TRUTH_SAMPLE CALL_SAMPLE GENOTYPE_CONCORDANCE
    SNP NA12878 NA12878_sm 0.856956
    INDEL NA12878 NA12878_sm 0.525507

    So I must be doing something wrong.

  • Rosmaninho Member
    edited September 2018

    @Sheila @shlee
    Well, I started everything again with the b37 annotation from the resource bundle.

    I reached GenotypeConcordance again, and when comparing against the NA12878.knowledgebase.snapshot.20131119.b37.vcf in the same bundle I get the following error:

    htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequence dictionaries are not the same size (84, 86)

    I checked the headers of my VCF files and they are as follows:

            ##contig=<ID=1,length=249250621>
            ##contig=<ID=2,length=243199373>
            ##contig=<ID=3,length=198022430>
            ##contig=<ID=4,length=191154276>
            ##contig=<ID=5,length=180915260>
            ##contig=<ID=6,length=171115067>
            ##contig=<ID=7,length=159138663>
            ##contig=<ID=8,length=146364022>
            ##contig=<ID=9,length=141213431>
            ##contig=<ID=10,length=135534747>
            ##contig=<ID=11,length=135006516>
            ##contig=<ID=12,length=133851895>
            ##contig=<ID=13,length=115169878>
            ##contig=<ID=14,length=107349540>
            ##contig=<ID=15,length=102531392>
            ##contig=<ID=16,length=90354753>
            ##contig=<ID=17,length=81195210>
            ##contig=<ID=18,length=78077248>
            ##contig=<ID=19,length=59128983>
            ##contig=<ID=20,length=63025520>
            ##contig=<ID=21,length=48129895>
            ##contig=<ID=22,length=51304566>
            ##contig=<ID=X,length=155270560>
            ##contig=<ID=Y,length=59373566>
            ##contig=<ID=MT,length=16569>
            ##contig=<ID=GL000207.1,length=4262>
            ##contig=<ID=GL000226.1,length=15008>
            ##contig=<ID=GL000229.1,length=19913>
            ##contig=<ID=GL000231.1,length=27386>
            ##contig=<ID=GL000210.1,length=27682>
            ##contig=<ID=GL000239.1,length=33824>
            ##contig=<ID=GL000235.1,length=34474>
            ##contig=<ID=GL000201.1,length=36148>
            ##contig=<ID=GL000247.1,length=36422>
            ##contig=<ID=GL000245.1,length=36651>
            ##contig=<ID=GL000197.1,length=37175>
            ##contig=<ID=GL000203.1,length=37498>
            ##contig=<ID=GL000246.1,length=38154>
            ##contig=<ID=GL000249.1,length=38502>
            ##contig=<ID=GL000196.1,length=38914>
            ##contig=<ID=GL000248.1,length=39786>
            ##contig=<ID=GL000244.1,length=39929>
            ##contig=<ID=GL000238.1,length=39939>
            ##contig=<ID=GL000202.1,length=40103>
            ##contig=<ID=GL000234.1,length=40531>
            ##contig=<ID=GL000232.1,length=40652>
            ##contig=<ID=GL000206.1,length=41001>
            ##contig=<ID=GL000240.1,length=41933>
            ##contig=<ID=GL000236.1,length=41934>
            ##contig=<ID=GL000241.1,length=42152>
            ##contig=<ID=GL000243.1,length=43341>
            ##contig=<ID=GL000242.1,length=43523>
            ##contig=<ID=GL000230.1,length=43691>
            ##contig=<ID=GL000237.1,length=45867>
            ##contig=<ID=GL000233.1,length=45941>
            ##contig=<ID=GL000204.1,length=81310>
            ##contig=<ID=GL000198.1,length=90085>
            ##contig=<ID=GL000208.1,length=92689>
            ##contig=<ID=GL000191.1,length=106433>
            ##contig=<ID=GL000227.1,length=128374>
            ##contig=<ID=GL000228.1,length=129120>
            ##contig=<ID=GL000214.1,length=137718>
            ##contig=<ID=GL000221.1,length=155397>
            ##contig=<ID=GL000209.1,length=159169>
            ##contig=<ID=GL000218.1,length=161147>
            ##contig=<ID=GL000220.1,length=161802>
            ##contig=<ID=GL000213.1,length=164239>
            ##contig=<ID=GL000211.1,length=166566>
            ##contig=<ID=GL000199.1,length=169874>
            ##contig=<ID=GL000217.1,length=172149>
            ##contig=<ID=GL000216.1,length=172294>
            ##contig=<ID=GL000215.1,length=172545>
            ##contig=<ID=GL000205.1,length=174588>
            ##contig=<ID=GL000219.1,length=179198>
            ##contig=<ID=GL000224.1,length=179693>
            ##contig=<ID=GL000223.1,length=180455>
            ##contig=<ID=GL000195.1,length=182896>
            ##contig=<ID=GL000212.1,length=186858>
            ##contig=<ID=GL000222.1,length=186861>
            ##contig=<ID=GL000200.1,length=187035>
            ##contig=<ID=GL000193.1,length=189789>
            ##contig=<ID=GL000194.1,length=191469>
            ##contig=<ID=GL000225.1,length=211173>
            ##contig=<ID=GL000192.1,length=547496>
            ##contig=<ID=NC_007605,length=171823,assembly=NCBI37>
            ##contig=<ID=hs37d5,length=35477943,assembly=NCBI37>
    

    This is the dict in the NA12878.knowledgebase.snapshot.20131119.b37.vcf:

        ##contig=<ID=1,length=249250621,assembly=b37>
        ##contig=<ID=2,length=243199373,assembly=b37>
        ##contig=<ID=3,length=198022430,assembly=b37>
        ##contig=<ID=4,length=191154276,assembly=b37>
        ##contig=<ID=5,length=180915260,assembly=b37>
        ##contig=<ID=6,length=171115067,assembly=b37>
        ##contig=<ID=7,length=159138663,assembly=b37>
        ##contig=<ID=8,length=146364022,assembly=b37>
        ##contig=<ID=9,length=141213431,assembly=b37>
        ##contig=<ID=10,length=135534747,assembly=b37>
        ##contig=<ID=11,length=135006516,assembly=b37>
        ##contig=<ID=12,length=133851895,assembly=b37>
        ##contig=<ID=13,length=115169878,assembly=b37>
        ##contig=<ID=14,length=107349540,assembly=b37>
        ##contig=<ID=15,length=102531392,assembly=b37>
        ##contig=<ID=16,length=90354753,assembly=b37>
        ##contig=<ID=17,length=81195210,assembly=b37>
        ##contig=<ID=18,length=78077248,assembly=b37>
        ##contig=<ID=19,length=59128983,assembly=b37>
        ##contig=<ID=20,length=63025520,assembly=b37>
        ##contig=<ID=21,length=48129895,assembly=b37>
        ##contig=<ID=22,length=51304566,assembly=b37>
        ##contig=<ID=X,length=155270560,assembly=b37>
        ##contig=<ID=Y,length=59373566,assembly=b37>
        ##contig=<ID=MT,length=16569,assembly=b37>
        ##contig=<ID=GL000207.1,length=4262,assembly=b37>
        ##contig=<ID=GL000226.1,length=15008,assembly=b37>
        ##contig=<ID=GL000229.1,length=19913,assembly=b37>
        ##contig=<ID=GL000231.1,length=27386,assembly=b37>
        ##contig=<ID=GL000210.1,length=27682,assembly=b37>
        ##contig=<ID=GL000239.1,length=33824,assembly=b37>
        ##contig=<ID=GL000235.1,length=34474,assembly=b37>
        ##contig=<ID=GL000201.1,length=36148,assembly=b37>
        ##contig=<ID=GL000247.1,length=36422,assembly=b37>
        ##contig=<ID=GL000245.1,length=36651,assembly=b37>
        ##contig=<ID=GL000197.1,length=37175,assembly=b37>
        ##contig=<ID=GL000203.1,length=37498,assembly=b37>
        ##contig=<ID=GL000246.1,length=38154,assembly=b37>
        ##contig=<ID=GL000249.1,length=38502,assembly=b37>
        ##contig=<ID=GL000196.1,length=38914,assembly=b37>
        ##contig=<ID=GL000248.1,length=39786,assembly=b37>
        ##contig=<ID=GL000244.1,length=39929,assembly=b37>
        ##contig=<ID=GL000238.1,length=39939,assembly=b37>
        ##contig=<ID=GL000202.1,length=40103,assembly=b37>
        ##contig=<ID=GL000234.1,length=40531,assembly=b37>
        ##contig=<ID=GL000232.1,length=40652,assembly=b37>
        ##contig=<ID=GL000206.1,length=41001,assembly=b37>
        ##contig=<ID=GL000240.1,length=41933,assembly=b37>
        ##contig=<ID=GL000236.1,length=41934,assembly=b37>
        ##contig=<ID=GL000241.1,length=42152,assembly=b37>
        ##contig=<ID=GL000243.1,length=43341,assembly=b37>
        ##contig=<ID=GL000242.1,length=43523,assembly=b37>
        ##contig=<ID=GL000230.1,length=43691,assembly=b37>
        ##contig=<ID=GL000237.1,length=45867,assembly=b37>
        ##contig=<ID=GL000233.1,length=45941,assembly=b37>
        ##contig=<ID=GL000204.1,length=81310,assembly=b37>
        ##contig=<ID=GL000198.1,length=90085,assembly=b37>
        ##contig=<ID=GL000208.1,length=92689,assembly=b37>
        ##contig=<ID=GL000191.1,length=106433,assembly=b37>
        ##contig=<ID=GL000227.1,length=128374,assembly=b37>
        ##contig=<ID=GL000228.1,length=129120,assembly=b37>
        ##contig=<ID=GL000214.1,length=137718,assembly=b37>
        ##contig=<ID=GL000221.1,length=155397,assembly=b37>
        ##contig=<ID=GL000209.1,length=159169,assembly=b37>
        ##contig=<ID=GL000218.1,length=161147,assembly=b37>
        ##contig=<ID=GL000220.1,length=161802,assembly=b37>
        ##contig=<ID=GL000213.1,length=164239,assembly=b37>
        ##contig=<ID=GL000211.1,length=166566,assembly=b37>
        ##contig=<ID=GL000199.1,length=169874,assembly=b37>
        ##contig=<ID=GL000217.1,length=172149,assembly=b37>
        ##contig=<ID=GL000216.1,length=172294,assembly=b37>
        ##contig=<ID=GL000215.1,length=172545,assembly=b37>
        ##contig=<ID=GL000205.1,length=174588,assembly=b37>
        ##contig=<ID=GL000219.1,length=179198,assembly=b37>
        ##contig=<ID=GL000224.1,length=179693,assembly=b37>
        ##contig=<ID=GL000223.1,length=180455,assembly=b37>
        ##contig=<ID=GL000195.1,length=182896,assembly=b37>
        ##contig=<ID=GL000212.1,length=186858,assembly=b37>
        ##contig=<ID=GL000222.1,length=186861,assembly=b37>
        ##contig=<ID=GL000200.1,length=187035,assembly=b37>
        ##contig=<ID=GL000193.1,length=189789,assembly=b37>
        ##contig=<ID=GL000194.1,length=191469,assembly=b37>
        ##contig=<ID=GL000225.1,length=211173,assembly=b37>
        ##contig=<ID=GL000192.1,length=547496,assembly=b37>
        ##reference=file:///humgen/1kg/reference/human_g1k_v37.fasta
    

    However, I can't figure out for the life of me where the hell my VCF got those two extra lines, since the dict of the b37 annotation in the resource bundle only has 84 lines.

                    @HD   VN:1.0  SO:unsorted
                    @SQ   SN:1    LN:249250621    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:1b22b98cdeb4a9304cb5d48026a85128
                    @SQ   SN:2    LN:243199373    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:a0d9851da00400dec1098a9255ac712e
                    @SQ   SN:3    LN:198022430    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:fdfd811849cc2fadebc929bb925902e5
                    @SQ   SN:4    LN:191154276    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:23dccd106897542ad87d2765d28a19a1
                    @SQ   SN:5    LN:180915260    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:0740173db9ffd264d728f32784845cd7
                    @SQ   SN:6    LN:171115067    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:1d3a93a248d92a729ee764823acbbc6b
                    @SQ   SN:7    LN:159138663    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:618366e953d6aaad97dbe4777c29375e
                    @SQ   SN:8    LN:146364022    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:96f514a9929e410c6651697bded59aec
                    @SQ   SN:9    LN:141213431    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:3e273117f15e0a400f01055d9f393768
                    @SQ   SN:10   LN:135534747    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:988c28e000e84c26d552359af1ea2e1d
                    @SQ   SN:11   LN:135006516    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:98c59049a2df285c76ffb1c6db8f8b96
                    @SQ   SN:12   LN:133851895    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:51851ac0e1a115847ad36449b0015864
                    @SQ   SN:13   LN:115169878    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:283f8d7892baa81b510a015719ca7b0b
                    @SQ   SN:14   LN:107349540    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:98f3cae32b2a2e9524bc19813927542e
                    @SQ   SN:15   LN:102531392    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:e5645a794a8238215b2cd77acb95a078
                    @SQ   SN:16   LN:90354753 UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:fc9b1a7b42b97a864f56b348b06095e6
                    @SQ   SN:17   LN:81195210 UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:351f64d4f4f9ddd45b35336ad97aa6de
                    @SQ   SN:18   LN:78077248 UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
                    @SQ   SN:19   LN:59128983 UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:1aacd71f30db8e561810913e0b72636d
                    @SQ   SN:20   LN:63025520 UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:0dec9660ec1efaaf33281c0d5ea2560f
                    @SQ   SN:21   LN:48129895 UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:2979a6085bfe28e3ad6f552f361ed74d
                    @SQ   SN:22   LN:51304566 UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:a718acaa6135fdca8357d5bfe94211dd
                    @SQ   SN:X    LN:155270560    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:7e0e2e580297b7764e31dbc80c2540dd
                    @SQ   SN:Y    LN:59373566 UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:1fa3474750af0948bdf97d5a0ee52e51
                    @SQ   SN:MT   LN:16569    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:c68f52674c9fb33aef52dcf399755519
                    @SQ   SN:GL000207.1   LN:4262 UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:f3814841f1939d3ca19072d9e89f3fd7
                    @SQ   SN:GL000226.1   LN:15008    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:1c1b2cd1fccbc0a99b6a447fa24d1504
                    @SQ   SN:GL000229.1   LN:19913    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:d0f40ec87de311d8e715b52e4c7062e1
                    @SQ   SN:GL000231.1   LN:27386    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:ba8882ce3a1efa2080e5d29b956568a4
                    @SQ   SN:GL000210.1   LN:27682    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:851106a74238044126131ce2a8e5847c
                    @SQ   SN:GL000239.1   LN:33824    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:99795f15702caec4fa1c4e15f8a29c07
                    @SQ   SN:GL000235.1   LN:34474    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:118a25ca210cfbcdfb6c2ebb249f9680
                    @SQ   SN:GL000201.1   LN:36148    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:dfb7e7ec60ffdcb85cb359ea28454ee9
                    @SQ   SN:GL000247.1   LN:36422    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:7de00226bb7df1c57276ca6baabafd15
                    @SQ   SN:GL000245.1   LN:36651    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:89bc61960f37d94abf0df2d481ada0ec
                    @SQ   SN:GL000197.1   LN:37175    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:6f5efdd36643a9b8c8ccad6f2f1edc7b
                    @SQ   SN:GL000203.1   LN:37498    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:96358c325fe0e70bee73436e8bb14dbd
                    @SQ   SN:GL000246.1   LN:38154    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:e4afcd31912af9d9c2546acf1cb23af2
                    @SQ   SN:GL000249.1   LN:38502    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:1d78abec37c15fe29a275eb08d5af236
                    @SQ   SN:GL000196.1   LN:38914    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:d92206d1bb4c3b4019c43c0875c06dc0
                    @SQ   SN:GL000248.1   LN:39786    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:5a8e43bec9be36c7b49c84d585107776
                    @SQ   SN:GL000244.1   LN:39929    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:0996b4475f353ca98bacb756ac479140
                    @SQ   SN:GL000238.1   LN:39939    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:131b1efc3270cc838686b54e7c34b17b
                    @SQ   SN:GL000202.1   LN:40103    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:06cbf126247d89664a4faebad130fe9c
                    @SQ   SN:GL000234.1   LN:40531    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:93f998536b61a56fd0ff47322a911d4b
                    @SQ   SN:GL000232.1   LN:40652    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:3e06b6741061ad93a8587531307057d8
                    @SQ   SN:GL000206.1   LN:41001    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:43f69e423533e948bfae5ce1d45bd3f1
                    @SQ   SN:GL000240.1   LN:41933    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:445a86173da9f237d7bcf41c6cb8cc62
                    @SQ   SN:GL000236.1   LN:41934    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:fdcd739913efa1fdc64b6c0cd7016779
                    @SQ   SN:GL000241.1   LN:42152    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:ef4258cdc5a45c206cea8fc3e1d858cf
                    @SQ   SN:GL000243.1   LN:43341    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:cc34279a7e353136741c9fce79bc4396
                    @SQ   SN:GL000242.1   LN:43523    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:2f8694fc47576bc81b5fe9e7de0ba49e
                    @SQ   SN:GL000230.1   LN:43691    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:b4eb71ee878d3706246b7c1dbef69299
                    @SQ   SN:GL000237.1   LN:45867    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:e0c82e7751df73f4f6d0ed30cdc853c0
                    @SQ   SN:GL000233.1   LN:45941    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:7fed60298a8d62ff808b74b6ce820001
                    @SQ   SN:GL000204.1   LN:81310    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:efc49c871536fa8d79cb0a06fa739722
                    @SQ   SN:GL000198.1   LN:90085    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:868e7784040da90d900d2d1b667a1383
                    @SQ   SN:GL000208.1   LN:92689    UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:aa81be49bf3fe63a79bdc6a6f279abf6
                    @SQ   SN:GL000191.1   LN:106433   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:d75b436f50a8214ee9c2a51d30b2c2cc
                    @SQ   SN:GL000227.1   LN:128374   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:a4aead23f8053f2655e468bcc6ecdceb
                    @SQ   SN:GL000228.1   LN:129120   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:c5a17c97e2c1a0b6a9cc5a6b064b714f
                    @SQ   SN:GL000214.1   LN:137718   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:46c2032c37f2ed899eb41c0473319a69
                    @SQ   SN:GL000221.1   LN:155397   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:3238fb74ea87ae857f9c7508d315babb
                    @SQ   SN:GL000209.1   LN:159169   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:f40598e2a5a6b26e84a3775e0d1e2c81
                    @SQ   SN:GL000218.1   LN:161147   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:1d708b54644c26c7e01c2dad5426d38c
                    @SQ   SN:GL000220.1   LN:161802   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:fc35de963c57bf7648429e6454f1c9db
                    @SQ   SN:GL000213.1   LN:164239   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:9d424fdcc98866650b58f004080a992a
                    @SQ   SN:GL000211.1   LN:166566   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:7daaa45c66b288847b9b32b964e623d3
                    @SQ   SN:GL000199.1   LN:169874   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:569af3b73522fab4b40995ae4944e78e
                    @SQ   SN:GL000217.1   LN:172149   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:6d243e18dea1945fb7f2517615b8f52e
                    @SQ   SN:GL000216.1   LN:172294   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:642a232d91c486ac339263820aef7fe0
                    @SQ   SN:GL000215.1   LN:172545   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:5eb3b418480ae67a997957c909375a73
                    @SQ   SN:GL000205.1   LN:174588   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:d22441398d99caf673e9afb9a1908ec5
                    @SQ   SN:GL000219.1   LN:179198   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:f977edd13bac459cb2ed4a5457dba1b3
                    @SQ   SN:GL000224.1   LN:179693   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:d5b2fc04f6b41b212a4198a07f450e20
                    @SQ   SN:GL000223.1   LN:180455   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:399dfa03bf32022ab52a846f7ca35b30
                    @SQ   SN:GL000195.1   LN:182896   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:5d9ec007868d517e73543b005ba48535
                    @SQ   SN:GL000212.1   LN:186858   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:563531689f3dbd691331fd6c5730a88b
                    @SQ   SN:GL000222.1   LN:186861   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:6fe9abac455169f50470f5a6b01d0f59
                    @SQ   SN:GL000200.1   LN:187035   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:75e4c8d17cd4addf3917d1703cacaf25
                    @SQ   SN:GL000193.1   LN:189789   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:dbb6e8ece0b5de29da56601613007c2a
                    @SQ   SN:GL000194.1   LN:191469   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:6ac8f815bf8e845bb3031b73f812c012
                    @SQ   SN:GL000225.1   LN:211173   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:63945c3e6962f28ffd469719a747e73c
                    @SQ   SN:GL000192.1   LN:547496   UR:file:/humgen/gsa-hpprojects/GATK/bundle/5974/b37/human_g1k_v37.fasta M5:325ba9e808f669dfeee210fdd7b470ac
    

    Seriously, where are the two final lines in my VCF that are screwing me up coming from? Why??
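
The comparison done by eye above can be sketched in a few lines. This is an illustration only: the helper names are hypothetical and the header lists are short excerpts from the two headers quoted in this post. The two names it reports, NC_007605 and hs37d5, match the EBV and decoy sequences that the b37 decoy reference adds to plain b37, so one possibility is that the calls were made against a decoy build; that is an inference, not something confirmed in this thread.

```python
import re

# Matches header lines such as "##contig=<ID=1,length=249250621>".
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def contig_ids(header_lines):
    """Return contig IDs from an iterable of VCF header lines, in order."""
    ids = []
    for line in header_lines:
        match = CONTIG_RE.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids


def extra_contigs(call_header, truth_header):
    """Contigs declared in the call VCF but absent from the truth VCF."""
    truth = set(contig_ids(truth_header))
    return [name for name in contig_ids(call_header) if name not in truth]


# Excerpts from the two headers quoted in the post.
call_header = [
    "##contig=<ID=1,length=249250621>",
    "##contig=<ID=GL000192.1,length=547496>",
    "##contig=<ID=NC_007605,length=171823,assembly=NCBI37>",
    "##contig=<ID=hs37d5,length=35477943,assembly=NCBI37>",
]
truth_header = [
    "##contig=<ID=1,length=249250621,assembly=b37>",
    "##contig=<ID=GL000192.1,length=547496,assembly=b37>",
]

print(extra_contigs(call_header, truth_header))  # ['NC_007605', 'hs37d5']
```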

  • Rosmaninho Member
    edited September 2018

    I changed those final two lines from
    ##contig=<ID=NC_007605,length=171823,assembly=NCBI37>
    ##contig=<ID=hs37d5,length=35477943,assembly=NCBI37>
    to
    ##source=<ID=NC_007605,length=171823,assembly=NCBI37>
    ##source=<ID=hs37d5,length=35477943,assembly=NCBI37>

    And now it's running, but this is completely moronic. I did everything by the book.
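
A slightly more guarded variant of that hand edit, as a sketch only: instead of renaming the lines, it drops the unwanted `##contig` lines and refuses to continue if any variant record actually sits on one of those contigs. The function name `drop_unused_contig_lines` is hypothetical, and this is a workaround to make the two dictionaries the same size, not a procedure recommended anywhere in this thread.

```python
import re

# Matches header lines such as "##contig=<ID=hs37d5,length=35477943>".
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def drop_unused_contig_lines(vcf_lines, unwanted):
    """Return vcf_lines without the ##contig lines for the unwanted contigs.

    Raises ValueError if a variant record is located on an unwanted contig,
    because removing its header line would leave the file inconsistent.
    """
    unwanted = set(unwanted)
    kept = []
    for line in vcf_lines:
        match = CONTIG_RE.match(line)
        if match and match.group(1) in unwanted:
            continue  # skip the header line for this contig
        if not line.startswith("#") and line.split("\t", 1)[0] in unwanted:
            raise ValueError("record found on contig being removed: " + line[:40])
        kept.append(line)
    return kept


# Tiny made-up VCF for illustration.
lines = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=1,length=249250621>",
    "##contig=<ID=NC_007605,length=171823,assembly=NCBI37>",
    "##contig=<ID=hs37d5,length=35477943,assembly=NCBI37>",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t100\t.\tA\tG",
]
cleaned = drop_unused_contig_lines(lines, {"NC_007605", "hs37d5"})
print(len(lines), len(cleaned))  # 6 4
```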

  • DamandeepKaur Member
    edited October 2018

    Hi,

    I am having trouble running GenotypeConcordance. I am not sure which truth set to use. I aligned my reads to the UCSC hg38 reference genome and then followed the GATK Best Practices.

  • Hi shlee,
    Nothing like NA12878.knowledgebase.snapshot.20131119.b37.vcf (provided for b37) is available for hg38 in the resource bundle. When I try 1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf, I get an error that the file doesn't have genotypes, and when I try Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf, I get an error about incompatible contigs.

    Please help. I have been stuck on this for a while now.

  • shlee Cambridge Member, Broadie ✭✭✭✭✭

    @DamandeepKaur, I think you are accessing our bundle from the FTP site. We also have some public genomics data in the cloud, which you can find at https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0. This Google Cloud Storage bucket contains some 1000 Genomes Project callsets. You should be able to subset the NA12878 sample out of these to build your truth set. I hope this helps.
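
Subsetting one sample out of a multi-sample callset is typically done with GATK SelectVariants. Below is a minimal sketch of building that command from Python; the helper name and all file names are placeholders, and it assumes the chosen callset actually contains genotypes for NA12878.

```python
import shlex


def build_select_sample_cmd(reference, callset_vcf, sample, output_vcf,
                            exclude_non_variants=True):
    """Assemble a GATK4 SelectVariants command that keeps a single sample.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = [
        "gatk", "SelectVariants",
        "-R", reference,
        "-V", callset_vcf,
        "-sn", sample,        # sample to keep
        "-O", output_vcf,
    ]
    if exclude_non_variants:
        # Drop sites where the selected sample carries no variant allele.
        cmd.append("--exclude-non-variants")
    return cmd


cmd = build_select_sample_cmd(
    "hg38_reference.fasta",        # placeholder paths
    "1000G_callset.hg38.vcf.gz",
    "NA12878",
    "NA12878.truth.hg38.vcf.gz",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
# To execute: subprocess.run(cmd, check=True)
```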
