There is a bug in "CollectSequencingArtifactMetrics" in GATK 4.0

I am processing WES data with GATK for mutation calling.
I selected two samples for a test: one tumor sample and its matched normal sample.

I used bowtie2 for the alignment (hg19), then Picard to sort and mark duplicates,
then GATK for base recalibration, and then Mutect2.

I hit an error at the filtering step.
CollectSequencingArtifactMetrics needs two inputs: a BAM file and a reference FASTA. The general usage is:

gatk CollectSequencingArtifactMetrics \
-I tumor.bam \
-R ref.fasta \
-O tumor_artifact \
--FILE_EXTENSION ".txt"

The command I ran was (ucsc.hg19.fasta is from the GATK FTP site):
gatk CollectSequencingArtifactMetrics \
-I ../MySam/SRR5038441_Pst_MD_BQSR.bam \
-R ucsc.hg19.fasta \
-O tumor_artifact \
--FILE_EXTENSION ".txt"

The error is: Sequence dictionaries are not the same size (84, 93).

The reason is that the sequence dictionary in the BAM header has 84 entries, while the reference dictionary has 93.
The sequence dictionary in the BAM came from the bowtie2 index. I chose hg19 for every reference file,
but the reference used to build the bowtie2 index may differ from the one downloaded from the GATK resource bundle (your FTP site).
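
One way to see where the two dictionaries differ is to compare the @SQ names in the BAM header (as printed by samtools view -H) with those in the reference .dict file. The following is a minimal sketch, not GATK or Picard code; the function names sq_names and compare_contigs and the toy header strings are made up for illustration.

```python
def sq_names(header_text):
    """Return contig names from the @SQ lines of a SAM header or .dict text, in file order."""
    names = []
    for line in header_text.splitlines():
        fields = line.split()  # works for tab- or space-separated fields
        if fields and fields[0] == "@SQ":
            for field in fields[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
                    break
    return names


def compare_contigs(bam_header, ref_dict):
    """Report entry counts and the contig names unique to each side."""
    bam = sq_names(bam_header)
    ref = sq_names(ref_dict)
    bam_set, ref_set = set(bam), set(ref)
    return {
        "bam_count": len(bam),
        "ref_count": len(ref),
        "only_in_ref": [n for n in ref if n not in bam_set],
        "only_in_bam": [n for n in bam if n not in ref_set],
    }


# Toy input (illustrative only): the reference has one extra haplotype contig.
bam_header = (
    "@HD\tVN:1.5\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chrM\tLN:16571\n"
)
ref_dict = (
    "@HD\tVN:1.0\tSO:unsorted\n"
    "@SQ\tSN:chrM\tLN:16571\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chr6_apd_hap1\tLN:4622290\n"
)
report = compare_contigs(bam_header, ref_dict)
print(report)
```

With real files, the two strings would be the output of samtools view -H and the contents of ucsc.hg19.dict.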

Then I printed the BAM header, compared it with "ucsc.hg19.dict", and tried removing the 9 extra lines from the dict,
namely:
SN:chr4_ctg9_hap1
SN:chr6_apd_hap1
SN:chr6_cox_hap2
SN:chr6_dbb_hap3
SN:chr6_mann_hap4
SN:chr6_mcf_hap5
SN:chr6_qbl_hap6
SN:chr6_ssto_hap7
SN:chr17_ctg5_hap1

I also edited ucsc.hg19.fasta.fai in the same way,

but I still got an error:
Sequences at index 0 don't match: 0/249250621/chr1 0/16571/chrM/UR=file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta/M5=d2ed829b8a1628d16cbeee88e88e39eb

However, I believe the updated "ucsc.hg19.fasta.fai" is consistent with "ucsc.hg19.dict" (I checked each line).
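
The "index 0" message suggests that the order of the contigs matters too, not only which contigs are present. A rough sketch for finding the first position at which two dictionaries disagree follows; sq_records and first_mismatch are hypothetical helper names, the toy headers are illustrative, and this is an assumption about what such a check looks like, not the validation that GATK itself runs.

```python
def sq_records(header_text):
    """Return (name, length) pairs from @SQ lines, in file order."""
    records = []
    for line in header_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "@SQ":
            continue
        # Split each TAG:value field on the first colon only (UR values contain colons).
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        records.append((tags.get("SN"), int(tags.get("LN", 0))))
    return records


def first_mismatch(bam_header, ref_dict):
    """Return (index, bam_record, ref_record) for the first differing position.

    Returns None when the records agree over the shared prefix; a difference
    in the number of entries is not reported by this function.
    """
    bam = sq_records(bam_header)
    ref = sq_records(ref_dict)
    for i, (b, r) in enumerate(zip(bam, ref)):
        if b != r:
            return i, b, r
    return None


# Toy input (illustrative only): same contigs, different order.
bam_header = (
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chr2\tLN:243199373\n"
    "@SQ\tSN:chrM\tLN:16571\n"
)
ref_dict = (
    "@SQ\tSN:chrM\tLN:16571\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chr2\tLN:243199373\n"
)
result = first_mismatch(bam_header, ref_dict)
print(result)
```

In the toy example the BAM lists chr1 first while the dict lists chrM first, which mirrors the pair of records named in the error message above.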

Finally, I tested the same problem with hg38.
I downloaded the bowtie2 index files for hg38, realigned the original reads, and got a new BAM file.
Then I re-downloaded the hg38 reference from the GATK resource bundle (ftp://ftp.broadinstitute.org/bundle/hg38/).

gatk CollectSequencingArtifactMetrics \
-I ../MySam/441.bam \
-R Homo_sapiens_assembly38.fasta \
-O tumor_artifact \
--FILE_EXTENSION ".txt"

I still got a similar error: Sequence dictionaries are not the same size (195, 3366).
This means that the sequence dictionary in the BAM header has 195 entries, while the GATK reference's has 3366.

In summary, although bowtie2 and GATK may follow the same standard for the same genome, some functions are still not compatible. Many Picard metrics tools cannot process a BAM file whose sequence dictionary differs in size from the reference's.
However, there is an exception:
I ran "CollectOxoGMetrics" and got no error.

gatk CollectOxoGMetrics -I ../MySam/SRR5038441_Pst_MD_BQSR.bam -R ucsc.hg19.fasta -O tumor_artifact.txt

Please note that this BAM file has 84 sequence dictionary entries, while the dictionary for ucsc.hg19.fasta has 93.

Best Answer

  • Sheila, Broad Institute (Member, Broadie, Moderator)
    Accepted Answer

    @JakeJi2345
    Hi,

    So, you have used the same reference throughout your analysis? I have not come across this issue in testing. Can you post the BAM header (specifically the @SQ lines) and the FASTA dict file?

    Thanks,
    Sheila

Answers

  • The header of the BAM file is here (generated by samtools view -H):

    @HD VN:1.5 SO:coordinate
    @SQ SN:chr1 LN:249250621
    @SQ SN:chr2 LN:243199373
    @SQ SN:chr3 LN:198022430
    @SQ SN:chr4 LN:191154276
    @SQ SN:chr5 LN:180915260
    @SQ SN:chr6 LN:171115067
    @SQ SN:chr7 LN:159138663
    @SQ SN:chr8 LN:146364022
    @SQ SN:chr9 LN:141213431
    @SQ SN:chr10 LN:135534747
    @SQ SN:chr11 LN:135006516
    @SQ SN:chr12 LN:133851895
    @SQ SN:chr13 LN:115169878
    @SQ SN:chr14 LN:107349540
    @SQ SN:chr15 LN:102531392
    @SQ SN:chr16 LN:90354753
    @SQ SN:chr17 LN:81195210
    @SQ SN:chr18 LN:78077248
    @SQ SN:chr19 LN:59128983
    @SQ SN:chr20 LN:63025520
    @SQ SN:chr21 LN:48129895
    @SQ SN:chr22 LN:51304566
    @SQ SN:chrX LN:155270560
    @SQ SN:chrY LN:59373566
    @SQ SN:chrM LN:16571
    @SQ SN:chr1_gl000191_random LN:106433
    @SQ SN:chr1_gl000192_random LN:547496
    @SQ SN:chr4_gl000193_random LN:189789
    @SQ SN:chr4_gl000194_random LN:191469
    @SQ SN:chr7_gl000195_random LN:182896
    @SQ SN:chr8_gl000196_random LN:38914
    @SQ SN:chr8_gl000197_random LN:37175
    @SQ SN:chr9_gl000198_random LN:90085
    @SQ SN:chr9_gl000199_random LN:169874
    @SQ SN:chr9_gl000200_random LN:187035
    @SQ SN:chr9_gl000201_random LN:36148
    @SQ SN:chr11_gl000202_random LN:40103
    @SQ SN:chr17_gl000203_random LN:37498
    @SQ SN:chr17_gl000204_random LN:81310
    @SQ SN:chr17_gl000205_random LN:174588
    @SQ SN:chr17_gl000206_random LN:41001
    @SQ SN:chr18_gl000207_random LN:4262
    @SQ SN:chr19_gl000208_random LN:92689
    @SQ SN:chr19_gl000209_random LN:159169
    @SQ SN:chr21_gl000210_random LN:27682
    @SQ SN:chrUn_gl000211 LN:166566
    @SQ SN:chrUn_gl000212 LN:186858
    @SQ SN:chrUn_gl000213 LN:164239
    @SQ SN:chrUn_gl000214 LN:137718
    @SQ SN:chrUn_gl000215 LN:172545
    @SQ SN:chrUn_gl000216 LN:172294
    @SQ SN:chrUn_gl000217 LN:172149
    @SQ SN:chrUn_gl000218 LN:161147
    @SQ SN:chrUn_gl000219 LN:179198
    @SQ SN:chrUn_gl000220 LN:161802
    @SQ SN:chrUn_gl000221 LN:155397
    @SQ SN:chrUn_gl000222 LN:186861
    @SQ SN:chrUn_gl000223 LN:180455
    @SQ SN:chrUn_gl000224 LN:179693
    @SQ SN:chrUn_gl000225 LN:211173
    @SQ SN:chrUn_gl000226 LN:15008
    @SQ SN:chrUn_gl000227 LN:128374
    @SQ SN:chrUn_gl000228 LN:129120
    @SQ SN:chrUn_gl000229 LN:19913
    @SQ SN:chrUn_gl000230 LN:43691
    @SQ SN:chrUn_gl000231 LN:27386
    @SQ SN:chrUn_gl000232 LN:40652
    @SQ SN:chrUn_gl000233 LN:45941
    @SQ SN:chrUn_gl000234 LN:40531
    @SQ SN:chrUn_gl000235 LN:34474
    @SQ SN:chrUn_gl000236 LN:41934
    @SQ SN:chrUn_gl000237 LN:45867
    @SQ SN:chrUn_gl000238 LN:39939
    @SQ SN:chrUn_gl000239 LN:33824
    @SQ SN:chrUn_gl000240 LN:41933
    @SQ SN:chrUn_gl000241 LN:42152
    @SQ SN:chrUn_gl000242 LN:43523
    @SQ SN:chrUn_gl000243 LN:43341
    @SQ SN:chrUn_gl000244 LN:39929
    @SQ SN:chrUn_gl000245 LN:36651
    @SQ SN:chrUn_gl000246 LN:38154
    @SQ SN:chrUn_gl000247 LN:36422
    @SQ SN:chrUn_gl000248 LN:39786
    @SQ SN:chrUn_gl000249 LN:38502
    @RG ID:SRR5038441 SM:SAMN06041278 PL:Illumina
    @PG ID:bowtie2 PN:bowtie2 VN:2.3.2 CL:"/opt/apps/bowtie/2.3.2/bin/bowtie2-align-s --wrapper basic-0 --end-to-end -p 32 -x ./Myindex/hg19 --rg-id SRR5038441 --rg SM:SAMN06041278 --rg PL:Illumina -S ./MySam/SRR5038441.sam -1 ./MyFastq/SRR5038441_1.fastq -2 ./MyFastq/SRR5038441_2.fastq"
    @PG ID:MarkDuplicates VN:2.17.4-SNAPSHOT CL:MarkDuplicates INPUT=[SRR5038441_Pst.bam] OUTPUT=SRR5038441_Pst_MD.bam METRICS_FILE=441.mkdup.metrics CREATE_INDEX=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX= OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false PN:MarkDuplicates
    @PG ID:GATK ApplyBQSR VN:4.0.0.0 CL:ApplyBQSR --output ../MySam/SRR5038441_Pst_MD_BQSR.bam --bqsr-recal-file 441_1.grp --input ../MySam/SRR5038441_Pst_MD.bam --reference ucsc.hg19.fasta --preserve-qscores-less-than 6 --use-original-qualities false --quantize-quals 0 --round-down-quantized false --emit-original-quals false --global-qscore-prior -1.0 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --disable-tool-default-read-filters false PN:GATK ApplyBQSR

    And here is the sequence dictionary downloaded from the GATK resource bundle:

    @HD VN:1.0 SO:unsorted
    @SQ SN:chrM LN:16571 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:d2ed829b8a1628d16cbeee88e88e39eb
    @SQ SN:chr1 LN:249250621 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:1b22b98cdeb4a9304cb5d48026a85128
    @SQ SN:chr2 LN:243199373 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:a0d9851da00400dec1098a9255ac712e
    @SQ SN:chr3 LN:198022430 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:641e4338fa8d52a5b781bd2a2c08d3c3
    @SQ SN:chr4 LN:191154276 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:23dccd106897542ad87d2765d28a19a1
    @SQ SN:chr5 LN:180915260 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:0740173db9ffd264d728f32784845cd7
    @SQ SN:chr6 LN:171115067 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:1d3a93a248d92a729ee764823acbbc6b
    @SQ SN:chr7 LN:159138663 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:618366e953d6aaad97dbe4777c29375e
    @SQ SN:chr8 LN:146364022 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:96f514a9929e410c6651697bded59aec
    @SQ SN:chr9 LN:141213431 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:3e273117f15e0a400f01055d9f393768
    @SQ SN:chr10 LN:135534747 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:988c28e000e84c26d552359af1ea2e1d
    @SQ SN:chr11 LN:135006516 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:98c59049a2df285c76ffb1c6db8f8b96
    @SQ SN:chr12 LN:133851895 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:51851ac0e1a115847ad36449b0015864
    @SQ SN:chr13 LN:115169878 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:283f8d7892baa81b510a015719ca7b0b
    @SQ SN:chr14 LN:107349540 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:98f3cae32b2a2e9524bc19813927542e
    @SQ SN:chr15 LN:102531392 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:e5645a794a8238215b2cd77acb95a078
    @SQ SN:chr16 LN:90354753 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:fc9b1a7b42b97a864f56b348b06095e6
    @SQ SN:chr17 LN:81195210 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:351f64d4f4f9ddd45b35336ad97aa6de
    @SQ SN:chr18 LN:78077248 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
    @SQ SN:chr19 LN:59128983 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:1aacd71f30db8e561810913e0b72636d
    @SQ SN:chr20 LN:63025520 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:0dec9660ec1efaaf33281c0d5ea2560f
    @SQ SN:chr21 LN:48129895 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:2979a6085bfe28e3ad6f552f361ed74d
    @SQ SN:chr22 LN:51304566 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:a718acaa6135fdca8357d5bfe94211dd
    @SQ SN:chrX LN:155270560 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:7e0e2e580297b7764e31dbc80c2540dd
    @SQ SN:chrY LN:59373566 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:1e86411d73e6f00a10590f976be01623
    @SQ SN:chr1_gl000191_random LN:106433 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:d75b436f50a8214ee9c2a51d30b2c2cc
    @SQ SN:chr1_gl000192_random LN:547496 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:325ba9e808f669dfeee210fdd7b470ac
    @SQ SN:chr4_ctg9_hap1 LN:590426 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:fa24f81b680df26bcfb6d69b784fbe36
    @SQ SN:chr4_gl000193_random LN:189789 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:dbb6e8ece0b5de29da56601613007c2a
    @SQ SN:chr4_gl000194_random LN:191469 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:6ac8f815bf8e845bb3031b73f812c012
    @SQ SN:chr6_apd_hap1 LN:4622290 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:fe71bc63420d666884f37a3ad79f3317
    @SQ SN:chr6_cox_hap2 LN:4795371 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:18c17e1641ef04873b15f40f6c8659a4
    @SQ SN:chr6_dbb_hap3 LN:4610396 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:2a3c677c426a10e137883ae1ffb8da3f
    @SQ SN:chr6_mann_hap4 LN:4683263 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:9d51d4152174461cd6715c7ddc588dc8
    @SQ SN:chr6_mcf_hap5 LN:4833398 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:efed415dd8742349cb7aaca054675b9a
    @SQ SN:chr6_qbl_hap6 LN:4611984 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:094d037050cad692b57ea12c4fef790f
    @SQ SN:chr6_ssto_hap7 LN:4928567 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:3b6d666200e72bcc036bf88a4d7e0749
    @SQ SN:chr7_gl000195_random LN:182896 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:5d9ec007868d517e73543b005ba48535
    @SQ SN:chr8_gl000196_random LN:38914 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:d92206d1bb4c3b4019c43c0875c06dc0
    @SQ SN:chr8_gl000197_random LN:37175 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:6f5efdd36643a9b8c8ccad6f2f1edc7b
    @SQ SN:chr9_gl000198_random LN:90085 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:868e7784040da90d900d2d1b667a1383
    @SQ SN:chr9_gl000199_random LN:169874 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:569af3b73522fab4b40995ae4944e78e
    @SQ SN:chr9_gl000200_random LN:187035 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:75e4c8d17cd4addf3917d1703cacaf25
    @SQ SN:chr9_gl000201_random LN:36148 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:dfb7e7ec60ffdcb85cb359ea28454ee9
    @SQ SN:chr11_gl000202_random LN:40103 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:06cbf126247d89664a4faebad130fe9c
    @SQ SN:chr17_ctg5_hap1 LN:1680828 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:d89517b400226d3b56e753972a7cad67
    @SQ SN:chr17_gl000203_random LN:37498 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:96358c325fe0e70bee73436e8bb14dbd
    @SQ SN:chr17_gl000204_random LN:81310 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:efc49c871536fa8d79cb0a06fa739722
    @SQ SN:chr17_gl000205_random LN:174588 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:d22441398d99caf673e9afb9a1908ec5
    @SQ SN:chr17_gl000206_random LN:41001 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:43f69e423533e948bfae5ce1d45bd3f1
    @SQ SN:chr18_gl000207_random LN:4262 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:f3814841f1939d3ca19072d9e89f3fd7
    @SQ SN:chr19_gl000208_random LN:92689 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:aa81be49bf3fe63a79bdc6a6f279abf6
    @SQ SN:chr19_gl000209_random LN:159169 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:f40598e2a5a6b26e84a3775e0d1e2c81
    @SQ SN:chr21_gl000210_random LN:27682 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:851106a74238044126131ce2a8e5847c
    @SQ SN:chrUn_gl000211 LN:166566 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:7daaa45c66b288847b9b32b964e623d3
    @SQ SN:chrUn_gl000212 LN:186858 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:563531689f3dbd691331fd6c5730a88b
    @SQ SN:chrUn_gl000213 LN:164239 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:9d424fdcc98866650b58f004080a992a
    @SQ SN:chrUn_gl000214 LN:137718 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:46c2032c37f2ed899eb41c0473319a69
    @SQ SN:chrUn_gl000215 LN:172545 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:5eb3b418480ae67a997957c909375a73
    @SQ SN:chrUn_gl000216 LN:172294 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:642a232d91c486ac339263820aef7fe0
    @SQ SN:chrUn_gl000217 LN:172149 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:6d243e18dea1945fb7f2517615b8f52e
    @SQ SN:chrUn_gl000218 LN:161147 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:1d708b54644c26c7e01c2dad5426d38c
    @SQ SN:chrUn_gl000219 LN:179198 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:f977edd13bac459cb2ed4a5457dba1b3
    @SQ SN:chrUn_gl000220 LN:161802 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:fc35de963c57bf7648429e6454f1c9db
    @SQ SN:chrUn_gl000221 LN:155397 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:3238fb74ea87ae857f9c7508d315babb
    @SQ SN:chrUn_gl000222 LN:186861 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:6fe9abac455169f50470f5a6b01d0f59
    @SQ SN:chrUn_gl000223 LN:180455 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:399dfa03bf32022ab52a846f7ca35b30
    @SQ SN:chrUn_gl000224 LN:179693 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:d5b2fc04f6b41b212a4198a07f450e20
    @SQ SN:chrUn_gl000225 LN:211173 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:63945c3e6962f28ffd469719a747e73c
    @SQ SN:chrUn_gl000226 LN:15008 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:1c1b2cd1fccbc0a99b6a447fa24d1504
    @SQ SN:chrUn_gl000227 LN:128374 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:a4aead23f8053f2655e468bcc6ecdceb
    @SQ SN:chrUn_gl000228 LN:129120 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:c5a17c97e2c1a0b6a9cc5a6b064b714f
    @SQ SN:chrUn_gl000229 LN:19913 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:d0f40ec87de311d8e715b52e4c7062e1
    @SQ SN:chrUn_gl000230 LN:43691 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:b4eb71ee878d3706246b7c1dbef69299
    @SQ SN:chrUn_gl000231 LN:27386 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:ba8882ce3a1efa2080e5d29b956568a4
    @SQ SN:chrUn_gl000232 LN:40652 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:3e06b6741061ad93a8587531307057d8
    @SQ SN:chrUn_gl000233 LN:45941 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:7fed60298a8d62ff808b74b6ce820001
    @SQ SN:chrUn_gl000234 LN:40531 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:93f998536b61a56fd0ff47322a911d4b
    @SQ SN:chrUn_gl000235 LN:34474 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:118a25ca210cfbcdfb6c2ebb249f9680
    @SQ SN:chrUn_gl000236 LN:41934 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:fdcd739913efa1fdc64b6c0cd7016779
    @SQ SN:chrUn_gl000237 LN:45867 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:e0c82e7751df73f4f6d0ed30cdc853c0
    @SQ SN:chrUn_gl000238 LN:39939 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:131b1efc3270cc838686b54e7c34b17b
    @SQ SN:chrUn_gl000239 LN:33824 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:99795f15702caec4fa1c4e15f8a29c07
    @SQ SN:chrUn_gl000240 LN:41933 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:445a86173da9f237d7bcf41c6cb8cc62
    @SQ SN:chrUn_gl000241 LN:42152 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:ef4258cdc5a45c206cea8fc3e1d858cf
    @SQ SN:chrUn_gl000242 LN:43523 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:2f8694fc47576bc81b5fe9e7de0ba49e
    @SQ SN:chrUn_gl000243 LN:43341 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:cc34279a7e353136741c9fce79bc4396
    @SQ SN:chrUn_gl000244 LN:39929 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:0996b4475f353ca98bacb756ac479140
    @SQ SN:chrUn_gl000245 LN:36651 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:89bc61960f37d94abf0df2d481ada0ec
    @SQ SN:chrUn_gl000246 LN:38154 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:e4afcd31912af9d9c2546acf1cb23af2
    @SQ SN:chrUn_gl000247 LN:36422 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:7de00226bb7df1c57276ca6baabafd15
    @SQ SN:chrUn_gl000248 LN:39786 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:5a8e43bec9be36c7b49c84d585107776
    @SQ SN:chrUn_gl000249 LN:38502 UR:file:/humgen/gsa-hpprojects/GATK/bundle/ucsc.hg19/ucsc.hg19.fasta M5:1d78abec37c15fe29a275eb08d5af236

    You can see that the number of @SQ lines differs between the two files.
    The dict file from the GATK resource bundle has 93 @SQ lines in total, but the BAM header has 84.
    The 93 @SQ entries in the GATK dict file include all 84 @SQ entries in the BAM header, but Picard's "CollectSequencingArtifactMetrics" does not accept any difference between them. In contrast, "CollectOxoGMetrics" works fine with the same inputs as "CollectSequencingArtifactMetrics".

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @JakeJi2345
    Hi,

    I am really surprised that any tool works with the differences. They should not. Can you submit a bug report? Instructions are here.

    Thanks,
    Sheila
