Calling somatic short variants with Mutect2 - BAM file contigs not matching the reference

Adam_U0 Member
edited September 2018 in Ask the GATK team

Dear GATK Staff,

I have read a lot about this problem, but it still occurs. I think I have tried everything I can, without success. Here is my command, based on your tutorial on calling somatic variants:

gatk --java-options "-Xmx2g" Mutect2 \
      -R ucsc.hg19.fasta \
      -I 1_Tumor_sorted_markduplicates_RG.bam \
      -I 1_Blood_markduplicates_RG.bam \
      -tumor 1_tumor \
      -normal 1_normal \
      -pon 1_2_3_threesamplepon_chr.vcf.gz \
      --germline-resource af-only-gnomad.raw.sites.hg19.vcf.gz \
      --af-of-alleles-not-in-resource 0.0000025 \
      --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
      -O P129_somatic_m2.vcf.gz \
      -bamout P129_tumor_normal_m2.bam

I found all the reference files here:

bioinfo5pilm46.mit.edu/software/GATK/resources/

Unfortunately, there is still a problem with the chromosome names:

reads contigs = [chr1, chr2, chr3, chr4, chr5....]
reads features = [1,2,3....]

I checked everything: the lengths and names of the chromosomes in my .bam files are exactly the same as in the reference.

Lengths:

samtools view -H 1_Tumor_sorted_markduplicates_RG.bam |grep '^@SQ'>chromosomes.txt

The lengths are the same as those listed here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes
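The length comparison described above can be done mechanically rather than by eye. The following is a minimal sketch, assuming chromosomes.txt holds the @SQ lines written by the samtools command above and that the UCSC sizes file has been saved locally as hg19.chrom.sizes (tab-separated name and length). The function name compare_sq_to_sizes is made up for illustration.

```shell
# Print contigs whose name or length differs between the @SQ lines of a
# BAM header and a chrom.sizes file. No output means the two files agree.
# Hypothetical helper; usage: compare_sq_to_sizes chromosomes.txt hg19.chrom.sizes
compare_sq_to_sizes() {
    sq_file=$1
    sizes_file=$2
    sq_sorted=$(mktemp)
    sizes_sorted=$(mktemp)

    # Turn "@SQ<TAB>SN:chr1<TAB>LN:249250621" into "chr1<TAB>249250621".
    awk -F'\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len = substr($i, 4)
        }
        print name "\t" len
    }' "$sq_file" | LC_ALL=C sort > "$sq_sorted"

    cut -f1,2 "$sizes_file" | LC_ALL=C sort > "$sizes_sorted"

    # comm -3 hides lines common to both inputs. Lines found only in the
    # BAM header print unindented; lines found only in the sizes file
    # print with a leading tab.
    LC_ALL=C comm -3 "$sq_sorted" "$sizes_sorted"

    rm -f "$sq_sorted" "$sizes_sorted"
}
```

Note that this only compares the BAM header with the sizes file. It says nothing about the contig names in the VCF resources passed with -pon or --germline-resource.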

Chromosome names:

samtools idxstats 1_Tumor_sorted_markduplicates_RG.bam | head -n 3
Each name starts with 'chr'.

In the reference, each FASTA record name also starts with 'chr'.

In case of 1_2_3_threesamplepon_chr.vcf.gz I also checked it, each row after header starts with 'chr'.

af-only-gnomad.raw.sites.hg19.vcf.gz - also each row after header starts with 'chr'...
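Looking only at the data rows can miss a mismatch, because a VCF also declares contig names in its header (the ##contig lines), and the two can disagree if the rows were edited by hand. The sketch below lists both sets of names so they can be compared. It assumes a gzip- or bgzip-compressed VCF; the function name vcf_contig_names is made up for illustration. Whether GATK reads its contig list from the header or from the index is not something this check establishes.

```shell
# List the contig names declared in the VCF header and the distinct CHROM
# values used by the records, each once, in order of first appearance.
# Hypothetical helper; usage: vcf_contig_names 1_2_3_threesamplepon_chr.vcf.gz
vcf_contig_names() {
    gzip -dc "$1" | awk -F'\t' '
        /^##contig=<ID=/ {
            # Keep only the ID value from "##contig=<ID=chr1,length=...>".
            id = $0
            sub(/^##contig=<ID=/, "", id)
            sub(/[,>].*$/, "", id)
            if (!(id in header)) { header[id] = 1; print "header\t" id }
            next
        }
        /^#/ { next }   # skip all other header lines
        !($1 in records) { records[$1] = 1; print "record\t" $1 }
    '
}
```

If the "header" names and the "record" names differ (for example 1 versus chr1), the file is internally inconsistent, and regenerating it against the intended reference is likely safer than patching it by hand.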

Everything looks fine, but it doesn't work: there is always a problem, always an error, at each step of the analysis based on your tutorial.

I have been fighting with this all day, every day since Monday. Please, could you help me?

Best Regards,
Adam

Best Answer

  • Accepted Answer

    ANSWER:

    In case of 1_2_3_threesamplepon_chr.vcf.gz I also checked it, each row after header starts with 'chr'.

    Yes, because I had added 'chr' manually. I re-created the three files for the PoN using the reference mentioned above (ucsc.hg19.fa), and the command worked.

    This question can be removed.

    Best regards,
    Adam
