Mutect2 b38 germline resource

Hi,

Congratulations on GATK 4.0!

I'm looking at the Mutect2 instructions, which suggest using a germline resource with "--germline-resource af-only-gnomad.vcf.gz".

Do you have a b38 version of this coming? Or do you know where I could obtain one?

Thanks

Dan

Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin
    edited January 22

    @Dan_C
    Hi Dan,

    We only provide the b37 version right now. You will have to use Picard's LiftoverVcf to lift that version over to hg38. This thread may help as well.
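A minimal sketch of the LiftoverVcf step, wrapped in Python for illustration. The I=, O=, CHAIN=, REJECT= and R= arguments are Picard's LiftoverVcf inputs; the function names, the location of picard.jar and the file names in the usage note are hypothetical placeholders. The chain file's contig names must match the input VCF (b37 uses 1, 2, ..., whereas UCSC-style files use chr1, chr2, ...), and the target reference needs an accompanying sequence dictionary (.dict).

```python
import subprocess
from typing import List


def build_liftover_cmd(
    input_vcf: str,
    output_vcf: str,
    chain_file: str,
    reject_vcf: str,
    target_reference: str,
    picard_jar: str = "picard.jar",  # hypothetical location of the Picard jar
) -> List[str]:
    """Assemble a Picard LiftoverVcf command line (KEY=VALUE syntax)."""
    return [
        "java", "-jar", picard_jar, "LiftoverVcf",
        f"I={input_vcf}",          # input VCF on the source build (b37)
        f"O={output_vcf}",         # lifted-over VCF on the target build (hg38)
        f"CHAIN={chain_file}",     # chain file mapping b37 -> hg38 coordinates
        f"REJECT={reject_vcf}",    # records that could not be lifted over
        f"R={target_reference}",   # target-build FASTA (with .dict alongside)
    ]


def run_liftover(cmd: List[str]) -> None:
    """Run the command, raising CalledProcessError if Picard exits non-zero."""
    subprocess.run(cmd, check=True)
```

Usage would look like run_liftover(build_liftover_cmd("af-only-gnomad.vcf.gz", "af-only-gnomad.hg38.vcf.gz", "b37ToHg38.over.chain", "rejected.vcf.gz", "Homo_sapiens_assembly38.fasta")). Comparing the record counts of the output and REJECT files gives a quick sense of how much failed to lift over.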

    -Sheila

  • Dan_C · Member

    Thanks for the pointer Sheila.

    I have already tried cross-map, but enough regions failed to lift over that I was a bit cautious about using it.

    I see that there is now an af-only-gnomad.hg38.vcf.gz file on your FTP site. Is that one you have lifted over?

    Thanks

    Dan

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @Dan_C
    Hi Dan,

    Yes, that file was lifted over from b37. However, if I am not mistaken, it only contains sites from chr17.

    -Sheila

  • Dan_C · Member

    Hi Sheila,

    It looks like it contains all the chromosomes, e.g.:

        tabix af-only-gnomad.hg38.vcf.gz chr1 | head -n 2
        chr1 10067 . T TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC 30.35 PASS AC=3;AF=7.384e-05
        chr1 10108 . CAACCCT C 46514.30 PASS AC=6;AF=0.0001525

        tabix af-only-gnomad.hg38.vcf.gz chr22 | head -n 2
        chr22 10584988 rs193197355 G A 83553.80 PASS AC=263;AF=0.008503
        chr22 10584990 . T A 82.80 PASS AC=3;AF=9.699e-05

    Dan

  • shlee · Cambridge · Member, Broadie, Moderator, admin
    edited February 6

    Hi @Dan_C,

    Yes, these are the full-length lifted-over versions of the GRCh37 resources. I placed them in the FTP directory recently and talk about them in https://software.broadinstitute.org/gatk/blog?id=11337 and https://software.broadinstitute.org/gatk/documentation/article?id=11136.

    Interestingly, if you continue digging as you have done, you'll notice that chrY is absent.
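Spot-checking contigs one at a time with tabix works, but a single pass over the file shows every contig at once. The sketch below counts data records per CHROM and lists expected contigs with no records; the function names and the PRIMARY_HG38 list are illustrative assumptions, and it relies on bgzip output being readable as ordinary (multi-member) gzip.

```python
import gzip
from collections import Counter
from typing import Iterable, List


def count_records_per_contig(lines: Iterable[str]) -> Counter:
    """Count VCF data records per CHROM, skipping '#' header lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]  # CHROM is the first tab-separated column
        counts[chrom] += 1
    return counts


def missing_contigs(counts: Counter, expected: Iterable[str]) -> List[str]:
    """Return expected contigs that have no records (Counter yields 0 for absent keys)."""
    return [c for c in expected if counts[c] == 0]


def scan_vcf(path: str) -> Counter:
    """Stream a bgzipped VCF without decompressing it to disk."""
    with gzip.open(path, "rt") as fh:
        return count_records_per_contig(fh)


# Primary GRCh38 contigs in the chr-prefixed naming used by the hg38 file.
PRIMARY_HG38 = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
```

Running missing_contigs(scan_vcf("af-only-gnomad.hg38.vcf.gz"), PRIMARY_HG38) on the lifted-over resource should, per the observation above, include chrY in its result.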

    P.S. These lifted-over versions are provided as-is, with no guarantees. I lifted them over for use in our tutorials that use GRCh38. Because the liftover/remap processes were fairly involved, last summer I wrote a README describing how I prepared these GRCh38 counterparts and placed it alongside the copies in the Google bucket.

    P.P.S. I do not recommend cross-map for this type of liftover.

  • Dan_C · Member

    Thanks a lot for the information, @shlee. Can you point me to the README? I suspected the liftover might be more involved than it appears (which would explain why a lifted-over version isn't already floating around), and it would be good to understand how you did it. Can I ask why you don't recommend cross-map?

    In your tutorial you suggest altering --af-of-alleles-not-in-resource for exome data. What would you recommend for whole genomes, which include non-coding regions covered only by the 16k samples and coding regions covered by the ~200,000?

    Thanks again

    Dan

    Issue · GitHub · by shlee · Issue Number 4366 · State: closed · Closed By: sooheelee

  • shlee · Cambridge · Member, Broadie, Moderator, admin
    edited February 7

    Hi @Dan_C,

    I think your best bet is to ask on the FireCloud forum about changes to that bucket. I don't own the bucket; I contributed data to it only once, back in the summer, for testing purposes, and any modifications since then were made by others on the FireCloud side.

    I'd love to share what I learned about liftover/remap/crossmap someday and get feedback on what could be improved, but it's not a work priority, so this is unlikely to happen soon, if at all. Here is something for you, though. Unlike Liftover/Remap, Crossmap retains the original ALT allele, and for me it returned fewer features. More specifically, Crossmap did not change the ALT allele for regions of inversion. Here's an example comparison between ENSEMBL-Crossmap and NCBI-Remap:

        7,8c7,8
        < chr1  494322  .       G       C
        < chr1  494515  .       T       A
        ---
        > chr1  494322  .       G       G
        > chr1  494515  .       T       T
    

    One of these has to be wrong! Or, at the least, it is a pain to review and tweak.
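In the diff above, the second pair of lines has ALT equal to REF (G/G, T/T), which cannot be a valid variant. The reasoning behind the inversion problem can be sketched as follows: when a source interval maps to the minus strand of the target build, both REF and ALT must be complemented, not just REF. This is an illustrative sketch that assumes the target strand is already known for the interval (chain files record it); the function names are hypothetical, and it handles SNVs only, since indels also need their anchor base re-derived.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def lift_snv_alleles(ref: str, alt: str, target_strand: str) -> Tuple[str, str]:
    """Return (REF, ALT) on the target build for a single-base variant.

    On a minus-strand mapping (an inversion between builds), BOTH alleles
    are complemented; complementing only REF leaves ALT on the wrong strand.
    """
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("sketch handles SNVs only; indels need re-anchoring")
    if target_strand == "-":
        return reverse_complement(ref), reverse_complement(alt)
    return ref, alt


def is_degenerate(ref: str, alt: str) -> bool:
    """Flag lifted records whose ALT equals REF; these cannot be real variants."""
    return ref.upper() == alt.upper()
```

For a hypothetical source record C>G on an inverted interval, lift_snv_alleles("C", "G", "-") gives ("G", "C"); keeping the original ALT instead would produce G/G, which is_degenerate flags. Scanning a lifted-over VCF for such records is a cheap sanity check whichever tool produced it.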

    Picard-Liftover and NCBI-Remap work similarly, but Picard-Liftover is much easier to use than Remap (website or Perl script). When I last spoke with the NCBI folks (summer 2017), Remap could not handle VCF headers that meet the VCF spec, such as those GATK tools produce. That's another, albeit minor, inconvenience.

    For --af-of-alleles-not-in-resource, let me consult the developers to be sure of our recommendation. I believe we recommend changing the parameter to reflect the WGS constituents of the resource. Either number is rather small compared to the default of 0.001. If you want to be very conservative, you could run the tool twice, once with each value, then subset each run's variants to its respective coverage regions (coding versus non-coding) and combine the calls.
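The "run twice, subset, combine" approach can be sketched as follows, assuming the calls from each run have already been parsed into (chrom, pos, ref, alt) tuples and that coding regions are available as sorted, non-overlapping intervals. All names here are hypothetical. In practice the subsetting would more likely be done on the VCFs with GATK's interval arguments (-L / -XL); this sketch shows only the selection logic.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical record layout: (chrom, pos, ref, alt) with 1-based positions.
Call = Tuple[str, int, str, str]
# chrom -> sorted, non-overlapping, inclusive [start, end] intervals.
Intervals = Dict[str, List[Tuple[int, int]]]


class RegionIndex:
    """Binary-search lookup over sorted, non-overlapping intervals."""

    def __init__(self, intervals: Intervals) -> None:
        self._ivs = intervals
        self._starts = {c: [s for s, _ in ivs] for c, ivs in intervals.items()}

    def contains(self, chrom: str, pos: int) -> bool:
        starts = self._starts.get(chrom, [])
        i = bisect.bisect_right(starts, pos) - 1  # last interval starting <= pos
        return i >= 0 and pos <= self._ivs[chrom][i][1]


def combine_by_region(coding_run: List[Call], noncoding_run: List[Call],
                      coding: RegionIndex) -> List[Call]:
    """Keep coding-region calls from one run and all other calls from the other."""
    kept = [c for c in coding_run if coding.contains(c[0], c[1])]
    kept += [c for c in noncoding_run if not coding.contains(c[0], c[1])]
    return sorted(kept)  # lexicographic by contig name, then position
```

Here coding_run is the run made with the smaller, coding-region value, and noncoding_run the run made with the value for the WGS-only portion. Because each call is taken from exactly one run according to its region, no site appears twice in the combined list.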

    I should add that if you are using a matched normal, the effect of the population germline resource will be comparatively small. If you are not using a matched normal, you will want to tune --af-of-alleles-not-in-resource carefully to match the resource.

  • shlee · Cambridge · Member, Broadie, Moderator, admin

    Hi @Dan_C,

    For the --af-of-alleles-not-in-resource question, our developer says:

    We may eventually infer the value locally from the AC of nearby variants in the germline resource. Until then, I would use the smaller value, i.e., the one for coding regions. The only possible harm would be a very small number of rare germline events that aren't in gnomAD, whereas if you set it too high you risk filtering true somatic events.
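To put numbers on "the smaller value": one common heuristic, not stated in this thread and so an assumption here, is to set --af-of-alleles-not-in-resource to roughly the allele frequency of an allele seen once in the resource, 1/(2N) for N diploid samples. Taking Dan's figures as sample counts, that gives 1/(2 × 16,000) = 3.125e-05 for the WGS-only portion and 1/(2 × 200,000) = 2.5e-06 for coding regions, both well below the default of 0.001. The function name below is hypothetical.

```python
def af_of_unseen_allele(n_samples: int) -> float:
    """Approximate AF of an allele seen once among n diploid samples: 1/(2n)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return 1.0 / (2 * n_samples)


DEFAULT_AF = 0.001                            # default quoted in the thread
wgs_value = af_of_unseen_allele(16_000)       # non-coding: ~16k samples
coding_value = af_of_unseen_allele(200_000)   # coding: ~200,000 samples
chosen = min(wgs_value, coding_value)         # developer advice: use the smaller value

print(f"WGS-only: {wgs_value:.3e}, coding: {coding_value:.3e}, chosen: {chosen:.3e}")
```

Choosing the minimum follows the developer's reasoning above: a value that is too small only risks letting through a few rare germline events absent from gnomAD, whereas one that is too large risks filtering true somatic events.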
