Mutect 2 on mouse cells

Hi,
I am hoping to run Somatic-SNVs-Indels-GATK4 on whole exomes from paired mouse cell lines. Where can I find the relevant intervals and reference files for this? Thanks!

Answers

  • bhanuGandham, Cambridge MA. Member, Administrator, Broadie, Moderator admin

    Hi @pkhadka

    The mouse reference is called GRCm38 (mm10 in UCSC naming) and can be downloaded from the NCBI website: https://www.ncbi.nlm.nih.gov/assembly/GCF_000001635.26. Unfortunately, I don't know of any interval files for mouse.

  • bshifaw. Member, Broadie, Moderator admin
    edited July 5

    @pkhadka

    "I am trying to run your gatk4_mutect2 pipeline on Firecloud to call mutations from mouse whole genomes. You had previously helped me with this for human samples. I am having some trouble finding the right files to feed to the method for mouse genome. I was able to download mm9 mouse genome online, but I am not sure how to get the index, dictionary as well as SNP files. I was wondering if you had these files and if it would be possible to get access to them?"

    We don't offer any resource files other than hg38 and b37 (hg19), so you'll need to find or build your own for mouse. Since you have the mm9 reference genome, you can quickly create the index and dictionary files for it by following the documented instructions: how-can-i-prepare-a-fasta-file-to-use-as-reference.
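For anyone following along, here is a minimal sketch of the reference preparation step described above. It assumes `samtools` and `gatk` (GATK4) are installed and on your PATH, and that the reference is a file called `mm9.fasta`; that file name and the helper function names are made up for illustration. The linked instructions remain the authoritative version.

```python
from pathlib import Path
import shutil
import subprocess


def reference_prep_commands(fasta):
    """Return the two commands that index a reference FASTA and build its dictionary."""
    fasta = Path(fasta)
    dict_path = fasta.with_suffix(".dict")  # e.g. mm9.fasta -> mm9.dict
    return [
        # samtools faidx writes the index (<fasta>.fai) next to the FASTA
        ["samtools", "faidx", str(fasta)],
        # CreateSequenceDictionary writes the sequence dictionary named by -O
        ["gatk", "CreateSequenceDictionary", "-R", str(fasta), "-O", str(dict_path)],
    ]


def prepare_reference(fasta):
    """Run both commands; raises if a tool is missing or exits non-zero."""
    for cmd in reference_prep_commands(fasta):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try anywhere.
    for cmd in reference_prep_commands("mm9.fasta"):
        print(" ".join(cmd))
```

Calling `prepare_reference("mm9.fasta")` would actually run the tools and should leave `mm9.fasta.fai` and `mm9.dict` beside the FASTA.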

    I'm not sure what you mean by SNP files; the only other resource files listed for the pipeline in the workflow JSON are gnomAD and contamination. You'll need to find a gnomAD-like resource for the workflow. Essentially, it's a germline population resource provided to Mutect2 as evidence of whether alleles are germline or somatic. More info on the original gnomAD resource can be found here: https://gnomad.broadinstitute.org/about.

  • pkhadka. Member

    @bshifaw thanks for the response. I was indeed referring to the gnomad resource file. Since I am running a matched tumor-normal pair, is it okay to run the method without providing this file?

  • bshifaw. Member, Broadie, Moderator admin

    Mutect2 rejects from consideration any sites that are most likely germline or artifacts, based on the paired normal, the germline population resource (gnomAD), and a panel of normals.

    Having a germline population resource is beneficial and is part of best practices, but you still have the paired normal and panel of normals to help reject germline sites. I'm not certain how many more germline sites would be included in the results without this file.
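To make the "optional resource" point concrete, here is a rough sketch of how a tumor-normal Mutect2 command line changes with and without the germline resource and panel of normals. It assumes GATK 4.1 or later argument names (`-normal`, `--germline-resource`, `--panel-of-normals`); all file and sample names are placeholders, and the helper function is hypothetical, not part of the workflow.

```python
def mutect2_command(reference, tumor_bam, normal_bam, normal_sample, output_vcf,
                    germline_resource=None, panel_of_normals=None):
    """Build a tumor-normal Mutect2 command; the two resources are optional."""
    cmd = [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "-I", normal_bam,
        "-normal", normal_sample,  # sample name of the matched normal, not a file path
        "-O", output_vcf,
    ]
    # Only pass the population resource and the PoN when they are available.
    if germline_resource:
        cmd += ["--germline-resource", germline_resource]
    if panel_of_normals:
        cmd += ["--panel-of-normals", panel_of_normals]
    return cmd


if __name__ == "__main__":
    # Placeholder names: a matched pair with no gnomAD-like resource and no PoN.
    print(" ".join(mutect2_command(
        "mm9.fasta", "tumor.bam", "normal.bam", "normal_sample", "somatic.vcf.gz")))
```

Without the two optional inputs, germline filtering relies on the matched normal alone.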

  • pkhadka. Member

    @bshifaw I ran the method without a gnomAD-like resource or a PoN since I don't have these available right now. It ran fine but has been stuck in the "Mutect2.M2" task for over two days. It seems like this task completed successfully, but it's not calling the next task. Do you know what the problem could be?

  • bshifaw. Member, Broadie, Moderator admin

    Where is this method being run? There should be a stderr or stdout log file from Cromwell detailing why the workflow stopped.

  • pkhadka. Member

    @bshifaw I shared the workspace with you if you want to take a look

  • bshifaw. Member, Broadie, Moderator admin
    edited July 9

    Just a heads up for anyone else following along on the thread, the last question is more of a Terra question and would be best answered on the Terra forum.

    Looks like one of your shards (23) in the M2 task is still running; that's why it hasn't moved on to the next task, probably because the region it's processing is a bit tricky. The shard has preemption set to 10, so it may be restarted up to 10 times if it keeps getting preempted. Unfortunately, this shard takes longer than 24 hours to complete, and Google automatically terminates preemptible instances after they run for 24 hours.

    Many of your other shards also take a very long time to complete, ~16 hours. You may just want to increase the scatter_count for the workflow. Currently it's set to 50; increasing it will spread the work across more shards and hopefully decrease the run time of each shard.

    You could also let it get preempted the maximum number of times (10); on its final attempt (the 11th) it should run on a non-preemptible machine, which wouldn't get terminated after 24 hours and should run to completion.
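As a back-of-the-envelope illustration of the scatter_count suggestion, the sketch below scales an observed per-shard runtime to a new scatter count and checks it against the 24-hour preemptible limit. It assumes the work splits evenly across shards, which real genomic intervals often do not, so treat the numbers as rough guidance only. The function names are made up for this example.

```python
PREEMPTIBLE_LIMIT_HOURS = 24  # preemptible instances are terminated after 24 hours


def estimated_shard_hours(observed_hours, observed_scatter, new_scatter):
    """Scale an observed per-shard runtime to a new scatter_count.

    Assumes total work is constant and divides evenly across shards.
    """
    return observed_hours * observed_scatter / new_scatter


def fits_preemptible(hours):
    """True if a shard of this length could finish before the 24-hour cutoff."""
    return hours < PREEMPTIBLE_LIMIT_HOURS


if __name__ == "__main__":
    # ~16 hours per shard was observed at scatter_count = 50 in this thread.
    for scatter in (50, 100, 200, 500):
        hours = estimated_shard_hours(16, 50, scatter)
        print(f"scatter_count={scatter}: ~{hours:.1f} h per shard, "
              f"fits preemptible limit: {fits_preemptible(hours)}")
```

A shard that covers a difficult region can still run far longer than this estimate, so it is worth checking the slowest shard after each change.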

  • pkhadka. Member
    edited July 10

    @bshifaw I was able to get through that step by setting the scatter_count to 500 (it didn't work when I set it to 200), but I am getting an error at the MergeVCFs step. It looks like it has something to do with the sorting of the VCFs, but I am not sure why. I didn't supply any file for the pon or gnomad input (which I think needs to have the same sort order as the chromosomes?).
    java.lang.IllegalStateException: The elements of the input Iterators are not sorted according to the comparator htsjdk.variant.variantcontext.VariantContextComparator

  • bshifaw. Member, Broadie, Moderator admin
    edited July 15

    You may need to create another version of the workflow with an additional task that sorts the VCFs using a reference dictionary before merging them. To confirm this is the case, you can look at the output VCFs' headers and check whether they follow the MergeVCFs requirements.

    The input variant data must adhere to the following rules:
    If there are samples, those must be the same across all input files.
    Input file headers must contain compatible declarations for common annotations (INFO, FORMAT fields) and filters.
    Input files' variant records must be sorted by their contig and position following the sequence dictionary provided or the header contig list.

    This workflow hasn't been tested on samples other than human, so I'm not sure how it will react.

    Here is a related forum post to your error message:
    mergevcfs-elements-of-the-input-iterators-are-not-sorted-according-to-the-comparator
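To illustrate the sorting rule quoted above, here is a small sketch of what "sorted by contig and position following the sequence dictionary" means. It is not how MergeVcfs is implemented; it only models records as hypothetical `(contig, position)` tuples and uses a made-up contig list. In a real workflow the sorting task would more likely call an existing tool such as Picard/GATK SortVcf with the reference dictionary, which I'm suggesting as an assumption rather than something tested on this pipeline.

```python
def contig_rank(dict_contigs):
    """Map each contig name to its position in the sequence dictionary."""
    return {name: index for index, name in enumerate(dict_contigs)}


def is_dictionary_sorted(records, dict_contigs):
    """Check that (contig, position) records follow dictionary order, then position."""
    rank = contig_rank(dict_contigs)
    keys = [(rank[contig], pos) for contig, pos in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def sort_by_dictionary(records, dict_contigs):
    """Return records sorted by dictionary contig order, then by position."""
    rank = contig_rank(dict_contigs)
    return sorted(records, key=lambda rec: (rank[rec[0]], rec[1]))


if __name__ == "__main__":
    # Made-up dictionary order: chr2 comes before chr10, unlike alphabetical order.
    dictionary = ["chr1", "chr2", "chr10", "chrX"]
    # Alphabetically ordered records, which violate the dictionary order.
    records = [("chr1", 100), ("chr10", 50), ("chr2", 75)]
    print(is_dictionary_sorted(records, dictionary))
    print(sort_by_dictionary(records, dictionary))
```

Records that look sorted alphabetically can still fail the check when the dictionary lists contigs in a different order, which is one plausible way to hit the comparator error above.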
