About "Ask the team"

Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie admin
edited January 2013 in Ask the GATK team
This is the place to post any questions, problems or bug reports for the GATK development team to look at.

We try to respond within a reasonably short amount of time, but keep in mind that we are not support agents -- we are programmers and scientists, with code to write and data to analyze. In the same spirit, because our resources are limited and our time precious, we ask that you please consult all available sources of information in the GATK Guide and previous posts in this forum before posting your question here, and above all, refrain from posting problems that are clearly identified as USER ERRORS in the GATK's output. Thank you!

We also welcome discussions and responses from everyone in the user community. If you know something, say something!

Comments

  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie admin
    edited November 2012

    Please note that this post is intended to explain what this section of the forum is about, not for asking actual questions about the software. If you have a question that you would like to address to the GATK development team, please use the big "Ask a Question" button at the top left of this page to create a new post.

  • msp_a11, Member

    It seems that this button isn't showing on my page - could you please advise what to do in this case? Many thanks!

  • bhanuGandham, Member, Administrator, Broadie, Moderator admin

    Hi @msp_a11

    We had to change some settings to control spam. I reset the filter. Can you please try again? Thank you.

    Regards
    Bhanu

  • msp_a11, Member

    Thanks - now working! I was starting to doubt myself ;)

  • Theostef974, Member
    edited November 15

    Hi,

    I'm trying to ask a question on the forum, but I don't see the "Ask a Question" button.
    Can you tell me what to do in this case?

    Thank you,

  • Oh, I understand, thank you. Do you think the technical difficulty will last long? I ask because I don't know which thread I could post in without going off-topic for this section.

    Thank you again,

    Theo

  • Hello, I am still unable to post a new question on the forum. Is there still a problem?

    Thanks

  • bhanuGandham, Member, Administrator, Broadie, Moderator admin

    Hi @Theostef974

    Yes, there is still a problem, and as it is a holiday week we will only be able to get to it next week. We are trying to make our forum spam-proof. Here are a couple of options you can try:
    1) Being more active on the forum will increase your ranking, and you will then be able to start new threads. Currently, only low-rank users are unable to start new threads.
    2) Ask your question in an existing thread.

    I hope this helps. :smile:

    Regards
    Bhanu

  • Thank you for your answer,

    I'm going to try asking my question in an existing thread related to my problem.

    Theo

  • oneillkza, Member

    Running joint-discovery-gatk4-local.wdl on hg19

    (Posting this here, since per the above posts, the "ask a question" button is disabled. Please feel free to move this to a thread.)

    I'm trying to run the joint-discovery-gatk4-local.wdl on data aligned to hg19. You have provided example input json files, but only for the hg38 case. I'm in the process of generating the (many!) inputs it needs, but had a few questions:

    Question 1

    Many of the files needed are supplied in the GATK bundle ftp site. However, the centre I'm at has banned regular ftp (we can only use sftp), and there are quite a few files to download. Is there an easy way to get the contents of ftp://ftp.broadinstitute.org/bundle/hg19/ in a single file? My alternatives are to download the files one by one via the web browser, or to write a script using wget to scrape them.
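    A minimal sketch of the wget option, assuming the bundle server allows recursive anonymous FTP listing. `mirror_bundle_dir` is a hypothetical helper name, and `--cut-dirs=2` assumes a URL two directories deep, like the hg19 one above. Note that wget still connects over plain FTP here, so this only helps if FTP traffic is permitted from the machine running it.

    ```shell
    #!/usr/bin/env bash
    # Sketch only: mirror one directory of the GATK resource bundle with wget.
    # mirror_bundle_dir is a hypothetical helper name, not part of GATK or the bundle.
    set -euo pipefail

    mirror_bundle_dir() {
      local url="${1:?usage: mirror_bundle_dir <ftp-url> [dest-dir]}"
      local dest="${2:-.}"
      # -r            follow the FTP directory listing recursively
      # -np           never ascend above the starting directory
      # -nH           do not create a local directory named after the host
      # --cut-dirs=2  drop the two leading remote path components (e.g. bundle/hg19/);
      #               adjust this if the URL has a different depth
      # -P            save everything under the destination directory
      wget -r -np -nH --cut-dirs=2 -P "$dest" "$url"
    }

    # Example (downloads the whole directory, so it is not run automatically):
    # mirror_bundle_dir "ftp://ftp.broadinstitute.org/bundle/hg19/" ./hg19_bundle
    ```

    Because of `-nH` and `--cut-dirs=2`, the files should land directly in `./hg19_bundle` rather than in nested `ftp.broadinstitute.org/bundle/hg19/` folders.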

    Question 2

    The input file lists a number of resource files, all of which have obvious corresponding files available in ftp://ftp.broadinstitute.org/bundle/hg38/. However, it looks like there's been a lot of consolidation of files between hg19 and hg38, and it's not entirely clear which ones to use (e.g. there are two different dbSNP files, two different hapmap files, etc). Is there a table somewhere documenting which of these are the best practices to include?

    "##_COMMENT4": "RESOURCE FILES",
      "JointGenotyping.dbsnp_vcf": "/home/bshifaw/broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf",
      "JointGenotyping.dbsnp_vcf_index": "/home/bshifaw/broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx",
      "JointGenotyping.one_thousand_genomes_resource_vcf": "/home/bshifaw/broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz",
      "JointGenotyping.one_thousand_genomes_resource_vcf_index": "/home/bshifaw/broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi",
      "JointGenotyping.omni_resource_vcf": "/home/bshifaw/broad-references/hg38/v0/1000G_omni2.5.hg38.vcf.gz",
      "JointGenotyping.omni_resource_vcf_index": "/home/bshifaw/broad-references/hg38/v0/1000G_omni2.5.hg38.vcf.gz.tbi",
      "JointGenotyping.mills_resource_vcf": "/home/bshifaw/broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
      "JointGenotyping.mills_resource_vcf_index": "/home/bshifaw/broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi",
      "JointGenotyping.axiomPoly_resource_vcf": "/home/bshifaw/broad-references/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz",
      "JointGenotyping.axiomPoly_resource_vcf_index": "/home/bshifaw/broad-references/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz.tbi",
      "JointGenotyping.hapmap_resource_vcf": "/home/bshifaw/broad-references/hg38/v0/hapmap_3.3.hg38.vcf.gz",
      "JointGenotyping.hapmap_resource_vcf_index": "/home/bshifaw/broad-references/hg38/v0/hapmap_3.3.hg38.vcf.gz.tbi",
    

    Question 3

    How important are these resource files, and are they part of the Best Practices? The WDL as written in that repository requires all of them as inputs, and Cromwell won't execute it if any are missing. However, I note that @Geraldine_VdAuwera has a year-old pull request with a version of the WDL that does not require any of these resources. Is it safe to use (or adapt) that, or would leaving out the known-SNP VCFs fall outside GATK Best Practices?

    Question 4

    I've managed to generate my own JointGenotyping.eval_interval_list using something like the script below.

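    # Split the reference into roughly 50 Mb chunks; vcfutils.pl prints 1-based regions such as "1:1-50000000"
    $VCFUTILS splitchr -l 50000000 ./GRCh37-lite.fa.fai > hg19_intervals_50M.txt
    # BED is 0-based and half-open, so subtract 1 from each start while converting
    awk -F'[:-]' -v OFS='\t' '{print $1, $2 - 1, $3}' hg19_intervals_50M.txt > hg19_intervals_50M.bed
    # Convert the BED into a Picard-style interval list using the reference sequence dictionary
    $GATK BedToIntervalList -I hg19_intervals_50M.bed -O hg19_intervals_50M.interval_list -SD GRCh37-lite.dict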
    

    I also note that a JointGenotyping.unpadded_intervals_file is required. In the hg38 JSON, this is /home/bshifaw/broad-references/hg38/v0/hg38.even.handcurated.20k.intervals. However, the Broad bundle FTP site does not seem to have this file, even for hg38. What is this file, how is it generated, is it critical to running joint genotyping, and if not, what do I need to change in the WDL to disable it as an input?

    I'll probably have some more questions as I go, but thought this would be a good start.

    Thanks!

  • oneillkza, Member

    Moved my question (in the post above) to its own thread.

  • AdelaideR, Unconfirmed, Member, Broadie, Moderator admin

    @Theostef974 Please check the banner at the top of the page for links to where you can post your question as a new user. We hope to have the forum up and running as usual soon.
