TCGA legacy archive, muTect v119?

Hello,
I want to reproduce, from the corresponding .bam files, some of the .vcf files I got from the TCGA legacy archive. Their headers include the following line:
softwareName=<muTect,CallIndelsPipeline>,softwareVer=<119,65>,softwareParam=<.>
I wonder whether this indicates MuTect version 1.1.9, because the version distributed by CGA is 1.1.4 and GATK hosts 1.1.7. Searching for v1.1.9, I only found people mentioning v1.1.7 or earlier.
Also, the current TCGA pipeline uses MuTect2. However, one difference I noticed is that MuTect2, run with almost default parameters, annotates each record with:
##FORMAT=<ID=AF,Number=1,Type=Float,Description="Allele fraction of the event in the tumor">
A VCF file from the TCGA legacy archive instead includes:
##FORMAT=<ID=FA,Number=.,Type=Float,Description="Fractions of reads (excluding MQ0 from both ref and alt) supporting each reported alternative allele, per sample">
So I guess they did not use version 2? If I can't get v1.1.9 (assuming it was a version tweaked by the TCGA team), could v1.1.7 or v2 serve as an equivalent tool? (I'm reproducing the germline calls, so the somatic filter doesn't seem to matter much here.)

Thank you!
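
For anyone comparing headers the same way, here is a minimal Python sketch that pulls the `softwareName`/`softwareVer` fields and the FORMAT IDs out of header lines like the ones quoted above. The function names are hypothetical helpers, not part of GATK or any TCGA tool, and the sketch only reports what the header says: it cannot tell whether `119` means v1.1.9.

```python
import re

# Matches KEY=<value[,value...]> pairs such as softwareVer=<119,65>.
SOFTWARE_FIELD = re.compile(r"(\w+)=<([^>]*)>")
# Captures the ID of a ##FORMAT header line.
FORMAT_ID = re.compile(r"^##FORMAT=<ID=([^,>]+)")


def software_fields(line):
    """Return {key: [values]} for every KEY=<...> pair found in the line."""
    return {key: value.split(",") for key, value in SOFTWARE_FIELD.findall(line)}


def software_versions(line):
    """Pair each softwareName entry with the softwareVer entry at the same position."""
    fields = software_fields(line)
    return dict(zip(fields.get("softwareName", []), fields.get("softwareVer", [])))


def format_ids(header_lines):
    """Return the FORMAT IDs declared in a list of header lines, in order."""
    ids = []
    for line in header_lines:
        match = FORMAT_ID.match(line)
        if match:
            ids.append(match.group(1))
    return ids


if __name__ == "__main__":
    legacy = "softwareName=<muTect,CallIndelsPipeline>,softwareVer=<119,65>,softwareParam=<.>"
    print(software_versions(legacy))
    print(format_ids([
        '##FORMAT=<ID=AF,Number=1,Type=Float,Description="Allele fraction of the event in the tumor">',
    ]))
```

Comparing the FORMAT IDs of two files (for example `FA` in one and `AF` in the other) is a quick way to spot that they came from different callers or caller versions, though it does not identify the version by itself.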

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)
    edited October 2017

    @Nrium
    Hi,

    The VCF header should have all the commands and tools that were run listed under the list of FORMAT annotations.

    What exactly are you trying to do? For producing germline only calls, you should use HaplotypeCaller.

    -Sheila

  • Hi @Sheila ,

    The VCF files from the TCGA legacy repository don't appear to record the commands and tools that were run (for example, a header may say softwareName=<muTect,CallIndelsPipeline>,softwareVer=<119,65>,softwareParam=<.> without further details). TCGA's current release might have them, but all germline calls are filtered out there.

    I noticed some inconsistencies between TCGA's VCF and the calls I made myself (though with different QC rules). This could be a matter of QC or of differences between the calling methods. I have been using HaplotypeCaller too, but it would be great if I could reproduce TCGA's results by following their pipeline.

    Xiaowen
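
One way to quantify the kind of inconsistency described above is to compare the two call sets site by site. The following is a rough Python sketch under stated assumptions: both inputs are uncompressed VCF text, variants are matched on (CHROM, POS, REF, ALT) exactly, and no normalization or FILTER handling is applied, so differently represented indels would be counted as mismatches. The helper names are hypothetical.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF lines, one key per ALT allele."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and header lines (## metadata and the #CHROM row).
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def compare_calls(a_lines, b_lines):
    """Split two call sets into shared sites and sites unique to each side."""
    a, b = variant_keys(a_lines), variant_keys(b_lines)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


if __name__ == "__main__":
    # File names are placeholders for the archive VCF and a locally produced one.
    with open("tcga_legacy.vcf") as fa, open("my_calls.vcf") as fb:
        result = compare_calls(fa, fb)
    for name, sites in result.items():
        print(name, len(sites))
```

Running this once with the original QC rules and once with matched rules would show whether the differing sites persist, which is a separate question from which caller version produced them.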

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Nrium
    Hi Xiaowen,

    Okay, before we delve into this further, I need you to confirm that this difference still occurs when you use the exact same QC rules as the TCGA run.

    Also, I just tried Mutect v1, and the VCF I get clearly specifies the command and version in the header.

    Thanks,
    Sheila
