panel-of-normals

efratklig (Israel, Member)

Hello,

About the panel of normals (PoN): I understand that it is not essential, but that it is better to use one. Has a standard PoN file been created for GATK4 Mutect2, like the germline resource (ftp://[email protected]/bundle/Mutect2/)? I tried to create a PoN from my own samples (3 normal samples), but in step 2 (creating a GenomicsDB) too many files were opened; I stopped it when about 50,000 files were open. How many files are opened in this step? And is it OK to create a PoN from only 3 samples?

Thanks
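One plausible cause of the file explosion, offered as an assumption rather than a diagnosis: GenomicsDBImport keeps a separate workspace partition per input interval, so an exome interval list with a very large number of small targets turns into a very large number of files. GenomicsDBImport has a --merge-input-intervals option intended for large interval lists such as exome targets. The Python sketch below only illustrates the idea of collapsing overlapping or nearby intervals on each contig; merge_intervals and its max_gap parameter are hypothetical and do not reproduce GATK's implementation.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 1-based inclusive


def merge_intervals(intervals: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge intervals on the same contig that overlap, touch, or lie within max_gap bases.

    Hypothetical helper for illustration; contigs keep their order of first appearance.
    """
    by_contig: Dict[str, List[Tuple[int, int]]] = {}
    for contig, start, end in intervals:
        by_contig.setdefault(contig, []).append((start, end))

    merged: List[Interval] = []
    for contig, spans in by_contig.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + max_gap + 1:
                # Overlapping, adjacent, or close enough: extend the current block.
                cur_end = max(cur_end, end)
            else:
                merged.append((contig, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((contig, cur_start, cur_end))
    return merged


if __name__ == "__main__":
    targets = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 1000, 1100),
               ("chr2", 50, 60), ("chr1", 301, 400)]
    print(len(targets), "->", len(merge_intervals(targets)))
```

Running the same grouping on a real exome target list would show how far the interval count, and with it the number of partitions, could drop; the GenomicsDBImport documentation is the authority on what --merge-input-intervals actually does.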

Answers

  • bhanuGandham (Cambridge MA, Member, Administrator, Broadie, Moderator)

    Hi @efratklig

    We usually recommend a PoN built from 40 exome samples. Ideally, the PoN should include technically similar samples that were sequenced on the same platform (e.g. HiSeqX), with the same chemistry, and analyzed with the same reference genome and tool chain. However, even an unmatched PoN is better than no PoN at all, because mapping artifacts and polymerase slippage errors tend to occur at largely the same genomic loci across short-read sequencing approaches.

    Here are some PoNs that we provide for hg38 and b37 (GRCh37):
    https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-hg38
    https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-b37
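The reasoning above, that artifacts recur at the same loci in many samples, is also why a PoN becomes more informative as normals are added. The Python sketch below is a deliberately simplified model and not the CreateSomaticPanelOfNormals algorithm: it counts in how many normal call sets each (contig, position, ref, alt) site appears and keeps the sites seen in at least min_samples of them. build_toy_pon, the threshold, and the example sites are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Site = Tuple[str, int, str, str]  # (contig, position, ref, alt)


def build_toy_pon(normal_calls: Iterable[Set[Site]], min_samples: int = 2) -> Set[Site]:
    """Return the sites called in at least min_samples normals (simplified illustration)."""
    counts: Counter = Counter()
    for calls in normal_calls:
        # Each normal contributes at most one vote per site.
        counts.update(set(calls))
    return {site for site, n in counts.items() if n >= min_samples}


if __name__ == "__main__":
    normals: List[Set[Site]] = [
        {("chr1", 1000, "A", "T"), ("chr1", 5000, "G", "GT")},
        {("chr1", 1000, "A", "T"), ("chr2", 300, "C", "A")},
        {("chr1", 5000, "G", "GT"), ("chr1", 1000, "A", "T")},
    ]
    print(sorted(build_toy_pon(normals)))
```

In this toy model, with only three normals an artifact enters the panel only if it happens to be called in at least two of them, which is one way to see why a cohort of around 40 exomes catches more recurrent artifacts.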

  • efratklig (Israel, Member)

    Hi @bhanuGandham,

    Thanks a lot for your answer. I tried to use the hg38 file from the link you mentioned, but there seems to be a problem with it: it looks like it is in HTML format. Can you please check it?

    Thanks

  • bhanuGandham (Cambridge MA, Member, Administrator, Broadie, Moderator)

    @efratklig

    I am not sure I understand. Can you please post a screenshot of the link and of this HTML content?

  • efratklig (Israel, Member)

    Hi @bhanuGandham,

    I downloaded the file 1000g_pon.hg38.vcf.gz from the link
    https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-hg38
    but it wasn't in gzip format, and when I opened it, the first lines looked like HTML.

    Thanks
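A quick way to tell whether a downloaded .vcf.gz is really compressed data or a saved web page is to look at its first bytes: gzip and bgzip files start with the two bytes 0x1f 0x8b, while an HTML or XML page starts with markup. The Python sketch below does this check; sniff_download is a hypothetical helper, not part of GATK, and the labels it returns are arbitrary.

```python
import gzip
import os
import tempfile


def sniff_download(path: str) -> str:
    """Guess what a downloaded file really contains from its first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(512)
    if head[:2] == b"\x1f\x8b":
        return "gzip"  # gzip and bgzip files both start with these magic bytes
    text = head.lstrip().lower()
    if text.startswith((b"<!doctype html", b"<html")):
        return "html"
    if text.startswith(b"<?xml"):
        return "xml"
    if head.startswith(b"##fileformat=VCF"):
        return "plain-vcf"
    return "unknown"


if __name__ == "__main__":
    # Create one real gzip file and one fake "download" to compare.
    with tempfile.TemporaryDirectory() as tmp:
        real = os.path.join(tmp, "real.vcf.gz")
        with gzip.open(real, "wb") as fh:
            fh.write(b"##fileformat=VCFv4.2\n")
        fake = os.path.join(tmp, "fake.vcf.gz")
        with open(fake, "wb") as fh:
            fh.write(b"<!DOCTYPE html>\n<html><head><title>Sign in</title></head></html>\n")
        print(sniff_download(real), sniff_download(fake))
```

If the result is "html" or "xml", the saved file is a web page (for example a sign-in page) rather than the PoN itself.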

  • bhanuGandham (Cambridge MA, Member, Administrator, Broadie, Moderator)

    @efratklig

    I just downloaded it and it works fine for me. Please post the commands and steps you are following to download the data.

  • efratklig (Israel, Member)

    Hi @bhanuGandham,

    I finally solved it. The issue was that I tried to download it from the link URL, but it needs to be downloaded from the public URL.
    Thanks
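For readers hitting the same problem: as far as I know, an object in a publicly readable Cloud Storage bucket can be fetched without signing in from https://storage.googleapis.com/BUCKET/OBJECT, whereas the console and storage.cloud.google.com links go through a Google sign-in, so command-line tools end up saving a web page instead of the file. The Python sketch below rewrites a gs:// URI or a storage.cloud.google.com link into that public form; to_public_url is a hypothetical helper, and the URL patterns it accepts are assumptions based on the links quoted in this thread.

```python
from urllib.parse import urlparse

PUBLIC_ENDPOINT = "https://storage.googleapis.com"


def to_public_url(url: str) -> str:
    """Rewrite a gs:// URI or a storage.cloud.google.com link to the public endpoint."""
    parsed = urlparse(url)
    if parsed.scheme == "gs":
        # gs://BUCKET/OBJECT: the bucket is parsed as the network location.
        bucket, object_name = parsed.netloc, parsed.path.lstrip("/")
    elif parsed.netloc == "storage.cloud.google.com":
        # https://storage.cloud.google.com/BUCKET/OBJECT (needs a browser sign-in).
        bucket, _, object_name = parsed.path.lstrip("/").partition("/")
    else:
        raise ValueError(f"unsupported URL: {url}")
    if not bucket or not object_name:
        raise ValueError(f"URL does not name an object: {url}")
    # Query strings such as ?authuser=0 are dropped on purpose.
    return f"{PUBLIC_ENDPOINT}/{bucket}/{object_name}"


if __name__ == "__main__":
    link = "https://storage.cloud.google.com/gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz"
    print(to_public_url(link))
```

The resulting URL can be passed to wget or curl on the server. Alternatively, if the Google Cloud SDK is installed, gsutil cp with the gs:// path copies the file directly.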

  • adam_di (PKU, Member)
    edited August 18
    Hi @bhanuGandham,
    I'm using Mutect2 to call somatic mutations following the Best Practices, with the hg38 PoN file you provided above. All the other files are also from the resource bundle. However, I get the following error:
    org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path /gpfs/share/home/1801111726/GATK_bundle/somatic-hg38_1000g_pon.hg38.vcf
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:375)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:327)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:277)
    at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:245)
    at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:208)
    at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:155)
    at org.broadinstitute.hellbender.engine.GATKTool.initializeFeatures(GATKTool.java:485)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:706)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:161)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
    Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: /gpfs/share/home/1801111726/GATK_bundle/somatic-hg38_1000g_pon.hg38.vcf
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:102)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:127)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:372)
    ... 14 more
    Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
    at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:115)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:261)
    ... 18 more

    The command I was using is:
    gatk Mutect2 \
    -R $home_path/GATK_bundle/Homo_sapiens_assembly38.fasta \
    -I $input_ptc_TMB \
    -I $input_normal \
    -normal ${normal_list[$i]}-TMB \
    --germline-resource $home_path/GATK_bundle/af-only-gnomad.hg38.vcf.gz \
    --panel-of-normals $home_path/GATK_bundle/somatic-hg38_1000g_pon.hg38.vcf \
    -O $variant_path/${ptc_list[$i]}-TMB_somatic.vcf.gz

    What is happening? Thank you.
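The exception says htsjdk never found the required #CHROM header line. The PoN path in the command ends in .vcf rather than the bundle's 1000g_pon.hg38.vcf.gz, so one thing worth checking is whether that file is plain-text VCF at all: it could be a saved web page, as earlier in this thread, or compressed data under a .vcf name. The Python sketch below opens a file whether or not it is gzip-compressed and reports whether a #CHROM line follows the ## meta lines; check_vcf_header is a hypothetical diagnostic, not a GATK tool.

```python
import gzip
import os
import tempfile
from typing import IO


def _open_text(path: str) -> IO[str]:
    """Open path as text, decompressing it if it starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "rt", errors="replace")


def check_vcf_header(path: str) -> str:
    """Return "ok" if a #CHROM line follows the ## meta lines, else a short diagnosis."""
    with _open_text(path) as fh:
        first = fh.readline()
        if not first.startswith("##fileformat=VCF"):
            return "missing ##fileformat line; file starts with: " + first[:60].rstrip()
        for line in fh:
            if line.startswith("#CHROM"):
                return "ok"
            if not line.startswith("##"):
                break  # reached data (or garbage) without seeing #CHROM
    return "no #CHROM header line before the first record"


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    good = os.path.join(tmp, "good.vcf.gz")
    with gzip.open(good, "wt") as fh:
        fh.write("##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    page = os.path.join(tmp, "page.vcf")
    with open(page, "w") as fh:
        fh.write("<!DOCTYPE html>\n<html></html>\n")
    print(check_vcf_header(good))
    print(check_vcf_header(page))
```

If the diagnosis for the .vcf in the command is not "ok", re-downloading 1000g_pon.hg38.vcf.gz from the public URL and pointing --panel-of-normals at that .vcf.gz would be a reasonable next step.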
  • jiehuang001 (Member)

    Hi,

    To run GATK CNV, should I use the first PoN file, 1000g_pon.hg38.vcf.gz, listed at https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-hg38? I thought the 1000G project sequencing was done about 10 years ago, at low depth. Is this really a good reference?

    Is this PoN file meant for calling CNVs from WGS or from WES?

    Also, does it matter if my sequenced samples are from a different race?

    BTW, I have a question on hg19 vs. hg38. I thought that all SNPs have "higher" positions in hg38 than in hg19. It is like the house numbers on a street: as more and more houses are built on the street (as more new DNA is found in the genome), most existing houses get a higher number. However, I found that this is not always the case. Some SNPs got a higher position while others got a lower position when lifted over from hg19 to hg38.

    Thank you very much & best regards,
    Jie
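The house-number analogy can be extended to show why positions move in both directions: between assemblies, sequence was not only added but also removed or corrected, and a removal upstream of a SNP moves it to a lower coordinate. The Python sketch below is a toy model with made-up edits, not a real liftover (real conversions use chain files, for example with UCSC liftOver or Picard LiftoverVcf); toy_lift and the edit list are hypothetical.

```python
from typing import List, Tuple

# Toy edit: (position in the old assembly, length change).
# Positive = bases inserted before that position, negative = bases removed.
Edit = Tuple[int, int]


def toy_lift(old_pos: int, edits: List[Edit]) -> int:
    """Shift old_pos by the net length change of every edit located upstream of it.

    Edits overlapping old_pos itself are not handled; this is an illustration only.
    """
    return old_pos + sum(delta for pos, delta in edits if pos < old_pos)


if __name__ == "__main__":
    # Hypothetical edits: 500 bp added near the start, 2,000 bp removed further along.
    edits = [(1_000, 500), (50_000, -2_000)]
    for snp in (20_000, 80_000):
        print(snp, "->", toy_lift(snp, edits))
```

The first SNP only sees the insertion and moves up; the second sees both edits, and the net loss of 1,500 bp moves it down.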

  • adam_di (PKU, Member)
    Problem solved.
    Thank you
  • jiehuang001 (Member)

    Problem solved?

  • 2904359495 (Member)

    @efratklig
    Do you have access to this FTP site?
    germline-resource (ftp://[email protected]/bundle/Mutect2/)
    It asks me for a password. How did you solve this? Thanks a lot.

  • bshifaw (Member, Broadie, Moderator)

    @2904359495,

    There isn't a password associated with gsapubftp-anonymous. Try to continue without entering a password. If that doesn't work, there might be something wrong with your network permissions, which I'm not familiar with troubleshooting.

    Here is some info on ftp servers

  • jiehuang001 (Member)

    Hi, I still have some questions on this panel of normals:

    First, when I try to download with wget https://storage.cloud.google.com/gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz, I get an XML-like file instead. For now, I have to click the link, download the file to my laptop, and then transfer it to my server. How can I download the files directly to my server using a command such as wget?

    Second, is the af-only-gnomad.hg38.vcf.gz file available from the "official" gnomad.broadinstitute.org website, so that I can be sure I always get the latest version? Currently, I find that the files on the official website are too big, with a lot of other annotation fields.

    Third, below is a screenshot of the first few lines of 1000g_pon.hg38.vcf.gz, which has a total of ~2.6 million lines. It simply has CHROM, POS, REF, and ALT. Is this how it is supposed to be? Does this mean these ~2.6 million sites are "normal", without duplications or deletions?

    I really need to get these questions straight. Please help!

    Best regards,
    Jie
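On the third question: as I understand the PoN's role in Mutect2, the listed sites are not "normal" positions but closer to the opposite: places where the normal samples showed variant-like signal (recurrent artifacts or common germline variation), so a tumor call at one of them is treated as suspect. The toy Python sketch below loads CHROM/POS/REF/ALT from the data lines of a sites-only VCF (assuming the standard column order CHROM, POS, ID, REF, ALT) and marks candidate calls that coincide with a PoN site; the helper names and example records are hypothetical, and Mutect2's actual handling is more involved than an exact-match lookup.

```python
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def load_pon_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect (CHROM, POS, REF, ALT) from the data lines of a sites-only VCF."""
    sites: Set[Site] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic records list ALTs comma-separated
            sites.add((chrom, int(pos), ref, alt))
    return sites


def flag_calls(calls: Iterable[Site], pon: Set[Site]) -> Dict[Site, bool]:
    """Map each candidate call to True if it sits at a PoN site (i.e. treat it as suspect)."""
    return {call: call in pon for call in calls}


if __name__ == "__main__":
    pon_lines = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t10000\t.\tA\tT,C\t.\t.\t.\n",
    ]
    pon = load_pon_sites(pon_lines)
    print(flag_calls([("chr1", 10000, "A", "C"), ("chr1", 20000, "G", "A")], pon))
```

With the real file, the lines could come from gzip.open("1000g_pon.hg38.vcf.gz", "rt").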

  • SkyWarrior (Turkey, Member ✭✭✭)

    This PoN is not for CNV calling. It is a somatic variant-calling PoN for Mutect2. You must generate a CNV PoN yourself, using samples obtained with the same technology and library prep, and there should be at least 30 of them.

    You need to run at least 30 samples with similar prep and technology in COHORT mode to generate a model for calling CNVs later on.
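The rule of thumb above (same technology and library prep, at least 30 samples) can be checked before starting. The Python sketch below groups sample metadata by platform and library prep and reports which groups reach the threshold; the Sample fields, the example records, and eligible_cohorts are hypothetical, and 30 is simply the number quoted above, not a limit enforced by GATK.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sample:
    name: str
    platform: str      # sequencer model (hypothetical field)
    library_prep: str  # capture kit / library protocol (hypothetical field)


def eligible_cohorts(samples: List[Sample], min_size: int = 30) -> Dict[Tuple[str, str], List[str]]:
    """Group samples by (platform, library_prep); keep groups with at least min_size samples."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for s in samples:
        groups[(s.platform, s.library_prep)].append(s.name)
    return {key: names for key, names in groups.items() if len(names) >= min_size}


if __name__ == "__main__":
    cohort = [Sample(f"N{i}", "HiSeqX", "kitA") for i in range(32)]
    cohort += [Sample(f"M{i}", "NovaSeq", "kitB") for i in range(5)]
    for (platform, prep), names in eligible_cohorts(cohort).items():
        print(platform, prep, len(names))
```

Only groups that pass would be candidates for a CNV PoN; samples from the small group would need more technically matched companions first.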

  • bhanuGandham (Cambridge MA, Member, Administrator, Broadie, Moderator)
    edited August 30

    Thank you for your input, @SkyWarrior. The GATK team appreciates the help.
    We are currently facing a large volume of questions on the forum and are looking into ways to tackle it and help everyone. In the meantime, we encourage the community to share suggestions and solutions.
    Thank you!
