

efratklig (Israel, Member)


About the panel of normals: I understand that it is not essential but that it is better to use one. Has a standard panel of normals (PoN) file been created for GATK4 Mutect2, like the one for the germline resource (ftp://[email protected]/bundle/Mutect2/)? I tried to create a PoN from my own samples (3 normal samples), but in step 2 (creating a GenomicsDB) too many files were opened; I stopped it when about 50,000 files were open. How many files does this step open? In addition, is it OK to create a PoN from only 3 samples?
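A "too many open files" symptom is typically bounded by the per-process file-descriptor limit. Below is a minimal Python sketch, assuming a Unix-like system, for checking that limit before running the import step. The helper name `open_file_limits` is hypothetical, and the sketch says nothing about how many files the GenomicsDB step actually opens.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # The soft limit is the one a process actually hits; it can be raised up to the hard limit.
    print(f"soft limit: {soft}, hard limit: {hard}")
```

If the soft limit is far below the number of files observed, that limit, rather than the number of samples, may be what stops the run.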




  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @efratklig

    We usually recommend a PoN with 40 exome samples. Ideally, the PoN should include technically similar samples that were sequenced on the same platform (e.g. HiSeqX) using the same chemistry, and analyzed using the same reference genome and toolchain. However, even an unmatched PoN is better than no PoN at all, because mapping artifacts and polymerase slippage errors occur at pretty much the same genomic loci across short-read sequencing approaches.

    Here are some PoNs that we provide for hg38 and hg19:

  • efratklig (Israel, Member)

    Hi @bhanuGandham,

    Thanks a lot for your answer. I tried to work with the hg38 file from the link you mentioned, but there seems to be a problem with it: the content looks like HTML. Can you please check it?


  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)


    I am not sure I understand. Can you please post a screenshot of the link and of the HTML content you are seeing?

  • efratklig (Israel, Member)

    Hi @bhanuGandham,

    I downloaded the file 1000g_pon.hg38.vcf.gz from the link:
    but it wasn't in gzip format, and when I opened it the first lines were:


  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)


    I just downloaded it and it works just fine for me. Please post the commands and steps you are following to download the data.

  • efratklig (Israel, Member)

    Hi @bhanuGandham,

    I finally solved it. The issue was that I tried to download it from the link URL, whereas it needs to be downloaded from the Public.

  • adam_di (PKU, Member)
    edited August 18
    Hi @bhanuGandham,
    I'm using Mutect2 to call somatic mutations following the Best Practices, and I used the hg38 PoN file you provided above. All the other files are from the resource bundle. However, the following error occurred:
    [error message]
    org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path /gpfs/share/home/1801111726/GATK_bundle/somatic-hg38_1000g_pon.hg38.vcf
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:375)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:327)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:277)
    at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:245)
    at org.broadinstitute.hellbender.engine.FeatureManager.initializeFeatureSources(FeatureManager.java:208)
    at org.broadinstitute.hellbender.engine.FeatureManager.<init>(FeatureManager.java:155)
    at org.broadinstitute.hellbender.engine.GATKTool.initializeFeatures(GATKTool.java:485)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:706)
    at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:161)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
    Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: /gpfs/share/home/1801111726/GATK_bundle/somatic-hg38_1000g_pon.hg38.vcf
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:102)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:127)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:121)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:372)
    ... 14 more
    Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
    at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:115)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:79)
    at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:37)
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:261)
    ... 18 more

    The command I was using is:
    gatk Mutect2 \
    -R $home_path/GATK_bundle/Homo_sapiens_assembly38.fasta \
    -I $input_ptc_TMB \
    -I $input_normal \
    -normal ${normal_list[$i]}-TMB \
    --germline-resource $home_path/GATK_bundle/af-only-gnomad.hg38.vcf.gz \
    --panel-of-normals $home_path/GATK_bundle/somatic-hg38_1000g_pon.hg38.vcf \
    -O $variant_path/${ptc_list[$i]}-TMB_somatic.vcf.gz

    What is happening? Thank you.
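
The error above says the parser never found the `#CHROM` header line, so one thing worth checking is whether the file on disk is really a VCF. The following is a minimal Python sketch, not part of GATK, that looks at the first bytes and header lines of a file. The function name `inspect_vcf` and its return labels are hypothetical, chosen only for illustration.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip (and bgzip) file


def inspect_vcf(path, max_lines=10000):
    """Classify a file as 'html', 'vcf-gzip', 'vcf-plain', or 'unknown'."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        for i, line in enumerate(fh):
            if i == 0:
                first = line.lstrip().lower()
                # A saved web page instead of the data file starts like this.
                if first.startswith("<!doctype html") or first.startswith("<html"):
                    return "html"
            if line.startswith("#CHROM"):
                return "vcf-gzip" if is_gzip else "vcf-plain"
            # Header lines all start with '#'; stop at the first record.
            if not line.startswith("#") or i >= max_lines:
                break
    return "unknown"
```

A result of `html` would suggest a web page was saved instead of the data, and `vcf-gzip` for a file named `.vcf` would suggest the file is still compressed and the extension does not match its contents. Either is only a hint, not a confirmed diagnosis of this particular error.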
  • jiehuang001 (Member)


    To run GATK CNV, should I use the first PoN file, 1000g_pon.hg38.vcf.gz, listed at https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-hg38? I thought that the sequencing for the 1000G project was done about 10 years ago and at low depth. So, is this really a good reference?

    Is this PoN file meant for calling CNVs from WGS or from WES?

    Also, does it matter if my sequenced samples are from a different race?

    BTW, I have a question on hg19 vs. hg38. I thought that all SNPs have "higher" positions in hg38 compared to hg19. It is like the house numbers on a street: when more and more houses are built on the street (more and more new DNA is found in the genome), most existing houses will get a higher number. However, I found that this is not always the case. Some SNPs got a higher position while others got a lower position when lifted from hg19 to hg38.
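
One way to see why both directions are possible: coordinate conversion between assemblies works block by block, and each aligned block carries its own offset, which can be positive where sequence was added upstream and negative where sequence was removed or rearranged. Here is a toy Python sketch with entirely hypothetical block coordinates (not real hg19/hg38 data, and not the actual liftover chain format):

```python
# Hypothetical aligned blocks: (old_start, old_end, new_start), half-open intervals.
BLOCKS = [
    (1000, 2000, 1500),  # offset +500: sequence was inserted upstream
    (5000, 6000, 4800),  # offset -200: sequence was removed upstream
]


def lift(pos):
    """Map a position from the old assembly to the new one, or None if unmapped."""
    for old_start, old_end, new_start in BLOCKS:
        if old_start <= pos < old_end:
            return new_start + (pos - old_start)
    return None


print(lift(1200))  # 1700, a higher position
print(lift(5500))  # 5300, a lower position
print(lift(3000))  # None, falls in a gap between blocks
```

In the street analogy, houses can be demolished or renumbered as well as built, so a given house number can move either way.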

    Thank you very much & best regards,

  • adam_di (PKU, Member)
    Problem solved.
    Thank you
  • jiehuang001 (Member)

    Problem solved?

  • 2904359495 (Member)

    Do you have access to this FTP site?
    germline-resource (ftp://[email protected]/bundle/Mutect2/)
    It asks for a password, which I don't have.

    How did you solve this? Thanks a lot.

  • bshifaw (moon; Member, Broadie, Moderator)

    @2904359495,

    There isn't a password associated with gsapubftp-anonymous. Try to continue without entering a password. If that doesn't work, there might be something wrong with your network permissions, which I'm not familiar with solving.

    Here is some info on ftp servers
