GetPileupSummaries - ERROR Badly formed genome unclippedLoc

atari (Switzerland, Member)

Hello,
I am getting the error pasted below when running GetPileupSummaries.
I used hg38 without alternate contigs.
I am running GATK 4.1.2.0 and strictly followed the GATK Best Practices for Mutect2, except that I don't use alternate contigs.

Thanks a lot


GetPileupSummaries - Initializing engine
14:19:52.446 INFO FeatureManager - Using codec VCFCodec to read file file:///hpcnfs/scratch/CRU/afranceschini/htsflow/data/input/genomes/hg38/dbsnp/biallelic_hg38_af-only-gnomad.vcf.gz
14:19:52.681 INFO FeatureManager - Using codec VCFCodec to read file file:///hpcnfs/scratch/CRU/afranceschini/htsflow/data/input/genomes/hg38/dbsnp/biallelic_hg38_af-only-gnomad.vcf.gz
14:33:32.019 INFO GetPileupSummaries - Shutting down engine
[May 16, 2019 2:33:32 PM CEST] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 13.70 minutes.
Runtime.totalMemory()=22424846336


A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig chr1_KI270766v1_alt given as location, but this contig isn't present in the Fasta sequence dictionary


Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
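
The error above says that at least one record in the input VCF sits on a contig that the reference's sequence dictionary does not list. Below is a minimal sketch of one way to check for this before running the tool. It assumes a Picard-style .dict file exists for the reference; the function name contigs_missing_from_dict and the file names in the usage comment are hypothetical.

```shell
# List contig names used by VCF records (read from stdin) that are absent
# from a Picard-style sequence dictionary (.dict) given as the first argument.
contigs_missing_from_dict() {
  dict="$1"
  awk -F'\t' '
    NR == FNR {                          # first input: the .dict file
      if ($1 == "@SQ")
        for (i = 2; i <= NF; i++)
          if ($i ~ /^SN:/) known[substr($i, 4)] = 1
      next
    }
    /^#/ { next }                        # skip VCF header lines
    !($1 in known) && !seen[$1]++ { print $1 }   # report each unknown contig once
  ' "$dict" -
}

# Example usage (file names are placeholders):
# gzip -dc biallelic_hg38_af-only-gnomad.vcf.gz | contigs_missing_from_dict hg38.dict
```

An empty result would suggest that every contig in the VCF is known to the reference. Any name printed is a candidate for the kind of error shown above.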

Answers

  • atari (Switzerland, Member)

    My biallelic_hg38_af-only-gnomad.vcf.gz file contained those alternate contigs. For now I am simply removing them with a "grep -v _alt" command.
    Is there a way to remove them within SelectVariants itself?
    Thank you

  • AdelaideR (Member, admin)

    I think using grep is a good way to do this, as it removes every line that contains the "_alt" pattern.

    The option in SelectVariants with a similar name is:

    --removeUnusedAlternates / -trimAlternates (spelled --remove-unused-alternates in GATK4)
    Remove alternate alleles not present in any genotypes
    When this flag is enabled, all alternate alleles that are not present in the (output) samples will be removed. Note that this even extends to biallelic SNPs - if the alternate allele is not present in any sample, it will be removed and the record will contain a '.' in the ALT column. Note also that sites-only VCFs, by definition, do not include the alternate allele in any genotype calls.

    Note that, as described, it acts on alternate alleles within a record rather than on alternate contigs, so it may not do what you need.
    
  • atari (Switzerland, Member)

    I tried adding the option "--remove-unused-alternates true" to SelectVariants (GATK 4.1.2.0).
    It doesn't seem to work: if I grep the resulting VCF file, I still see a lot of entries like "chr1_KI270766v1_alt".

  • AdelaideR (Member, admin)

    @atari

    I imagine the grep -v command might be the best alternative then.

    Did that work for you?

  • atari (Switzerland, Member)

    Yes, the grep command works fine. I will use that one. Thanks.
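
The grep approach used in this thread can be sketched a little more strictly with awk, so that only the CHROM column and the ##contig header lines are matched rather than any line that happens to contain "_alt". This is an illustrative sketch, not an official GATK recipe. The function name drop_alt_contigs and the output file name are placeholders, and the usage comment assumes bgzip and tabix (htslib) are installed.

```shell
# Drop records, and the ##contig header lines, for contigs whose name ends in "_alt".
# Reads an uncompressed VCF on stdin and writes the filtered VCF to stdout.
drop_alt_contigs() {
  awk -F'\t' '
    /^##contig=<ID=[^,>]*_alt[,>]/ { next }   # header line declaring an _alt contig
    /^#/ { print; next }                       # keep every other header line
    $1 !~ /_alt$/ { print }                    # keep records whose CHROM is not an _alt contig
  '
}

# Example usage (output name is a placeholder):
# gzip -dc biallelic_hg38_af-only-gnomad.vcf.gz | drop_alt_contigs | bgzip > no_alt.vcf.gz
# tabix -p vcf no_alt.vcf.gz
```

Matching on the contig name keeps any record on a primary contig whose INFO or ID field merely contains the string "_alt". SelectVariants also accepts the engine-level -L and -XL interval arguments, so restricting the output to a list of primary contigs may be another route, though that was not tried in this thread.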
