How do we know which stretch of variants has been phased by HaplotypeCaller?

dhwani, India, Member

I am using the following command to run HaplotypeCaller:

/opt/apps/gatk/4.2.1/gatk HaplotypeCaller \
  --dbsnp /home/dhwani.dholakia/archive/files_required_for_exome_analysis/dbsnp/GRCH37.p17_refseq.vcf \
  -R /home/dhwani.dholakia/archive/files_required_for_exome_analysis/reference/Homo_sapiens.GRCh37.dna.chromosome.6.fa \
  -I base_recalib/abc_aligned_sorted_dupmarked_realigned_recalibrated.bam \
  -O haplotype_caller/abc_haplotyper.g.vcf \
  --emit-ref-confidence GVCF \
  -L /home/dhwani.dholakia/archive/files_required_for_exome_analysis/coord.bed \
  --max-assembly-region-size 1000 \
  -mbq 25 \
  --native-pair-hmm-use-double-precision true \
  --bam-writer-type CALLED_HAPLOTYPES \
  -stand-call-conf 40 \
  --activity-profile-out dd.txt

1) GATK 3.6 had an option to define active regions. Is there an equivalent option in GATK v4.2.1?
2) How can I tell from the VCF file which variants are phased?
As I understand it, the "|" symbol indicates that a genotype is phased. But which parameter have I missed that would tell me which stretch of variants, from one position to another, is phased together?

Best Answer

  • bhanuGandham, Cambridge MA, admin
    Accepted Answer

    @dhwani

    1) Can you please specify which particular option in GATK 3.6 you are referring to?
    2) If a variant has physical phasing information, it will have the "|" symbol in the PGT tag in the FORMAT field, and it will also have an associated PID.
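
To make the point about PGT and PID concrete, here is a minimal sketch of pulling those tags out of a single VCF data line. It relies only on the standard tab-separated VCF layout (FORMAT in column 9, samples from column 10). The function name phasing_fields and the example record are made up for illustration and do not come from the poster's file.

```python
def phasing_fields(vcf_line, sample_index=0):
    """Return (GT, PGT, PID) for one sample of a VCF data line.

    Tags that are absent from FORMAT, or set to ".", come back as None.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")                   # FORMAT column lists the tag names
    values = cols[9 + sample_index].split(":")  # per-sample values, same order
    sample = dict(zip(keys, values))

    def get(tag):
        value = sample.get(tag)
        return None if value in (None, ".") else value

    return get("GT"), get("PGT"), get("PID")


# Hypothetical record, invented for this example only.
EXAMPLE = "6\t4828\t.\tT\tC\t.\t.\t.\tGT:AD:DP:GQ:PGT:PID:PL\t0/1:10,8:18:99:0|1:4828_T_C:300,0,350"

gt, pgt, pid = phasing_fields(EXAMPLE)
print(gt, pgt, pid)
```

A record with no PGT/PID in its FORMAT column simply yields None for both, which is what an unphased call looks like.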

Answers

  • dhwani, India, Member
    edited April 30

    @bhanuGandham, please check this link:

    https://software.broadinstitute.org/gatk/documentation/tooldocs/3.5-0/org_broadinstitute_gatk_tools_walkers_diagnostics_FindCoveredIntervals.php

    Have a look at the advanced inputs, which include a parameter named --activeRegionIn. If I understand correctly, it takes a list of intervals (which can be a BED file) that defines the active regions, which are then used for phasing.

  • dhwani, India, Member
    edited April 30

    @bhanuGandham
    Regarding the second question, I checked my VCF file: out of 228 rows, only two have PID information. How should I proceed in such a case? Have I made a mistake in my analysis?
    My command is as follows:
    /opt/apps/gatk/4.2.1/gatk Haome_analysis/reference/Homo_sapiens.GRCh37.dna.chromosome.6.fa -I base_recalib/abc_aligned_sorted_dupmarked_realigned_recalibrated.bam -O haplotype_caller/abc_haplotyper.g.vcf --emit-ref-confidence GVCF -L /home/dhwani.dholakia/archive/files_required_for_exome_analysis/coord.bed --max-assembly-region-size 1000 -mbq 25 -stand-call-conf 40

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    Hi @dhwani

    In GATK 4.1.2.0, if you want to force all regions in the interval list to be treated as active regions, the option to use is --force-active.

    HaplotypeCaller automatically emits phased genotypes (i.e. the PGT and PID tags, and 0|1 instead of 0/1 genotypes) in any mode. However, the phasing algorithm is very conservative, so a large amount of phasing goes undetected. So no, it doesn't look like you are making any mistakes.
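
For the original question about which stretch of variants is phased together, one possible approach is to group records by their PID value: records that share a PID on the same contig were phased as one set, so the first and last positions in the group bound the stretch. The sketch below assumes an uncompressed, single-sample VCF or GVCF read as plain text lines, and it skips GVCF reference blocks (ALT exactly <NON_REF>) so that only variant records are counted. The function name phased_stretches and the demo records are hypothetical.

```python
from collections import OrderedDict


def phased_stretches(vcf_lines, sample_index=0):
    """Return (variant_count, stretches) for an iterable of VCF text lines.

    Each stretch is (contig, PID, first_pos, last_pos, n_variants), built from
    variant records that carry the same PID tag on the same contig.
    """
    variants = 0
    groups = OrderedDict()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= 9 + sample_index or cols[4] == "<NON_REF>":
            continue  # no sample column, or a GVCF reference block
        variants += 1
        sample = dict(zip(cols[8].split(":"), cols[9 + sample_index].split(":")))
        pid = sample.get("PID", ".")
        if pid != ".":
            groups.setdefault((cols[0], pid), []).append(int(cols[1]))
    stretches = [
        (contig, pid, min(positions), max(positions), len(positions))
        for (contig, pid), positions in groups.items()
    ]
    return variants, stretches


# Made-up records for illustration; real input would come from something like
# open("haplotype_caller/abc_haplotyper.g.vcf").
DEMO = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tabc",
    "6\t100\t.\tA\tG,<NON_REF>\t.\t.\t.\tGT:PGT:PID\t0|1:0|1:100_A_G",
    "6\t120\t.\tC\tT,<NON_REF>\t.\t.\t.\tGT:PGT:PID\t0|1:0|1:100_A_G",
    "6\t130\t.\tG\t<NON_REF>\t.\t.\tEND=200\tGT:DP\t0/0:30",
    "6\t500\t.\tT\tC,<NON_REF>\t.\t.\t.\tGT:AD\t0/1:5,5",
]

variant_count, stretches = phased_stretches(DEMO)
for contig, pid, start, end, n in stretches:
    print(f"{contig}:{start}-{end}\tPID={pid}\t{n} of {variant_count} variants")
```

Seeing only a few PID groups relative to the variant count, as in the 2-of-228 case described above, is consistent with the conservative phasing behaviour rather than with an error in the command.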
