
How do I use CombineVariants to combine SNPs and indels?

busk (Skien) Member

When I use the following command to combine SNP and indel calls from the same sample, they appear as two samples in the output VCF. As a result, I get two columns of genotype info, one for the indels and one for the SNPs. Is there a way to combine them so that the output VCF has only one column of genotype info?

java -Xmx8g -jar ' . $GATKdir . 'GenomeAnalysisTK.jar \\
    -R ' . $hg19ref . '.fasta \\
    -T CombineVariants \\
    --variant snp.vqsr.filter.vcf \\
    --variant indel.filter.vcf \\
    -o ' . $fname . 'CombinedVariants.filter.vcf \\
    -genotypeMergeOptions UNIQUIFY \\
    2>errCombineVar > CombineVarInfo.txt

Answers

  • AlFatlawi Member
    Hi,
    Would the PRIORITIZE option work here?
    Thanks
  • bshifaw Member, Broadie, Moderator admin

    Hi @AlFatlawi

    CombineVariants is a GATK3 tool, and we currently support only GATK4. Documentation for the tool is available here.
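
For readers still on GATK3, the CombineVariants documentation describes PRIORITIZE as taking genotypes in a stated priority order, with each input tagged by name on the command line. A minimal sketch under that assumption follows. The tags `snps` and `indels`, the jar path, and the reference name are illustrative, and whether this yields a single genotype column for your data should be checked against your GATK3 version.

```shell
# Dry run by default: prints the command. Set JAVA_CMD=java to execute it.
JAVA_CMD="${JAVA_CMD:-echo java}"
GATK3_JAR=GenomeAnalysisTK.jar   # hypothetical path to the GATK3 jar
REF=hg19.fasta                   # hypothetical reference name

# $1 = SNP VCF, $2 = indel VCF, $3 = output VCF
combine_prioritized() {
    $JAVA_CMD -Xmx8g -jar "$GATK3_JAR" \
        -T CombineVariants \
        -R "$REF" \
        --variant:snps "$1" \
        --variant:indels "$2" \
        -o "$3" \
        -genotypeMergeOptions PRIORITIZE \
        -priority snps,indels
}

combine_prioritized snp.vqsr.filter.vcf indel.filter.vcf CombinedVariants.filter.vcf
```

The only changes from the command in the question are the named `--variant:` tags and the switch from UNIQUIFY to PRIORITIZE with a `-priority` list.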

  • AlFatlawi Member
    Hi @bshifaw
    Thanks for your reply.
    I filtered the SNPs and indels separately. Now I would like to combine the two VCF files.
    I recently moved to GATK4. What is the alternative, please?
    Regards
  • bshifaw Member, Broadie, Moderator admin

    In GATK4, MergeVcfs combines multiple variant files into a single variant file, but it does not have the genotypeMergeOptions argument.
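
A minimal sketch of such a merge, assuming the SNP and indel files were split from the same call set and therefore share the same samples and contigs (MergeVcfs requires this). The file names are hypothetical.

```shell
# Dry run by default: prints the command. Set GATK_CMD=gatk to execute it.
GATK_CMD="${GATK_CMD:-echo gatk}"

# Hypothetical file names.
SNPS=snp.filtered.vcf
INDELS=indel.filtered.vcf
OUT=combined.filtered.vcf

# $1 = first input VCF, $2 = second input VCF, $3 = merged output VCF
merge_vcfs() {
    $GATK_CMD MergeVcfs -I "$1" -I "$2" -O "$3"
}

merge_vcfs "$SNPS" "$INDELS" "$OUT"
```

Because both inputs carry the same sample names, the merged file keeps a single genotype column per sample, so no equivalent of genotypeMergeOptions is needed for this case.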

  • AlFatlawi Member
    @bshifaw
    Thanks a lot for your fast response.
    I would like to confirm my procedure with you:
    1- After running HaplotypeCaller, I merged the samples with CombineGVCFs and then applied GenotypeGVCFs. Is that correct?
    The reason I didn't use GenomicsDBImport is that it needs an interval and works on a specific chromosome, but I need to merge all chromosomes at the same time.

    2- Is there any way I can apply GenomicsDBImport to all chromosomes (multiple intervals)?

    3- After that, I separated the SNPs and indels of the merged file with SelectVariants, then applied hard filtering to each of them. Then, based on your answer, I used MergeVcfs to merge the SNPs and indels.
    Is this also right?

    Sorry for the many questions, but I am new to GATK.

    Regards
  • bshifaw Member, Broadie, Moderator admin
    1. Yes, according to the Germline short variant discovery documentation, you are on the right track.
    2. After GATK 4.0.8.0 (mentioned here) you can supply GenomicsDBImport more than one interval, so you can provide the command with all the chromosomes.
    3. That's correct.
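
Points 2 and 3 could be sketched roughly as follows. The sample, interval, and workspace names are hypothetical, and the `QD < 2.0` expression is only a placeholder for whatever hard-filtering thresholds suit your data.

```shell
# Dry run by default: prints each command. Set GATK_CMD=gatk to execute them.
GATK_CMD="${GATK_CMD:-echo gatk}"
REF=reference.fasta   # hypothetical reference

# Point 2: import per-sample GVCFs over several intervals in one call,
# then genotype directly from the GenomicsDB workspace.
import_and_genotype() {
    $GATK_CMD GenomicsDBImport \
        -V sample1.g.vcf.gz -V sample2.g.vcf.gz \
        --genomicsdb-workspace-path cohort_db \
        -L chr1 -L chr2 -L chr3
    $GATK_CMD GenotypeGVCFs -R "$REF" -V gendb://cohort_db -O cohort.vcf.gz
}

# Point 3: split by variant type, hard-filter each subset, merge back.
split_filter_merge() {
    $GATK_CMD SelectVariants -R "$REF" -V cohort.vcf.gz -select-type SNP -O snps.vcf.gz
    $GATK_CMD SelectVariants -R "$REF" -V cohort.vcf.gz -select-type INDEL -O indels.vcf.gz
    # Placeholder filters: substitute your own expressions and names.
    $GATK_CMD VariantFiltration -R "$REF" -V snps.vcf.gz \
        --filter-expression "QD < 2.0" --filter-name "QD2" -O snps.filtered.vcf.gz
    $GATK_CMD VariantFiltration -R "$REF" -V indels.vcf.gz \
        --filter-expression "QD < 2.0" --filter-name "QD2" -O indels.filtered.vcf.gz
    $GATK_CMD MergeVcfs -I snps.filtered.vcf.gz -I indels.filtered.vcf.gz -O cohort.filtered.vcf.gz
}

import_and_genotype
split_filter_merge
```

A file listing the intervals can be passed to `-L` instead of repeating the flag for every chromosome.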
  • Negin Member
    Dear @bshifaw,

    I also have a question in this regard.
    I am working on DNA sequences from multiple bacteria and using GATK4. Before applying GenotypeGVCFs, I first merged them with CombineVariants, because GenomicsDBImport only supports diploid data. But now I see you are suggesting MergeVcfs. So, in my case, do you think I should replace CombineVariants with MergeVcfs?

    Thanks in advance,
    Negin
  • Negin Member
    Sorry, I made a mistake in my post!
    I applied CombineGVCFs rather than CombineVariants.
    To be clear: first I ran HaplotypeCaller on each sample separately and saved each result in a separate GVCF file. Then I applied CombineGVCFs to merge these GVCF files, and after that I applied GenotypeGVCFs. So, do you think I should replace CombineGVCFs with MergeVcfs?
    Thanks again
  • bshifaw Member, Broadie, Moderator admin

    @Negin
    No, do not replace CombineGVCFs with MergeVcfs. You are correct in using CombineGVCFs after HaplotypeCaller to merge your GVCFs. Also, it looks like GenomicsDBImport does support non-diploid data, according to this forum post.
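
The workflow described above (HaplotypeCaller per sample, then CombineGVCFs, then GenotypeGVCFs) might look like the sketch below. The sample and reference names are hypothetical, and `-ploidy 1` is an assumption for haploid bacterial genomes, so adjust it to your organism.

```shell
# Dry run by default: prints each command. Set GATK_CMD=gatk to execute them.
GATK_CMD="${GATK_CMD:-echo gatk}"
REF=reference.fasta   # hypothetical reference
PLOIDY=1              # assumption: haploid bacterial genomes

# Call one sample in GVCF mode. $1 = sample name; expects a file named $1.bam.
call_sample() {
    $GATK_CMD HaplotypeCaller -R "$REF" -I "$1.bam" -O "$1.g.vcf.gz" \
        -ERC GVCF -ploidy "$PLOIDY"
}

# Merge the per-sample GVCFs, then genotype them jointly.
joint_genotype() {
    $GATK_CMD CombineGVCFs -R "$REF" \
        -V sampleA.g.vcf.gz -V sampleB.g.vcf.gz -O cohort.g.vcf.gz
    $GATK_CMD GenotypeGVCFs -R "$REF" -V cohort.g.vcf.gz -O cohort.vcf.gz
}

for s in sampleA sampleB; do
    call_sample "$s"
done
joint_genotype
```

MergeVcfs has no role at this stage: it concatenates records from already-genotyped VCFs, whereas CombineGVCFs merges per-sample GVCFs for joint genotyping.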

  • AlFatlawi Member
    @bshifaw
    You are a great team. Thanks for your amazing support.
    Regards
  • Negin Member
    Thanks @bshifaw