SelectVariants hangs while selecting biallelic SNPs

System specification
The version of my Java is: 1.7.0_79
The version of my GATK is: 3.7.0
My OS is: Ubuntu 14.04.2 LTS
My processor is: Intel Core i5-4440 CPU @ 3.10GHz × 4

I have successfully created a VCF file using UnifiedGenotyper. This VCF file was called from a merged BAM.

Information about the BAM files
The merged BAM was built from 10 individual BAM files (10 different samples of the same organism).
The individual BAM files are not large, roughly 50-100 MB each, and the merged BAM is 368 MB.
The files have been coordinate-sorted and read groups have been added.
ValidateSamFile reports no issues with them.

Information about the VCF file
The VCF contains 282931 records.
The command used to generate the VCF was:
java -jar /dummy/GenomeAnalysisTK-3.7-0-gcfedb67/GenomeAnalysisTK.jar -T UnifiedGenotyper -I Realign.bam -R REF.fasta -o Calling.vcf -glm BOTH

When I try to run SelectVariants on the above VCF file, the log reports a remaining run time of 1176.1 w (weeks).

The command used to select the biallelic SNPs is:
nohup java -jar /dummy/GenomeAnalysisTK-3.7-0-gcfedb67/GenomeAnalysisTK.jar -T SelectVariants --variant Calling.vcf -R REF.fasta -o Biallelic.vcf -restrictAllelesTo BIALLELIC &

I cannot work out what is going wrong here.
Generating the VCF with UnifiedGenotyper took around 2 days, but this filtering step just hangs.
How can I speed up this step, and what could be causing this behaviour?

Answers

  • SkyWarrior (Turkey, Member ✭✭✭)

    upgrade to Java 1.8

    I guess that is the biggest issue.

  • @SkyWarrior said:
    upgrade to Java 1.8

    I guess that is the biggest issue.

    Even with Java 1.8 it is slow.

  • @SkyWarrior said:
    It seems that there is something inherently wrong going on here. Unified Genotyper is usually very fast and it should not take more than a few hours even with the largest files.

    Almost all of the utilities are slow, not just a few. It is really troubling me what may be going wrong. There is sufficient RAM as well.

  • @shubhra said:
    I have been able to achieve this using a different tool. However, when I use SelectVariants to split the merged VCF file (with 10 samples) into per-sample VCF files, it shows an estimated run time of 1500 weeks or so.
    The process even starts off slowly. Please help. The behaviour repeats for every task I try with SelectVariants (be it retaining biallelic SNPs or splitting the VCF file).

    Command used:
    nohup java -jar /GenomeAnalysisTK-3.7-0-gcfedb67/GenomeAnalysisTK.jar -T SelectVariants -R REF.fasta -V Calling.vcf -nt 15 -o calling-split-12.vcf -sn 98_12_trim &

  • SkyWarrior (Turkey, Member ✭✭✭)

    -nt 15 OMG!

    Get rid of it.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Ah yes try not choking your computer to death with impossible demands :)

    Also you should not use UnifiedGenotyper anymore, that thing is a fossil.

  • @Geraldine_VdAuwera said:
    Ah yes try not choking your computer to death with impossible demands :)

    Also you should not use UnifiedGenotyper anymore, that thing is a fossil.

    As originally posted, my command was:
    nohup java -jar /dummy/GenomeAnalysisTK-3.7-0-gcfedb67/GenomeAnalysisTK.jar -T SelectVariants --variant Calling.vcf -R REF.fasta -o Biallelic.vcf -restrictAllelesTo BIALLELIC &
    As can be seen, the -nt option was not used. (-nt 15 was only tried later, in a desperate attempt to speed things up.)
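
For anyone landing on this thread later, here is a minimal dry-run sketch of the two SelectVariants calls discussed above, written without -nt as suggested in the replies. It only prints the commands; it does not run GATK. The jar path, file names and sample name are copied from the posts. Two things are assumptions that nobody in the thread proposed: the -Xmx4g heap size, and the -selectType SNP argument (added because the original command restricts to biallelic sites but does not restrict to SNPs). This is not a confirmed fix for the slowness, only a cleaner starting point for testing.

```shell
#!/bin/sh
# Dry run: build the two SelectVariants commands and print them.
# Nothing is executed, so this is safe to run without GATK installed.
set -eu

# Values copied from the thread.
GATK_JAR="/dummy/GenomeAnalysisTK-3.7-0-gcfedb67/GenomeAnalysisTK.jar"
REF="REF.fasta"
VCF="Calling.vcf"

# Assumption: 4 GB heap; adjust to the RAM actually available.
JAVA_OPTS="-Xmx4g"

# Keep biallelic SNPs only. No -nt, so the tool runs single-threaded.
# -selectType SNP is an assumption added for this sketch.
biallelic_cmd="java $JAVA_OPTS -jar $GATK_JAR -T SelectVariants -R $REF -V $VCF -selectType SNP -restrictAllelesTo BIALLELIC -o Biallelic.vcf"

# Extract one sample from the 10-sample VCF, again without -nt.
split_cmd="java $JAVA_OPTS -jar $GATK_JAR -T SelectVariants -R $REF -V $VCF -sn 98_12_trim -o calling-split-12.vcf"

echo "$biallelic_cmd"
echo "$split_cmd"
```

To run either command for real, paste the printed line into the terminal. If a thread count is wanted at all, it should not exceed the number of physical cores (4 on the machine described in the question).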
