SNP calling in pooled samples

Luntary (Genyo, Member)

Hi everybody!
I have run a targeted DNA-seq panel (alignment with BWA) on samples that were pooled without indexes. I'm analysing my positive control, which has variants that I know are real. I can see all of them in IGV, but I only detect 50% of them with HaplotypeCaller, even after changing several parameters, such as --minDanglingBranchLength 1.
One of the variants I need to detect has 50 reads for the mutated allele and 975 for the reference allele, so the variant is in about 5% of my reads.
Any ideas on how to detect this kind of variant?

Thank you so much in advance!
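
A minimal sketch of the arithmetic in the question, assuming the read counts quoted above (50 mutated, 975 reference) are the only reads at the site. The function name `allele_fraction` is made up for illustration and is not part of GATK or IGV.

```python
def allele_fraction(alt_reads: int, ref_reads: int) -> float:
    """Return the fraction of reads that support the alternate allele."""
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


# Read counts taken from the question above.
alt_fraction = allele_fraction(alt_reads=50, ref_reads=975)
print(f"{alt_fraction:.1%}")  # 50 / 1025, i.e. just under 5%
```

A variant at this fraction is far below the roughly 50% expected for a heterozygous site in a single diploid sample, which is consistent with a caller run at its default diploid setting missing it.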

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Sounds like you need to set the ploidy based on how many samples are pooled together to reflect the true ploidy of the pool.

  • Luntary (Genyo, Member)

    Thank you for the answer!
    I'm going to try it today. I have 10 samples, so my ploidy is 20. Is that right?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    If your organism is diploid, that's correct.

  • Luntary (Genyo, Member)

    Yes, I'm working with human samples.
    However, I can't run HaplotypeCaller: I get the Java problem that everybody mentions in the forum U^^
    I use the -Xmx32g option, but when HaplotypeCaller reaches chr8 I receive the Java error message and it stops. Any ideas?

    Thanks!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    What error message is that? (Almost every GATK error is a Java error, just because the code is all Java).

  • Luntary (Genyo, Member)

    Oops, sorry!
    I get the following error:
    " ERROR MESSAGE: An error occurred because you did not provide enough memory to run this program. You can use the -Xmx argument (before the -jar argument) to adjust the maximum heap size provided to Java. Note that this is a JVM argument, not a GATK argument."

    My command is:
    java -Xmx32g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R /media/Toshiba\ HDD\ 1/Genomas\ Referencia/HG19_toBWA/hg19.fa -I /media/TI30928200A/HALO36_TRI2_ord.bam -minDanglingBranchLength 1 -ploidy 20 -o /media/TI30928200A/HALO36_TRI2RG_ploidi.vcf

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Ah I see, thank you. The main problem is that running with high ploidy requires more computational power because there are a lot more combinations to calculate. If you cannot increase the memory allocation further, you may need to reduce the complexity of the analysis. For example, you can limit the maximum number of alternate alleles to consider. You may also need to relax the minDanglingBranchLength parameter value.

  • Luntary (Genyo, Member)

    I have changed both parameters (minDanglingBranchLength 4, maximum alternate alleles 3) and HaplotypeCaller stops at the same point. However, I tried UnifiedGenotyper and it works, and I detect my variants. Could I use UnifiedGenotyper instead of HaplotypeCaller, given that I can't increase the memory?

  • Luntary (Genyo, Member)

    Thank you so much, Geraldine!
    At the moment we are not really interested in indels, but it's something that I have to consider with my PI.
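
To put rough numbers on the ploidy and "a lot more combinations" points in the thread, here is a small sketch. It assumes the pool ploidy is simply the number of samples times the organism ploidy (as confirmed above for 10 human samples), and it counts candidate genotypes with the standard multiset formula C(P + A - 1, A - 1) for ploidy P and A alleles. The helper names are hypothetical, and how HaplotypeCaller's memory use actually scales with this count is an assumption, not something stated in the thread.

```python
from math import comb


def pool_ploidy(n_samples: int, organism_ploidy: int = 2) -> int:
    """Total number of chromosome copies in a pool of unindexed samples."""
    return n_samples * organism_ploidy


def genotype_count(ploidy: int, n_alleles: int) -> int:
    """Number of distinct unordered genotypes for a given ploidy and
    allele count (reference allele included): C(P + A - 1, A - 1)."""
    return comb(ploidy + n_alleles - 1, n_alleles - 1)


ploidy = pool_ploidy(10)          # 10 diploid samples -> 20
lowest_fraction = 1 / ploidy      # one mutated copy in the pool -> 0.05

# Compare a single diploid sample with the pool, for 1 and 3 alternate
# alleles (3 is the limit tried in the thread).
for n_alt in (1, 3):
    n_alleles = n_alt + 1         # alternates plus the reference allele
    print(n_alt, genotype_count(2, n_alleles), genotype_count(ploidy, n_alleles))
```

With one alternate allele the pool has 21 possible genotypes instead of 3, and with three alternate alleles it has 1771 instead of 10, which gives a sense of why the high-ploidy run needs more resources. The 5% lowest expected fraction also matches the 50 of 1025 reads described in the question.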
