GenotypeGVCFs: Long runtime exclusively with a single sample
I have been running into long runtimes with several GATK utilities.
However, it was manageable.
I was able to produce a g.vcf file (I used HaplotypeCaller instead of UnifiedGenotyper, following a suggestion made on a separate thread).
Now I have two g.vcf files for two different samples, and for one of them I could get a vcf file using GenotypeGVCFs within 45 minutes or so.
However, with the other sample I am getting **an estimated runtime of about 40 weeks.**
The samples are from Aedes aegypti and Aedes albopictus (this is the one giving trouble).
The walker starts walking instantly with the Aedes aegypti sample and gives me the vcf without any errors. However, with the Aedes albopictus sample the walker itself only starts after an hour or so.
The command used is:
java -jar GenomeAnalysisTK-3.7-0-gcfedb6 -T GenotypeGVCFs -nt 12 -R ref-ab/GCA_001444175.2_A.albopictus_v1.1_genomic.fasta --variant output-AB.raw.snps.indels.g.vcf -o genotyped-ab.vcf
Note that this exact command worked for the other sample (with only the relevant file names changed).
The log is as follows:
INFO 19:56:34,300 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 19:56:34,301 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 22:49:04,685 ProgressMeter - KQ560100.1:879201 0.0 2.9 h 15250.3 w 0.0% 37.4 w 37.4 w
INFO 22:50:04,687 ProgressMeter - KQ560100.1:879201 0.0 2.9 h 15250.3 w 0.0% 37.7 w 37.6 w
INFO 22:51:04,689 ProgressMeter - KQ560100.1:879201 0.0 2.9 h 15250.3 w 0.0% 37.9 w 37.9 w
INFO 22:52:04,690 ProgressMeter - KQ560100.1:879201 0.0 2.9 h 15250.3 w 0.0% 38.1 w 38.1 w
INFO 22:53:04,694 ProgressMeter - KQ560100.1:879201 0.0 2.9 h 15250.3 w 0.0% 38.3 w 38.3 w
(The estimated runtime keeps increasing instead of decreasing.)
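To make the stall easier to see, here is a minimal Python sketch that parses ProgressMeter rows like the ones above and checks whether the reported location and site count ever change. The regular expression is only my assumption about the row layout, based on the lines pasted here, and the names `parse_progress` and `is_stalled` are made up for illustration.

```python
import re

# Assumed layout of a ProgressMeter row, inferred only from the log above:
# location, sites, elapsed, per 1M sites, completed %, total runtime, remaining.
ROW = re.compile(
    r"INFO\s+(?P<clock>\d{2}:\d{2}:\d{2}),\d+\s+ProgressMeter\s+-\s+"
    r"(?P<location>\S+:\d+)\s+(?P<sites>[\d.eE+]+)\s+"
    r"(?P<elapsed>[\d.]+ \w+)\s+(?P<per_million>[\d.]+ \w+)\s+"
    r"(?P<completed>[\d.]+)%\s+(?P<total>[\d.]+ \w+)\s+(?P<remaining>[\d.]+ \w+)"
)

def parse_progress(log_text):
    """Return one dict per ProgressMeter data row; header rows are skipped."""
    rows = []
    for line in log_text.splitlines():
        match = ROW.search(line)
        if match:
            rows.append(match.groupdict())
    return rows

def is_stalled(rows):
    """True when at least two rows all report the same location and site count."""
    if len(rows) < 2:
        return False
    return len({(r["location"], r["sites"]) for r in rows}) == 1

# Three rows copied from the log above, plus one header row.
SAMPLE_LOG = """\
INFO 19:56:34,301 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 22:49:04,685 ProgressMeter - KQ560100.1:879201 0.0 2.9 h 15250.3 w 0.0% 37.4 w 37.4 w
INFO 22:50:04,687 ProgressMeter - KQ560100.1:879201 0.0 2.9 h 15250.3 w 0.0% 37.7 w 37.6 w
INFO 22:51:04,689 ProgressMeter - KQ560100.1:879201 0.0 2.9 h 15250.3 w 0.0% 37.9 w 37.9 w
"""

rows = parse_progress(SAMPLE_LOG)
for row in rows:
    print(row["clock"], row["location"], row["sites"], row["total"])
print("stalled:", is_stalled(rows))
```

On these rows the location and the site count never change between reports, while the estimated total runtime creeps upward each minute.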
1) The genome sizes are:
1.9 G for A. albopictus and 1.4 G for A. aegypti.
2) I cannot blame it on a lack of resources:
I have around 48 usable threads at the moment and enough RAM.
I have tried using different numbers of threads as well. It's not making any difference.
3) I have tried re-running the A. aegypti sample in parallel (to rule out that it had only been faster because of some uncertain variables at that point in time), and it reproduces its behaviour, i.e. it finishes in 45 minutes or so. But the A. albopictus sample still shows the same problem.
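Total genome size may not be the only difference between the two references, so here is a minimal Python sketch for comparing them by number of sequences as well as total bases. Whether the sequence count has anything to do with the slowdown is purely an assumption on my part. The helper name `fasta_summary` is hypothetical, and the sketch assumes an uncompressed, plain-text FASTA file.

```python
import sys

def fasta_summary(lines):
    """Return (number_of_sequences, total_bases) from an iterable of FASTA lines."""
    n_seqs = 0
    total_bases = 0
    for line in lines:
        if line.startswith(">"):
            # Each header line starts a new sequence (contig or scaffold).
            n_seqs += 1
        else:
            # Count residues only; surrounding whitespace and newlines are dropped.
            total_bases += len(line.strip())
    return n_seqs, total_bases

if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass one or more FASTA paths on the command line.
    for path in sys.argv[1:]:
        with open(path) as handle:
            n_seqs, total_bases = fasta_summary(handle)
        print(f"{path}\t{n_seqs} sequences\t{total_bases} bases")
```

Running it on both reference FASTA files (for example the `ref-ab/GCA_001444175.2_A.albopictus_v1.1_genomic.fasta` file from the command above) prints one line per file for a side-by-side comparison.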