Heads up:
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
shuangBroad ✭
About
- Username
- shuangBroad
- Location
- Broad75
- Joined
- Visits
- 42
- Last Active
- Roles
- Member, Broadie, Dev
- Points
- 44
- Badges
- 7
Reactions
Comments
-
Hi @Sakhaa and @sarawasl, thanks for testing out our Spark SV pipeline! I do have some questions before I can make concrete suggestions. * What motivated you to use an SV pipeline for CNVs? We do have a gCNV pipeline for germline CNV calling, if that fi…
-
Admittedly, I've never run the whole pipeline in local mode (I don't have access to a powerful machine locally), though I have run certain non-memory-hungry stages locally without problems. Now, for the first problem you encountered, I suspect it is…
-
Oh, that is a Spark configuration that needs to be specified in the block of arguments after --. Sorry for the confusion. Please see here.
-
One thing though, your bam is named Wes738_final.bam. Is it a WES or WGS bam? The pipeline is for WGS bam, so it will not perform well for WES bams. One more thing, I'm not sure which version of GATK you are using, I noticed you are using 2bit refer…
-
Then it shouldn't be disk.
-
@SkyWarrior I've never seen that while running the pipeline. It does look like it is from the Spark dependency. It is possible that Spark cannot find enough memory and asks for disk for storage but the disk is low as well.
-
Hi @SkyWarrior. First of all, we designed the pipeline to be run on a Spark cluster, so it is conceivable that it will not work on a local machine. Having said that, it would be great if the pipeline can work on a local machine with appropriate resour…
-
One follow-up that my fingers forgot to put in: given that the coverage is lower and that the pipeline is assembly-based, I'd expect the sensitivity to get a bit lower (because assembly algos typically require relatively higher coverag…
-
There are multiple aspects that I can think of * it should run given the lower coverage * but I'd go slow first, i.e. only run with one sample or one trio as a pilot project * take a look at the output format first, we produce complex variants that…
-
Hi Luciano, I did an experiment using Google Dataproc, with the following configuration on a HG38 WGS 30X BAM, the pipeline finished in 100 minutes. So, it should not need that long a time to study your BAM if you use a Spark cluster similar to the…
-
Let me answer in two parts: Spark local mode Considering that you are running the Spark pipeline locally, which we haven't tested thoroughly when developing, I'd say consider limiting the number of executors by setting --num-executors and limiting …
-
Thanks for answering so many questions, Luciano. I suspected it was due to memory when you first reported, and the time it took for the pipeline to flounder points further in that direction (usually the pipeline finishes in su…
-
Hi Luciano, Thanks for reporting this problem. I'm happy to help. Is it true that you are running the Spark pipeline on a local machine? If so, is the BAM you are studying a 30X WGS bam? Would you let us know how long it took for the task…
-
@KUBN I've also noticed that you seem to be using HG19 reference with an interval that starts with "chr".
-
One strange thing I noticed in your shared IGV screen is that it seems all reads are mapped to the reverse strand (also notice the read you highlighted has a pair orientation of R1R2, which is atypical). I'm not sure if you are grouping reads by strand,…
-
I see. Thanks Chris!
-
Hi ChrisL , a follow-up question: is it possible to do a Tuple instead of Pair in the scatter construct? Thanks!
-
Hi @truns , thanks for reporting the issue and I'm sorry for the delay. It looks like a possible edge case that we haven't seen before (and we haven't tested this tool with PacBio reads yet either), so it would be great if you can share the alignme…
-
Hey @tkoyama , could you please re-run using the latest nightly build with an extra flag "-newQual" (or equivalently "--useNewAFCalculator") and see if the error is still there? Thanks.
-
Hi Nills, when you run HC, did you see a warning about alt allele removal in the following format? Removed alt alleles where ploidy is %d and original allele count is %d, whereas after trimming the allele count becomes %d. Alleles kept are:%s I am…
-
Just FYI for anyone who might run into similar problems: I got similar problems when trying to build GATK with maven (the errors were with some math modules). Geraldine's suggestion for re-building worked for me. The errors I received: [ERROR] Fail…