Google maintains an MD5 for every bucket object, and all copies are checksum-verified, so files copied from the VM do not appear in the bucket until Google has verified that the upload is complete.
The MD5 that Google provides for a bucket object (labeled "MD5") can be found by expanding the "More Info" section of the File Details pop-up for each of the workflows.
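If you want to compare that value against a copy of the file you have locally, here is a minimal sketch in Python. It returns the MD5 in both hex and base64 form, since Google Cloud Storage stores the object MD5 as a base64-encoded digest and I am not assuming which form the pop-up displays. The function name `local_md5` is just something I made up for illustration.

```python
import base64
import hashlib


def local_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 of a local file as a (hex, base64) pair of strings."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    hex_md5 = digest.hexdigest()
    b64_md5 = base64.b64encode(digest.digest()).decode("ascii")
    return hex_md5, b64_md5
```

Call `local_md5` on the path to your local copy and compare whichever of the two strings matches the format shown for the bucket object.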
Additionally, by "truncated files", are you referring to the "Outputs" or "Workflow Log" previews in the Monitor tab being truncated?
Please let me know of any follow-up questions.
If you want the original coordinates written into the new VCF, you can use the WRITE_ORIGINAL_POSITION (and possibly also the WRITE_ORIGINAL_ALLELES) argument.
Regarding the message, you can run with WARN_ON_MISSING_CONTIG=true to get a warning. This would have shortened our back-and-forth significantly!
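To make the arguments concrete, here is a rough sketch that assembles a Picard LiftoverVcf command line with those options switched on. It only builds the argument list and does not run anything. The `picard.jar` location, the function name and all of the file names are placeholders I am assuming for illustration.

```python
def liftover_vcf_args(input_vcf, output_vcf, chain, reject_vcf, reference,
                      write_original_position=True,
                      write_original_alleles=False,
                      warn_on_missing_contig=True,
                      picard_jar="picard.jar"):
    """Build (but do not execute) a Picard LiftoverVcf argument list."""
    args = [
        "java", "-jar", picard_jar, "LiftoverVcf",
        "I=" + input_vcf,
        "O=" + output_vcf,
        "CHAIN=" + chain,
        "REJECT=" + reject_vcf,
        "R=" + reference,
    ]
    if write_original_position:
        # Keep the pre-liftover coordinates in the output VCF.
        args.append("WRITE_ORIGINAL_POSITION=true")
    if write_original_alleles:
        # Optionally also keep the pre-liftover alleles.
        args.append("WRITE_ORIGINAL_ALLELES=true")
    if warn_on_missing_contig:
        # Ask for a warning when a contig is missing.
        args.append("WARN_ON_MISSING_CONTIG=true")
    return args
```

The returned list can be passed to `subprocess.run` once the placeholder paths are replaced with your own.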
Hey @jurhoades -
I just confirmed that there is currently no way to abort a single workflow within a submission, as in your case with the sample set and this.samples. Unfortunately, the only way to stop the workflow on the lagging sample is to abort the entire submission.
Hi @HectorMarina and @manolis
1) As mentioned in the log, the issue here is the large number of intervals:
"16:47:24.027 WARN GenomicsDBImport - A large number of intervals were specified. Using more than 100 intervals in a single import is not recommended and can cause performance to suffer. It is recommended that intervals be aggregated together."
2) We are trying to fix this issue on our end. For updates on its resolution, please follow this link: https://github.com/broadinstitute/gatk/issues/5300
3) A workaround for the time being is to use one invocation of GenomicsDBImport per contig, or to group 10 contigs together and invoke each group separately.
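The grouping in point 3 could be scripted along these lines. This is only a sketch: it splits a contig list into groups of 10 and builds one GenomicsDBImport argument list per group without running anything. The helper names, the workspace naming scheme and the use of a sample name map are my assumptions, so adjust them to however you normally supply your GVCFs.

```python
def group_contigs(contigs, group_size=10):
    """Split a list of contig names into consecutive groups of group_size."""
    return [contigs[i:i + group_size]
            for i in range(0, len(contigs), group_size)]


def genomicsdb_import_args(group, workspace, sample_map):
    """Build (but do not execute) one GenomicsDBImport call for a contig group."""
    args = [
        "gatk", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "--sample-name-map", sample_map,
    ]
    for contig in group:
        # One -L per contig keeps each import far below 100 intervals.
        args += ["-L", contig]
    return args


def plan_imports(contigs, sample_map, group_size=10):
    """Return one argument list per group, each with its own workspace."""
    return [
        genomicsdb_import_args(group, "genomicsdb_group%d" % index, sample_map)
        for index, group in enumerate(group_contigs(contigs, group_size), start=1)
    ]
```

Setting `group_size=1` gives the one-invocation-per-contig variant of the workaround.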
I hope this helps.
Right. You use the set of variants produced by calling with on-the-fly recalibration to generate the next recalibration table. Rinse, repeat. Initially you expect to see some differences in the plots from one iteration to the next; eventually the plots stop changing, so you use that last set of variants as the known set for the "real" recalibration.
To be clear, each iteration of BQSR should be done on the original BAM file, not on the output of the previous recalibration. The only thing that changes is the set of variants used as the known set.
You can bypass the PrintReads step by running HaplotypeCaller on the original BAM file with that iteration's recal file passed using -BQSR, so that the recalibration is done on the fly.
One built-in way to do it is to run AnalyzeCovariates each time and look at the plots. When you get to convergence, the plots will stop changing much.
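To spell out the loop, here is a sketch that lays out the commands for a few rounds using GATK3-style arguments. It only generates the argument lists. The jar path, the output file names, the number of rounds and the starting known set (`initial_known_vcf`) are all assumptions on my part. Comparing the AnalyzeCovariates plots between rounds is left as a manual step.

```python
def bootstrap_bqsr_plan(original_bam, reference, initial_known_vcf, rounds=3,
                        gatk_jar="GenomeAnalysisTK.jar"):
    """Return the GATK3 commands for each bootstrap round, in execution order."""
    gatk = ["java", "-jar", gatk_jar]
    known = initial_known_vcf
    plan = []
    for i in range(1, rounds + 1):
        recal_table = "recal.round%d.table" % i
        calls_vcf = "calls.round%d.vcf" % i
        # Every round recalibrates the ORIGINAL bam; only the known set changes.
        plan.append(gatk + ["-T", "BaseRecalibrator", "-R", reference,
                            "-I", original_bam, "-knownSites", known,
                            "-o", recal_table])
        # Call on the original bam with on-the-fly recalibration (no PrintReads).
        plan.append(gatk + ["-T", "HaplotypeCaller", "-R", reference,
                            "-I", original_bam, "-BQSR", recal_table,
                            "-o", calls_vcf])
        # This round's calls become the known set for the next round.
        known = calls_vcf
    return plan
```

In practice you would stop adding rounds once the plots stop changing, rather than fixing the number of rounds up front.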
Here is a document with information on supported interval list formats. https://software.broadinstitute.org/gatk/documentation/article?id=11009
I hope this helps. Please let me know if there is any other information you may need.
In that case, it seems that your chain is backwards! The first sequence name ("QIUM02000100.1" in your chain) is the "from" and the second ("17.10" in your chain) is the "to". This explains why you are getting no variants lifted over. It might be good to have the tool provide a more helpful error...
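A quick way to check the direction yourself is to pull the two sequence names out of each chain header line and compare the "from" names with the contigs in your VCF. The sketch below assumes the usual UCSC chain header layout (`chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`); the function names are made up for illustration.

```python
def chain_contig_pairs(lines):
    """Return (from_name, to_name) for every chain header line."""
    pairs = []
    for line in lines:
        fields = line.split()
        # Header lines start with "chain"; alignment block lines are skipped.
        if len(fields) >= 12 and fields[0] == "chain":
            # Field 2 is the first sequence name ("from"), field 7 the second ("to").
            pairs.append((fields[2], fields[7]))
    return pairs


def contigs_missing_from_chain(vcf_contigs, pairs):
    """Return the VCF contigs that never appear as a "from" name in the chain."""
    from_names = {from_name for from_name, _ in pairs}
    return [contig for contig in vcf_contigs if contig not in from_names]
```

If every contig in your VCF comes back as missing while those same names show up as "to" names, the chain is pointing the wrong way for your input.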
The developers got back to me and mentioned that porting CombineVariants to GATK4 is a work in progress. I have let them know that this is a feature that users want, and hence it will be prioritized.
For now you can use it from GATK3.
I hope this helps.