As far as I know, the GATK team no longer supports GATK3. It is better to try the latest version of GATK4 and check whether the problem persists.
Thank you for asking your question on the forum.
It would be helpful to know the version of GATK that you are using. Also, please provide the exact commands used and a copy of any error messages in the output.
Here are a few hints for troubleshooting:
1.) A file that cannot finish being copied over usually indicates that not enough storage space was allocated for that task in your workflow. Please make sure that you have allocated sufficient storage for that step.
2.) A Java error early in the process that says 'java.lang.OutOfMemoryError' usually indicates that the RAM is too low to run this particular task in the workflow. Try increasing the RAM by 10x. This usually happens when a much larger set of files is put through a workflow that was built with smaller files.
3.) The inputs are corrupted or incomplete. This can happen when a previous process used multithreading to sort or add a read group to a BAM. The best practice is to run ValidateSamFile on the inputs for the task that failed, to make sure a complete BAM without malformed fields is being provided (see the example commands after this list).
4.) Another common error is inconsistent read groups, such as when several per-chromosome BAM files are merged and the read groups still carry tags that refer to the old files (e.g. chr.8.1), which can make them look like completely different samples instead of simply marking the chromosome. Try AddOrReplaceReadGroups to make the read group naming consistent across your samples.
5.) VCFs written to an older version of the VCF specification are sometimes not compatible with the latest version of GATK.
6.) The VCFs are corrupted; try running ValidateVariants to check the file.
7.) GATK is not the latest version, or retired tools from older versions are being used.
8.) The reference files are not consistent with the Broad-supported reference genome set.
9.) There is a trailing whitespace or other character error in the config file or the metadata.
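For the BAM/VCF checks and the memory hint above, the commands look roughly like this (a minimal sketch using GATK4-style invocations; the file names and heap size are placeholders to adapt to your own data):

    # Hint 3: validate the BAM that feeds the failing task
    gatk ValidateSamFile -I input.bam --MODE SUMMARY
    # Hint 6: validate a VCF against its reference
    gatk ValidateVariants -R reference.fasta -V input.vcf.gz
    # Hint 2: give the JVM a larger heap if you hit java.lang.OutOfMemoryError
    gatk --java-options "-Xmx32g" HaplotypeCaller -R reference.fasta -I input.bam -O output.g.vcf.gz -ERC GVCF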
If you have investigated all of these fixes, please provide your commands, your error messages, and the version of GATK, and we will continue to troubleshoot this problem with you.
The link to CollectFragmentCounts is broken in your post. Can you please fix that?
You should not use M in your READ_STRUCTURE. That element is for a UMI (a molecular identifier), not for demultiplexing. If that is what you have, you should not be providing BARCODE_2, since the UMI is not a barcode: it is "random", so you do not want the data to be demultiplexed by it.
In short, if you have two sample barcodes, you need two B elements and two columns in your params file.
If you have one sample barcode and one UMI, you should have one B and one M element in your READ_STRUCTURE, and only one SAMPLE_BARCODE column in your PARAMS.
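To make that concrete, here are illustrative READ_STRUCTURE values (the read and barcode lengths are made up; substitute the lengths from your own run):

    # Two sample barcodes: two B segments, two barcode columns (e.g. BARCODE_1 and BARCODE_2) in the params file
    READ_STRUCTURE=151T8B8B151T
    # One sample barcode plus one UMI: one B segment and one M segment, a single barcode column in the params file
    READ_STRUCTURE=151T8B9M151T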
I had the same problem, but managed to solve it by upgrading Docker to the latest version.
It looks like you've specified a directory as an input to the workflow (/ngs/projects/3_GATK4/test_local_data-processing).
The problem stems from the fact that that directory also happens to be the location that you're executing the workflow from. When Cromwell tries to localize your input files, it sees the directory you specified and tries to copy the files it contains into cromwell-executions. However, cromwell-executions itself happens to be in that directory since that's where you started the workflow, so it ends up recursively trying to copy files until the file name becomes too long.
Just try running the workflow from a different directory and it should work.
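For example (the paths here are just placeholders):

    # launch Cromwell from a directory that is NOT one of the workflow inputs
    cd /home/user/cromwell_runs
    java -jar cromwell.jar run /path/to/your_workflow.wdl --inputs /path/to/your_inputs.json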
Hi @bhanuGandham, yes, it worked. It turned out that some of the GVCF files overlapped with adjacent ones, which resulted in records that belong at later positions being placed ahead of earlier loci along the chromosomes. I just fixed this by deleting the redundant GVCF files.
We are having a technical difficulty at the moment. A workaround for this is to ask your question on an existing thread.
Thanks for your help. I haven't encountered the issue since implementing the max-concurrent-job-limit argument. Hopefully this trend continues. I'll let you know if anything changes.
I think run mode is able to check the call cache, but you'll still need to reference a config file which (a) enables a persistent database and (b) opts in to call caching:
java -Dconfig.file=db_and_cc.conf -jar cromwell.jar run ...
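As a rough sketch, a minimal db_and_cc.conf along those lines could look like the following (this example uses a file-backed HSQLDB for the persistent database; the exact keys can vary between Cromwell versions, so check the Cromwell documentation for your release):

    # db_and_cc.conf -- sketch only; adjust paths and options for your setup
    include required(classpath("application"))

    call-caching {
      enabled = true
    }

    database {
      profile = "slick.jdbc.HsqldbProfile$"
      db {
        driver = "org.hsqldb.jdbcDriver"
        url = "jdbc:hsqldb:file:cromwell-db/cromwell-db;shutdown=false;hsqldb.tx=mvcc"
        connectionTimeout = 120000
      }
    }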