sample_gender.report.txt file is empty
I am getting a large number of errors like this from SVPreprocess:
SVQScript-1000.out:Exception in thread "main" org.broadinstitute.sv.commandline.ArgumentException: Gender map file not found: /hpc/research/users/jharris/SVModule/GeisingerGold/Genome_STRiP/URB875A1_metadata/sample_gender.report.txt
That message is not completely accurate: the sample_gender.report.txt file is present, but empty.
I gather from reading docs and tutorials that in previous versions of Genome STRiP, the user had to provide a "gender map file", but that now in v2.0.0, it does its own estimation using the genderMaskBedFile. It seems for whatever reason, this estimation process failed in my run.
I also found this sentence in the documentation (http://www.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_SVPreprocess.html):
A report file is produced with sample gender and sex chromosome "dosage", but this information is currently not used in downstream processing by default. The user must explicitly specify a file containing the gender of each sample, which can be based on the read depth gender estimation or on the reported gender of each sample
First of all, it's simply not true that this report is not used downstream, as my dozens of error messages attest.
So, for a short term workaround, can I just fill in the required data in the empty sample_gender.report.txt file? Does this have the same format as I saw for the previous version of the program (i.e., a 2-column file with Sample name and either "M" or "F")?
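A minimal sketch of that workaround, assuming the file really is a 2-column, tab-delimited list of sample name and "M" or "F" with no header (this format is not confirmed for v2.0.0). The function name `write_gender_map` and the sample IDs in the usage note are hypothetical.

```python
from pathlib import Path


def write_gender_map(path, sample_genders, delimiter="\t"):
    """Write one 'sample<delimiter>gender' line per sample; return the line count.

    ASSUMPTION: a headerless, tab-delimited, 2-column layout with M/F codes.
    This is the format described for the older gender map file, not a
    confirmed format for sample_gender.report.txt.
    """
    lines = []
    for sample, gender in sorted(sample_genders.items()):
        if gender not in ("M", "F"):
            raise ValueError(f"unexpected gender {gender!r} for sample {sample}")
        lines.append(f"{sample}{delimiter}{gender}\n")
    Path(path).write_text("".join(lines))
    return len(lines)
```

For example, `write_gender_map("sample_gender.report.txt", {"SAMPLE_A": "F", "SAMPLE_B": "M"})` would write two lines. It is worth keeping a copy of the original empty file before overwriting it.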
Of course if I want to use this program going forward, I will need to try to understand why the gender estimation failed and how to mitigate it. Since all of the 1000+ log files have similar names ("SVQScript-NNNN.out"), what is the most efficient way for me to figure out which of those files represents the gender-estimation job, so I can learn more about why it failed?
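One way to narrow down the logs is sketched below. Since every failing job already prints "Gender map file not found", the script looks for logs that mention gender on some other line. The keyword "gender" is only a guess at what the estimation job writes to its log, and `candidate_gender_logs` is a hypothetical helper, not part of Genome STRiP.

```python
from pathlib import Path


def candidate_gender_logs(log_dir, keyword="gender",
                          known_error="Gender map file not found",
                          pattern="SVQScript-*.out"):
    """Return names of logs with a line mentioning `keyword` that is not
    the already-known downstream error message.

    ASSUMPTION: the gender-estimation job mentions "gender" somewhere in
    its log (for example in its command line); this is not verified.
    """
    candidates = []
    for path in sorted(Path(log_dir).glob(pattern)):
        with open(path, errors="replace") as handle:
            for line in handle:
                if keyword.lower() in line.lower() and known_error not in line:
                    candidates.append(path.name)
                    break  # one matching line is enough for this file
    return candidates
```

Running `print(candidate_gender_logs("."))` from the directory holding the log files should leave a much shorter list to read by hand. If it comes back empty, a different keyword (such as part of the report file name) may be a better search term.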
If the internal gender estimation is unreliable, I could also work around the problem by providing my own gender report file up front. It looks like the previous version of the program had a genderMapFile option for this purpose, but the current documentation does not mention this option. Is there a way to override gender estimation if I can't easily fix whatever issue led to its failure?