Yes, you can use it for downstream analysis even though you see this warning. Our developers will consider toning down the messaging. It's an issue that we are working on; for more details, please go here.
I hope this helps.
Uploading your FASTQ files to the Google bucket is the first step in setting up your workspace for an analysis.
After adding data to the Google bucket, you will need to update the workspace’s data model by importing metadata: a description of what each of your FASTQ files is in terms of “entities” and their relationships. For example, do your FASTQs represent participants, samples, or pairs, or sets of participants, samples, or pairs? This description is a tab-delimited text file that you upload in the Data tab of your workspace with the “Import Metadata…” button. This link contains downloadable templates (at Step 4) for each entity type listed above, and here you can view the specific order in which the files must be uploaded to update your data model successfully.
A very basic example: for 10 samples from 10 participants, you would fill in a participant template and a sample template, then upload the participant metadata file before the sample metadata file, as the order rules require.
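As a rough sketch of the 10-sample example above, the two load files could be generated with a short script. The column headers (`entity:participant_id`, `entity:sample_id`, `participant`), the `fastq` attribute column, the IDs, and the bucket path are all assumptions made for illustration; check them against the downloadable templates before uploading anything.

```python
import csv
import io


def build_load_files(n=10, bucket="gs://my-bucket"):
    """Return (participant_tsv, sample_tsv) as strings.

    Assumes one sample per participant. Header names, IDs, and the
    bucket path are hypothetical; compare them with the real templates.
    """
    participants = [f"participant_{i:02d}" for i in range(1, n + 1)]

    # Participant file: a single ID column, one row per participant.
    p_buf = io.StringIO()
    p_writer = csv.writer(p_buf, delimiter="\t", lineterminator="\n")
    p_writer.writerow(["entity:participant_id"])
    for pid in participants:
        p_writer.writerow([pid])

    # Sample file: each row names the participant it belongs to and
    # points at the FASTQ that was uploaded to the bucket.
    s_buf = io.StringIO()
    s_writer = csv.writer(s_buf, delimiter="\t", lineterminator="\n")
    s_writer.writerow(["entity:sample_id", "participant", "fastq"])
    for i, pid in enumerate(participants, start=1):
        sample_id = f"sample_{i:02d}"
        s_writer.writerow([sample_id, pid, f"{bucket}/{sample_id}.fastq.gz"])

    return p_buf.getvalue(), s_buf.getvalue()


participant_tsv, sample_tsv = build_load_files()
print(participant_tsv.splitlines()[:2])
print(sample_tsv.splitlines()[:2])
```

Save each string to its own `.tsv` file and import the participant file first, since the sample rows refer to participant IDs that must already exist in the data model.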
Once your data has been formally described and uploaded to your workspace, you will be able to see populated rows. Each row is a link that points from the workspace to the actual data that you uploaded to your Google bucket.
The two links I reference also contain more detailed steps and documentation for setting up your data model. Please don't hesitate to reach out with any follow-up questions!
Apologies for the alarming WARNs. This issue is on our radar and being discussed at https://github.com/broadinstitute/gatk/issues/2689. Based on the discussion in the issue ticket, it appears that you can safely ignore these WARNs. Please let us know if the output of this genotyping step is unexpected.
I don't think we have a single document that covers all the differences between WES and WGS setups for germline and somatic variant discovery. This is something we can look into doing in the future.
For now, you could follow the Best Practices for germline and somatic variant discovery; the usage notes in each tool's documentation describe any differences between WES and WGS for that tool.
For any specific questions that the docs do not answer, you can always reach out to us on the forum.
This link should help with the -L option differences between WES and WGS.
Oh, I see, I missed the timeout. That's another issue, but it also has nothing to do with being Owner or not. Someone on the development team will need to debug this, but it likely has to do with having a large amount of data.
@kribio Thanks for reaching out. We will look into this ASAP and get back to you with an answer.
But to answer your question: yes, in theory you can do that, but we don't recommend it. CombineGVCFs should be used only for a small number of samples.
For a large number of samples, we recommend using GenomicsDBImport.
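For illustration, here is one way the GenomicsDBImport command line might be assembled for many GVCFs. This is only a sketch: it assumes a GATK 4.x release in which the workspace argument is spelled `--genomicsdb-workspace-path` (earlier releases may spell it differently, so check the tool doc for your version), and the file names, workspace name, and interval are placeholders.

```python
import shlex


def genomicsdb_import_cmd(gvcfs, workspace, interval):
    """Build the argument list for one GenomicsDBImport run.

    gvcfs: per-sample GVCF paths, each passed with its own -V.
    workspace: output GenomicsDB directory (placeholder name below).
    interval: the interval to import, passed with -L.
    """
    cmd = [
        "gatk", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "-L", interval,
    ]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd


# Hypothetical inputs: 200 per-sample GVCFs and a single interval.
gvcfs = [f"sample_{i:03d}.g.vcf.gz" for i in range(1, 201)]
cmd = genomicsdb_import_cmd(gvcfs, "my_database", "chr20")
print(shlex.join(cmd[:8]), "...")
```

The list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.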
I hope this helps.
MATE_NOT_FOUND errors are commonly encountered when you take a subset of a BAM or when reads have lost their mates because they sit at the edges. FixMateInformation is the way to go with this error. No, this should not affect the quality of the variant calls.
What are all the arguments you see that are not in the tooldoc?