Questions about PublicPairedSingleSampleWf_170412.wdl
I was following the tutorial "(How to) Map and clean up short read sequence data efficiently" when I discovered the link to the complete protocol in WDL. After having run several steps, I have a few questions you may be able to help me with:
1) For the MergeBamAlignment step:
a) Should I be using the uBAM file produced in the FastqToSam step or the one produced in the MarkIlluminaAdapters step?
b) What is the purpose of the option ATTRIBUTES_TO_RETAIN=X0, and what would be different if I used ATTRIBUTES_TO_RETAIN=XS as in the tutorial?
c) Before discovering the WDL script, I had already run all my samples up to the SortAndFixSampleBam step (the one after MarkDuplicates), using the uBAM produced by the FastqToSam step and the option ATTRIBUTES_TO_RETAIN=XS for MergeBamAlignment. Would this cause any unwanted or unexpected effects in the remaining portion of the WDL pipeline? Can I continue as is, or should I fix the files? If so, how?
2) I am very interested in the Identity Validation of the samples for Quality Control as you mentioned here.
a) Could you tell me which platform/kit you use for the genotyping array for Fingerprinting?
b) The CheckContamination task, which tests for cross-sample contamination, points to the file WholeGenomeShotgunContam.vcf. I was unable to find this file in the GATK bundle or work out how it is produced. Could you give me a hint on that?
3) When checking the bwa mem options used, I noticed the option "bwa mem -K 100000000", for which I was unable to find any documentation (online or from the command itself). Could you tell me what it is for?
Thank you very much in advance for any help you could provide me.