Dear All, I am very new to the analysis of NGS data.
I would like to merge the data for sample 1029 from HGDP (http://cdna.eva.mpg.de/denisova/VCF/human/HGDP01029.hg19_1000g.12.mod.vcf.gz) with the SAN sample from Schuster et al. 2010 (ftp://ftp.bx.psu.edu/data/bushman/hg18/bam/KB1illumChr12.bam).
If I understood correctly, I should call variants from the BAM file and then merge the result with the VCF. Is that correct? Could you kindly suggest the best way to do this, in your opinion? And at what point should I convert my files to the same reference sequence?
In addition, I am looking at http://gatkforums.broadinstitute.org/discussion/1186/best-practice-variant-detection-with-the-gatk-v4-for-release-2-0 and trying to run variant detection on the example file NA12878. I have some questions: where can I find the MarkDuplicates tool? Should I invoke it with just the -T argument, or do I need to install it?
I am really sorry; I am trying to understand GATK, but it is not really intuitive, so if you have any tips or recommendations, please let me know.
Geraldine_VdAuwera
Hi there,
Unfortunately we don't have the resources available to help you step by step through your analysis, so I'm transferring your post to the "Ask the Community" section. Hopefully someone in the GATK user community will be able to give you some advice on how to do what you want.
Also, note that MarkDuplicates is not a GATK tool, it is a Picard tool, so you will need to download and install it separately. You can find the program easily by doing a Google search. Be sure to read their documentation about how to use it.
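Since MarkDuplicates runs as its own Java program rather than through the GATK engine, the -T argument does not apply to it. As a rough illustration only, here is a minimal Python sketch that assembles a Picard command line. It assumes a release that ships a single picard.jar taking the tool name as its first argument, with I=, O= and M= for the input, output and metrics files; older releases package each tool as its own jar, so check the documentation for the version you install. The helper name and all file names below are hypothetical placeholders.

```python
from typing import List


def mark_duplicates_command(
    picard_jar: str, input_bam: str, output_bam: str, metrics_file: str
) -> List[str]:
    """Build (but do not run) the argument list for a Picard MarkDuplicates call.

    Assumption: a single-jar Picard release invoked as
    `java -jar picard.jar MarkDuplicates I=... O=... M=...`.
    """
    return [
        "java", "-jar", picard_jar,
        "MarkDuplicates",
        f"I={input_bam}",      # coordinate-sorted input BAM
        f"O={output_bam}",     # output BAM with duplicates flagged
        f"M={metrics_file}",   # duplication metrics report
    ]


# Hypothetical file names, for illustration only.
cmd = mark_duplicates_command(
    "picard.jar", "NA12878.bam", "NA12878.dedup.bam", "NA12878.dup_metrics.txt"
)
print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once Java and Picard are installed; building it separately makes it easy to inspect the command before running it.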
Good luck!
Answers
Thank you. I hope to solve this problem, since I have some files in VCF format and one in BAM format.
Thank you for your help!