Question regarding truth set and VQSR
I am trying to build a truth set of SNPs that I eventually want to use in VQSR. Below is my current plan; could someone please tell me whether this is the right way to proceed?

I have 18 BAM files. I am thinking of calling SNPs on 9 of them with both GATK and Samtools. The 9 VCF files from each caller would then be merged using GATK (JointGenotyping) and VCFmerge, so that there is one merged file for the GATK calls and one for the Samtools calls. These two files would then be intersected with BEDtools to get the SNPs common to both. This intersection would be the "truth set" that I plan to use in VQSR.

The remaining 9 BAM files, which are not used to build the truth set, would be used as the training set.

Has anyone built a truth set this way, or implemented a similar strategy? Any help is appreciated. Thanks.
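
For concreteness, the intersection step in the plan above could be sketched roughly as follows. This is only an illustration in plain Python, not BEDtools or GATK itself: the function name `snp_keys` and the tiny inline records are made up for the example, and it assumes uncompressed, tab-separated VCF lines with one ALT allele per record. It matches calls on chromosome, position, REF and ALT, so two callers reporting different alleles at the same site are not counted as agreeing.

```python
def snp_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys for biallelic SNPs from VCF text lines.

    Hypothetical helper for illustration; assumes tab-separated VCF records.
    """
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and header lines ("##..." metadata and "#CHROM").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # Keep only single-base REF and single-base A/C/G/T ALT (drops indels,
        # multi-allelic records, "." and "*" alleles).
        if len(ref) == 1 and len(alt) == 1 and alt in "ACGT":
            keys.add((chrom, pos, ref, alt))
    return keys


# Made-up example records standing in for the two merged call sets.
gatk_calls = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t45\tPASS\t.",
    "chr1\t300\t.\tG\tGA\t60\tPASS\t.",  # indel, ignored
]
samtools_calls = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t48\t.\t.",
    "chr1\t200\t.\tC\tA\t30\t.\t.",  # same site, different ALT allele
    "chr1\t400\t.\tT\tC\t52\t.\t.",
]

# SNPs reported identically by both callers: the candidate "truth set".
shared = snp_keys(gatk_calls) & snp_keys(samtools_calls)
print(sorted(shared))
```

With real data the same idea applies to the two merged VCF files read line by line; whether concordance between two callers is a strong enough criterion for a VQSR truth resource is exactly the open question in this post.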