(MuTect2) Combining two COSMIC VCFs

I just got hold of CosmicCodingMuts.vcf and CosmicNonCodingVariants.vcf, which I would like to use with Mutect2. They were built against the GRCh37 assembly, and it seems that I need to re-sort them against my reference genome (ucsc.hg19.fasta.dict).

I found a previous post about this issue and used the following commands:

grep "^#" CosmicCodingMuts_v64_02042013_noLimit.vcf > VCF_Header
grep -v "^#" CosmicCodingMuts_v64_02042013_noLimit.vcf > Coding.clean
grep -v "^#" CosmicNonCodingVariants_v64_02042013_noLimit.vcf > NonCoding.clean
cat Coding.clean NonCoding.clean | sort -gk 2,2 | awk '{print "chr"$0}' | perl sortByRef.pl --k 1 - ucsc.hg19.fasta.fai > Cosmic.hg19
cat VCF_Header Cosmic.hg19 > Cosmic.hg19.vcf

But I end up with a VCF that contains only the header and no records. Any input would be greatly appreciated. Thank you.
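
For comparison, here is a minimal sketch of the same merge and re-sort done with only cat, awk, sort and cut, so that sortByRef.pl can be ruled in or out as the step that drops the records. The function name resort_vcf_body is hypothetical, and the sketch assumes tab-separated VCF records, a non-empty .fai whose first column is the contig name, and that GRCh37 "MT" should map to hg19 "chrM". Records on contigs missing from the .fai are dropped silently, so an empty result here would point to a contig-naming mismatch rather than a sorting problem.

```shell
# resort_vcf_body FAI VCF [VCF...]
# Hypothetical helper: prints the non-header records of the given VCFs,
# renamed to hg19-style contigs and ordered by contig rank in FAI, then by POS.
resort_vcf_body() {
    fai=$1; shift
    cat "$@" | grep -v '^#' |
    awk -F'\t' -v OFS='\t' '
        NR == FNR { rank[$1] = FNR; next }        # first input: the .fai index
        {
            $1 = ($1 == "MT") ? "chrM" : "chr" $1  # assumption: MT maps to chrM
            if ($1 in rank) print rank[$1], $0     # unknown contigs are skipped
        }' "$fai" - |
    sort -t "$(printf '\t')" -k1,1n -k3,3n |       # contig rank, then position
    cut -f2-                                       # drop the helper rank column
}

# Usage, with the file names from the post:
# resort_vcf_body ucsc.hg19.fasta.fai \
#     CosmicCodingMuts_v64_02042013_noLimit.vcf \
#     CosmicNonCodingVariants_v64_02042013_noLimit.vcf > Cosmic.hg19
# cat VCF_Header Cosmic.hg19 > Cosmic.hg19.vcf
```

Running `wc -l` on Coding.clean, NonCoding.clean and Cosmic.hg19 after each stage shows where the records disappear.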
