Naming SNPs

Can a GATK tool automatically name detected variants, i.e. assign them a unique identifier within user-specified parameters?

Answers

  • Will_Gilks, University of Sussex, UK, Member ✭✭

    Naming variants in a vcf file

    My bash solution is below. I would be interested if anyone has a better way. Naming variants by position is not necessarily a great idea, as positions can change between genome build versions. An alternative is to generate a list of unique random numbers, e.g. 8 digits long, with one number per variant.
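As a rough sketch of the random-number alternative (this is separate from the position-based script that follows): GNU `shuf` samples without replacement, so it can produce a list of distinct 8-digit numbers. The function name `random_ids` is hypothetical, and the sketch assumes GNU coreutils is installed.

```shell
# Hypothetical helper (not part of GATK): print N unique random 8-digit
# identifiers, one per line. Assumes GNU coreutils `shuf` is available.
random_ids() {
    n="$1"
    # -i gives the integer range to sample from; -n limits the count.
    # shuf samples without replacement, so no identifier is repeated.
    shuf -i 10000000-99999999 -n "$n"
}
```

Something like `random_ids "$(grep -vc '^#' raw.vcf)" > names_fixed.txt` would then give one identifier per variant row, which could be used in place of the position-based names.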

    # separate variant and header rows
    grep -v "^#" raw.vcf > variant_rows.vcf
    grep "^#" raw.vcf > header_rows.vcf
    # make names in the format chr_position_alternate.allele, or anything unique of your choice
    awk '{print $1"_"$2"_"$5}' variant_rows.vcf > names_raw.txt
    # check for unique identifiers: this prints any duplicated names (uniq -d needs sorted input).
    # I couldn't work this into the script; it needs to generate an error if entries are non-unique.
    sort names_raw.txt | uniq -d
    # replace commas in variant names with underscores
    sed -e "s/,/_/g" names_raw.txt > names_fixed.txt
    # replace the blank names in field 3 of the headless vcf with the new names
    awk 'FNR==NR{a[NR]=$1;next}{$3=a[FNR]}1' OFS='\t' names_fixed.txt variant_rows.vcf > with_names.vcf
    # put the head back on
    cat header_rows.vcf with_names.vcf > final.vcf
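One possible way to fold the uniqueness check into the script is to do everything in a single awk pass. The following is only a sketch under the same assumptions as the steps above (tab-delimited VCF, names built as chr_position_alt); the function name `name_variants` is hypothetical.

```shell
# Hypothetical helper: name_variants IN.vcf > OUT.vcf
# Fills column 3 (ID) with chr_pos_alt, replacing commas in ALT with underscores.
# Exits with a non-zero status if two records would receive the same name.
name_variants() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }              # header lines pass through unchanged
         {
             id = $1 "_" $2 "_" $5
             gsub(/,/, "_", id)            # e.g. ALT of G,T becomes G_T
             if (id in seen) {
                 print "duplicate variant name: " id > "/dev/stderr"
                 exit 1
             }
             seen[id] = 1
             $3 = id
             print
         }' "$1"
}
```

Usage might look like `name_variants raw.vcf > final.vcf || rm -f final.vcf`. Records before the first duplicate are already written when the error is raised, so the partial output should be discarded on failure.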
