Attention:
The frontline support team will be unavailable to answer questions on April 15th and 17th 2019. We will be back soon after. Thank you for your patience and we apologize for any inconvenience!

JEXL method to get the current sample name from single sample VCF file

simono101simono101 London, UKMember

I am experimenting with JEXL expressions in order to do some hard filtering on variants. I wonder if there is a method to tell the filter expression to operate on the current sample (assuming here a single sample VCF file) without knowing the sample name a priori e.g.

vc.getGenotype("Sample1").isHet()

Works just fine to sample heterozygous positions from a VCF where I know the sample will be called "Sample1", but can I refer to a sample name programmatically, e.g. something like: vc.getGenotype( sample() ).isHet()

Sorry if this is a really dumb question. (BTW I realise you could use a genotype_filter e.g. --genotypeFilterExpression "isHet == 1" to do the same thing, but I want to annotate the VCF in the FORMAT/FILTER field, rather than the INFO field for downstream variant selection operations.

Best Answer

Answers

  • simono101simono101 London, UKMember

    Hi @Geraldine_VdAuwera‌ thanks for replying. My solution was to use bcftools query in a script to get the sample name into a variable and then pass this in to the filter expressions, e.g.

    #!/bin/zsh
    sample=$(bcftools query -l my.vcf)  
    java -jar ~/Downloads/GenomeAnalysisTK.jar \
        -T VariantFiltration \
        -R ref.fasta \
        --variant my.vcf \
        --filterExpression "vc.getGenotype(\"$sample\").isHet()" \
        --filterName "HET_POS" \
        -o my.flt.vcf`
    
Sign In or Register to comment.