JEXL method to get the current sample name from single sample VCF file

simono101simono101 London, UKMember

I am experimenting with JEXL expressions in order to do some hard filtering on variants. I wonder if there is a method to tell the filter expression to operate on the current sample (assuming here a single sample VCF file) without knowing the sample name a priori e.g.

vc.getGenotype("Sample1").isHet()

Works just fine to sample heterozygous positions from a VCF where I know the sample will be called "Sample1", but can I refer to a sample name programmatically, e.g. something like: vc.getGenotype( sample() ).isHet()

Sorry if this is a really dumb question. (BTW I realise you could use a genotype_filter e.g. --genotypeFilterExpression "isHet == 1" to do the same thing, but I want to annotate the VCF in the FORMAT/FILTER field, rather than the INFO field for downstream variant selection operations.

Best Answer

Answers

  • simono101simono101 London, UKMember

    Hi @Geraldine_VdAuwera‌ thanks for replying. My solution was to use bcftools query in a script to get the sample name into a variable and then pass this in to the filter expressions, e.g.

    #!/bin/zsh
    sample=$(bcftools query -l my.vcf)  
    java -jar ~/Downloads/GenomeAnalysisTK.jar \
        -T VariantFiltration \
        -R ref.fasta \
        --variant my.vcf \
        --filterExpression "vc.getGenotype(\"$sample\").isHet()" \
        --filterName "HET_POS" \
        -o my.flt.vcf`
    
Sign In or Register to comment.