To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at https://software.broadinstitute.org/firecloud/documentation/freecredits

JEXL method to get the current sample name from single sample VCF file

I am experimenting with JEXL expressions in order to do some hard filtering on variants. I wonder if there is a method to tell the filter expression to operate on the current sample (assuming here a single sample VCF file) without knowing the sample name a priori e.g.

vc.getGenotype("Sample1").isHet()

Works just fine to sample heterozygous positions from a VCF where I know the sample will be called "Sample1", but can I refer to a sample name programmatically, e.g. something like: vc.getGenotype( sample() ).isHet()

Sorry if this is a really dumb question. (BTW I realise you could use a genotype_filter e.g. --genotypeFilterExpression "isHet == 1" to do the same thing, but I want to annotate the VCF in the FORMAT/FILTER field, rather than the INFO field for downstream variant selection operations.

Best Answer

Answers

  • simono101simono101 London, UKMember

    Hi @Geraldine_VdAuwera‌ thanks for replying. My solution was to use bcftools query in a script to get the sample name into a variable and then pass this in to the filter expressions, e.g.

    #!/bin/zsh
    sample=$(bcftools query -l my.vcf)  
    java -jar ~/Downloads/GenomeAnalysisTK.jar \
        -T VariantFiltration \
        -R ref.fasta \
        --variant my.vcf \
        --filterExpression "vc.getGenotype(\"$sample\").isHet()" \
        --filterName "HET_POS" \
        -o my.flt.vcf`
    
Sign In or Register to comment.