We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

SelectVariants/getHomVarCount JEXL question

estif74estif74 Saint Paul, MN, USAMember

Hi there,

Had a JEXL syntax question that I was hoping to get some thoughts on. Let's say that I have a VCF file with 10 samples in it, S1 through S10. I would like to select the sites where among 5 specific samples, 3 are homozygous variant. So something like:

select ' vc.getHomVarCount(S1, S2, S3, S4, S5) >= 10 '

However this syntax doesn't work unfortunately. Is there a way to specify how to do something like this using vc.GetHomVarCount()?

Thanks as always!



  • estif74estif74 Saint Paul, MN, USAMember

    sorry my syntax example should have read:

    select ' vc.getHomVarCount(S1, S2, S3, S4, S5) >= 3 '

    Not enough coffee yet :-)

  • estif74estif74 Saint Paul, MN, USAMember

    Played around a bit with the subsetToSamples and sampleNames functions after reading through the variant context documentation, but couldn't get this to work. I'm guessing there may be an elegant way to accomplish this using these functions, but I think I figured out a bit of a workaround. Here's what I did:

    1. Run SelectVariants but subset the original VCF to include only the samples I'm interested in, ie -sn S1 \ -sn S2 \ etc. as well as a select statement -select 'vc.getHomVarCount()>=3'
    2. Then re-run SelectVariants using the concordance function to find the sites in the original VCF that are concordant with the output from the VCF generated in #1. The output in this VCF file will have all the samples in it: -V original.vcf \ -conc output1.vcf \ -o new.vcf

    Would still be curious about a more elegant solution from more of a learning perspective on how to better use JEXL.


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    If anyone has an elegant solution to do this it would be @gauthier :)

  • adominguesadomingues GermanyMember

    @Geraldine_VdAuwera, I am trying to achieve something very similar to what @estif74 did:

    • multisample vcf with 8 samples
    • select only homozygous variants that exist in at least 3 samples

    Using @estif74 solution I did:
    java -jar ~/imb-kettinggr/common_bin/GATK/3.6/GenomeAnalysisTK.jar \
    -T SelectVariants \
    --variant AllSamples.vcf \
    --maxNOCALLnumber 3 \
    --selectTypeToInclude SNP \
    -select "(QUAL > 30.0 || AD > 20 ) && vc.getHomVarCount() >=3 " \
    -o AllSamples.snps.filt.vcf

    However there are still variants left that are homozygous in less than 3 samples:
    chr10 283484 . T C 97.10 QD AC=6;AF=0.500;AN=12;BaseQRankSum=2.210;ClippingRankSum=-1.029;DP=294;ExcessHet=3.0103;FS=7.510;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.328;QD=0.74;ReadPosRankSum=-1.335;SF=1f,2,3f,5f,6f,7;SOR=2.270 GT:GQ:DP:PL:AD . 0/1:54:35:54,0,741:29,6 0/1:99:54:168,0,1225:41,13 0/1:68:44:68,0,921:37,7 . 0/1:78:54:78,0,1191:45,9 0/1:80:35:80,0,667:28,7 0/1:99:37:304,0,523:24,13

    Am I missing something?


  • SheilaSheila Broad InstituteMember, Broadie ✭✭✭✭✭

    Hi António,

    I just tried using -select 'vc.getHomVarCount() >= 2' and it works for me. I only get sites where at least 2 samples are hom-var.

    Can you try running -select '(QUAL > 30.0 || AD > 20 )' -select 'vc.getHomVarCount() >=3' so the two conditions are on different lines? I'm wondering if somehow the second statement is getting lost.


  • adominguesadomingues GermanyMember

    Hi @Sheila,
    sorry for belated reply. It is working now. I think it was indeed the way I constructing the query that lead to the unexpected results. Great tip to split it into multiple -select chunks.


Sign In or Register to comment.