# Question about criteria selecting variants

DenverMember

As I said in my last post, I have been splitting my 11 samples out of the recalibrated VCF file. I now have a different question: how do I set up criteria to select variants from this 11-sample combined VCF? My criteria would be DP >= 20 and number of ALT reads >= 10. I know AD includes both REF and ALT reads, but I was wondering if there's any way to select by the number of ALT reads together with DP >= 20.

Should I use the "-T SelectVariants" or "-T VariantFiltration"? I am using GATK 2.5 on a remote Mac OS X server by the way.

It sounds like what you want is to build a complex JEXL expression. See the doc here for more details on how they work.

• DenverMember
edited August 2013

@Geraldine_VdAuwera said:
It sounds like what you want is to build a complex JEXL expression. See the doc here for more details on how they work.

Thanks Geraldine. I actually just started reading the complex JEXL expression documentation, but I found it a little bit sketchy and will read it in more detail in a second. But here's my question with a concrete example:

In my combined VCF file, the Format Column is followed by 11 columns, each of which has one sample.

The format column content is: GT:AD:GQ:PL
The sample column content is like this 0/0:3,0:9:0,9,90 or ./.

From my understanding, the 3,0 above is the AD (3 reads for the REF allele and 0 for the ALT allele), but I don't see DP for each sample. I did include "-an DP" when I ran the VariantRecalibrator. In this case, how do I filter out those with AD (REF + ALT) < 20 or DP < 20?

Should I use SelectVariants or VariantFiltration? I don't see the difference between the two.
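To make the FORMAT layout described above concrete, here is a minimal Python sketch that reads the AD values out of one sample column. It is not part of GATK; the function name `allele_depths` is hypothetical, and the sketch assumes AD is a comma-separated list with the REF count first, as in the example `0/0:3,0:9:0,9,90`.

```python
from typing import List, Optional


def allele_depths(format_col: str, sample_col: str) -> Optional[List[int]]:
    """Return [ref_reads, alt_reads, ...] for one sample, or None if AD is absent."""
    keys = format_col.split(":")
    values = sample_col.split(":")
    # Pair each FORMAT key with its value; a no-call such as "./." only has GT.
    fields = dict(zip(keys, values))
    ad = fields.get("AD")
    if ad is None or ad == ".":
        return None
    counts = ad.split(",")
    if "." in counts:
        return None
    return [int(c) for c in counts]


print(allele_depths("GT:AD:GQ:PL", "0/0:3,0:9:0,9,90"))  # [3, 0]
```

With no per-sample DP in the FORMAT column, the sum of the AD values is the closest available stand-in for depth, which is the "AD (REF + ALT)" quantity mentioned in the post.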

Thanks again

• DenverMember

After looking through the Variant Context, I think I should use "-select vc.getAlleles().size()", but the question is how do I tell VariantContext to look at the ALT allele only?

The per-sample DP should be output automatically, so it's odd you're not seeing it. What caller did you use to call the variants? UG or HC?

The major difference between SelectVariants and VariantFiltration is that SelectVariants will output only the variants that pass the criteria you set, while VariantFiltration outputs all the variants, but with annotations in the filter fields about whether they passed or failed the criteria.
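The distinction can be illustrated with a small Python sketch. This is only a toy model of the behavior described above, not how either tool is implemented; the record dictionaries, the `passes` predicate, and the filter name `LowDP` are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def select_records(records: List[Record], passes: Callable[[Record], bool]) -> List[Record]:
    """SelectVariants-like behavior: keep only the records that pass."""
    return [r for r in records if passes(r)]


def flag_records(records: List[Record], passes: Callable[[Record], bool],
                 filter_name: str) -> List[Record]:
    """VariantFiltration-like behavior: keep every record, annotate FILTER."""
    flagged = []
    for r in records:
        copy = dict(r)  # leave the input record untouched
        copy["FILTER"] = "PASS" if passes(r) else filter_name
        flagged.append(copy)
    return flagged


records = [{"POS": 100, "DP": 35}, {"POS": 200, "DP": 12}]
deep_enough = lambda r: r["DP"] >= 20

print(select_records(records, deep_enough))
print(flag_records(records, deep_enough, "LowDP"))
```

One simplification to keep in mind: in VariantFiltration the filter expression describes the records to flag as failing, whereas this sketch uses a single "passes" predicate for both functions so the two outputs are easy to compare.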

• DenverMember

I used HC to call the variants, with GATK 2.5.

Hmm, I can't remember if we had an issue with DP in that version of HC. In any case you should be able to get DP added using VariantAnnotator, or have you tried that already?

• DenverMember

Yes, I used VariantAnnotator to annotate the VCF file (containing 11 samples), but the individual sample columns still do NOT have a DP value. Instead there's a DP in the "INFO" column, but that value is apparently a summation of all 11 samples' DP in each row. Is there any way to put DP in the individual samples?

In VariantAnnotator, I used "-G StandardAnnotation"

• Member, Dev

I can confirm that HaplotypeCaller v2.5.2 does not output FORMAT-level DP annotations. I'm not a big fan of doing hard-filtering on depth or depth of alternate reads, but it's certainly a common filter to use. What I don't understand is how you want to select a variant based on the 11 samples' information - do you want to see your thresholds met in 1 sample? All 11? Something in between? Either way, I don't think any of the GATK tools work on the FORMAT fields like you want
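The "1 sample? All 11?" question can be made explicit outside of GATK. The following is a rough Python sketch, not a GATK feature: it counts how many samples on one VCF data line meet the thresholds, so a caller can then decide how many are required. The function names are hypothetical, and two assumptions are baked in: when FORMAT has no DP, the sum of the AD values is used as depth, and "ALT reads" means the best-supported single ALT allele.

```python
def sample_meets(format_col: str, sample_col: str,
                 min_depth: int = 20, min_alt: int = 10) -> bool:
    """True if one sample column satisfies both depth and ALT-read thresholds."""
    fields = dict(zip(format_col.split(":"), sample_col.split(":")))
    ad = fields.get("AD", ".")
    parts = ad.split(",")
    # No-calls ("./.") and missing AD values cannot satisfy the thresholds.
    if len(parts) < 2 or "." in parts:
        return False
    counts = [int(p) for p in parts]
    dp = fields.get("DP", ".")
    # Assumption: fall back to REF + ALT reads when FORMAT has no DP.
    depth = int(dp) if dp != "." else sum(counts)
    # Assumption: use the best-supported ALT allele at multi-allelic sites.
    alt_reads = max(counts[1:])
    return depth >= min_depth and alt_reads >= min_alt


def count_passing_samples(vcf_line: str, min_depth: int = 20, min_alt: int = 10) -> int:
    """Count the samples on one tab-separated VCF data line that meet the thresholds."""
    cols = vcf_line.rstrip("\n").split("\t")
    format_col, samples = cols[8], cols[9:]
    return sum(sample_meets(format_col, s, min_depth, min_alt) for s in samples)


line = "\t".join([
    "chr1", "12345", ".", "A", "G", "50", "PASS", "DP=30",
    "GT:AD:GQ:PL", "0/1:12,15:99:300,0,250", "0/0:3,0:9:0,9,90", "./.",
])
print(count_passing_samples(line))  # 1
```

A site could then be kept when the count is at least 1, equal to the number of samples, or anything in between, depending on the answer to the question above.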

• DenverMember

@pdexheimer said:
I can confirm that HaplotypeCaller v2.5.2 does not output FORMAT-level DP annotations. I'm not a big fan of doing hard-filtering on depth or depth of alternate reads, but it's certainly a common filter to use. What I don't understand is how you want to select a variant based on the 11 samples' information - do you want to see your thresholds met in 1 sample? All 11? Something in between? Either way, I don't think any of the GATK tools work on the FORMAT fields like you want

I am new to my lab and was told that the tradition in filtering the VCF is to apply "DP > 30 && number_ALT_Allele > 10", and they never used GATK before. Actually, I do not think I have to stick to the tradition. What's the common way of filtration/selection? Maybe I should dive into the documentation instead of the forum. Thanks

Diving into the documentation is a great way to start

I'll look into the DP issue nevertheless, to make sure the latest versions do output it as they should.

And I should add, after looking it up I found that VariantAnnotator will not annotate sample-level DP; you can only get it through HC or UG. So you will need to re-call your variants with the latest version of GATK to get the sample DP, if you want to use that.

• DenverMember

Thanks Geraldine. It's not possible to run the latest version of GATK (2.6) at this moment, because my server runs Java 1.6 and I have no way of upgrading it to Java 1.7.

If I can re-call with HC in the future, what should I add to the following command if I want sample-level DP annotation? Just curious:

java -Xmx10g -Djava.awt.headless=true -jar $CLASSPATH/GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R ./GATK_ref/hg19.fasta \
-I ./list_feeder/compressedbam.list \
-L ./GATK_ref/all_captured_human_exomes.bed \
-log ./GATK/VQSR2/HaplotypeCaller20130808.log \
-o ./GATK/VQSR2/output.raw.snps_indels.vcf

From 2.6 and up, HaplotypeCaller automatically emits sample DP, so there is no need to specify it on the command line.

• MilanMember

Is there any way I can use the VariantFiltration walker to hard-filter the VCF file with a JEXL expression? I am interested in filtering my variants manually using AD (allelic depth) and DP (the depth passing the quality filter). 70% of the bases in my exome data have been read over 15 times. So after variant recalibration I want to filter my variants on DP >= 20 and AD >= 20. I am not sure whether the AD cutoff will be sufficient, but with DP >= 20, all my mutations that have been read over 20 times by reads passing the quality filter will be selected. I am not interested in prioritizing my mutations on the basis of the functional and structural impact scores given by Annovar, so I want to filter on these DP and AD criteria. I am aware that the VariantFiltration walker will work with DP, but I am not sure whether it works with AD. I would like some suggestions. Any inputs?

Hi Vivek,

Yes, you can do that with VariantFiltration. Please read the documentation on using VariantFiltration and JEXL expressions; there are some examples that should be helpful.