I finished my RNA-seq variant calling using the GATK pipeline described in your workflow.
But I realized that all the Allele Frequency (AF) values in the VCF files are 0.5 or 1.0.
Is it normal?
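If you have one diploid sample, yes. Allele frequency = count of the ALT allele in the cohort / number of called chromosomes in the cohort, so with a single diploid sample the only possible values are 0.5 (heterozygous) and 1.0 (homozygous variant).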
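As a rough illustration of that formula (not GATK's own code), here is a small Python sketch that derives the genotype-based AF from a list of GT strings. The function name `genotype_based_af` is made up for this example.

```python
import re

def genotype_based_af(genotypes, alt_index=1):
    """Genotype-based AF: ALT allele count / number of called chromosomes.

    `genotypes` is a list of GT strings such as "0/1" or "1|1".
    No-call alleles (".") are not counted as called chromosomes.
    """
    alleles = []
    for gt in genotypes:
        # Split on either the unphased ("/") or phased ("|") separator.
        alleles.extend(a for a in re.split(r"[/|]", gt) if a != ".")
    if not alleles:
        return None  # nothing was called, so AF is undefined
    return alleles.count(str(alt_index)) / len(alleles)

print(genotype_based_af(["0/1"]))  # single het diploid sample
print(genotype_based_af(["1/1"]))  # single hom-var diploid sample
```

With one diploid sample there are only two called chromosomes, which is why only 0.5 and 1.0 show up.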
To add to @pdexheimer's answer, what GATK reports as AF is the "ideal" frequency based on the assigned genotypes, not the actual observed frequency. To get the observed frequency, you need to calculate it from the AD values.
Hi @Geraldine_VdAuwera,
Is there a GATK tool to calculate the 'actual observed frequency'?
The following is an example record from my VCF file:
1 897325 rs4970441 G C 802.77 PASS AC=1;AF=0.500;AN=2 GT:AD:DP:GQ:PL 0/1:63,41:104:99:831,0,1394
For the above record, I could programmatically calculate the 'actual observed frequency' from the AD values as: 41/(63+41) = 0.39
Is there a GATK tool to calculate this for me?
I do not want to use the tool 'VariantsToTable', as it outputs a tab-delimited file and not a VCF file.
An ideal tool in my case would read a VCF file, calculate the 'actual observed frequency', and then output a new VCF file with a new AF or VAF tag.
@Geraldine_VdAuwera, thank you for the response.
Thanks, I will write a small custom module to perform this calculation.
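For anyone wanting a starting point, a minimal sketch of such a module in plain Python (standard library only) might look like the following. This is not a GATK tool. It assumes an uncompressed, tab-delimited VCF with an AD entry in the FORMAT column. The `VAF` tag name, its header line, and the function names are assumptions made for this example.

```python
# Hypothetical header line for the new per-sample tag (not defined by GATK).
VAF_HEADER = ('##FORMAT=<ID=VAF,Number=A,Type=Float,'
              'Description="Observed ALT read fraction computed from AD">')

def vaf_from_ad(ad_field):
    """Turn an AD string such as '63,41' into ALT fractions such as '0.394'."""
    try:
        depths = [int(x) for x in ad_field.split(",")]
    except ValueError:
        return "."  # missing or partial AD, e.g. '.'
    total = sum(depths)
    if total == 0 or len(depths) < 2:
        return "."
    # One value per ALT allele: ALT depth / total depth over all alleles.
    return ",".join(f"{d / total:.3f}" for d in depths[1:])

def annotate_line(line):
    """Return one VCF line with VAF appended to FORMAT and every sample."""
    if line.startswith("#CHROM"):
        # Declare the new tag just before the column header line.
        return VAF_HEADER + "\n" + line
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # no FORMAT/sample columns to annotate
    keys = fields[8].split(":")
    if "AD" not in keys:
        return line
    ad_idx = keys.index("AD")
    fields[8] += ":VAF"
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        # VCF allows trailing fields to be dropped; pad so VAF lines up.
        values += ["."] * (len(keys) - len(values))
        values.append(vaf_from_ad(values[ad_idx]))
        fields[i] = ":".join(values)
    return "\t".join(fields) + "\n"

def annotate_stream(src, dst):
    """Copy VCF lines from `src` to `dst`, adding the VAF tag on the way."""
    for line in src:
        dst.write(annotate_line(line))
```

Calling `annotate_stream(sys.stdin, sys.stdout)` (after `import sys`) would let it sit in a shell pipeline. For the record above, the sample column would gain a trailing `0.394`, matching the 41/(63+41) calculation.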
Hi, did anyone here get the 'actual observed frequency' referred to above? I'm surprised it's not part of the default outputs. Can it now be added with a command-line option or another tool? I am using the latest version of GATK. Thank you.
Perhaps this thread will help as well.