How can VQSR be applied to a small data set?

ying_sheng_1 Posts: 48 Member
edited September 2012 in Ask the GATK team

We have targeted sequencing data covering only two genes, which we analyzed with the GATK pipeline. Since the data set is too small for VQSR, we applied hard filtering to both SNPs and indels. We also sequenced the same sample by whole exome sequencing and filtered the SNPs with VQSR. The quality of the VQSR results is much better than that of the hard-filtered results. For cost reasons, we need to develop an analysis pipeline for targeted sequencing. Is it OK to merge the targeted sequencing VCF with an exome sequencing VCF and then run VQSR on the combined file? My worry is that true sites in the targeted sequencing data may have different features from true sites in the whole exome sequencing data.


Best Answer

  • rpoplin Posts: 122 GATK Developer mod
    Answer ✓

    Hi there,

    This is a great idea and I would imagine that it might work well. It really depends, as you say, on how the characteristics of the whole exome sequencing data differ from those of the small targeted sequencing data. One thing you can do to convince yourself that this is working is to plot the VCF annotation distributions for the two data types. Hopefully the distributions will lie on top of each other.
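
The comparison suggested above could be sketched roughly as follows. This is only an illustration, not part of GATK: the function names, the output file name, and the choice of annotations (QD, FS, MQ are common INFO annotations, but use whichever ones you feed to VQSR) are all assumptions. It reads numeric INFO annotations from two VCF files with the Python standard library and, if matplotlib is installed, overlays their histograms.

```python
import gzip
from collections import defaultdict
from statistics import median


def read_info_annotations(path, keys):
    """Collect numeric INFO annotations from a VCF (plain text or .gz).

    Returns a dict mapping each annotation key to a list of floats.
    For comma-separated values, only the first value is kept.
    """
    opener = gzip.open if path.endswith(".gz") else open
    values = defaultdict(list)
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8:
                continue  # not a complete VCF record
            for entry in fields[7].split(";"):
                key, sep, val = entry.partition("=")
                if sep and key in keys:
                    try:
                        values[key].append(float(val.split(",")[0]))
                    except ValueError:
                        pass  # non-numeric value, ignore
    return values


def summarize(values):
    """Return {key: (count, median)} as a quick numeric check."""
    return {k: (len(v), median(v)) for k, v in values.items() if v}


def plot_overlay(first, second, keys, labels, out_png):
    """Overlay normalized histograms of each annotation for two call sets."""
    import matplotlib
    matplotlib.use("Agg")  # write to file, no display needed
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, len(keys), figsize=(4 * len(keys), 3), squeeze=False)
    for ax, key in zip(axes[0], keys):
        ax.hist(first[key], bins=50, density=True, alpha=0.5, label=labels[0])
        ax.hist(second[key], bins=50, density=True, alpha=0.5, label=labels[1])
        ax.set_title(key)
        ax.legend()
    fig.tight_layout()
    fig.savefig(out_png)


# Hypothetical usage (file names are placeholders):
# keys = ("QD", "FS", "MQ")
# exome = read_info_annotations("exome.vcf", keys)
# target = read_info_annotations("targeted.vcf", keys)
# print(summarize(exome), summarize(target))
# plot_overlay(exome, target, keys, ("exome", "targeted"), "annotation_overlay.png")
```

If the two histograms for an annotation are clearly shifted relative to each other, that is a hint that the merged data may not behave well under a single VQSR model. With only two genes, the targeted call set will contain few sites, so its histogram will be noisy.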


