BQSR for WES data generated by different exome-capture platforms
We have over a thousand WES samples generated with two different exome-capture platforms. Samples were multiplexed and sequenced on Illumina HiSeq, with each lane containing 3-10 samples. Since most samples do not have enough data on their own to run BQSR, we plan to estimate the model parameters on one whole lane and then apply the resulting model separately to each sample. Because two exome-capture platforms were used, we are thinking of specifying both interval files at once (namely, -L interval.kit1 -L interval.kit2) during RealignerTargetCreator, BaseRecalibrator and HaplotypeCaller.
However, we are unsure whether the union or the intersection of the two interval files should be used in our case. Our sample size is large, so we may get useful information from regions unique to each exome-capture platform (as discussed in http://gatkforums.broadinstitute.org/gatk/discussion/4945/joint-genotyping-different-caputre-kits#). But would using the union of the interval files pull in off-target sequence for samples from the other kit and distort the results of BaseRecalibrator? (https://software.broadinstitute.org/gatk/events/slides/1504/GATKwr7-X-2-WGS_vs_WEx.pdf)
Or, in our case, is it better to process the samples in two separate batches (one per exome-capture platform) to generate the gVCF files, and then perform joint genotyping and VQSR on all samples together? For your information, the data will eventually be analyzed jointly with data generated from a separate WGS analysis.
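To make the union vs. intersection question concrete, here is a minimal Python sketch of what each option would cover. It is only an illustration, not how GATK combines -L arguments internally: the coordinates are made up, the intervals are assumed to be BED-style half-open (chrom, start, end) tuples, and the helper names `merge`, `union` and `intersection` are hypothetical.

```python
from typing import List, Tuple

# (chrom, start, end), half-open as in BED files (an assumption for this sketch)
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch on the same chromosome."""
    out: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if out and out[-1][0] == chrom and start <= out[-1][2]:
            out[-1] = (chrom, out[-1][1], max(out[-1][2], end))
        else:
            out.append((chrom, start, end))
    return out


def union(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions targeted by at least one kit."""
    return merge(list(a) + list(b))


def intersection(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions targeted by both kits."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        (ca, sa, ea), (cb, sb, eb) = a[i], b[j]
        if ca != cb:
            # Advance whichever list is behind in chromosome sort order.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(sa, sb), min(ea, eb)
        if lo < hi:
            out.append((ca, lo, hi))
        # Advance the interval that ends first.
        if ea < eb:
            i += 1
        else:
            j += 1
    return out


# Hypothetical targets for the two capture kits.
kit1 = [("chr1", 100, 200), ("chr1", 500, 600)]
kit2 = [("chr1", 150, 250), ("chr2", 10, 20)]

print("union:       ", union(kit1, kit2))
print("intersection:", intersection(kit1, kit2))
```

In this toy example the union keeps chr1:500-600 and chr2:10-20, which are each covered by only one kit, while the intersection keeps only chr1:150-200. That is exactly the trade-off we are asking about.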
Your kind advice is very much appreciated. Thanks in advance!