Hi, I'm using the best practices workflow and I'd like to work out how much of my genome is covered by x number of reads. Should I be using the realigned reads bam files to do this?
Sure. You can use the realigned bam file. You can also do a comparison of the realigned bam file with the original bam file to see if there are any major differences.
GATK has two tools you may be interested in: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_coverage_DepthOfCoverage.php
Will the coverage from the realigned bam file be very different from the coverage in the bamout file, though? I'm worried I'll end up with the wrong idea of how much of the genome is covered by how many reads.
I am confused. What exactly are you trying to compare? In your first post, I thought you were asking about comparing the original bam file to the indel realigned bam file.
No, I just want to see how much of the genome is covered by reads, not in comparison to anything. But I want to make sure I use the right bam file, i.e. the one with the correct alignment of the reads, not one from before the reads get realigned, which I believe HaplotypeCaller might do?
For general coverage analysis purposes, you can use the bam file that you input to HC (or even an earlier one if you want to run your coverage analysis before the GATK preprocessing). The realignment done by HC is very localized and should not have a significant impact at the genome or exome scale. You would only need to worry about it if you wanted extremely detailed coverage information per individual site for diagnostic purposes.
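If it helps, here is a minimal sketch of the "how much of the genome is covered by at least x reads" calculation. It assumes you already have a per-base depth table with three tab-separated columns (contig, position, depth) that includes zero-depth positions, for example the kind of output `samtools depth -a` writes; that input format and the function name `fraction_covered` are assumptions made for illustration, not something from this thread or from GATK.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def fraction_covered(depth_lines: Iterable[str],
                     thresholds: Sequence[int]) -> Dict[int, float]:
    """For each threshold x, return the fraction of listed positions with depth >= x.

    Assumes each line is "contig<TAB>position<TAB>depth" (hypothetical input format).
    Positions missing from the input are not counted, so zero-depth sites
    must be present in the table for the fractions to be genome-wide.
    """
    depth_counts: Counter = Counter()  # depth value -> number of positions
    total = 0
    for line in depth_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        _contig, _pos, depth = line.split("\t")[:3]
        depth_counts[int(depth)] += 1
        total += 1
    if total == 0:
        return {x: 0.0 for x in thresholds}
    return {
        x: sum(n for d, n in depth_counts.items() if d >= x) / total
        for x in thresholds
    }


# Tiny made-up example: four positions with depths 0, 5, 12 and 30.
example = ["chr1\t1\t0", "chr1\t2\t5", "chr1\t3\t12", "chr1\t4\t30"]
result = fraction_covered(example, [1, 10, 20])
print(result)
```

On a real file you would pass the open file handle instead of the example list. Running this on the bam file you feed to HC and on an earlier bam file should give nearly identical fractions at genome scale, in line with the answer above.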