I am planning to analyze somatic variants from whole exome sequencing data. How should I enter the intervals in this case to indicate whole exome/genome?
Intervals indicate the regions in which you would like to attempt to call mutations. They're not required by any means, but they can speed up calling dramatically, since you won't be attempting to call mutations in regions with very low coverage (e.g. 1x) that result from off-target sequencing.
Typically for whole exome sequencing, you'll have a list of "targets" from your vendor. These, perhaps plus some padding, are the intervals you want to use. You may have to do some format conversion from whatever your vendor provides into a format the GATK can understand (see other posts on this forum and the GATK forums). One nice suite of tools for manipulating coordinates in the BED format (which can be used for MuTect) is BEDTools.
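As a rough sketch of the "plus some padding" step, here is one way to pad the targets in a BED file and merge any that end up overlapping. The function names and the default of 50 bp are only illustrative assumptions, not something MuTect or your vendor prescribes, and BEDTools can do the same job. It assumes standard BED coordinates (0-based start, end-exclusive).

```python
def parse_bed(lines):
    """Yield (chrom, start, end) from BED lines, skipping headers and blanks."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        yield chrom, int(start), int(end)


def pad_and_merge(intervals, padding=50):
    """Pad each interval on both sides, then merge overlapping/adjacent ones.

    Note: ends are not clipped to chromosome length here (that would need
    the reference's contig sizes), and chromosomes come out in plain string
    order, which may differ from the order in your reference.
    """
    padded = sorted(
        (chrom, max(0, start - padding), end + padding)
        for chrom, start, end in intervals
    )
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def to_bed_lines(intervals):
    """Format intervals back into tab-separated BED lines."""
    return [f"{chrom}\t{start}\t{end}" for chrom, start, end in intervals]
```

You would read the vendor file with `parse_bed(open("targets.bed"))` (the file name is a placeholder), pass the result through `pad_and_merge`, and write the output of `to_bed_lines` to a new file to use as your intervals.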
Hope that helps
Thanks for your answer! I am assuming it will be alright to give a BED file name to '--intervals' instead of explicitly naming every single interval? (Which is almost untenable for whole exome sequencing)
Yes. As with all GATK tools, you can either specify --intervals chrom:position OR --intervals filename, where filename is one of the supported interval file formats (e.g. BED).
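One thing to watch if you ever convert between the two forms: BED files use a 0-based start with an exclusive end, while interval strings given directly on the command line are 1-based and inclusive. A minimal sketch of the conversion, where the function name is just an illustrative assumption:

```python
def bed_to_interval_string(bed_line):
    """Convert one BED line to a 1-based, inclusive chrom:start-end string."""
    chrom, start, end = bed_line.split()[:3]
    # BED start is 0-based, so add 1; BED end is exclusive, so it already
    # equals the last included base in 1-based coordinates.
    return f"{chrom}:{int(start) + 1}-{int(end)}"


print(bed_to_interval_string("chr1\t99\t200"))
```

In practice you rarely need this, since passing the BED file itself is simpler, but it is handy for spot-checking a single target.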