ContEst: what if I do not have a genotype array for my normal samples?

Hi,
I want to use ContEst to estimate the contamination levels of my patient-matched normal samples, but all my data are WGS data, and I do not have a genotype array for my normal samples.
My command is:

# G01H.recal.bam is about 110G, G01N.recal.bam is about 115G
java -jar \
GenomeAnalysisTK-3.6.jar \
-T ContEst \
-R hg19_complete.fasta \
-I:eval G01H.recal.bam \
-I:genotype G01N.recal.bam \
--popfile hg19_population_stratified_af_hapmap_3.3.vcf \
-isr INTERSECTION \
-population CHB \
-o contamination_results_G01Hbam_Nbam.txt

Result:
name population population_fit contamination confidence_interval_95_width confidence_interval_95_low confidence_interval_95_high sites
META CHB n/a 9.2 2.2 8.2 10.4 37

So my question is: why are there only 37 sites? Does it mean that I have to use a genotype array as the input for the --genotypes parameter? Or is it because the mean coverage of my data is 22X, while ContEst requires homozygous sites with at least 50x coverage?

Then I tried using HaplotypeCaller, SelectVariants and VariantFiltration to create a .vcf file for my normal sample, so that I can run ContEst like this:
java -jar \
GenomeAnalysisTK-3.6.jar \
-T ContEst \
-R hg19_complete.fasta \
-I G01H_chr22.recal.bam \
--genotypes G01N_chr22_filtered_snaps.vcf \
--popfile hg19_population_stratified_af_hapmap_3.3.vcf \
-isr INTERSECTION \
-population CHB \
-o contamination_results_chr22_CHB.txt
Result:
name population population_fit contamination confidence_interval_95_width confidence_interval_95_low confidence_interval_95_high sites
META CHB n/a 0.1 0.1 0.1 0.2 724
When I test with chr22 only, ContEst finds 724 sites, which is more than it found with the whole WGS data above.

So my second question is: can I use a .vcf file created by HaplotypeCaller as the input for the --genotypes parameter?
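
For context, here is a minimal sketch of how the HaplotypeCaller, SelectVariants and VariantFiltration steps mentioned above could be chained in GATK 3.x. The exact commands used for this post are not shown, so the intermediate file names, the chr22 contig name, and the hard-filter expression are assumptions, and the `run` helper is only a hypothetical dry-run wrapper. It illustrates the workflow only and says nothing about whether such a VCF is suitable for --genotypes.

```shell
#!/usr/bin/env bash
# Sketch: build a filtered SNP VCF from the normal BAM (GATK 3.x syntax).
# Assumptions (not from the post): intermediate file names, the "chr22"
# contig name, and the hard-filter expression below.
set -euo pipefail

GATK_JAR="GenomeAnalysisTK-3.6.jar"
REF="hg19_complete.fasta"
NORMAL_BAM="G01N.recal.bam"
INTERVAL="chr22"        # assumed contig name in the reference
PREFIX="G01N_chr22"     # prefix matching the VCF name used in the post

# Hypothetical helper: with DRY_RUN=1 (the default) commands are only printed;
# set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

make_normal_genotypes() {
  # 1. Call variants on the normal sample, restricted to one contig.
  run java -jar "$GATK_JAR" -T HaplotypeCaller -R "$REF" \
      -I "$NORMAL_BAM" -L "$INTERVAL" -o "${PREFIX}_raw.vcf"

  # 2. Keep SNP records only.
  run java -jar "$GATK_JAR" -T SelectVariants -R "$REF" \
      -V "${PREFIX}_raw.vcf" -selectType SNP -o "${PREFIX}_snps.vcf"

  # 3. Flag SNPs that fail an (assumed) hard filter in the FILTER column.
  run java -jar "$GATK_JAR" -T VariantFiltration -R "$REF" \
      -V "${PREFIX}_snps.vcf" \
      --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0" \
      --filterName "snp_hard_filter" \
      -o "${PREFIX}_filtered_snaps.vcf"
}

make_normal_genotypes
```

By default the script only prints the three commands; running it with DRY_RUN=0 executes them. Note that VariantFiltration marks failing records rather than removing them.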

Issue · Github
by Sheila

Issue Number: 1118
State: closed
Closed By: vdauwera
Answers

  • escaon (Limoges, France; Member)

    @xiaolongge, you do not seem to be specifying the "-L" argument (populationSites.interval_list)

    How, then, do you avoid the following error without "-L"?
    ERROR MESSAGE: No population frequency annotation for CHB

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @escaon
    Hi,

    Can you please post the exact command you ran? Are you saying the command runs fine with -L but without it, you get an error?

    Thanks,
    Sheila
