Can I skip GenomicsDBImport and CombineGVCFs?

Hi,
I am not confident in my GenotypeGVCFs results because the subsequent VQSR steps showed no TPs (or maybe extremely few). With only 31 human exome samples, is it okay to skip GenomicsDBImport after HC? I ask because GenomicsDBImport has left out values such as SOR and FS even though they were produced by HC.
Otherwise, could you recommend a cutoff for filtering out SNPs with no GT? I am very confused by my findings from SelectVariants....

Answers

  • bshifaw Member, Broadie, Moderator admin

    Hi wlai,

    Here are the types of input GenotypeGVCFs expects; it sounds like your request falls within option 1.

    Input
    The GATK4 GenotypeGVCFs tool can take only one input track. Options are 1) a single single-sample GVCF 2) a single multi-sample GVCF created by CombineGVCFs or 3) a GenomicsDB workspace created by GenomicsDBImport. A sample-level GVCF is produced by HaplotypeCaller with the -ERC GVCF setting.

    The GVCF workflow, which involves GenomicsDBImport or CombineGVCFs, is used to improve scalability and speed in the next step, GenotypeGVCFs. If you don't plan on using GenomicsDBImport or CombineGVCFs, you could run HaplotypeCaller without the -ERC GVCF parameter and it will produce a VCF instead of a GVCF.
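
For reference, the two routes described above could be sketched roughly as follows. This is a dry run that only prints the GATK4 commands rather than executing them. The reference `ref.fa`, the sample names, the workspace `cohort_db`, and the interval `chr20` are placeholders and do not come from this thread.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints GATK4 commands instead of running them.
# All file names, sample names and the interval below are placeholders.

REF=ref.fa
SAMPLES="sampleA sampleB"
INTERVAL=chr20

# Route 1: GVCF workflow (per-sample GVCFs -> GenomicsDBImport -> GenotypeGVCFs).
gvcf_route() {
    local s variants=""
    for s in $SAMPLES; do
        # -ERC GVCF makes HaplotypeCaller emit a sample-level GVCF.
        echo "gatk HaplotypeCaller -R $REF -I $s.bam -O $s.g.vcf.gz -ERC GVCF"
        variants="$variants -V $s.g.vcf.gz"
    done
    # Consolidate the per-sample GVCFs into one GenomicsDB workspace.
    echo "gatk GenomicsDBImport$variants --genomicsdb-workspace-path cohort_db -L $INTERVAL"
    # Joint genotyping reads the workspace through the gendb:// prefix.
    echo "gatk GenotypeGVCFs -R $REF -V gendb://cohort_db -O cohort.vcf.gz"
}

# Route 2: without -ERC GVCF, HaplotypeCaller writes a regular VCF directly.
direct_route() {
    local s
    for s in $SAMPLES; do
        echo "gatk HaplotypeCaller -R $REF -I $s.bam -O $s.vcf.gz"
    done
}

gvcf_route
direct_route
```

CombineGVCFs could stand in for the GenomicsDBImport line in route 1; in that case GenotypeGVCFs would take the combined GVCF as its single `-V` input instead of a `gendb://` workspace.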

  • wlai Member
    edited March 21
    Hi @bshifaw

    - Below is the output from running GenotypeGVCFs, which has confused me. Is this what I should expect to see?

    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
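
Since the same warnings repeat, it can help to collapse them into a per-field count. The following is a minimal sketch that assumes the tool output has been captured as text; the function name `summarize_dropped_fields` is made up for illustration, and the pattern matches only the warning wording shown above.

```shell
#!/usr/bin/env bash
# Count how often each INFO field appears in the
# "No valid combination operation found" warnings.
# Reads from the files given as arguments, or from stdin if none are given.
summarize_dropped_fields() {
    sed -n 's/.*No valid combination operation found for INFO field \([^ ]*\) - .*/\1/p' "$@" \
        | sort | uniq -c | awk '{print $2, $1}'
}

# Small demo on three warning lines copied from the output above.
summarize_dropped_fields <<'EOF'
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
EOF
```

Each output line holds a field name followed by the number of times it was reported.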

    - This worries me because these warnings seem to affect the FPs and TPs in the VQSR steps.

    - At the same time, this appeared:

    11:29:11.811 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
    11:29:11.816 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
    11:29:12.022 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples
    11:29:12.088 WARN InbreedingCoeff - Annotation will not be calculated, must provide at least 10 samples

    - Should I expect to see this warning? Is it negligible? I found this header in every vcf.gz file generated by GenotypeGVCFs, so it seems all samples have been taken into consideration. My INFO column is carrying less and less information; am I losing important information that VQSR needs as input?

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AB118 AB120 AB124 AB142 AB149 BB155 BB159 BB163 BB171 BB174 JB179 JB181 LB182 LB183 LB185 LB191 LB196 LB197 LB202 LB203 LB205 LB209 LB212 LB216 PB219 PB221 PB23 PB4 PB45 PB50 PB53
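
One quick way to confirm that every sample is present in a header like the one above is to count the columns after FORMAT. This sketch relies on the VCF layout in which the first nine columns of the `#CHROM` line are fixed (CHROM through FORMAT) and each remaining column is one sample; the function name `count_samples` is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the number of sample columns found on the #CHROM header line.
# Reads an uncompressed VCF (or just its header) from stdin.
count_samples() {
    awk '/^#CHROM/ { print NF - 9; exit }'
}

# Demo with a two-sample header line (tab-separated, as in a real VCF).
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tAB118\tAB120\n' | count_samples
```

For a compressed file, the decompressed stream can be piped into `count_samples`. Applied to the header shown above, the count comes to 31.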


    - Command used:
    paste genename intervals | while read -r f1 f2; do gatk GenotypeGVCFs -R /references/hs38/hs38DH.fa -V "gendb://${f1}" -O "${f1}_${f2}.vcf.gz" -L "${f2}" -G StandardAnnotation -D "$dbsnp" --use-new-qual-calculator; done

    genename is a file that lists the 28 databases.
    intervals is a file that lists the gene intervals of interest.

    - I have 31 exome samples and I am only interested in looking into specific that have been carried.