Construct genome with gvcf or vcf file?

I have 100 rice accessions and their mapping files from DNA sequencing. My goal is to construct the genome sequence of a specific region for each accession. Could I just use the GVCF file produced by the initial variant calling? If so, how could I guarantee the quality of those SNP/InDel sites? Or must I use the VCF file produced by the final genotype call?

Best Answer


  • AdelaideR Member admin

    Hello @purod

    I assume you are thinking of using the tool FastaAlternateReferenceMaker to do this?

    I guess it depends on how you are going to end up using the reference.

    The differences between GVCF and VCF are outlined in this post

    The GVCF will not necessarily contain all of the variant calls, so if you have subpopulations represented in your 100 rice accessions, you may want to generate alternate references using GVCF for each subpopulation instead of grouping them all together.

    But it really depends on what you are using the reference for downstream. Perhaps if you could provide a little more information about that, I can provide some more feedback.

  • purod Member

    Thank you so much @AdelaideR,
    The downstream analysis I would like to do is to accurately construct the alternate reference genome for some specific regions. To do that, I need to identify all reliable differences between the reference genome and each accession-specific genome. Those differences come from the SNP and InDel calls made by the GATK pipeline on DNA-sequencing data. Some of my questions are listed below:
    1. Which one contains more differences, GVCF or VCF, and how much can I trust those differences?
    2. Will FastaAlternateReferenceMaker accept GVCF format?
    3. As I have learned, FastaAlternateReferenceMaker has some caveats, especially that it cannot handle complex alleles. How can I deal with this?
    4. Some SNPs or InDels might be missing. Could I use imputation to recover the missing information?
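For questions 1 and 2, it may help to see what sets GVCF records apart. In a GVCF written by HaplotypeCaller, every record carries the symbolic `<NON_REF>` allele, and reference blocks have `<NON_REF>` as their only ALT. The sketch below is a rough illustration only, not part of any GATK tool: the function name `usable_variants` and the `min_qual` cutoff of 30 are assumptions made for this example, and the cutoff is not a recommended threshold. It keeps records that have at least one real ALT allele and a QUAL at or above the cutoff.

```python
def usable_variants(lines, min_qual=30.0):
    """Yield (chrom, pos, ref, alts) for records with a real ALT allele.

    lines    : iterable of VCF/GVCF text lines (tab-separated)
    min_qual : illustrative QUAL cutoff (an assumption, not a recommendation)
    """
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual = fields[:6]
        # Drop the symbolic <NON_REF> placeholder; what remains are real alleles.
        real_alts = [a for a in alt.split(",") if a != "<NON_REF>"]
        if not real_alts:
            continue  # reference block: no variant called here
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low site quality
        yield chrom, int(pos), ref, real_alts


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=150",
        "chr1\t151\t.\tG\tT,<NON_REF>\t80.5\t.\t.",
        "chr1\t200\t.\tC\tCA,<NON_REF>\t12.0\t.\t.",
    ]
    for record in usable_variants(example):
        print(record)
```

A filter like this only looks at site-level QUAL. It does not replace joint genotyping or the variant filtering steps of the pipeline, so whether the surviving calls are reliable enough still has to be judged separately.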

  • purod Member

    Thank you for your response. It helps a lot. I will try to do it manually or take a look at plink.

  • purod Member

    In the end I used bcftools to construct the personalized genome.
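The post does not say which bcftools command was used; bcftools does provide a `consensus` subcommand for this kind of task. To make the idea concrete, here is a minimal pure-Python sketch of applying SNPs and InDels to the reference sequence of one region. It is not how bcftools is implemented. The function name `apply_variants` and its input format are hypothetical, and it assumes one chosen ALT allele per site, non-overlapping variants, and 1-based positions.

```python
def apply_variants(ref_seq, region_start, variants):
    """Return ref_seq with the given variants substituted in.

    ref_seq      : reference bases of the region (string)
    region_start : 1-based genomic coordinate of ref_seq[0]
    variants     : iterable of (pos, ref, alt) with 1-based pos;
                   assumed non-overlapping, one ALT allele per site
    """
    out = []
    cursor = 0  # 0-based index of the next unconsumed reference base
    for pos, ref, alt in sorted(variants):
        idx = pos - region_start
        if idx < 0 or idx + len(ref) > len(ref_seq):
            raise ValueError(f"variant at {pos} falls outside the region")
        if idx < cursor:
            raise ValueError(f"variant at {pos} overlaps a previous one")
        if ref_seq[idx:idx + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at {pos}")
        out.append(ref_seq[cursor:idx])  # unchanged bases before the variant
        out.append(alt)                  # the alternate allele
        cursor = idx + len(ref)          # skip the replaced reference bases
    out.append(ref_seq[cursor:])         # unchanged tail of the region
    return "".join(out)


if __name__ == "__main__":
    region = "ACGTACGT"  # toy region starting at position 101
    calls = [
        (102, "C", "T"),    # SNP
        (104, "TA", "T"),   # 1-bp deletion
        (107, "G", "GCC"),  # 2-bp insertion
    ]
    print(apply_variants(region, 101, calls))
```

The REF check is there because a mismatch usually means the variants were called against a different reference build or the region coordinates are off, and silently applying them would produce a wrong sequence.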
