GenotypeGVCFs gives fewer samples than input

rohitmande, San Diego, CA (Member)

Hi everyone,

I ran GenotypeGVCFs with the following command:
java -Xmx$64g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R --variant 08_22_2016_murat5.list --dbsnp dbsnp_144.hg38.vcf.gz -o murat5_08_22_2016_raw.vcf -log murat5_08_22_2016_raw.log -L MedExome_hg38_capture_targets.bed -nt 1 --max_alternate_alleles 6

The list I pass to --variant contains the paths to 397 GVCFs. When I run vcftools --vcf murat5_08_22_2016_raw.vcf, I get this output:

VCFtools - v0.1.12b
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
--vcf murat5_08_22_2016_raw.vcf

After filtering, kept 380 out of 380 Individuals
After filtering, kept 397293 out of a possible 397293 Sites
Run Time = 95.00 seconds

Is there any reason why 17 samples are thrown out?

Thank you very much.

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @rohitmande
    Hi,

    I suspect that the missing GVCFs share sample names with other GVCFs in your input. GenotypeGVCFs merges GVCFs that have the same sample name into a single sample in its output.

    -Sheila

  • rohitmande, San Diego, CA (Member)

    Hi Sheila,

    I looked at the input list of gvcfs and could not find any duplicates. I also ran the command cat 08_22_2016.list | sort | uniq -d and it did not return any results.
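The `sort | uniq -d` check above compares the file paths in the list, whereas the sample name of a GVCF is stored inside the file, in the columns that follow FORMAT on its `#CHROM` header line. Two files with different paths can therefore still carry the same sample name. A minimal Python sketch of a check on the names themselves follows; the function names `read_sample_names` and `find_shared_sample_names` are hypothetical helpers for illustration and are not part of GATK or VCFtools. It assumes each line of the list is a path to a plain-text or gzip/bgzip-compressed GVCF.

```python
import gzip
from collections import defaultdict


def read_sample_names(vcf_path):
    """Return the sample names from the #CHROM header line of a VCF/GVCF."""
    # bgzip output is gzip-compatible, so gzip.open can read .gz GVCFs.
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Nine fixed columns (CHROM through FORMAT) precede the samples.
                return line.rstrip("\r\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached the records without seeing a #CHROM line
    return []


def find_shared_sample_names(list_path):
    """Map each sample name found in more than one file to those files."""
    with open(list_path) as handle:
        gvcf_paths = [line.strip() for line in handle if line.strip()]
    files_by_sample = defaultdict(list)
    for path in gvcf_paths:
        for name in read_sample_names(path):
            files_by_sample[name].append(path)
    return {name: paths for name, paths in files_by_sample.items() if len(paths) > 1}
```

Calling `find_shared_sample_names("08_22_2016_murat5.list")` would return an empty dictionary if every file carries a distinct sample name; any entry it does return lists the files that share a name.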

  • rohitmande, San Diego, CA (Member)

    Hi Sheila,

    Following a suggestion in another thread on this forum, I combined our 397 GVCFs into two batches of 200 and 197 and ran GenotypeGVCFs on the two combined GVCFs. VCFtools still gives this output:

    VCFtools - v0.1.12b
    (C) Adam Auton and Anthony Marcketta 2009

    Parameters as interpreted:
    --vcf murat5_08_31_2016.vcf

    After filtering, kept 380 out of 380 Individuals
    After filtering, kept 396692 out of a possible 396692 Sites
    Run Time = 8.00 seconds

    We confirmed that all of the input samples are distinct. Is there any reason why 17 samples are missing?
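One way to narrow this down is to list exactly which sample names appear in the input headers but not in the output header, since the "Individuals" count reported by VCFtools should correspond to the sample columns on the output's `#CHROM` line. The sketch below is only an illustration: `read_sample_names` and `missing_from_output` are hypothetical helper names, and the inputs can be either the individual GVCFs or the two combined GVCFs.

```python
import gzip


def read_sample_names(vcf_path):
    """Return the sample names from the #CHROM header line of a VCF/GVCF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Sample columns start after the nine fixed VCF columns.
                return line.rstrip("\r\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # no #CHROM line before the first record
    return []


def missing_from_output(input_paths, output_vcf):
    """Summarise input sample names that are absent from the output header."""
    input_names = []
    for path in input_paths:
        input_names.extend(read_sample_names(path))
    output_names = set(read_sample_names(output_vcf))
    return {
        "input_columns": len(input_names),             # sample columns across inputs
        "distinct_input_names": len(set(input_names)),  # after collapsing repeated names
        "output_names": len(output_names),
        "missing": sorted(set(input_names) - output_names),
    }
```

If `distinct_input_names` is smaller than `input_columns`, some names repeat across the inputs. If `missing` is non-empty, those are the samples to look for in the GenotypeGVCFs log.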
