UnifiedGenotyper removed samples?

james224 Member
edited November 2017 in Ask the GATK team

Hello everyone. I'm using GATK to call SNPs on a data set of 54 samples. However, after running UnifiedGenotyper I noticed that my VCF file contains only 24 samples, so 30 samples were removed. I used dcov 200 and emit-all-sites. Is it normal for some samples to be removed from the data set in the final VCF? Thanks for your response.

Answers

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @james224. No, this is not normal. Can you double-check that all your @RG SM sample names are unique across the 54 samples?
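One way to do that check, as a minimal sketch: it assumes you have concatenated the header text of all 54 BAMs (for example from `samtools view -H` on each file) and simply groups the `@RG` lines by their `SM` tag. The function name and the example header below are hypothetical.

```python
from collections import defaultdict


def duplicated_sample_names(header_text):
    """Return {SM: [read-group IDs]} for every SM value shared by more than one @RG line."""
    ids_by_sm = defaultdict(list)
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue  # skip @HD, @SQ, @PG, ... lines
        # SAM header fields are tab-separated TAG:VALUE pairs after the record type
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        ids_by_sm[fields.get("SM")].append(fields.get("ID"))
    # keep only sample names that more than one read group claims
    return {sm: ids for sm, ids in ids_by_sm.items() if len(ids) > 1}


# Hypothetical header: two read groups from different lines share SM:idx01
header = "\n".join([
    "@HD\tVN:1.5\tSO:coordinate",
    "@RG\tID:lineA_01\tSM:idx01\tPL:ILLUMINA",
    "@RG\tID:lineB_01\tSM:idx01\tPL:ILLUMINA",
    "@RG\tID:lineA_02\tSM:idx02\tPL:ILLUMINA",
])
print(duplicated_sample_names(header))
# {'idx01': ['lineA_01', 'lineB_01']}
```

An empty result means every read group carries its own sample name.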

  • james224 Member
    edited November 2017

    Hi shlee. Thanks for your response. I will check and keep you posted.

  • james224 Member

    Shlee, indeed the ID is unique, but the SM is not, because we pooled different lines (each line has 24 indexes). So that is why UnifiedGenotyper removed some samples. My question now is: how do I run UnifiedGenotyper so that it ignores the SM field and recognizes samples by ID alone?
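To see why the count dropped, here is a small sketch of the arithmetic, assuming (as the moderator's reply below describes) that the caller consolidates read groups into one VCF column per distinct SM value. The layout of three lines with 24, 24 and 6 indexes, and SM set to the index name only, is hypothetical and chosen just to illustrate 54 read groups collapsing to 24 columns.

```python
def vcf_sample_columns(read_groups):
    """Distinct SM values, i.e. the sample columns emitted when reads are grouped by SM."""
    return sorted({sm for _rg_id, sm in read_groups})


# Hypothetical layout: read-group IDs are unique, but SM repeats the index name
read_groups = []
for line, n_indexes in [("lineA", 24), ("lineB", 24), ("lineC", 6)]:
    for idx in range(1, n_indexes + 1):
        read_groups.append((f"{line}_{idx:02d}", f"idx{idx:02d}"))

print(len(read_groups), len(vcf_sample_columns(read_groups)))  # 54 24
```

With SM equal to the unique ID instead, the same function would return 54 names.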

  • shlee Cambridge Member, Broadie, Moderator admin

    @james224,

    Our workflows accommodate multiplexing and therefore consolidate reads at the sample level. You can fix the read group SM fields so that each sample has a unique name using Picard AddOrReplaceReadGroups.
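A rough sketch of building one AddOrReplaceReadGroups call per BAM, assuming each BAM holds a single sample and that "line + index" is an acceptable unique sample name. The naming scheme, the `picard.jar` path, and the RGLB/RGPU values are hypothetical placeholders; the `I=`/`O=`/`RGSM=` style is Picard's legacy argument syntax (newer releases use `-I`/`-O` style). Note that the tool replaces all existing read groups in the file with the single one given.

```python
def add_or_replace_rg_cmd(in_bam, out_bam, line, index, picard_jar="picard.jar"):
    """Build (not run) a Picard AddOrReplaceReadGroups command with a unique SM."""
    sample = f"{line}_{index:02d}"  # hypothetical naming: pooled line + index
    return [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"I={in_bam}",
        f"O={out_bam}",
        f"RGID={sample}",
        f"RGLB=lib_{sample}",  # placeholder library name
        "RGPL=ILLUMINA",
        f"RGPU={sample}",      # placeholder; usually flowcell-barcode.lane
        f"RGSM={sample}",      # the field that must be unique per sample
    ]


cmd = add_or_replace_rg_cmd("lineB_07.bam", "lineB_07.rg.bam", "lineB", 7)
print(" ".join(cmd))
# To execute: import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list separately makes it easy to print and inspect all 54 commands before running any of them.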

    Btw, I hope you are aware that we recommend calling variants with HaplotypeCaller. UnifiedGenotyper is no longer part of our standard workflow. You will still have to make the SM fields unique for use with HaplotypeCaller.
