
Probable serious bug in VariantsToBinaryPed causing incorrect mapping of sample to genotype

TimHughes Member Posts: 61

VariantsToBinaryPed appears to expect the .fam file (the first six columns of a PED file) to list the samples in the same order as the input VCF file. If the order differs, it does not seem to map sample IDs to the correct genotypes in the output binary PED.

I found this issue after converting trio VCF files to binary PED and computing kinship coefficients from the binary PED file, which showed that the relationships were wrong. When I fixed the .fam file so that the sample IDs were in the same order as in the .vcf file and reran the conversion to binary PED, the kinship coefficients were as expected given the pedigree.

This also made me wonder whether PhaseByTransmission has the same problem, but initial tests seem to indicate that PhaseByTransmission correctly handles the case where the sample order differs between the .fam file and the .vcf file.

Best Answer


  • Geraldine_VdAuwera Administrator, Dev Posts: 11,130 admin

    Hi Tim, can you confirm that you're using the latest version of GATK?

    Geraldine Van der Auwera, PhD

  • TimHughes Member Posts: 61

    Hi Geraldine,

    I am using version 2.5 (I haven't upgraded to 2.6 because I don't have Java 1.7).

    I have worked around the problem by making sure the .fam file follows the VCF sample order.

    It should be pretty quick for the developer of VariantsToBinaryPed to check whether the tool code has a routine for matching sample IDs between the .fam file and the VCF.
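
For anyone who needs the same workaround, here is a minimal sketch of reordering a .fam file to follow the VCF sample order. It is not part of GATK. The function names `vcf_sample_ids` and `reorder_fam` are hypothetical, and the sketch assumes that the VCF sample names match the individual ID in the second column of the .fam file.

```python
def vcf_sample_ids(vcf_lines):
    """Return the sample IDs from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine columns (CHROM..FORMAT) are fixed; samples follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def reorder_fam(fam_lines, sample_order):
    """Return the .fam lines reordered to match sample_order.

    Assumption: the individual ID (second .fam column) equals the VCF
    sample name. Samples missing from the .fam file raise an error
    rather than being silently dropped.
    """
    by_id = {}
    for line in fam_lines:
        fields = line.split()
        if fields:  # skip blank lines
            by_id[fields[1]] = line.rstrip("\n")
    missing = [s for s in sample_order if s not in by_id]
    if missing:
        raise KeyError("samples not found in .fam file: " + ", ".join(missing))
    return [by_id[s] for s in sample_order]
```

Typical use would be to read both files with `open(...)`, call `reorder_fam(fam_lines, vcf_sample_ids(vcf_lines))`, and write the returned lines to a new .fam file before running the conversion.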

  • TimHughes Member Posts: 61

    A quick test is to feed a trio VCF file to VariantsToBinaryPed together with a .fam file that is correct but describes the samples in a different order from the VCF file.

    Then feed the resulting .bed file to KING:

    king -b myFile.bed --kinship

    The last column of the output file king.kin flags any discrepancies between the declared and the empirical kinships.
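
A rough sketch of scanning that output for flagged pairs follows. It assumes king.kin is a whitespace-delimited table with one header row and that a value of 0 in the last column means the declared and inferred relationships agree. Both points are assumptions to check against your own file, and the function name `flag_kinship_errors` is hypothetical.

```python
def flag_kinship_errors(kin_lines):
    """Return the rows whose last column is non-zero, as dicts keyed by header.

    Assumptions: whitespace-delimited table, first non-blank line is the
    header, last column is numeric and 0 means "no discrepancy".
    """
    rows = [line.split() for line in kin_lines if line.strip()]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    flagged = []
    for fields in data:
        if float(fields[-1]) != 0.0:
            flagged.append(dict(zip(header, fields)))
    return flagged
```

Calling it on the lines of king.kin, for example `flag_kinship_errors(open("king.kin"))`, returns an empty list when no pair is flagged.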

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,130 admin

    That's fine, just making sure you're on an at least somewhat recent version. Even short checks add up when there are a lot of them, so it's worth filtering by version.

    Will check and let you know.

    Geraldine Van der Auwera, PhD

  • TimHughes Member Posts: 61

    You are welcome! Pleased to be able to make my own tiny "contribution" to this great software that I use so much.
