Probable serious bug in VariantsToBinaryPed causing incorrect mapping of sample to genotype

TimHughes Posts: 61 Member

VariantsToBinaryPed seems to expect the .fam file (the first six columns of a PED file) to describe the samples in the same order as the input VCF file. If the orders differ, it appears not to map sample IDs correctly to the genotypes in the output binary PED.

I found this issue because I converted trio VCF files to binary PED and then computed kinship coefficients from the binary PED file, which showed that the relationships were wrong. When I fixed the .fam file so that the sample IDs were in the same order as in the .vcf file and re-ran the conversion to binary PED, the kinship coefficients were as they should be given the pedigree.
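A mismatch of this kind can be detected before running the conversion. The following is a minimal sketch, not part of GATK: the function names are hypothetical, and it assumes an uncompressed VCF with a tab-delimited `#CHROM` header line and a .fam file whose second column holds the individual ID.

```python
def vcf_sample_ids(vcf_lines):
    """Return the sample IDs from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first 9 columns are fixed (CHROM..FORMAT); samples follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def fam_sample_ids(fam_lines):
    """Return the individual IDs (column 2) of a .fam file, in file order."""
    return [line.split()[1] for line in fam_lines if line.strip()]


def order_matches(vcf_lines, fam_lines):
    """True only if both files list the same samples in the same order."""
    return vcf_sample_ids(vcf_lines) == fam_sample_ids(fam_lines)
```

Both arguments can be open file handles, for example `order_matches(open("trio.vcf"), open("trio.fam"))`, where the file names are placeholders.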

This also made me wonder whether PhaseByTransmission has the same problem, but initial tests seem to indicate that PhaseByTransmission correctly handles the scenario where the sample order differs between the .fam file and the .vcf files.

Answers

  • Geraldine_VdAuwera Posts: 8,171 Administrator, GATK Dev admin

    Hi Tim, can you confirm that you're using the latest version of GATK?

    Geraldine Van der Auwera, PhD

  • TimHughes Posts: 61 Member

    Hi Geraldine,

    I am using version 2.5 (I haven't upgraded to 2.6 because I don't have Java 1.7).

    I have worked around the problem by ensuring that the FAM file follows the VCF sample order.

    It should be pretty quick for the developer of VariantsToBinaryPed to check whether the tool code contains a routine for matching sample IDs between the .fam file and the VCF.
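The workaround described in this post, rewriting the .fam file so that it follows the VCF sample order, could be sketched as follows. This is an illustration under stated assumptions rather than a GATK feature: `reorder_fam` is a hypothetical helper, and it assumes that individual IDs (column 2 of the .fam file) are unique across families and that the list of VCF sample names has already been read from the `#CHROM` header line.

```python
def reorder_fam(fam_lines, vcf_samples):
    """Return the .fam rows rearranged to follow the VCF sample order.

    Assumes individual IDs (column 2) are unique across the whole file.
    """
    by_id = {}
    for line in fam_lines:
        if line.strip():
            by_id[line.split()[1]] = line.rstrip("\n")

    # Fail loudly rather than silently writing an incomplete pedigree.
    missing = [s for s in vcf_samples if s not in by_id]
    if missing:
        raise KeyError(f"samples missing from .fam: {missing}")

    return [by_id[s] for s in vcf_samples]
```

Rows for individuals that are absent from the VCF are dropped by this sketch, so the output contains exactly one row per VCF sample.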

  • TimHughes Posts: 61 Member

    A quick test can easily be done by feeding a trio VCF file to VariantsToBinaryPed together with a .fam file that is correct but describes the samples in a different order from the VCF file.

    Then feed the resulting .bed file to KING (http://people.virginia.edu/~wc9c/KING/manual.html):

    king -b myFile.bed --kinship

    The last column of the output file, king.kin, flags any discrepancies between the declared and the empirical kinships.
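The check on the last column of king.kin can be scripted. The sketch below is only an illustration: `flagged_pairs` is a hypothetical helper, and it assumes that king.kin is whitespace-delimited with a header row, that the last column is numeric, and that a non-zero value marks a pair whose empirical kinship disagrees with the declared pedigree. Those assumptions should be verified against the KING manual for the version in use.

```python
def flagged_pairs(kin_lines):
    """Return the rows of a king.kin file whose last column is non-zero.

    Each returned row is a dict mapping header names to raw string values.
    """
    rows = [line.split() for line in kin_lines if line.strip()]
    if not rows:
        return []
    header, data = rows[0], rows[1:]

    flagged = []
    for fields in data:
        try:
            value = float(fields[-1])
        except ValueError:
            # Skip rows whose last column is not numeric.
            continue
        if value != 0:
            flagged.append(dict(zip(header, fields)))
    return flagged
```

An empty result from `flagged_pairs(open("king.kin"))` would then suggest that the declared relationships agree with the genotypes, while any returned rows point to pairs worth inspecting.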

  • Geraldine_VdAuwera Posts: 8,171 Administrator, GATK Dev admin

    That's fine, I was just making sure you're on an at least somewhat recent version. Even short checks add up when there are a lot of them, so it's worth filtering by version.

    Will check and let you know.

  • TimHughes Posts: 61 Member

    You are welcome! Pleased to be able to make my own tiny "contribution" to this great software that I use so much.
