How to determine sex from BAM files

paumarc, Zagreb, Member

Hello

Is there an easy way to detect the sex of a sequenced genome? I saw this thread:

http://gatkforums.broadinstitute.org/discussion/5903/is-there-a-walker-that-determines-sex-from-a-bam-or-vcf-file

but I am not sure whether it covers the same case (I am new to GATK). Is there a hands-on manual or guide that I could use to get started?

Thanks

Comments

  • Sheila, Broad Institute, Member, Broadie, Moderator, admin

    @paumarc
    Hi,

    Unfortunately, that thread is probably your best help. We do not have any specific recommendations for determining the sex of a sample. Hopefully, some other users will jump in here with some helpful tips.

    -Sheila

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie, admin

    I would recommend looking up phenotypic inference in the literature to find how other researchers do this. Good luck!

  • tommycarstensen, United Kingdom, Member ✭✭✭

    @paumarc Perhaps you could check the chrY coverage. If there is very little of it, then you are probably dealing with a female. Likewise, if your chrX depth is about half that of your autosomes, then your BAM file probably originates from a male. I would probably not rely on this alone myself. It depends on how certain you need to be.

    I just found an answer from @lindenb on biostars.org, where he suggests something similar:

    In our lab, we run GATK DepthOfCoverage with 3 BED files (autosomes, chrX, chrY) to get 3 mean coverages. Females should have cov(X)>>cov(Y).
    

    Don't forget to remove the pseudoautosomal regions (PARs), which are shared between X and Y, for a more accurate result...
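The coverage comparison described above can be sketched in a few lines of Python. This is only an illustration, not a GATK feature: the function name `tentative_sex_call` and all three thresholds are hypothetical and would need tuning on samples of known sex. It assumes you already have the three mean coverages (autosomes, chrX, chrY), for example from DepthOfCoverage runs with PARs excluded.

```python
def tentative_sex_call(cov_autosome, cov_x, cov_y,
                       x_female_min=0.8, x_male_max=0.6, y_present_min=0.1):
    """Return a tentative call from three mean coverages.

    The default thresholds are illustrative assumptions, not
    validated cut-offs; calibrate them on samples of known sex.
    """
    if cov_autosome <= 0:
        raise ValueError("autosomal coverage must be positive")

    # Normalise by autosomal depth so sequencing depth cancels out.
    x_ratio = cov_x / cov_autosome
    y_ratio = cov_y / cov_autosome

    # Two X copies and almost no Y signal: consistent with XX.
    if x_ratio >= x_female_min and y_ratio < y_present_min:
        return "probably female"
    # Roughly half the autosomal depth on X plus clear Y signal: consistent with XY.
    if x_ratio <= x_male_max and y_ratio >= y_present_min:
        return "probably male"
    # Anything else is left undecided rather than forced into a class.
    return "ambiguous"


print(tentative_sex_call(30.0, 29.0, 0.5))
print(tentative_sex_call(30.0, 15.5, 14.0))
print(tentative_sex_call(30.0, 22.0, 1.0))
```

Returning "ambiguous" for anything in between reflects the caution above: coverage ratios are a hint, and how far to trust them depends on how certain you need to be.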

  • Will_Gilks, University of Sussex, UK, Member ✭✭
    edited June 2016

    Hi @paumarc

    To count the reads per chromosome, which will indirectly tell you the sex of your subjects, try samtools idxstats (http://samtools.sourceforge.net/) with a loop to run through the BAM files.

    for i in *.bam; do
        # Write per-chromosome read counts to a text file next to each BAM.
        samtools idxstats "$i" > "${i}.chromdepths.txt"
    done
    

    This gives you output like the following (columns: reference name, sequence length, mapped reads, unmapped reads):

    chr2L   23513712        8695825 341
    chr2R   25286936        8759063 401
    chr3L   28110227        9983508 385
    chr3R   32079331        11345968        491
    chrUn_DS485919v1        1021    0       0
    chrUn_DS483755v1        6936    829     2
    chrUn_DS485425v1        1143    211     0
    chrUn_DS484861v1        1395    97      0
    

    You might want to modify the path and prefix to match your BAM files.

    In e.g. R, you can then calculate the ratio of X- and Y-linked reads to autosomal reads, after adjusting for the length of the chromosomes.
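The same length-adjusted ratio can be sketched in Python instead of R. This is a minimal illustration under stated assumptions: the helper names (`read_idxstats`, `reads_per_base`, `sex_ratios`) are made up here, the sex chromosomes are assumed to be called `chrX` and `chrY`, and you supply the list of autosome names for your own reference. The toy input stands in for a `.chromdepths.txt` file.

```python
import io


def read_idxstats(handle):
    """Parse idxstats text into {reference name: (length, mapped reads)}."""
    stats = {}
    for line in handle:
        fields = line.split()
        # Skip blank lines and the "*" row, which holds unplaced reads.
        if len(fields) < 3 or fields[0] == "*":
            continue
        stats[fields[0]] = (int(fields[1]), int(fields[2]))
    return stats


def reads_per_base(stats, names):
    """Mapped reads divided by total length over the named sequences."""
    total_length = sum(stats[name][0] for name in names)
    total_reads = sum(stats[name][1] for name in names)
    return total_reads / total_length


def sex_ratios(stats, autosomes, x_name="chrX", y_name="chrY"):
    """Length-adjusted X and Y read density relative to the autosomes.

    x_name and y_name are assumptions; use your reference's names.
    """
    autosomal = reads_per_base(stats, autosomes)
    return {
        "x_to_autosome": reads_per_base(stats, [x_name]) / autosomal,
        "y_to_autosome": reads_per_base(stats, [y_name]) / autosomal,
    }


# Toy data, not real output: two autosomes plus X and Y.
example = io.StringIO(
    "chr1\t1000\t2000\t0\n"
    "chr2\t1000\t2000\t0\n"
    "chrX\t500\t500\t0\n"
    "chrY\t100\t100\t0\n"
    "*\t0\t0\t50\n"
)
stats = read_idxstats(example)
ratios = sex_ratios(stats, autosomes=["chr1", "chr2"])
print(ratios)
```

For a real file, replace the `io.StringIO` object with `open("sample.bam.chromdepths.txt")`, where the file name is a placeholder. Read counts ignore read length and mappability, so treat the ratios as a rough guide.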
