
Using GATK to convert ped to vcf

Jinyan Posts: 3 Member
edited January 2013 in Ask the team

There used to be a webpage explaining how to convert PLINK ped format to VCF format, but the link seems to have disappeared:

http://www.broadinstitute.org/gsa/wiki/index.php/Converting_ped_to_vcf

Thank you very much in advance.

Answers

  • chongm Posts: 33 Member
    edited December 2013

    Thanks chapmanb for the script!

    For me, the script seems to be running into an error partway through chromosome 1:

    Traceback (most recent call last):
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 158, in <module>
        main(*sys.argv[1:])
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 29, in main
        fix_nonref_positions(vcf_file, ref_file)
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 148, in fix_nonref_positions
        parts = fix_vcf_line(parts, ref_base)
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 97, in fix_vcf_line
        elif ref_base != ref and complements[ref] == ref_base:
    KeyError: 'D'

    Any suggestions on how to fix this?

    Thanks,

    MC

  • chongm Posts: 33 Member

    Actually, I just realized that my chip data contains some insertions and deletions, which is probably what is causing the problem (the 'D' in the KeyError looks like a deletion allele code).

    Thanks,

    MC
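The KeyError in the traceback is what a plain dictionary lookup produces when the allele is not one of the four bases. Below is a minimal sketch of that failure mode and one way to guard against it. It assumes the script's `complements` is a simple base-to-base dict; the actual contents of chapmanb's script are not shown in this thread, and `COMPLEMENTS`, `is_snp_allele` and `is_strand_flipped` are hypothetical names used only for illustration.

```python
# Hypothetical reconstruction: the real script's `complements` table is
# not shown in the thread, so this dict is an assumption.
COMPLEMENTS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_snp_allele(allele):
    """Return True only for single-base A/C/G/T alleles."""
    return allele in COMPLEMENTS


def is_strand_flipped(ref, ref_base):
    """Return True if `ref` is the complement of the reference base.

    Non-SNP allele codes (such as 'D' or 'I' for indels on some chips)
    return False instead of raising KeyError.
    """
    if not is_snp_allele(ref):
        return False
    return ref_base != ref and COMPLEMENTS[ref] == ref_base
```

Whether such indel sites should then be skipped or written out as proper indel records depends on what the downstream analysis needs; this sketch only avoids the crash.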

  • chrchang Hong Kong Posts: 1 Member
    edited December 2013

    @chongm said:

        Thanks chapmanb for the script!

        For me, the script seems to be running into an error partway through chromosome 1:

        Traceback (most recent call last):
          File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 158, in <module>
            main(*sys.argv[1:])
          File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 29, in main
            fix_nonref_positions(vcf_file, ref_file)
          File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 148, in fix_nonref_positions
            parts = fix_vcf_line(parts, ref_base)
          File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 97, in fix_vcf_line
            elif ref_base != ref and complements[ref] == ref_base:
        KeyError: 'D'

        Any suggestions on how to fix this?

        Thanks,

        MC

    PLINK 1.9 can handle this:

    plink --vcf [filename] --make-bed --out [new prefix]

    Some useful flags:

    --keep-allele-order keeps the original reference allele (instead of automatically resetting it based on which allele is minor/major).

    --biallelic-only discards all variants with two or more alternate alleles present in the data (without this flag, the most common alternate allele is kept).

    --double-id, --const-fid, and --id-delim let you fine-tune how VCF sample IDs are converted to PLINK family + individual IDs.

    You can see more details at https://www.cog-genomics.org/plink2/input#vcf
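For scripted pipelines, the command in this post can be assembled programmatically. Here is a minimal sketch, assuming a PLINK 1.9 executable named `plink` is on the PATH. The helper names `build_plink_vcf_import` and `run_plink_vcf_import` are hypothetical, and only flags mentioned in this post are used.

```python
import subprocess


def build_plink_vcf_import(vcf_path, out_prefix, keep_allele_order=True,
                           biallelic_only=False, double_id=False):
    """Build the argument list for importing a VCF into PLINK binary format."""
    cmd = ["plink", "--vcf", vcf_path, "--make-bed", "--out", out_prefix]
    if keep_allele_order:
        cmd.append("--keep-allele-order")
    if biallelic_only:
        cmd.append("--biallelic-only")
    if double_id:
        cmd.append("--double-id")
    return cmd


def run_plink_vcf_import(vcf_path, out_prefix, **options):
    """Run the import; raises CalledProcessError if plink exits non-zero."""
    cmd = build_plink_vcf_import(vcf_path, out_prefix, **options)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.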
