
Is it possible to merge multiple samples within a VCF file?

Greetings from Australia,
Could someone please help me with this?
I have a single VCF file containing 14 samples at the end of the Best Practices pipeline.
For a bulk segregant analysis, I would like to merge these samples into 2 bulks.

The QTLseqr (R) import function takes a single table (from VariantsToTable) as input, but it only allows 2 sample names (BulkA and BulkB) to be specified.

Thank you

Answers

  • shlee, Cambridge (Member, Broadie)

    Hi @andrew_chen,

    You can use SelectVariants. If I understand correctly what you want to do, the option you need is --sample-name, which you can specify multiple times (once per sample you want to keep).
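A minimal sketch of how the repeated --sample-name option fits together, written as a Python helper that only builds the GATK4 command line. The function name `select_variants_cmd` and the file names are hypothetical; only `SelectVariants`, `-V`, `-O`, `-R` and `--sample-name` are GATK options.

```python
import shlex


def select_variants_cmd(vcf_in, vcf_out, samples, reference=None):
    """Build a GATK4 SelectVariants command that keeps only `samples`.

    Hypothetical helper: it only assembles the argument list and does not run GATK.
    """
    cmd = ["gatk", "SelectVariants", "-V", vcf_in, "-O", vcf_out]
    if reference is not None:
        cmd += ["-R", reference]
    for name in samples:
        # --sample-name is repeatable: one flag per sample to keep
        cmd += ["--sample-name", name]
    return cmd


if __name__ == "__main__":
    cmd = select_variants_cmd("cohort.vcf.gz", "two_samples.vcf.gz", ["A1", "B1"])
    print(shlex.join(cmd))
```

Running it prints `gatk SelectVariants -V cohort.vcf.gz -O two_samples.vcf.gz --sample-name A1 --sample-name B1`; the list can also be passed to `subprocess.run(cmd, check=True)`. Returning a list rather than a string avoids shell-quoting problems with unusual sample names.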

  • Hi @shlee,
    Thanks. But how do I merge the samples?
    Right now I have a VCF file that has samples A1, A2, A3...A7 and B1, B2, B3...B7.
    I would like to have 1 VCF file that has 2 samples (A-bulk and B-bulk).
    If I can't combine the samples in the VCF, should I just combine the BAM files and re-run the analysis?
    The problem is that the R program only allows me to specify 1 VCF / 2 samples as input.

    These samples are near-isogenic lines (banana) differing only in the QTL region. They have 11 chromosomes.

    Thanks
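One way to approximate two pooled bulks without re-running the pipeline is to sum each sample's reference and alternate read counts (AD) within each group, which mimics sequencing each bulk as a pool. This is a sketch under assumptions, not an approach confirmed in this thread: it assumes a tab-separated VariantsToTable output produced with `-F CHROM -F POS -F REF -F ALT -GF AD`, so that each sample has a `<sample>.AD` column such as `A1.AD`. The helper names and the A/B grouping are hypothetical, and multiallelic sites are skipped.

```python
import csv
import io

# Hypothetical bulk membership, following the A1..A7 / B1..B7 naming in this thread.
BULKS = {
    "BulkA": [f"A{i}" for i in range(1, 8)],
    "BulkB": [f"B{i}" for i in range(1, 8)],
}


def parse_ad(value):
    """Parse a biallelic AD cell such as '12,3' into (ref_reads, alt_reads).

    Returns None for missing or malformed values (e.g. 'NA' or '.').
    """
    parts = value.split(",")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return None
    return int(parts[0]), int(parts[1])


def pool_bulks(table_text, bulks=BULKS):
    """Sum per-sample ref/alt read counts into one AD (and DP) value per bulk.

    Expects tab-separated VariantsToTable output with CHROM, POS, REF, ALT
    and one '<sample>.AD' column per sample.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    pooled = []
    for row in reader:
        if "," in row["ALT"]:
            continue  # skip multiallelic sites; this sketch handles biallelic only
        out = {key: row[key] for key in ("CHROM", "POS", "REF", "ALT")}
        for bulk, samples in bulks.items():
            ref_total = alt_total = 0
            for sample in samples:
                ad = parse_ad(row[f"{sample}.AD"])
                if ad is None:
                    continue  # a missing call contributes no reads
                ref_total += ad[0]
                alt_total += ad[1]
            out[f"{bulk}.AD"] = f"{ref_total},{alt_total}"
            # DP here is simply ref + alt reads, not the caller's FORMAT/DP.
            out[f"{bulk}.DP"] = str(ref_total + alt_total)
        pooled.append(out)
    return pooled
```

The pooled rows can be written back out with `csv.DictWriter(..., delimiter="\t")`. Fields such as GQ and PL cannot be meaningfully summed across samples, so whether QTLseqr's import accepts a table containing only these columns is an assumption to check against its documentation.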

  • shlee, Cambridge (Member, Broadie)
    Accepted Answer

    Hi @andrew_chen,

    Can you clarify your research aims? For example: upon merging, is it important to retain the QTL region differences? Are these QTLs biallelic in each of your lines?

    If the R program only accepts two samples as input, could you simply pick two representative samples, one from each group of near-isogenic lines? In that case, all you need is a VCF containing those two representative samples.

    If representing each of the near-isogenic lines is important, and you need to retain the QTL differences, which produce multiallelic sites, then I would suggest looking at Mutect2 calling (tool doc; tutorial), in particular section 2.1 of the tutorial. You can call your A group as one sample and your B group as another, each in tumor-only mode, and then combine the callsets (note the caution about using CombineVariants). For this to work, the read group (RG) sample name (SM) field must be identical across all BAMs within a group: one SM value for every A sample and another for every B sample. Note that Mutect2 does not genotype in the traditional sense; instead, it calls every allele it detects, down to very low allele fractions. This is what lets you keep the QTL differences.
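Making the SM field identical within each group amounts to rewriting the @RG lines in each BAM header. A minimal sketch, assuming you extract each header as text (for example `samtools view -H A1.bam > A1.header.sam`) and write the edited header back with `samtools reheader`; the function name `set_rg_sample` and the bulk names are hypothetical. It replaces (or adds) the SM tag on every @RG line and leaves the ID tag alone, since read group IDs should stay distinct across BAMs.

```python
def set_rg_sample(header_text, bulk_name):
    """Return a SAM header with the SM tag of every @RG line set to bulk_name.

    Other header lines and other @RG tags (ID, LB, PL, ...) are left untouched.
    If an @RG line has no SM tag, one is appended.
    """
    out_lines = []
    for line in header_text.splitlines():
        if line.startswith("@RG\t"):
            fields = line.split("\t")
            has_sm = False
            for i, field in enumerate(fields):
                if field.startswith("SM:"):
                    fields[i] = "SM:" + bulk_name  # overwrite the per-line sample name
                    has_sm = True
            if not has_sm:
                fields.append("SM:" + bulk_name)
            line = "\t".join(fields)
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

For example, writing `set_rg_sample(open("A1.header.sam").read(), "BulkA")` to `A1.bulkA.header.sam` and then running `samtools reheader A1.bulkA.header.sam A1.bam > A1.bulkA.bam` would give every A-group BAM the same sample name. Repeat with a different name for the B group.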

    It is cool that you are researching bananas. Here is one of my favorite quotes:

    Time flies like a rocket.
    Fruit flies like a banana.
