How to set parameters in UnifiedGenotyper like in samtools mpileup

I want to extract the SNPs and indels shared between two VCF files, one generated by samtools mpileup and the other by GATK UnifiedGenotyper, using SelectVariants.

With samtools, I can set the mapping quality, filter out the variants that are not within an interval, and then generate a new VCF to see how many variants remain, using a command such as:

samtools mpileup -C50 -q25 -uf reference.fa input.bam | bcftools call -c | vcfutils.pl varFilter -d5 -D30 > output.vcf

However, in the GATK 3 UnifiedGenotyper manual I could not find an argument that lets me set the interval I want.
All I know is that UnifiedGenotyper keeps the coverage of the input below 250.
Is there any tool or argument in GATK 3 that generates a VCF and filters by coverage at the same time?

Thanks for your help.

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @tytolin
    Hi,

    I think you are asking about the -L argument.

    I hope that helps.

    -Sheila

  • tytolin (Member)

    Sorry for being unclear.
    I mean the interval of mapping depth in GATK 3's UG.
    How can I set the mapping-depth interval I want in GATK 3's UG?
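
A depth window comparable to `vcfutils.pl varFilter -d5 -D30` could also be applied after calling, by filtering the VCF on the `DP` value in its INFO column. The following is only a rough sketch under stated assumptions: `info_depth` and `filter_by_depth` are hypothetical helper names (not part of GATK or samtools), the site depth is assumed to be stored as `DP=` in INFO as in the records posted in this thread, and treating both bounds as inclusive is an assumption as well.

```python
def info_depth(info):
    """Return the integer DP value from a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        # "DP4=..." does not match, because only the exact key "DP" is accepted.
        if field.startswith("DP="):
            return int(field[3:])
    return None


def filter_by_depth(lines, min_dp=5, max_dp=30):
    """Yield header lines unchanged, plus records with min_dp <= DP <= max_dp."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # malformed record without an INFO column; skipped here
        dp = info_depth(cols[7])
        if dp is not None and min_dp <= dp <= max_dp:
            yield line


# Made-up records for illustration (INFO abbreviated).
records = [
    "##fileformat=VCFv4.2\n",
    "scaffold1\t863\t.\tC\tT\t589.77\t.\tDP=18;MQ=60.00\n",
    "scaffold1\t900\t.\tA\tG\t50.0\t.\tDP=3;MQ=60.00\n",
]
kept = list(filter_by_depth(records))
```

In this example the header and the DP=18 record are kept, while the DP=3 record falls below the window and is dropped.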

    I also have another question.
    With GATK 3 SelectVariants, I can extract the concordance or discordance between two VCF files, so I am selecting the concordance between the two VCF files from samtools mpileup and GATK UG.
    However, I ran into trouble while running SelectVariants (SV).

    Here is the command line:
    java -jar /opt/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T SelectVariants -R reference.fa -V samtools_mpileup.vcf --concordance gatk.vcf -o concordance.vcf

    It fails with an error like "Line 2954785: there aren't enough columns for line scaffold5633".
    Does that mean that GATK does not support VCFs from samtools, or do I need to process the samtools VCF further to make it usable in GATK?
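
An error of this kind can be narrowed down by scanning the VCF for data lines with fewer than the eight fixed tab-separated columns (CHROM through INFO). The sketch below is an illustration only: `find_short_lines` is a hypothetical helper, not a GATK tool, and the in-memory example file is made up.

```python
import io


def find_short_lines(handle, min_columns=8):
    """Return (line_number, column_count) for data lines with too few columns."""
    problems = []
    for number, line in enumerate(handle, start=1):
        # Header lines and empty lines are not checked.
        if line.startswith("#") or not line.strip():
            continue
        count = len(line.rstrip("\n").split("\t"))
        if count < min_columns:
            problems.append((number, count))
    return problems


# Made-up example: the last record contains only a contig name.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n"
    "scaffold147\t863\t.\tC\tT\t221.999\t.\tDP=18\tGT:PL\t1/1:255,48,0\n"
    "scaffold5633\n"
)
problems = find_short_lines(example)
```

For a real file, an open file handle can be passed instead of the `io.StringIO` object; the reported line numbers can then be compared with the line number in the error message.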

    Looking at the same scaffold in both VCF files, I see some differences between them.
    Could those differences be what prevents SV from computing the concordance between the two files?

    GATK.vcf
    scaffold147_len23570802_cov47   863     .       C       T       589.77  .       AC=2;AF=1.00;AN=2;DP=18;Dels=0.00;ExcessHet=3.0103;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=32.76;SOR=1.179  GT:AD:DP:GQ:PL  1/1:0,18:18:45:618,45,0

    samtools.vcf
    scaffold147_len23570802_cov47   863     .       C       T       221.999 .       DP=18;VDB=0.239887;SGB=-0.689466;MQSB=0.697005;MQ0F=0;AF1=1;AC1=2;DP4=0,0,11,5;MQ=46;FQ=-74.9863        GT:PL   1/1:255,48,0
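
The two records above agree on contig, position, REF and ALT; they differ in QUAL, INFO annotations and FORMAT fields. A site-level comparison can be sketched as below. This is only an illustration and makes no claim about how SelectVariants computes concordance: `site_key` and `shared_sites` are hypothetical helpers, and the INFO columns are abbreviated.

```python
def site_key(line):
    """Build a (CHROM, POS, REF, ALT) key from one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    return (cols[0], int(cols[1]), cols[3], cols[4])


def shared_sites(lines_a, lines_b):
    """Return the sorted site keys present in both sets of VCF lines."""
    def keys(lines):
        # Header lines and empty lines carry no site and are skipped.
        return {site_key(l) for l in lines if l.strip() and not l.startswith("#")}
    return sorted(keys(lines_a) & keys(lines_b))


# The two records from this thread, with INFO shortened for readability.
gatk_lines = [
    "scaffold147_len23570802_cov47\t863\t.\tC\tT\t589.77\t.\tDP=18"
    "\tGT:AD:DP:GQ:PL\t1/1:0,18:18:45:618,45,0\n"
]
samtools_lines = [
    "scaffold147_len23570802_cov47\t863\t.\tC\tT\t221.999\t.\tDP=18"
    "\tGT:PL\t1/1:255,48,0\n"
]
shared = shared_sites(gatk_lines, samtools_lines)
```

With a key built only from CHROM, POS, REF and ALT, the two records count as the same site even though their annotations differ.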

  • tytolin (Member)

    @Sheila

    OK, I am going to try ValidateVariants to find out whether there is a mistake in my samtools output.
    If I run into other problems, I will comment here.

    Thank you.

  • tytolin (Member)

    @Sheila

    The lowest --downsample_to_coverage that I could set is 200.
    I think it would be better to tell users this on the documentation page, because the run fails when I set
    --downsample_to_coverage 90.
    GATK told me that --downsample_to_coverage can only be set to 200 or higher.

    -tytolin

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)
    Accepted Answer

    @tytolin
    Hi Tytolin,

    I see. I assume you cannot downsample to a fraction because you need an exact coverage? We usually don't recommend changing the downsampling options in those tools. I will make a note in the tool doc.

    -Sheila
