SelectVariants (GATK 4.10.0) outputs a site not in the interval list (?)

Phuong_Nguyen_1 Member
edited November 2018 in Ask the GATK team

Hi,

I ran SelectVariants to output only the sites in my interval list (a list of single-base sites), which is formatted like this:

...
chr21:28407515
chr21:29497692
chr21:30101335
chr21:31599751
chr21:33096867
...

However, when I run SelectVariants with the following minimal parameters:

${gatk} SelectVariants \
-V ${sample}.f.vcf \
-L ${dir0}/mysite.intervals \
-O ${sample}.f.select.vcf \
&> log.SelectVariants.${sample}

The output VCF file contains both positions chr21:30101334 and chr21:30101335:

        ...
        chr21   29497692        rs9974441       G       A,<NON_REF>     3445.77 PASS    BaseQRankSum=0.283;DB;DP=270;ExcessHet=3.0103;GQ_MEAN=2108.00;MLEAC=1,0;MLEAF=0.500,0.00;MQRankS
        chr21   30101334        .       GA      G,<NON_REF>     0       PASS    BaseQRankSum=-0.637;DP=326;ExcessHet=3.0103;GQ_MEAN=869.00;MLEAC=0,0;MLEAF=0.00,0.00;MQRankSum=0.000;NCC
        chr21   30101335        rs2831995       A       *,G,T,<NON_REF> 3364.77 PASS    BaseQRankSum=0.967;DB;DP=320;ExcessHet=3.0103;GQ_MEAN=3272.00;MLEAC=0,1,0,0;MLEAF=0.00,0.500,0.0
        chr21   31599751        rs2832663       T       C,<NON_REF>     6836.77 PASS    DB;DP=269;ExcessHet=3.0103;GQ_MEAN=801.00;MLEAC=2,0;MLEAF=1.00,0.00;NCC=0;RAW_MQandDP=968400,269
        ...

Could you please explain why this happened and how I can get only the sites in the interval file?
Thank you very much!


Answers

  • EADG Kiel Member ✭✭✭

    Hi, it looks like you have an indel at this site, starting at 30101334 and ending at 30101335, so it overlaps your interval and is selected.

  • @EADG said:
    Hi, it looks like you have an indel at this site, starting at 30101334 and ending at 30101335, so it overlaps your interval and is selected.

    Hi,
    Yeah, I saw the indel at chr21:30101334, but it is also reported at chr21:30101335 as a MIXED variant, so it seems quite redundant. Furthermore, chr21:30101334 is not in my initial list of sites of interest. Do you know how to report ONLY variants at those predefined sites?

    Thank you !

  • EADG Kiel Member ✭✭✭

    Hi @Phuong_Nguyen_1 ,

    It is reported as mixed because there is an indel and a SNP at this position. To get only the positions in your list, without spanning indels, I would suggest writing a small bash script that greps the positions from your interval list out of the SelectVariants result file.

    If you only care about SNPs, you can prefilter your results for SNPs only.

    Greets EADG
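
The grep-style filtering suggested above could be sketched roughly as follows. This is a minimal sketch, not a GATK feature: the function name `filter_exact_sites` and the output filename in the usage comment are hypothetical, and it assumes the interval list holds one `chrom:pos` entry per line (single sites, no ranges) and that the VCF is uncompressed. It uses awk rather than grep so that the comparison is made against the CHROM and POS columns only; a plain grep for a position could also match the same digits elsewhere in a record.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only VCF records whose CHROM:POS exactly matches
# an entry in a single-site list (one "chrom:pos" per line, no ranges).
# Usage (filenames are illustrative):
#   filter_exact_sites ${dir0}/mysite.intervals ${sample}.f.select.vcf > ${sample}.f.exact.vcf
filter_exact_sites() {
    awk -F'\t' '
        NR == FNR { keep[$1] = 1; next }   # first file: remember each chrom:pos
        /^#/      { print; next }          # second file: pass VCF header lines through
        ($1 ":" $2) in keep                # records: print only on exact CHROM:POS match
    ' "$1" "$2"
}
```

With the records shown in the question, the deletion that starts at chr21:30101334 would be dropped because its POS is not in the list, while the record at chr21:30101335 would be kept.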

  • @EADG said:
    Hi @Phuong_Nguyen_1 ,

    It is reported as mixed because there is an indel and a SNP at this position. To get only the positions in your list, without spanning indels, I would suggest writing a small bash script that greps the positions from your interval list out of the SelectVariants result file.

    If you only care about SNPs, you can prefilter your results for SNPs only.

    Greets EADG

    Thank you for your suggestion, I will try the manual way.
    Best regards,
