SelectVariants (GATK 4.10.0) outputs a site not in the interval list (?)

Phuong_Nguyen_1 Member
edited November 2018 in Ask the GATK team

Hi,

I ran SelectVariants to output only the sites in my interval list (a list of single-base sites), which is in the following format:

...
chr21:28407515
chr21:29497692
chr21:30101335
chr21:31599751
chr21:33096867
...

However, I ran SelectVariants with the following minimal parameters:

${gatk} SelectVariants \
-V ${sample}.f.vcf \
-L ${dir0}/mysite.intervals \
-O ${sample}.f.select.vcf \
&> log.SelectVariants.${sample}

The output VCF file contains both positions chr21:30101334 and chr21:30101335:

        ...
        chr21   29497692        rs9974441       G       A,<NON_REF>     3445.77 PASS    BaseQRankSum=0.283;DB;DP=270;ExcessHet=3.0103;GQ_MEAN=2108.00;MLEAC=1,0;MLEAF=0.500,0.00;MQRankS
        chr21   30101334        .       GA      G,<NON_REF>     0       PASS    BaseQRankSum=-0.637;DP=326;ExcessHet=3.0103;GQ_MEAN=869.00;MLEAC=0,0;MLEAF=0.00,0.00;MQRankSum=0.000;NCC
        chr21   30101335        rs2831995       A       *,G,T,<NON_REF> 3364.77 PASS    BaseQRankSum=0.967;DB;DP=320;ExcessHet=3.0103;GQ_MEAN=3272.00;MLEAC=0,1,0,0;MLEAF=0.00,0.500,0.0
        chr21   31599751        rs2832663       T       C,<NON_REF>     6836.77 PASS    DB;DP=269;ExcessHet=3.0103;GQ_MEAN=801.00;MLEAC=2,0;MLEAF=1.00,0.00;NCC=0;RAW_MQandDP=968400,269
        ...

Could you please explain why this happened, and how I can get only the sites in my interval file?
Thank you very much!

Answers

  • EADG Kiel Member ✭✭✭

    Hi, it looks like you have an indel at this site, starting at 30101334 and ending at 30101335, so it is selected.
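
    To illustrate the point about the indel's start and end: a VCF record covers the reference positions POS through POS + length(REF) - 1, so the `GA` deletion at 30101334 reaches the requested site 30101335. Below is a minimal sketch of that arithmetic, assuming a tab-separated VCF; `vcf_span` is a hypothetical helper name, not a GATK command.

```shell
# Print CHROM, start and end of each VCF record read from stdin.
# The end is computed as POS + length(REF) - 1 (the reference span).
# vcf_span is a hypothetical helper used only for illustration.
vcf_span() {
    awk -F'\t' -v OFS='\t' '!/^#/ { print $1, $2, $2 + length($4) - 1 }'
}

# The deletion record from the question (INFO column shortened to "."):
printf 'chr21\t30101334\t.\tGA\tG,<NON_REF>\t0\tPASS\t.\n' | vcf_span
# prints (tab-separated): chr21  30101334  30101335
```

    Because the computed span includes 30101335, the record overlaps the interval `chr21:30101335` even though it starts one base earlier.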

  • @EADG said:
    Hi, it looks like you have an indel at this site, starting at 30101334 and ending at 30101335, so it is selected.

    Hi,
    Yes, I saw the indel at chr21:30101334, but it is also reported at chr21:30101335 as a MIXED variant, so it seems quite redundant. Furthermore, chr21:30101334 is not in my initial list of sites of interest. Do you know how to report ONLY variants at those predefined sites?

    Thank you !

  • EADG Kiel Member ✭✭✭

    Hi @Phuong_Nguyen_1 ,

    It is reported as mixed because there is an indel and a SNP at this position. To get only the positions in your list, without spanning indels, I would suggest writing a small bash script that greps the positions from your interval list out of the SelectVariants result file.

    If you only care about SNPs, you can pre-filter your result for SNPs only.

    Greets EADG

  • @EADG said:
    Hi @Phuong_Nguyen_1 ,

    It is reported as mixed because there is an indel and a SNP at this position. To get only the positions in your list, without spanning indels, I would suggest writing a small bash script that greps the positions from your interval list out of the SelectVariants result file.

    If you only care about SNPs, you can pre-filter your result for SNPs only.

    Greets EADG

    Thank you for your suggestion, I will try the manual way.
    Best regards,
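
    The manual way suggested above could be sketched roughly as follows. This is only an illustration, not a GATK feature: it assumes the interval list holds one `chr:pos` entry per line (as in the question) with no trailing whitespace, and that the VCF is uncompressed and tab-separated. `filter_exact_sites` and the output file name in the usage comment are hypothetical.

```shell
# Keep VCF header lines plus only those records whose CHROM:POS
# appears verbatim in the interval list.
# Usage: filter_exact_sites <interval_list> <vcf>
# filter_exact_sites is a hypothetical helper, not part of GATK.
filter_exact_sites() {
    awk -F'\t' '
        NR == FNR { if ($0 != "") keep[$0] = 1; next }  # first file: load chr:pos entries
        /^#/      { print; next }                       # second file: pass header lines through
        ($1 ":" $2) in keep                             # print records that start at a listed site
    ' "$1" "$2"
}

# Example call with the file names from the question (output name is made up):
# filter_exact_sites "${dir0}/mysite.intervals" "${sample}.f.select.vcf" > "${sample}.f.exact.vcf"
```

    This drops the record starting at chr21:30101334 because its start position is not in the list. The record at chr21:30101335 is kept unchanged, including its `*` allele, which is how VCF marks an upstream deletion spanning the site.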