ami

About

Username
ami
Joined
Visits
344
Last Active
Roles
Member
Points
45
Badges
7

Comments

  • For the variant discovery pipeline specifically, we remove duplicate reads for the same reason you mentioned: to remove sequencing errors that might be duplicated in the PCR process and thus be counted as real variants. The assumption is that if t…
  • It has been a while since I did it, but as far as I remember you can put the different steps in one command line with | between them. For example: java -Xmx8G -jar /seq/software/picard/current/bin/picard.jar SamToFastq .... | bwa-mem... | gzip HL w…
  • Hi, 1) Since this pipeline is mainly for variant calling in RNA-seq, we found that duplicates add more errors than benefits. However, for many RNA-seq applications it is better to keep the duplicates. If there is a real variant in the RNA…
  • Two comments: * The interleaved file is not something that most people expect to see, so there should be some explanation of it, and also an option not to use it in case people want to use another aligner. * The more efficient way is to stream the o…
  • @brohawndg also, please note that the new version of STAR can also add RG while you run STAR, if you prefer to do it in that step.
  • Just to let people on this thread know that 3.4 with the ASE tool was released last week. @tommycarstensen @Kath
  • Just a thought - @Sheila do you think @amitm can use GGA mode? We never tested it with RNA-seq, but in theory it should work and might be able to correctly call those positions. I'm also not sure whether in this case the HC will ignore any minimum thresho…
  • @Sheila the link to DepthOfCoverage in the main article is broken.
  • @Kath it will be available at the beginning of next week, after I merge it into the main code base.
  • @sirian did you generate the HC BAM output and check it? I'm interested to know whether you found out what was going wrong at that site.
  • Hi @santayana 1) Since it is an alignment issue, I just wanted to make sure you are using 2-pass STAR and using the SJ from the first round in the second round. Are you? 2) That type of problem is supposed to be fixed by SplitNCigarReads and not by …
  • @santayana can you upload an IGV screenshot of such an example? I don't think you are thinking about "dangling heads" in the same way that we are, but I want to be sure. [I will let @Geraldine_VdAuwera reply about any future developm…
  • @corlagon - thanks, we are aware of all those changes since we keep in touch with Alex. I assume that in the next versions of the pipeline some or all of those changes will be included (in fact Alex included some of those options after our…
  • @GooderPanQi, we never tried that ourselves, but there is no reason why it should not work. You should probably follow the DNA pipeline, using GVCFs and the RNA-specific parameters. We were just discussing it with our friends in Mount Sana…
  • Hi @Patrick_Deelen, we do not consider the known RNA editing sites to be FPs, so we do not filter them out. In fact, one use of our pipeline is to find RNA editing events. One can filter out (using -XL for example or other ways that @Ger…
  • @s6juncheng We do plan to provide such an option/tool. We are already doing something like that with allele-specific expression (I'm currently working on that) and are collaborating with a few groups to do the same with RNA editing. When we have tools that a…
  • @sirin - yes, I meant using reference annotations in STAR alignment (although it might be useful to have an option for that input in SplitNCigarReads, thanks for the idea). The problem is that you will limit yourself to the known annotations, which …
  • Hi @sirian "I actually don't quite understand how splitNcigarReads can remove "intronic overhang" if it is before the N cigar. How does it know it is intronic, without any annotation information" You are right, the tool does no…
  • @h_asif It is called homozygous for the alternate allele (hom-var). However, the quality of the call and of the genotype is low, since you only have a coverage of 3 reads at that site. As you can see in the PL, the differences between the hom-var and …
  • For the RNA data, I don't think it is relevant to use the exome data as training, as you suggested. VQSR learns the error mode (and not the labels of what is true or false), so I expect that since RNA and DNA probably have different error modes…
  • Hi @vsmarwah, It is a very specific question about STAR options and I wouldn't want to answer without being 100% sure about it. I think you should ask the developer of the STAR tool. [However, your question gave me an idea, and I would suggest to …
  • Hi (again) @Mikebesanski, The only (critical) reason that we don't have such recommendations is that we haven't evaluated such a project (with more than one sample simultaneously), so we can't share any conclusions. Again, the best approach is probably …
  • Hi @Mikebesanski, I'm also not sure what the effect of downsampling is on the allele-specific expression analysis, although I think you won't see big differences whether you use it or not (since the random downsampling should not change the ratio betwee…
  • @nbahlis I will just add to @Geraldine_VdAuwera and say that we are working with the Broad's cancer group to combine our best practices pipeline with their tools.
  • @yg1 yes, it will work fine.
  • @Keifa_1983 why do you use BWA? If you are using bwa-mem it can work, but you should know that it was not meant to be run on RNA-seq data. I expect that you will get many false calls for that reason (although BWA-mem splits reads, it does not…
  • @yasinkaymaz - thanks again for your comments. When you have time, can you please generate a snapshot of those hot spots... we have mentioned them a few times, and I'm curious to see what they actually look like. Thanks!
  • @agout we will be glad to hear your impressions after you try and compare the pipelines. I did point out a few differences that I expect to see in my comment to @yasinkaymaz on April 18.
  • @mmterpstra, thanks for the suggestions. We haven't tested all the STAR parameters extensively, but we do discuss them with Alex Dobin (the developer of STAR), and we appreciate any suggestions based on our users' experience, so thanks again. btw - I as…
  • Using Tophat2 might create the "hot spots", but it would be good to verify that when you have some data for us to check. You can also try 2-pass STAR and see if you still see that problem. We appreciate your feedback, thanks.
  • Hi @yasinkaymaz, We do not use any known annotated junctions, to avoid issues like the one you mentioned ("I don't think they truly account for novel genes/splice junctions, which will introduce noise") but instead we use the first pass of the ali…
  • Hi @yasinkaymaz, we didn't evaluate it, but we collaborate with that team in order to compare and improve both pipelines together. I think that the main differences will be in the regions that are close to the splice positions, since they are filter…
  • Based on our results, it helped remove many FP calls close to splicing positions. I would recommend using it. If you do and have your conclusions based on the calls with and without it, please share them with us! If you are planning on usin…
  • @sboyle how many reads do you have? When I tested it on 250M reads, I got that several times (~170 locations), so it is expected, but I would not worry about it; it should not have an impact on the variant calling results. If you really care about…
  • @davidpz Can you explain a little bit more what you are trying to do? Do you just want to check the genotypes of the new pipeline given the known alleles from the Sanger Institute (as you wrote: " I used SNPs and INDELs reference for NOD from …
  • @zzg which aligner are you using? Such a CIGAR string does not make any sense, so I would be suspicious of such results... You can write a GATK walker to filter such reads (I don't think that any of the tools can do it right now). Even if you do filter thes…
  • Everything you describe is right. The spliced aligner adds N's to the CIGAR only for the gap between 2 exons, and this is the part that we remove with SplitNCigarReads, so you don't (implicitly) lose information and there is no part of the read t…
  • @yasinkaymaz - can you give an example of what you mean? I'm not sure I understand what you mean by "This way tool gets rid of possible miss-matches"
  • Hi @yasinkaymaz, Since the pipeline is currently recommended for a single sample (as we have not yet tested it with many samples), we do not have recommendations or enough experience with VQSR for that pipeline (you need to see enough data in order to …
  • @zzq : Just to add more info to @Geraldine_VdAuwera's answer - if you care only about SNPs, Tophat will perform only slightly worse than STAR (** based on our data **). Probably most of the differences will be in variants that are close to the splicin…
  • Hi @NicolasRobine, thanks for pointing out the error in the STAR command. We did not use a known annotation file as you suggest since we would like to have a pipeline that is not dependent on such annotations (that are constantly being updated and…
  • Hi, I'm working on testing and creating a best practices pipeline for RNAseq data and I would be happy to hear and learn what tools and protocols you use. What do you consider to be a high-quality set, and how do you evaluate your results? We can di…
  • (Quote) Did you have a chance to do it? We are currently looking at RNA data and it might be good to look at those examples. Thanks, Ami
  • Just as a technical note, we do run it all in parallel. For the 26K exomes, we ran joint calling on each chromosome and scattered each chromosome into 1000 jobs (since UG can work on each locus separately). You can do it relatively easily with Queue (th…
  • I don't have much experience with 4x WGS data (1KG), but I did run joint calling on 26,000 exomes about 6-7 months ago, and all the conclusions were that the variant calling got better as we added more samples. In fact, currently, we (not…
  • Thanks for letting us know (both that it works fine and about the typo).
  • Hi, The new walker is called CoveredByNSamplesSites and I hope it will help you in your tasks. As far as I know, I'm the only one who has used it so far, and that was before most of the changes in the last GATK version were made, so please try it and let…
  • Just in case you still need help with that issue, I just wrote a walker that allows you to print out the sites (as intervals) where more than X% of the samples have at least Y coverage (based on their DP, as Eric suggested). This walker will be part of the ne…
  • Just a quick note: you can use both -nt and -nct in GATK 2.2+ (each one by itself or even together). They provide different types of parallelization. BQSR on the other hand can work only with -nct, and this is the reason for the error you got.
  • Hi Martin, I fixed this issue and it will be part of the new version (2.3) probably next week. In cases where you try to use the PRIORITISE mode and -priority is not specified, GATK now emits the proper error message. (I also changed some of the r…
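
The site filter described in the comment about the new coverage walker (keep sites where more than X% of the samples have at least Y coverage, judged by DP, and print them as intervals) could be sketched roughly as below. This is only an illustration of the idea, not the walker's actual code: the function names, the input layout (a dict of per-sample DP values per site), the strict "more than" comparison, and treating a missing DP as uncovered are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Site = Tuple[str, int]  # (contig, 1-based position); hypothetical layout


def is_well_covered(sample_dps: Sequence[Optional[int]],
                    min_dp: int, min_percent: float) -> bool:
    """True if more than min_percent of the samples have DP >= min_dp.

    Assumption: a missing DP (None) counts as not covered.
    """
    if not sample_dps:
        return False
    covered = sum(1 for dp in sample_dps if dp is not None and dp >= min_dp)
    return 100.0 * covered / len(sample_dps) > min_percent


def covered_intervals(site_dps: Dict[Site, Sequence[Optional[int]]],
                      min_dp: int, min_percent: float) -> List[str]:
    """Return passing sites as 'contig:start-end', merging adjacent positions."""
    passing = sorted(site for site, dps in site_dps.items()
                     if is_well_covered(dps, min_dp, min_percent))
    runs: List[list] = []
    for contig, pos in passing:
        if runs and runs[-1][0] == contig and runs[-1][2] == pos - 1:
            runs[-1][2] = pos  # extend the current run of adjacent sites
        else:
            runs.append([contig, pos, pos])
    return [f"{contig}:{start}-{end}" for contig, start, end in runs]


if __name__ == "__main__":
    # Toy data: four samples per site (made up for illustration).
    example = {
        ("1", 100): [12, 30, 8, 25],
        ("1", 101): [15, 22, 11, 9],
        ("1", 102): [3, 4, 20, 2],
        ("2", 50): [10, 10, 10, 10],
    }
    for interval in covered_intervals(example, min_dp=10, min_percent=70.0):
        print(interval)
```

With the toy data, sites 1:100 and 1:101 each have 3 of 4 samples at DP >= 10 and are merged into one interval, 1:102 is dropped, and 2:50 passes on its own.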