Possible inconsistency in GATK 4.beta.6 source code
Hi GATK Team,
We are porting GATK4 to run on GPUs. While using HaplotypeCaller in GATK 4.beta.6, we found an inconsistency in the behavior of clipRead() in ReadClipper.java.
If the read does not require clipping (ops == null), clipRead() returns the original read; otherwise, it returns a clipped copy. This leads to inconsistent behavior for callers such as finalizeRegions() in AssemblyBasedCallerUtils.java: the clippedRead variable there is sometimes a copy of the original read and sometimes the original read itself. Later in that function, the base qualities of clippedRead are sometimes modified, and when clipRead() has returned the original read, that modification is applied to the original read. If the original read is then used in another assembly region, it carries the adjusted quality scores from the previous region. For reads where a copy was created, by contrast, the changes do not propagate from one region to another.
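The aliasing described above can be illustrated with a minimal, self-contained sketch. It does not use any GATK classes: Read, clipAsReported(), clipAlwaysCopy(), and capQuals() are hypothetical stand-ins that only mimic the reported pattern (return the same instance when nothing is clipped, a copy otherwise, followed by an in-place quality adjustment). Returning a copy in every case is shown as one possible way to make the behavior uniform, not as the fix GATK should necessarily adopt.

```java
import java.util.Arrays;

public class ClipAliasingSketch {

    /** Hypothetical stand-in for a read; only base qualities matter here. */
    public static final class Read {
        public final byte[] quals;

        public Read(byte[] quals) {
            this.quals = quals;
        }
    }

    /**
     * Mimics the reported behavior: the SAME instance comes back when
     * nothing needs clipping, and a new (clipped) instance otherwise.
     */
    public static Read clipAsReported(Read read, int clipFromStart) {
        if (clipFromStart == 0) {
            return read; // aliased: caller now shares state with the original
        }
        return new Read(Arrays.copyOfRange(read.quals, clipFromStart, read.quals.length));
    }

    /** One possible alternative (an assumption): always return an independent copy. */
    public static Read clipAlwaysCopy(Read read, int clipFromStart) {
        return new Read(Arrays.copyOfRange(read.quals, clipFromStart, read.quals.length));
    }

    /** Stand-in for a per-region quality adjustment that mutates in place. */
    public static void capQuals(Read read, byte cap) {
        for (int i = 0; i < read.quals.length; i++) {
            if (read.quals[i] > cap) {
                read.quals[i] = cap;
            }
        }
    }

    public static void main(String[] args) {
        // Region 1 with the reported behavior: no clipping needed, so the
        // adjustment leaks into the original read.
        Read original = new Read(new byte[] {30, 40, 35});
        capQuals(clipAsReported(original, 0), (byte) 20);
        System.out.println(Arrays.toString(original.quals)); // [20, 20, 20]

        // Same scenario when a copy is always returned: the original is untouched.
        Read original2 = new Read(new byte[] {30, 40, 35});
        capQuals(clipAlwaysCopy(original2, 0), (byte) 20);
        System.out.println(Arrays.toString(original2.quals)); // [30, 40, 35]
    }
}
```

In the first case, a second region that receives the original read would start from qualities already adjusted by the first region, which matches the symptom reported here.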
In the test cases where we found this issue, the same read had different base qualities at the start of processing in different regions. This behavior can affect the final VCF output.
Please let us know if this is the intended behavior. We would be happy to help with a minimal test case if required.
-- Ankit Sethia