Recommended way to compare GC bias across many samples?

JonR (Indiana, Member)
edited November 2017 in Ask the GATK team

I'm trying to compare the GC bias of hundreds of sequenced genomes. What's the best way to do this numerically? Parsing hundreds of PDFs will take too long and has its own problems. Is there a single summary statistic, something like R^2, that would make this easier?

SOFTWARE: Picard CollectGcBiasMetrics

OUTPUTS:

gc.summary_metrics.txt
METRICS CLASS picard.analysis.GcBiasSummaryMetrics

ACCUMULATION_LEVEL WINDOW_SIZE TOTAL_CLUSTERS ALIGNED_READS AT_DROPOUT GC_DROPOUT SAMPLE LIBRARY READ_GROUP
All Reads 100 1323882 2647157 2.589757 0.116276
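
Since the summary file already has two per-sample numbers (AT_DROPOUT and GC_DROPOUT), one option is to pull those out of every file and compare them in a single table. Here is a minimal sketch of that. It assumes the usual Picard metrics layout ('#'-prefixed header lines, then a tab-delimited header row and data rows); the function names and the paths passed to collect() are made up for illustration.

```python
import csv

def read_picard_metrics(lines):
    """Parse the metrics table of a Picard metrics file into a list of dicts.

    Assumption: '#'-prefixed header lines, then a tab-delimited column
    header row followed by the data rows.
    """
    body = [ln.rstrip("\n") for ln in lines
            if ln.strip() and not ln.startswith("#")]
    return list(csv.DictReader(body, delimiter="\t"))

def dropouts(lines):
    """Return (AT_DROPOUT, GC_DROPOUT) from the 'All Reads' row."""
    for row in read_picard_metrics(lines):
        if row["ACCUMULATION_LEVEL"] == "All Reads":
            return float(row["AT_DROPOUT"]), float(row["GC_DROPOUT"])
    raise ValueError("no 'All Reads' row found")

def collect(paths):
    """Map each summary-metrics path (hypothetical) to its dropout pair."""
    result = {}
    for path in paths:
        with open(path) as fh:
            result[path] = dropouts(fh)
    return result
```

Calling collect() on a list of summary files (for example from glob.glob) gives one (AT_DROPOUT, GC_DROPOUT) pair per sample, which can then be sorted or plotted to spot outliers.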

gc_bias_metrics.txt
METRICS CLASS picard.analysis.GcBiasDetailMetrics

ACCUMULATION_LEVEL GC WINDOWS READ_STARTS MEAN_BASE_QUALITY NORMALIZED_COVERAGE ERROR_BAR_WIDTH SAMPLE LIBRARY READ_GROUP
All Reads 0 0 0 0 0 0
All Reads 1 0 0 0 0 0
All Reads 2 0 0 0 0 0
All Reads 3 0 0 0 0 0
All Reads 4 0 0 0 0 0
All Reads 5 0 0 0 0 0
All Reads 6 0 0 0 0 0
All Reads 7 0 0 0 0 0
...
...
All Reads 33 23848 12085 32 0.891584 0.00811
All Reads 34 27900 14102 32 0.889291 0.007489
All Reads 35 31859 15956 32 0.88117 0.006976
All Reads 36 36192 18072 31 0.878539 0.006535
All Reads 37 40135 20656 32 0.905504 0.0063
All Reads 38 46124 23464 32 0.89504 0.005843
All Reads 39 53670 27079 31 0.887705 0.005395
All Reads 40 63298 32882 31 0.913978 0.00504
All Reads 41 73198 37960 31 0.912419 0.004683
All Reads 42 86143 44950 31 0.918073 0.00433
...
...
(110 lines in total)
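
The detail file could also be boiled down to one number per sample. The sketch below computes a WINDOWS-weighted mean of |NORMALIZED_COVERAGE - 1| over the GC bins, so 0 would mean perfectly flat coverage. To be clear, this score is my own assumption and not something Picard reports, and the function names are hypothetical. It again assumes a tab-delimited table with '#'-prefixed header lines.

```python
import csv

def load_detail_rows(lines):
    """Return the 'All Reads' rows of a GcBiasDetailMetrics table as dicts."""
    body = [ln.rstrip("\n") for ln in lines
            if ln.strip() and not ln.startswith("#")]
    return [row for row in csv.DictReader(body, delimiter="\t")
            if row["ACCUMULATION_LEVEL"] == "All Reads"]

def gc_bias_score(rows):
    """WINDOWS-weighted mean absolute deviation of NORMALIZED_COVERAGE from 1.

    Not a Picard metric; just one possible single-number summary.
    Bins with WINDOWS == 0 contribute nothing to the score.
    """
    total_windows = sum(float(r["WINDOWS"]) for r in rows)
    if total_windows == 0:
        return float("nan")
    weighted_dev = sum(
        float(r["WINDOWS"]) * abs(float(r["NORMALIZED_COVERAGE"]) - 1.0)
        for r in rows
    )
    return weighted_dev / total_windows
```

Weighting by WINDOWS keeps the rare extreme-GC bins, which are noisy, from dominating the score. An unweighted mean or a correlation against a reference sample's curve would be equally reasonable alternatives.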

