Have a look at HdrHistogram:
https://hdrhistogram.github.io/HdrHistogram/.
There are implementations for all kinds of languages.
What it effectively gives you is a history of latency distributions. So you could keep a latency distribution per second, and if you run a benchmark for 60 seconds you end up with 60 latency distributions of 1 second each. With HdrHistogram you can calculate the percentiles and other statistics per time window, and then you can add logic to detect anomalies in whatever statistic you care about, e.g. you can easily detect whether there are 10 consecutive windows with a too-high p99. You can also aggregate the histograms into latency distributions per minute/hour/day/week etc., so you don't need to deal with large quantities of tiny histograms.
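As a rough sketch (in Java, assuming latencies are recorded in microseconds; the 50 ms p99 threshold and the 10-window rule are made-up numbers for illustration), per-second windows can be captured with HdrHistogram's Recorder and checked as they come in:

```java
import org.HdrHistogram.Histogram;
import org.HdrHistogram.Recorder;

public class WindowedLatency {
    public static void main(String[] args) throws InterruptedException {
        // Track values up to 1 hour in microseconds, 3 significant digits.
        Recorder recorder = new Recorder(3_600_000_000L, 3);

        // ... the load generator's worker threads would call
        // recorder.recordValue(latencyMicros) while this loop runs ...

        int consecutiveBadWindows = 0;
        for (int second = 0; second < 60; second++) {
            Thread.sleep(1000);
            // Swap out and fetch the histogram covering the last 1-second window.
            Histogram window = recorder.getIntervalHistogram();
            long p99Micros = window.getValueAtPercentile(99.0);
            // Hypothetical anomaly rule: 10 consecutive windows with p99 > 50 ms.
            consecutiveBadWindows = (p99Micros > 50_000) ? consecutiveBadWindows + 1 : 0;
            if (consecutiveBadWindows >= 10) {
                System.out.println("p99 too high for 10 consecutive seconds at t=" + second + "s");
            }
        }
    }
}
```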
The nice thing is that you can merge these windows into a final latency distribution and determine e.g. your percentiles, or drop e.g. the warmup and cooldown. But you can also zoom into a particular region: if, say, a compaction is causing problems at a particular moment, you can zoom into exactly that section.
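A minimal sketch of that aggregation, assuming you kept the 60 per-second histograms in a list; the warmup/cooldown bounds and the compaction window below are made-up numbers:

```java
import java.util.List;
import org.HdrHistogram.Histogram;

public class Aggregate {
    // Merge the per-second histograms in [fromSecond, toSecond) into one distribution.
    static Histogram merge(List<Histogram> perSecond, int fromSecond, int toSecond) {
        Histogram total = new Histogram(3_600_000_000L, 3);
        for (int i = fromSecond; i < toSecond; i++) {
            total.add(perSecond.get(i)); // histogram counts merge exactly
        }
        return total;
    }

    // e.g. drop a 5-second warmup and cooldown:  merge(perSecond, 5, 55)
    // or zoom into a suspected compaction at t=30..35s: merge(perSecond, 30, 35)
}
```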
There are some other nice properties: because you have the full latency distribution, you can merge the latency distributions of multiple load generators. The common approach I see is that engineers average the percentiles of the two load generators, but that is mathematically incorrect.
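For example, merging the two histograms and then reading the percentile from the combined distribution gives the true overall p99, which in general is not the average of the two individual p99 values:

```java
import org.HdrHistogram.Histogram;

public class MergeLoadGenerators {
    static long combinedP99(Histogram fromGenerator1, Histogram fromGenerator2) {
        Histogram combined = new Histogram(3_600_000_000L, 3);
        combined.add(fromGenerator1);
        combined.add(fromGenerator2);
        // The correct combined percentile; (p99_a + p99_b) / 2 is not this value in general.
        return combined.getValueAtPercentile(99.0);
    }
}
```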
If you are doing a latency test, make sure you also deal correctly with coordinated omission. If you don't, the worst latencies in your benchmark are omitted and you will falsely assume your system is behaving better than it actually is. You can find some presentations by Gil Tene (the author of HdrHistogram) on this topic on YouTube.
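HdrHistogram has a built-in way to compensate while recording; a minimal sketch, assuming a hypothetical target rate of 1000 requests/second (i.e. one request every 1000 microseconds):

```java
import org.HdrHistogram.Histogram;

public class CoordinatedOmission {
    public static void main(String[] args) {
        long expectedIntervalMicros = 1_000; // hypothetical: 1000 requests/second
        Histogram histogram = new Histogram(3_600_000_000L, 3);

        long latencyMicros = 250_000; // one 250 ms stall observed by the load generator
        // Records the stall itself plus synthetic samples for the requests that
        // would have been issued (and delayed) during it, at the expected interval.
        histogram.recordValueWithExpectedInterval(latencyMicros, expectedIntervalMicros);

        // Without correction this stall would count as a single slow sample;
        // with correction the ~250 delayed requests it implies are also recorded.
        System.out.println("total samples: " + histogram.getTotalCount());
    }
}
```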