Bill Duncan, in a blog post on event-pair latency, “Deep Dive, EPL Dotplots”, showed a different way to look at a time series that includes latency or response times.

When I’m doing load tests (and I did a lot last year) I normally take a pair of plots like these as my raw data. These are from a disk on my test machine, Calvin:

It’s pretty obvious that some time around 15:49, latency started to rise, and that is roughly the point at which the offered load was 300 requests per second.

If you plot latency against the load, you can then get a plot that looks like this:

Hmmn, that looks like the problem starts earlier, at the proverbial 80% of the system’s capacity, somewhere below 300 requests per second. Perhaps even down around 200, by eye. Maybe I should drill down…

And yes, it does look like something is happening around 200 TPS that’s keeping the transaction from completing immediately. Something like queuing.

However, this is a pretty approximate way to find the elbow in the curve. A proper queuing network model would be better, but let’s pretend I don’t have one: few enough folks do.

Bill points out that the amount of delay is important, and that it can be exposed by plotting the number of transactions that started in a second versus the number that completed. That’s easy to generate with awk and a spreadsheet, so I created a data set that looked like
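For anyone who would rather skip the awk-and-spreadsheet step, the same counting can be sketched in Python. This is only an illustration under assumptions: the input format (a start time in seconds plus a latency in seconds per transaction) and the name `started_and_completed` are hypothetical, not taken from the original data or from Bill's tooling.

```python
from collections import Counter


def started_and_completed(records):
    """Count transactions started and completed in each one-second bucket.

    `records` is an iterable of (start_time_seconds, latency_seconds)
    pairs. This input layout is an assumption for illustration only.
    Returns a list of (second, started, completed) tuples, including
    seconds in which nothing started or completed.
    """
    started = Counter()
    completed = Counter()
    for start, latency in records:
        started[int(start)] += 1
        # A transaction completes in the second where start + latency falls.
        completed[int(start + latency)] += 1
    if not started:
        return []
    # Latencies are assumed non-negative, so the first start is the
    # earliest second and the last completion is the latest one.
    seconds = range(min(started), max(completed) + 1)
    return [(s, started[s], completed[s]) for s in seconds]
```

Each row of the result gives one second's “started” and “completed” counts, which are the two series to plot against each other.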

As I’m doing a load test, I know the “started” will be a straight line, the one that goes from 0 to 400 in the very first diagram, but what does the “ended” line look like?

So I plotted it, and lo and behold, I got this:

The started transactions are the blue dots: the completed ones are the red dots, and they lag farther and farther behind the blue line until, at 200 transactions requested per second, they flatten out entirely.

After 200 TPS the system is in *A Bad State*, and we’re into significant queuing. In this diagram, the critical piece of information is one of the easiest to see, instead of one of the hardest. It’s far more obvious than an elbow in the curve which I try to draw through a cloud of points.
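One way to put a number on how far the completions lag is to track the backlog: the running total of transactions started minus the running total completed. The sketch below assumes two equal-length lists of per-second counts; the function name `backlog` is hypothetical and this is not part of Bill's method.

```python
from itertools import accumulate


def backlog(started, completed):
    """Transactions still in flight at the end of each second.

    `started` and `completed` are per-second counts of equal length
    (an assumed input layout). The backlog is the difference between
    the two running totals.
    """
    return [s - c for s, c in zip(accumulate(started), accumulate(completed))]
```

A backlog that hovers near a constant suggests the system is keeping up; one that grows second after second is the queuing that shows up as the completed line flattening out.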

So, for both diagnosis, in Bill’s paper, and modelling, in mine, this is a new and useful way of representing time delays in time-series plots.


Thanks for the mention, Dave! Good analysis, but the EPL dotplot style you have is not really what I’d designed. The Y axis should be “latency”. Each event gets its own dot; they are not binned. (Not exactly true, as they bin on the resolution, but close.) You could add counts in the Z axis if you wanted. I can show you examples of this which “look” cool, but really don’t add much value in the end.

Nevertheless, good analysis. The trick with looking at system “capacity” is that it’s very much like a “hockey stick”. As you go beyond the proverbial 80% level, latency can increase dramatically. This is due to queuing theory, probabilities and the fact that arrivals are random, much like a Poisson process. (I.e., if the arrivals were consistent and evenly spaced, there wouldn’t be the problem.)
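The hockey stick can be illustrated with the simplest textbook queue. The sketch below assumes an M/M/1 model (one server, Poisson arrivals, exponential service times), which is an assumption for illustration and not a model of the disk in the post. In that model the mean response time is the service time divided by one minus the utilisation.

```python
def mm1_response_time(service_time, utilisation):
    """Mean response time of an M/M/1 queue: R = S / (1 - rho).

    Assumes a single server with Poisson arrivals and exponential
    service times. `utilisation` (rho) must be in [0, 1); at 1 or
    above the queue grows without bound.
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in the range [0, 1)")
    return service_time / (1.0 - utilisation)


if __name__ == "__main__":
    # Print how the response time stretches as utilisation rises.
    for rho in (0.2, 0.5, 0.8, 0.9, 0.95):
        print(f"utilisation {rho:.2f}: {mm1_response_time(1.0, rho):.1f} x service time")
```

Under these assumptions the response time is twice the service time at 50% utilisation, five times at 80% and ten times at 90%, which is the steep part of the stick.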

Cheers!
