Bill Duncan, in a blog on event-pair latency, Deep Dive, ETL Dotplots, showed a different way to look at a time series that includes latency or response times.
When I’m doing load tests (and I did a lot last year) I normally take a pair of plots like these as my raw data. These are from a disk on my test machine, Calvin:
It’s pretty obvious that some time around 15:49, latency started to rise, and that is roughly the point at which the offered load was 300 requests per second.
If you plot latency against the load, you can then get a plot that looks like this:
Hmmn, that looks like the problem starts earlier, at the proverbial 80% of the system’s capacity, somewhere below 300 requests per second. Perhaps even down around 200, by eye. Maybe I should drill down…
And yes, it does look like something is happening around 200 TPS that’s keeping transactions from completing immediately. Something like queuing.
However, this is a pretty approximate way to find the elbow in the curve. A proper queuing network model would be better, but let’s pretend I don’t have one: few enough folks do.
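To show just how approximate the by-eye approach is, here is a minimal sketch of it in Python. It is not the method I used: the function name `first_slow_load`, the 50-request bin width and the "twice the baseline" threshold are all assumptions picked for illustration, and the input is assumed to be (offered load, latency) pairs.

```python
from statistics import median

def first_slow_load(samples, bin_width=50, factor=2.0):
    """Crude elbow estimate (illustrative only).

    samples: iterable of (offered_load_tps, latency_seconds) pairs.
    Groups samples into load bins of bin_width and returns the start of
    the first bin whose median latency exceeds factor times the median
    latency of the lowest-load bin. Returns None if no bin qualifies.
    """
    bins = {}
    for load, latency in samples:
        bins.setdefault(int(load // bin_width) * bin_width, []).append(latency)
    if not bins:
        return None
    ordered = sorted(bins)
    baseline = median(bins[ordered[0]])  # latency when lightly loaded
    for start in ordered:
        if median(bins[start]) > factor * baseline:
            return start
    return None

# Made-up sample data, not measurements from Calvin.
samples = [(10, 0.010), (40, 0.010), (60, 0.011), (120, 0.012),
           (160, 0.015), (210, 0.050), (260, 0.200)]
print(first_slow_load(samples))
```

The answer moves around with the bin width and the threshold, which is exactly the weakness of hunting for an elbow in a cloud of points.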
Bill points out that the amount of delay is important, and that it can be exposed by plotting the number of transactions that started in a second versus the number that completed. That’s easy to generate with awk and a spreadsheet, so I created a data set that looked like
As I’m doing a load test, I know the “started” will be a straight line, the one that goes from 0 to 400 in the very first diagram, but what does the “ended” line look like?
So I plotted it, and lo and behold, I got this:
The started transactions are the blue dots: the completed ones are the red dots, and they lag farther and farther behind the blue line until, at 200 transactions requested per second, they flatten out entirely.
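I built the counts with awk and a spreadsheet, but the same bookkeeping can be sketched in a few lines of Python. This assumes each record is a (start time, latency) pair in seconds, which may not match your log format, and `per_second_counts` is a name made up for this example.

```python
from collections import Counter

def per_second_counts(transactions):
    """Count transactions started and completed in each whole second.

    transactions: iterable of (start_time, latency) pairs, both in seconds
    (an assumed input format). Returns a list of
    (second, started, ended) rows covering the whole run.
    """
    started = Counter()
    ended = Counter()
    for start, latency in transactions:
        started[int(start)] += 1
        ended[int(start + latency)] += 1  # completion time = start + latency
    if not started:
        return []
    first = min(min(started), min(ended))
    last = max(max(started), max(ended))
    return [(s, started[s], ended[s]) for s in range(first, last + 1)]

# Made-up records: two start in second 0, two in second 1.
rows = per_second_counts([(0.1, 0.2), (0.5, 0.7), (1.2, 0.1), (1.9, 1.5)])
for row in rows:
    print(row)
```

Plotting the second and third columns against the first gives the two sets of dots.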
After 200 TPS the system is in A Bad State, and we’re into significant queuing. In this diagram, the critical piece of information is one of the easiest to see, instead of one of the hardest. It’s far more obvious than an elbow in the curve which I try to draw through a cloud of points.
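If you would rather have a number than a picture, the same started-versus-completed rows can be scanned for the point where completions stop keeping up. This is only a sketch under assumed inputs: rows of (second, started, ended), plus a gap threshold and run length that I have invented for the example and that you would need to tune.

```python
def running_backlog(rows):
    """Cumulative started minus cumulative ended, per second.

    rows: list of (second, started, ended) tuples (assumed format).
    A steadily growing backlog is the signature of queuing.
    """
    backlog, out = 0, []
    for second, started, ended in rows:
        backlog += started - ended
        out.append((second, backlog))
    return out

def first_sustained_lag(rows, min_gap=5, run=3):
    """Return the started rate at the beginning of the first stretch of
    `run` consecutive seconds in which completions trail starts by at
    least min_gap, or None if there is no such stretch."""
    streak = 0
    for i, (_second, started, ended) in enumerate(rows):
        streak = streak + 1 if started - ended >= min_gap else 0
        if streak == run:
            return rows[i - run + 1][1]
    return None

# Made-up ramp in which completions flatten out near 200 per second.
rows = [(0, 100, 100), (1, 150, 149), (2, 200, 195),
        (3, 250, 200), (4, 300, 201), (5, 350, 199)]
print(first_sustained_lag(rows))
print(running_backlog(rows)[-1])
```

Requiring several consecutive lagging seconds is a deliberate choice: a single slow second is noise, while a sustained gap means the queue is building.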
So, for both diagnosis, in Bill’s paper, and modelling, in mine, this is a new and useful way of representing time delays in time-series plots.