I recently had to run a few load tests on a service that the company I work for is launching soon. The company is sizable and has many services, so I thought: what if I could automate the hell out of this, so that I could just point the tool at another service and shoot? Then everyone could have sad graphs, er, useful information about their services!
What I wanted from my solution was a graph of performance built from CSV-shaped data. The data had to include a few latency percentiles (the latency within which 50, 90, and 99% of requests are served, for example), plus means and standard deviations, for increasing values of concurrency. Error rates would be a plus. So on my merry way I went to learn what existed on the market.
This article gathers my findings. I’ll compare the functionality of a few tools I looked at, go on a bit about the strategies I considered and eventually employed, and then spend a few moments pondering whether or not any of this is necessary at all.
HTTP API load testing tools examined
I already knew about Apache Benchmark (ab), and I wondered what else existed out there, so I started asking around. Someone pointed me in the direction of Apigee’s apib. There’s also Vegeta, and I’ve seen another (JMeter, if you are ever so inclined) that I didn’t end up using because it seemed heavily GUI-oriented and I was looking for automatable CLI programs.
Results look as follows when using Apache Benchmark:
I couldn’t find a way to configure the output to be as rich as I wanted, so I had to drop it. You can get it to spit out the percentiles in a CSV file, which is good for some use cases, but for what I had in mind I needed a few important percentiles alongside the mean and standard deviation. Those last two are at least available in the default output, but I’d have to parse them out. No biggie, but I still decided to look at other stuff.
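For the curious, here is roughly what "parsing them out" could look like. It is a minimal sketch, assuming ab's usual text report: a `Total:` row under "Connection Times (ms)" with min, mean, standard deviation, median, and max, followed by a percentile table with one `NN%` row per line. The function name `ab_summary` is mine, not part of ab.

```shell
# ab_summary: read ab's default text report on stdin and print
# "mean,sd,p50,p90,p99" (all in milliseconds) as one CSV line.
# Assumes the report layout described above; adjust if your ab differs.
ab_summary() {
    awk '
        $1 == "Total:" { mean = $3; sd = $4 }   # row layout: min mean sd median max
        $1 == "50%"    { p50 = $2 }             # percentile table rows
        $1 == "90%"    { p90 = $2 }
        $1 == "99%"    { p99 = $2 }
        END { printf "%s,%s,%s,%s,%s\n", mean, sd, p50, p90, p99 }
    '
}

# Usage (hypothetical URL):
#   ab -n 1000 -c 10 http://localhost:8080/ | ab_summary
```

It works, but it is one more thing to maintain, which is part of why I kept looking.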
Now THIS is fun. apib accepts a set of options very similar to those ab provides, plus a CSV output. You can even get just the header line by calling apib -T, which is useful for things like Google Charts (which I ended up using).
Spoiler alert: This is what I ended up using, with a simple shell script:
for i in $(seq 10 10 300); do <apib call here>; done > output.csv
Name,Throughput,Avg. Latency,Threads,Connections,Duration,Completed,Successful,Errors,Sockets,Min. latency,Max. latency,50% Latency,90% Latency,98% Latency,99% Latency,Latency Std Dev,Avg Client CPU,Avg Server CPU,Avg Server 2 CPU,Client Mem Usage,Server Mem,Server 2 Mem,Avg. Send Bandwidth,Avg. Recv. Bandwidth
,1274.824,7.843,4,10,30.021,38272,38272,0,4,4.303,167.000,6.337,12.787,15.961,20.973,4.337,42,0,0,51,0,0,0.70,18.89
,1428.351,14.001,4,20,30.025,42886,42886,0,11,9.938,88.844,11.883,18.821,21.725,25.168,4.869,35,0,0,51,0,0,0.78,21.16
,1463.603,20.497,4,30,30.026,43946,43946,0,8,15.981,113.544,18.172,24.657,28.413,32.299,5.624,33,0,0,51,0,0,0.80,21.69
,1470.635,27.188,4,40,30.026,44158,44158,0,25,21.444,119.460,27.822,30.221,34.782,46.773,7.301,32,0,0,51,0,0,0.81,21.79
,1470.343,33.981,4,50,30.024,44145,44145,0,40,26.809,112.049,34.115,36.158,43.251,71.898,7.196,32,0,0,51,0,0,0.81,21.79
,1427.398,42.011,4,60,30.015,42843,42843,0,54,33.050,120.446,40.857,44.180,67.299,90.089,8.074,34,0,0,51,0,0,0.79,21.15
,1300.924,53.771,4,70,30.017,39050,39050,0,62,43.579,218.430,49.265,65.313,95.912,104.371,14.066,43,0,0,51,0,0,0.72,19.27
This is absolutely what I’m looking for. Excellent. I can totally feed this into Google Charts.
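To make the loop a bit more reusable, it can be wrapped in a function. This is a sketch rather than my exact script: the function name `run_sweep` and its defaults are made up for illustration, and the flags are what I believe apib accepts (-c for concurrency, -d for duration in seconds, -S for one CSV line per run, -N for a run name, -T for the header line only). Check them against apib -h before trusting them.

```shell
# run_sweep URL [START] [STEP] [END] [DURATION]
# Prints one CSV header line, then one CSV line per concurrency level.
# Flag meanings are assumptions based on apib's help text; verify locally.
run_sweep() {
    url=$1
    start=${2:-10}       # first concurrency level
    step=${3:-10}        # increment between levels
    end=${4:-300}        # last concurrency level
    duration=${5:-30}    # seconds per level

    apib -T              # header line only, printed once
    for c in $(seq "$start" "$step" "$end"); do
        # One CSV row per run, named after its concurrency level.
        apib -S -N "c$c" -c "$c" -d "$duration" "$url"
    done
}

# Usage (hypothetical URL):
#   run_sweep http://localhost:8080/health > output.csv
```

Keeping the header and the rows in the same stream means the resulting file can go straight into a charting tool or a spreadsheet.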
Load testing strategy
I might be overselling this a bit, calling this section “strategy”. It’s mostly about how to attack the problem of automating the load tests in such a way that the tool is reusable for any API that we have.
This is fairly easy if all of your APIs provide a spec, as you can with OpenAPI (formerly known as Swagger). A spec lets you easily figure out what the exposed routes are. It doesn’t provide a solution for DELETE routes, though, mostly on account of them usually being much more complex to exercise. It’s a much more complicated business, for example, to automatically figure out the order in which you need to create the resources, then automatically generate fake data to create items at load-testing speeds, and also keep tabs on all the returned values so you can test the DELETE routes against that fake data…
I mean, it’s technically hard even to load-test routes that have required parameters unless you have either an example in the Swagger file with a valid ID, or per-endpoint ID types so that you can infer which route to query for the collection…
I didn’t have a lot of time, so we’re treating “IDs in the examples” as a first step.
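As an illustration of that first step, here is a small sketch that turns a templated route into a concrete path once you have example IDs in hand. Everything in it is hypothetical: the function name `fill_route`, the route, and the parameter names. How the IDs get pulled out of the spec is left out entirely.

```shell
# fill_route TEMPLATE NAME=VALUE...
# Replaces each {NAME} placeholder in TEMPLATE with the matching VALUE.
# Sketch only: values containing "|" or "&" would confuse sed here.
fill_route() {
    path=$1
    shift
    for pair in "$@"; do
        name=${pair%%=*}     # text before the first "="
        value=${pair#*=}     # text after the first "="
        path=$(printf '%s\n' "$path" | sed "s|{$name}|$value|g")
    done
    printf '%s\n' "$path"
}

# Usage (made-up route and IDs):
#   fill_route '/pets/{petId}/toys/{toyId}' petId=42 toyId=7
```

The resulting path can then be appended to the service's base URL and handed to the load-testing tool.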
In conclusion: Of the necessity of it all
There are plenty of alternative tools, and the approach is not complete; for now, I’ll let good enough be good enough. The fact of the matter is:
- HTTP services are, in general, read-heavy
- This approach will give me a good idea of whether or not the service is fundamentally broken
- This approach also will give me a good idea of when to scale up, given a single node of known size
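For the "when to scale up" point, one rough way to read the CSV is to find the first concurrency level at which the 99th percentile latency blows past a budget you pick. The sketch below assumes the header names shown in the apib output above ("Connections" and "99% Latency") and rows sorted by increasing concurrency. The function name `first_over_budget` and the idea of a single latency budget are my own simplifications.

```shell
# first_over_budget FILE BUDGET_MS
# Prints the lowest "Connections" value whose "99% Latency" exceeds
# BUDGET_MS, or "none" if every row stays within the budget.
first_over_budget() {
    awk -F, -v budget="$2" '
        NR == 1 {
            # Map each header name to its column index.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $(col["99% Latency"]) + 0 > budget + 0 {
            print $(col["Connections"])
            found = 1
            exit
        }
        END { if (!found) print "none" }
    ' "$1"
}

# Usage:
#   first_over_budget output.csv 50
```

Looking columns up by header name rather than by position keeps the script working if the tool adds or reorders columns.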
Essentially, as long as we can gather the data in a shape that’s easy to pass around, it’s going to be easy to get these performance graphs going, whether in gnuplot, any spreadsheet software, or online through libraries like Google Charts.
That is all.
Until next time!