Latency vs Throughput
I was helping a Berkeley DB XML user recently who complained that his query took as long as ~55s to run. It turned out he was trying to run 40 concurrent queries, so I tried out the query on his data set using our concurrency testing framework. I was confused when it reported to me that I was getting ~2 ops/s, which is a completely different ballpark.
It took me a while, but eventually I figured it out: When the user was saying that his query took ~55s what he meant was that a thread's view of how long a query takes can be as much as 55s. When I talked about getting ~2 ops/s, I was looking at the system as a whole. The problem was that he was talking about latency, and I was talking about throughput.
Typically the way that a server gets smaller latency is to implement a thread pool and work queue. Since DB XML is an embedded library and does not spawn it's own threads, it is down to the application using DB XML to manage it's threads well.
So I created a Java test program that used a thread pool to manage the queries being run through DB XML (in Java it's really easy). I created a test function that could run the query with arguments of how many threads to create, how many queries to run, and how many queries to start per second.
I ran a number of tests - in all of the test runs I kept the number of queries run as constant, so that exactly the same amount of useful work was done in all cases. I measured "workload" time, which is the time from when the query is requested until it is completed - this is a measure of latency. I also measured the total time taken for all the queries, and the number of operation per second executed - this is a measure of throughput. Here are the actual figures I got:
| Queries run | Queries per second | Threads | Latency (avg query time in seconds) | Throughput (queries per second) |
|---|---|---|---|---|
| 80 | no delay | 40 | 32.15 | 1.70 |
| 80 | no delay | 10 | 23.34 | 1.96 |
| 80 | no delay | 4 | 22.00 | 1.89 |
| 80 | 10 | 4 | 18.51 | 1.88 |
| 80 | 5 | 4 | 13.87 | 1.93 |
| 80 | 2 | 4 | 0.90 | 1.97 |
The first thing I noticed is that as the number of threads spawned decreased, latency decreases and throughput increases - this peaked around 4-5 threads on my system. Throughput increased because each thread has a processing overhead associated with it, so less threads means less overhead. Average latency decreased because instead of 5 queries each taking 50s (as an example), the first query takes 10s, and the last takes 50s:
Average time for 5 queries in parallel: (50s + 50s + 50s + 50s + 50s) / 5 = 50s
Average time for 5 queries in series: (10s + 20s + 30s + 40s + 50s) / 5 = 30s
Surprise number one - adding more threads can actually increase your latency!
Next I tried to add a little bit of realism to the simulation - it is very rare that 80 queries need to be executed at exactly the same time, so I added a little bit of a delay between starting each query. As the queries per second decreased throughput was more or less constant, but latency decreased dramatically. This is simple to explain - the queries spent less time in the work queue, when they aren't actually being executed:
Average time for 5 queries executed at the same time: (10s + 20s + 30s + 40s + 50s) / 5 = 30s
Average time for 5 queries executed with a 10s delay between each: (10s + 10s + 10s + 10s + 10s) / 5 = 10s
So surprise number two - a more realistic workload can result in smaller latency!
Posted by john at 10:42 AM | Comments (2)