Threads, throughput, and Disqualified v0.3.0

I wrote a background job processor, Disqualified, but until now it ran pretty slowly. This release, v0.3.0, brings some huge performance improvements! Here are the fun, technical details.

The original implementation

The Disqualified server started five identical threads by default. Each worker in the pool did something like this:

  1. Check the DB to see if there is at least one pending job
  2. Run that pending job (it would skip this step if there were no jobs)
  3. Go to sleep for a random duration
  4. Wake up, go back to #1, and repeat
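
In Ruby, that loop looked roughly like this (find_one_pending_job is a made-up stand-in for the real database query, not the actual Disqualified code):

  loop do
    # 1. Check the database for a pending job
    job = find_one_pending_job

    # 2. Run it, if there was one
    job&.run

    # 3. Sleep for a random duration before checking again
    sleep(rand(1.0..5.0))
  end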

It’s pretty straightforward, but it has a couple downsides.

First is the unnecessary work. Each thread has to wake up and check for jobs, even when there aren't any.

Second is the sleeping. You might have noticed that #3 always happens—each thread goes back to sleep! Even if there’s a long list of pending jobs!

So to recap, the loop I described is less than optimal because it works too hard when there are no jobs, and it processes jobs too slowly when there are a lot of jobs.

This has a name: polling. Polling is sometimes the right solution, but in this case, we can do something a little better.

Making it faster

We can use interrupts to speed things up! Instead of repeatedly checking for work, a thread can ask to be notified when there’s work to do. The thread basically tells the computer, “Interrupt me when a job shows up”. The computer pauses that thread, then resumes it once there’s something to work on.

(There’s another use case of interrupts where the computer says to a thread, “One sec, the user typed something, so I need to handle that”. The computer pauses that thread, then resumes it when the computer is free again. The implementation of these two cases is basically the same.)

Unfortunately, SQLite can’t tell Disqualified when there’s new work to be done; SQLite is just a file, so there’s no process that can do any alerting. Some other databases, like Postgres or Redis, have this functionality, but since Disqualified needs to support SQLite, we have to stick with polling.

Even so, we can partially rely on interrupts here to speed up some things—especially the case where there’s a long line of pending jobs. This is a bit more complicated and requires two kinds of threads.

The first kind of thread is similar to the worker pool we already discussed, except instead of sleeping for a random amount of time, it uses interrupts to wait until it’s told there’s a job ready to be handled. Each worker thread can handle two commands: “RUN”, which executes one pending job, and “CHECK”, which checks whether there are pending jobs.
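
Sketched in Ruby, the worker pool might look something like this. The shared commands queue and the claim_one_pending_job and count_pending_jobs helpers are made-up names for illustration, not Disqualified’s actual internals:

  commands = Thread::Queue.new

  workers = 5.times.map do
    Thread.new do
      loop do
        case commands.pop                # paused here until a command arrives
        when :run
          job = claim_one_pending_job    # hypothetical helper: grab one job from the DB
          job&.run
        when :check
          # don't run anything here; just queue up one "RUN" per pending job
          count_pending_jobs.times { commands.push(:run) }
        end
      end
    end
  end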

The second kind of thread is one I call a “clock”. Every time the clock ticks, it tells the worker pool to “CHECK”, then it goes back to sleep, on repeat. I mentioned that it’s impossible to stop polling, and this is the thread that polls and keeps everything moving forward.
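
And a sketch of the clock, reusing the made-up commands queue from the worker sketch above:

  clock = Thread.new do
    loop do
      commands.push(:check)   # wake one worker up to look for new jobs
      sleep 5                 # then doze off until the next tick
    end
  end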

If a worker thread finds pending jobs while “CHECK”ing, it doesn’t handle them itself; instead, it raises interrupts that tell the worker pool to “RUN” a job (one interrupt per job).

Ruby’s standard library provides Thread::Queue (this class used to be called Queue prior to Ruby 3.1), which handles the interrupts for us. When a thread requests an item from an empty queue, that thread gets paused until there’s a new item in it.
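
Here’s a tiny, self-contained demonstration of that behavior:

  queue = Thread::Queue.new

  consumer = Thread.new do
    item = queue.pop               # paused here while the queue is empty
    puts "Woke up and got #{item.inspect}"
  end

  sleep 1                          # the consumer sits idle this whole time
  queue.push("a new job")          # this wakes the consumer back up
  consumer.join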

Results

So we talked about a bunch of things and made the server more complicated. We started off by polling, and ended up still polling—just less. Was it worth it?

Definitely. We greatly improved the throughput.

Let’s approximate how many jobs the server can finish within 60 seconds. Say the waiting period is a constant 5 seconds and each job takes 1 second to run.

In the original case with polling, each worker spent 6 seconds to find and then run a job. Within one minute, a single worker could have executed ten jobs, so five workers give us a total of about 50 jobs.

In the new case with interrupts, the clock thread runs after a five-second delay. It notices the backlog and tells the pool to run all of the pending jobs. In the remaining fifty-five seconds, each worker can finish fifty-five jobs, so five workers can finish a total of about 275 jobs.
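
If you’d like to double-check the arithmetic in Ruby:

  workers      = 5
  wait_seconds = 5
  job_seconds  = 1
  minute       = 60

  # Old: each worker sleeps, then runs one job, over and over
  polling   = (minute / (wait_seconds + job_seconds)) * workers    # => 50

  # New: one clock tick up front, then every worker runs back to back
  interrupt = ((minute - wait_seconds) / job_seconds) * workers    # => 275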

Closing thoughts

I’m very happy with this release and its performance improvements. I enjoyed working on this project since it gave me a chance to apply some things I learned in college.

Here’s the source, and here’s a link to the gem.

Posted on 2023-04-16 08:02 PM -0400