Escalators, Software Projects, and the Science of Queues

I recently started a new job. New jobs always bring new challenges like building relationships with colleagues, learning about the business domain, or getting up to speed on the tech stack.

This job has all of those challenges, but also a challenge that I wasn’t expecting – getting into the office.

Let me explain.

The main entrance to this office is on the third floor of the building. There are many ways to get into the first floor of the building (probably dozens, I don’t know for sure, but I’m not curious enough to find out). There are four ways that I know of to get to the second floor. There are two ways to get to the third floor – one single-file escalator and one elevator. Everyone is trying to get to the third floor.

You probably know where I’m going with this, but stick around.

I get into the office “early”. Early is subjective for a tech company, but it’s enough to say that I get in before 90% of the rest of the people in the office. From the time I enter the building until I reach my desk usually takes just a few minutes. Sometimes I ride the escalator, though most of the time I walk up it. When there are few other people using the escalator, I can get right on and climb my way up quickly.

But sometimes I get into the office later – around the same time that almost everyone else is getting into the office. As I get closer to that single escalator going up to the third floor there is often a huge line of people also trying to get up that escalator. As the line grows, causing lines to form on other floors, the typical three minute trip can take anywhere up to fifteen minutes. The distance I travel is exactly the same. The route is exactly the same. The only difference is utilization.

This is a problem that software teams run into often. We can work quickly and deliver more predictably when the route is clear and we have spare capacity. When we load up a team with multiple projects to “fully utilize” them, we’re actually slowing them down and drastically reducing their predictably.

This is the crux of queuing theory. Here’s an excellent write-up of queuing theory in the context of software development. When utilization is low, delivery speeds are fast. When utilization increases, delivery speeds slow and queues grow. You can start to predict delivery times by measuring queue lengths. And you can improve delivery times by simply reducing the or controlling queue size.

Even if an escalator breaks down, the total time it takes me to get into the office won’t be drastically impacted if there aren’t many other people using it. This is true of a software project, as well. A bug found when the team has spare capacity will have little impact on the overall delivery times.

Here’s the other fun component of queuing theory – batch sizes. For illustration purposes, let’s say it takes one minute to get one person from the entrace of the building to the office. We’ll call this the cycle time. If people arrive no more than once per minute, this will never be a wait. Each person will get to their desk in one minute.

Our office is connected to a train station. So very frequently we have hundreds of people showing up at exactly the same time, which drastically increases the utilization of the escalators. Quickly, a queue forms that continues to grow until people return to arriving less than once per minute. While the first person in the line will have a cycle time of about one minute, the last peson in the queue will have a cycle time orders of magnitude higher (depending on the length of the queue).

The effects of a broken esclator in this situation is signficantly more impactful to overall cycle times.

Big batches slow down delivery. A huge project ahead of a small one means the small one will not be completed until the big one is finished, making it’s delivery timeframe wildly unpredictable.

There are some simple options to fix this issue. The obvious one is to reduce batch sizes and work in progress. Don’t start more than can be finished based on historical cycle times. Don’t pull in new work until the existing work is complete. Split projects into tiny deliverables. I’ve talked before about how splitting big stories can improve delivery times.

Other more difficult options focus on reducing utilization or cycle times in other ways. In our example, we could add more escalators. Or we could speed them up. This is consideribly more complex for a software team, of course, but could be the right long term option.

Maybe we need more people on the team. Maybe we need more automation. Maybe we need to move to a more componentized architecture. These may be positive changes, but they require significantly more effort to implement.

Though, if we want to reduce delivery times, the last thing we need is an attempt to “keep everyone busy”.

Much of this is inspired by the work of Don Reinertsen and Mary & Tom Poppendieck. I’d highly recommend reading The Principles of Product Development Flow: Second Generation Lean Product Development and Lean Software Development: An Agile Toolkit for a much deeper dive into lean software development.

1 Comment

  1. Idrees January 29, 2019 at 1:37 pm

    I love the analogy here and how you defined a problem that is complex and over looked!

Leave a comment

Your email address will not be published. Required fields are marked *