
Learning how to modern web dev

I’m neck deep in learning ES6, React, TypeScript, Node and all the billions of pieces that are a part of this “ecosystem”. I think I’m going to post some of the things that I learn along the way here.

Most of it will probably be obvious if you’ve been working with these technologies for a while, though I assumed a lot of what I was trying to do would be easily discoverable online. Maybe I just need to take some Google lessons.

My Lessons

Lessons to write about…

  • Using Sequelize CLI (and probably a rant about all ORMs, but especially this one)
  • Avoiding Redux and test driving React apps using hooks
  • Dockerizing a dev database
  • Writing integration tests with Sequelize without losing my mind

Packaging a Node Express App

I’m not sure how most of the JavaScript world is building, packaging, and deploying their applications. We’re using Yarn instead of npm, but neither tool provides a simple way to zip up and version your app (unless you’re publishing to npm, that is).

Here’s what worked for me (based on Yarn 1.15.2). As with everything I post, if there are better ways, please tell me.

Install cross-var to support consistent variable usage in package.json across Windows and Mac.

yarn add cross-var

Install bestzip to support cross-platform packaging.

yarn add bestzip

Install copyfiles to support cross-platform, configurable file copying.

yarn add copyfiles

Create a script in package.json for compiling TypeScript to JavaScript

"build:ts": "tsc"

Create a script in package.json for copying the compiled files to a /dist directory

"copy": "copyfiles -a -e \"**/*.ts\" -u 1 \"src/**/*.*\" dist"
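To make the -u 1 flag in that copy script concrete, here is a small sketch of how the destination path is derived. It is a simplified model of the documented behavior (drop that many leading path segments, then place the rest under the output directory), not copyfiles' actual implementation, and destinationFor is a hypothetical helper name.

```typescript
// Simplified model of `copyfiles -u <up> <source glob> <outDir>`.
// `destinationFor` is a hypothetical helper, not part of copyfiles.
function destinationFor(sourcePath: string, up: number, outDir: string): string {
  const segments = sourcePath.split("/");
  if (up >= segments.length) {
    throw new Error(`cannot strip ${up} segments from ${sourcePath}`);
  }
  // Drop the first `up` segments, then place the remainder under outDir.
  return [outDir, ...segments.slice(up)].join("/");
}

// With -u 1 the leading "src" segment is removed.
console.log(destinationFor("src/routes/index.js", 1, "dist")); // dist/routes/index.js
```

Without -u 1 the same file would land in dist/src/routes/index.js, which is why the flag is in the script.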

Create a script in package.json for packing up the source and dependent node_modules

"package": "cross-var bestzip $npm_package_name-$npm_package_version.zip dist/* node_modules/* package.json"

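Notice the $npm_* variables? When Yarn (or npm) runs a script, it exposes the fields defined in package.json, such as name and version, as environment variables prefixed with npm_package_. You just need cross-var to reference them with the same $VARIABLE syntax on Windows and Mac.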
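Here is a minimal sketch of what that substitution amounts to, assuming the script runner has set npm_package_name and npm_package_version in the environment. The Env type and the archiveName helper are hypothetical names for illustration; in a real Node script you would pass process.env instead of a hand-built object.

```typescript
// Minimal stand-in for process.env so the sketch needs no Node typings.
type Env = Record<string, string | undefined>;

// Builds the same file name the "package" script produces.
// `archiveName` is a hypothetical helper for illustration only.
function archiveName(env: Env): string {
  const packageName = env["npm_package_name"];
  const packageVersion = env["npm_package_version"];
  if (!packageName || !packageVersion) {
    throw new Error("npm_package_name / npm_package_version are not set; run this through a package.json script");
  }
  return `${packageName}-${packageVersion}.zip`;
}

// Example values for a package.json with name "demo" and version "0.1.0".
console.log(archiveName({ npm_package_name: "demo", npm_package_version: "0.1.0" })); // demo-0.1.0.zip
```

The variables only exist when the command runs through a package.json script, so calling the same command directly from a shell will not see them.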

That ended up being it. Here is a redacted package.json that I’m using:

{
  "name": "Nick Korbel Demo",
  "version": "0.1.0",
  "description": "The demo for packaging",
  "private": true,
  "main": "dist/index.js",
  "scripts": {
    "build:ts": "tsc",
    "copy": "copyfiles -a -e \"**/*.ts\" -e \"**/*.spec.js\" -e \"**/*.log\" -u 1 \"src/**/*.*\" dist",
    "start": "node ./dist/index.js",
    "package": "cross-var bestzip $npm_package_name-$npm_package_version.zip dist/* node_modules/* package.json"
  },
… more stuff here like dependencies

This post is part of a series I’m writing to share what I’m learning

Oh no… GitFlow

It seems like every team I’ve worked with in the past few years follows some form of GitFlow. At the very least, they use feature branches and pull requests to perform code reviews.

How we got here

The creation of distributed source control systems made the pull request model possible. Distributed source control systems emerged from the open source community, where there is typically a group of independent people working on a shared system and there is a need for trusted reviewers to gate changes to the code. With previous tools like CVS or Subversion, the cost of accepting a contribution to open source projects was high. Unless the author was already a trusted contributor, it usually involved submitting a patch.

Context is king

I view code as the end result of a series of decisions made by the code’s author. When the author was writing the code, they likely evaluated a series of approaches, tried a few of them, and found one to be the best solution to the problem. The final code can’t possibly capture all the decisions that were made or all of the trade-offs that were considered, though.

I was using Subversion for version control for Booked for many years. When other people had changes they wanted to contribute I asked them to commit directly to trunk or to create a branch. I’d often have to revert commits or make non-trivial changes to ensure the commit didn’t break the application. I moved to Git a couple of years back and being able to review/accept pull requests has certainly made it easier to incorporate contributions to the source from external developers.

Unfortunately, I still don’t have the author’s context which led to the decision to change or write a certain line of code. Collaboration tools make the discussion easier by letting reviewers point to specific changes. Pull requests certainly enable a technically simpler process for accepting changes to the source, but they do not provide any other benefits over methods like patch submission.

Wait, what are we doing?

Using pull requests on a team all working on the same project is so strange to me. It’s a process designed for highly-distributed, loosely-coupled developers to contribute to the same codebase. If the team is co-located, there is no physical separation that forces an asynchronous process. There is very rarely a risk of a change being submitted that isn’t part of an agreed upon team strategy.

Yet, I see pull request comments and iterations of changes fly back and forth over the course of hours, days, or even weeks. Often, the resulting changes only take a few hours of hands-on work in total, but take orders of magnitude longer due to the asynchronous nature of the pull request process. During each passing minute the author’s memory of their decision making process fades. This is compounded if the author moves on to work on something new – but work in progress abuse is a topic for another blog post.

At my first real job in the early 2000s we did side by side code reviews. I would sit down with another developer (there was only one other person, so I guess I should say that I’d sit down with THE other developer) and talk through the changes I made to enable a new feature.

Unless the review was happening within a few minutes of finishing the feature, it’s very likely that I didn’t remember the specific context which led to the decision to write a specific line of code. Side by side code reviews encourage discussion and are a drastic improvement over pull requests, but I think we can do better.

We can do better

The most productive team I’ve ever been a part of adopted pair-programming for all production code. Two people, two monitors, two keyboards, one computer. There is no way to lose the context of each line of code being written when practicing pair-programming. This is a real-time collaborative decision making and code review process.

Our team had no need for a separate review step, because we already had the multiplicative brain power of two people involved in the creation of the code. Even without the “formal” review and merge process, we had incredibly high quality with almost zero knowledge gaps throughout the team.

One more gripe, that’s it, I promise

Here’s my last gripe with code reviews and pull requests as commonly practiced – they defer continuous integration.

Continuous integration is the process of continually integrating the whole team’s set of changes at a regular cadence. The purpose is simple – regularly integrating, compiling, and testing outstanding changes minimizes the time between the code working on one person’s machine and validation that it works once combined with the rest of the team’s changes.

Pull requests (more specifically feature branches, I suppose) defer integration. Sure, if you’re a diligent developer you may pull master into your branch a few times a day. But since everyone on the team is committing to their own branch, master is missing the majority of the outstanding changes and full integration is being deferred.

When I talk to teams about this, I regularly hear the opinion that it’s a good thing.

There is an impression that feature branching and pull requests help keep master stable. I agree with the spirit of this approach. I agree with always-releasable code. But let’s be honest, if a team can’t keep trunk stable, why would maintaining X number of branches be any more stable?

Branches are a security blanket. Sure, they make you feel safe, but they do little to actually provide safety. It’s just an illusion.

Always-releasable doesn’t mean change should be avoided or isolated. While it is true that not changing the code is the easiest way to keep it from breaking, building up large batches of changes that all get merged once “complete” is far riskier than committing small changes to a single branch daily. When the time between writing a line of code and knowing if it causes unexpected issues is short, it becomes very simple to identify and correct the problem.

Issues have nowhere to hide in a small change set.

The sales pitch

So I guess my pitch here is to try different approaches.

Try trunk based development as an alternative to feature branches. Try pair-programming as an alternative to pull requests.

If you’re absolutely not going to budge from feature branches, impose a rule that a branch cannot live longer than 24 hours. If you’re absolutely not going to budge from pull requests, impose a rule that all reviews must be complete within a few hours.
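A rule like the 24 hour limit is easier to keep if something checks it for you. Here is a rough sketch of that check. The BranchInfo shape, the staleBranches helper, and the branch names are all hypothetical; how you collect branch creation times from your own repository or hosting tool is left out on purpose.

```typescript
// Hypothetical shape: one entry per open feature branch.
interface BranchInfo {
  name: string;
  createdAt: Date; // when the branch diverged from trunk
}

const MAX_BRANCH_AGE_HOURS = 24;

// Returns the names of branches that have lived longer than the limit.
function staleBranches(
  branches: BranchInfo[],
  now: Date,
  maxAgeHours: number = MAX_BRANCH_AGE_HOURS
): string[] {
  const maxAgeMs = maxAgeHours * 60 * 60 * 1000;
  return branches
    .filter((branch) => now.getTime() - branch.createdAt.getTime() > maxAgeMs)
    .map((branch) => branch.name);
}

// Example: one branch is 30 hours old, the other is 2 hours old.
const reviewTime = new Date("2019-05-02T12:00:00Z");
console.log(
  staleBranches(
    [
      { name: "feature/old", createdAt: new Date("2019-05-01T06:00:00Z") },
      { name: "feature/new", createdAt: new Date("2019-05-02T10:00:00Z") },
    ],
    reviewTime
  )
);
```

Only feature/old is reported here. The same idea works for pull requests by swapping the limit for a few hours and the creation time for the time the review was requested.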

Tighten the feedback loop and see what good comes from it.

These things may not be for every team, but they may work for your team.

Escalators, Software Projects, and the Science of Queues

I recently started a new job. New jobs always bring new challenges like building relationships with colleagues, learning about the business domain, or getting up to speed on the tech stack.

This job has all of those challenges, but also a challenge that I wasn’t expecting – getting into the office.

Let me explain.

The main entrance to this office is on the third floor of the building. There are many ways to get into the first floor of the building (probably dozens, I don’t know for sure, but I’m not curious enough to find out). There are four ways that I know of to get to the second floor. There are two ways to get to the third floor – one single-file escalator and one elevator. Everyone is trying to get to the third floor.

You probably know where I’m going with this, but stick around.

I get into the office “early”. Early is subjective for a tech company, but it’s enough to say that I get in before 90% of the rest of the people in the office. From the time I enter the building until I reach my desk usually takes just a few minutes. Sometimes I ride the escalator, though most of the time I walk up it. When there are few other people using the escalator, I can get right on and climb my way up quickly.

But sometimes I get into the office later – around the same time that almost everyone else is getting into the office. As I get closer to that single escalator going up to the third floor there is often a huge line of people also trying to get up that escalator. As the line grows, causing lines to form on other floors, the typical three minute trip can take anywhere up to fifteen minutes. The distance I travel is exactly the same. The route is exactly the same. The only difference is utilization.

This is a problem that software teams run into often. We can work quickly and deliver more predictably when the route is clear and we have spare capacity. When we load up a team with multiple projects to “fully utilize” them, we’re actually slowing them down and drastically reducing their predictability.

This is the crux of queuing theory. Here’s an excellent write-up of queuing theory in the context of software development. When utilization is low, delivery speeds are fast. When utilization increases, delivery speeds slow and queues grow. You can start to predict delivery times by measuring queue lengths. And you can improve delivery times by simply reducing or controlling queue size.
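To put rough numbers on this, here is a sketch using two textbook results that I am swapping in for illustration: the M/M/1 single-server queue model (random arrivals, one server) for the effect of utilization, and Little's Law for predicting delivery time from queue length. Real teams are not M/M/1 queues, so treat the output as the shape of the curve rather than a forecast. The function names are hypothetical.

```typescript
// M/M/1 model: expected time in the system = serviceTime / (1 - utilization).
function expectedCycleTime(serviceTime: number, utilization: number): number {
  if (utilization < 0 || utilization >= 1) {
    throw new RangeError("utilization must be in the range [0, 1)");
  }
  return serviceTime / (1 - utilization);
}

// Little's Law: average time in the system = items in the system / throughput.
function littlesLawCycleTime(itemsInSystem: number, throughputPerDay: number): number {
  if (throughputPerDay <= 0) {
    throw new RangeError("throughput must be positive");
  }
  return itemsInSystem / throughputPerDay;
}

// How much longer than the bare service time does work take as utilization rises?
for (const utilization of [0.5, 0.8, 0.9, 0.95]) {
  const multiple = expectedCycleTime(1, utilization).toFixed(1);
  console.log(`utilization ${utilization}: about ${multiple}x the service time`);
}

// 12 items in progress and 3 finished per day means about 4 days per item.
console.log(littlesLawCycleTime(12, 3));
```

Under this model the jump from 50% to 90% utilization takes delivery from about twice the hands-on time to about ten times, which is the escalator line in numeric form.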

Even if an escalator breaks down, the total time it takes me to get into the office won’t be drastically impacted if there aren’t many other people using it. This is true of a software project, as well. A bug found when the team has spare capacity will have little impact on the overall delivery times.

Here’s the other fun component of queuing theory – batch sizes. For illustration purposes, let’s say it takes one minute to get one person from the entrance of the building to the office. We’ll call this the cycle time. If people arrive no more than once per minute, there will never be a wait. Each person will get to their desk in one minute.

Our office is connected to a train station. So very frequently we have hundreds of people showing up at exactly the same time, which drastically increases the utilization of the escalators. Quickly, a queue forms that continues to grow until arrivals drop back to no more than one person per minute. While the first person in the line will have a cycle time of about one minute, the last person in the queue will have a cycle time orders of magnitude higher (depending on the length of the queue).
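The train station effect is easy to work through by hand. This sketch assumes everyone in the batch arrives at the same instant and the escalator moves one person per minute, so each person waits for everyone ahead of them. The batch size of 200 is a made-up number standing in for “hundreds”, and batchCycleTimes is a hypothetical helper.

```typescript
// Cycle time for each person when `batchSize` people arrive at once and
// one person gets through every `serviceMinutes` minutes.
function batchCycleTimes(batchSize: number, serviceMinutes: number): number[] {
  return Array.from({ length: batchSize }, (_, index) => (index + 1) * serviceMinutes);
}

const cycleTimes = batchCycleTimes(200, 1);
const averageMinutes = cycleTimes.reduce((sum, t) => sum + t, 0) / cycleTimes.length;

console.log(`first person: ${cycleTimes[0]} min`);
console.log(`last person: ${cycleTimes[cycleTimes.length - 1]} min`);
console.log(`average: ${averageMinutes} min`);
```

Nothing about the escalator changed, yet the average trip goes from 1 minute to 100.5 minutes and the last person waits 200. The same arithmetic applies to a backlog dropped on a team all at once.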

The effect of a broken escalator on overall cycle times is significantly greater in this situation.

Big batches slow down delivery. A huge project ahead of a small one means the small one will not be completed until the big one is finished, making its delivery timeframe wildly unpredictable.

There are some simple options to fix this issue. The obvious one is to reduce batch sizes and work in progress. Don’t start more than can be finished based on historical cycle times. Don’t pull in new work until the existing work is complete. Split projects into tiny deliverables. I’ve talked before about how splitting big stories can improve delivery times.

Other more difficult options focus on reducing utilization or cycle times in other ways. In our example, we could add more escalators. Or we could speed them up. This is considerably more complex for a software team, of course, but could be the right long term option.

Maybe we need more people on the team. Maybe we need more automation. Maybe we need to move to a more componentized architecture. These may be positive changes, but they require significantly more effort to implement.

Though, if we want to reduce delivery times, the last thing we need is an attempt to “keep everyone busy”.

Much of this is inspired by the work of Don Reinertsen and Mary & Tom Poppendieck. I’d highly recommend reading The Principles of Product Development Flow: Second Generation Lean Product Development and Lean Software Development: An Agile Toolkit for a much deeper dive into lean software development.