
Learning how to modern web dev

I’m neck deep in learning ES6, React, TypeScript, Node and all the billions of pieces that are a part of this “ecosystem”. I think I’m going to post some of the things that I learn along the way here.

Most of it will probably be obvious if you’ve been working with these technologies for a while, though I assumed a lot of what I was trying to do would be easily discoverable online. Maybe I just need to take some Google lessons.

My Lessons

Lessons to write about…

  • Using Sequelize CLI (and probably a rant about all ORMs, but especially this one)
  • Avoiding Redux and test driving React apps using hooks
  • Dockerizing a dev database
  • Writing integration tests with Sequelize without losing my mind

Packaging a Node Express App

I’m not sure how most of the JavaScript world is building, packaging, and deploying their applications. We’re using Yarn instead of npm, but neither tool provides a simple way to zip up and version your app (unless you’re publishing to npm, that is).

Here’s what worked for me (based on Yarn 1.15.2). As with everything I post, if there are better ways, please tell me.

Install cross-var to support consistent variable usage in package.json across Windows and Mac.

yarn add cross-var

Install bestzip to support cross-platform packaging.

yarn add bestzip

Install copyfiles to support cross-platform, configurable file copying.

yarn add copyfiles

Create a script in package.json for compiling TypeScript to JavaScript.

"build:ts": "tsc"

Create a script in package.json for copying the compiled files to a /dist directory.

"copy": "copyfiles -a -e \"**/*.ts\" -u 1 \"src/**/*.*\" dist"

Create a script in package.json for packing up the source and dependent node_modules.

"package": "cross-var bestzip $npm_package_name-$npm_package_version.zip dist/* node_modules/* package.json"

Notice the $npm_* variables? Apparently you can use any field defined in package.json within your package.json scripts. You just need cross-var to access them consistently across platforms.
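For instance, here’s a throwaway script (my own example, not part of the original setup) that shows the expansion in action:

"print:zip": "cross-var echo $npm_package_name-$npm_package_version.zip"

Running yarn print:zip echoes the exact file name that the package script will produce.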

That ended up being it. Here is a redacted package.json that I’m using:

{
  "name": "Nick Korbel Demo",
  "version": "0.1.0",
  "description": "The demo for packaging",
  "private": true,
  "main": "dist/index.js",
  "scripts": {
    "build:ts": "tsc",
    "copy": "copyfiles -a -e \"**/*.ts\" -e \"**/*.spec.js\" -e \"**/*.log\" -u 1 \"src/**/*.*\" dist",
    "start": "node ./dist/index.js",
    "package": "cross-var bestzip $npm_package_name-$npm_package_version.zip dist/* node_modules/* package.json"
  },
  … more stuff here like dependencies
}

This post is part of a series I’m writing to share what I’m learning.

Oh no… GitFlow

It seems like every team I’ve worked with in the past few years follows some form of GitFlow. At the very least, they use feature branches and pull requests to perform code reviews.

How we got here

The creation of distributed source control systems made the pull request model possible. Distributed source control systems emerged from the open source community, where there is typically a group of independent people working on a shared system and there is a need for trusted reviewers to gate changes to the code. With previous tools like CVS or Subversion, the cost of accepting a contribution to open source projects was high. Unless the author was already a trusted contributor, it usually involved submitting a patch.

Context is king

I view code as the end result of a series of decisions made by the code’s author. When the author was writing the code, they likely evaluated a series of approaches, tried a few of them, and found one to be the best solution to the problem. The final code can’t possibly capture all of the decisions that were made or all of the trade-offs that were considered, though.

For many years I used Subversion for version control on Booked. When other people had changes they wanted to contribute, I asked them to commit directly to trunk or to create a branch. I’d often have to revert commits or make non-trivial changes to ensure a commit didn’t break the application. I moved to Git a couple of years back, and being able to review and accept pull requests has certainly made it easier to incorporate contributions from external developers.

Unfortunately, I still don’t have the author’s context which led to the decision to change or write a certain line of code. Collaboration tools make the discussion easier by being able to point to specific changes. Pull requests certainly enable a technically simpler process for accepting changes to the source, but they do not provide any other benefits over methods like patch submission.

Wait, what are we doing?

Using pull requests on a team all working on the same project is so strange to me. It’s a process designed for highly-distributed, loosely-coupled developers to contribute to the same codebase. If the team is co-located, there is no physical separation that forces an asynchronous process. There is very rarely a risk of a change being submitted that isn’t part of an agreed-upon team strategy.

Yet, I see pull request comments and iterations of changes fly back and forth over the course of hours, days, or even weeks. Often, the resulting changes amount to only a few hours of hands-on work, but take orders of magnitude longer due to the asynchronous nature of the pull request process. With each passing minute, the author’s memory of their decision-making process fades. This is compounded if the author moves on to work on something new – but work-in-progress abuse is a topic for another blog post.

At my first real job in the early 2000s we did side by side code reviews. I would sit down with another developer (there was only one other person, so I guess I should say that I’d sit down with THE other developer) and talk through the changes I made to enable a new feature.

Unless the review happened within a few minutes of finishing the feature, it’s very likely that I didn’t remember the context that led me to write a particular line of code. Side by side code reviews encourage discussion and are a drastic improvement over pull requests, but I think we can do better.

We can do better

The most productive team I’ve ever been a part of adopted pair-programming for all production code. Two people, two monitors, two keyboards, one computer. There is no way to lose the context of each line of code being written when practicing pair-programming. This is a real-time collaborative decision making and code review process.

Our team had no need for a separate review step, because we already had the multiplicative brain power of two people involved in the creation of the code. Even without the “formal” review and merge process, we had incredibly high quality with almost zero knowledge gaps throughout the team.

One more gripe, that’s it, I promise

Here’s my last gripe with code reviews and pull requests as commonly practiced – they defer continuous integration.

Continuous integration is the process of continually integrating the whole team’s set of changes at a regular cadence. The purpose is simple – regularly integrating, compiling, and testing outstanding changes minimizes the time between the code working on one person’s machine and validation that it works once combined with the rest of the team’s changes.

Pull requests (more specifically feature branches, I suppose) defer integration. Sure, if you’re a diligent developer you may pull master into your branch a few times a day. But since everyone on the team is committing to their own branch, master is missing the majority of the outstanding changes and full integration is being deferred.

When I talk to teams about this, I regularly hear the opinion that it’s a good thing.

There is an impression that feature branching and pull requests help keep master stable. I agree with the spirit of this approach. I agree with always-releasable code. But let’s be honest: if a team can’t keep trunk stable, why would X number of branches be any more stable?

Branches are a security blanket. Sure, they make you feel safe, but they do little to actually provide safety. It’s just an illusion.

Always-releasable doesn’t mean change should be avoided or isolated. While it is true that not changing the code is the easiest way to keep it from breaking, building up large batches of changes that all get merged once “complete” is far more risky than committing small changes to a single branch daily. When the time between writing a line of code and knowing whether it causes unexpected issues is short, problems become very simple to identify and correct.

Issues have nowhere to hide in a small change set.

The sales pitch

So I guess my pitch here is to try different approaches.

Try trunk based development as an alternative to feature branches. Try pair-programming as an alternative to pull requests.

If you’re absolutely not going to budge from feature branches, impose a rule that a branch cannot live longer than 24 hours. If you’re absolutely not going to budge from pull requests, impose a rule that all reviews must be complete within a few hours.

Tighten the feedback loop and see what good comes from it.

These things may not be for every team, but they may work for yours.

Escalators, Software Projects, and the Science of Queues

I recently started a new job. New jobs always bring new challenges like building relationships with colleagues, learning about the business domain, or getting up to speed on the tech stack.

This job has all of those challenges, but also a challenge that I wasn’t expecting – getting into the office.

Let me explain.

The main entrance to this office is on the third floor of the building. There are many ways to get into the first floor of the building (probably dozens, I don’t know for sure, but I’m not curious enough to find out). There are four ways that I know of to get to the second floor. There are two ways to get to the third floor – one single-file escalator and one elevator. Everyone is trying to get to the third floor.

You probably know where I’m going with this, but stick around.

I get into the office “early”. Early is subjective for a tech company, but it’s enough to say that I get in before 90% of the rest of the people in the office. Getting from the building entrance to my desk usually takes just a few minutes. Sometimes I ride the escalator, though most of the time I walk up it. When there are few other people using the escalator, I can get right on and climb my way up quickly.

But sometimes I get into the office later – around the same time that almost everyone else is arriving. As I get closer to that single escalator going up to the third floor, there is often a huge line of people also trying to get up it. As the line grows, causing lines to form on other floors, the typical three-minute trip can take up to fifteen minutes. The distance I travel is exactly the same. The route is exactly the same. The only difference is utilization.

This is a problem that software teams run into often. We can work quickly and deliver more predictably when the route is clear and we have spare capacity. When we load up a team with multiple projects to “fully utilize” them, we’re actually slowing them down and drastically reducing their predictability.

This is the crux of queuing theory. Here’s an excellent write-up of queuing theory in the context of software development. When utilization is low, delivery speeds are fast. When utilization increases, delivery speeds slow and queues grow. You can start to predict delivery times by measuring queue lengths. And you can improve delivery times simply by reducing or controlling queue size.
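To see how nonlinear this gets, here’s a quick TypeScript sketch using the textbook M/M/1 queue formula – my own illustration with made-up rates, not something from the write-up:

const mu = 1; // service rate: one person (or work item) finished per minute

for (const utilization of [0.5, 0.8, 0.9, 0.95, 0.99]) {
  const lambda = utilization * mu; // arrival rate
  const avgTimeInSystem = 1 / (mu - lambda); // M/M/1 average time in the system
  console.log(`utilization ${utilization}: about ${avgTimeInSystem.toFixed(0)} minutes`);
}
// utilization 0.5 -> 2 minutes; utilization 0.99 -> 100 minutes.

Going from 50% to 99% utilization makes the same trip fifty times longer, and nothing about the escalator changed – only the load on it.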

Even if an escalator breaks down, the total time it takes me to get into the office won’t be drastically impacted if there aren’t many other people using it. This is true of a software project, as well. A bug found when the team has spare capacity will have little impact on the overall delivery times.

Here’s the other fun component of queuing theory – batch sizes. For illustration purposes, let’s say it takes one minute to get one person from the entrance of the building to the office. We’ll call this the cycle time. If people arrive no more than once per minute, there will never be a wait. Each person will get to their desk in one minute.

Our office is connected to a train station, so very frequently we have hundreds of people showing up at exactly the same time, which drastically increases the utilization of the escalators. Quickly, a queue forms that continues to grow until people return to arriving less than once per minute. While the first person in line will have a cycle time of about one minute, the last person in the queue will have a cycle time orders of magnitude higher (depending on the length of the queue).
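The same arithmetic as a sketch (the numbers are invented to match the story):

const minutesPerPerson = 1; // the cycle time from above
const trainBatch = 300;     // hypothetical: a full train arrives at once

// Person k waits behind the k - 1 people ahead of them, then takes their minute.
const cycleTime = (positionInLine: number) => positionInLine * minutesPerPerson;

console.log(cycleTime(1));          // 1 minute for the first person off the train
console.log(cycleTime(trainBatch)); // 300 minutes (5 hours!) for the last person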

The effect of a broken escalator in this situation is significantly more impactful to overall cycle times.

Big batches slow down delivery. A huge project ahead of a small one means the small one will not be completed until the big one is finished, making its delivery timeframe wildly unpredictable.

There are some simple options to fix this issue. The obvious one is to reduce batch sizes and work in progress. Don’t start more than can be finished based on historical cycle times. Don’t pull in new work until the existing work is complete. Split projects into tiny deliverables. I’ve talked before about how splitting big stories can improve delivery times.

Other, more difficult options focus on reducing utilization or cycle times in other ways. In our example, we could add more escalators. Or we could speed them up. This is considerably more complex for a software team, of course, but could be the right long term option.

Maybe we need more people on the team. Maybe we need more automation. Maybe we need to move to a more componentized architecture. These may be positive changes, but they require significantly more effort to implement.

Though, if we want to reduce delivery times, the last thing we need is an attempt to “keep everyone busy”.

Much of this is inspired by the work of Don Reinertsen and Mary & Tom Poppendieck. I’d highly recommend reading The Principles of Product Development Flow: Second Generation Lean Product Development and Lean Software Development: An Agile Toolkit for a much deeper dive into lean software development.

SimCity BuildIt – A Lean Software Training Ground

Confession time. I’m addicted to SimCity BuildIt.

Ok, with that out of the way I want to talk about how this game is making me more conscious of lean software development principles. The gameplay is pretty simple – you construct roads and buildings, then produce raw materials and manufacture goods to complete missions and upgrade buildings.

Simple. But there are some challenges. One challenge is that the time it takes to produce materials and goods varies by type. This complicates my life because I don’t know ahead of time what I’ll need to complete a mission. So, if a mission comes up which requires 5 watches, for example, I’m in trouble. A watch is built from a chemical, a glass, and a couple of plastic materials. Chemicals take 2 hours to make, glass takes 5 hours, plastic takes 15 minutes. The watch itself takes a couple of hours and I can only produce 1 watch at a time.

If I go full-blown lean, I’ll do this just-in-time and the first watch will take about 7 hours to build, then each additional one will take 6 hours (I can produce up to 45 materials in parallel, so the material production time is equal to the longest part, in this case the glass).
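The just-in-time math is a simple critical-path calculation. A sketch using the production times above:

// Materials for one watch are produced in parallel, then the watch is assembled.
const materialHours = { chemical: 2, glass: 5, plastic: 0.25 };
const assemblyHours = 2;

// The parallel phase takes as long as its slowest material (the glass).
const firstWatchHours = Math.max(...Object.values(materialHours)) + assemblyHours;
console.log(firstWatchHours); // 5 + 2 = 7 hours for the first watch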

Here’s the problem. If the mission only lasts 3 hours I won’t be able to complete it. What’s a mayor to do?

I have the ability to carry a limited set of inventory, so I could pre-produce some of these more ‘expensive’ items and hang on to them for when they’re needed. The drawback is that this takes up shelf space and I have no idea when or how many items I’ll need. If I fully stock the shelves but end up needing another item, I’ll have to get rid of some of my fully produced items.

I can purchase fully produced items from other cities. I don’t have to carry the inventory, but I pay a premium in this case and there may not be any available when I need them.

GET TO THE POINT, NICK

Ok, I hear ya. So what does any of this have to do with software development?

One of my all time favorite software development books is Lean Software Development by Mary and Tom Poppendieck.

I learned that overproduction, or producing things that nobody needs, is waste. We should be building the smallest increment of an application, measuring the value it returns, and iterating on that. Spending a month adding a feature that nobody uses is a month of lost time that could have been used to produce something of value.

Along the same lines, producing something and letting it sit on a shelf provides no value. Small batch sizes can help with this. Often, we believe that we can’t release a software product until it’s “done”. So we build all the features we can think of, then push them all out at once. While we’re holding all of those completed features in inventory, we’re not realizing any value from them.

Another obvious, but seldom recognized, truth of software development is the effect of the theory of constraints. When building an application we’re only as fast as the slowest part of the process. If I need to produce cheese in SimCity, for example, it only takes a couple of hours – but producing the raw materials takes 5 hours. So what is a 2-hour process on its own is actually a 7-hour process end-to-end.

The theory of constraints shifts our focus to optimizing the slowest part of a system. Instead of compartmentalizing the development process into analysis, design, development, testing, release, etc, see the process as one piece. It doesn’t matter if we can test 20 things per week if we can only develop 4. Optimize the whole.
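In sketch form (the throughput numbers are invented for illustration):

// End-to-end throughput is capped by the slowest stage in the pipeline.
const stagesPerWeek = { analysis: 10, development: 4, testing: 20 };

const shippablePerWeek = Math.min(...Object.values(stagesPerWeek));
console.log(shippablePerWeek); // 4; faster testing changes nothing until development speeds up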

We have a lot of tools to visualize and optimize our processes. Value Stream Maps can show us where our time is actually spent from start to finish. Cumulative Flow Diagrams can visualize bottlenecks in our workflow. There’s nothing I can do to speed up the production of electrical components, but we have complete control over everything we do when building a software project. The hardest part can be identifying where to focus.

Well, my appliances are done being made. Time to go upgrade a building in my city!

What I Should Have Said

Mike Birbiglia is one of my favorite comedians. His effortless mix of storytelling and comedy is something I haven’t seen anyone else be able to pull off.

Throughout his jokes he has a recurring theme where he finds himself in a high-stakes conversation. He builds up the audience with the statement “What I should have said… was nothing.” It’s an incredibly simple concept to grasp, but incredibly difficult to practice.

This is something I’ve been trying to get better at. I’m chock-full of opinions and thoughts that I share freely and unsolicited at times. Instead, I’m trying to listen more, talk less, and understand before sharing what’s on my mind.

I think this will make me a better manager and will help me better focus on the people around me.

Look Mom, No Hands! Test Driving Code Without A Mocking Framework

I love TDD. I haven’t found a more effective way to incrementally design and build software in the 15+ years that I’ve been doing this. I have formed and evolved a lot of opinions about how I approach TDD, though.

Recently, I wrote a post for EuroSTAR Software Testing titled Look Mom, No Hands! Test Driving Code Without A Mocking Framework.

This is a topic that has been on my mind for a long time. It’s not intended to start a mocks vs stubs flamewar or anything like that. Instead, I wanted to walk through my progression of TDD practices over the years and share what I’ve learned.

Don’t get me wrong – test-driving with a mocking framework is better than not test-driving at all. I just prefer stubs.

Looking back at the test cases in the Booked source code which utilize PHPUnit’s mocking framework (yes, there are still a lot), I can see just how entangled the test code is with the implementation of the production code. The source for Booked changes frequently and it is covered by more than 1000 unit tests. New features are introduced and, occasionally, some of the unrelated tests fail.

They fail because there is too much specified in the mock setup. In order to validate the behavior of some area of the code, I have to set up unrelated mock expectations to get collaborating objects to return usable data. If I change the implementation of an object to no longer use that data, my test shouldn’t fail.

A couple of years ago I stopped using PHPUnit’s mock objects and I’ve seen the resiliency of my unit test suite increase. I’ve also seen my development speed and design quality improve. Instead of ambiguous mock expectations scattered throughout the tests, I’ve built up a library of stub objects which have logical default behavior.
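Booked is PHP, but the idea translates to any language. Here’s a hypothetical TypeScript sketch of the pattern (the names are invented, not from the Booked codebase):

interface User { id: number; name: string }

interface UserRepository {
  findById(id: number): User | undefined;
}

// A stub with logical default behavior: every test gets a usable user
// back without any per-test expectation setup.
class StubUserRepository implements UserRepository {
  user: User = { id: 1, name: "Default User" };
  findById(_id: number): User | undefined {
    return this.user;
  }
}

class GreetingService {
  constructor(private readonly users: UserRepository) {}
  greet(id: number): string {
    const user = this.users.findById(id);
    return user ? `Hello, ${user.name}` : "Hello, stranger";
  }
}

// A test overrides only what it cares about and asserts on behavior:
const repo = new StubUserRepository();
repo.user = { id: 7, name: "Alice" };
console.log(new GreetingService(repo).greet(7)); // "Hello, Alice"

There are no expectations pinning the implementation here, so refactoring how GreetingService collaborates with the repository doesn’t ripple through dozens of unrelated tests.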

When test-driving increments of functionality, I’m able to concentrate on the behavior that I need to implement rather than getting distracted with test setup and management.

More focus. Better design. Higher quality. No mocks.