Breaking down the vulnerability at the heart of web development
There’s a gnarly security vulnerability at the heart of web development, and we need to talk about it.
Hackers have been specifically targeting this kind of bug recently. It’s consequential.
All web developers should know about this issue. To that end, I’m writing for both new and experienced devs. Hopefully non-technical folks can also follow along.
If you want to get into the nitty-gritty details, I recommend you check out Darcy Clarke’s original post.
Let’s dive in.
Software is social
When building software, you’re constantly using other people’s code to do stuff — big and small.
This is standard practice, and sometimes people explain it as using pre-existing “Lego blocks” or “building blocks”.
It’s great!
...and it also means that we, as web developers, are constantly downloading code from all over the internet onto our laptops.
As you can imagine, that creates security concerns.
One concern is that you download something with a security vulnerability in it — in other words: a piece of software that contains a loophole that could be exploited by hackers.
Another concern is that you accidentally download malware. This isn’t just the possibility of getting hacked — this is the real deal. Someone has now installed malicious software on your computer.
So... should we stop downloading code from the internet?
Actually, no.
This would be equivalent to never clicking on a link in your email.
Sure, it reduces risk...
...but we’ve got work to do, and clicking links is an important part of using email!
It just means we have to follow some best practices.
Similarly, downloading software is a part of web development. One that requires care.
To better understand the landscape, let’s cover some quick background on the software supply chain.
Later, I’ll discuss some of those best practices for staying safe.
Background: The software supply chain
Remember those building blocks of software that I mentioned? Those are called packages.
They’re published online for others to use. The place where they’re published is called a package registry.
When you want to use a package, you can download it from the package registry and include it as one of the building blocks that your code uses. Your code depends on this package, so we call it a dependency.
This means that when you want to get your software ready to use, one of the first steps is to install all of its dependencies from the package registry.
Projects typically list out their dependencies in a file. When you run an installation command, that file determines what gets installed.
In JavaScript projects, we use package.json and package-lock.json to list out our dependencies.
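To make that concrete, here is a minimal sketch of reading the dependency list out of a package.json. The project name and the specific packages are made up for illustration; "dependencies" and "devDependencies" are the standard package.json fields.

```javascript
// The parsed contents of a hypothetical package.json (illustrative values only).
const examplePackage = {
  name: "my-sweater-shop",
  version: "1.0.0",
  dependencies: { express: "^4.18.2" },
  devDependencies: { jest: "^29.7.0" },
};

// Collect every declared dependency, noting which field it came from:
// "dependencies" are needed to run the code, "devDependencies" only while developing it.
function listDependencies(pkg) {
  const result = [];
  for (const field of ["dependencies", "devDependencies"]) {
    for (const [name, range] of Object.entries(pkg[field] ?? {})) {
      result.push({ name, range, field });
    }
  }
  return result;
}

console.log(listDependencies(examplePackage));
```

Each entry pairs a package name with a version range. The exact versions that were installed are recorded separately in package-lock.json.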
All of those dependencies — the Lego blocks that you use — are considered a part of your software supply chain.
Much like with physical goods, the supply chain is a part of what you’re building.
If a sweater uses alpaca wool, that wool is a part of the sweater. When your customers get the sweater, they’re getting the wool too.
Similarly, if your software uses a particular package in its supply chain, that package is a part of your software.
At risk of splitting hairs: Not every package is shipped to your end-users. There might be dependencies that are only used during development, for example.
That’s still a part of your supply chain, though — and it could expose you, as a developer, to risks.
If there’s a security vulnerability in one of your dependencies, it’s serious. It likely means you’ve got that vulnerability baked right into your software.
And guess what — your dependencies have dependencies of their own. And even those packages have dependencies.
It’s dependencies all the way down!
Needless to say, it gets complicated quickly.
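Here is a rough sketch of how quickly that adds up. The registry below is a made-up, in-memory stand-in (every package name is hypothetical), and the function walks it to find everything that would end up installed.

```javascript
// Hypothetical registry: package name -> names of its direct dependencies.
const fakeRegistry = {
  "my-app": ["web-framework", "date-utils"],
  "web-framework": ["router", "logger"],
  "router": ["path-matcher"],
  "logger": [],
  "date-utils": ["logger"],
  "path-matcher": [],
};

// Walk the dependency tree and return every package that the root pulls in,
// directly or indirectly.
function collectAllDependencies(root, registry) {
  const seen = new Set([root]); // seeding with the root also guards against cycles
  const stack = [...(registry[root] ?? [])];
  while (stack.length > 0) {
    const name = stack.pop();
    if (seen.has(name)) continue; // shared dependencies are only counted once
    seen.add(name);
    stack.push(...(registry[name] ?? []));
  }
  seen.delete(root); // the root is the project itself, not a dependency
  return [...seen].sort();
}

const directCount = fakeRegistry["my-app"].length;
const allDeps = collectAllDependencies("my-app", fakeRegistry);
console.log("direct:", directCount, "total:", allDeps.length);
// prints: direct: 2 total: 5
```

Two direct dependencies turn into five installed packages, and this toy example is tiny compared to a real project.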
What happens when you install your dependencies
Every project comes with a command that you can run to install its dependencies. In JavaScript projects, that command is usually npm install.
When you do this, your dependencies will each be installed from the package registry.
Each of those dependencies will need to install its own dependencies, and so on.
This is a very risky step. You could accidentally download some random code from the internet without meaning to.
...oh, and by the way — there’s this super scary thing that I forgot to mention: before a package installs its own dependencies, it can run a script on your computer. That means it can run any code that it wants!!
(Obviously this is a nightmare for security.)
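In the npm world, these scripts live in the "scripts" field of a package’s package.json, under the names preinstall, install, and postinstall. Here is a small sketch that spots them. The package below, including its name and its script, is invented for illustration.

```javascript
// Script names that npm runs automatically while installing a package.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"];

// Return the install-time scripts that a package declares in its package.json.
function findInstallScripts(pkg) {
  const scripts = pkg.scripts ?? {};
  return INSTALL_HOOKS
    .filter((hook) => typeof scripts[hook] === "string")
    .map((hook) => ({ hook, command: scripts[hook] }));
}

// Hypothetical dependency: the name and the script are made up for illustration.
const suspiciousPackage = {
  name: "totally-fine-package",
  version: "1.0.0",
  scripts: {
    test: "node test.js", // only runs when someone asks for it
    postinstall: "node collect-secrets.js", // runs automatically on install
  },
};

console.log(findInstallScripts(suspiciousPackage));
```

If you want to install without running any of these scripts, npm has an --ignore-scripts flag (npm install --ignore-scripts). Some packages genuinely need their install scripts to work, though, so treat it as a tradeoff rather than a cure.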
Recap: Danger City
Does it feel like we’re swimming in shark-infested waters yet?
Let’s recap the dangers here:
- You could accidentally download code that is vulnerable to hacking. (you could get hacked later)
- You could accidentally download code that is straight-up malware. (you’re hacked)
- One of your dependencies could run a script that gives hackers access to your computer. (you’re hacked)
...cool cool cool cool. No big deal.
That’s not anxiety-inducing at all...
So... how do we stay safe?
There are too many dependencies to go through manually, so there are automated tools that check them for us.
These tools are trying to answer the questions: “What code is getting installed?” and “Is it safe?”
These tools are called dependency scanners. Some examples include Snyk and Dependabot.
But there’s a problem...
Another scary thing
Okay, so it seems like the best way to know what a piece of code is doing is to read the code, right?
In other words: it feels like that’s what dependency scanners should do.
...but that’s actually not how they work.
It turns out that when you publish a package to the registry, you don’t just include the package, you also include something called a manifest file.
The manifest file is basically a big label that says “Hey, here’s what’s up with this package: it has these dependencies, and it runs this script.”
And that’s what dependency scanners look at.
...but the label might not actually match what’s in the package. It could be wrong — maybe even on purpose.
Whoever is publishing the package can put anything on the label. The registry doesn’t check it.
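To show what a mismatch looks like, here is a minimal sketch that compares the label (the manifest) against what’s in the box (the package.json inside the package itself). Both objects are invented examples, and the list of fields to compare is my own assumption about which ones matter most.

```javascript
// Fields where a mismatch changes what gets installed or executed
// (an illustrative subset, not an official list).
const FIELDS_TO_COMPARE = ["name", "version", "dependencies", "scripts"];

// Serialize with sorted object keys so that key order alone never counts as a difference.
function canonical(value) {
  if (Array.isArray(value)) return "[" + value.map(canonical).join(",") + "]";
  if (value && typeof value === "object") {
    const entries = Object.keys(value)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonical(value[key]));
    return "{" + entries.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Return the names of the fields where the manifest and the package.json disagree.
// A missing field is treated the same as an empty one.
function findMismatches(manifest, packageJson) {
  return FIELDS_TO_COMPARE.filter(
    (field) => canonical(manifest[field] ?? {}) !== canonical(packageJson[field] ?? {})
  );
}

// Hypothetical label: claims no dependencies and no scripts.
const label = { name: "shoes", version: "1.0.0", dependencies: {}, scripts: {} };

// Hypothetical box contents: quietly pulls in a dependency and runs a script.
const box = {
  name: "shoes",
  version: "1.0.0",
  dependencies: { "hidden-helper": "^2.0.0" },
  scripts: { postinstall: "node setup.js" },
};

console.log(findMismatches(label, box));
// reports a mismatch in "dependencies" and "scripts"
```

A tool that only reads the label would see a package with no dependencies and no scripts, and would have nothing to flag.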
Y'ever bought shoes?
Look, I don’t know about you, but whenever I’m buying shoes I like to try them on.
Sure, the label on the side of the box tells me the size — and I could go off of that.
...but there's a chance that the label is wrong. Maybe the shoes are actually a size 11, not a size 12.
...or maybe the label says that the shoes are green, but they're actually blue.
Either way, it’s generally a good idea to open the box and check.
With software dependencies, it’s the same idea: going off of the label is risky. You want to actually open the box and check what’s inside.
In other words: you want to check the code itself, rather than some label that supposedly describes the code.
So... how should you check your dependencies?
The key is to use a dependency scanner that actually opens the box, rather than just reading the label.
As of right now, the only one that I know of is called Socket.
Socket keeps track of the packages it’s found issues with.
You can use Socket’s tools to install your dependencies, and they will check each package against that database of known issues.
In particular, Socket has a tool called "safe npm" that wraps the npm CLI. It checks your dependencies against Socket’s database before installing them.
When will this problem go away for good?
This problem will go away when package registries start checking the manifest files that people upload.
Right now, most registries let you put whatever you want in the manifest file, and they don’t check whether it’s accurate.
For an up-to-date list of vulnerable registries, please see Darcy's post: The massive bug at the heart of the npm ecosystem.
As of this writing, the issue is extremely widespread.
Ideally, the manifest file should either be generated automatically by the registry, or it should be checked for correctness.
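As a rough sketch of the "generated automatically" option: the registry would ignore any label the publisher sends and build the manifest from the package.json found inside the uploaded package. The function name, the field list, and the example package are all my own assumptions, not how any real registry works.

```javascript
// Fields a registry would copy into the manifest (an illustrative subset).
const MANIFEST_FIELDS = ["name", "version", "dependencies", "scripts"];

// Build the manifest purely from the package.json found inside the uploaded
// package, so a publisher-supplied label is never consulted.
function deriveManifest(packageJsonFromUpload) {
  const manifest = {};
  for (const field of MANIFEST_FIELDS) {
    if (field in packageJsonFromUpload) {
      // Round-trip through JSON to copy the value instead of sharing references.
      manifest[field] = JSON.parse(JSON.stringify(packageJsonFromUpload[field]));
    }
  }
  return manifest;
}

// Hypothetical upload: the package.json as it appears inside the package.
const uploadedPackageJson = {
  name: "shoes",
  version: "1.0.0",
  description: "Definitely green, size 12",
  dependencies: { "hidden-helper": "^2.0.0" },
  scripts: { postinstall: "node setup.js" },
};

console.log(deriveManifest(uploadedPackageJson));
```

Because the label is derived from the box, the two can’t disagree.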
...but — unfortunately — changing a piece of core infrastructure like a package registry is a big deal, so it tends to happen slowly.
So we’re probably stuck with this problem for a while.
We’ll need to use tools like Socket’s "safe npm" to stay safe.
...or even use new package registries that are more secure, once they become available. Darcy Clarke, who originally wrote about this issue, is working on one called Vlt.
Conclusion
So there you have it: a gnarly security vulnerability at the heart of web development.
It’s a big deal, and it’s probably not going away any time soon.
Thankfully we have tools to deal with it.
Stay safe out there!