Skip to content

How to Spring Clean Your Heroku Slug Size

This week I was deploying a routine update to Heroku, but the app wouldn’t deploy. Here is the message that was displayed in the build log:

-----> Discovering process types
       Procfile declares types     -> release, sidekiq, web
       Default types for buildpack -> console, rake
-----> Compressing...
 !     Compiled slug size: 502.5M is too large (max is 500M).
 !     See: http://devcenter.heroku.com/articles/slug-size
 !     Push failed

Slug size is too large was the culprit. 502 MB! The app is decently sized but that seems way off to me. According to rails stats the app is about 58,243 lines of code. Not small, but certainly not huge. And these are mostly source code plain text files too so file size should not be an issue at this scale.

Investigating the Sizing

I used a basic du -sh * command to check the basic file sizes of the folders in the app’s repository to see what’s going on. Here’s the truncated output:

 12M    app
352K    config
1.3M    db
376K    lib
332K    public
260M    node_modules
 78M    tmp
226M    vendor

So my app is relatively small, only about 17 MB. The rest of the file size comes from the various dependencies. By some standards, I don’t really even think this app has very many dependencies. There are a few dozen Ruby gems in the Gemfile and there’s nothing out of the ordinary here. Rails, Postgres, Nokogiri, etc. All things you would expect. (Although side note, the google-api-client gem is 61 MB on its own. Yikes, by far the largest gem in there. That’s something to work on for another day.) All of the gems are in vendor/ so the file size there checks out as reasonable.

The tmp directory is pretty big here too, but that’s ok. It is mostly filled with bootsnap caching files to make the app load quicker too. This would be an easy 78MB to clear out if it was the culprit, since I don’t really care how long the app takes to boot, but let’s reserve clearing it for later.

Even that massive node_modules directory isn’t really problem, since Heroku slugs are compressed before this slug size is calculated. Those dependencies compressed down are only about 80MB.

So there’s something up here. Even with my most conservative calculation the slug size should be max of around 300MB after compression. Which is still too big in my opinion for an app of this size, but still nowhere near the limit of 500 MB.

There is a nice Heroku plugin that allows you to download slugs from your builds. It’s a very handy way to download the exact slug from Heroku’s filestore on S3. I downloaded the slug in question to give it a look.

Using the same du command within the extracted slug files, there was one major diference:

961M    public

Whoa. 961MB in the downloaded and unzipped slug, but only 332K in my project’s repo itself. It turns out that inside the public/assets directory there were hundreds of asset build files dating back years. The app itself is a few years old and it looks like Heroku had stored the compiled assets in the slug for each release since the app was created. Yikes.

This is a known feature on Heroku and it is called the build cache. I was mistakenly under the impression that each app deploy would cause a new slug to be built, including the asset compilation. The build cache feature makes sense: why waste time re-downloading files and other dependencies on each build when they can be cached? But caching the output of asset creation until the end of time? That seems excessive. So let’s clean it up.

The Fix

It turns out there’s an easy fix when build cache is your problem. There’s another Heroku plugin for managing the repo that contains your app’s code. Within this plugin is the purge-cache utility, which clears the build cache.

Here’s how it works:

heroku plugins:install heroku-repo
heroku repo:purge_cache -a appname

I redeployed the app and that shaved off a few hundred MB from the slug size. We’re back in business here. But I still couldn’t get past that massive node_modules/ directory. That was 260MB of modules only used for compiling assets. Once the assets were compiled, those modules are no longer used. (This particular app does not use any node_modules at run time, they are all precompiled with webpack.)

There is a community-developed buildpack for Heroku called post-build-clean that gives us the ability to remove files from the slug after the build command finishes running. This is slightly different than the standard .slugignore file which won’t even allow you to use those files during the build phase. I installed the buildpack, added node_modules to the .slug-post-clean file and tried to deploy again.

This time around, it looks much more reasonable to me:

-----> POST BUILD CLEAN app detected
Removing directory node_modules/ from slug
-----> Discovering process types
       Procfile declares types     -> release, sidekiq, web
       Default types for buildpack -> console, rake
-----> Compressing...
       Done: 87.4M
-----> Launching...
 !     Release command declared: this new release will not be available until the command succeeds.
       Released v1019

For a medium-sized app 87 MB seems like a good size to me. The asset files continue to build up after the compilations step, but it’s going to be a while until I need to clear the cache again.

MonoLisa Font

MonoLisa is a new coding-focused font by Marcus Sterz:

As software developers, we always strive for better tools but rarely consider font as such. Yet we spend most of our days looking at screens reading and writing code. Using a wrong font can negatively impact our productivity and lead to bugs. MonoLisa was designed by professionals to improve developers’ productivity and reduce fatigue.

I’m still partial to Operator from Hoefler & Co, but MonoLisa looks very well done.

Github and npm

Speaking of Github, this week it was also announced that Github (aka Microsoft) has acquired the de-facto package manager for JavaScript, npm. 

Nat Friedman, on Github’s blog:

npm is a critical part of the JavaScript world. The work of the npm team over the last 10 years, and the contributions of hundreds of thousands of open source developers and maintainers, have made npm home to over 1.3 million packages with 75 billion downloads a month. Together, they’ve helped JavaScript become the largest developer ecosystem in the world. We at GitHub are honored to be part of the next chapter of npm’s story and to help npm continue to scale to meet the needs of the fast-growing JavaScript community. 

On what’s next:

Looking further ahead, we’ll integrate GitHub and npm to improve the security of the open source software supply chain, and enable you to trace a change from a GitHub pull request to the npm package version that fixed it.

That sounds very cool. Excited to see that piece come together. 

For this Mac-loving tech kid that grew up in the 90’s, I still cringe any time I hear Microsoft doing anything. But this is not the Microsoft of then. They’ve done well with Github so far. They’re doing amazing things with the cloud. I need to get over it. Hopefully this is a new great beginning for npm. 

Github Mobile

New this week: a brand new native iOS and Android app for Github. From what I can tell these are completely native apps. Maybe there’s some web embedded stuff in there, but if there is, I can’t tell and it’s super slick and fast. As it should be. 

The app seems very well designed and thoroughly considered. Handy for managing Pull Requests and Issues on the go. 

Proxyman

Very nice looking app for debugging network requests. Reminds me of the beautiful and powerful Paw app. I love that these little development and tech tools are being so well done lately, rather than having to use some awful cross-platform Java app.

via Brent Simmons

Introducing Tally

Last week I open sourced a new RubyGem called Tally. Tally was created over the past few years as a part of a number of products I’ve worked on and I’ve always wanted to open it up to the public for anyone else to use as they see fit.

Tally is a quick utility for collecting counters and stats throughout a Rails application. Technically I suppose it could be used outside of Rails in a standard Ruby app, but that’s not my use-case so I haven’t spent any time optimizing it for that.

Tally sits on top of Redis for fast collection of these counters. The goal of the stat collection was to make it as fast as possible so that counters could be incremented throughout your application code in real-time.

Periodically throughout the day the counters are extracted from Redis and archived into a standard ActiveRecord model within your application. There’s a single table added to your database that keeps track of the counters each day after they are archived.

That’s it. It’s a quick and simple way to get basic stats reporting into an application. There are certainly bigger players in this space, and I’ve used several before as well. StatsD is a great example of a much more robust and scalable tool for this sort of thing. But in my recent use cases, it’s just a bit overkill. Sure, I could spin up StatsD and get it all to work. But I had bigger areas that I wanted to focus on, so Tally is a great place to start.

Side note: if you’re wanting to collect millions and millions of data points, you probably need something different here. Tally can work, but there are better tools for that job.

I’m currently using Tally for few specific needs, which I think are perfect use cases for the project. Here are a few use cases so far…

Read more…