Case Study: Angi

Executive Summary

Founded in 1995, Angi (formerly Angie’s List) connects individuals looking for home improvement, repair and household services with home service professionals. At the center of the Angi Services business is a Ruby on Rails-based web application internally known as "Handybook". Angi engaged Hint's services to upgrade Handybook from Rails 4.1 and Ruby 2.3 to the latest versions of Ruby and Rails.

Challenges

Angi is experiencing accelerated growth. Keeping application dependencies up-to-date is always important, but it's absolutely essential in today's developer market. Attracting and retaining top talent is difficult already; add an outdated stack into the mix, and it's nearly impossible. Security & compliance risks can be mitigated by patching old dependencies, but there is no shortcut to improving developer happiness. With this in mind, Angi chose Hint as its partner to establish a process by which major dependency upgrades can be performed safely & continuously.

How we helped

Angi's services are in high demand, and Handybook is an essential application. Featuring deep integration with third-party services and 40+ microservices, Handybook has a lot of moving parts. Aggressive caching, database sharding, and horizontal scaling are all important factors allowing the application to meet demand.

Upgrade Manifesto

Before starting on the upgrade, we set forth the following upgrade manifesto.

  • Minimize WIP. Upgrading Rails in a large, very active codebase is a risky business. Falling behind is easy, and if you're not careful, you may find yourself in merge hell. Practically speaking, this means no long-running branches. Instead, all changes must be PR'd to the main branch and be merged swiftly.
  • Incremental everything. No task is too small to deserve the question, "How can we break this up?" We minimize WIP and reduce business risk by taking many small, incremental steps. In practice, this means compatibility changes are made in the smallest possible units, and when deploying changes to production, we perform a canary deployment, starting at 1% of traffic and ramping up slowly.
  • Use data to drive decision making. Sometimes it isn’t possible to determine the correct solution for a given challenge by static analysis, even with excellent git history. In these situations, it's tempting to guess. But don’t do it; it's a trap! Instead, use dynamic analysis (often in production). Gather more information, and make decisions based on fact, not conjecture.
  • Over-communicate. We'll be making significant sweeping changes (incrementally) to the entire codebase. If something goes wrong and the symptom is outside our purview, we want to know about it. By continuously communicating project status, every member of the organization becomes a secret agent, reporting helpful information, resulting in a smoother upgrade.

Process

Dual Boot

In the spirit of "minimizing WIP" and "incremental everything", we need a way for all compatibility changes to flow to the main branch. The solution is to dual-boot the application with two different sets of dependencies co-existing on the main branch together. Dual-booting a Rails application is a well-trodden path at this point, and to make the process easier, we took advantage of Shopify's excellent bootboot gem.

With bootboot installed, you can insert a condition like the following into your Gemfile:

if ENV["DEPENDENCIES_NEXT"]
  gem "rails", "~> 6.1.0"
else
  gem "rails", "~> 6.0.0"
end

In each stage of the upgrade (Rails 4.1 -> 4.2, 4.2 -> 5.0, and so on), we begin by bundling the new/next set of dependencies in a Gemfile_next.lock. In each stage, some dependencies can be upgraded while still running the current/non-next set of dependencies (i.e. Gemfile.lock). These dependencies are upgraded in advance, which removes them from the Gemfile_next.lock equation.

Changes for compatibility

Once the main branch has a Gemfile_next.lock with the target dependencies, our focus turns to compatibility. The goal here is to obtain green CI results when the suite is run against both sets of dependencies. Note that only upgrade-related branches run the test suite against the "next" dependencies at this stage.

Each compatibility change targets the main branch. Ideally, our changes are compatible with both the current and next set of dependencies "out of the box." When changes are incompatible, we consider making them compatible by backporting functionality from the new dependencies. If the backport path is not feasible, we add a condition. Here’s an example where sprockets changed the API for registering an engine:

if dependencies_next?
  Rails.application.config.assets.configure do |env|
    env.register_engine '.es6', BabelTranspile
  end
else
  Rails.application.assets.register_engine '.es6', BabelTranspile
end
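The dependencies_next? predicate used in the snippet above is not defined in this write-up. A minimal sketch, assuming it simply mirrors the ENV["DEPENDENCIES_NEXT"] check from the Gemfile (the optional env argument is a hypothetical addition that makes the helper easy to test):

```ruby
# Hypothetical helper, e.g. defined early in boot so initializers can call it.
# Assumption: "next" mode is signalled only by the DEPENDENCIES_NEXT variable,
# matching the `if ENV["DEPENDENCIES_NEXT"]` condition in the Gemfile.
def dependencies_next?(env = ENV)
  !env["DEPENDENCIES_NEXT"].nil?
end
```

Keeping the check in one place also makes the conditions easy to find and delete once an upgrade step is committed.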

This stage is where the bulk of the upgrade work takes place. Many small PRs bring compatibility with the new/next set of dependencies until, eventually, we have a green build on CI with the suite running under both sets of dependencies.

Green CI, require compatibility on all branches

Once the test suite is green when run under both sets of dependencies, it's time to "lock it in". At this phase, we enable dual-boot CI on all branches. This mandates all developers write code in a way that is compatible with both sets of dependencies. Changes are not approved to be merged to the main branch unless they are compatible with both sets of dependencies.
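As a rough illustration of what "dual-boot CI" amounts to, the sketch below runs one command under each set of dependencies and only reports success when both pass. It is not Angi's actual CI configuration; the constant, method names, and the injectable runner are all hypothetical.

```ruby
# Hypothetical CI driver: run the same command once per dependency set.
DEPENDENCY_SETS = {
  "current" => {},                             # resolves against Gemfile.lock
  "next"    => { "DEPENDENCIES_NEXT" => "1" }  # resolves against Gemfile_next.lock
}.freeze

# The default runner shells out with the extra environment variables applied.
DEFAULT_RUNNER = ->(env, command) { system(env, *command) }

# Returns a hash such as { "current" => true, "next" => false }.
def run_dual_boot(command, runner: DEFAULT_RUNNER)
  DEPENDENCY_SETS.transform_values do |env|
    runner.call(env, command) == true
  end
end

# A branch is mergeable only when every dependency set passes.
def mergeable?(results)
  results.values.all?
end
```

A call might look like mergeable?(run_dual_boot(%w[bundle exec rspec])), where the test command itself is an assumption. In a real pipeline the two runs would more likely be separate parallel jobs.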

[Image: handybook-ci]

Smoke test in a production-like environment

The final stage is QA. Angi uses Kubernetes namespaces to spin up production-like environments for testing purposes. Testing typically starts with a sanity check/smoke test by upgrade engineers. After that, a QA specialist puts the upgraded application through its paces.

When a bug is discovered in this phase, it is fixed on the testing branch and re-deployed for further testing. This cycle repeats until no known issues remain.

Canary Deployment

Once all the testing & QA boxes are checked, it's time to deploy, but not all at once. In the spirit of "incremental everything", we begin rolling out the upgraded application alongside the current version. At Angi, we accomplish this with weighted DNS in AWS Route53. We send 1% of production traffic to Kubernetes pods running the upgraded application, then monitor and see what happens. If everything checks out, we increase to 5% and continue in small steps to 100%. If an issue is discovered, we dial traffic back (0% to upgraded instances), fix the problem, and start back at 1%. This phase takes time, but it's invaluable.
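The ramp-up and rollback rule described above can be sketched as a small function. Only the 1%, 5%, and 100% steps come from the text; the intermediate steps, the method names, and the record names are assumptions for illustration, and the health verdict is supplied by whoever is watching the monitoring.

```ruby
# Traffic percentages for the upgraded pods. Only 1, 5, and 100 come from the
# text; the intermediate steps are assumptions for illustration.
RAMP_STEPS = [1, 5, 10, 25, 50, 100].freeze

# Given the current percentage and a health verdict, return the next percentage.
def next_canary_percent(current, healthy:)
  return 0 unless healthy # dial traffic back to 0% until the problem is fixed

  RAMP_STEPS.find { |step| step > current } || RAMP_STEPS.last
end

# Relative weights for the two weighted DNS records (record names hypothetical).
def dns_weights(percent)
  { "handybook-next" => percent, "handybook" => 100 - percent }
end
```

After a rollback to 0%, the next healthy step is 1% again, which matches the "start back at 1%" rule.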

We utilized a Kibana dashboard to monitor the rollout. In the picture below, you can see roughly 1% of web traffic is targeting handybook-next, while all sidekiq jobs are still being processed by non-next pods.

[Image: handybook-dashboard]

Bake at 100%, strip and repeat

Some issues may not be immediately apparent. Once 100% of production traffic is being routed to the upgraded application, it's time to let it bake. After a day or two of watching graphs & logs, we commit to the upgrade.

Committing to the upgrade means copying the contents of Gemfile_next.lock over to Gemfile.lock, freeing up Gemfile_next.lock for the next upgrade step. We also must remove all compatibility backports and dependencies_next? conditions.
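A minimal sketch of the commit step, assuming the two lockfiles sit in the application root. Both method names are hypothetical, and the search only covers *.rb files, so conditions in the Gemfile or in templates would need a separate check.

```ruby
require "fileutils"

# Promote the "next" lockfile to be the current one.
def promote_next_lockfile(app_root)
  next_lock = File.join(app_root, "Gemfile_next.lock")
  raise ArgumentError, "#{next_lock} not found" unless File.exist?(next_lock)

  FileUtils.cp(next_lock, File.join(app_root, "Gemfile.lock"))
end

# List "path:line" for every remaining dependencies_next? call in Ruby files.
def remaining_next_conditions(app_root)
  Dir.glob(File.join(app_root, "**", "*.rb")).sort.flat_map do |path|
    File.foreach(path).with_index(1)
        .select { |line, _number| line.include?("dependencies_next?") }
        .map { |_line, number| "#{path}:#{number}" }
  end
end
```

An empty list from remaining_next_conditions is a reasonable signal that the conditional code paths have been stripped.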

At this stage, the application has successfully been upgraded to the target dependencies, and we can move on to the next upgrade step!

Challenges

By following the predefined upgrade manifesto & process, we avoided many common pitfalls. However, it's worth calling out some unique challenges that we encountered.

Strong Parameters migration

Strong Parameters is a mass-assignment security feature first introduced in Rails 4.0 as a replacement for protected attributes. When we began the upgrade process, Handybook was running Rails 4.1 and had only partially embraced Strong Parameters. Once we reached Rails 4.2, we had to finish the migration to Strong Parameters before upgrading to Rails 5.0, where protected attributes are no longer supported.

Performing this migration is never easy, and doing so in Handybook was no exception. Therefore, we developed a process (and tools) that allow us to perform the migration systematically, without guessing. At a high level, the process goes like this:

  • Install logging that shows us where unpermitted mass-assignment is being performed.
  • Install moderate_parameters, a gem we developed that shows us what parameters will be filtered out by strong parameters, without actually filtering any parameters out.
  • Use the test environment to resolve all unpermitted mass-assignment & moderate parameter logs. Then take the same logging into production and resolve the remaining issues until the logs are empty or contain only known, explicitly ignored entries (parameters that should not be permitted).
  • Swap calls to moderate_parameters for Strong Parameters, at which point actual filtering begins.
  • Finally, remove the protected_attributes gem and related code, and the application is running with Strong Parameters!
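The "log first, filter later" idea behind the steps above can be sketched in plain Ruby. This is an illustration of the concept only, not the moderate_parameters API or its implementation; the method names, arguments, and log format are hypothetical.

```ruby
require "logger"

# Illustration of the idea only; this is NOT the moderate_parameters API.
# Logs what filtering would remove, but returns the parameters untouched.
def moderate(params, permitted, logger:, context:)
  unpermitted = params.keys.map(&:to_s) - permitted.map(&:to_s)
  unpermitted.each do |key|
    logger.warn("#{context}: unpermitted parameter '#{key}' would be filtered")
  end
  params
end

# What the final swap to real filtering amounts to: unpermitted keys are dropped.
def permit(params, permitted)
  allowed = permitted.map(&:to_s)
  params.select { |key, _value| allowed.include?(key.to_s) }
end
```

Because moderate never changes behavior, it can run in production until the logs are quiet, and only then is it swapped for the filtering version.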

Complex Ruby objects in persistent data stores

Storing complex Ruby objects in persistent data stores is a mistake that often bites us during upgrades. The problem is that it's hard to know when it'll bite, and it often goes undetected in test suites & QA testing. For the problem to occur, objects must be serialized into persistent storage by the application running one set of dependencies, then read by the application running another set of dependencies. When it "bites," the application fails to deserialize the object, often because a constant was removed or renamed.

For example, if you attempt to serialize an ActiveRecord object in Rails 4.2, and deserialize it in Rails 5.0, you will get an ArgumentError like the following:

ArgumentError: undefined class/module ActiveRecord::ConnectionAdapters::AbstractMysqlAdapter::MysqlDateTime

We ran into several of these scenarios while upgrading Handybook. In each case, the solution was to rework the code such that complex objects are not put in persistent data stores. Instead, serialization is performed upfront, and only simple data types are put in persistent storage. When the data is deserialized, we rehydrate those complex Ruby objects, and the code path continues as it did before.
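A minimal sketch of that pattern, using a hypothetical Booking struct in place of a real model. Errors of the "undefined class/module" form are what Ruby's Marshal.load raises when a payload names a constant that no longer exists; writing only strings and integers avoids that dependency on class names.

```ruby
require "json"
require "time"

# Hypothetical domain object standing in for a complex Ruby object.
Booking = Struct.new(:id, :starts_at, :price_cents)

# Serialize up front into simple types (strings and integers) only.
def dump_booking(booking)
  JSON.generate(
    "id" => booking.id,
    "starts_at" => booking.starts_at.utc.iso8601,
    "price_cents" => booking.price_cents
  )
end

# Rehydrate the object from the simple payload on the way back out.
def load_booking(payload)
  data = JSON.parse(payload)
  Booking.new(
    data.fetch("id"),
    Time.iso8601(data.fetch("starts_at")),
    data.fetch("price_cents")
  )
end
```

The stored payload contains no class names, so it can be read back regardless of which set of dependencies wrote it.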

Octopus -> Rails 6.1 multi-database features & Octoball

Another significant migration was caused by changes to Active Record in Rails 6.1 that enable new multi-database features. Specifically, Handybook was using a gem called Octopus to run some database queries against read replicas, and unfortunately, the gem is not supported beyond Rails 6.0. Adding Rails 6.1 support to Octopus wasn't feasible because its approach relied on fundamental assumptions that no longer hold in Rails 6.1.

The good news is that Rails 6.1 introduced multi-database features that make much of what a gem like Octopus did obsolete. We were able to migrate Handybook to a gem named Octoball, which is a relatively thin layer on top of the Rails 6.1 multi-database implementation, offering an Octopus-like API.

Getting rid of the Octopus gem was a massive win for the Handybook application as it was the source of many edge-case bugs and unexpected behavior. A bonus was that Octoball usage could also be tested in CI.
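To illustrate what "a thin layer with an Octopus-like API" means, here is a toy, framework-free sketch. It is neither Rails nor Octoball code: ToyConnections stands in for block-scoped connection switching in the style of ActiveRecord::Base.connected_to(role: :reading), and the using wrapper and the :replica name are hypothetical.

```ruby
# Toy stand-in for block-scoped connection switching; not Rails or Octoball code.
module ToyConnections
  # The role queries would run against right now (defaults to the primary).
  def self.current_role
    Thread.current[:toy_role] || :writing
  end

  # Switch roles for the duration of the block, then restore the previous role.
  def self.connected_to(role:)
    previous = Thread.current[:toy_role]
    Thread.current[:toy_role] = role
    yield
  ensure
    Thread.current[:toy_role] = previous
  end

  # Octopus-style convenience layered on top (hypothetical naming).
  def self.using(name, &block)
    role = (name == :replica) ? :reading : :writing
    connected_to(role: role, &block)
  end
end
```

The wrapper adds no connection handling of its own; it only translates an older calling convention into the underlying block-scoped switch.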

Results

We partnered with Angi in their effort to improve developer experience & productivity by investing in the platform. Upgrading a large, busy Rails application is tedious work, and it's easy to introduce new tech debt that compounds, making future upgrades even more difficult. By following a well-defined process backed by the upgrade manifesto, we achieved an overall reduction in technical debt, leading to a number of positive outcomes.

Perhaps the best way to highlight the results is by sharing some inspirational quotes from a few of our favorite people at Angi:

Angi developers are happy, leading to increased productivity

"I think I speak for every long-tenured Angi person when I say... WOW!"

- David Olsen
   Director, Infrastructure Engineering

With an up-to-date stack, it's easier to attract top engineering talent

"Handybook is pretty heckin modern if you ask me."

- Thomas Johnell
   VP, Engineering

Angi customer data is safer than ever (and security & compliance can sleep at night)

"I for one like software that is not EOL and still receives security updates. Nice work!"

- Chris Sansone
   Director of Information Security and Risk

How can we help you?

Interested in finding out how partnering with Hint could help your company achieve similar results? We'd love to talk! Contact us to get started.

Benjamin Wood

Ben is a family man and partner at Hint. When not shipping software with the team at Hint, you'll likely find him spending time with his wife and two children.

  
  
