Every year, I speak with a handful of current, former, and potential clients who ask about service-oriented architecture or microservices. The situations all start out with similar intent:
Image processing takes a long time and performs orders of magnitude more operations compared to other areas. Because it’s isolated, we’d like to break it apart so we can scale it independently to handle the growing workload.
Or
We’re growing our team and our monolith isn’t cutting it anymore. It’s difficult to add new features, and services will force the team down a path where they need to properly isolate components.
Or
We’ve got a couple of discrete areas of the application, each of which changes independently. Most of these areas are data-in, data-out, with no persistent state, and fit within a simple request/response lifecycle. Because we expect the complexity to grow, we’d like to extract those to their own applications.
One of these is not like the others, however. Can you guess which?
Services Don’t Address Process Problems
Both the first and third examples seem reasonable and well within what I’d encourage a client to explore as an option. Simple, isolated components aren’t the only reason to extract to services, but they do make the process easier (both from a mocking and testing perspective, and from a surface-area perspective). When exploring services becomes an attempt to band-aid larger issues, however, I strongly encourage clients to reconsider their choices.
What (seemingly innocuous) process problems get cited most often as reasons why a company wants to move to a service-oriented architecture?
- “It takes a lot longer to add new features because there’s so much code.”
- “The codebase is so large that even code review takes a long time when we add new features.”
- “A small change I made introduced an odd bug in a totally unrelated area of the application.”
- “The code is so complex that our test suite takes hours to run.”
Slowing Feature Development
Feature development often slows down after an initial period where everything seems simple. The team cranks out features left and right for what could be months on end, and then one day, a feature that was expected to take a couple of hours takes a day. Another feature that the team agreed should take under a week takes two.
What gives?
As applications grow, so does complexity (both accidental and essential). Teams that are not rigorous about addressing this complexity begin to run into issues at multiple levels as applications grow. Tests become more difficult to maintain and run more slowly because of growing data dependencies. Side-effects from data modification happen haphazardly and have impacts on other areas of the application.
These are realities of codebases regardless of whether you have a monolith or a group of services. Services may not solve your problem, and they may actually make it worse if the boundaries between services are not correct.
Slowing Code Review
As applications grow, and especially if there are intermingled dependencies, adding new features results in shotgun surgery. What should be an isolated change ends up touching dozens of files. Because of the growing breadth of changes, reviews go from covering 50 new lines of code over four files to covering 150 changed lines across twelve. Reviewing code takes longer, developer confidence is reduced, and merging code goes from 20 minutes to 2 hours.
Services may again seem like a good solution; with smaller, isolated codebases, theoretically a single change occurs in one application and the size of the change is reduced.
Even with a well-written and clearly-defined API, if there’s not proper decoupling between systems, what seems like it should be back to “adding 30 lines of code in four files in one codebase” ends up being “adding 30 lines of code in four files to the codebase primarily responsible for the feature, and changing 20 lines across each of the other two services because things aren’t properly decoupled.”
Features Introduce Seemingly Unrelated Bugs
Related to slowing feature development are code changes that introduce bugs in unrelated areas of the application. When multiple parts of an application interact with lower-level components instead of an agreed-upon interface, issues often arise. One simplified example I’ve seen is when a model is using some sort of state machine or status column, and multiple controllers are querying against the model directly. Especially when new states are added and multiple controllers are using NOT IN clauses to filter results, bad data might show up, an incorrect set of customers could be emailed, etc.
This, again, boils down to explicit interfaces and provided mechanisms for interacting with data. Code Climate discusses many options for refactoring ActiveRecord models. As applications grow, it becomes less appropriate to interact with these models directly; rely instead on query objects, service objects, and other abstractions that enforce appropriate rules and constraints.
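To make the status-column problem concrete, here is a minimal sketch in plain Ruby. The `Order` struct, its status names, and `ActiveOrdersQuery` are all hypothetical and stand in for an ActiveRecord model and a query object wrapping it. The in-memory arrays are only there so the example runs without a database.

```ruby
# Hypothetical record; in a Rails app this would be an ActiveRecord model.
Order = Struct.new(:id, :status, keyword_init: true)

# Query object: the one place that knows which statuses count as "active".
class ActiveOrdersQuery
  # Explicit allow-list. A newly added status stays excluded until someone
  # deliberately adds it here.
  ACTIVE_STATUSES = %w[pending paid].freeze

  def initialize(orders)
    @orders = orders
  end

  def call
    @orders.select { |order| ACTIVE_STATUSES.include?(order.status) }
  end
end

orders = [
  Order.new(id: 1, status: "pending"),
  Order.new(id: 2, status: "cancelled"),
  Order.new(id: 3, status: "refunded") # status introduced later
]

# A NOT IN style filter written before "refunded" existed lets it through.
leaky = orders.reject { |order| %w[cancelled].include?(order.status) }
p leaky.map(&:id)                               # prints [1, 3]

# Callers that go through the query object are unaffected by the new status.
p ActiveOrdersQuery.new(orders).call.map(&:id)  # prints [1]
```

The allow-list is a design choice rather than a requirement. What matters is that the rule lives in one place, so adding a state means changing one object instead of auditing every controller.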
Services again feel like an answer: “If I make a change in system A, system B won’t be impacted.” However, subtle bugs often creep in. Debugging across multiple services becomes much more difficult, and verifying bugs requires standing up multiple systems to reproduce.
Slowing Test Suite
As dependencies grow within an application, so do test suite times. Testing behavior at the ends of workflows requires tens or hundreds of records to exist in the database, and mocking service responses to recreate a swath of data without persisting seems like a big win.
With well-defined, agreed-upon interfaces for system interaction at the application level, translating these interfaces to the test harness is hugely important. Instead of using a tool like VCR or WebMock and stubbing API responses, stub at the interface and return real data structures.
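As a rough illustration of stubbing at the interface, the sketch below injects a fake that answers the same message as the real collaborator and returns the same data structure. Every name here (`Customer`, `CustomerGateway`, `FakeCustomerGateway`, `WelcomeMailer`) is hypothetical, and the production gateway is left unimplemented on purpose.

```ruby
# Real data structure shared by production code and tests.
Customer = Struct.new(:id, :email, keyword_init: true)

# Hypothetical production gateway; in a real app this would make an HTTP call.
class CustomerGateway
  def find(_id)
    raise NotImplementedError, "performs a real request in production"
  end
end

# Test double that honors the same interface (#find) and returns a Customer.
class FakeCustomerGateway
  def initialize(customers)
    @customers = customers.map { |customer| [customer.id, customer] }.to_h
  end

  def find(id)
    @customers.fetch(id)
  end
end

# Code under test depends only on the injected interface, not on HTTP.
class WelcomeMailer
  def initialize(gateway:)
    @gateway = gateway
  end

  def subject_for(customer_id)
    customer = @gateway.find(customer_id)
    "Welcome, #{customer.email}!"
  end
end

gateway = FakeCustomerGateway.new([Customer.new(id: 1, email: "ada@example.com")])
puts WelcomeMailer.new(gateway: gateway).subject_for(1) # prints "Welcome, ada@example.com!"
```

Because the fake returns a real `Customer` rather than a canned JSON payload, the test exercises the same structure the application code consumes, with no HTTP-level fixtures to keep in sync.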
Services may result in faster individual suites, but running all suites would likely result in similar speeds. Additionally, confidence is often reduced unless a suite is added to test against multiple services.
Moving Forward
It’s perfectly reasonable to think that services are a solution to these problems. By forcing isolation of different areas of the application through explicit interfaces (HTTP over an API), it seems adding features would become a breeze.
I hear statements like
“We wouldn’t need to worry about side-effects in different systems!”
and
“We can isolate complexity so others don’t have to think about it!”
virtually every time services are brought up.
While this might be true, a team can accomplish this without using services. The biggest factor in what might have a positive impact is not the fact that the code is isolated; it’s that the team has agreed on an explicit interface upon which multiple areas can communicate. All information now has a known set of inputs and outputs (POST bodies, JSON responses) against a very limited set of operations (API endpoints). The same can be achieved by applying these exact principles to existing code and funneling what would be HTTP requests into individual mediators of responsibility (classes and methods, or modules and functions).
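One way this could look inside a monolith is sketched below. The `Billing` module, its `charge` operation, and both structs are hypothetical. They play the roles of a service, an API endpoint, a POST body, and a JSON response respectively, and the validation rule is only a placeholder for real work.

```ruby
module Billing
  # Explicit input and output, standing in for a POST body and a JSON response.
  ChargeRequest = Struct.new(:customer_id, :amount_cents, keyword_init: true)
  ChargeResult  = Struct.new(:ok, :error, keyword_init: true)

  # The only public operation, equivalent to a single API endpoint.
  # Callers never reach past this method into Billing's internals.
  def self.charge(request)
    unless request.amount_cents.positive?
      return ChargeResult.new(ok: false, error: "amount must be positive")
    end

    # Placeholder: a real implementation would talk to a payment provider here.
    ChargeResult.new(ok: true, error: nil)
  end
end

result = Billing.charge(Billing::ChargeRequest.new(customer_id: 42, amount_cents: 1_500))
puts result.ok # prints true
```

If the team later decides a service really is warranted, a boundary like this is also the natural seam to extract along, since callers already depend only on the request and result shapes.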
In practice, this is not easy. It requires a level of rigor that many teams might not be accustomed to, and in frameworks like Rails, the easiest thing to do might be to modify a value in session or update a record directly without giving it much thought. Over time, dependencies begin to bleed into one another and the result is a tangled mess.
Work against this entropy by considering how areas of the application are meant to interact, and write software to streamline these paths of communication.