Golden Paths and Accelerators, Part 1: What are They?
What are Golden Paths?
They have many names: Golden Paths, Paved Paths, Accelerators. But what are they?
According to the CNCF, a Golden Path is: "Templated compositions of well-integrated code and capabilities for rapid project development."
And, to get our bearings on how that fits into a platform, CNCF defines a cloud-native computing platform as:
"an integrated collection of capabilities defined and presented according to the needs of the Platform's users. It is a cross-cutting layer that ensures a consistent experience for acquiring and integrating typical capabilities and services for a broad set of applications and use cases. A good platform provides consistent user experiences for using and managing its capabilities and services, such as Web portals, project templates, and self-service APIs."
So, there are three primary layers to Platforms:
The discoverability and interaction layer. This layer focuses on helping the user determine what the Platform offers, how to use it, and how to explore and play. Implementations include a developer portal like Backstage or a good documentation + API set.
The integration layer. This layer focuses on understanding use cases that integrate multiple capabilities and on making those use cases easier to accomplish. This layer is consumed by the discoverability + interaction layer.
The primitives layer. This layer includes APIs or modules -- encapsulations of behavior used to compose integrated product experiences. This layer is consumed by the integration layer.
Paved/Golden Paths and Accelerators combine integrated capabilities and an interaction approach to solve problems customers have more simply, and they plug into a cohesive, discoverable user experience.
As such, they have two interfaces: one for Platform team members to know how to build them cooperatively and one for the customers.
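To make the layering concrete, here is a minimal sketch in Python of how the three layers and the two interfaces might relate. Every name in it (create_repo, GoldenPath, Catalog, and so on) is hypothetical and chosen for illustration; it is not taken from CNCF material or from any real platform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A primitive takes a context dict and returns an updated copy.
Primitive = Callable[[dict], dict]


# Primitives layer: small, single-purpose capabilities (hypothetical names).
def create_repo(ctx: dict) -> dict:
    return {**ctx, "repo": f"git/{ctx['name']}"}


def add_ci_pipeline(ctx: dict) -> dict:
    return {**ctx, "pipeline": f"ci/{ctx['name']}"}


def add_dashboard(ctx: dict) -> dict:
    return {**ctx, "dashboard": f"dash/{ctx['name']}"}


# Integration layer: a Golden Path composes primitives for one use case.
@dataclass
class GoldenPath:
    name: str
    description: str
    steps: List[Primitive]

    def run(self, ctx: dict) -> dict:
        for step in self.steps:
            ctx = step(ctx)
        return ctx


# Discoverability and interaction layer: a catalog users can browse and invoke.
@dataclass
class Catalog:
    paths: Dict[str, GoldenPath] = field(default_factory=dict)

    def register(self, path: GoldenPath) -> None:
        """Interface for Platform team members who contribute paths."""
        self.paths[path.name] = path

    def list_paths(self) -> List[str]:
        """Interface for customers discovering what is on offer."""
        return sorted(self.paths)

    def start(self, path_name: str, **inputs) -> dict:
        """Interface for customers running a path with their own inputs."""
        return self.paths[path_name].run(dict(inputs))


catalog = Catalog()
catalog.register(
    GoldenPath(
        name="backend-service",
        description="Start a new backend service",
        steps=[create_repo, add_ci_pipeline, add_dashboard],
    )
)
result = catalog.start("backend-service", name="payments")
print(result)
```

The register method stands in for the Platform-team interface and list_paths/start for the customer interface; a real implementation would sit behind a portal or API rather than an in-memory dictionary.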
As platforms and their customers mature, templates are often customized and composed in a self-service manner:
"As they mature internal platforms also offer compositions of such services as self-serviceable templates for key scenarios like web application development or data analysis, aka MLOps."
Golden paths take on some of the characteristics of primitives: they have to be composable, the internals are documented, and operational support is in place to support these customer workflows.
The Template Approach: Spotify
In "How We Use Golden Paths to Solve Fragmentation in Our Software Ecosystem," Gary Niemen states: "The Golden Path is the opinionated and supported path to build your system, and the Golden Path tutorial walks you through this path."
I emphasize "opinionated" and "supported" because those words further constrain what a Golden Path means at Spotify. "Opinionated" means standards and tools are built in as convention. "Supported" means the Platform team keeps it working and updated and provides customer support.
Spotify primarily found value in Golden Paths as a solution for an app or service startup and delivered the products via Backstage. Spotify is already advanced in its use of Golden Paths, as evidenced by its expanded usage per product type:
"Over the years, the Backend Engineering Golden Path tutorial has grown and we have added Golden Paths for: client development, data engineering, data science, machine learning, and web. Actually, there is a recent addition to the family: audio processing."
They noted a few problems. Ownership models were one: how can each platform team contribute its functionality and documentation to the single Golden Path product and tutorial (primitive to integration levels)? What process, say, would the observability group follow to contribute to the Golden Path in a way that is cohesive with the other groups?
They also found that it's essential to keep the Golden Paths short and fast to keep the engineer in a good workflow:
"Since the path is meant to represent the entire workflow they found they commonly receive the feedback that the path documentation is too lengthy. However, Niemen encourages shortening the actual path, not the documentation. By automating portions of the process, the step-by-step documentation can be simplified while still maintaining the overall outcome of the path."
Specificity is the strategy for shrinking a Golden Path. One idea Niemen offers is to break Golden Paths down by use case, such as a Golden Path for testing, so that users can compose their own Golden Paths from sub-paths. Another is to create Golden Paths per business product, which may not shorten them per se but will create a more specialized context for the engineers.
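The sub-path idea can be sketched as plain function composition. This is an illustrative model only, not how Spotify or Backstage implements it; the step labels and the helper names (step, compose, run_path) are assumptions made for the example.

```python
from typing import Callable, List

# A step receives the log of work done so far and returns an extended log.
Step = Callable[[List[str]], List[str]]


def step(label: str) -> Step:
    """Return a step that records its label; a stand-in for real automation."""
    def run(log: List[str]) -> List[str]:
        return log + [label]
    return run


def compose(*paths: List[Step]) -> List[Step]:
    """Concatenate sub-paths into one longer path, preserving order."""
    combined: List[Step] = []
    for path in paths:
        combined.extend(path)
    return combined


def run_path(path: List[Step]) -> List[str]:
    """Execute every step in order and return the resulting log."""
    log: List[str] = []
    for s in path:
        log = s(log)
    return log


# Hypothetical sub-paths, each small enough to document briefly.
scaffold_path = [step("create repository"), step("generate service skeleton")]
testing_path = [step("add unit test harness"), step("add load test job")]
deploy_path = [step("configure pipeline")]

# A user composes only the sub-paths they need.
my_path = compose(scaffold_path, testing_path, deploy_path)
executed = run_path(my_path)
print(executed)
```

Because each sub-path is just a list of steps, a team that only needs the testing path can run it alone, which is the shortening effect described above.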
Maria Jernström and Jason Palmer, in "How We Improved Developer Productivity for Our DevOps Teams," give us a broader view of where Golden Paths fit into the more general developer experience:
"As product managers in the Platform Developer Experience (PDX) Tribe, part of Spotify's Technology Infrastructure Group, we focus on unlocking the creativity of engineers by building tools and establishing best practices that automate processes to make Spotify a true DevOps company. Doing so helps our teams experiment, learn, and launch features quickly....These tools give engineers the ability to focus on creating and running experiments to get data that helps determine if their idea had the desired effect."
This definition broadens what we learned from Niemen's article: an app startup or initial launch is part of a more extensive set of capabilities to empower experimentation.
There are two standards built into every Golden Path. The first is automating CI/CD with a tool Spotify created called "Tingle," though many companies use a source control tool like GitHub, triggers, and deployment tools like ArgoCD.
The other standard involves exposing specific data and using data as a signifier. They created something called the "test-certification program":
"Using a gamified experience, we encourage developers to subject their code to the appropriate tests. We also inform them when their code contains unreliable tests (also known as flaky tests). When a service fulfills the certification requirements, it automatically displays a badge next to the service. This informs users that the service is being maintained and follows best practices for quality controls. Additionally, we provide reports on build times, code coverage, and reliability of test suites to give developers insight on the quality of their code. From 2018, we've noticed that teams who have invested time on test certification saw a drastic drop in blocking bugs and reactive work."
What other data to improve the speed of experimentation should be exposed by default, like build time reports, or used as signifiers of standard adherence, like the test-certification program?
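One way to picture data acting as a signifier is a small check that turns service metrics into a badge. The sketch below is hypothetical: the source does not describe Spotify's actual certification requirements, so the thresholds (MIN_COVERAGE, MAX_FLAKY_TESTS) and field names are assumptions invented for illustration.

```python
from dataclasses import dataclass
from typing import List

# Assumed thresholds; a real program would define its own requirements.
MIN_COVERAGE = 0.80
MAX_FLAKY_TESTS = 0


@dataclass
class ServiceMetrics:
    name: str
    code_coverage: float   # fraction between 0 and 1
    flaky_test_count: int
    build_minutes: float   # reported for insight, not used for certification


def unmet_requirements(m: ServiceMetrics) -> List[str]:
    """List the certification requirements a service does not yet meet."""
    problems: List[str] = []
    if m.code_coverage < MIN_COVERAGE:
        problems.append(
            f"coverage {m.code_coverage:.0%} is below {MIN_COVERAGE:.0%}"
        )
    if m.flaky_test_count > MAX_FLAKY_TESTS:
        problems.append(f"{m.flaky_test_count} flaky test(s) detected")
    return problems


def badge(m: ServiceMetrics) -> str:
    """Return the badge text shown next to the service."""
    return "certified" if not unmet_requirements(m) else "not certified"


checkout = ServiceMetrics("checkout", code_coverage=0.91, flaky_test_count=0, build_minutes=6.5)
search = ServiceMetrics("search", code_coverage=0.62, flaky_test_count=3, build_minutes=14.0)

for svc in (checkout, search):
    print(svc.name, badge(svc), unmet_requirements(svc))
```

Returning the unmet requirements alongside the badge keeps the signal actionable: a team sees what to fix as well as whether it passed.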
Spotify showed us several lessons. They showed us the goal is the speed of experimentation, and in the more extensive journey, app startup is just the first step. They showed us that establishing a straightforward contribution model among Platform teams contributing to Golden Paths is essential and that product groups will customize Golden Paths. They showed us that it is vital to expose data that empowers teams to make decisions and to use that data as signifiers of adherence to platform standards (also relating to a Golden State).
Application Definitions: Intuit
Several companies didn't follow the template-based approach popularized by Backstage but instead used "Application Definitions":
"An application definition is an operational runbook that describes in code everything an application needs to be built, run, and managed."
The spec is open-source on GitHub, and a quote from their introduction states their perspective: "Developers think in terms of application architecture, not of infrastructure."
In a talk, Platform engineers at Intuit walk through how they decided to use application definitions:
"We wanted to take a methodical approach in understanding how we can actually solve the problems [with] existing tools that would also fit with our Intuit toolchain and our use cases.
The two choices we had were the Open Application Model, which suited our needs pretty well. Or we could go with a templating style model where you had to provide a bunch of input parameters. But there was also a lot of abstraction leaked into the application spec. So it was easy for us to go with an application OAM-style specification."
They then run it through Kustomize. As with Helm, the output is a set of Kubernetes-native resource manifests.
In their conclusion, they state: "Velocity and innovation of platform teams will improve with application abstraction." So, product teams will perform better if platforms don't leak into an application codebase. OAM is a standard that aims to accomplish this.
I included this because it gives insight into what a Golden Path application could look like. For example, nothing stops Spotify from having their Backstage templates produce repositories that follow OAM specifications.
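To illustrate the application-definition idea, here is a minimal sketch in which a developer describes an application in architectural terms and the platform renders Kubernetes manifests from it. The definition's field names are hypothetical and deliberately simplified; this is not the OAM schema and not Intuit's implementation, which the source says uses Kustomize. Only the shape of the rendered Deployment and Service follows the standard Kubernetes apps/v1 and v1 APIs.

```python
from typing import Any, Dict, List

# Hypothetical, simplified application definition. The field names are
# illustrative only and are not the OAM schema.
app_definition: Dict[str, Any] = {
    "name": "checkout",
    "components": [
        {
            "name": "api",
            "image": "example.com/checkout-api:1.0",
            "port": 8080,
            "replicas": 2,
        },
    ],
}


def render(app: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Render one Deployment and one Service per component."""
    manifests: List[Dict[str, Any]] = []
    for component in app["components"]:
        labels = {"app": app["name"], "component": component["name"]}
        full_name = f"{app['name']}-{component['name']}"
        container = {
            "name": component["name"],
            "image": component["image"],
            "ports": [{"containerPort": component["port"]}],
        }
        manifests.append({
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": full_name, "labels": labels},
            "spec": {
                "replicas": component.get("replicas", 1),
                "selector": {"matchLabels": labels},
                "template": {
                    "metadata": {"labels": labels},
                    "spec": {"containers": [container]},
                },
            },
        })
        manifests.append({
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": full_name, "labels": labels},
            "spec": {
                "selector": labels,
                "ports": [{"port": component["port"], "targetPort": component["port"]}],
            },
        })
    return manifests


manifests = render(app_definition)
for m in manifests:
    print(m["kind"], m["metadata"]["name"])
```

The developer edits only app_definition; selectors, labels, and resource kinds stay on the platform side of the abstraction, which is the separation the Intuit engineers argue for.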
Final Thoughts
At the end of the day, Golden Path, Paved Path, and Accelerator are all names for a standard way to encapsulate, share, and customize automations among development teams. Here’s what you should consider when designing your own:
The core unit of impact is the application itself. Identify the complex use cases engineers have from integrating multiple platform components. Examples are setting up an application, customizing application setup for specific business units, incident response and break glass, or specific testing, like chaos or load test automation.
Keep the interfaces consistent and cohesive, not just for the developers but for the Platform engineers building primitive capabilities. The two interfaces differ, but each should be consistent for its own group.
A major pitfall is that some platform teams build automations or Golden Paths for a set of teams too rigidly, without a clearly defined customization workflow. Spotify's advice to keep automations small applies here as well: keeping them small forces the composability and customization problem to be solved early.
Exposing data to understand a team's alignment with standards is vital. On the one hand, customer engineers need to see "signifiers" or data about standards, such as "test-compliance" or MTTR, so that teams can meet standards that aren't easily or broadly applied. On the other hand, the Platform organization needs to supervise applications for drift and adoption, with clear strategies to mitigate drift. A team's success in aligning with these standards can grant them extra freedom on the Platform, and accelerators feed that strategy.
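As a rough sketch of what supervising for drift and adoption could look like, the example below compares each application's reported state with a platform standard. The standard's keys (template_version, ci_enabled, test_certified) and the sample applications are hypothetical; a real platform would source this data from its own inventory.

```python
from typing import Dict, List

# Hypothetical platform standard that every application is expected to match.
PLATFORM_STANDARD: Dict[str, object] = {
    "template_version": "3",
    "ci_enabled": True,
    "test_certified": True,
}


def drift(app_state: Dict[str, object]) -> List[str]:
    """Return the standard keys where the app is missing or differs."""
    return sorted(
        key
        for key, expected in PLATFORM_STANDARD.items()
        if app_state.get(key) != expected
    )


def adoption_rate(apps: Dict[str, Dict[str, object]]) -> float:
    """Fraction of applications with no drift from the standard."""
    if not apps:
        return 0.0
    aligned = sum(1 for state in apps.values() if not drift(state))
    return aligned / len(apps)


apps = {
    "checkout": {"template_version": "3", "ci_enabled": True, "test_certified": True},
    "search": {"template_version": "2", "ci_enabled": True},
}

for name, state in apps.items():
    print(name, drift(state))
print("adoption rate:", adoption_rate(apps))
```

The same drift list can serve both audiences: shown to the owning team as a signifier, and aggregated by the Platform organization into an adoption figure.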