GraphQL Beyond the Spec

John Masse

I remember the first time I was introduced to GraphQL. It was during the first React Rally in 2015. I watched Lee Byron and Nick Schrock present what GraphQL was and the patterns behind creating GraphQL servers. I remember watching Lee Byron's presentation on the syntax for fetching data and feeling somewhat intimidated at the prospect of learning another query language, so it was a little difficult to swallow at first. While I thought the query language was already pretty good, it was the server technology that grabbed my attention. I nearly stood up out of my seat when they described how the resolution of any part of a GraphQL schema might come from any resource, with no restrictions. In addition, the GraphQL server specification packed in a ton of intelligent I/O juggling design decisions.

The vision I had

I have spent the most significant part of my time with software writing it, specifically products designed for human interaction. A considerable amount of the time spent creating interactive products goes to fetching data from one or many resources at a time. The process for fetching data from different resources can be inconsistent, incomplete, incoherent, and wasteful. Writing code to talk to these systems requires additional tests, error handling, and conversations with the people who built them to understand the edge cases that need handling. The long-term maintenance of each connection requires manual intervention. Keeping up to date with my APIs meant that every few months I had to read the latest release notes and update my code to stay in sync with the decisions made by the teams actively changing the services my applications are connected to.

In the right hands, GraphQL could provide a single-source abstraction that represents any data my applications might be concerned with for their features. At the same time, the GraphQL server technology allows the backend team to shift application concerns around without requiring their clients to change anything about their work.
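To make that concrete, here is a minimal sketch (the schema field, service names, and data sources are all hypothetical, not from any particular project): a resolver behind a stable `Customer.status` field can change where it gets its data without any client needing to change a query.

```typescript
// Hypothetical stand-in for a new internal data store.
const statusTable = new Map<string, string>([["1", "active"]]);

// Yesterday: the field was resolved by calling a legacy REST service.
const resolveStatusFromRest = async (customer: { id: string }) => {
  const res = await fetch(`https://legacy.example.com/customers/${customer.id}`);
  return (await res.json()).status as string;
};

// Today: the same field is resolved from the new store. The schema, and every
// client query written against it, stays exactly the same.
const resolveStatusFromDb = async (customer: { id: string }) =>
  statusTable.get(customer.id) ?? "unknown";

// Swapping the implementation is a one-line change on the server.
const resolvers = { Customer: { status: resolveStatusFromDb } };
```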

Working with more than one data source

Working with one API is manageable; heck, even two, three, or four APIs are tolerable. In my time, though, I have built products that required orchestration between dozens of services, with each feature at times requiring its own unique implementation details - such as polling or streaming for updates. When planning new integrations, I recall times when we first invested in exploring and learning about the API we planned to adopt. The discovery process usually took several days: time spent tracking down API documentation, speaking with the API engineers (if I had access to them), identifying the pieces of information we needed to fulfill the functional requirements, and designing complex model associations so we could get everything lined up in just the right order to meet the requirements of the user experience.

The cost of this process to me was extraordinary. And while we always managed to find a way, I couldn't help but think there could be a different way to approach this type of work. It has always felt as though we were limiting our view of the work to just one piece of the broader experience we were building. Why is isolation preferred over collaboration when it comes to building software? How do we get the information we need, the way we need it, to create the products and services we care about, without making a new API for a tiny slice of an experience? In my mind, technologies like GraphQL ask us to think about our domain and its relationships rather than individual capabilities. If those associations are made, then a user experience becomes an expression of the data. Today, most software is designed around human perspective and organizational structure rather than the domain it is trying to represent - Conway's Law in action.

The final point I will make about working with multiple data sources is that engineers often forget to take advantage of the asynchronous nature of programming. Asynchronous programming can be tricky to do in a maintainable manner, and debugging asynchronous programs comes with its own unique challenges. While it does not cover every possible use case, a platform like GraphQL makes it easier to harness the power of asynchronous programming because promise-based resolution, batching, and caching are baked into the practice of building servers.
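As a rough sketch of what that looks like in practice - here using the DataLoader library that is commonly paired with GraphQL servers, with made-up post and author data - individual field resolutions that happen in the same tick are coalesced into one batched request, and repeated keys are served from DataLoader's per-request cache:

```typescript
import DataLoader from "dataloader";

// Hypothetical batch loader: every author id requested in the same tick is
// fetched in a single round trip (a stand-in for `WHERE id IN (...)`).
const authorLoader = new DataLoader(async (ids: readonly string[]) => {
  console.log("one batched fetch for ids:", ids);
  return ids.map((id) => ({ id, name: `Author ${id}` }));
});

// A field resolver in the shape a GraphQL executor expects: it returns a promise.
const resolveAuthor = (post: { authorId: string }) =>
  authorLoader.load(post.authorId);

// Simulate the executor resolving Post.author for three posts concurrently.
// DataLoader dedupes the repeated "1" and issues one batch for ["1", "2"].
Promise.all(
  [{ authorId: "1" }, { authorId: "2" }, { authorId: "1" }].map(resolveAuthor)
).then((authors) => console.log(authors.map((a) => a.name)));
```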

REST is about principles, not just shipping JSON.

In practice, I have found subscribing to many organizations' RESTful services tedious, inconsistent, and difficult or distracting to keep up to date. Additionally, I have seen weeks wasted troubleshooting client and server contract issues - cases where we were working from sample data structures that later changed with no communication. Wasting time tracking down changes to JSON structures and communicating them back to the teams building the API creates frustration, confusion, and wasteful rework. REST is frequently misrepresented by organizations that succeed only in building JSON factories. At the same time, the principles of REST are strong enough to transcend JSON and could be applied to any means of transport. While the language used to describe REST needs to be adjusted to fit the protocols it represents, the principles themselves remain valid.

Weak contracts lead to unpredictability and confusion.

I like to think of myself as a programmer, but I often find myself keeping up with JSON documents that lack guarantees of what they will deliver. Nullability is a disaster of a problem, creating perhaps a third test case to contend with when working with an API - and that complexity multiplies because documentation quality is driven mainly by human behavior. Unfortunately, the default human behavior is to leave documentation to be automatically generated or to omit it entirely. In twenty-plus years of building and managing software, good documentation has been provided for only about five percent of the projects I have participated in. I have probably been paid more to grapple with the complexities of API integrations than to build great products.
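As a small illustration of that extra test case (the payload shape here is invented), a single field that may be present, explicitly null, or missing entirely forces every consumer to branch three ways:

```typescript
// Without a contract, one field can arrive in three shapes - present, null, or
// absent - and every consumer has to handle all of them.
type LooseCustomer = { email?: string | null };

function displayEmail(customer: LooseCustomer): string {
  if (customer.email === undefined) return "(field missing from payload)";
  if (customer.email === null) return "(no email on file)";
  return customer.email;
}

console.log(displayEmail({}));                          // field missing
console.log(displayEmail({ email: null }));             // explicitly null
console.log(displayEmail({ email: "jo@example.com" })); // actual value
```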

I have seen hundreds of hours lost chasing broken promises in API contracts. I have been woken up in the middle of the night to explain defects brought on by changes made to an API that a client had not been prepared to deal with. In my professional opinion, this process does not move ideas forward. Instead, it keeps us in a position where we constantly hammer dents out of a brittle relationship between different pieces of software.

I know I am not the only one who sees this as an issue, since "API documentation" has become a subject almost every programmer I meet laughs about.

Expensive migrations

I don't have actual numbers on how many API migrations happen each year, but one thing is sure: they are expensive to get done from a labor and quality assurance perspective. Is there anything different about the new API that we need to consider? What features are required to keep meeting customers' expectations (feature gaps)? Assuming the data needed to recreate the same experience is present in the new version of an API, how has the data model changed, and what does the client need to change to retrofit the requirements? Engineers and product people spend weeks, months, even years planning, migrating, and testing these types of changes. Migrating a client to another API is difficult to get done at all, let alone done well.

Understanding how clients are using the data provided by an API

Unless a suite of APIs is incredibly modular - by which I mean tightly scoped from a content perspective, with small JSON parts that clients can assemble - I see large, one-size-fits-all JSON payloads instead. Traditional JSON APIs ship JSON structures blindly in response to a valid request; no feedback loop is baked into the relationship between what an API ships and what the client is using. There may be some creative ways to learn what a client of an API is consuming, sure. Still, in almost all cases, the only option is to ask the client to provide an inventory of the fields they are using - which can be incredibly tedious, time-consuming, and challenging to capture in full detail. Many clients pass the same model around their application logic for convenience. While passing a single model around is convenient, it makes these structures very difficult to modify later since visibility into the actual use cases becomes ambiguous.
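GraphQL changes that feedback loop because every request carries an explicit selection of fields. As a rough sketch using graphql-js utilities (the query and field names are made up), a server can walk the incoming document and tally exactly what clients ask for:

```typescript
import { parse, visit } from "graphql";

// A query exactly as a client would send it: the selection set *is* the
// inventory of fields that client uses.
const incomingQuery = /* GraphQL */ `
  { customer(id: "1") { name orderCount } }
`;

// Walk the parsed document and record every requested field.
const usedFields: string[] = [];
visit(parse(incomingQuery), {
  Field(node) {
    usedFields.push(node.name.value);
  },
});

console.log(usedFields); // ["customer", "name", "orderCount"]
```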

In addition, a traditional REST controller will invoke every computational detail required to generate a static JSON payload for the sake of shipping that payload, which means that even if a client consumes a single field in a hundred-field object, the server will do the work for all one hundred fields. In practice, this is called "over-fetching," which is fair, but I think the term undersells the additional computational work involved. For example, imagine an API that returns fields from two different databases; a traditional JSON API would invoke the work on both databases 100% of the time, while GraphQL, being declarative, touches only the databases required to fulfill precisely what the client asked for.
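Here is a minimal sketch of that difference using graphql-js, with two made-up objects standing in for separate databases. Because execution is driven by the query's selection set, a request for `name` alone never touches the orders store:

```typescript
import { graphql, buildSchema } from "graphql";

// `name` lives in one (hypothetical) database, `orderCount` in another.
const schema = buildSchema(`
  type Customer {
    name: String
    orderCount: Int
  }
  type Query {
    customer(id: ID!): Customer
  }
`);

// Stand-ins for the two separate databases.
const usersDb = { fetchName: async (id: string) => `Customer ${id}` };
const ordersDb = {
  countOrders: async (id: string) => {
    console.log("orders database queried");
    return 42;
  },
};

const rootValue = {
  // The default field resolver calls methods on the returned object, so
  // `orderCount` only hits the orders database when the client selects it.
  customer: ({ id }: { id: string }) => ({
    name: () => usersDb.fetchName(id),
    orderCount: () => ordersDb.countOrders(id),
  }),
};

// Selecting only `name` never logs "orders database queried".
graphql({ schema, source: `{ customer(id: "1") { name } }`, rootValue }).then(
  (result) => console.log(JSON.stringify(result.data))
);
```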

By default, we want to auto-generate documentation

The deprioritization of good communication is most evident with documentation. Ninety-five percent of the engineers I speak with will tell me that they would prefer to auto-generate their documentation or schema from the code they wrote. While auto-generating things is convenient, it can also lead to a habit of neglect. The option to auto-generate documentation is not one made from a quality perspective but one made for convenience. What that means is an engineer can "say" they provided documentation, but without intent. How often is the effectiveness of documentation measured and iterated on?

Rarely.

Tools like Swagger are fantastic when used appropriately but are often left in a broken or confusing state. When I am provided a Swagger document, the examples are confusing or impossible to assemble. When I can wrestle a valid request together, it fails to give me a proper response. Inevitably I wind up talking to the API programmer, who then ships me a Google document or Slack message full of the URLs they use for their own testing.

That is not to say documentation generated with GraphQL cannot fall into the same trap. However, the conversation over GraphQL's documentation is manageable since the schema is the contract, and discussions over individual fields and arguments are possible - we can be specific. In addition, most GraphQL practices suggest that the schema be designed around the needs of the client. Developing client-first means that the features of a GraphQL API represent the requirements of the product first and nothing more. A client-first approach to API design ensures that only the most relevant information is exposed, shaped by how the content is presented to users.
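As a small, made-up example of that specificity, the schema fragment below names every field and argument and can even carry deprecation guidance, so a conversation about the contract can point at `avatarUrl(size:)` or the retirement of `nickname` rather than at a sample JSON payload:

```typescript
// Hypothetical schema fragment shaped around what the client actually renders.
const typeDefs = /* GraphQL */ `
  type Member {
    displayName: String!
    avatarUrl(size: Int = 64): String
    nickname: String @deprecated(reason: "Use displayName instead.")
  }
`;
```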

Backend-for-frontend contracts are brittle and layered; they dilute the truth, invite inconsistency, and may grow wasteful to maintain

Backend for frontend (BFF for short) is a term you may have heard before. Backend for frontend says that if you are building a frontend application and need to deal with many data sources, you should make an API that aggregates and shapes that information to meet the needs of each client.

I love backend for frontend since I have an appreciation for the encapsulation of concerns. With the backend for frontend model, we go ahead and build a single-purpose API. But what about when we have two clients, three, or a dozen? Then the company needs to maintain the core APIs, the client applications, and now a dozen or more APIs whose sole purpose is to talk to other APIs on behalf of those clients. What happens when we want to roll out a new version of a core API? How do we roll out a common feature across all of the clients? In the best cases, I can connect my client(s) directly to the data source. Still, it is no longer that simple, because all of the backend-for-frontend applications need to be addressed before the associated clients can do their work.

Why can't we have our cake and eat it too? If our core APIs already meet the needs of the humans who engage with our products, why can't our core services be represented holistically? How many different perspectives are we really talking about, that so much additional labor is required to make new information readily available to the people we want to benefit from it?

Why I invite teams to consider using GraphQL for their work

If you have read this far, thank you. This article has primarily been an experiment for me to connect with folks on the technical subject of GraphQL without describing GraphQL's components. Each time I hear GraphQL described as a pass-through layer, a toy for frontend engineers, or a mid-tier service orchestrator, a tiny piece of me dies inside because it shows that we have not yet connected with what could be possible. I would also like to add that GraphQL is not the only solution, just one that I have experience with.

Additionally, not all practices are wasteful, and many others may not benefit from this level of consideration.