Dive Into GraphQL Part V: Should You Use GraphQL?

François Zaninotto, September 08, 2017

#graphql

After learning GraphQL in practice with the third and fourth tutorials of the Dive into GraphQL series, it's time to make a decision. How good is the developer experience? What's the return on (learning) investment? In which cases is GraphQL relevant, and when is it overkill? Should you use GraphQL for your next project?

Here is the opinion of the marmelab team, based on the feedback accumulated from developing several GraphQL projects.

Frontend Developers Love It

Frontend Developers who built mobile apps with React and Apollo ask for more. They absolutely love the Apollo Client Developer Tools, the React HOC, the ability to get just what they want from the backend, and to request aggregates of data in a single call. From their point of view, GraphQL is definitely more powerful than REST.

The GraphQL stack enables them to prototype quickly, even when the server isn't ready (thanks to mocking tools). It enables them to iterate quickly, too, because of the stability of the entire stack. Besides, GraphQL only covers the data fetching needs, and doesn't require them to swap their set of tools for yet another full-featured framework.

And most important, GraphQL lets them develop fast applications, because these applications make fewer requests to the API, and have a local cache. It lets them build optimistic UIs, regardless of the complexity of the data synchronization problems it implies. It lets them build real-time scenarios, without messing with the WebSocket API.

On a related subject, GraphQL hides all the HTTP complexity from frontend devs. No more trial-and-error tweaks on the HTTP headers, content-type, return codes, or the half-finished fetch API. They just ask for data, and they get what they asked for - no more, no less.

GraphQL presents a short learning curve for frontend devs - about half a day for the basics, and another day to tackle the optimistic UI and store manipulations. And if they have already used Redux in the past, and they understand the idea of a central state for the entire application, the learning curve is even flatter.

The only drawbacks of frontend GraphQL - unit testing tooling and batched queries support - are manageable, because they concern tasks that developers seldom address with REST anyway.

So from a frontend developer's perspective, GraphQL is all wins. In my opinion, that means that Facebook's objective is reached to a degree beyond their expectations. GraphQL moves the center of gravity of APIs towards the frontend, empowers frontend developers, and frees them from backend implementation details.

Note: I can't speak for native frontend developers (iOS and Android), but feel free to share your feedback in the comments!

Backend Developers Like It, Too

Backend developers are a bit more nuanced about their GraphQL experience. For them, the learning curve is a bit more tedious (mostly the first day). The good news is, if you've read this series so far, you've already learned 80% of what a GraphQL backend developer needs. Once the server is bootstrapped, the development speed is equivalent to, or slightly faster than, that of REST APIs. The Financial Times devs, for instance, built a GraphQL API on top of their REST API in 3 days.

Resolvers are dead simple to write and test, especially since it's not necessary to test them through HTTP. And as soon as they become larger, it's time to move the logic down to backends. Remember: GraphQL should be used as a thin layer on top of data backends. It's an implementation of the API Gateway pattern, which has become a good practice anyway.

GraphQL servers are no faster than REST servers - they may even be a bit slower, due to the latency added by the gateway. For queries with embedded relationships, parallel resolver execution and DataLoader keep execution speed at a reasonable level.

However, as GraphQL hides the backend implementation from frontend developers, frontend devs tend to create utterly complex queries that can hit several backends and be super expensive. So backend developers must take extra care of the server monitoring, log the slow queries, use DataLoader, and limit the allowed complexity of queries. All this requires more development, as the backend tooling is not as sophisticated as the frontend packages.

Speaking of tooling: while GraphQL servers in Node.js are relatively easy to write, it's not the same for all languages. In PHP for instance, the GraphQL package for Symfony hasn't been updated in 2 years. So if you're not using Node.js, be prepared for some reverse engineering and debugging of your tools.

Note: How about hypermedia? If you've fought for HATEOAS during the past years, you probably understand now that it's a lost battle. Web developers don't need to learn how to get from one resource to the other by reading an API response. Their API must let them get from one resource to the other by clever design. GraphQL has no built-in hypermedia, because we simply don't need it.

Side effect: It's Great for Collaboration

In my opinion, the biggest change that GraphQL brings is not for frontend or backend devs, but between them. GraphQL makes collaboration more fluid in API-centric environments. It maximizes productivity, thanks to conventions and tooling.

For instance, backend developers don't need to design their API for UI components. They expose the data from the domain in a structured way, and let the frontend devs pick what they want. On the other hand, frontend developers don't need to ask backend developers for modifications each time they add a new control - they can experiment with custom queries first. But paradoxically, the fact that they don't need to talk to each other makes them talk more, because it's no longer a pain point.

That's also because of the emphasis that GraphQL puts on the schema. Schema-first development, inspired by DDD and strongly typed languages, forces frontend and backend devs to talk about the important things early, leaving the less important things (the implementation) for later.

If you've ever developed based on a Word document for API specification, you know how this makes frontend devs feel disconnected. On the contrary, the GraphQL schema gives the big picture, and makes the API contract explicit. GraphiQL provides not only an interactive documentation, but also a playground that lets developers communicate with actual data. Onboarding new developers is faster, too.

Also, it's liberating for a backend developer to be able to make changes to an API without breaking the clients. I mean, to make evolutionary changes, i.e. adding new fields (never changing or removing one). GraphQL offers backward & forward compatibility, which means no breaking changes, ever.

Finally, GraphQL works across many languages. It can become the lingua franca for frontend and backend developers who use different stacks - say PHP for the backend and Swift for the frontend.

GraphQL Code Is Maintainable

RESTful web servers are often modular in code, because they use a resource-based architecture. This modularity does not come out of the box for GraphQL, but if you follow the tips from part III of this series (splitting resolvers and schemas by domain), your server-side code will be maintainable, too. In client-side code, the GraphQL logic can easily be isolated and detached into Higher Order Components.

Besides, most of the popular JavaScript tools for GraphQL use a functional approach. The Apollo and Facebook teams built these tools based on the latest best practices of web development, with maintainability in mind. To use GraphQL, developers write single-purpose, pure, and testable functions - a future-proof architecture in my opinion.

As a project grows, developers must sometimes refrain from using dynamic queries; even though it's a nice shortcut, it's dangerous from a maintenance point of view. Fragments, directives and variable bindings unlock the ability to use static queries, and to make queries reusable. This removes the need for a query builder API, which would open the door to ORMs, and then to an ocean of complexity. However, maintaining query blobs has a cost - just think about the SQL blobs in your past projects. But no other alternative comes free of charge.
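To make the difference concrete, here is a minimal sketch contrasting a dynamically built query with a static one. The Tweets field and its page and search arguments come from the log examples later in this post; the function names, the TweetFields fragment, and the argument types (Int, String) are assumptions made up for the illustration.

```javascript
// Dynamic query: the document is assembled at runtime, so it changes with
// every call and can't be extracted, linted, or persisted ahead of time.
function buildDynamicQuery(page, search) {
  return `query { Tweets(page: ${page}, search: "${search}") { id body date } }`;
}

// Static query: the document never changes; only the variables do.
// The fragment makes the field selection reusable across queries.
const TWEETS_PAGE_QUERY = `
  query TweetsPage($page: Int, $search: String) {
    Tweets(page: $page, search: $search) {
      ...TweetFields
    }
  }
  fragment TweetFields on Tweet {
    id
    body
    date
  }
`;

// Hypothetical helper: builds the JSON payload of a GraphQL HTTP request.
function buildTweetsPageRequest(page, search) {
  return {
    query: TWEETS_PAGE_QUERY,
    operationName: 'TweetsPage',
    variables: { page, search },
  };
}
```

Two calls to buildTweetsPageRequest() share the exact same query document, which is what makes static queries easy to grep, review, and cache.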

The Ecosystem Is Booming

It's a pleasure to see so many people dive into GraphQL, and share their (mostly positive) experience. That's another sign that GraphQL is not just a fling: the number of new tutorials and open-source libraries published every week increases constantly. If you're interested in the field, I advise you to subscribe to GraphQL Weekly to receive news and tutorials about GraphQL.

In particular, the Meteor team has contributed a huge number of libraries and a lot of documentation via the Apollo project. They are responsible for a good share of the GraphQL popularity these days. They are also pushing GraphQL in new directions, for instance with graphql-anywhere, which lets you filter or normalize a complex JSON to get only the fields you want using a GraphQL query, but without a schema. It's kind of like an ETL, except smarter.

We've listed many tools already, but grouping them by publisher tells a story:

Facebook: GraphQL specification, Relay, graphql-js, express-graphql, GraphiQL, DataLoader
Apollo: apollo client, react-apollo, apollo-ios, apollo-android, apollo server, graphql-tools, Optics, etc.
GraphCool: GraphCool, graphql-up, graphql-cli
Marmelab (the new kids on the block!): json-graphql-server, admin-on-rest-graphql

You can find many more tools and services in the awesome-graphql list.

A growing ecosystem is a sign of good health. It's a good indication that GraphQL goes in the right direction, and that the GraphQL vision gets everyone to agree.

But It's Still Young

To moderate the enthusiasm a bit, I must admit that everything isn't perfect in GraphQL land. Firstly because it's so young. Facebook made the first public release in mid-2015, and it's been production ready since September 2016 only. The subscriptions specification and implementation are even younger.

This has four major consequences:

Many tutorials on the Internet are outdated, because the syntax or the practices changed quite a lot since the early release. I read countless posts explaining how to build a schema with the GraphQL AST language (GraphQLObjectType), or how to colocate resolvers with the schema. It's easy to get lost by reading that.
There are few GraphQL APIs in production at scale. GitHub, Yelp, The Financial Times (https://www.youtube.com/watch?v=S0s935RKKB4), and Coursera are the most cited, but the list of public GraphQL APIs isn't that impressive (and doesn't even list Facebook). That means that most of the tutorials come from pet projects. Real-life scenarios force different practices (some of which I've tried to share in this series) than the ones Google will show you.
The feverish blooming of new tools sometimes leads to duplicated efforts, and confusion for developers. At the time of writing, two competing libraries fight for the client side supremacy (Apollo and Relay). Good luck figuring out the winner.
The libraries for most languages (apart from JavaScript) are lagging in features and stability. When you step outside of JavaScript, GraphQL is a dangerous place.

These problems will vanish with time, but for now it's frustrating to have to dig into the code of an open-source library because of lacking documentation, or abandoned code. The situation recently improved a great deal with the publication of HowToGraphQL, an excellent full-stack tutorial sharing a lot of good practices.

For loose ends in the specification, the community hasn't settled on a set of good practices yet. Every developer has to reinvent the right way to develop pagination, filtering, metadata, and more importantly authentication. API design is still a pain: should I name my query Tweets, allTweets, or getPageOfTweets? Batch queries are not standardized. This lack of common ground will persist until a few industry leaders with large GraphQL APIs share their best practices. In the meantime, I advise you to follow the example of popular APIs - we took GraphCool as a model when designing the API of json-graphql-server.

Logging Is Hard

Let's be objective: GraphQL also introduces new problems that you never had with REST.

For instance, looking for a faulty request in HTTP server logs used to be easy with REST:

GET  /tweets/12 200 OK
GET  /tweets?search=lorem&page=1 200 OK
GET  /profiles/francoisz 500 Internal Server Error
POST /tweets/ 200 OK

With GraphQL, HTTP logs look like the following:

POST /graphql 200 OK
POST /graphql 200 OK
POST /graphql 500 Internal Server Error
POST /graphql 200 OK

Good luck finding the faulty query here, unless you've instrumented your code with extensive logging. Alternatively, you can use GET instead of POST for all queries. But the URL encoding makes the logs still very hard to parse:

GET /graphql/?query=query%7B%20Tweet%20(id%3A%2012)%20%7Bid%20body%20date%20%7D%7D 200 OK
GET /graphql/?query=query%7B%20Tweets%20(page%3A%201%2C%20search%3A%20%22lorem%22)%20%7B%20id%20body%20date%20%7D%20%7D 200 OK
GET /graphql/?query=query%7B%20Profile%20(name%3A%20francoisz)%20%7B%20username%20first_name%20last_name%20avatar_url%20%7D%20%7D 500 Internal Server Error
GET /graphql/?query=query%7B%20createTweet%20(body%3A%20%22Hello%2C%20World!%22)%20%7B%20id%20body%20date%20%7D%20%7D 200 OK

So let me repeat a tip for the third time in this series: Always name your queries. A named query looks like this:

query MyQueryName {
    Tweets (id: 12) {
        id
        body
        date
    }
}

Once URL-encoded, the query name MyQueryName appears early in the URL, which makes it easy to spot:

GET /graphql?query=query%20MyQueryName%20%7B%20Tweets%20(id%3A%2012)%20%7B%20id%20body%20date%20%7D%20%7D 200 OK
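Named queries also make logs easy to process by script. As a minimal sketch, assuming access-log lines shaped like the ones above, the following function decodes the query parameter and pulls out the operation type and name. The describeLogLine name is made up for the illustration, and the regular expressions only look at the first operation of the document.

```javascript
// Extracts the operation type and name from an access-log line such as
// "GET /graphql?query=query%20MyQueryName%20%7B... 200 OK".
// Returns null for lines without a query parameter; the name is null for
// anonymous operations.
function describeLogLine(line) {
  const match = /[?&]query=([^&\s]+)/.exec(line);
  if (!match) return null;
  let document;
  try {
    document = decodeURIComponent(match[1]);
  } catch (e) {
    return null; // malformed percent-encoding
  }
  const op = /^\s*(query|mutation|subscription)\b\s*([_A-Za-z][_0-9A-Za-z]*)?/.exec(document);
  if (!op) return { type: 'query', name: null }; // shorthand "{ ... }" form
  return { type: op[1], name: op[2] || null };
}
```

Grouping log lines by the returned name then gives per-query error and latency statistics, much like grouping by URL does for REST.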

Still, it only exacerbates another shortcoming of GraphQL versus REST: GraphQL queries are much more verbose than their REST equivalent. This information overload burdens the developer's mind. The simplicity of REST made data fetching easier to reason about.

Let me briefly mention another solution to the logging problem, called Persisted queries. I'll describe it in detail in a few minutes. In the meantime, please bear with me while I expose more GraphQL drawbacks.

Server Side Caching is Hard, Too

While server-side caching was a simple task in REST thanks to the HTTP Cache-Control header and reverse proxies, GraphQL makes it hard.

With GraphQL, requests are in POST by default, so reverse proxies like Varnish generally don't cache them. You can require that the clients use GET for all requests, but it's not ideal, because mutations will get cached, too. So whether you use GET or POST, you will need to open the Varnish documentation, and write specific cache rules depending on query types, names, and sometimes variables. In short, caching GraphQL with a reverse proxy is not straightforward.
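The kind of rule you end up writing can be sketched in a few lines. The function below is an illustration in JavaScript rather than actual Varnish configuration: it decides which Cache-Control header a GraphQL response should carry, based on the HTTP method and the operation type. The cacheControlFor name and the 60-second lifetime are arbitrary choices, and the check assumes one operation per query document.

```javascript
// Only GET requests carrying a query operation are cacheable; mutations,
// subscriptions, unrecognized documents, and POST requests never are.
function cacheControlFor(method, document, maxAgeSeconds = 60) {
  const cacheable = `public, max-age=${maxAgeSeconds}`;
  if (method !== 'GET') return 'no-store';
  const trimmed = document.trim();
  if (trimmed.startsWith('{')) return cacheable; // shorthand form is a query
  const op = /^(query|mutation|subscription)\b/.exec(trimmed);
  return op && op[1] === 'query' ? cacheable : 'no-store';
}
```

A real rule set would probably also vary the lifetime per query name, which is one more reason to name your queries.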

Facebook developers suggest that the solution is to cache data below GraphQL rather than above it, i.e. put a cache layer between the GraphQL gateway and a REST backend. They suggest using DataLoader for that. The problem is that this cache doesn't save the overhead of query parsing. And since the GraphQL gateway is the main entry point, not being able to put it behind a cache turns it into a single point of failure.
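To show what caching below GraphQL means in practice, here is a simplified, dependency-free sketch of the batching and caching idea behind DataLoader. It is not DataLoader's actual implementation, and the createBatchLoader and authorLoader names, as well as the fake backend call, are made up for the example.

```javascript
// Minimal batching + caching loader (illustration only).
// batchFn receives an array of keys and must return a promise of an array
// of values in the same order.
function createBatchLoader(batchFn) {
  const cache = new Map(); // key -> promise, so repeated keys are fetched once
  let queue = [];          // loads requested during the current tick

  function dispatch() {
    const batch = queue;
    queue = [];
    batchFn(batch.map(item => item.key)).then(
      values => batch.forEach((item, i) => item.resolve(values[i])),
      error => batch.forEach(item => item.reject(error))
    );
  }

  return {
    load(key) {
      if (!cache.has(key)) {
        cache.set(key, new Promise((resolve, reject) => {
          queue.push({ key, resolve, reject });
          // Schedule a single backend call for all keys queued in this tick.
          if (queue.length === 1) process.nextTick(dispatch);
        }));
      }
      return cache.get(key);
    },
  };
}

// Hypothetical backend call: fetches several authors in one round trip.
const calls = [];
const authorLoader = createBatchLoader(ids => {
  calls.push(ids);
  return Promise.resolve(ids.map(id => ({ id, name: `author ${id}` })));
});
```

Resolvers that call authorLoader.load(1), authorLoader.load(2), and authorLoader.load(1) again during the same tick trigger a single backend call for ids 1 and 2. Such a loader is best created once per request, so that cached data doesn't leak from one user to another.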

Considering how GraphQL can become central in the server architecture, the lack of consensus on performance tooling is especially concerning.

It turns out that persisted queries, which I mentioned earlier, also solve the caching problem somehow. We'll see that soon.

GraphQL Opens New Attack Vectors

Last but not least, security is more of a concern with GraphQL than with REST, due to easy introspection and Denial of Service attack risks.

The introspection abilities that allow GraphiQL to suggest-as-you-type also allow attackers to determine the entire attack surface of your API in one query. In production, in addition to disabling GraphiQL, don't forget to disable introspection queries, too. For Node.js servers, the graphql-disable-introspection package does just that.

Also, GraphQL makes it very easy to forge resource intensive queries, e.g. Tweet { Author { Tweets { Author { Tweets { Author { Tweets } } } } } }. Using this technique, an attacker can overload a GraphQL server in a single query. How to protect against this risk? By detecting timeouts in backends, by computing query depth and / or complexity, and by throttling or rejecting overly complex queries.
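As a rough sketch of the query depth computation, the function below counts the nesting of curly braces in a query document, skipping string literals and comments. It is a character-level approximation written for this post (input objects in arguments inflate the result), so a production check should rather walk the parsed query. The function names and the limit of 5 are arbitrary.

```javascript
// Returns the maximum nesting depth of curly braces in a query document.
function queryDepth(document) {
  let depth = 0;
  let maxDepth = 0;
  let inString = false;
  for (let i = 0; i < document.length; i++) {
    const char = document[i];
    if (inString) {
      if (char === '\\') i++;                  // skip the escaped character
      else if (char === '"') inString = false; // end of the string literal
    } else if (char === '"') {
      inString = true;
    } else if (char === '#') {
      // skip the comment up to the end of the line
      while (i < document.length && document[i] !== '\n') i++;
    } else if (char === '{') {
      depth++;
      maxDepth = Math.max(maxDepth, depth);
    } else if (char === '}') {
      depth--;
    }
  }
  return maxDepth;
}

const MAX_DEPTH = 5; // arbitrary limit for the example

// Throws when a query nests deeper than the allowed limit.
function assertAcceptableDepth(document) {
  const depth = queryDepth(document);
  if (depth > MAX_DEPTH) {
    throw new Error(`Query depth ${depth} exceeds the limit of ${MAX_DEPTH}`);
  }
}
```

Calling assertAcceptableDepth() before executing a query lets the server reject the recursive Tweet / Author query above without touching any backend.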

And even if the strong typing guarantees against GraphQL injections, if you use SQL in the backend, you're still vulnerable to SQL injection attacks.

The security risks translate into additional costs: to protect a GraphQL server from these attacks, a team must spend at least a few more days of development.

You know the best way to protect a GraphQL server from DoS attacks? Persisted queries. All right, let's see what these guys are.

Persisted Queries Are A Double-Edged Sword

Harder logging, caching, and DoS protection can all be solved with Persisted Queries. In short, that means that GraphQL clients should send query identifiers instead of query documents over HTTP:

// replace
POST /?variables={"id":123} HTTP/1.1
Host: graphql.acme.com
Content-Type: application/graphql

query getTweet($id: ID!) {
    Tweet(id: $id) {
        id
        body
        date
    }
}

// with
GET /?id=234&variables={"id":123} HTTP/1.1
Host: graphql.acme.com
To make that possible, both the client and the server must be able to map query documents (like query getTweet($id: ID!) { Tweet(id: $id) { id body date } }) to query ids (like 234). To do so, the client and the server use a common dictionary. This file must be generated by a CLI tool, based on the GraphQL query documents found in the source code.
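The dictionary itself is nothing magical. Here is a minimal sketch of how it could be generated and used on the client side, assuming a plain object that maps query documents to sequential ids. The function names are hypothetical, and a real CLI tool does more work, such as extracting the documents from the source files.

```javascript
// Collapses whitespace so that formatting differences between two copies of
// a query don't produce different dictionary keys.
function normalize(document) {
  return document.replace(/\s+/g, ' ').trim();
}

const hasOwn = (object, key) => Object.prototype.hasOwnProperty.call(object, key);

// Builds a { queryDocument: id } dictionary from the query documents found
// in the source code. Ids are assigned sequentially, starting at 1.
function buildQueryMap(documents) {
  const queryMap = {};
  let nextId = 1;
  for (const document of documents) {
    const key = normalize(document);
    if (!hasOwn(queryMap, key)) queryMap[key] = nextId++;
  }
  return queryMap;
}

// Client side: swap the query document for its id before sending.
function toPersistedRequest(queryMap, document, variables) {
  const key = normalize(document);
  if (!hasOwn(queryMap, key)) throw new Error('Query is not in the dictionary');
  return { id: queryMap[key], variables };
}
```

The server needs the same dictionary, read in the other direction (from id to query document).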

For instance, to add persisted queries support to a GraphQL server written in Node.js, add the following middleware:

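import { invert } from 'lodash';
import queryMap from '../extracted_queries.json';

// the dictionary maps query documents to ids; invert it once at startup
// to be able to look up a document by its id
const invertedMap = invert(queryMap);

// app is the Express app, config the application configuration
app.use(
  '/graphql',
  (req, resp, next) => {
    if (config.persistedQueries) {
      // replace the id sent by the client with the full query document
      req.body.query = invertedMap[req.body.id];
    }
    next();
  },
);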

Persisted queries require special tooling and compile-time optimizations both on the frontend and on the backend side. The tooling is ready and easy to use (at least in JS land), see the persistgraphql tutorial. One notable consequence is the addition of Webpack in the backend toolchain, which may slow down development a bit.

With persisted queries, request parameters contain not a query, but a query identifier. Queries not present in the dictionary are rejected, which removes the DoS risk. Request logs are easy to parse, provided you have the query dictionary close by. And caching rules are equally easy to write.

Persisted queries solve most GraphQL shortcomings, at the cost of a few days of development. But they also add hidden costs that you shouldn't underestimate. Using persisted queries means that frontend and backend developers must share not only the schema, but also all the queries. Frontend developers can't add new queries on their own anymore. So persisted queries seriously complicate the deployment of GraphQL apps, since the frontend and backend apps must now be synchronized.

So persisted queries come with a cost, a relatively high cost.

GraphQL Doesn't Deal With File Uploads

One pain point that persisted queries don't solve, surprisingly, is file uploads. The GraphQL spec does not address this requirement whatsoever. It's still possible, but a bit hacky.

For instance, to upload a picture in a createTweet mutation, I use a good old multipart HTTP request. On the server side, in the context initialization function, I grab the image from the request, and make it available in the GraphQL context. In Node.js, the multer package helps deal with multipart requests:

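const express = require('express');
const graphqlHTTP = require('express-graphql');
const multer = require('multer');
const schema = require('./schema'); // adjust to where your schema lives

// keep uploaded files in memory, so that resolvers can read them directly
const upload = multer({ storage: multer.memoryStorage() });
const app = express();
// upload.single('image') parses the multipart body and sets req.file
app.use('/graphql', upload.single('image'), graphqlHTTP(req => ({
    schema: schema,
    graphiql: true,
    context: { image: req.file },
})));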

The resolver for the createTweet mutation can now grab the image data from the context rather than from the query parameters:

// in src/tweet/resolvers.js
exports.Mutation = {
    createTweet: (_, { body }, context) => {
        const imageBinary = context.image;
        // now do what you want with the image
    }
}
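On the client side, the matching request can be sketched as follows. This assumes an environment where FormData and Blob are available as globals (browsers, recent versions of Node.js). The buildCreateTweetForm name, the String! type of the body argument, and the picture.png file name are assumptions made for the example.

```javascript
// Builds the multipart body for the createTweet mutation. The GraphQL query
// and variables travel as regular form fields; the picture travels in the
// "image" field, which must match upload.single('image') on the server.
function buildCreateTweetForm(body, imageBlob) {
  const form = new FormData();
  form.append(
    'query',
    'mutation createTweet($body: String!) { createTweet(body: $body) { id body date } }'
  );
  form.append('variables', JSON.stringify({ body }));
  form.append('image', imageBlob, 'picture.png');
  return form;
}

// Usage (the URL is a placeholder, imageBlob comes from a file input):
// fetch('https://graphql.acme.com/graphql', {
//   method: 'POST',
//   body: buildCreateTweetForm('Hello, World!', imageBlob),
// });
```

The HTTP client sets the multipart Content-Type header on its own when given a FormData body, so there is no need to set it by hand.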

Of course, it's even easier if the client handles file upload via a third-party service, and simply includes a URL in the GraphQL mutation, rather than the full image.

Some corner cases may reveal a few more pain points of GraphQL, but that's the gist.

You May Not Need GraphQL

As explained in the first post of this series, GraphQL helps overcome the flaws of REST. But in many cases, REST is more than enough. And adding a new technology to your development stack isn't neutral. So before choosing to go with GraphQL, make sure you really need it.

To help readers make up their mind, I've prepared a short survey. Pick the response to each question in a candid way, and you will see whether you actually need GraphQL or not for your next project.

Question	Answer A	Answer B
Architecture is API-centric	No	Yes
Client Platform	Desktop only	Desktop and mobile
Frontend Performance matters	No	Yes
Frontend deployment frequency	Yearly	Monthly or Weekly
API Size	Small	Moderate to large
API durability	A few months	Several years
API complexity	Small	Moderate to large
Already have a working REST backend	Yes	No
Number of backends	One	More than one
Complex caching rules	Yes	No
Team size	1 to 4 developers	At least 5 developers
Frontend and backend teams situation	Siloed	Colocated
Team is Fluent in modern JavaScript	No	Yes
Preferred language	Loosely typed	Strongly typed

If you have a majority of B answers, it's a good idea to investigate more about GraphQL for your project. Otherwise, REST is good enough for your case; don't waste time and money adding a new layer of complexity to your architecture.

There is one good reason for choosing GraphQL that I haven't mentioned yet, even though it's probably the most important.

Attract Talent

GraphQL is hot. Developers around the world see it as a revolution (see for instance Facebook just taught us all how to build websites). They believe in this cargo cult so hard that they learn GraphQL in their free time. They would do anything to work on a real world GraphQL project. Besides, it's very hard to find people who don't like GraphQL.

Did you notice how this article explains that developers love GraphQL, rather than explaining that their bosses like GraphQL? In the web and mobile development industry, developers are the rare resource. Attracting and keeping talent is the most important role of CTOs.

Do the math: GraphQL is a great way to attract talented developers. That's why Facebook and Apollo and other companies communicate so much about it. Otherwise, why do you think people like me would spend days writing a series of tutorials about GraphQL? By the way, shameless plug: come and work at marmelab, we'll do great stuff together - including GraphQL.

Making The Switch

You don't need to go all in with GraphQL. The switch can be gradual.

For existing apps, see GraphQL as an optional layer that frontend apps can choose to use on an opt-in basis. For instance, a mobile Twitter app can use GraphQL for the Tweets page, but a REST API for the user profile. So the migration path is soft, and it's always reversible.

For new projects, it's more of a clear-cut decision: Either you go with GraphQL, or with REST. Although if you can't decide, write a GraphQL API first, then a REST API on top of it ;)

Conclusion

It took an incredibly long time to answer this simple question: Should you dive in and use GraphQL? Longer than I expected, really. But I'm confident that with all the elements we've shared in this series, architects, developers, and CTOs alike can make an informed decision on the subject.

And my opinion is that most of the time, the answer should be: Take the leap! For existing projects and new developments, if the architecture is API-centric, and some of the usage is mobile, GraphQL is a good fit. You may not need GraphQL immediately, but you will probably use it in the future. GraphQL is here to stay, and it's a big deal. GraphQL is a fabulous idea, it's a great piece of technology, and a booming community.

But GraphQL does come with its own shortcomings. Don't expect a bed of roses, and be prepared for the extra complexity and longer development times.

For us at Marmelab, I made the decision. Our next customer projects will use GraphQL!

If you liked this article, please tell your friends about it. You may also be interested in the other blog series we've written:

A Lean Startup Adventure (18 posts)
The Blockchain Explained to Web Developers (3 posts)