DNS Caching in NodeJS

Arnaud Tilbian
July 28, 2025
#performance #node-js

While working on the Arte.tv infrastructure, we found that DNS lookups could be a hidden bottleneck, slowing down a critical layer. The good news: it's something we can easily fix. That's what this post is about.

Background

For the past few weeks, we've been seeing spikes of 504 Gateway Timeout errors coming from our backend-for-frontend (BFF). This key component of the Arte.tv infrastructure is called EMAC. It's a proxy for six services, and it can make dozens of API requests for a single frontend call.

[Figure: EMAC BFF]

Even though the spikes are low-impact for now, EMAC is a single point of failure, so we couldn't ignore them. We turned to our monitoring tools to dig into the root cause and fix it.

Thanks to NewRelic (a tool we use to monitor and troubleshoot apps), we found that DNS resolution can significantly slow down response times and sometimes becomes a system bottleneck.

Here's an example of a particularly slow request caused by DNS resolution:

[Figure: Very slow DNS]

And here's a graph comparing the number of DNS lookups from EMAC to other services:

[Figure: EMAC making many more DNS requests than all other NodeJS services]

During the observed peak hour, EMAC made up to 3,000 DNS lookups per minute. That was enough for us to take a closer look at how DNS works in NodeJS, and how we could improve it.

DNS in a Nutshell

The Domain Name System (DNS) lets us use readable domain names (e.g., lapoulequimue.fr) and get IP addresses (e.g., 46.105.57.169) in return.

DNS requests usually go through a recursive resolver (often the ISP's DNS server), which then queries other DNS servers in a hierarchy (root, top-level domain, then authoritative servers) until it finds the right IP.

[Figure: DNS servers tree]

DNS servers return records with a name, TTL (time to live), type, and data. The most relevant type for us is the A record:

Domain name    Time to live    Record type    IPv4 address
arte.tv.       300             A              212.95.74.37

This record says that the arte.tv domain points to 212.95.74.37, and that any client may cache it for up to 300 seconds.
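To make the record structure concrete, here is a minimal sketch that parses one record line and computes when a cached copy would expire. The helper names parseRecord and expiresAt are ours, invented for illustration; they are not part of NodeJS or any library.

```javascript
// Hypothetical helper: parse one resource record line into its fields.
function parseRecord(line) {
  const fields = line.trim().split(/\s+/);
  // dig prints a class column ("IN") after the TTL; drop it when present.
  if (fields[2] === 'IN') fields.splice(2, 1);
  const [name, ttl, type, data] = fields;
  return { name, ttl: Number(ttl), type, data };
}

// A cached record is usable until the moment it was received plus its TTL.
function expiresAt(record, receivedAtMs) {
  return receivedAtMs + record.ttl * 1000;
}

const example = parseRecord('arte.tv. 300 A 212.95.74.37');
console.log(example.ttl);            // 300
console.log(expiresAt(example, 0));  // 300000
```

The TTL is expressed in seconds, so it has to be converted before being compared with JavaScript timestamps, which are in milliseconds.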

When your app makes an HTTP request, the DNS resolution happens behind the scenes using the system's built-in functions.

You can test it using the dig command on Linux:

$ dig arte.tv

; <<>> DiG 9.16.1-Ubuntu <<>> arte.tv
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29724
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;arte.tv.                       IN      A

;; ANSWER SECTION:
arte.tv.                8739    IN      A       212.95.74.37

;; Query time: 9 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Fri Apr 04 10:49:32 CEST 2025
;; MSG SIZE  rcvd: 52

You can get a shorter output using the +short parameter:

$ dig +short arte.tv
212.95.74.37

Now you can impress your friends by pretending you control the Matrix.

By default, dig uses the DNS server listed in /etc/resolv.conf, but you can target a different one:

$ dig arte.tv @8.8.8.8 # Google Public DNS

You can also override name resolution locally using /etc/hosts:

127.0.0.1       localhost

Tip: You can run your own DNS server using Unbound to improve privacy and browsing speed. Pi-hole relies on the same idea, a local DNS server, to block ads and trackers.

DNS in NodeJS

Popular NodeJS data fetching libraries like Axios or node-fetch rely on the built-in http and https modules. In these modules, domain names are resolved via the dns.lookup NodeJS function, which relies on the operating system's DNS resolver.

const dns = require('node:dns');
const options = {
  family: 6,
  hints: dns.ADDRCONFIG | dns.V4MAPPED,
  all: true,
};
dns.lookup('example.org', options, (err, addresses) =>
  console.log('addresses: %j', addresses));
// addresses: [{"address":"2606:2800:21f:cb07:6820:80da:af6b:8b2c","family":6}]

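The problem is that dns.lookup, although asynchronous from JavaScript's point of view, is implemented as a synchronous getaddrinfo(3) call that occupies a thread in the libuv threadpool for its whole duration. With only four threads available by default, a few slow DNS requests can delay unrelated work that shares the pool, such as file system or crypto operations.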
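To get a feel for the queueing effect, here is a small simulation of a fixed-size worker pool. It is a simplified model written for this post, not libuv's actual scheduler: all tasks are assumed to be submitted at time zero and served in order, and simulatePool is a hypothetical name.

```javascript
// Simulates a FIFO queue served by a fixed number of worker threads.
// Returns the time (in ms) at which each task finishes.
function simulatePool(durationsMs, poolSize = 4) {
  const freeAt = new Array(poolSize).fill(0); // when each worker becomes idle
  return durationsMs.map((duration) => {
    // The next task goes to the worker that becomes free first.
    const worker = freeAt.indexOf(Math.min(...freeAt));
    freeAt[worker] += duration;
    return freeAt[worker];
  });
}

// Four slow lookups (5 s each) followed by one fast, unrelated task (10 ms).
console.log(simulatePool([5000, 5000, 5000, 5000, 10]));
// [5000, 5000, 5000, 5000, 5010]: the fast task waits behind the slow lookups
console.log(simulatePool([5000, 5000, 5000, 5000, 10], 8));
// [5000, 5000, 5000, 5000, 10]: a larger pool leaves a thread free
```

In a real process, the pool size can be raised with the UV_THREADPOOL_SIZE environment variable, but that only moves the limit; it does not reduce the number of lookups.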

NodeJS provides an alternative, dns.resolve, which queries DNS servers over the network without going through the threadpool. Yet most libraries don't use it by default and, at the time of writing, I don't know why.
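As an illustration, here is a minimal sketch using dns.promises.resolve4 with the ttl option, which returns each address together with its remaining TTL. The wrapper name resolveAWithTtl and its injectable resolver parameter are our own additions, there only so the function can be exercised without network access. One behavioral difference worth knowing: unlike dns.lookup, the dns.resolve family does not read /etc/hosts.

```javascript
const dns = require('node:dns');

// Hypothetical helper: resolves the A records of a hostname with their TTLs.
// `resolver` defaults to Node's promise-based DNS API.
async function resolveAWithTtl(hostname, resolver = dns.promises) {
  // { ttl: true } makes resolve4 return [{ address, ttl }] instead of strings.
  return resolver.resolve4(hostname, { ttl: true });
}

// Usage (requires network access):
// resolveAWithTtl('arte.tv').then(console.log);
```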

Now that we understand what can impact EMAC's performance, let's explore a potential solution.

DNS Caching

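dns.lookup relies on the OS resolver, which may already cache DNS records for a limited time (defined by the TTL), depending on how the host is configured (for example, through a local caching service such as systemd-resolved or nscd). When such a cache is present and you make multiple requests to the same domain within the TTL, the OS returns the cached result without performing a new lookup.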

However, EMAC handles so many DNS requests that the libuv threadpool can become overwhelmed, slowing down the entire NodeJS process. This leads to slower API responses from EMAC, which can cascade into bigger performance problems. We have a beautiful snowball effect here.

Adding another layer of caching at the application level makes sense. It's quick to implement because it requires no infrastructure changes, and it could have a great impact. Plus, many of the services called by EMAC share the same domain, so we could expect a lot of cache hits.
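To show what application-level caching boils down to, here is a minimal sketch of a TTL-aware cache in front of a resolver. It is an illustration written for this post, not how any particular library is implemented: createCachedResolver and its options are hypothetical names, and it ignores concerns a real library has to handle (IPv6, errors, concurrent requests for the same host).

```javascript
const dns = require('node:dns');

// Hypothetical helper, not part of Node.js or any library.
// `resolve4` and `now` are injectable so the sketch can be exercised offline.
function createCachedResolver({
  resolve4 = (hostname, options) => dns.promises.resolve4(hostname, options),
  now = Date.now,
  maxTtlSeconds = Infinity, // optional cap on how long an entry may live
} = {}) {
  const cache = new Map(); // hostname -> { addresses, expiresAt }

  return async function resolve(hostname) {
    const hit = cache.get(hostname);
    if (hit && hit.expiresAt > now()) {
      return hit.addresses; // fresh entry: no DNS query
    }
    // With { ttl: true }, resolve4 yields [{ address, ttl }] (ttl in seconds).
    const records = await resolve4(hostname, { ttl: true });
    const ttl = Math.min(maxTtlSeconds, ...records.map((r) => r.ttl));
    const addresses = records.map((r) => r.address);
    cache.set(hostname, { addresses, expiresAt: now() + ttl * 1000 });
    return addresses;
  };
}

// Usage (requires network access):
// const resolve = createCachedResolver({ maxTtlSeconds: 60 });
// resolve('arte.tv').then(console.log);
```

Capping the TTL is a trade-off: a lower cap means more DNS queries, but a shorter window during which a stale IP can be served.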

There are several libraries for DNS caching in NodeJS. We chose cacheable-lookup: it respects TTLs, works with higher-level HTTP libraries (like Axios), and is widely used.

Warning: If your DNS TTLs are too long and your server IPs change, you're in trouble. Make sure those records are properly configured.

Using cacheable-lookup To Cache DNS Lookups

Using cacheable-lookup is simple. First, install it:

yarn add cacheable-lookup

Then, integrate it with all HTTP and HTTPS agents. We added a feature flag so we can toggle DNS caching on or off easily:

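import http from 'node:http';
import https from 'node:https';
import CacheableLookup from 'cacheable-lookup';

// `config` is the application's own configuration object.
if (config.getProperties().cacheDNSLookups) {
    const cachedDns = new CacheableLookup();
    // Route lookups made through both global agents to the cache.
    cachedDns.install(http.globalAgent);
    cachedDns.install(https.globalAgent);
}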

This makes the global HTTP and HTTPS agents resolve host names through cacheable-lookup instead of calling dns.lookup directly. If there's a cached entry, it uses that. Otherwise, it resolves the name and caches the result based on the TTL. If nothing is returned, it falls back to the original dns.lookup.

Tip: Application-level DNS caching isn't the only option. You could hardcode IPs, use virtual IPs, modify /etc/hosts, run a shared DNS cache outside your app, and so on.

An Unexpected Result

Right after release, the performance metrics started looking amazing:

[Figure: A great success]

Maybe too amazing. The number of DNS lookups dropped to zero. But even with long TTLs, each EMAC pod (we have many) should make at least a few lookups before caching kicks in.

The reason: the metric we were using (calls to dns.lookup) was no longer being triggered. Our custom resolver bypassed it entirely. So now we need better metrics to measure real DNS activity.

At this time, NewRelic cannot monitor the lookup function that cacheable-lookup installs in place of dns.lookup. We are therefore waiting for the infrastructure team to expose metrics at the DNS resolver level, which should give us more information about the impact of our changes.
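In the meantime, one possible stopgap is to count calls ourselves by wrapping whichever lookup function the HTTP client ends up using. The sketch below is an assumption-laden illustration, not something we run in production: instrumentLookup is a hypothetical helper, and it counts resolution requests made by the application (cache hits included), not queries actually sent to a DNS server.

```javascript
const dns = require('node:dns');

// Hypothetical helper: wraps any dns.lookup-compatible function and counts
// how often it is called and how often it fails.
function instrumentLookup(lookup = dns.lookup) {
  const stats = { calls: 0, errors: 0 };

  function countedLookup(hostname, options, callback) {
    // dns.lookup allows the options argument to be omitted.
    if (typeof options === 'function') {
      callback = options;
      options = {};
    }
    stats.calls += 1;
    return lookup(hostname, options, (err, ...result) => {
      if (err) stats.errors += 1;
      callback(err, ...result);
    });
  }

  return { lookup: countedLookup, stats };
}

// Usage: pass the wrapped function through the `lookup` option of a request.
// const counted = instrumentLookup();
// http.get({ host: 'example.org', lookup: counted.lookup }, (res) => res.resume());
```

The stats object could then be reported periodically to whatever monitoring tool is in place.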

Conclusion

For a backend-for-frontend like EMAC, making thousands of requests to the same domains every minute, DNS caching is a no-brainer.

It won't solve all timeout errors, but it will certainly help make EMAC more resilient and responsive.

As we've seen, it's easy to implement in NodeJS. Just keep in mind: caching at the application level is one of many possible solutions. Infrastructure-level caching could be another path if you need to do it widely. In any case, make sure you have strong observability in place at the application and infrastructure layers, so you can accurately detect, understand, and fix latency and failure issues.
