DNS Caching in NodeJS

While working on the Arte.tv infrastructure, we found that DNS lookups could be a hidden bottleneck, slowing down a critical layer. The good news: it's something we can easily fix. That's what this post is about.
Background
For the past few weeks, we've been seeing spikes of 504 Gateway Timeout errors coming from our backend-for-frontend (BFF). This key component of the Arte.tv infrastructure is called EMAC. It's a proxy for six services, and it can make dozens of API requests for a single frontend call.
Even though the spikes are low-impact for now, EMAC is a single point of failure, so we couldn't ignore them. We turned to our monitoring tools to dig into the root cause and fix it.
Thanks to NewRelic (a tool we use to monitor and troubleshoot apps), we found that DNS resolution can significantly slow down response times and sometimes becomes a system bottleneck.
Here's an example of a particularly slow request caused by DNS resolution:
And here's a graph comparing the number of DNS lookups from EMAC to other services:
During the observed peak hour, EMAC can make up to 3,000 DNS lookups per minute. That was enough for us to take a closer look at how DNS works in NodeJS—and how we could improve it.
DNS in a Nutshell
The Domain Name System (DNS) lets us enter readable domain names (e.g., https://lapoulequimue.fr/) and get IP addresses (e.g., 46.105.57.169) in return.
DNS requests usually go through the ISP's DNS server, which then queries other DNS servers in a hierarchy until it finds the right IP.
DNS servers return records with a name, TTL (time to live), type, and data. The most relevant type for us is the A record:
Domain name | Time to live | Record type | IPv4 address |
---|---|---|---|
arte.tv. | 300 | A | 212.95.74.37 |
This record says that the arte.tv domain points to 212.95.74.37, and that any client may cache this answer for up to 300 seconds before asking again.
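To make the TTL concrete, here is a minimal sketch of how a client might decide whether a cached copy of the record above is still usable. The record shape and the `isFresh` helper are illustrative assumptions, not part of any DNS library.

```javascript
// Hypothetical in-memory representation of the A record shown above.
const record = { name: 'arte.tv.', ttl: 300, type: 'A', data: '212.95.74.37' };

// Returns true while a record fetched at `fetchedAtMs` is still within its TTL.
function isFresh(rec, fetchedAtMs, nowMs = Date.now()) {
  return nowMs - fetchedAtMs < rec.ttl * 1000;
}

const fetchedAt = 0;
console.log(isFresh(record, fetchedAt, 299000)); // true: 299 s old, still within the TTL
console.log(isFresh(record, fetchedAt, 300000)); // false: the 300 s TTL has elapsed
```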
When your app makes an HTTP request, the DNS resolution happens behind the scenes using the system's built-in functions.
You can test it using the dig command on Linux:
$ dig arte.tv
; <<>> DiG 9.16.1-Ubuntu <<>> arte.tv
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29724
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;arte.tv. IN A
;; ANSWER SECTION:
arte.tv. 8739 IN A 212.95.74.37
;; Query time: 9 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Fri Apr 04 10:49:32 CEST 2025
;; MSG SIZE rcvd: 52
You can get a shorter output using the +short parameter:
dig +short arte.tv
212.95.74.37
Now you can impress your friends by pretending you control the Matrix.
By default, dig uses the DNS server listed in /etc/resolv.conf, but you can target a different one:
dig arte.tv @8.8.8.8 # Google Public DNS
You can also override name resolution locally using /etc/hosts:
127.0.0.1 localhost
Tip: You can run your own DNS server using Unbound to improve privacy and browsing speed. Pi-hole relies on the same idea, a local DNS server, to block ads and trackers.
DNS in NodeJS
Popular NodeJS data-fetching libraries like Axios or node-fetch rely on the HTTP and HTTPS modules. In these modules, domain names are resolved via the NodeJS dns.lookup function, which relies on the operating system's DNS resolver.
const dns = require('node:dns');
const options = {
  family: 6,
  hints: dns.ADDRCONFIG | dns.V4MAPPED,
  all: true,
};
dns.lookup('example.org', options, (err, addresses) =>
  console.log('addresses: %j', addresses));
// addresses: [{"address":"2606:2800:21f:cb07:6820:80da:af6b:8b2c","family":6}]
The problem is that dns.lookup, although asynchronous from JavaScript's point of view, is implemented as a synchronous getaddrinfo(3) call that occupies a thread in the libuv threadpool. With only four threads available by default, a few slow DNS requests can negatively impact unrelated parts of an application that share the pool.
NodeJS provides an alternative, dns.resolve, which queries DNS servers over the network without using the threadpool. Yet most libraries don't use it by default and, at the time of writing, I don't know why.
Now that we understand what can impact EMAC's performance, let's explore a potential solution.
DNS Caching
dns.lookup relies on the OS resolver, which, depending on its configuration, may already cache DNS records for a limited time (defined by the TTL). In that case, if you make multiple requests to the same domain within the TTL, the OS returns the cached result without performing a new lookup.
However, EMAC handles so many DNS requests that the libuv threadpool can become overwhelmed, slowing down the entire NodeJS process. This leads to slower API responses from EMAC, which can cascade into bigger performance problems. We have a beautiful snowball effect here.
Adding another layer of caching at the application level makes sense. It's quick to implement because it requires no infrastructure changes, and it could have a great impact. Plus, many of the services called by EMAC share the same domain, so we could expect a lot of cache hits.
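To show what such an application-level cache boils down to, here is a minimal sketch of a TTL-aware cache wrapped around an injectable resolver. The `TtlDnsCache` class and the stubbed resolver are hypothetical names for illustration; this is not how any particular library is implemented, and it skips concerns such as deduplicating concurrent lookups or caching errors.

```javascript
// Minimal TTL-aware cache around an injectable resolver (illustrative only).
class TtlDnsCache {
  // `resolve(hostname)` must return a promise of [{ address, ttl }] (ttl in seconds),
  // the same shape as dns.promises.resolve4(hostname, { ttl: true }).
  constructor(resolve, now = Date.now) {
    this.resolve = resolve;
    this.now = now;
    this.entries = new Map(); // hostname -> { addresses, expiresAt }
    this.hits = 0;
    this.misses = 0;
  }

  async lookup(hostname) {
    const cached = this.entries.get(hostname);
    if (cached && cached.expiresAt > this.now()) {
      this.hits += 1;
      return cached.addresses;
    }
    this.misses += 1;
    const records = await this.resolve(hostname);
    if (records.length === 0) return []; // nothing to cache
    // Expire the whole entry when the shortest-lived record does.
    const minTtl = Math.min(...records.map((r) => r.ttl));
    const addresses = records.map((r) => r.address);
    this.entries.set(hostname, { addresses, expiresAt: this.now() + minTtl * 1000 });
    return addresses;
  }
}

// Usage with a stubbed resolver and a fake clock, so no network is needed.
async function demo() {
  let clock = 0;
  let upstreamCalls = 0;
  const fakeResolve = async () => {
    upstreamCalls += 1;
    return [{ address: '212.95.74.37', ttl: 300 }];
  };
  const cache = new TtlDnsCache(fakeResolve, () => clock);

  await cache.lookup('arte.tv'); // miss: calls the resolver
  clock = 299000;
  await cache.lookup('arte.tv'); // hit: still within the 300 s TTL
  clock = 300000;
  await cache.lookup('arte.tv'); // miss: entry expired, resolver called again
  return { upstreamCalls, hits: cache.hits, misses: cache.misses };
}

demo().then((result) => console.log(result));
// { upstreamCalls: 2, hits: 1, misses: 2 }
```

Three lookups reach the upstream resolver only twice, which is the effect we expect at scale when many calls target the same domain.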
There are several libraries for DNS caching in NodeJS:
We chose cacheable-lookup. It respects TTLs, works with higher-level HTTP libraries (like Axios), and is widely used.
Warning: If your DNS TTLs are too long and your server IPs change, you're in trouble. Make sure those records are properly configured.
Using cacheable-lookup To Cache DNS Lookups
Using cacheable-lookup is simple. First, install it:
yarn add cacheable-lookup
Then, integrate it with all HTTP and HTTPS agents. We added a feature flag so we can toggle DNS caching on or off easily:
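import http from 'node:http';
import https from 'node:https';
import CacheableLookup from 'cacheable-lookup';
// `config` is the application's own configuration object.
if (config.getProperties().cacheDNSLookups) {
  const cachedDns = new CacheableLookup();
  // Route lookups made through the global agents to the cache.
  cachedDns.install(http.globalAgent);
  cachedDns.install(https.globalAgent);
}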
This overrides the default dns.lookup with a cached version. If there's a cached entry, it uses that. Otherwise, it performs a lookup and caches the result based on the TTL. If nothing is returned, it falls back to the original dns.lookup.
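Node's HTTP stack accepts a custom `lookup` function with the same signature as `dns.lookup`, which is the hook a caching resolver can plug into. The sketch below illustrates that hook with the standard library only. A hypothetical static table (`emac.example` mapped to `127.0.0.1`) stands in for a real cache, and a throwaway local server receives the request. It is not how cacheable-lookup is implemented.

```javascript
const http = require('node:http');

// Builds a lookup function with the same signature as dns.lookup:
// (hostname, options, callback). `table` is a hypothetical stand-in for a cache.
function createStaticLookup(table) {
  const stats = { calls: 0 };
  function lookup(hostname, options, callback) {
    if (typeof options === 'function') {
      callback = options;
      options = {};
    }
    stats.calls += 1;
    const address = table[hostname];
    if (!address) {
      const err = new Error(`no entry for ${hostname}`);
      err.code = 'ENOTFOUND';
      return process.nextTick(callback, err);
    }
    // When `all` is set, callers expect an array of { address, family }.
    if (options.all) {
      return process.nextTick(callback, null, [{ address, family: 4 }]);
    }
    process.nextTick(callback, null, address, 4);
  }
  return { lookup, stats };
}

// Sends one request to a throwaway local server through the custom lookup.
function demo() {
  const { lookup, stats } = createStaticLookup({ 'emac.example': '127.0.0.1' });
  return new Promise((resolve, reject) => {
    const server = http.createServer((req, res) => res.end('ok'));
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      const req = http.get(
        { host: 'emac.example', port, lookup, agent: false },
        (res) => {
          res.resume();
          res.on('end', () => {
            server.close();
            resolve({ status: res.statusCode, lookupCalls: stats.calls });
          });
        },
      );
      req.on('error', (err) => {
        server.close();
        reject(err);
      });
    });
  });
}

// Logs the response status and how many times the custom lookup ran.
demo().then((result) => console.log(result));
```

Because the request never reaches `dns.lookup`, anything that counts calls to `dns.lookup` will not see it.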
Tip: Application-level DNS caching isn't the only option. You could hardcode IPs, use virtual IPs, modify /etc/hosts, run a shared DNS cache outside your app, and so on.
An Unexpected Result
Right after release, the performance metrics started looking amazing:
Maybe too amazing. The number of DNS lookups dropped to zero. But even with long TTLs, each EMAC pod (we have many) should make at least a few lookups before caching kicks in.
The reason: the metric we were using (calls to dns.lookup) was no longer being triggered. Our custom resolver bypassed it entirely. So now we need better metrics to measure real DNS activity.
At this time, we are unable to use NewRelic to monitor the replacement for dns.lookup that the cacheable-lookup package provides. We are therefore waiting for the infrastructure team to expose metrics at the DNS resolver level, which should give us more information about the impact of our changes.
Conclusion
For a backend-for-frontend like EMAC, making thousands of requests to the same domains every minute, DNS caching is a no-brainer.
It won't solve all timeout errors, but it sure will help to make EMAC more resilient and responsive.
As we've seen, it's easy to implement in NodeJS. Just keep in mind: caching at the application level is one of many possible solutions. Infrastructure-level caching could be another path if you need to do it widely. In any case, make sure you have strong observability in place at the application and infrastructure layers, so you can accurately detect, understand, and fix latency and failure issues.