Java DNS Lookups

DNS lookup performance is not something Java developers typically worry about, but in some cases it can become a real gotcha. Allow me to share my little tale of woe with you.

If you didn't know by now, the default behavior of the Java runtime is to cache DNS lookups for the life of the JVM. This may be OK for some applications, but if you have a long-running process and an administrator 'moves' a service via a DNS change, your application will never know. In my experience, this is almost never acceptable for typical deployment scenarios. You can change this behavior by setting the networkaddress.cache.ttl property to the number of seconds successful DNS lookups should be cached; see the javadoc for java.net.InetAddress for the details.

I ran into a situation at work recently where, in an ill-conceived effort to load balance access to a remote resource via round-robin DNS, the cache time was set to 0, effectively requiring a DNS lookup for every connection attempt. This wasn't immediately horrible when used on the LAN near the DNS server. The trouble began when we deployed applications across the WAN, and the remote ends of the WAN had no local DNS server. Latency for a round trip from the remote servers to the datacenter housing the DNS service was approximately 80ms. You might expect an added cost of just slightly more than 80ms per connection, but it was worse than that.

Our first indication of trouble was a connection pool that would randomly fail a connection attempt. The connection pooling mechanism had connection timeout functionality that would fail the attempt if it took over a certain time -- by default, 5 seconds. Connection attempts typically took around 150-300ms to be created, which is fairly normal, but the occasional spikes were alarming. We had a couple of developers spend hours poring over the home-grown connection pooling mechanism looking for threading issues or anything else that could have caused the problem, while also forcing our TechOps team to spend time investigating potential network issues. In the end, it turned out to be none of the usual suspects.

An average lookup via InetAddress.getByName() was taking ~480ms -- significantly longer than expected (a quick way to measure this yourself is sketched just after the list below). A network trace revealed that the DNS queries themselves were returning quite fast, but there were more queries than expected for each lookup. It was then that the realization hit us: the Java runtime was spending a lot of time looking for IPv6 addresses (AAAA records) that are not used within our network. And it was even worse than that. Here's how a typical lookup would go:
  1. AAAA lookup for db.mycorp.com (FAILED) ~80ms
  2. AAAA lookup for db.mycorp.com.localdomain.mycorp.com (FAILED) ~80ms
  3. AAAA lookup for db.mycorp.com.otherdomain.mycorp.com (FAILED) ~80ms
  4. AAAA lookup for db.mycorp.com.yetanotherdomain.mycorp.com (FAILED) ~80ms
  5. AAAA lookup for db.mycorp.com.mycorp.com (FAILED) ~80ms
  6. A lookup for db.mycorp.com (HIT) ~80ms
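
If you want to see what name resolution is costing on your own hosts, here is a minimal sketch of one way to time it. The class name is made up, and db.mycorp.com is just the placeholder hostname from the list above; substitute a name your resolver actually knows about.

    import java.net.InetAddress;

    // Rough sketch: time a few InetAddress.getByName() calls to see how long
    // name resolution takes and whether the JVM-level cache is kicking in.
    public class LookupTimer {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "db.mycorp.com";
            for (int i = 0; i < 3; i++) {
                long start = System.nanoTime();
                InetAddress addr = InetAddress.getByName(host);
                long elapsedMs = (System.nanoTime() - start) / 1000000L;
                System.out.println(addr.getHostAddress() + " resolved in " + elapsedMs + "ms");
            }
        }
    }

With the default caching behavior, only the first iteration should pay the full resolution cost; with the cache ttl forced to 0, every iteration does.
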
Because the host's /etc/resolv.conf specified a number of search domains to try, the host resolver spent over 400ms on every lookup chasing records we would never care about. This turns out to have an easy fix: set the system property java.net.preferIPv4Stack=true. With that property set, the runtime skips steps 1-5 above and requests the desired A record directly.

That still left the issue of the random connection failures. Because DNS lookups use UDP, packet loss results in a failed lookup attempt. The default behavior (for the Linux resolver, at least) is to time out after 5 seconds and then retry. Because our connection pool timed out the connection attempt after 5 seconds, an attempt that lost a DNS packet never even got past the lookup, which could take up to ~5480ms (the 5-second resolver timeout plus the AAAA lookup overhead). A connection timeout of anything greater or less than 5 seconds would have been fine; 5 seconds was just an unfortunate coincidence with the resolver timeout. Another option would be to configure the resolver to time out more quickly. I'll leave that as a discussion between you and your network administrator as to what gives. :)

At the end of the day, we set our ttl value to something more reasonable, shortened our connection timeout (forcing a reconnect attempt more quickly), and with the IPv4 preference in place, our problems have been resolved.
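
For the record, here is a minimal sketch of what the ttl and IPv4-preference settings look like when set programmatically. The property names are the real ones; the 60-second value, the class name, and the hostname are illustrative only. Note that networkaddress.cache.ttl is actually a security property, so it is set through java.security.Security (or in the java.security file) rather than as a plain -D system property.

    import java.net.InetAddress;
    import java.security.Security;

    // Rough sketch of the two JVM-level settings discussed above.
    public class DnsTuning {
        public static void main(String[] args) throws Exception {
            // Ask for A records directly instead of walking the AAAA lookups in
            // steps 1-5 above. Same effect as launching the JVM with
            // -Djava.net.preferIPv4Stack=true; it must be set before the
            // networking classes are first used.
            System.setProperty("java.net.preferIPv4Stack", "true");

            // Cache successful lookups for a bounded number of seconds instead of
            // for the life of the JVM (the default) or not at all (ttl=0, the
            // round-robin "load balancing" setting that started all this).
            Security.setProperty("networkaddress.cache.ttl", "60");

            System.out.println(InetAddress.getByName("db.mycorp.com"));
        }
    }

The connection-timeout half of the fix is specific to whatever pooling code you run; the one general takeaway is to avoid parking it at exactly the resolver's 5-second retry interval.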

Published: March 15 2009
