DNS failures present as some of the most confusing connectivity issues in practice. A site that loads fine from one machine but returns NXDOMAIN from another. A domain that works over one resolver but fails over another. Intermittent resolution failures that correlate with nothing obvious. And increasingly, the interaction between encrypted DNS (DoH/DoT), local resolvers, and system-level DNS settings adds another layer of complexity.

This guide provides a structured debugging flow for DNS outages and unexpected NXDOMAIN responses, covering resolver health assessment, common failure modes, and the specific wrinkles introduced by DNS-over-HTTPS and DNS-over-TLS.

The debugging flow

When DNS resolution fails, work through these steps in order:

Step 1: Confirm the failure scope

Before deep-diving, establish what is failing and where:

  • Single domain or all domains? If all resolution fails, the problem is your resolver or network path to it. If one domain fails, the problem is likely authoritative-side or cache-related.
  • Single machine or multiple? If only one machine is affected, check its local DNS configuration. If multiple machines share the failure, check the shared resolver.
  • All record types or specific? If A records resolve but AAAA or MX fail, the issue may be type-specific filtering or a partial zone.
# Quick scope check
dig example.com A +short          # IPv4
dig example.com AAAA +short       # IPv6
dig example.com MX +short         # Mail
dig different-domain.com A +short # Cross-domain check
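To run the same scope check in one pass, a small shell function can loop over record types and flag any that come back empty. This is a minimal sketch. scope_check is a hypothetical helper name, and the function assumes dig is on the PATH. Lines starting with ";" are filtered out because dig can print errors such as timeouts as ";;" comment lines even with +short.

```shell
# scope_check DOMAIN TYPE...   (hypothetical helper)
# Prints "<TYPE> ok" when dig returns at least one record, "<TYPE> EMPTY" otherwise.
scope_check() {
  domain=$1
  shift
  for rtype in "$@"; do
    # Drop ";;" comment lines (e.g. timeout messages) so only real records count
    answer=$(dig "$domain" "$rtype" +short | grep -v '^;')
    if [ -n "$answer" ]; then
      echo "$rtype ok"
    else
      echo "$rtype EMPTY"
    fi
  done
}

# Usage:
#   scope_check example.com A AAAA MX
```

An EMPTY result does not distinguish a missing name from a missing record type. The response code from Step 3 tells those apart.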

Step 2: Test against multiple resolvers

Compare results across different resolvers to isolate the failure:

# Your configured resolver
dig example.com A

# Google Public DNS
dig @8.8.8.8 example.com A

# Cloudflare DNS
dig @1.1.1.1 example.com A

# Quad9
dig @9.9.9.9 example.com A

# Direct to authoritative (find NS first)
dig example.com NS +short
dig @ns1.example.com example.com A

If the domain resolves from public resolvers but not from your local resolver, the problem is your resolver (stale cache, filtering, or connectivity to upstream).

If the domain fails from all resolvers including authoritative, the problem is the domain's DNS configuration.
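When comparing several resolvers, it helps to print their answers side by side. The sketch below is a hypothetical helper, compare_resolvers. It sorts each answer set because resolvers may rotate record order, then joins the set onto one line, so only real differences stand out. It assumes dig and the standard sort and paste utilities are available.

```shell
# compare_resolvers DOMAIN TYPE SERVER...   (hypothetical helper)
# One line per resolver: "<server>: <sorted answers>" or "<no answer>".
compare_resolvers() {
  domain=$1
  rtype=$2
  shift 2
  for server in "$@"; do
    # Strip ";;" error lines, sort to ignore round-robin order, join with spaces
    answer=$(dig @"$server" "$domain" "$rtype" +short | grep -v '^;' | sort | paste -sd ' ' -)
    echo "$server: ${answer:-<no answer>}"
  done
}

# Usage (the last argument stands for a nameserver taken from the NS lookup):
#   compare_resolvers example.com A 8.8.8.8 1.1.1.1 9.9.9.9 ns1.example.com
```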

Step 3: Check for NXDOMAIN vs SERVFAIL vs timeout

The response code tells you which part of the chain failed:

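  • NXDOMAIN (RCODE 3): The name does not exist. Coming from the authoritative server, this is definitive: the domain is not configured, has expired, or the zone is misconfigured. Keep in mind that a filtering resolver can also synthesize NXDOMAIN. Resolvers also cache negative answers for a period derived from the zone's SOA record.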
  • SERVFAIL (RCODE 2): The resolver could not get an answer from the authoritative server. Possible causes: DNSSEC validation failure, authoritative server unreachable, or upstream timeout.
  • Timeout / no response: Network-level failure. Your client cannot reach the resolver, or queries or responses are being dropped on the path. A resolver that cannot reach the authoritative server usually answers SERVFAIL instead.
  • REFUSED (RCODE 5): The resolver is refusing to answer. Common with resolvers that restrict queries to authorized clients.
# Check full response including status
dig example.com A +noall +comments
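The status can be pulled out of dig's header comment and mapped to the meanings listed above. In this sketch, status_from_header and dns_status are hypothetical helper names. Arguments are passed straight to dig, so an @server and a record type can be given. A NOERROR with an empty answer section (often called NODATA) means the name exists but has no records of that type. That is the type-specific case from Step 1.

```shell
# status_from_header: read dig output on stdin, print the RCODE name (e.g. NXDOMAIN)
status_from_header() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# dns_status [@SERVER] NAME [TYPE]   (hypothetical helper)
dns_status() {
  status=$(dig "$@" +noall +comments | status_from_header)
  case "$status" in
    NOERROR)  echo "NOERROR: name exists (answer may still be empty for this type)" ;;
    NXDOMAIN) echo "NXDOMAIN: name does not exist" ;;
    SERVFAIL) echo "SERVFAIL: resolver could not get a usable answer" ;;
    REFUSED)  echo "REFUSED: resolver will not answer this client" ;;
    "")       echo "NO RESPONSE: timeout or dropped query" ;;   # no header at all
    *)        echo "$status: less common RCODE" ;;
  esac
}

# Usage:
#   dns_status example.com A
#   dns_status @1.1.1.1 example.com AAAA
```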

Step 4: Inspect the DNS chain

Walk the delegation chain from root to authoritative:

# Trace the full resolution path
dig example.com A +trace

This shows each delegation step. Look for:

  • Missing or incorrect NS records at the parent zone
  • Glue records that point to wrong IPs
  • DNSSEC signature expiration (add +dnssec to the trace to see RRSIG records and their validity dates)
  • Lame delegations (NS records pointing to servers that don't serve the zone)
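Lame delegations can be spotted by asking each listed nameserver for the zone's SOA with recursion turned off. A server that really serves the zone sets the aa (authoritative answer) flag in dig's header. The sketch below, check_lame (a hypothetical name), does this for the NS set your resolver returns. Comparing that set with the parent's NS records from +trace is still worthwhile.

```shell
# check_lame ZONE   (hypothetical helper)
# Non-recursive SOA query to each nameserver; no "aa" flag means the server is lame.
check_lame() {
  zone=$1
  for ns in $(dig "$zone" NS +short | grep -v '^;'); do
    # Extract the flag list from a line like ";; flags: qr aa; QUERY: 1, ..."
    flags=$(dig +norecurse @"$ns" "$zone" SOA +noall +comments +time=3 +tries=1 \
      | sed -n 's/^;; flags: \([a-z ]*\);.*/\1/p' | head -n 1)
    case " $flags " in
      *" aa "*) echo "$ns authoritative" ;;
      *)        echo "$ns LAME (flags: ${flags:-no response})" ;;
    esac
  done
}

# Usage:
#   check_lame example.com
```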

Resolver health checks

Local resolver health

If you run your own resolver (Unbound, BIND, dnsmasq):

  1. Check the service is running: systemctl status unbound (or equivalent)
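  2. Check cache size and memory use: caches are bounded by settings such as Unbound's msg-cache-size and rrset-cache-size or BIND's max-cache-size, and limits set too high for the host can cause memory pressure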
  3. Check upstream connectivity: can the resolver reach root servers and authoritative nameservers?
  4. Check DNSSEC validation: is the trust anchor current? Run unbound-anchor or equivalent
  5. Check query logs: are queries arriving and getting answered?
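For Unbound specifically, the query and cache counters give a quick health signal. This sketch assumes Unbound with remote control enabled (control-enable: yes), so that unbound-control stats_noreset works. cache_summary is a hypothetical helper that reads those statistics on stdin and prints the cache hit ratio. BIND (rndc stats) and dnsmasq report their statistics differently.

```shell
# cache_summary: read `unbound-control stats_noreset` output on stdin
# and print query count and cache hit ratio (hypothetical helper).
cache_summary() {
  awk -F= '
    $1 == "total.num.queries"   { q = $2 }
    $1 == "total.num.cachehits" { h = $2 }
    $1 == "total.num.cachemiss" { m = $2 }
    END {
      if (q + 0 == 0) { print "no queries recorded"; exit }
      printf "queries=%d hits=%d misses=%d hit_ratio=%.1f%%\n", q, h, m, 100 * h / q
    }'
}

# Usage:
#   systemctl is-active unbound
#   unbound-control stats_noreset | cache_summary
```

A query count that stops increasing while clients are active suggests queries are not reaching the resolver. That is the same question item 5 asks of the logs.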

Public resolver health

Major public resolvers occasionally have outages. Check:
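One quick probe is to send the same query to each public resolver with a short timeout and record the status and the query time dig reports. In this sketch, probe_resolvers is a hypothetical helper. It uses a 2-second timeout with no retries, and both values are arbitrary choices. A resolver that is slow or times out from your network while the others answer points to that provider or your path to it.

```shell
# probe_resolvers NAME SERVER...   (hypothetical helper)
# Prints "<server> <status or TIMEOUT> <query time in msec or n/a>".
probe_resolvers() {
  name=$1
  shift
  for server in "$@"; do
    out=$(dig @"$server" "$name" A +noall +comments +stats +time=2 +tries=1)
    status=$(printf '%s\n' "$out" | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
    ms=$(printf '%s\n' "$out" | sed -n 's/^;; Query time: \([0-9]*\) msec.*/\1/p' | head -n 1)
    echo "$server ${status:-TIMEOUT} ${ms:-n/a}"
  done
}

# Usage:
#   probe_resolvers example.com 8.8.8.8 1.1.1.1 9.9.9.9
```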

DoH and DoT complications

Encrypted DNS adds resolution paths that bypass traditional debugging tools.

DoH (DNS-over-HTTPS)

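Browsers (Firefox, Chrome, Edge) can send DNS queries over HTTPS to a DoH resolver (e.g., https://cloudflare-dns.com/dns-query). Firefox turns this on by default in some regions. Chrome and Edge switch to DoH automatically when the configured system resolver supports it. This means: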

  • DNS queries bypass the system resolver entirely
  • nslookup and dig on the command line may return different results than the browser
  • Corporate DNS filtering may be bypassed
  • If the DoH endpoint is unreachable, the browser may fall back to system DNS (or may not, depending on configuration)
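Mozilla documents a canary domain, use-application-dns.net, that networks can use to keep Firefox's default-on DoH switched off. If the network's resolver answers NXDOMAIN for that name, Firefox does not enable DoH by default. A user who turned DoH on explicitly is not affected. This sketch checks what your resolver returns for the canary; doh_canary is a hypothetical helper name.

```shell
# doh_canary [@SERVER]   (hypothetical helper)
# Reports whether the resolver answers NXDOMAIN for Firefox's canary domain.
doh_canary() {
  status=$(dig "$@" use-application-dns.net A +noall +comments \
    | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
  if [ "$status" = "NXDOMAIN" ]; then
    echo "canary blocked: Firefox should not enable DoH by default on this network"
  else
    echo "canary not blocked (status: ${status:-no response})"
  fi
}

# Usage:
#   doh_canary              # system resolver
#   doh_canary @10.0.0.53   # a specific resolver (placeholder address)
```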

Debugging DoH issues:

# Test DoH directly
curl -s -H 'Accept: application/dns-json' \
  'https://cloudflare-dns.com/dns-query?name=example.com&type=A' | jq .

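# Or POST a DNS wire-format message (query.bin must hold a binary DNS query, per RFC 8484);
# the reply is binary as well, so write it to a file
curl -s -H 'Content-Type: application/dns-message' \
  --data-binary @query.bin \
  -o response.bin \
  'https://cloudflare-dns.com/dns-query'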
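The wire-format request needs a query.bin file that holds a binary DNS query. The following minimal sketch writes one with printf: a single A question for the given name, with the recursion-desired flag set and a message ID of 0 (RFC 8484 recommends ID 0 for HTTP cache friendliness). make_query is a hypothetical helper. It handles plain ASCII names only and does not validate label lengths.

```shell
# make_query NAME   (hypothetical helper): write a DNS query for NAME, type A, to stdout.
# Runs in a subshell so the IFS and globbing changes do not leak.
make_query() (
  # Header: ID=0, flags=0x0100 (RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
  printf '\000\000\001\000\000\001\000\000\000\000\000\000'
  # QNAME: each dot-separated label is preceded by its length byte
  set -f
  IFS=.
  for label in $1; do
    printf "\\$(printf '%03o' "${#label}")"
    printf '%s' "$label"
  done
  # Root label (zero byte), then QTYPE=1 (A) and QCLASS=1 (IN)
  printf '\000\000\001\000\001'
)

# Usage:
#   make_query example.com > query.bin
```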

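In the browser, check about:networking#dns (Firefox). It lists cached entries and shows whether each was resolved via TRR, Firefox's name for its DoH resolver. In Chrome, chrome://net-internals/#dns offers a hostname lookup and a button to clear the host cache in current versions, and the secure DNS provider is set under Settings > Privacy and security > Security.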

DoT (DNS-over-TLS)

DoT wraps DNS queries in TLS, typically on port 853. Less common in browsers, more common in system-level resolvers (Android Private DNS, systemd-resolved):

# Test DoT
kdig -d @1.1.1.1 +tls-ca example.com A

If DoT port 853 is blocked (some networks block it), the resolver may fail or fall back to unencrypted DNS, depending on configuration.
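To tell a blocked port 853 apart from a broken resolver, send the same query over both paths and compare. In this sketch, dot_vs_plain is a hypothetical helper. It assumes both dig (plain DNS on port 53) and Knot's kdig (+tls for DoT on port 853) are installed. The 3-second timeout is an arbitrary choice.

```shell
# dot_vs_plain SERVER NAME   (hypothetical helper)
# Queries NAME over plain DNS and over DoT, then reports which path answered.
dot_vs_plain() {
  server=$1
  name=$2
  # Success = at least one non-comment line of output
  if dig @"$server" "$name" A +short +time=3 +tries=1 | grep -qv '^;'; then
    plain=ok
  else
    plain=fail
  fi
  if kdig @"$server" +tls "$name" A +short +time=3 +retry=0 | grep -qv '^;'; then
    dot=ok
  else
    dot=fail
  fi
  case "$plain/$dot" in
    ok/fail)   echo "plain ok, DoT failed: port 853 may be blocked or TLS is failing" ;;
    fail/ok)   echo "DoT ok, plain failed: port 53 may be filtered" ;;
    fail/fail) echo "both failed: check the resolver itself and the network path" ;;
    *)         echo "both ok" ;;
  esac
}

# Usage:
#   dot_vs_plain 1.1.1.1 example.com
```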

Common DoH/DoT failure patterns

  • Browser shows different results than terminal: browser is using DoH, terminal uses system resolver. Different resolvers, different caches, potentially different filtering.
  • DNS works on Wi-Fi but not on VPN: VPN may redirect DNS to a corporate resolver that doesn't support DoT, or the VPN blocks port 853.
  • Intermittent SERVFAIL from DoH: the DoH endpoint validates DNSSEC, and the domain's zone fails validation (for example, expired signatures or a DS/DNSKEY mismatch). Check DNSSEC health.
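A common way to confirm DNSSEC as the cause of a SERVFAIL is to repeat the query with the CD (checking disabled) bit set. On a resolver that honors CD, a normal query that fails alongside a +cd query that succeeds points at validation. The sketch below, dnssec_suspect (a hypothetical name), runs both queries against a resolver you name explicitly.

```shell
# dnssec_suspect SERVER NAME   (hypothetical helper)
# Compares the RCODE with and without DNSSEC checking disabled (+cd).
dnssec_suspect() {
  server=$1
  name=$2
  normal=$(dig @"$server" "$name" A +noall +comments \
    | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
  unchecked=$(dig @"$server" "$name" A +cd +noall +comments \
    | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
  echo "normal=${normal:-none} cd=${unchecked:-none}"
  if [ "$normal" = "SERVFAIL" ] && [ -n "$unchecked" ] && [ "$unchecked" != "SERVFAIL" ]; then
    echo "likely DNSSEC validation failure"
  fi
}

# Usage:
#   dnssec_suspect 1.1.1.1 example.com
```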

Common NXDOMAIN causes

When a domain returns NXDOMAIN unexpectedly:

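  1. Domain expired: the registry has pulled the delegation (for example, under a clientHold or serverHold status). Check WHOIS or RDAP data for the expiry date and status codes.
  2. Nameserver change not propagated: NS records were changed at the registrar, but the registry has not yet published the new delegation, or resolvers still hold the old NS set until its TTL expires. If the old and new nameservers serve different zone contents, some resolvers get NXDOMAIN while others do not.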
  3. Typo in zone file: a subdomain was configured as wwww.example.com instead of www.example.com.
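  4. DNSSEC key rollover failure: the zone's DNSKEY no longer matches the DS record at the parent. Validating resolvers return SERVFAIL while non-validating resolvers keep answering. Strictly, the symptom is SERVFAIL rather than NXDOMAIN, but users behind a validating resolver often report it as a domain that "does not exist".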
  5. Split-horizon DNS: internal and external views of the zone differ. You're querying the wrong view.
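For causes 1 and 2, it helps to ask the parent zone directly whether it still delegates the domain. In this sketch, parent_delegation is a hypothetical helper. It handles only names directly under a TLD (example.com, not example.co.uk), and it queries the first TLD nameserver it finds with recursion off. NXDOMAIN from the TLD server fits an expired or held domain. NOERROR with NS records in the reply shows the delegation the parent currently publishes.

```shell
# parent_delegation DOMAIN   (hypothetical helper; DOMAIN must sit directly under a TLD)
parent_delegation() {
  name=${1%.}
  tld=${name##*.}
  parent_ns=$(dig "$tld." NS +short | grep -v '^;' | head -n 1)
  if [ -z "$parent_ns" ]; then
    echo "no nameservers found for .$tld"
    return 1
  fi
  # Non-recursive query: the TLD server answers from its own zone (referral or NXDOMAIN)
  out=$(dig +norecurse @"$parent_ns" "$name" NS)
  status=$(printf '%s\n' "$out" | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
  echo "$parent_ns: ${status:-no response}"
  # A referral carries the NS records in the AUTHORITY section; print their targets
  printf '%s\n' "$out" | awk '$4 == "NS" { print "  delegated to " $5 }'
}

# Usage:
#   parent_delegation example.com
```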

Verification and prevention

  1. Set up external DNS monitoring (Pingdom, Uptime Robot, or custom synthetic checks) to detect resolution failures before users report them
  2. Monitor DNSSEC signature expiration dates — set alerts for 7 days before expiry
  3. Test from multiple geographic locations and resolver providers
  4. Keep TTLs reasonable (300–3600s) so changes propagate in a useful timeframe
  5. Document your DNS chain: registrar → parent NS → authoritative NS → record types — so you know where to look when things break
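The DNSSEC expiry check in item 2 can be scripted by reading the expiration field of each RRSIG record in dig +dnssec output. In this sketch, rrsig_expiring_before is a hypothetical helper. It assumes dig's default one-line record format (not +multiline) and takes a cutoff in the same YYYYMMDDHHMMSS form. The cutoff is computed with date; the usage line shows GNU syntax.

```shell
# rrsig_expiring_before CUTOFF   (hypothetical helper)
# In an RRSIG line, field 5 is the covered type and field 9 the expiration
# (YYYYMMDDHHMMSS, UTC); these 14-digit values compare correctly as numbers.
rrsig_expiring_before() {
  awk -v cutoff="$1" '$4 == "RRSIG" && $9 < cutoff { print $5 " signature expires " $9 }'
}

# Usage (GNU date; on BSD/macOS use: date -u -v+7d +%Y%m%d%H%M%S):
#   dig +dnssec example.com SOA | rrsig_expiring_before "$(date -u -d '+7 days' +%Y%m%d%H%M%S)"
```

Any output line is a signature inside the 7-day window, which makes the function easy to wrap in a cron job or a synthetic check.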
