Monitoring Uptime and Integrity

What this section covers

Uptime and integrity monitoring helps you confirm that your documentation remains accessible and unmodified, and tells you quickly when it does not. This guide covers practical monitoring strategies suitable for static sites hosted on CDN infrastructure.

Alongside “is it up?”, you’ll also want to track real user performance. A site can return 200 OK from every region and still feel broken if pages load slowly or the main content appears late. That’s where Largest Contentful Paint (LCP) is useful: it measures when the primary content on a page becomes visible, which is often what visitors perceive as “loaded”.

Why monitor uptime and integrity

For technical documentation sites:

Availability tracking: Confirm site remains accessible globally
Performance monitoring: Detect slowdowns or degraded CDN performance
Content integrity: Verify files haven't been tampered with
Certificate validity: Ensure HTTPS remains functional
DNS resolution: Confirm nameservers respond correctly

Monitoring approaches

External uptime monitoring

Free and paid services:

UptimeRobot (free tier): HTTP/HTTPS checks every 5 minutes
Pingdom: More detailed monitoring with global test locations
StatusCake: HTTP, ping, and port monitoring
Cloudflare Analytics: Built-in analytics for Pages deployments

Synthetic monitoring

Automated checks simulating real users:

Homepage load time testing
Critical path verification (can users reach key documentation?)
Form submission testing (contact pages)
Search functionality validation
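
A critical path check like the ones listed above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a hosted synthetic monitoring service. The names `run_critical_path`, `StepResult`, and `http_fetch`, the 2-second budget, and the `docs.example.com` URLs in the usage note are all hypothetical. The fetcher is passed in as a parameter so the logic can be exercised without network access.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# A fetcher takes a URL and returns (HTTP status, decoded body).
Fetcher = Callable[[str], Tuple[int, str]]


@dataclass
class StepResult:
    url: str
    ok: bool
    seconds: float
    reason: str  # empty string when the step passed


def http_fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Default fetcher: a plain GET using only the standard library.

    urlopen raises HTTPError for 4xx/5xx responses, so those surface
    as "fetch failed" in run_critical_path rather than as a status.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def run_critical_path(
    steps: Iterable[Tuple[str, str]],
    fetch: Fetcher = http_fetch,
    max_seconds: float = 2.0,  # assumed budget, tune for your site
) -> List[StepResult]:
    """Visit each (url, expected_text) step and record pass or fail."""
    results: List[StepResult] = []
    for url, expected_text in steps:
        started = time.monotonic()
        try:
            status, body = fetch(url)
        except Exception as exc:  # DNS failure, timeout, HTTP error, ...
            elapsed = time.monotonic() - started
            results.append(StepResult(url, False, elapsed, f"fetch failed: {exc!r}"))
            continue
        elapsed = time.monotonic() - started
        if status != 200:
            reason = f"unexpected status {status}"
        elif expected_text not in body:
            reason = "expected text not found"
        elif elapsed > max_seconds:
            reason = f"slow response ({elapsed:.2f}s)"
        else:
            reason = ""
        results.append(StepResult(url, reason == "", elapsed, reason))
    return results
```

A scheduled job could call `run_critical_path([("https://docs.example.com/", "Documentation")])` and alert on any result where `ok` is false. Checking for expected text as well as the status code catches the case where the CDN serves a 200 response with the wrong content.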

Content integrity monitoring

Verify content hasn't changed unexpectedly:

Git repository audits: Compare deployed files to source
Checksum verification: SHA-256 hashes of critical files
CSP violation reports: Content Security Policy alerts for unauthorized scripts
SRI (Subresource Integrity): Hash verification for external resources
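
Checksum verification and SRI values both come down to hashing bytes and comparing the result with a known-good value. The sketch below assumes a manifest that maps relative file paths to SHA-256 hex digests, produced at build time. The manifest format and the function names (`sha256_hex`, `sri_value`, `find_mismatches`) are illustrative assumptions, not part of any particular tool.

```python
import base64
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_hex(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sri_value(data: bytes) -> str:
    """Build an SRI integrity value: 'sha384-' plus the base64 digest."""
    encoded = base64.b64encode(hashlib.sha384(data).digest()).decode("ascii")
    return "sha384-" + encoded


def find_mismatches(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Compare files under root with the expected digests in the manifest.

    Returns one message per problem; an empty list means everything matched.
    """
    problems: List[str] = []
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file():
            problems.append(f"{rel_path}: missing")
        elif sha256_hex(target) != expected.lower():
            problems.append(f"{rel_path}: checksum mismatch")
    return problems
```

To audit a live deployment rather than a local build directory, the same comparison can be applied to bytes downloaded from the site. The output of `sri_value` is the string that goes in the `integrity` attribute of a `script` or `link` tag.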

Key metrics to track

Availability metrics

Uptime percentage: Target 99.9% (about 8.76 hours of downtime per year)
Response time: P50, P95, P99 latencies
Error rates: 4xx and 5xx response codes
Geographic availability: Test from multiple regions
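
The arithmetic behind these metrics is simple enough to check by hand. The sketch below converts an uptime target into a yearly downtime budget and computes percentile latencies with the nearest-rank method. Nearest-rank is one of several common percentile definitions, so a monitoring tool may report slightly different values. The function names are illustrative.

```python
import math
from typing import Sequence

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_hours(uptime_percent: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime allowed in the period for a given uptime target."""
    return period_hours * (1 - uptime_percent / 100)


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile of an empty sample set is undefined")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

For a 99.9% target, `downtime_budget_hours(99.9)` works out to roughly 8.76 hours per year, or about 43 minutes in a 30-day month (`period_hours=720`). Percentiles matter because an average can hide a slow tail: a P95 of 2 seconds means one request in twenty took at least that long.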

Performance indicators

Time to First Byte (TTFB): Server response speed
Largest Contentful Paint (LCP): When main content renders
Cumulative Layout Shift (CLS): Visual stability
First Input Delay (FID): Interactivity responsiveness (replaced by Interaction to Next Paint, INP, as a Core Web Vital in March 2024)
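
Raw numbers for these indicators are easier to act on once they are bucketed. The sketch below rates a measurement as good, needs improvement, or poor. The threshold values are assumptions based on Google's published web.dev guidance as commonly cited, so verify them against the current guidance before relying on them. FID is included only because it appears in the list above.

```python
from typing import Dict, Tuple

# metric -> (upper bound for "good", upper bound for "needs improvement")
# Values are assumptions taken from commonly cited web.dev guidance.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "TTFB": (0.8, 1.8),   # seconds
    "LCP": (2.5, 4.0),    # seconds
    "CLS": (0.1, 0.25),   # unitless layout shift score
    "FID": (0.1, 0.3),    # seconds
}


def rate(metric: str, value: float) -> str:
    """Bucket a measurement into good / needs improvement / poor."""
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= needs_improvement_max:
        return "needs improvement"
    return "poor"
```

Such ratings are usually applied to a high percentile of real user measurements, for example the 75th, rather than to a single page load.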

Alerting strategies

Critical alerts (immediate notification)

Site completely down (5xx errors, timeouts, or connection failures)
SSL certificate expired or invalid
DNS resolution failure
CDN purge failures
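
Two of the critical conditions above, DNS resolution failure and certificate expiry, can be probed with the Python standard library alone. The following is a rough sketch with illustrative function names. It assumes HTTPS on port 443 and uses the system trust store.

```python
import socket
import ssl
import time
from typing import List, Optional


def resolve(host: str) -> List[str]:
    """Return the unique IP addresses the host resolves to.

    Raises socket.gaierror when resolution fails.
    """
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the certificate's notAfter field, e.g. 'Jan  5 09:34:43 2018 GMT'.

    With the default context, an expired or otherwise invalid certificate
    makes the handshake raise ssl.SSLCertVerificationError instead.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before the certificate expires (negative if expired)."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400
```

A monitor could treat a `socket.gaierror` from `resolve` or an `ssl.SSLCertVerificationError` from `fetch_not_after` as critical, and use `days_until_expiry(fetch_not_after(host))` to warn before the certificate lapses.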

Warning alerts (review within hours)

Elevated error rates (>1% of requests)
Slow response times (P95 > 2 seconds)
Unexpected traffic patterns
Certificate expiring within 7 days
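
The critical and warning thresholds above map directly onto a small classification function. This sketch assumes the measurements have already been collected into a snapshot. The `Snapshot` fields and the `classify` name are hypothetical, and conditions that need more context (CDN purge failures, unexpected traffic patterns) are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Snapshot:
    site_reachable: bool    # False on 5xx responses, timeouts, refused connections
    dns_resolves: bool
    cert_days_left: float   # zero or negative means expired
    error_rate: float       # fraction of failed requests, e.g. 0.02 for 2%
    p95_seconds: float


def classify(snapshot: Snapshot) -> List[Tuple[str, str]]:
    """Return (severity, message) pairs using the thresholds in this guide."""
    alerts: List[Tuple[str, str]] = []
    if not snapshot.site_reachable:
        alerts.append(("critical", "site down"))
    if not snapshot.dns_resolves:
        alerts.append(("critical", "DNS resolution failure"))
    if snapshot.cert_days_left <= 0:
        alerts.append(("critical", "certificate expired"))
    elif snapshot.cert_days_left <= 7:
        alerts.append(("warning", "certificate expiring within 7 days"))
    if snapshot.error_rate > 0.01:
        alerts.append(("warning", "error rate above 1%"))
    if snapshot.p95_seconds > 2.0:
        alerts.append(("warning", "P95 response time above 2 seconds"))
    return alerts
```

Keeping the severity in the return value lets the caller route critical alerts to a pager or chat channel immediately and batch warnings for later review.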

Informational (daily/weekly review)

Traffic trends
Popular pages
Referrer analysis
Search query data

Monitoring tools setup

Basic UptimeRobot configuration

Create HTTP(S) monitor for homepage
Set check interval to 5 minutes
Configure alert contacts (email, Slack)
Add keyword monitoring (check for specific text on page)
Monitor from multiple locations if available

Cloudflare Analytics

For Cloudflare Pages deployments:

Enable Web Analytics for visitor insights
Review Cache Analytics for CDN performance
Check Security Events for attack patterns
Monitor DNS Analytics for resolution health

Key pages in this section

Error code reference — Understanding HTTP status codes

Related sections

Infrastructure hub — CDN and DNS configuration
Security hub — Protecting against attacks
Operations hub — Overall operational practices

Technical glossary

Uptime: Percentage of time a service is available and responsive

TTFB (Time to First Byte): Time from sending a request to receiving the first byte of the response

Synthetic monitoring: Automated testing that simulates real user behavior

SLA (Service Level Agreement): Commitment to specific availability and performance targets

P95 latency: 95th percentile response time; 95% of requests complete faster than this value