Real-time Monitoring
Every system metric,
every 30 seconds.
CPU, memory, disk, network, and load averages collected from every server in your fleet. Domain health checks every 5 minutes. SSL certificate expiration tracked daily. All streamed to your dashboard over WebSockets in real time.
30s: Collection interval
5 min: Health check cycle
24/7: SSL tracking
<1s: WebSocket delivery
System Metrics
Six dimensions of server health. One agent.
The HostAtlas agent collects six categories of system metrics every 30 seconds: CPU utilization, memory breakdown, disk usage, network throughput, load averages, and uptime. Each metric is stored with nanosecond-precision timestamps, aggregated over configurable windows, and rendered as interactive time-series charts on your dashboard.
CPU Usage
Per-core and aggregated CPU utilization broken down by user, system, iowait, steal, and idle time. The agent reads directly from /proc/stat on Linux for precise kernel-level granularity.
- Per-core utilization (user, system, iowait, steal, idle)
- Aggregated total CPU percentage
- Historical charts with configurable time ranges
- Threshold alert rules for sustained high utilization
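As an illustration of how utilization can be derived from /proc/stat, the sketch below turns two snapshots of a `cpu` line into percentages. It is a minimal example rather than the agent's actual code; the function names `parse_cpu_line` and `cpu_percentages` are hypothetical. The field order follows the Linux /proc/stat layout.

```python
# Field order of a "cpu" / "cpuN" line in /proc/stat (guest fields are ignored here
# because the kernel already counts guest time inside "user" and "nice").
CPU_FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Parse one 'cpu' or 'cpuN' line from /proc/stat into named tick counters."""
    parts = line.split()
    return {name: int(value) for name, value in zip(CPU_FIELDS, parts[1:])}


def cpu_percentages(prev, curr):
    """Return each field's share of the ticks that elapsed between two snapshots."""
    deltas = {name: curr[name] - prev.get(name, 0) for name in curr}
    total = sum(deltas.values())
    if total <= 0:
        # No ticks elapsed (or counters went backwards): report zeros.
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}
```

The counters in /proc/stat are cumulative since boot, so a single reading says little on its own. Utilization only emerges from the difference between two readings, for example two taken 30 seconds apart.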
Memory (RAM)
Complete memory breakdown including used, total, available, cached, buffers, and swap utilization. Understand exactly where your server's memory is allocated and detect leaks before they cause OOM kills.
- Used, total, available, cached, and buffer breakdown
- Swap usage (used, total, percentage)
- Stacked area charts showing memory composition
- Alert when available memory drops below threshold
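On Linux, these values are typically exposed through /proc/meminfo. The page does not say which source the agent reads, so treat the following as a sketch under that assumption. The helper names `parse_meminfo` and `memory_summary` are illustrative only.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Derive the available-memory percentage and swap usage from parsed values."""
    total = info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    return {
        "available_pct": 100.0 * info["MemAvailable"] / total,
        "swap_used_kb": swap_used,
        # Guard against hosts with no swap configured.
        "swap_used_pct": 100.0 * swap_used / swap_total if swap_total else 0.0,
    }
```

A threshold rule such as "alert when available memory drops below 10%" would then compare `available_pct` against the configured value.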
Disk Usage
Per-mount-point disk utilization with used, total, and percentage values. The agent detects all mounted filesystems automatically and tracks inodes alongside storage capacity.
- Per mount point: used, total, percentage, inodes
- Automatic detection of all mounted filesystems
- Threshold lines on charts for warning and critical levels
- Projected days until full based on growth rate
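A "days until full" projection can be approximated with a simple linear extrapolation. The sketch below assumes the growth rate is taken from the oldest and newest samples in a window. The page does not specify the model HostAtlas actually uses, so the function `days_until_full` and its approach are assumptions for illustration.

```python
SECONDS_PER_DAY = 86400


def days_until_full(samples, total_bytes):
    """Estimate days until a mount point fills up.

    samples: list of (timestamp_seconds, used_bytes), ordered oldest to newest.
    Returns None when there is too little data or usage is not growing.
    """
    if len(samples) < 2:
        return None
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0:
        return None
    growth_per_day = (used1 - used0) / elapsed * SECONDS_PER_DAY
    if growth_per_day <= 0:
        return None  # flat or shrinking usage: no projected fill date
    return (total_bytes - used1) / growth_per_day
```

For example, a 100 GB volume that grew from 50 GB to 54 GB over two days is growing at 2 GB per day, which leaves roughly 23 days before the remaining 46 GB is consumed.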
Network Traffic
Bytes in and bytes out per network interface, sampled every 30 seconds. Track bandwidth utilization across public and private interfaces to identify traffic spikes and capacity constraints.
- Bytes received and bytes transmitted per interface
- Rate calculation (bytes/sec, Mbps) over time
- Dual-axis charts for inbound vs outbound traffic
- Spike detection with configurable baseline thresholds
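The rate calculation is a matter of dividing the change in a cumulative byte counter by the sampling interval. The following sketch shows the arithmetic; the function name `throughput` and the choice to skip a sample when the counter goes backwards are assumptions, not documented agent behavior.

```python
def throughput(prev_bytes, curr_bytes, interval_s):
    """Convert two cumulative byte counters into (bytes_per_sec, mbps).

    Returns None if the counter went backwards, which usually means the
    interface counter was reset (for example after a reboot).
    """
    delta = curr_bytes - prev_bytes
    if delta < 0 or interval_s <= 0:
        return None
    bytes_per_sec = delta / interval_s
    mbps = bytes_per_sec * 8 / 1_000_000  # megabits per second, decimal units
    return bytes_per_sec, mbps
```

With a 30-second interval, a counter that advances by 37,500,000 bytes corresponds to 1,250,000 bytes/sec, or 10 Mbps.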
Load Average
1-minute, 5-minute, and 15-minute load averages displayed as overlapping line charts. Correlate load spikes with CPU, memory, and disk I/O to pinpoint the source of contention on your servers.
- 1-minute, 5-minute, and 15-minute averages
- Normalized by CPU core count for comparability
- Overlapping tri-line chart for trend visualization
- Alert when load exceeds core count for sustained periods
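Normalizing by core count means a load of 8 on an 8-core server and a load of 2 on a 2-core server both read as 1.0. A minimal sketch of that normalization and of a "sustained" check follows. The function names and the default of 10 consecutive samples (5 minutes at a 30-second interval) are hypothetical.

```python
def normalized_load(load, cores):
    """Load average divided by core count; values above 1.0 mean more demand than cores."""
    return load / cores


def sustained_overload(load_samples, cores, min_samples=10):
    """True if the most recent `min_samples` readings all exceed the core count."""
    recent = load_samples[-min_samples:]
    if len(recent) < min_samples:
        return False  # not enough history to call it sustained
    return all(normalized_load(load, cores) > 1.0 for load in recent)
```

Requiring every sample in the window to exceed the threshold avoids alerting on a single short spike.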
System Uptime
Continuous uptime tracking from the moment the agent is installed. Detect unexpected reboots, track uptime streaks, and correlate restarts with metric anomalies and incident timelines.
- Current uptime in days, hours, and minutes
- Reboot event detection with timestamp logging
- Uptime history with reboot annotations on charts
- Alert on unexpected reboots outside maintenance windows
Domain Health Checks
Know the moment a domain goes down.
HostAtlas sends an HTTP request to every discovered domain every 5 minutes. Each check records the status code, response time, TLS handshake duration, and any redirect chain. When a domain fails its health check, an alert fires within 5 minutes of the failure.
HTTP Ping Every 5 Minutes
An external check hits each domain's root URL over HTTPS (falling back to HTTP if no certificate is found). The check runs from HostAtlas infrastructure, not from your servers, so it reflects the experience of a real external visitor.
Status Codes & Response Times
Every check records the HTTP status code (200, 301, 403, 500, etc.) and full response time in milliseconds. Historical data lets you spot degradation trends before they become downtime events. Charts display p50, p95, and p99 latencies over time.
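To make the p50, p95, and p99 figures concrete, here is one common way to compute a percentile from a set of response times, the nearest-rank method. Which method HostAtlas uses is not stated, so this is an illustrative assumption, and `percentile` is a hypothetical helper.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

Read this way, a p95 of 800 ms means 95% of checks in the window completed in 800 ms or less, which makes slow outliers visible that an average would hide.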
SSL Verification on Every Check
The TLS handshake is validated during each health check. Certificate chain verification, hostname matching, and protocol version are all confirmed. A failed TLS handshake is flagged separately from an HTTP failure so you can distinguish between application and certificate issues.
Timeout & Error Handling
Checks enforce a 10-second timeout. DNS resolution failures, TCP connection refused errors, TLS handshake timeouts, and HTTP-level errors are all categorized individually. Each failure type generates a distinct alert payload so your alerting rules can differentiate between network and application failures.
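One way to picture this categorization is a function that maps the exception raised by a failed check to a failure type. The sketch below uses Python's standard exception classes; the category strings and the function names are invented for illustration and are not HostAtlas alert payload values.

```python
import socket
import ssl

CHECK_TIMEOUT_S = 10  # the timeout stated above; a real check would pass this to its HTTP client


def classify_failure(exc):
    """Map a connection-level exception to a coarse failure category."""
    if isinstance(exc, socket.gaierror):
        return "dns_resolution_failed"
    if isinstance(exc, ConnectionRefusedError):
        return "tcp_connection_refused"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        # The exception alone does not say which phase timed out.
        return "timeout"
    if isinstance(exc, ssl.SSLError):
        return "tls_handshake_failed"
    return "unknown"


def classify_status(status_code):
    """Map an HTTP status code to a failure category, or None if the response is healthy."""
    if status_code >= 500:
        return "http_server_error"
    if status_code >= 400:
        return "http_client_error"
    return None
```

Keeping connection-level and HTTP-level classification separate mirrors the distinction between network failures and application failures.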
Redirect Chain Detection
When a domain responds with 301 or 302 redirects, HostAtlas follows the chain up to 10 hops and records every step. See exactly where traffic ends up, catch redirect loops early, and detect unexpected intermediate destinations that could indicate DNS hijacking or misconfiguration.
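The chain-following logic can be sketched as a loop with a hop limit and a set of visited URLs. In the example below, `fetch` is a hypothetical callable that returns a `(status, location)` pair for a URL; it stands in for whatever HTTP client a real checker would use. The outcome labels are also illustrative.

```python
from urllib.parse import urljoin

MAX_HOPS = 10  # hop limit stated above


def follow_redirects(url, fetch, max_hops=MAX_HOPS):
    """Follow 301/302 redirects, recording every step.

    Returns (chain, outcome) where chain is a list of (url, status) pairs and
    outcome is "ok", "loop", or "too_many_hops".
    """
    chain = []
    seen = set()
    current = url
    for _ in range(max_hops + 1):  # the initial request plus up to max_hops redirects
        if current in seen:
            return chain, "loop"
        seen.add(current)
        status, location = fetch(current)
        chain.append((current, status))
        if status in (301, 302) and location:
            # Location may be relative, so resolve it against the current URL.
            current = urljoin(current, location)
            continue
        return chain, "ok"
    return chain, "too_many_hops"
```

Because every step is recorded, the chain itself can be inspected for unexpected intermediate hosts.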
SSL Certificate Monitoring
Never let a certificate expire silently again.
HostAtlas discovers every SSL certificate on your servers and tracks expiration dates automatically. When a certificate is within 14 days of expiry, you get alerted. When it renews, the dashboard updates instantly. Every affected domain is correlated so you know the blast radius of an expiring certificate.
Automatic Certificate Discovery
The agent scans web server configurations (nginx, Apache, Caddy) and discovers every SSL certificate installed on your servers. New certificates are detected within minutes of installation, including wildcard and SAN certificates.
14-Day Expiration Warnings
When a certificate enters its final 14 days before expiry, HostAtlas triggers a warning alert. Additional alerts fire at 7 days, 3 days, and 1 day. Severity escalates as the deadline approaches so the right people are reached at the right time.
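The escalation schedule can be expressed as a small lookup from days remaining to severity. The thresholds (14, 7, 3, and 1 days) come from the description above; the severity names and the function `expiry_severity` are assumptions made for this sketch.

```python
from datetime import datetime, timedelta, timezone

# (days remaining at or below which the level applies, hypothetical severity label)
THRESHOLDS = [(1, "critical"), (3, "high"), (7, "medium"), (14, "warning")]


def expiry_severity(not_after, now):
    """Return a severity label for a certificate, or None if it is not yet in the warning window."""
    days_left = (not_after - now).total_seconds() / 86400
    if days_left <= 0:
        return "expired"
    for limit_days, severity in THRESHOLDS:
        if days_left <= limit_days:
            return severity
    return None
```

Checking the tightest threshold first ensures a certificate with two days left is reported at the 3-day level rather than the 14-day level.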
Renewal Detection
When a certificate is renewed (manually or via Let's Encrypt automation), the agent detects the new expiration date on its next scan cycle and updates the dashboard. Active expiration alerts are automatically resolved, and a renewal event is logged for your audit trail.
Affected Domain Correlation
Every SSL certificate is linked to its associated domains. When a certificate is nearing expiry, the alert shows every domain that will be affected. For wildcard certificates, all matching subdomains are listed so you understand the full impact before expiration.
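Correlating a wildcard certificate with its domains comes down to name matching. The sketch below follows the usual convention that a wildcard covers exactly one left-most label, so `*.example.com` matches `api.example.com` but not `example.com` or `a.b.example.com`. The function names are hypothetical and the matching rules HostAtlas applies may differ.

```python
def cert_covers(pattern, hostname):
    """True if a certificate name (exact or '*.' wildcard) covers the hostname."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        return pattern == hostname
    suffix = pattern[1:]  # e.g. ".example.com"
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]
    # The wildcard must stand for exactly one non-empty label.
    return bool(label) and "." not in label


def affected_domains(cert_names, domains):
    """List the domains covered by any name (subject or SAN) on the certificate."""
    return [d for d in domains if any(cert_covers(name, d) for name in cert_names)]
```

Given the certificate names `*.example.com` and `example.com`, the affected list would include `example.com`, `api.example.com`, and `www.example.com`, but not `deep.api.example.com`.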
Certificate Details
View full details for every certificate in the chain: issuer, subject, SANs, serial number, signature algorithm, key size, and valid-from/valid-to dates. Everything is displayed on the certificate detail page alongside the certificate's renewal history and associated servers.
Renewal History
A complete log of every certificate renewal event: old expiration date, new expiration date, issuer change detection, and the exact timestamp of detection. Useful for auditing Let's Encrypt cron jobs and verifying automation reliability.
Server Offline Detection
Five minutes of silence. That's all it takes.
When a server's agent hasn't reported in for more than 5 minutes, HostAtlas marks the server as offline. Minute-by-minute checks distinguish between brief network blips and genuine outages. Status changes are pushed to your dashboard over WebSockets in real time.
Agent Heartbeat Every 30 Seconds
The agent sends a heartbeat to the HostAtlas API every 30 seconds. Each heartbeat includes the server's current timestamp, agent version, and a lightweight health payload. This is the baseline signal that confirms the server is alive and the agent is running.
Missed Check-In Detection
The platform runs a background job every minute that evaluates the last heartbeat timestamp for every registered server. If the gap exceeds 5 minutes (10 consecutive missed heartbeats), the server is flagged for offline evaluation.
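The evaluation step of that background job can be pictured as a comparison between each server's last heartbeat and the current time. This is a simplified sketch; the function `find_stale_servers` and the in-memory dictionary are stand-ins for whatever storage and scheduling the platform really uses.

```python
from datetime import datetime, timedelta, timezone

OFFLINE_AFTER = timedelta(minutes=5)  # equals 10 missed heartbeats at 30-second intervals


def find_stale_servers(last_heartbeats, now):
    """Return the IDs of servers whose last heartbeat is more than 5 minutes old.

    last_heartbeats: dict mapping server ID to the datetime of its last heartbeat.
    """
    return sorted(
        server_id
        for server_id, last_seen in last_heartbeats.items()
        if now - last_seen > OFFLINE_AFTER
    )
```

Running this once a minute means a silent server is flagged at most about a minute after it crosses the 5-minute mark.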
Status Transition & Alert
Once confirmed offline, the server's status changes from "online" to "offline" in the database. An alert is triggered immediately through all configured notification channels. The status change is broadcast to all connected dashboards via WebSocket so every team member sees it instantly.
Automatic Recovery Detection
When the agent resumes heartbeats, the server is automatically marked as "online" again. A recovery event is logged with the exact downtime duration. The original offline alert is resolved and a recovery notification is sent to all configured channels.
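Recovery handling amounts to a small state transition when a heartbeat arrives for a server currently marked offline. In the sketch below, downtime is measured from the moment the server was marked offline to the first resumed heartbeat. That measurement choice, the event shape, and the function name `on_heartbeat` are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def on_heartbeat(status, offline_since, heartbeat_at):
    """Apply a heartbeat to a server's state.

    Returns (new_status, event). The event is a recovery record when the
    server was offline, and None when it was already online.
    """
    if status == "offline" and offline_since is not None:
        event = {
            "type": "recovery",
            "recovered_at": heartbeat_at,
            "downtime": heartbeat_at - offline_since,
        }
        return "online", event
    return "online", None
```

The returned event carries what is needed to resolve the original alert and to send the recovery notification.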
Dashboard & Visualization
Charts that tell the full story.
Time-series line charts, stacked area charts, bar gauges, and KPI cards. Every metric is visualized with configurable time ranges from 1 hour to 30 days. Hover for exact values. Click to drill down. Pin the views that matter most.
Time-Series Line Charts
CPU, load average, and network traffic are rendered as multi-line charts with configurable time ranges. Each line is color-coded and labeled. Hover over any point to see the exact value and timestamp. Zoom by selecting a time range on the chart itself.
Stacked Area Charts
Memory composition (used, cached, buffers, available) and disk breakdown are displayed as stacked area charts. The visual proportions make it immediately clear where resources are allocated and how the balance shifts over time.
Bar Gauges & KPI Cards
Current values for CPU, RAM, disk, and swap are shown as horizontal bar gauges with color-coded thresholds. KPI cards display uptime percentage, current load average, active alerts, and domain health scores at a glance.
Real-time WebSocket Updates
Charts update live as new data arrives over WebSocket connections. No page refreshes, no polling intervals. When a new metric lands, the chart appends the data point and scrolls forward automatically. Status changes are reflected instantly across every open dashboard.
Data Retention & Aggregation
High resolution when it matters. Efficient storage always.
HostAtlas uses a tiered retention strategy that keeps raw data for the most recent window and progressively aggregates older data. You get 30-second granularity for recent events and long-term trend data without unbounded storage costs.
Raw Data — Under 6 Hours
Every data point at full 30-second resolution is retained for the most recent 6 hours. This gives you maximum granularity for investigating active incidents, correlating recent events, and debugging performance issues in real time.
5-Minute Averages — 6 to 24 Hours
Data older than 6 hours is aggregated into 5-minute windows (average, min, max). This tier covers the previous day with enough detail to spot trends, confirm recurring patterns, and review overnight performance without storing every individual sample.
1-Hour Averages — Beyond 24 Hours
Data older than 24 hours is aggregated into 1-hour windows. This long-term tier is retained for the duration of your plan's data retention limit, providing week-over-week and month-over-month trend visibility for capacity planning and reporting.
Time Range Selection
Every chart in HostAtlas supports configurable time ranges. Select a preset or define a custom window. The system automatically serves data from the appropriate retention tier so you always get the best available resolution for your selected range.
1h: Raw 30s data, 120 data points
6h: Raw 30s data, 720 data points
24h: 5-min averages, 288 data points
7d: 1-hour averages, 168 data points
Custom: Best available tier, auto-selected
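The data point counts for the presets follow directly from dividing the range by the tier's resolution. The sketch below reproduces that arithmetic with a hypothetical `select_tier` function. The boundary behavior (ranges up to 6 hours use raw data, up to 24 hours use 5-minute averages) is inferred from the presets rather than documented.

```python
HOUR = 3600


def select_tier(range_seconds):
    """Pick a retention tier and its resolution in seconds for a requested time range."""
    if range_seconds <= 6 * HOUR:
        return "raw", 30
    if range_seconds <= 24 * HOUR:
        return "5min", 300
    return "1h", HOUR


def expected_points(range_seconds):
    """Number of data points a chart would show for the requested range."""
    _, resolution = select_tier(range_seconds)
    return range_seconds // resolution
```

For instance, `expected_points(7 * 24 * HOUR)` is 168: seven days at one point per hour.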
What Gets Aggregated
Each aggregation window stores the average, minimum, and maximum value for every metric. This means you never lose visibility into spikes or dips — even in the 1-hour tier, the peak CPU or minimum free memory is preserved alongside the average.
- Average value across the window
- Minimum value (floor) within the window
- Maximum value (peak) within the window
- Sample count for statistical validity
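A downsampling pass that keeps these four values per window might look like the sketch below. It groups samples into fixed windows by timestamp; the function name `aggregate` and the output field names are illustrative, not the platform's schema.

```python
from collections import defaultdict


def aggregate(samples, window_s=300):
    """Roll (timestamp_seconds, value) samples up into fixed windows.

    Each output row keeps the average, minimum, maximum, and sample count,
    so spikes survive even after the raw points are discarded.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        window_start = int(timestamp // window_s) * window_s
        buckets[window_start].append(value)
    return [
        {
            "window_start": start,
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    ]
```

In a window containing CPU readings of 10%, 20%, and 90%, the average is 40%, but the stored maximum still shows the 90% spike.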
Why Tiered Retention
Storing every 30-second sample indefinitely would be prohibitively expensive for infrastructure with dozens or hundreds of servers. Tiered retention balances three competing needs:
- Incident investigation: full-resolution data for the recent window
- Trend analysis: multi-day and multi-week aggregated data
- Cost efficiency: predictable storage costs that scale linearly
Get started
Start monitoring your infrastructure in 30 seconds.
Install the agent, and metrics start flowing immediately. CPU, memory, disk, network, load averages, domain health checks, and SSL tracking — all collected automatically. No configuration files to write. No dashboards to build from scratch. Everything is ready the moment your first heartbeat lands.
Quick install
$ curl -sSL install.hostatlas.app | bash