Real-time Monitoring

Every system metric, every 30 seconds.

CPU, memory, disk, network, and load averages collected from every server in your fleet. Domain health checks every 5 minutes. SSL certificate expiration tracked daily. All streamed to your dashboard over WebSockets in real time.

  • 30s: Collection interval
  • 5 min: Health check cycle
  • 24/7: SSL tracking
  • <1s: WebSocket delivery

System Metrics

Six dimensions of server health. One agent.

The HostAtlas agent collects six categories of system metrics every 30 seconds: CPU utilization, memory breakdown, disk usage, network throughput, load averages, and uptime. Each metric is stored with nanosecond-precision timestamps, aggregated over configurable windows, and rendered as interactive time-series charts on your dashboard.

CPU Usage

Per-core and aggregated CPU utilization broken down by user, system, iowait, steal, and idle time. The agent reads directly from /proc/stat on Linux for precise kernel-level granularity.

  • Per-core utilization (user, system, iowait, steal, idle)
  • Aggregated total CPU percentage
  • Historical charts with configurable time ranges
  • Threshold alert rules for sustained high utilization
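
As an illustration of how utilization can be derived from /proc/stat, here is a minimal sketch. It is not the HostAtlas agent's code: the function names are hypothetical, and it assumes a Linux kernel that exposes at least the first eight counters on each "cpu" line. The counters are cumulative, so percentages come from the difference between two samples.

```python
from typing import Dict, List

# Order of the first eight counters on a "cpu"/"cpuN" line in /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse one 'cpu' or 'cpuN' line into named cumulative counters."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(prev: Dict[str, int], curr: Dict[str, int]) -> Dict[str, float]:
    """Share of elapsed CPU time spent in each state between two samples."""
    deltas = {k: curr[k] - prev[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def busy_percent(pcts: Dict[str, float]) -> float:
    """Aggregate utilization: everything that is not idle or iowait."""
    return 100.0 - pcts["idle"] - pcts["iowait"]


def read_cpu_lines(path: str = "/proc/stat") -> List[str]:
    """Return the aggregate and per-core lines (Linux only)."""
    with open(path) as f:
        return [line for line in f if line.startswith("cpu")]
```

Taking one sample, waiting 30 seconds, and taking another gives the per-state breakdown for that interval. Whether iowait counts as "busy" is a design choice; this sketch treats it as not busy.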

Memory (RAM)

Complete memory breakdown including used, total, available, cached, buffers, and swap utilization. Understand exactly where your server's memory is allocated and detect leaks before they cause OOM kills.

  • Used, total, available, cached, and buffer breakdown
  • Swap usage (used, total, percentage)
  • Stacked area charts showing memory composition
  • Alert when available memory drops below threshold

Disk Usage

Per-mount-point disk utilization with used, total, and percentage values. The agent detects all mounted filesystems automatically and tracks inodes alongside storage capacity.

  • Per mount point: used, total, percentage, inodes
  • Automatic detection of all mounted filesystems
  • Threshold lines on charts for warning and critical levels
  • Projected days until full based on growth rate
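
The page does not say how the "days until full" projection is computed. One simple way to approximate it is a linear extrapolation between the oldest and newest sample, sketched below. The function name and the two-point method are assumptions, not the product's algorithm.

```python
from typing import List, Optional, Tuple

SECONDS_PER_DAY = 86_400


def days_until_full(
    samples: List[Tuple[float, float]], total_bytes: float
) -> Optional[float]:
    """Project days until a mount point fills up.

    samples: (unix_timestamp_seconds, used_bytes) pairs ordered by time.
    Returns None when there is too little data or usage is not growing.
    """
    if len(samples) < 2:
        return None
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    elapsed = t1 - t0
    growth = used1 - used0
    if elapsed <= 0 or growth <= 0:
        return None  # flat or shrinking usage: no projected fill date
    rate_per_day = growth / elapsed * SECONDS_PER_DAY
    remaining = max(total_bytes - used1, 0.0)
    return remaining / rate_per_day
```

For example, a 100 GB volume that grew from 40 GB to 42 GB over one day has 58 GB left at 2 GB per day, which projects to about 29 days. A least-squares fit over more samples would be less sensitive to a single noisy reading.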

Network Traffic

Bytes in and bytes out per network interface, sampled every 30 seconds. Track bandwidth utilization across public and private interfaces to identify traffic spikes and capacity constraints.

  • Bytes received and bytes transmitted per interface
  • Rate calculation (bytes/sec, Mbps) over time
  • Dual-axis charts for inbound vs outbound traffic
  • Spike detection with configurable baseline thresholds
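
Interface byte counters are cumulative, so a rate is the difference between two samples divided by the sampling interval. The sketch below shows that arithmetic for the 30-second interval; the function name is hypothetical, and treating a negative delta as a counter reset is an assumption about how resets should be handled.

```python
from typing import Optional, Tuple


def throughput(
    prev_bytes: int, curr_bytes: int, interval_s: float = 30.0
) -> Optional[Tuple[float, float]]:
    """Return (bytes_per_second, megabits_per_second) between two samples.

    Returns None if the counter went backwards (for example after a reboot
    or interface reset), since no meaningful rate can be computed.
    """
    delta = curr_bytes - prev_bytes
    if delta < 0 or interval_s <= 0:
        return None
    bytes_per_sec = delta / interval_s
    mbps = bytes_per_sec * 8 / 1_000_000  # decimal megabits
    return bytes_per_sec, mbps
```

With 37.5 MB transferred in one 30-second interval, this yields 1,250,000 bytes/sec, or 10 Mbps.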

Load Average

1-minute, 5-minute, and 15-minute load averages displayed as overlapping line charts. Correlate load spikes with CPU, memory, and disk I/O to pinpoint the source of contention on your servers.

  • 1-minute, 5-minute, and 15-minute averages
  • Normalized by CPU core count for comparability
  • Overlapping tri-line chart for trend visualization
  • Alert when load exceeds core count for sustained periods
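
To make the normalization and the "sustained" alert concrete, here is a small sketch. The function names and the consecutive-sample definition of "sustained" are assumptions; the page does not specify how long a period must be.

```python
from typing import Iterable


def normalized_load(load: float, cores: int) -> float:
    """Load per core: 1.0 means every core is, on average, fully occupied."""
    return load / cores


def sustained_overload(
    load_samples: Iterable[float], cores: int, min_consecutive: int
) -> bool:
    """True if load exceeded the core count for min_consecutive samples in a row."""
    streak = 0
    for load in load_samples:
        streak = streak + 1 if load > cores else 0
        if streak >= min_consecutive:
            return True
    return False
```

On a 4-core server, a 1-minute load of 1.24 normalizes to 0.31. On a live Linux or macOS host, `os.getloadavg()` and `os.cpu_count()` from the standard library supply the raw inputs.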

System Uptime

Continuous uptime tracking from the moment the agent is installed. Detect unexpected reboots, track uptime streaks, and correlate restarts with metric anomalies and incident timelines.

  • Current uptime in days, hours, and minutes
  • Reboot event detection with timestamp logging
  • Uptime history with reboot annotations on charts
  • Alert on unexpected reboots outside maintenance windows

Domain Health Checks

Know the moment a domain goes down.

HostAtlas pings every discovered domain via HTTP every 5 minutes. Each check records the status code, response time, TLS handshake duration, and any redirect chain. When a domain fails its health check, an alert fires within 5 minutes of the failure.

HTTP Ping Every 5 Minutes

An external check hits each domain's root URL over HTTPS (falling back to HTTP if no certificate is found). The check runs from HostAtlas infrastructure, not from your servers, so it reflects the experience of a real external visitor.

Status Codes & Response Times

Every check records the HTTP status code (200, 301, 403, 500, etc.) and full response time in milliseconds. Historical data lets you spot degradation trends before they become downtime events. Charts display p50, p95, and p99 latencies over time.
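
For readers unfamiliar with p50, p95, and p99, the sketch below computes them with the nearest-rank method. This is one common definition, chosen here for simplicity; the page does not say which percentile method the charts use.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of
    the samples at or below it. p is given in the range (0, 100]."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Response times in milliseconds from five hypothetical checks.
response_times_ms = [34, 67, 89, 142, 320]
p50 = percentile(response_times_ms, 50)
p95 = percentile(response_times_ms, 95)
```

With only five samples, p95 and p99 both resolve to the slowest check, which is why tail percentiles become meaningful only over longer windows.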

SSL Verification on Every Check

The TLS handshake is validated during each health check. Certificate chain verification, hostname matching, and protocol version are all confirmed. A failed TLS handshake is flagged separately from an HTTP failure so you can distinguish between application and certificate issues.

Timeout & Error Handling

Checks enforce a 10-second timeout. DNS resolution failures, TCP connection refused errors, TLS handshake timeouts, and HTTP-level errors are all categorized individually. Each failure type generates a distinct alert payload so your alerting rules can differentiate between network and application failures.
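
One way to picture the categorization is a function that maps low-level errors to a failure type. The sketch below uses Python's standard exception classes. The category names are hypothetical and are not HostAtlas alert payload values. Telling a TLS handshake timeout apart from other timeouts would also require tracking which phase of the check was in progress, which is omitted here.

```python
import socket
import ssl

CHECK_TIMEOUT_S = 10  # timeout stated on this page


def classify_failure(exc: BaseException) -> str:
    """Map an exception raised during a check to a failure category."""
    if isinstance(exc, socket.gaierror):
        return "dns_failure"
    # Check the more specific TLS error before the general one.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "tls_certificate_invalid"
    if isinstance(exc, ssl.SSLError):
        return "tls_handshake_failed"
    if isinstance(exc, ConnectionRefusedError):
        return "tcp_connection_refused"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    return "unknown"


def classify_status(status: int) -> str:
    """Classify a completed HTTP response by its status code."""
    if status >= 400:
        return "http_error"
    if status >= 300:
        return "redirect"
    return "ok"
```

Keeping network-level categories separate from `http_error` is what lets an alert rule route a DNS failure differently from an application returning 500.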

Redirect Chain Detection

When a domain responds with 301 or 302 redirects, HostAtlas follows the chain up to 10 hops and records every step. See exactly where traffic ends up, catch redirect loops early, and detect unexpected intermediate destinations that could indicate DNS hijacking or misconfiguration.
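
The chain-following logic can be sketched as a loop with a hop limit and a set of visited URLs. To keep the example runnable without network access, the HTTP request is abstracted behind a `fetch` callable; that callable, the outcome labels, and the function name are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

MAX_HOPS = 10  # hop limit stated on this page

# fetch(url) returns (status_code, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def follow_redirects(
    url: str, fetch: Fetch, max_hops: int = MAX_HOPS
) -> Tuple[List[Tuple[str, int]], str]:
    """Follow 301/302 redirects and record every step.

    Returns (chain, outcome) where chain is a list of (url, status) and
    outcome is "final", "loop", or "too_many_redirects".
    """
    chain: List[Tuple[str, int]] = []
    seen = set()
    for _ in range(max_hops + 1):  # first request plus up to max_hops redirects
        if url in seen:
            return chain, "loop"
        seen.add(url)
        status, location = fetch(url)
        chain.append((url, status))
        if status not in (301, 302) or not location:
            return chain, "final"
        url = urljoin(url, location)  # resolve relative Location values
    return chain, "too_many_redirects"
```

In a real check, `fetch` would issue a request with automatic redirects disabled so that each hop is observed individually.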

Domain Health Checks (last 24 hours)

Domain                 Status   Response time
app.example.com        200      142 ms
api.example.com        200      89 ms
staging.example.com    301      320 ms
legacy.example.com     503      10,002 ms
docs.example.com       200      67 ms
cdn.example.com        200      34 ms

6 domains monitored, 99.4% uptime (24h)

SSL Certificate Monitoring

Never let a certificate expire silently again.

HostAtlas discovers every SSL certificate on your servers and tracks expiration dates automatically. When a certificate is within 14 days of expiry, you get alerted. When it renews, the dashboard updates instantly. Every affected domain is correlated so you know the blast radius of an expiring certificate.

Automatic Certificate Discovery

The agent scans web server configurations (nginx, Apache, Caddy) and discovers every SSL certificate installed on your servers. New certificates are detected within minutes of installation, including wildcard and SAN certificates.

14-Day Expiration Warnings

When a certificate enters its final 14 days before expiry, HostAtlas triggers a warning alert. Additional alerts fire at 7 days, 3 days, and 1 day. Severity escalates as the deadline approaches so the right people are reached at the right time.
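
The escalation schedule can be expressed as a lookup from days remaining to severity. In the sketch below, the 14, 7, 3, and 1 day thresholds come from this page, and "warning" is the level named for the 14-day alert. The other severity labels and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# (days remaining, severity), checked from most to least urgent.
THRESHOLDS = [(1, "critical"), (3, "high"), (7, "elevated"), (14, "warning")]


def expiry_severity(
    not_after: datetime, now: Optional[datetime] = None
) -> Optional[str]:
    """Return the alert severity for a certificate, or None if no alert is due."""
    now = now or datetime.now(timezone.utc)
    days_left = (not_after - now) / timedelta(days=1)
    if days_left <= 0:
        return "expired"
    for limit, severity in THRESHOLDS:
        if days_left <= limit:
            return severity
    return None
```

A certificate with 10 days left maps to "warning", and one with 12 hours left maps to "critical".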

Renewal Detection

When a certificate is renewed (manually or via Let's Encrypt automation), the agent detects the new expiration date on its next scan cycle and updates the dashboard. Active expiration alerts are automatically resolved, and a renewal event is logged for your audit trail.

Affected Domain Correlation

Every SSL certificate is linked to its associated domains. When a certificate is nearing expiry, the alert shows every domain that will be affected. For wildcard certificates, all matching subdomains are listed so you understand the full impact before expiration.
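
Correlating a certificate with domains comes down to matching its subject alternative names against the list of discovered domains. The sketch below follows the usual TLS rule that a wildcard covers exactly one left-most label. The function names are hypothetical, and this is not a description of how HostAtlas performs the match.

```python
from typing import Iterable, List


def san_matches(san: str, domain: str) -> bool:
    """True if a certificate name (possibly a wildcard) covers the domain."""
    san = san.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    if san.startswith("*."):
        suffix = san[1:]  # "*.example.com" -> ".example.com"
        if not domain.endswith(suffix):
            return False
        label = domain[: -len(suffix)]
        # The wildcard stands for exactly one non-empty label.
        return label != "" and "." not in label
    return san == domain


def affected_domains(sans: Iterable[str], domains: Iterable[str]) -> List[str]:
    """Domains that would be affected if a certificate with these SANs expired."""
    names = list(sans)
    return sorted(d for d in domains if any(san_matches(s, d) for s in names))
```

Note that `*.example.com` covers `app.example.com` but neither the bare `example.com` nor a deeper name such as `a.b.example.com`.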

Certificate Details

View the full certificate details: issuer, subject, SANs, serial number, signature algorithm, key size, and valid-from/valid-to dates. Everything is displayed on the certificate detail page alongside the certificate's renewal history and associated servers.

Renewal History

A complete log of every certificate renewal event: old expiration date, new expiration date, issuer change detection, and the exact timestamp of detection. Useful for auditing Let's Encrypt cron jobs and verifying automation reliability.

Server Offline Detection

Five minutes of silence. That's all it takes.

When a server's agent hasn't reported in for more than 5 minutes, HostAtlas marks the server as offline. Minute-by-minute checks distinguish between brief network blips and genuine outages. Status changes are pushed to your dashboard over WebSockets in real time.

01

Agent Heartbeat Every 30 Seconds

The agent sends a heartbeat to the HostAtlas API every 30 seconds. Each heartbeat includes the server's current timestamp, agent version, and a lightweight health payload. This is the baseline signal that confirms the server is alive and the agent is running.

02

Missed Check-In Detection

The platform runs a background job every minute that evaluates the last heartbeat timestamp for every registered server. If the gap exceeds 5 minutes (10 consecutive missed heartbeats), the server is flagged for offline evaluation.

03

Status Transition & Alert

Once confirmed offline, the server's status changes from "online" to "offline" in the database. An alert is triggered immediately through all configured notification channels. The status change is broadcast to all connected dashboards via WebSocket so every team member sees it instantly.

04

Automatic Recovery Detection

When the agent resumes heartbeats, the server is automatically marked as "online" again. A recovery event is logged with the exact downtime duration. The original offline alert is resolved and a recovery notification is sent to all configured channels.
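
The four steps above amount to a small state machine driven by the time since the last heartbeat. The following sketch shows the evaluation a once-a-minute background job might run for each server. The function name and the event labels are assumptions; only the 5-minute threshold and the "online" and "offline" statuses come from this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

OFFLINE_AFTER = timedelta(minutes=5)  # 10 missed 30-second heartbeats


def evaluate_server(
    status: str, last_heartbeat: datetime, now: Optional[datetime] = None
) -> Tuple[str, Optional[str]]:
    """Return (new_status, event) where event is "alert", "recovery", or None."""
    now = now or datetime.now(timezone.utc)
    silent_for = now - last_heartbeat
    if status == "online" and silent_for > OFFLINE_AFTER:
        return "offline", "alert"
    if status == "offline" and silent_for <= OFFLINE_AFTER:
        return "online", "recovery"
    return status, None  # no transition, so no notification
```

Because an event is returned only on a transition, a server that stays offline does not trigger a new alert on every run.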

Server Status (live)

Server            IP address      Last heartbeat
web-prod-01       192.168.1.10    12s ago
web-prod-02       192.168.1.11    4s ago
db-primary        192.168.1.20    18s ago
worker-03         192.168.1.33    7m 22s ago
cache-redis-01    192.168.1.40    8s ago

1 server offline. Alert sent to #ops-alerts 2m ago.

Dashboard & Visualization

Charts that tell the full story.

Time-series line charts, stacked area charts, bar gauges, and KPI cards. Every metric is visualized with configurable time ranges from 1 hour to 30 days. Hover for exact values. Click to drill down. Pin the views that matter most.

Time-Series Line Charts

CPU, load average, and network traffic are rendered as multi-line charts with configurable time ranges. Each line is color-coded and labeled. Hover over any point to see the exact value and timestamp. Zoom by selecting a time range on the chart itself.

Stacked Area Charts

Memory composition (used, cached, buffers, available) and disk breakdown are displayed as stacked area charts. The visual proportions make it immediately clear where resources are allocated and how the balance shifts over time.

Bar Gauges & KPI Cards

Current values for CPU, RAM, disk, and swap are shown as horizontal bar gauges with color-coded thresholds. KPI cards display uptime percentage, current load average, active alerts, and domain health scores at a glance.

Real-time WebSocket Updates

Charts update live as new data arrives over WebSocket connections. No page refreshes, no polling intervals. When a new metric lands, the chart appends the data point and scrolls forward automatically. Status changes are reflected instantly across every open dashboard.

web-prod-01 (Online)

CPU 34%, RAM 67%, Disk 42%, Load 1.24

[Chart: CPU Usage, last 6 hours. Range presets: 1h, 6h, 24h, 7d]

Memory Breakdown

Used 5.4 GB / 8 GB
Cached 1.8 GB
Swap 128 MB / 2 GB

Load Average

1 min: 1.24
5 min: 0.98
15 min: 0.76

Data Retention & Aggregation

High resolution when it matters. Efficient storage always.

HostAtlas uses a tiered retention strategy that keeps raw data for the most recent window and progressively aggregates older data. You get 30-second granularity for recent events and long-term trend data without unbounded storage costs.

Raw Data: Under 6 Hours

Every data point at full 30-second resolution is retained for the most recent 6 hours. This gives you maximum granularity for investigating active incidents, correlating recent events, and debugging performance issues in real time.

Resolution: 30 seconds
Retention: 6 hours
Data points / metric: ~720

5-Minute Averages: 6 to 24 Hours

Data older than 6 hours is aggregated into 5-minute windows (average, min, max). This tier covers the previous day with enough detail to spot trends, confirm recurring patterns, and review overnight performance without storing every individual sample.

Resolution: 5 minutes
Retention: 6 to 24 hours
Data points / metric: ~216

1-Hour Averages: Beyond 24 Hours

Data older than 24 hours is aggregated into 1-hour windows. This long-term tier is retained for the duration of your plan's data retention limit, providing week-over-week and month-over-month trend visibility for capacity planning and reporting.

Resolution: 1 hour
Retention: Plan-dependent
Data points / metric: 24 per day

Time Range Selection

Every chart in HostAtlas supports configurable time ranges. Select a preset or define a custom window. The system automatically serves data from the appropriate retention tier so you always get the best available resolution for your selected range.

Range     Data served            Data points
1h        Raw 30s data           120
6h        Raw 30s data           720
24h       5-min averages         288
7d        1-hour averages        168
Custom    Best available tier    Auto-selected
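
The preset figures follow from dividing the selected range by the resolution of the tier that serves it. The sketch below reproduces that arithmetic. The function names are hypothetical, and the rule for picking a tier is an assumption inferred from the presets listed here.

```python
from typing import Tuple

HOUR = 3600


def tier_for_range(range_s: int) -> Tuple[str, int]:
    """Pick the retention tier (name, resolution in seconds) for a time range."""
    if range_s <= 6 * HOUR:
        return "raw", 30
    if range_s <= 24 * HOUR:
        return "5-minute averages", 300
    return "1-hour averages", 3600


def point_count(range_s: int) -> int:
    """Number of data points per metric a chart receives for the range."""
    _, resolution_s = tier_for_range(range_s)
    return range_s // resolution_s
```

A 6-hour range at 30-second resolution gives 720 points, while a 7-day range at 1-hour resolution gives 168, so longer ranges do not mean heavier charts.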

What Gets Aggregated

Each aggregation window stores the average, minimum, and maximum value for every metric, along with the number of samples it covers. You never lose visibility into spikes or dips: even in the 1-hour tier, the peak CPU or minimum free memory is preserved alongside the average.

  • Average value across the window
  • Minimum value (floor) within the window
  • Maximum value (peak) within the window
  • Sample count for statistical validity
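
A minimal sketch of this kind of downsampling, assuming samples arrive as (timestamp, value) pairs with integer Unix timestamps: each sample is assigned to the window containing it, and the four statistics are computed per window. The function name and output shape are illustrative, not the HostAtlas storage format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate(samples: List[Tuple[int, float]], window_s: int) -> List[Dict[str, float]]:
    """Roll raw samples up into fixed windows of window_s seconds."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        window_start = ts - ts % window_s  # align to the window boundary
        buckets[window_start].append(value)
    return [
        {
            "start": start,
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    ]
```

In a 5-minute window holding CPU readings of 10%, 20%, and 90%, the average is 40%, but the stored maximum still shows the 90% spike.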

Why Tiered Retention

Storing every 30-second sample indefinitely would be prohibitively expensive for infrastructure with dozens or hundreds of servers. Tiered retention balances three competing needs:

  • Incident investigation: full-resolution data for the recent window
  • Trend analysis: multi-day and multi-week aggregated data
  • Cost efficiency: predictable storage costs that scale linearly

Get started

Start monitoring your infrastructure in 30 seconds.

Install the agent, and metrics start flowing immediately. CPU, memory, disk, network, load averages, domain health checks, and SSL tracking — all collected automatically. No configuration files to write. No dashboards to build from scratch. Everything is ready the moment your first heartbeat lands.

Quick install

$ curl -sSL install.hostatlas.app | bash