Smart Alerting

Get notified before
it becomes an outage.

Define threshold-based alert rules for any infrastructure metric. Route notifications to Slack, email, PagerDuty, or webhooks. Set up escalation policies, maintenance windows, and cooldown periods so the right people get the right alerts at the right time.

8

Alert sources

4

Channels

<30s

Delivery time

100%

Delivery tracking

Alert Sources

Every signal your infrastructure produces.

HostAtlas monitors eight distinct alert sources across servers, services, domains, and certificates. Every source can trigger independently or feed into composite rules for sophisticated detection.

speed

Metric Thresholds

Set warning and critical thresholds on any system metric. CPU, RAM, disk, load average, swap, and network I/O are all supported with configurable evaluation windows.

CPU utilization (per-core or aggregate)
RAM usage (used, cached, buffers)
Disk usage per mount point
Load average (1m, 5m, 15m)
Swap usage percentage
Network bytes in/out
favorite

Heartbeat Checks

Monitors cron jobs, background tasks, and CI/CD pipelines via HTTP pings. Alerts fire when expected pings are missed or when a job reports explicit failure.

Missed ping detection
Explicit failure reports
Duration threshold exceeded
Grace period tolerance
verified_user

SSL Certificate Expiration

Automatic tracking of every discovered certificate. HostAtlas alerts you 30, 14, 7, and 1 day(s) before expiration so you have ample time to renew.

Configurable warning thresholds
Renewal detection and confirmation
Chain validation errors
cloud_off

Server Offline Events

When the agent hasn't reported for more than 5 minutes, HostAtlas marks the server offline and fires alerts. Consecutive missed check-ins distinguish blips from real outages.

5-minute detection window
Network blip filtering
Recovery notifications
dns

Service Status Changes

Detects when monitored services like nginx, MySQL, PostgreSQL, Redis, or Caddy stop running unexpectedly. Alerts fire within seconds of the process disappearing.

Process disappearance detection
Service restart tracking
Port binding changes
health_and_safety

Domain Health Failures

HTTP health checks run every 5 minutes for every discovered domain. Non-2xx responses, timeouts, and TLS handshake failures all trigger alerts with full response details.

HTTP status code monitoring
Response time thresholds
TLS handshake failures
history

Configuration Changes

Track changes to web server configs, firewall rules, and system settings. Get alerted when a configuration file is modified so you can correlate changes with incidents.

Nginx/Apache config diffs
System file modifications
Change-to-incident correlation
trending_up

Error Spikes

Detects sudden increases in error rates from logs. When error frequency exceeds baseline by a configurable multiplier, HostAtlas fires alerts with sample log lines attached.

Baseline comparison
Configurable multiplier threshold
Sample log lines in alert body

Rule Builder

Build alert rules
without writing code.

The visual rule builder lets you define exactly when an alert should fire. Pick a metric, set the condition and threshold, configure the evaluation duration, and assign a severity level. Apply rules to individual servers, tags, or your entire fleet.

tune

Granular conditions

Greater than, less than, equal to, or between. Combine multiple conditions with AND/OR logic for compound rules.

timer

Evaluation windows

Require the condition to persist for 1, 5, 10, or 15 minutes before firing. Prevents noisy alerts from momentary spikes.

label

Tag-based scoping

Apply rules to servers matching specific tags. One rule for all production servers, another for staging. Scale without duplication.

priority_high

Severity levels

Info, Warning, Critical, and Emergency. Each severity can route to different channels and escalation policies.

Create Alert Rule Draft
High CPU on production servers
CPU Utilization (Aggregate) expand_more
Greater than expand_more
90 %
5 minutes expand_more
Info Warn Crit Emrg
sell env:production sell tier:web + Add tag
#ops-alerts mail oncall@acme.io PagerDuty
Rule will evaluate every 30s

Notification Channels

Deliver alerts where your team already works.

Configure multiple notification channels per alert rule. Each channel has its own delivery settings, retry logic, and delivery tracking. Mix and match to ensure critical alerts reach the right people through the right medium.

Slack

Route alerts to any Slack channel using incoming webhooks or the HostAtlas Slack app. Rich message formatting with severity badges, metric values, and direct links to the affected server.

check_circle Channel-per-severity routing
check_circle Rich message blocks with context
check_circle Thread-based alert grouping
check_circle Acknowledge from Slack
mail

Email

Send alerts to individual email addresses or distribution lists. HTML-formatted emails include metric snapshots, alert history, and one-click acknowledge links. Delivered via reliable transactional email infrastructure.

check_circle HTML + plain text fallback
check_circle Metric snapshot in email body
check_circle One-click acknowledge link
check_circle Distribution list support

PagerDuty

Trigger PagerDuty incidents directly from HostAtlas alerts. Map alert severities to PagerDuty urgency levels. Incidents auto-resolve in PagerDuty when the metric returns to normal.

check_circle Events API v2 integration
check_circle Severity-to-urgency mapping
check_circle Auto-resolve on recovery
check_circle Service key per alert rule
webhook

Webhooks

Send alert payloads to any HTTP endpoint. HMAC-signed for authenticity, with configurable headers and retry logic. Build custom integrations with Microsoft Teams, Discord, Opsgenie, or your own systems.

check_circle HMAC-SHA256 signatures
check_circle Custom headers and auth tokens
check_circle 3 retries with exponential backoff
check_circle Full request/response logging

Example webhook payload

{
"event": "alert.triggered",
"rule": "High CPU on production servers",
"severity": "critical",
"server": {
"hostname": "prod-03",
"ip": "10.0.1.23"
},
"metric": {
"name": "cpu_utilization",
"value": 94.7,
"threshold": 90,
"unit": "percent"
},
"triggered_at": "2026-03-21T14:32:07Z",
"dashboard_url": "https://app.hostatlas.app/servers/prod-03"
}
Escalation Policy: Production Critical Active
1

Tier 1 — Immediate

Fires immediately when the alert triggers.

#ops-alerts
mail oncall@acme.io
2

Tier 2 — After 10 minutes

Escalates if no acknowledgment within 10 minutes.

mail eng-leads@acme.io
#eng-escalation
3

Tier 3 — After 30 minutes

Final escalation to leadership if still unacknowledged.

PagerDuty — P1 Service
mail vp-eng@acme.io
phone SMS to +1 (555) 012-3456

Escalation Policies

The right person.
Every time.

Define tiered escalation policies with configurable timeouts at each level. If the primary on-call team doesn't acknowledge an alert within the time window, it automatically escalates to the next tier. No alert goes unnoticed.

account_tree

Unlimited tiers

Create as many escalation levels as your team needs. Each tier has its own timeout and notification channels.

front_hand

Acknowledgment-based

Escalation stops when someone acknowledges. Acknowledge via Slack, email link, dashboard, or API.

schedule

Configurable timeouts

Set 5-minute, 10-minute, 30-minute, or custom timeout windows between each tier. Fine-tune response expectations.

swap_vert

Severity-based routing

Attach different escalation policies to different severities. Warning alerts notify Slack; critical alerts page on-call immediately.

Maintenance Windows

Planned work.
Zero noise.

Schedule maintenance windows to suppress alerts during planned work. Define the scope, set the duration, and let your team deploy, patch, or reboot without triggering a flood of notifications. Alerts resume automatically when the window closes.

target

Flexible scoping

Scope a window to a single server, a group of servers by tag, specific services, or your entire fleet. Narrow suppression prevents masking unrelated issues.

event_repeat

Recurring schedules

Set up recurring maintenance windows for regular patching cycles. Daily, weekly, or monthly recurrence with configurable day-of-week and time-of-day.

auto_mode

Auto-resume

When the maintenance window closes, alerting resumes instantly with no manual action. No forgotten muted servers, no silent outages after the window.

visibility

Visibility during maintenance

Metrics are still collected and displayed during maintenance. You see the data without the noise. Dashboard badges indicate which servers are in a maintenance window.

Scheduled Maintenance Windows
engineering

Database Cluster Upgrade

Upgrading PostgreSQL from 15.4 to 16.1

Active

Scope

tag:role=database

Started

Mar 21, 02:00 UTC

Ends

Mar 21, 04:00 UTC

~42m remaining
engineering

Weekly Security Patches

Automated patching of all production web servers

Scheduled

Scope

tag:tier=web

Next run

Mar 25, 03:00 UTC

Recurrence

Every Tuesday

check_circle

Load Balancer Migration

Migrated from HAProxy to Caddy

Completed

Scope

All servers

Duration

1h 22m

Alerts suppressed

14

Alert Timeline — CPU > 90% on prod-03

Alert Triggered

14:32:07 UTC

CPU at 94.7% for 5 minutes. Notifications sent to #ops-alerts, oncall@acme.io.

Cooldown Active

14:32:07 — 14:47:07 UTC

15-minute cooldown period started. Duplicate alerts suppressed.

Suppressed Alert Triggered

14:37:37 UTC

CPU still at 92.1%. Would trigger, but cooldown active. No notification sent.

Suppressed Alert Triggered

14:43:07 UTC

CPU at 91.3%. Still within cooldown window.

Cooldown Expired

14:47:07 UTC

Cooldown period ended. New violations will trigger fresh notifications. 2 duplicate alerts were suppressed.

Cooldown Periods

Alert storms.
Eliminated.

When a metric crosses a threshold, you don't need to be told every 30 seconds. HostAtlas enforces a configurable cooldown period after each alert fires. During cooldown, duplicate notifications are suppressed while the underlying data continues to be tracked.

snooze

Default 15-minute cooldown

Out of the box, alerts have a 15-minute cooldown. Customize per-rule to 5, 10, 15, 30, or 60 minutes depending on the metric's volatility.

filter_alt

Suppression counts visible

The dashboard shows how many alerts were suppressed during each cooldown window. Full transparency into what was muted and why.

data_usage

Data continues flowing

Cooldown only suppresses notifications — metric collection and threshold evaluation continue uninterrupted. Nothing is lost.

restart_alt

Recovery resets cooldown

When a metric returns below threshold and then crosses again, a new alert fires immediately regardless of cooldown state.

Delivery Tracking

Know that every alert was delivered.

Every notification sent by HostAtlas is tracked end-to-end. See sent, delivered, and failed statuses with timestamps. When a webhook fails or an email bounces, detailed logs show exactly what happened so you can fix the issue immediately.

Delivery Log Alert: High CPU on prod-03 — Mar 21, 14:32 UTC
3 Delivered 1 Failed
check_circle

Slack — #ops-alerts

Delivered in 340ms · Message ID: msg_a7b2c9

14:32:08 UTC
check_circle
mail

Email — oncall@acme.io

Accepted by SMTP relay · SES Message ID: 0100018...3ab2

14:32:09 UTC
check_circle

PagerDuty — Production Infra

Incident created · Dedup key: hostatlas-cpu-prod03-20260321

14:32:10 UTC
error
webhook

Webhook — https://hooks.internal.acme.io/alerts

Failed after 3 retries · Last response: 503 Service Unavailable · Timeout: 10s

14:32:42 UTC
analytics

Delivery Analytics

Track delivery success rates per channel over time. Identify unreliable webhooks or email deliverability issues before they cause missed alerts in a real incident.

replay

Automatic Retries

Failed webhook deliveries are retried 3 times with exponential backoff (1s, 5s, 30s). If all retries fail, the alert is flagged and you can manually retry from the delivery log.

receipt_long

Full Request Logs

Every webhook delivery includes the full HTTP request and response — headers, body, status code, and timing. Debug integration issues without guessing what payload was sent.

Never miss an alert again.

Set up your first alert rule in under two minutes. Multi-channel delivery, escalation policies, and maintenance windows included on every plan.

$ curl -sSL install.hostatlas.app | bash_