Smart Alerting

Get notified before
it becomes an outage.

Define threshold-based alert rules for any infrastructure metric. Route notifications to Slack, email, PagerDuty, or webhooks. Set up escalation policies, maintenance windows, and cooldown periods so the right people get the right alerts at the right time.


Delivery time: <30s

Delivery tracking: 100%

Alert Sources

Every signal your infrastructure produces.

HostAtlas monitors eight distinct alert sources across servers, services, domains, and certificates. Every source can trigger independently or feed into composite rules for sophisticated detection.

Metric Thresholds

Set warning and critical thresholds on any system metric. CPU, RAM, disk, load average, swap, and network I/O are all supported with configurable evaluation windows.

CPU utilization (per-core or aggregate)

RAM usage (used, cached, buffers)

Disk usage per mount point

Load average (1m, 5m, 15m)

Swap usage percentage

Network bytes in/out

Heartbeat Checks

Monitors cron jobs, background tasks, and CI/CD pipelines via HTTP pings. Alerts fire when expected pings are missed or when a job reports explicit failure.

Missed ping detection

Explicit failure reports

Duration threshold exceeded

Grace period tolerance
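
As a rough illustration of missed-ping detection with a grace period, the check could be sketched as below. The function name, the three status labels, and the idea of a fixed expected period per job are assumptions made for this example, not HostAtlas internals.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_status(last_ping: datetime, now: datetime,
                     period: timedelta, grace: timedelta) -> str:
    """Classify a heartbeat as 'ok', 'late' (inside grace) or 'missed'.

    Hypothetical helper: `period` is how often the job is expected to ping,
    `grace` is the extra tolerance before an alert would fire.
    """
    overdue = now - (last_ping + period)
    if overdue <= timedelta(0):
        return "ok"
    if overdue <= grace:
        return "late"
    return "missed"


now = datetime(2026, 3, 21, 14, 0, tzinfo=timezone.utc)
hourly = timedelta(hours=1)
grace = timedelta(minutes=5)
print(heartbeat_status(now - timedelta(minutes=63), now, hourly, grace))
```

Only the "missed" state would page anyone in this sketch; "late" is the window in which a slow job can still check in without raising noise.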

SSL Certificate Expiration

Automatic tracking of every discovered certificate. HostAtlas alerts you 30, 14, 7, and 1 day before expiration so you have ample time to renew.

Configurable warning thresholds

Renewal detection and confirmation

Chain validation errors
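
The 30/14/7/1-day schedule can be pictured with a small sketch. It assumes whole days are counted, that each threshold is announced only once per certificate, and that the caller tracks which thresholds were already sent; `due_warning` is a hypothetical helper, not a HostAtlas API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Set

# Warning thresholds in days, taken from the description above.
WARNING_DAYS = (30, 14, 7, 1)


def due_warning(not_after: datetime, now: datetime,
                already_sent: Set[int]) -> Optional[int]:
    """Return the tightest threshold that has been reached but not yet sent."""
    days_left = (not_after - now).days
    reached = [d for d in WARNING_DAYS
               if days_left <= d and d not in already_sent]
    return min(reached) if reached else None


now = datetime(2026, 3, 21, tzinfo=timezone.utc)
expires = now + timedelta(days=10)
print(due_warning(expires, now, already_sent={30}))
```

Picking the tightest reached threshold means a certificate discovered late gets one warning at the right urgency rather than a burst of overdue ones.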

Server Offline Events

When the agent hasn't reported for more than 5 minutes, HostAtlas marks the server offline and fires alerts. Consecutive missed check-ins distinguish blips from real outages.

5-minute detection window

Network blip filtering

Recovery notifications

Service Status Changes

Detects when monitored services like nginx, MySQL, PostgreSQL, Redis, or Caddy stop running unexpectedly. Alerts fire within seconds of the process disappearing.

Process disappearance detection

Service restart tracking

Port binding changes

Domain Health Failures

HTTP health checks run every 5 minutes for every discovered domain. Non-2xx responses, timeouts, and TLS handshake failures all trigger alerts with full response details.

HTTP status code monitoring

Response time thresholds

TLS handshake failures

Configuration Changes

Track changes to web server configs, firewall rules, and system settings. Get alerted when a configuration file is modified so you can correlate changes with incidents.

Nginx/Apache config diffs

System file modifications

Change-to-incident correlation

Error Spikes

Detects sudden increases in error rates from logs. When error frequency exceeds baseline by a configurable multiplier, HostAtlas fires alerts with sample log lines attached.

Baseline comparison

Configurable multiplier threshold

Sample log lines in alert body
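
A baseline-and-multiplier check might look roughly like this. The baseline here is assumed to be the mean error count over previous windows, and the `min_errors` floor is an extra guard added for the example so that tiny counts on a quiet service do not register as spikes; neither detail is confirmed by HostAtlas.

```python
from typing import Sequence


def is_error_spike(current_count: int, baseline_counts: Sequence[int],
                   multiplier: float = 3.0, min_errors: int = 10) -> bool:
    """Return True when the current window exceeds baseline * multiplier.

    Hypothetical sketch: `baseline_counts` holds error counts from earlier
    windows of the same length as the current one.
    """
    if not baseline_counts:
        return False  # no history yet, so nothing to compare against
    baseline = sum(baseline_counts) / len(baseline_counts)
    return current_count >= min_errors and current_count > baseline * multiplier


print(is_error_spike(50, [10, 12, 8]))
```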

Rule Builder

Build alert rules
without writing code.

The visual rule builder lets you define exactly when an alert should fire. Pick a metric, set the condition and threshold, configure the evaluation duration, and assign a severity level. Apply rules to individual servers, tags, or your entire fleet.

Granular conditions

Greater than, less than, equal to, or between. Combine multiple conditions with AND/OR logic for compound rules.

Evaluation windows

Require the condition to persist for 1, 5, 10, or 15 minutes before firing. Prevents noisy alerts from momentary spikes.

Tag-based scoping

Apply rules to servers matching specific tags. One rule for all production servers, another for staging. Scale without duplication.

Severity levels

Info, Warning, Critical, and Emergency. Each severity can route to different channels and escalation policies.
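
To make the interplay of condition, threshold, and evaluation window concrete, here is a minimal sketch of a single-condition rule. The `Rule` fields and `should_fire` are illustrative names, and the policy shown (every sample in the window must violate, and the data must reach back to the start of the window) is one plausible reading of "persist for N minutes", not a description of HostAtlas's evaluator. Compound AND/OR rules and "between" are left out for brevity.

```python
import operator
from dataclasses import dataclass
from typing import List, Tuple

OPS = {"gt": operator.gt, "lt": operator.lt, "eq": operator.eq}


@dataclass
class Rule:
    metric: str
    op: str           # one of the keys in OPS
    threshold: float
    for_seconds: int  # evaluation window
    severity: str


def should_fire(rule: Rule, samples: List[Tuple[float, float]], now: float) -> bool:
    """samples are (unix_timestamp, value) pairs for the rule's metric."""
    window_start = now - rule.for_seconds
    compare = OPS[rule.op]
    in_window = [value for ts, value in samples if ts >= window_start]
    # Require at least one sample at or before the window start, so a rule
    # cannot fire on a window that is only partly covered by data.
    covered = any(ts <= window_start for ts, _ in samples)
    return covered and bool(in_window) and all(
        compare(value, rule.threshold) for value in in_window)


rule = Rule("cpu_utilization", "gt", 90, for_seconds=300, severity="critical")
samples = [(float(t), 95.0) for t in range(0, 301, 30)]  # one sample per 30s
print(should_fire(rule, samples, now=300.0))
```

A single sample that dips under the threshold resets the window in this model, which is what filters out momentary spikes.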

Create Alert Rule (Draft)

Rule Name: High CPU on production servers

Metric: CPU Utilization (Aggregate)

Condition: Greater than

Threshold: 90 %

For at least: 5 minutes

Severity: Info, Warn, Crit, Emrg

Apply to: env:production, tier:web

Notify via: #ops-alerts, oncall@acme.io, PagerDuty

Rule will evaluate every 30s

Notification Channels

Deliver alerts where your team already works.

Configure multiple notification channels per alert rule. Each channel has its own delivery settings, retry logic, and delivery tracking. Mix and match to ensure critical alerts reach the right people through the right medium.

Slack

Route alerts to any Slack channel using incoming webhooks or the HostAtlas Slack app. Rich message formatting with severity badges, metric values, and direct links to the affected server.

Channel-per-severity routing

Rich message blocks with context

Thread-based alert grouping

Acknowledge from Slack

Email

Send alerts to individual email addresses or distribution lists. HTML-formatted emails include metric snapshots, alert history, and one-click acknowledge links. Delivered via reliable transactional email infrastructure.

HTML + plain text fallback

Metric snapshot in email body

One-click acknowledge link

Distribution list support

PagerDuty

Trigger PagerDuty incidents directly from HostAtlas alerts. Map alert severities to PagerDuty urgency levels. Incidents auto-resolve in PagerDuty when the metric returns to normal.

Events API v2 integration

Severity-to-urgency mapping

Auto-resolve on recovery

Service key per alert rule
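
For readers wiring this up by hand, the sketch below builds request bodies in the shape the PagerDuty Events API v2 expects (`routing_key`, `event_action`, `dedup_key`, and a `payload` with `summary`, `source`, and `severity`). The severity mapping is an assumption for illustration: PagerDuty accepts `critical`, `error`, `warning`, and `info`, so Emergency is folded into `critical` here. How severity translates into urgency depends on the PagerDuty service's own settings.

```python
# Assumed mapping from HostAtlas severities to PagerDuty event severities.
PD_SEVERITY = {
    "info": "info",
    "warning": "warning",
    "critical": "critical",
    "emergency": "critical",
}


def pagerduty_event(routing_key: str, action: str, dedup_key: str,
                    summary: str = "", source: str = "",
                    severity: str = "critical") -> dict:
    """Build an Events API v2 body for trigger, acknowledge, or resolve."""
    if action not in ("trigger", "acknowledge", "resolve"):
        raise ValueError(f"unsupported event_action: {action}")
    event = {
        "routing_key": routing_key,
        "event_action": action,
        "dedup_key": dedup_key,  # same key ties trigger and resolve together
    }
    if action == "trigger":
        event["payload"] = {
            "summary": summary,
            "source": source,
            "severity": PD_SEVERITY[severity],
        }
    return event


trigger = pagerduty_event("ROUTING_KEY_PLACEHOLDER", "trigger",
                          "hostatlas-cpu-prod03-20260321",
                          summary="CPU > 90% on prod-03", source="prod-03")
resolve = pagerduty_event("ROUTING_KEY_PLACEHOLDER", "resolve",
                          "hostatlas-cpu-prod03-20260321")
print(trigger["payload"]["severity"], resolve["event_action"])
```

Each body would be sent as JSON in a POST to `https://events.pagerduty.com/v2/enqueue`. Reusing the dedup key on the resolve event is what lets the incident close automatically on recovery.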

Webhooks

Send alert payloads to any HTTP endpoint. HMAC-signed for authenticity, with configurable headers and retry logic. Build custom integrations with Microsoft Teams, Discord, Opsgenie, or your own systems.

HMAC-SHA256 signatures

Custom headers and auth tokens

3 retries with exponential backoff

Full request/response logging

Example webhook payload

{
  "event": "alert.triggered",
  "rule": "High CPU on production servers",
  "severity": "critical",
  "server": {
    "hostname": "prod-03",
    "ip": "10.0.1.23"
  },
  "metric": {
    "name": "cpu_utilization",
    "value": 94.7,
    "threshold": 90,
    "unit": "percent"
  },
  "triggered_at": "2026-03-21T14:32:07Z",
  "dashboard_url": "https://my.hostatlas.app/servers/prod-03"
}
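
On the receiving side, an HMAC-SHA256 signature is typically verified by recomputing it over the raw request body and comparing in constant time. The sketch below assumes a hex-encoded digest and a shared secret configured per webhook; the header that carries the signature and its exact encoding are not stated on this page, so treat those details as placeholders to confirm against the HostAtlas documentation.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body (assumed encoding)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Constant-time comparison avoids leaking how many characters matched."""
    return hmac.compare_digest(sign(secret, body), received_signature)


secret = b"example-shared-secret"
body = b'{"event": "alert.triggered", "severity": "critical"}'
signature = sign(secret, body)  # stands in for the value sent with the request
print(verify(secret, body, signature))
print(verify(secret, body + b" ", signature))
```

Verify against the bytes exactly as received, before any JSON parsing, since re-serializing the payload can change whitespace and invalidate the signature.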

Escalation Policy: Production Critical (Active)

Tier 1: Immediate

Fires immediately when the alert triggers.

Notifies: #ops-alerts, oncall@acme.io

Tier 2: After 10 minutes

Escalates if no acknowledgment within 10 minutes.

Notifies: eng-leads@acme.io, #eng-escalation

Tier 3: After 30 minutes

Final escalation to leadership if still unacknowledged.

Notifies: PagerDuty (P1 Service), vp-eng@acme.io, SMS to +1 (555) 012-3456

Escalation Policies

The right person.
Every time.

Define tiered escalation policies with configurable timeouts at each level. If the primary on-call team doesn't acknowledge an alert within the time window, it automatically escalates to the next tier. No alert goes unnoticed.

Unlimited tiers

Create as many escalation levels as your team needs. Each tier has its own timeout and notification channels.

Acknowledgment-based

Escalation stops when someone acknowledges. Acknowledge via Slack, email link, dashboard, or API.

Configurable timeouts

Set 5-minute, 10-minute, 30-minute, or custom timeout windows between each tier. Fine-tune response expectations.

Severity-based routing

Attach different escalation policies to different severities. Warning alerts notify Slack; critical alerts page on-call immediately.
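
The tiered, acknowledgment-based behavior can be modeled in a few lines. This sketch assumes each tier's delay is measured from the moment the alert triggers and that an acknowledgment stops every tier scheduled after it; `Tier` and `tiers_to_notify` are hypothetical names for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tier:
    name: str
    after_minutes: float  # delay from the initial trigger
    targets: List[str]


def tiers_to_notify(tiers: List[Tier], minutes_since_trigger: float,
                    acknowledged_at: Optional[float] = None) -> List[Tier]:
    """Return tiers whose delay has elapsed and was not cut off by an ack.

    A tier scheduled at the exact minute of the acknowledgment still fires
    in this sketch; a real system would choose its own tie-break.
    """
    cutoff = minutes_since_trigger
    if acknowledged_at is not None:
        cutoff = min(cutoff, acknowledged_at)
    return [tier for tier in tiers if tier.after_minutes <= cutoff]


policy = [
    Tier("Tier 1", 0, ["#ops-alerts", "oncall@acme.io"]),
    Tier("Tier 2", 10, ["eng-leads@acme.io", "#eng-escalation"]),
    Tier("Tier 3", 30, ["PagerDuty", "vp-eng@acme.io"]),
]
print([t.name for t in tiers_to_notify(policy, minutes_since_trigger=12)])
print([t.name for t in tiers_to_notify(policy, 45, acknowledged_at=4)])
```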

Maintenance Windows

Planned work.
Zero noise.

Schedule maintenance windows to suppress alerts during planned work. Define the scope, set the duration, and let your team deploy, patch, or reboot without triggering a flood of notifications. Alerts resume automatically when the window closes.

Flexible scoping

Scope a window to a single server, a group of servers by tag, specific services, or your entire fleet. Narrow suppression prevents masking unrelated issues.

Recurring schedules

Set up recurring maintenance windows for regular patching cycles. Daily, weekly, or monthly recurrence with configurable day-of-week and time-of-day.

Auto-resume

When the maintenance window closes, alerting resumes instantly with no manual action. No forgotten muted servers, no silent outages after the window.

Visibility during maintenance

Metrics are still collected and displayed during maintenance. You see the data without the noise. Dashboard badges indicate which servers are in a maintenance window.
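
Scoped suppression comes down to two checks: is the current time inside the window, and does the server fall within its scope. A minimal sketch, assuming tag-based scoping where an empty tag set means the entire fleet (the `MaintenanceWindow` type is illustrative only):

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Set


@dataclass(frozen=True)
class MaintenanceWindow:
    start: datetime
    end: datetime
    tags: FrozenSet[str]  # empty set is treated as "all servers"


def is_suppressed(window: MaintenanceWindow, server_tags: Set[str],
                  now: datetime) -> bool:
    """True when notifications for this server should be muted right now."""
    in_time = window.start <= now < window.end
    in_scope = not window.tags or window.tags <= server_tags
    return in_time and in_scope


window = MaintenanceWindow(
    start=datetime(2026, 3, 21, 2, 0, tzinfo=timezone.utc),
    end=datetime(2026, 3, 21, 4, 0, tzinfo=timezone.utc),
    tags=frozenset({"role=database"}),
)
now = datetime(2026, 3, 21, 3, 18, tzinfo=timezone.utc)
print(is_suppressed(window, {"role=database", "env=production"}, now))
print(is_suppressed(window, {"tier=web"}, now))
```

Because the end bound is exclusive and evaluated on every check, alerting resumes on its own once the window closes, with no unmute step to forget.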

Scheduled Maintenance Windows

Database Cluster Upgrade (Active)

Upgrading PostgreSQL from 15.4 to 16.1

Scope: tag:role=database

Started: Mar 21, 02:00 UTC

Ends: Mar 21, 04:00 UTC (~42m remaining)

Weekly Security Patches (Scheduled)

Automated patching of all production web servers

Scope: tag:tier=web

Next run: Mar 25, 03:00 UTC

Recurrence: Every Tuesday

Load Balancer Migration (Completed)

Migrated from HAProxy to Caddy

Scope: All servers

Duration: 1h 22m

Alerts suppressed

Alert Timeline: CPU > 90% on prod-03

Alert Triggered (14:32:07 UTC)

CPU at 94.7% for 5 minutes. Notifications sent to #ops-alerts, oncall@acme.io.

Cooldown Active (14:32:07 to 14:47:07 UTC)

15-minute cooldown period started. Duplicate alerts suppressed.

Suppressed Alert Triggered (14:37:37 UTC)

CPU still at 92.1%. Would trigger, but cooldown active. No notification sent.

Suppressed Alert Triggered (14:43:07 UTC)

CPU at 91.3%. Still within cooldown window.

Cooldown Expired (14:47:07 UTC)

Cooldown period ended. New violations will trigger fresh notifications. 2 duplicate alerts were suppressed.

Cooldown Periods

Alert storms.
Eliminated.

When a metric crosses a threshold, you don't need to be told every 30 seconds. HostAtlas enforces a configurable cooldown period after each alert fires. During cooldown, duplicate notifications are suppressed while the underlying data continues to be tracked.

Default 15-minute cooldown

Out of the box, alerts have a 15-minute cooldown. Customize per-rule to 5, 10, 15, 30, or 60 minutes depending on the metric's volatility.

Suppression counts visible

The dashboard shows how many alerts were suppressed during each cooldown window. Full transparency into what was muted and why.

Data continues flowing

Cooldown only suppresses notifications — metric collection and threshold evaluation continue uninterrupted. Nothing is lost.

Recovery resets cooldown

When a metric returns below threshold and then crosses again, a new alert fires immediately regardless of cooldown state.
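
The cooldown rules described here (suppress duplicates, keep counting them, reset on recovery) fit in a small state machine. The class below is a sketch under those stated rules with hypothetical names; timestamps are plain seconds so the example stays self-contained.

```python
from typing import Optional


class CooldownTracker:
    """Decides, per evaluation, whether a notification should be sent."""

    def __init__(self, cooldown_seconds: float = 900) -> None:  # 15 minutes
        self.cooldown = cooldown_seconds
        self.last_notified: Optional[float] = None
        self.suppressed = 0

    def observe(self, ts: float, violating: bool) -> bool:
        if not violating:
            # Recovery resets the cooldown, so the next violation notifies.
            self.last_notified = None
            return False
        if (self.last_notified is not None
                and ts - self.last_notified < self.cooldown):
            self.suppressed += 1  # tracked for the dashboard, not delivered
            return False
        self.last_notified = ts
        return True


tracker = CooldownTracker()
# Offsets in seconds from the first trigger: 0s, 5m30s, 11m, 15m.
for offset in (0, 330, 660, 900):
    print(offset, tracker.observe(offset, violating=True))
print("suppressed:", tracker.suppressed)
```

Evaluation keeps running on every call; only the decision to notify is gated, which is why the suppression count stays accurate.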

Delivery Tracking

Know that every alert was delivered.

Every notification sent by HostAtlas is tracked end-to-end. See sent, delivered, and failed statuses with timestamps. When a webhook fails or an email bounces, detailed logs show exactly what happened so you can fix the issue immediately.

Delivery Log: High CPU on prod-03, Mar 21, 14:32 UTC

3 Delivered, 1 Failed

Slack: #ops-alerts (delivered, 14:32:08 UTC)

Delivered in 340ms · Message ID: msg_a7b2c9

Email: oncall@acme.io (delivered, 14:32:09 UTC)

Accepted by SMTP relay · SES Message ID: 0100018...3ab2

PagerDuty: Production Infra (delivered, 14:32:10 UTC)

Incident created · Dedup key: hostatlas-cpu-prod03-20260321

Webhook: https://hooks.internal.acme.io/alerts (failed, 14:32:42 UTC)

Failed after 3 retries · Last response: 503 Service Unavailable · Timeout: 10s

Delivery Analytics

Track delivery success rates per channel over time. Identify unreliable webhooks or email deliverability issues before they cause missed alerts in a real incident.

Automatic Retries

Failed webhook deliveries are retried 3 times with exponential backoff (1s, 5s, 30s). If all retries fail, the alert is flagged and you can manually retry from the delivery log.

Full Request Logs

Every webhook delivery includes the full HTTP request and response — headers, body, status code, and timing. Debug integration issues without guessing what payload was sent.
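
The retry schedule can be sketched as one initial attempt followed by retries after 1s, 5s, and 30s. The `send` callable, the choice to treat any 2xx as success, and retrying on connection errors are assumptions for this example; the sleep function is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable, Tuple

RETRY_DELAYS = (1, 5, 30)  # seconds before each retry, as listed above


def deliver_with_retries(send: Callable[[], int],
                         sleep: Callable[[float], None] = time.sleep
                         ) -> Tuple[bool, int]:
    """Call `send` (returns an HTTP status) until it succeeds or retries run out.

    Returns (delivered, attempts_made).
    """
    attempts = 0
    for delay in (0, *RETRY_DELAYS):
        if delay:
            sleep(delay)
        attempts += 1
        try:
            status = send()
        except OSError:
            continue  # connection failure counts as a failed attempt
        if 200 <= status < 300:
            return True, attempts
    return False, attempts


# Simulated endpoint that recovers on the third attempt.
responses = iter([503, 503, 200])
waits = []
print(deliver_with_retries(lambda: next(responses), sleep=waits.append), waits)
```

When every attempt fails, the `(False, 4)` result is the point at which a delivery would be flagged for manual retry from the log.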

Never miss an alert again.

Set up your first alert rule in under two minutes. Multi-channel delivery, escalation policies, and maintenance windows included on every plan.


$ curl -sSL https://install.hostatlas.app/install.sh | sudo bash -s -- --key=SERVER_KEY_
