Prometheus is an open-source monitoring system that collects and stores metrics. It is the de facto standard for container and Kubernetes monitoring, but works just as well for classic servers.

Architecture
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Server 1   │     │  Server 2   │     │  Server 3   │
│ (Exporter)  │     │ (Exporter)  │     │ (Exporter)  │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │ Pull (scrape)
                           ▼
                    ┌─────────────┐
                    │ Prometheus  │
                    │   Server    │
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
        ┌──────────┐ ┌────────────┐ ┌──────────┐
        │ Grafana  │ │Alertmanager│ │   API    │
        └──────────┘ └────────────┘ └──────────┘

Installing Prometheus

User and directories
# Create a system user without a home directory or login shell
useradd --no-create-home --shell /bin/false prometheus
# Create the configuration and data directories
mkdir /etc/prometheus
mkdir /var/lib/prometheus
chown prometheus:prometheus /etc/prometheus
chown prometheus:prometheus /var/lib/prometheus

Download and installation
# Download a release (v2.48.0 here; check the releases page for the current version)
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.48.0/prometheus-2.48.0.linux-amd64.tar.gz
# Unpack
tar xvfz prometheus-2.48.0.linux-amd64.tar.gz
cd prometheus-2.48.0.linux-amd64
# Copy the binaries
cp prometheus /usr/local/bin/
cp promtool /usr/local/bin/
chown prometheus:prometheus /usr/local/bin/prometheus
chown prometheus:prometheus /usr/local/bin/promtool
# Copy the console templates and libraries
cp -r consoles /etc/prometheus
cp -r console_libraries /etc/prometheus
chown -R prometheus:prometheus /etc/prometheus/consoles
chown -R prometheus:prometheus /etc/prometheus/console_libraries

Basic configuration
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

Systemd service
# /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.enable-lifecycle
Restart=always

[Install]
WantedBy=multi-user.target

# Reload systemd, then enable and start the service
systemctl daemon-reload
systemctl enable --now prometheus
systemctl status prometheus

Web interface
Open http://server-ip:9090 in a browser, replacing server-ip with the address of the Prometheus host.
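Whether the server is reachable can also be checked from a script instead of a browser. The following is a minimal sketch that assumes Prometheus's `/-/healthy` management endpoint and uses only the Python standard library; the function names and the `opener` parameter are illustrative and not part of any Prometheus client library.

```python
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL for a Prometheus base URL."""
    return base_url.rstrip("/") + "/-/healthy"


def is_healthy(base_url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with opener(health_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, so refused
        # connections, timeouts and error responses all count as unhealthy.
        return False
```

Calling `is_healthy("http://localhost:9090")` on the Prometheus host should return `True` once the service is running. The `opener` parameter exists only so that the function can be exercised without a live server.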

Installing Node Exporter
Node Exporter exposes system metrics (CPU, RAM, disk, network).
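Node Exporter serves these metrics over HTTP in the plain-text Prometheus exposition format, which Prometheus then scrapes. To illustrate what such a scrape contains, here is a small parsing sketch. The sample text is hand-written for illustration rather than real exporter output, and the parser is deliberately simplified: it ignores timestamps and escaped quotes in label values.

```python
import re

# One sample per line: metric name, optional {label="value",...} block, numeric value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, label_block, value = match.groups()
        labels = dict(LABEL_RE.findall(label_block or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative input in the style of a node_exporter response.
EXAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
"""
```

Each returned tuple corresponds to one time series sample: the metric name, its labels as a dictionary, and the current value.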

Installation
# Create a system user
useradd --no-create-home --shell /bin/false node_exporter
# Download and unpack
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
# Install the binary
cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
chown node_exporter:node_exporter /usr/local/bin/node_exporter

Systemd service
# /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target

# Reload systemd, then enable and start the service
systemctl daemon-reload
systemctl enable --now node_exporter

Adding Node Exporter to Prometheus
# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]

# Reload the configuration (requires the --web.enable-lifecycle flag)
curl -X POST http://localhost:9090/-/reload
# Or restart the service
systemctl restart prometheus

Monitoring multiple servers

Prometheus configuration
# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: "nodes"
    static_configs:
      - targets:
          - "server1.example.com:9100"
          - "server2.example.com:9100"
          - "server3.example.com:9100"
        labels:
          env: "production"
      - targets:
          - "staging1.example.com:9100"
        labels:
          env: "staging"

With file-based service discovery
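With file-based service discovery, Prometheus periodically re-reads target files from disk, so the target list can be generated by a script instead of being edited by hand. The sketch below renders such a target file as YAML text. The inventory, the function name and the hand-rolled YAML output are assumptions made for illustration; a real setup might use a YAML library or an existing inventory tool instead.

```python
import json


def render_target_groups(groups):
    """Render file_sd target groups as YAML text.

    `groups` is a list of (targets, labels) pairs. Strings are emitted via
    json.dumps, because a JSON string is also a valid double-quoted YAML scalar.
    """
    lines = []
    for targets, labels in groups:
        lines.append("- targets:")
        for target in targets:
            lines.append(f"    - {json.dumps(target)}")
        if labels:
            lines.append("  labels:")
            for key, value in sorted(labels.items()):
                lines.append(f"    {key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical inventory: two production hosts in one data center.
INVENTORY = [
    (["server1.example.com:9100", "server2.example.com:9100"],
     {"env": "production", "dc": "dc1"}),
]
```

Writing the result of `render_target_groups(INVENTORY)` to a file matched by the `files` pattern of the scrape job makes Prometheus pick up the targets without a reload.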
# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: "nodes"
    file_sd_configs:
      - files:
          - "/etc/prometheus/targets/*.yml"
        refresh_interval: 5m

# /etc/prometheus/targets/servers.yml
- targets:
    - "server1.example.com:9100"
    - "server2.example.com:9100"
  labels:
    env: "production"
    dc: "dc1"

PromQL: the query language
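PromQL expressions can be entered in the web interface, and they can also be sent to the Prometheus HTTP API at `/api/v1/query`. The following sketch builds such a request URL and extracts values from an instant-vector response. The helper names are illustrative, and the example response is hand-written to match the documented JSON shape rather than taken from a real server.

```python
import json
import urllib.parse


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def vector_to_dict(response_body, label="instance"):
    """Map the chosen label to the sample value of an instant-vector result."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    return {
        series["metric"].get(label, ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


# Hand-written example of the response shape, not real server output.
EXAMPLE_RESPONSE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"__name__": "up", "instance": "localhost:9100", "job": "node"},
                      "value": [1700000000.0, "1"]}]}}
"""
```

Sample values arrive as strings in the JSON response, which is why the sketch converts them with `float`.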

Basic queries
# Scrape status of every target in a job (1 = up, 0 = down)
up{job="node"}
# CPU time spent idle, in seconds (a counter)
node_cpu_seconds_total{mode="idle"}
# Free memory in bytes
node_memory_MemFree_bytes
# Available filesystem space in bytes
node_filesystem_avail_bytes

Functions
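The `rate` and `increase` functions work on counters, which only ever go up except when they reset to zero, for example after a restart. The sketch below shows the core idea on a list of `(timestamp, value)` samples. It is a simplification: Prometheus additionally extrapolates to the edges of the time window, so real results can differ slightly. The function names are illustrative.

```python
def simple_increase(samples):
    """Total counter increase over (timestamp, value) samples, oldest first.

    A drop in value is treated as a counter reset: the counter is assumed
    to have restarted from zero, so the new value counts in full.
    """
    total = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        total += (current - previous) if current >= previous else current
    return total


def simple_rate(samples):
    """Average per-second increase between the first and the last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("need at least two samples with increasing timestamps")
    return simple_increase(samples) / elapsed
```

For a counter that goes from 100 to 160 within 60 seconds, `simple_rate` yields one unit per second, which matches the intuition behind `rate(...[1m])`.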
# rate: per-second rate of change, averaged over the window
rate(node_cpu_seconds_total{mode="idle"}[5m])
# increase: total increase over the window
increase(node_network_receive_bytes_total[1h])
# Average across all series
avg(node_load1)
# Maximum across all series
max(node_load1)
# Sum across all instances
sum(rate(node_cpu_seconds_total{mode="idle"}[5m]))

Calculating CPU utilization
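CPU utilization is derived indirectly: the idle counter says how much of each second a core spent idle, and everything else counts as used. The arithmetic can be sketched as follows, assuming the per-core idle rates have already been computed; the function name and the example values are hypothetical.

```python
def cpu_utilization_percent(idle_rates):
    """Utilization in percent from per-core idle rates.

    `idle_rates` holds, for each CPU core, the idle seconds accumulated per
    second of wall-clock time (a fraction between 0 and 1), which is what
    rate(node_cpu_seconds_total{mode="idle"}[5m]) yields per core.
    """
    if not idle_rates:
        raise ValueError("no idle rates given")
    average_idle = sum(idle_rates) / len(idle_rates)  # corresponds to avg by(instance)
    return 100 - average_idle * 100


# Hypothetical host with two cores that were idle 80% and 90% of the window.
EXAMPLE_IDLE_RATES = [0.8, 0.9]
```

With the example values the average idle fraction is 0.85, so the result is about 15 percent utilization.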
# CPU utilization in percent
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Calculating RAM utilization
# RAM utilization in percent
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))

Disk utilization
# Utilization of the root filesystem in percent
100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100)

Network traffic
# Incoming bytes per second (the interface name eth0 is an example and varies by system)
rate(node_network_receive_bytes_total{device="eth0"}[5m])
# Outgoing bytes per second
rate(node_network_transmit_bytes_total{device="eth0"}[5m])

Alert rules
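Each alert rule pairs a PromQL expression with an optional `for` duration: the alert fires only after the expression has matched continuously for that long, and is pending until then. The sketch below models that state logic under the assumption of a fixed evaluation interval. The function and parameter names are illustrative and not Prometheus APIs.

```python
def alert_states(condition_by_eval, eval_interval_s, for_s):
    """Return the alert state after each rule evaluation.

    `condition_by_eval` lists whether the rule expression matched at each
    evaluation. The alert is "pending" while the condition has held for
    less than `for_s` seconds and "firing" once it has held that long.
    """
    states = []
    held_for = None  # seconds the condition has held continuously; None if it is false
    for matched in condition_by_eval:
        if not matched:
            held_for = None
            states.append("inactive")
            continue
        held_for = 0 if held_for is None else held_for + eval_interval_s
        states.append("firing" if held_for >= for_s else "pending")
    return states
```

A single evaluation where the expression does not match resets the timer, which is why a short `for` duration helps against flapping but delays the notification by the same amount.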

Creating a rule file
# /etc/prometheus/rules/alerts.yml
groups:
  - name: node_alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} has been unreachable for more than 5 minutes."
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU utilization on {{ $labels.instance }} is above 80% (current: {{ $value }}%)"
      - alert: HighMemoryUsage
        expr: 100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High Memory on {{ $labels.instance }}"
          description: "RAM utilization on {{ $labels.instance }} is above 85%"
      - alert: DiskSpaceLow
        expr: 100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk on {{ $labels.instance }} is more than 85% full"
      - alert: DiskSpaceCritical
        expr: 100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100) > 95
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Critical disk space on {{ $labels.instance }}"
          description: "Disk on {{ $labels.instance }} is more than 95% full!"

Adding the rules to Prometheus
# /etc/prometheus/prometheus.yml
# Relative paths are resolved against the directory of prometheus.yml
rule_files:
  - "rules/*.yml"

# Check the rules
promtool check rules /etc/prometheus/rules/alerts.yml
# Reload Prometheus
curl -X POST http://localhost:9090/-/reload

Setting up Alertmanager

Installation
# Create a system user
useradd --no-create-home --shell /bin/false alertmanager
# Download and unpack
cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xvfz alertmanager-0.26.0.linux-amd64.tar.gz
# Install the binaries and create the directories
cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
cp alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/
chown alertmanager:alertmanager /usr/local/bin/alertmanager
chown alertmanager:alertmanager /usr/local/bin/amtool
mkdir /etc/alertmanager
mkdir /var/lib/alertmanager
chown alertmanager:alertmanager /etc/alertmanager
chown alertmanager:alertmanager /var/lib/alertmanager

Configuration
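Alertmanager decides where a notification goes by walking a routing tree: an alert goes to the first child route whose label conditions match, and to the top-level receiver otherwise. The sketch below models a one-level tree like the one in this section. It is a simplification that leaves out nested routes, regular-expression matchers and the `continue` option, and the function name is illustrative.

```python
def pick_receiver(alert_labels, default_receiver, child_routes):
    """Return the receiver for an alert in a one-level routing tree.

    `child_routes` is an ordered list of (match, receiver) pairs, where
    `match` maps label names to required values. The first child route
    whose labels all match wins; otherwise the default receiver is used.
    """
    for match, receiver in child_routes:
        if all(alert_labels.get(name) == value for name, value in match.items()):
            return receiver
    return default_receiver


# Receiver names follow the example configuration in this section.
CHILD_ROUTES = [({"severity": "critical"}, "email-critical")]
```

An alert labeled `severity: critical` is routed to `email-critical`, while a `warning` alert falls through to the default receiver `email-notifications`.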
# /etc/alertmanager/alertmanager.yml
global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager@example.com'
  smtp_auth_password: 'password'
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email-notifications'
  routes:
    - match:
        severity: critical
      receiver: 'email-critical'
      repeat_interval: 1h
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'admin@example.com'
  - name: 'email-critical'
    email_configs:
      - to: 'admin@example.com'
        send_resolved: true

With Slack
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
        channel: '#alerts'
        send_resolved: true
        title: '{{ .Status | toUpper }}: {{ .CommonLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

Systemd service
# /etc/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager/
Restart=always

[Install]
WantedBy=multi-user.target

# Reload systemd, then enable and start the service
systemctl daemon-reload
systemctl enable --now alertmanager

Connecting Prometheus to Alertmanager
# /etc/prometheus/prometheus.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

Additional exporters

Commonly used exporters
| Exporter | Port | Purpose |
|----------|------|---------|
| node_exporter | 9100 | System metrics |
| nginx_exporter | 9113 | Nginx statistics |
| mysqld_exporter | 9104 | MySQL metrics |
| postgres_exporter | 9187 | PostgreSQL metrics |
| redis_exporter | 9121 | Redis metrics |
| blackbox_exporter | 9115 | HTTP/TCP/ICMP checks |

Blackbox Exporter (HTTP checks)
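Unlike other exporters, the Blackbox Exporter is not scraped for its own metrics. Prometheus asks it to probe a URL, which takes three relabeling steps in the scrape job. The sketch below replays those steps on a plain dictionary to show what each one does. It is a simulation for illustration and not how Prometheus implements relabeling; the function names are hypothetical.

```python
import urllib.parse


def relabel_for_blackbox(target_url, exporter_address="localhost:9115"):
    """Apply the three relabeling steps to one configured target."""
    labels = {"__address__": target_url}
    # 1. The URL to probe becomes the ?target= query parameter.
    labels["__param_target"] = labels["__address__"]
    # 2. The same URL is kept as the instance label for queries and alerts.
    labels["instance"] = labels["__param_target"]
    # 3. The scrape itself goes to the Blackbox Exporter, not to the URL.
    labels["__address__"] = exporter_address
    return labels


def probe_url(labels, module="http_2xx"):
    """Build a request URL equivalent to the scrape Prometheus performs."""
    query = urllib.parse.urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}/probe?{query}"
```

For the target `https://example.com`, the request goes to the exporter's `/probe` path on `localhost:9115`, while the stored series carry `instance="https://example.com"`.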
# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
          - https://api.example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

Grafana integration

Prometheus as a data source
1. In Grafana: Configuration → Data Sources → Add
2. Select Prometheus
3. URL: http://localhost:9090
4. Save & Test

Useful dashboards
These dashboards can be imported in Grafana by their ID:
- 1860: Node Exporter Full
- 3662: Prometheus 2.0 Overview
- 11074: Node Exporter for Prometheus Dashboard

Retention and storage

Storage configuration
# In the service file: keep data for at most 30 days or 10 GB, whichever limit is reached first
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --storage.tsdb.retention.time=30d \
    --storage.tsdb.retention.size=10GB

Disk usage
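Daily disk growth follows directly from the number of time series, the scrape interval and the bytes needed per sample, so it is easy to script for your own numbers. The sketch below assumes the figure of roughly 1 to 2 bytes per sample used in this section; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(series_count, scrape_interval_s):
    """Number of samples ingested per day."""
    return series_count * SECONDS_PER_DAY // scrape_interval_s


def megabytes_per_day(series_count, scrape_interval_s, bytes_per_sample):
    """Approximate on-disk growth per day in MB (1 MB = 10**6 bytes)."""
    return samples_per_day(series_count, scrape_interval_s) * bytes_per_sample / 1_000_000
```

For 1,000 series scraped every 15 seconds this gives 5,760,000 samples per day, or between 5.76 and 11.52 MB per day at 1 to 2 bytes per sample.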
Rule of thumb: roughly 1-2 bytes per sample
- 1,000 time series × one sample every 15 s = 5.76 million samples per day
- That comes to about 6-12 MB per day for 1,000 time series

Conclusion
Prometheus is a powerful monitoring system with a flexible query language. The pull-based architecture keeps the setup simple, and the Grafana integration provides clear dashboards. Start with Node Exporter for system metrics and add specialized exporters for your applications. Alertmanager makes sure you are notified promptly when problems occur.