Prometheus and Grafana are the de facto standard for modern, cloud-native monitoring. Prometheus collects and stores metrics; Grafana visualizes them in dashboards.
Architecture
Components
Prometheus - Time-series database and scraping
Grafana - Visualization and dashboards
Exporters - Expose metrics
Alertmanager - Alert routing and notification
Pushgateway - For short-lived jobs
Data flow
Targets (Exporter) ← Scrape ← Prometheus → Alertmanager → Notifications
                                  ↓
                               Grafana
Prometheus Installation
Binary installation
# Create the service user
useradd --no-create-home --shell /bin/false prometheus
# Create the directories
mkdir -p /etc/prometheus /var/lib/prometheus
chown prometheus:prometheus /etc/prometheus /var/lib/prometheus
# Download
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.49.1/prometheus-2.49.1.linux-amd64.tar.gz
tar xzf prometheus-2.49.1.linux-amd64.tar.gz
cd prometheus-2.49.1.linux-amd64
# Install
cp prometheus promtool /usr/local/bin/
cp -r consoles console_libraries /etc/prometheus/
chown -R prometheus:prometheus /etc/prometheus
Docker installation
docker run -d \
  --name prometheus \
  -p 9090:9090 \
  -v /etc/prometheus:/etc/prometheus \
  -v prometheus-data:/prometheus \
  prom/prometheus
Configuration
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
rule_files:
  - "alerts/*.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
Systemd Service
# /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --storage.tsdb.retention.time=15d
[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable --now prometheus
Node Exporter
Installation
# Download
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xzf node_exporter-1.7.0.linux-amd64.tar.gz
cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
# Create the service user
useradd --no-create-home --shell /bin/false node_exporter
Systemd Service
# /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable --now node_exporter
Extending the Prometheus configuration
# /etc/prometheus/prometheus.yml
# Extend the targets of the existing 'node' job; job names must be unique.
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
          - 'server1:9100'
          - 'server2:9100'
          - 'server3:9100'
Additional exporters
Common exporters
| Exporter | Port | Purpose |
|----------|------|---------|
| node_exporter | 9100 | System metrics |
| mysqld_exporter | 9104 | MySQL |
| postgres_exporter | 9187 | PostgreSQL |
| nginx_exporter | 9113 | Nginx |
| blackbox_exporter | 9115 | Probes (HTTP, TCP) |
| redis_exporter | 9121 | Redis |
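All of these exporters do the same basic job: they serve the current metric values as plain text over HTTP, and Prometheus fetches that text on every scrape. The following is a minimal sketch of the text format only, not of how any real exporter is implemented. The helper `render_metrics` and the `demo_*` metric names are made up for illustration; real exporters also emit `# HELP` and `# TYPE` lines.

```python
def render_metrics(samples):
    """Render samples in the Prometheus text exposition format.

    `samples` is a list of (name, labels, value) tuples. The helper and the
    metric names used below are illustrative assumptions.
    """
    lines = []
    for name, labels, value in samples:
        if labels:
            # Labels are written as key="value" pairs inside curly braces.
            pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


demo = render_metrics([
    ("demo_queue_length", {"queue": "mail"}, 3),
    ("demo_up", {}, 1),
])
print(demo, end="")
```

Fetching `http://localhost:9100/metrics` from a running node_exporter shows the same line structure with real metric names.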
MySQL Exporter
# Install
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.1/mysqld_exporter-0.15.1.linux-amd64.tar.gz
tar xzf mysqld_exporter-0.15.1.linux-amd64.tar.gz
cp mysqld_exporter-0.15.1.linux-amd64/mysqld_exporter /usr/local/bin/
# Create the MySQL user
mysql -u root -p << EOF
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;
EOF
# Credentials file, readable only by the service user
cat > /etc/prometheus/.my.cnf << EOF
[client]
user=exporter
password=password
EOF
chown prometheus:prometheus /etc/prometheus/.my.cnf
chmod 600 /etc/prometheus/.my.cnf
# Service (mysqld_exporter 0.15 and later no longer read DATA_SOURCE_NAME,
# so the credentials file is passed explicitly)
cat > /etc/systemd/system/mysqld_exporter.service << EOF
[Unit]
Description=MySQL Exporter
After=network.target
[Service]
User=prometheus
ExecStart=/usr/local/bin/mysqld_exporter --config.my-cnf=/etc/prometheus/.my.cnf
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now mysqld_exporter
Blackbox Exporter
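The blackbox exporter is scraped differently from the other exporters: Prometheus does not contact the probed URL itself but calls `/probe` on the exporter and passes the real target as a URL parameter. The `relabel_configs` in this section arrange exactly that. The sketch below replays those three relabeling steps on a plain dictionary to show the request that results; `blackbox_scrape_url` is a hypothetical helper and the URL encoding is only an approximation of what Prometheus sends.

```python
from urllib.parse import urlencode


def blackbox_scrape_url(target, module="http_2xx", exporter="localhost:9115"):
    """Replay the three relabel steps and return (scrape URL, instance label)."""
    labels = {"__address__": target}
    # Step 1: copy the configured target into the ?target= URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Step 2: keep the probed URL as the instance label for dashboards and alerts.
    labels["instance"] = labels["__param_target"]
    # Step 3: point the actual scrape at the blackbox exporter.
    labels["__address__"] = exporter
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}/probe?{query}", labels["instance"]


url, instance = blackbox_scrape_url("https://example.com")
print(url)
print(instance)
```

Opening the printed URL in a browser or with curl is a quick way to check a module definition before wiring it into Prometheus.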
# /etc/prometheus/blackbox.yml
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: []
      method: GET
      follow_redirects: true
  tcp_connect:
    prober: tcp
    timeout: 5s

# prometheus.yml - Blackbox Scrape
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
          - https://example.org
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
Grafana Installation
Debian/Ubuntu
# Repository
apt install -y apt-transport-https software-properties-common
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor -o /usr/share/keyrings/grafana.gpg
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://apt.grafana.com stable main" | tee /etc/apt/sources.list.d/grafana.list
# Install
apt update
apt install grafana
# Start
systemctl enable --now grafana-server
Docker
docker run -d \
  --name grafana \
  -p 3000:3000 \
  -v grafana-data:/var/lib/grafana \
  grafana/grafana
First login
URL: http://server:3000
User: admin
Password: admin (change it at the first login)
Configuring Grafana
Prometheus Data Source
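The data source can be added by hand in the UI or scripted through Grafana's HTTP API (`POST /api/datasources`), which is handy for repeatable setups. The sketch below only builds the request and does not send it. It assumes the default `admin`/`admin` credentials and the host names used in this article; `datasource_request` is a hypothetical helper name.

```python
import base64
import json
from urllib.request import Request


def datasource_request(grafana_url, prometheus_url, user, password):
    """Build (but do not send) the API request that adds a Prometheus data source."""
    body = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries Prometheus, not the browser
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


req = datasource_request("http://server:3000", "http://localhost:9090", "admin", "admin")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; in practice an API token is preferable to the admin password.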
1. Configuration → Data Sources → Add data source (in recent Grafana versions: Connections → Data sources)
2. Select Prometheus
3. URL: http://localhost:9090
4. Save & Test
Importing a dashboard
1. Dashboards → Import
2. Enter the dashboard ID or upload a JSON file
3. Select the data source
4. Import
Recommended dashboards:
- 1860: Node Exporter Full
- 7362: MySQL Overview
- 9628: PostgreSQL Database
- 12708: Nginx
Custom dashboard
1. Dashboards → New Dashboard
2. Add visualization
3. Configure the query
4. Adjust the panel options
5. Save dashboard
PromQL (Prometheus Query Language)
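PromQL expressions can be typed into the Prometheus web UI or a Grafana panel, and they can also be sent to the Prometheus HTTP API at `/api/v1/query`. The following sketch builds such a request URL and reads an instant-vector response. It assumes Prometheus on `localhost:9090` as configured earlier; the helper names are hypothetical, and the response is a hand-written sample in the documented shape, so no server is contacted.

```python
import json
from urllib.parse import urlencode


def instant_query_url(expr, base_url="http://localhost:9090"):
    """Build the URL for an instant query against /api/v1/query."""
    query = urlencode({"query": expr})
    return f"{base_url}/api/v1/query?{query}"


def parse_vector(body):
    """Turn an instant-vector response into {instance: value}."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each result carries its labels and a [timestamp, "value"] pair.
    return {
        item["metric"].get("instance", ""): float(item["value"][1])
        for item in payload["data"]["result"]
    }


# Hand-written sample response, not output captured from a real server.
sample = (
    '{"status":"success","data":{"resultType":"vector","result":['
    '{"metric":{"__name__":"up","instance":"localhost:9090","job":"prometheus"},'
    '"value":[1700000000,"1"]}]}}'
)
print(instant_query_url('up{job="prometheus"}'))
print(parse_vector(sample))
```

Sample values arrive as strings in the JSON response, which is why the sketch converts them with `float`.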
Basic queries
# Instant vector
node_cpu_seconds_total
# With a label filter
node_cpu_seconds_total{mode="idle"}
# Regex filter
node_cpu_seconds_total{mode=~"idle|iowait"}
# Negation
node_cpu_seconds_total{mode!="idle"}
Functions
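Most queries in this article are built on `rate()`, which turns an ever-growing counter into a per-second rate over a time window. As a rough mental model, the sketch below sums the increases between consecutive samples, treats a drop as a counter reset, and divides by the elapsed time. This is a deliberate simplification: the real `rate()` also extrapolates to the window boundaries. `simple_rate` and the sample values are made up for illustration.

```python
def simple_rate(samples):
    """Approximate rate() over (timestamp_seconds, counter_value) samples.

    Simplified on purpose: no extrapolation to the window boundaries.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts at 0, so the whole new value counts.
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed


# Counter scraped every 15 s; the process restarts before the last sample.
window = [(0, 100), (15, 130), (30, 10)]
print(simple_rate(window))
```

A plain difference between the last and first sample would go negative after the restart, which is why counters are queried through `rate()` rather than read directly.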
# Per-second rate
rate(node_cpu_seconds_total[5m])
# Average
avg(rate(node_cpu_seconds_total[5m]))
# Group by label
avg by (instance) (rate(node_cpu_seconds_total[5m]))
# Sum
sum(rate(http_requests_total[5m]))
# Top 5
topk(5, sum by (instance) (rate(http_requests_total[5m])))
Useful queries
# CPU usage in percent
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage in bytes
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
# Memory usage in percent
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
# Disk usage of the root filesystem as a fraction (0 to 1)
1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})
# Network traffic (bytes received per second)
rate(node_network_receive_bytes_total[5m])
Alertmanager
Installation
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xzf alertmanager-0.26.0.linux-amd64.tar.gz
cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
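# Configuration directory and the storage path used by the systemd service
mkdir -p /etc/alertmanager /var/lib/alertmanager
chown prometheus:prometheus /var/lib/alertmanager
Configuration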
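The `route` block of the Alertmanager configuration is a routing tree: an incoming alert is compared against the child routes from top to bottom, and if none matches it goes to the top-level receiver. The sketch below models only that first-match behaviour for the equality matchers used in this article. It ignores `continue`, nested routes, and regex matchers, and `pick_receiver` is a hypothetical helper, not part of Alertmanager.

```python
def pick_receiver(labels, routes, default_receiver):
    """Return the receiver of the first child route whose match labels all agree."""
    for route in routes:
        if all(labels.get(key) == value for key, value in route["match"].items()):
            return route["receiver"]
    # No child route matched: fall back to the receiver of the root route.
    return default_receiver


# Mirrors the two child routes of the configuration in this section.
routes = [
    {"match": {"severity": "critical"}, "receiver": "pagerduty-critical"},
    {"match": {"severity": "warning"}, "receiver": "email-admin"},
]

print(pick_receiver({"alertname": "InstanceDown", "severity": "critical"}, routes, "email-admin"))
print(pick_receiver({"alertname": "Custom", "severity": "info"}, routes, "email-admin"))
```

Because the first match wins, more specific routes belong above more general ones.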
# /etc/alertmanager/alertmanager.yml
global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager@example.com'
  smtp_auth_password: 'password'
route:
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email-admin'
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
    - match:
        severity: warning
      receiver: 'email-admin'
receivers:
  - name: 'email-admin'
    email_configs:
      - to: 'admin@example.com'
        send_resolved: true
  - name: 'pagerduty-critical'
    pagerduty_configs:
      - service_key: 'your-service-key'
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        send_resolved: true
Systemd Service
# /etc/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager
[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable --now alertmanager
Alert Rules
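An alert rule pairs a PromQL expression with a `for:` duration. While the expression returns a result but the duration has not yet elapsed, the alert is pending; only after the condition has held for the whole duration does it fire and reach Alertmanager. One evaluation without a result resets the timer. The sketch below is a simplified model of that state machine, counting in evaluation intervals; with `evaluation_interval: 15s`, a rule with `for: 5m` corresponds to 20 steps. `alert_states` is a hypothetical helper.

```python
def alert_states(condition_history, for_steps):
    """Return the alert state after each rule evaluation.

    `condition_history` holds one boolean per evaluation (did the expression
    return a result?). `for_steps` is the `for:` duration expressed in
    evaluation intervals.
    """
    states = []
    active = 0  # consecutive evaluations in which the condition held
    for condition in condition_history:
        if not condition:
            active = 0
            states.append("inactive")
            continue
        active += 1
        # The first true evaluation starts the timer, so `for_steps` further
        # evaluations must pass before the alert fires.
        states.append("firing" if active > for_steps else "pending")
    return states


print(alert_states([True, True, False, True, True, True], for_steps=2))
```

The short dip in the third evaluation is enough to restart the wait, which is how `for:` filters out brief spikes.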
Defining alert rules
# /etc/prometheus/alerts/node.yml
groups:
  - name: node
    rules:
      - alert: HighCpuUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% (current: {{ $value }}%)"
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 20
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
Enabling the rules in Prometheus
# /etc/prometheus/prometheus.yml
rule_files:
  - "alerts/*.yml"
Service Discovery
File-based
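With file-based discovery, Prometheus periodically re-reads JSON files that list target groups, so an inventory script or a deployment tool can maintain the target list without touching `prometheus.yml`. The sketch below writes one target group in the format used in this section. It goes through a temporary file and a rename so that Prometheus never sees a half-written file. `write_targets` is a hypothetical helper, and the demo writes to a temporary directory rather than `/etc/prometheus/targets`.

```python
import json
import os
import tempfile


def write_targets(path, targets, labels):
    """Write one file_sd target group to `path` via a temp file and a rename."""
    groups = [{"targets": sorted(targets), "labels": labels}]
    directory = os.path.dirname(os.path.abspath(path))
    # The .tmp suffix keeps the temp file out of a '*.json' glob.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(groups, handle, indent=2)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; the prometheus user must be able to read it
    os.replace(tmp, path)  # atomic rename on POSIX filesystems
    return groups


with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "webservers.json")
    write_targets(demo_path, ["web2:9100", "web1:9100"], {"env": "production", "team": "web"})
    with open(demo_path) as handle:
        print(handle.read())
```

Run from cron or a deployment hook, a script like this keeps the target list current between Prometheus restarts.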
# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: 'file_sd'
    file_sd_configs:
      - files:
          - '/etc/prometheus/targets/*.json'
        refresh_interval: 5m

/etc/prometheus/targets/webservers.json:
[
  {
    "targets": ["web1:9100", "web2:9100"],
    "labels": {
      "env": "production",
      "team": "web"
    }
  }
]
Kubernetes Service Discovery
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
EC2 Discovery
scrape_configs:
  - job_name: 'ec2'
    ec2_sd_configs:
      - region: eu-central-1
        access_key: ACCESS_KEY
        secret_key: SECRET_KEY
        port: 9100
Grafana Alerting
Alert Rule in Grafana
1. Edit the panel
2. Open the Alert tab
3. Create alert rule
4. Define the conditions
5. Configure notifications
Contact Points
Alerting → Contact points → Add contact point
Name: Email Admin
Type: Email
Addresses: admin@example.com
Notification Policy
Alerting → Notification policies
Root policy:
Default contact point: Email Admin
Group by: grafana_folder, alertname
Summary
| Component | Port | Function |
|-----------|------|----------|
| Prometheus | 9090 | Metrics server |
| Alertmanager | 9093 | Alert routing |
| Grafana | 3000 | Visualization |
| Node Exporter | 9100 | System metrics |

| File | Description |
|------|-------------|
| /etc/prometheus/prometheus.yml | Prometheus configuration |
| /etc/alertmanager/alertmanager.yml | Alertmanager configuration |
| /etc/grafana/grafana.ini | Grafana configuration |

| PromQL | Function |
|--------|----------|
| rate() | Per-second rate |
| avg() | Average |
| sum() | Sum |
| topk() | Top N values |
| by (...) | Grouping clause |
Conclusion
Prometheus and Grafana form a capable monitoring stack for modern infrastructure. Prometheus's pull-based model, combined with service discovery, suits dynamic environments in which targets come and go. Exporters exist for most common systems, from operating systems and databases to web servers. Grafana adds flexible dashboards and visualizations on top. Together they are a first choice for Kubernetes and cloud-native applications in particular.