Prometheus and Grafana are the de facto standard for modern, cloud-native monitoring. Prometheus collects and stores metrics, and Grafana visualizes them in dashboards.

Architecture

Components

Prometheus       - Time-series database and scraping
Grafana          - Visualization and dashboards
Exporters        - Exposing metrics
Alertmanager     - Alert routing and notification
Pushgateway      - For short-lived jobs

Data flow

Targets (Exporter) ← Scrape ← Prometheus → Alertmanager → Notifications
                                  ↓
                              Grafana
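
To make the scrape step in the diagram concrete: a target exposes its metrics as plain text over HTTP (by convention under /metrics), and Prometheus fetches that text at every scrape interval. The sketch below parses a small hand-written sample in that text format. The sample values and the helper name metric_value are invented for illustration and are not part of any Prometheus tooling.

```shell
#!/usr/bin/env bash
# Hand-written sample of the Prometheus text exposition format.
# The values are invented for this example.
sample_metrics() {
  cat <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8.589934592e+09
EOF
}

# metric_value NAME: print the value of an unlabelled metric read from stdin.
# Hypothetical helper; comment lines and labelled series are ignored.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

sample_metrics | metric_value node_load1   # prints 0.42
```

On a host running an exporter, the same helper could be fed from a real endpoint, for example `curl -s http://localhost:9100/metrics | metric_value node_load1`.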

Installing Prometheus

Binary installation

# Create user
useradd --no-create-home --shell /bin/false prometheus

# Directories
mkdir -p /etc/prometheus /var/lib/prometheus
chown prometheus:prometheus /etc/prometheus /var/lib/prometheus

# Download
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.49.1/prometheus-2.49.1.linux-amd64.tar.gz
tar xzf prometheus-2.49.1.linux-amd64.tar.gz
cd prometheus-2.49.1.linux-amd64

# Install
cp prometheus promtool /usr/local/bin/
cp -r consoles console_libraries /etc/prometheus/
chown -R prometheus:prometheus /etc/prometheus

Docker installation

docker run -d \
  --name prometheus \
  -p 9090:9090 \
  -v /etc/prometheus:/etc/prometheus \
  -v prometheus-data:/prometheus \
  prom/prometheus

Configuration

# /etc/prometheus/prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

rule_files:
  - "alerts/*.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
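
Before starting or restarting Prometheus, it can help to validate the configuration with promtool, which was copied to /usr/local/bin in the installation step. The wrapper below is only a sketch: the function name check_prometheus_config is made up, while `promtool check config` is the real subcommand.

```shell
#!/usr/bin/env bash
# check_prometheus_config [FILE]: validate a Prometheus configuration file.
# Returns 127 if promtool is not available, otherwise promtool's exit status.
check_prometheus_config() {
  local config="${1:-/etc/prometheus/prometheus.yml}"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found on PATH" >&2
    return 127
  fi
  promtool check config "$config"
}

# Example (run on the Prometheus host):
#   check_prometheus_config /etc/prometheus/prometheus.yml && systemctl restart prometheus
```

Chaining the restart behind the check means a broken file never replaces a running, working configuration.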

Systemd Service

# /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --storage.tsdb.retention.time=15d

[Install]
WantedBy=multi-user.target

# Enable and start the service
systemctl daemon-reload
systemctl enable --now prometheus

Node Exporter

Installation

# Download
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xzf node_exporter-1.7.0.linux-amd64.tar.gz
cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/

# User
useradd --no-create-home --shell /bin/false node_exporter

Systemd Service

# /etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

# Enable and start the service
systemctl daemon-reload
systemctl enable --now node_exporter

Extending the Prometheus configuration

# /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
        - 'server1:9100'
        - 'server2:9100'
        - 'server3:9100'

Other exporters

Important exporters

| Exporter | Port | Use |
|----------|------|-----|
| node_exporter | 9100 | System metrics |
| mysqld_exporter | 9104 | MySQL |
| postgres_exporter | 9187 | PostgreSQL |
| nginx_exporter | 9113 | Nginx |
| blackbox_exporter | 9115 | Probes (HTTP, TCP) |
| redis_exporter | 9121 | Redis |

MySQL Exporter

# Install
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.1/mysqld_exporter-0.15.1.linux-amd64.tar.gz
tar xzf mysqld_exporter-0.15.1.linux-amd64.tar.gz
cp mysqld_exporter-0.15.1.linux-amd64/mysqld_exporter /usr/local/bin/

# Create MySQL user
mysql -u root -p << EOF
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;
EOF

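# Credentials (readable only by the prometheus user that runs the exporter)
cat > /etc/prometheus/.my.cnf << EOF
[client]
user=exporter
password=password
EOF
chown prometheus:prometheus /etc/prometheus/.my.cnf
chmod 600 /etc/prometheus/.my.cnf

# Service
# mysqld_exporter 0.15.0 and later no longer read DATA_SOURCE_NAME,
# so the credentials file is passed with --config.my-cnf instead.
cat > /etc/systemd/system/mysqld_exporter.service << EOF
[Unit]
Description=MySQL Exporter
After=network.target

[Service]
User=prometheus
ExecStart=/usr/local/bin/mysqld_exporter --config.my-cnf=/etc/prometheus/.my.cnf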

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now mysqld_exporter

Blackbox Exporter

# /etc/prometheus/blackbox.yml

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: []  # empty list defaults to 2xx
      method: GET
      follow_redirects: true

  tcp_connect:
    prober: tcp
    timeout: 5s

# prometheus.yml - blackbox scrape job
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://example.com
        - https://example.org
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

Installing Grafana

Debian/Ubuntu

# Repository
apt install -y apt-transport-https software-properties-common
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor -o /usr/share/keyrings/grafana.gpg
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://apt.grafana.com stable main" | tee /etc/apt/sources.list.d/grafana.list

# Install
apt update
apt install grafana

# Start
systemctl enable --now grafana-server

Docker

docker run -d \
  --name grafana \
  -p 3000:3000 \
  -v grafana-data:/var/lib/grafana \
  grafana/grafana

First login

URL:      http://server:3000
User:     admin
Password: admin (change it on first login)

Configuring Grafana

Prometheus Data Source

1. Configuration → Data Sources → Add data source
2. Select Prometheus
3. URL: http://localhost:9090
4. Save & Test
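
As an alternative to clicking through the UI, Grafana can load data sources from provisioning files at startup. The sketch below writes such a file for the Prometheus instance used in this guide. The function name write_prometheus_datasource is made up, and the default directory assumes the Debian/Ubuntu package layout; adjust it for other installations.

```shell
#!/usr/bin/env bash
# write_prometheus_datasource [DIR]: write a Grafana provisioning file that
# registers Prometheus at http://localhost:9090 as the default data source.
write_prometheus_datasource() {
  local dir="${1:-/etc/grafana/provisioning/datasources}"
  mkdir -p "$dir" || return 1
  cat > "$dir/prometheus.yml" <<'EOF'
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}

# Example (run as root on the Grafana host):
#   write_prometheus_datasource && systemctl restart grafana-server
```

Keeping the data source in a file makes the setup reproducible across hosts, at the cost of having to restart Grafana for changes to take effect.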

Importing a dashboard

1. Dashboards → Import
2. Enter the ID or upload the JSON
3. Select the data source
4. Import

Recommended dashboards:
- 1860: Node Exporter Full
- 7362: MySQL Overview
- 9628: PostgreSQL Database
- 12708: Nginx

Custom dashboard

1. Dashboards → New Dashboard
2. Add visualization
3. Configure the query
4. Adjust the panel options
5. Save dashboard

PromQL (Prometheus Query Language)

Basic queries

# Instant vector
node_cpu_seconds_total

# With label filter
node_cpu_seconds_total{mode="idle"}

# Regex filter
node_cpu_seconds_total{mode=~"idle|iowait"}

# Negation
node_cpu_seconds_total{mode!="idle"}
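
These expressions can be entered in the Prometheus web UI, and they can also be sent to the HTTP API endpoint /api/v1/query, which is convenient for scripts. The wrapper below is a sketch: promql_query and the PROMETHEUS_URL variable are names chosen for this example, and the default URL assumes Prometheus listens on localhost:9090 as configured earlier.

```shell
#!/usr/bin/env bash
# promql_query EXPR: run an instant query against the Prometheus HTTP API
# and print the raw JSON response.
promql_query() {
  curl -fsS -G \
    --data-urlencode "query=$1" \
    "${PROMETHEUS_URL:-http://localhost:9090}/api/v1/query"
}

# Examples (run where Prometheus is reachable):
#   promql_query 'node_cpu_seconds_total{mode="idle"}'
#   promql_query 'up == 0'
```

`--data-urlencode` together with `-G` lets curl handle the escaping of braces, quotes, and spaces in the expression.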

Functions

# Rate (per second)
rate(node_cpu_seconds_total[5m])

# Average
avg(rate(node_cpu_seconds_total[5m]))

# Group by label
avg by (instance) (rate(node_cpu_seconds_total[5m]))

# Sum
sum(rate(http_requests_total[5m]))

# Top 5
topk(5, sum by (instance) (rate(http_requests_total[5m])))

Useful queries

# CPU usage in percent
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage in bytes
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes

# Memory usage in percent
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Disk usage of / as a fraction (0 to 1)
1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})

# Network traffic (received bytes per second)
rate(node_network_receive_bytes_total[5m])
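
The arithmetic behind the CPU query can be checked by hand. The sketch below uses a simplified rate, namely the counter increase divided by the elapsed seconds between two samples; the real rate() also handles counter resets and extrapolates to the window boundaries. The sample numbers and both function names are made up, and a single CPU core is assumed so that the avg step drops out.

```shell
#!/usr/bin/env bash
# simple_rate V1 T1 V2 T2: per-second increase of a counter between two samples.
simple_rate() {
  awk -v v1="$1" -v t1="$2" -v v2="$3" -v t2="$4" \
    'BEGIN { printf "%.2f\n", (v2 - v1) / (t2 - t1) }'
}

# cpu_usage_percent IDLE_RATE: 100 minus the idle share, as in the query above.
cpu_usage_percent() {
  awk -v idle="$1" 'BEGIN { printf "%.1f\n", 100 - idle * 100 }'
}

# The idle counter grew from 1000 s to 1270 s within a 300 s window,
# so the core was idle for 0.90 s per second.
idle_rate="$(simple_rate 1000 0 1270 300)"
cpu_usage_percent "$idle_rate"   # prints 10.0
```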

Alertmanager

Installation

wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xzf alertmanager-0.26.0.linux-amd64.tar.gz
cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
mkdir -p /etc/alertmanager /var/lib/alertmanager
chown prometheus:prometheus /var/lib/alertmanager

Configuration

# /etc/alertmanager/alertmanager.yml

global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager@example.com'
  smtp_auth_password: 'password'

route:
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email-admin'

  routes:
    - matchers:
        - severity="critical"
      receiver: 'pagerduty-critical'
    - matchers:
        - severity="warning"
      receiver: 'email-admin'

receivers:
  - name: 'email-admin'
    email_configs:
      - to: 'admin@example.com'
        send_resolved: true

  - name: 'pagerduty-critical'
    pagerduty_configs:
      - service_key: 'your-service-key'

  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        send_resolved: true
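
One way to check the routing without waiting for a real incident is to post a hand-made alert to the Alertmanager API (POST /api/v2/alerts). The sketch below builds a minimal payload and sends it. The function names and the ALERTMANAGER_URL variable are made up for this example, and the default URL assumes Alertmanager listens on localhost:9093.

```shell
#!/usr/bin/env bash
# test_alert_payload ALERTNAME SEVERITY INSTANCE: print a JSON body for
# POST /api/v2/alerts containing a single alert.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},"annotations":{"summary":"Manual test alert"}}]\n' \
    "$1" "$2" "$3"
}

# send_test_alert ALERTNAME SEVERITY INSTANCE: post the payload to Alertmanager.
send_test_alert() {
  test_alert_payload "$@" |
    curl -fsS -X POST -H 'Content-Type: application/json' --data-binary @- \
      "${ALERTMANAGER_URL:-http://localhost:9093}/api/v2/alerts"
}

test_alert_payload TestAlert warning server1:9100
# Example (run where Alertmanager is reachable):
#   send_test_alert TestAlert warning server1:9100
```

With the configuration above, a test alert labelled severity=warning should reach the email-admin receiver, and one labelled severity=critical should go to pagerduty-critical.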

Systemd Service

# /etc/systemd/system/alertmanager.service

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager

[Install]
WantedBy=multi-user.target

Alert Rules

Defining alert rules

# /etc/prometheus/alerts/node.yml

groups:
  - name: node
    rules:
      - alert: HighCpuUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% (current: {{ $value }}%)"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 20
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"

      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
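
The `for:` field in these rules means that the expression must stay true for the whole duration before the alert fires; until then the alert is pending, and it resets as soon as the expression is false again. The following sketch simulates that behaviour on a few made-up evaluations. The function name alert_states and the sample series are invented for illustration and only approximate what Prometheus does internally.

```shell
#!/usr/bin/env bash
# alert_states THRESHOLD FOR_SECONDS: read "timestamp value" lines from stdin
# and print the alert state after each evaluation.
alert_states() {
  awk -v threshold="$1" -v hold="$2" '
    $2 > threshold {
      if (since == "") since = $1          # condition just became true
      state = ($1 - since >= hold) ? "firing" : "pending"
      print $1, state
      next
    }
    { since = ""; print $1, "inactive" }   # condition false: reset the timer
  '
}

# CPU usage samples (seconds, percent) checked against "> 80" with "for: 5m".
printf '%s\n' "0 70" "60 85" "240 90" "360 95" "420 60" | alert_states 80 300
# prints:
#   0 inactive
#   60 pending
#   240 pending
#   360 firing
#   420 inactive
```

The short spike logic is the point of `for:`: a value that crosses the threshold only briefly never leaves the pending state and therefore sends no notification.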

Enabling the rules in Prometheus

# /etc/prometheus/prometheus.yml

rule_files:
  - "alerts/*.yml"

Service Discovery

File-based

# /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: 'file_sd'
    file_sd_configs:
      - files:
        - '/etc/prometheus/targets/*.json'
        refresh_interval: 5m

/etc/prometheus/targets/webservers.json:
[
  {
    "targets": ["web1:9100", "web2:9100"],
    "labels": {
      "env": "production",
      "team": "web"
    }
  }
]
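
Target files like this are often generated by a script or an inventory tool. The sketch below writes a file in the same shape as the JSON above. The function name write_file_sd is made up, and writing to a temporary file followed by a rename is a cautious choice made here so that Prometheus does not pick up a half-written file.

```shell
#!/usr/bin/env bash
# write_file_sd OUTPUT_FILE ENV TEAM TARGET...: write a file_sd target file
# with one target group and the labels env and team.
write_file_sd() {
  local out="$1"
  local environment="$2"
  local team="$3"
  shift 3
  local targets=""
  local t
  for t in "$@"; do
    targets="${targets:+$targets, }\"$t\""   # build a quoted, comma-separated list
  done
  local tmp
  tmp="$(mktemp "${out}.XXXXXX")" || return 1
  cat > "$tmp" <<EOF
[
  {
    "targets": [$targets],
    "labels": {
      "env": "$environment",
      "team": "$team"
    }
  }
]
EOF
  chmod 644 "$tmp"      # mktemp creates the file with mode 600
  mv "$tmp" "$out"      # rename into place in one step
}

# Example (run as root on the Prometheus host):
#   write_file_sd /etc/prometheus/targets/webservers.json production web web1:9100 web2:9100
```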

Kubernetes Service Discovery

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

EC2 Discovery

scrape_configs:
  - job_name: 'ec2'
    ec2_sd_configs:
      - region: eu-central-1
        access_key: ACCESS_KEY
        secret_key: SECRET_KEY
        port: 9100

Grafana Alerting

Alert Rule in Grafana

1. Edit the panel
2. Open the Alert tab
3. Create alert rule
4. Define the conditions
5. Configure notifications

Contact Points

Alerting → Contact points → Add contact point

Name:         Email Admin
Type:         Email
Addresses:    admin@example.com

Notification Policy

Alerting → Notification policies

Root policy:
  Default contact point: Email Admin
  Group by: grafana_folder, alertname

Summary

| Component | Port | Function |
|-----------|------|----------|
| Prometheus | 9090 | Metrics server |
| Alertmanager | 9093 | Alert routing |
| Grafana | 3000 | Visualization |
| Node Exporter | 9100 | System metrics |

| File | Description |
|------|-------------|
| /etc/prometheus/prometheus.yml | Prometheus configuration |
| /etc/alertmanager/alertmanager.yml | Alertmanager configuration |
| /etc/grafana/grafana.ini | Grafana configuration |

| PromQL | Function |
|--------|----------|
| rate() | Per-second rate |
| avg() | Average |
| sum() | Sum |
| topk() | Top N values |
| by (...) | Grouping clause for aggregations |

Conclusion

Prometheus and Grafana form a powerful monitoring stack for modern infrastructure. Prometheus's pull-based model suits dynamic environments, and the large ecosystem of exporters covers most common use cases. Grafana adds flexible dashboards and visualizations. The combination is a common first choice for Kubernetes and cloud-native applications.