Prometheus is an open-source monitoring system that collects and stores metrics. It is the de facto standard for container and Kubernetes monitoring, but works just as well for classic servers.

Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Server 1  │     │   Server 2  │     │   Server 3  │
│  (Exporter) │     │  (Exporter) │     │  (Exporter) │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │ Pull (scrape)
                           ▼
                    ┌─────────────┐
                    │  Prometheus │
                    │   Server    │
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
      ┌──────────┐ ┌──────────────┐ ┌──────────┐
      │ Grafana  │ │ Alertmanager │ │   API    │
      └──────────┘ └──────────────┘ └──────────┘
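
The pull model in the diagram can be illustrated with a few lines of standard-library Python. This is only a sketch under stated assumptions: `ToyExporter`, `parse_samples`, `scrape`, `demo`, and the metric `demo_temperature_celsius` are made-up names, and the parser handles only label-free samples, which is far less than a real Prometheus scrape does.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric, for illustration only.
METRICS_PAGE = (
    "# HELP demo_temperature_celsius Example gauge.\n"
    "# TYPE demo_temperature_celsius gauge\n"
    "demo_temperature_celsius 21.5\n"
)


class ToyExporter(BaseHTTPRequestHandler):
    """Plays the exporter role: answers GET /metrics with the text format."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS_PAGE.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def parse_samples(text):
    """Parse label-free 'name value' lines, skipping comments and blanks."""
    samples = {}
    for line in text.splitlines():
        if line and not line.startswith("#"):
            name, value = line.split()[:2]
            samples[name] = float(value)
    return samples


def scrape(url):
    """Plays the Prometheus role: pull the page over HTTP, then parse it."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return parse_samples(response.read().decode())


def demo():
    """Start the toy exporter on a free local port and scrape it once."""
    server = HTTPServer(("127.0.0.1", 0), ToyExporter)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return scrape(f"http://127.0.0.1:{server.server_port}/metrics")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` starts the toy exporter, scrapes it once, and returns the parsed samples as a dictionary. The point of the design is that the monitored side only has to expose an HTTP page; the server decides when to collect.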

Installing Prometheus

User and directories

# Create a dedicated user without a home directory or login shell (run as root)
useradd --no-create-home --shell /bin/false prometheus

# Create the configuration and data directories
mkdir /etc/prometheus
mkdir /var/lib/prometheus
chown prometheus:prometheus /etc/prometheus
chown prometheus:prometheus /var/lib/prometheus

Download and installation

# Download a release (v2.48.0 here; newer versions are listed on the GitHub releases page)
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.48.0/prometheus-2.48.0.linux-amd64.tar.gz

# Unpack
tar xvfz prometheus-2.48.0.linux-amd64.tar.gz
cd prometheus-2.48.0.linux-amd64

# Copy the binaries
cp prometheus /usr/local/bin/
cp promtool /usr/local/bin/
chown prometheus:prometheus /usr/local/bin/prometheus
chown prometheus:prometheus /usr/local/bin/promtool

# Copy the console templates and libraries
cp -r consoles /etc/prometheus
cp -r console_libraries /etc/prometheus
chown -R prometheus:prometheus /etc/prometheus/consoles
chown -R prometheus:prometheus /etc/prometheus/console_libraries

Basic configuration

# /etc/prometheus/prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

systemd service

# /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.enable-lifecycle

Restart=always

[Install]
WantedBy=multi-user.target

# Reload systemd, then enable and start Prometheus
systemctl daemon-reload
systemctl enable --now prometheus
systemctl status prometheus

Web interface

Open http://server-ip:9090 in a browser.

Installing Node Exporter

Node Exporter exposes system metrics (CPU, RAM, disk, network).

Installation

# Create the user
useradd --no-create-home --shell /bin/false node_exporter

# Download
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz

# Install
cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
chown node_exporter:node_exporter /usr/local/bin/node_exporter

systemd service

# /etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

Restart=always

[Install]
WantedBy=multi-user.target

# Reload systemd, then enable and start Node Exporter
systemctl daemon-reload
systemctl enable --now node_exporter

Adding it to Prometheus

# /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]

# Reload the configuration (requires the --web.enable-lifecycle flag set in the unit file above)
curl -X POST http://localhost:9090/-/reload
# Or restart the service
systemctl restart prometheus

Monitoring multiple servers

Prometheus configuration

# /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: "nodes"
    static_configs:
      - targets:
          - "server1.example.com:9100"
          - "server2.example.com:9100"
          - "server3.example.com:9100"
        labels:
          env: "production"

      - targets:
          - "staging1.example.com:9100"
        labels:
          env: "staging"

With file-based service discovery

# /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: "nodes"
    file_sd_configs:
      - files:
          - "/etc/prometheus/targets/*.yml"
        refresh_interval: 5m

# /etc/prometheus/targets/servers.yml

- targets:
    - "server1.example.com:9100"
    - "server2.example.com:9100"
  labels:
    env: "production"
    dc: "dc1"
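
Target files like this are often generated from an inventory rather than edited by hand. The following is a minimal sketch of that idea, not part of Prometheus itself: `render_target_group` and `write_atomically` are hypothetical helper names, and the write-then-rename step is a general precaution against Prometheus reading a half-written file, assumed here rather than required by the configuration above.

```python
import os
import tempfile


def render_target_group(targets, labels):
    """Render one target group in the YAML layout used by servers.yml."""
    lines = ["- targets:"]
    lines += [f'    - "{target}"' for target in targets]
    lines.append("  labels:")
    lines += [f'    {key}: "{value}"' for key, value in labels.items()]
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write to a temporary file in the same directory, then rename it into place."""
    directory = os.path.dirname(path) or "."
    # The .tmp suffix keeps the temporary file out of the "*.yml" glob.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp_path, path)
```

A call such as `write_atomically("/etc/prometheus/targets/servers.yml", render_target_group([...], {"env": "production"}))` would then replace the file in one step; Prometheus picks up the change without a reload.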

PromQL - Query Language

Basic queries

# Scrape status of every target in a job (1 = up, 0 = down)
up{job="node"}

# CPU time spent idle (counter, in seconds)
node_cpu_seconds_total{mode="idle"}

# Free memory
node_memory_MemFree_bytes

# Available disk space
node_filesystem_avail_bytes
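
Queries like these can also be sent to the Prometheus HTTP API instead of the web interface. The sketch below uses the instant-query endpoint `/api/v1/query`; the helper names `query_url`, `parse_instant_vector`, and `instant_query` are made up for this example, and the parser assumes the expression returns an instant vector.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_instant_vector(payload):
    """Turn an instant-vector response into {instance: value}."""
    if payload["status"] != "success":
        raise RuntimeError(f"query failed: {payload}")
    return {
        sample["metric"].get("instance", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


def instant_query(base_url, promql):
    """Run the query against a live server (needs a reachable Prometheus)."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=10) as response:
        return parse_instant_vector(json.load(response))
```

For example, `instant_query("http://localhost:9090", 'up{job="node"}')` would return one entry per scraped instance, with 1.0 for reachable targets and 0.0 for unreachable ones.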

Functions

# Rate (per-second average rate of increase)
rate(node_cpu_seconds_total{mode="idle"}[5m])

# Increase (growth over the time range)
increase(node_network_receive_bytes_total[1h])

# Average across all series
avg(node_load1)

# Maximum across all series
max(node_load1)

# Sum over all instances
sum(rate(node_cpu_seconds_total{mode="idle"}[5m]))

Calculating CPU usage

# CPU usage in percent
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
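
To see why this expression yields a percentage, it helps to redo the arithmetic by hand. The sketch below is a simplified model with hypothetical function names and made-up sample values: it takes the first and last idle-counter sample per core, and ignores the counter resets and window extrapolation that the real `rate()` handles.

```python
def per_second_rate(first, last):
    """Average per-second increase between two (timestamp, counter_value) samples."""
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


def cpu_usage_percent(idle_samples_per_core):
    """Mirror 100 - avg(rate(idle)) * 100 for a single instance."""
    rates = [per_second_rate(samples[0], samples[-1]) for samples in idle_samples_per_core]
    return 100 - (sum(rates) / len(rates)) * 100


# Made-up samples over a 300-second window: (timestamp in s, idle seconds so far)
core0 = [(0, 1000.0), (300, 1270.0)]  # idle for 270 of 300 s
core1 = [(0, 2000.0), (300, 2210.0)]  # idle for 210 of 300 s
usage = cpu_usage_percent([core0, core1])
```

Here the cores were idle 90% and 70% of the time, so the average idle fraction is 0.8 and `usage` comes out at roughly 20 percent.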

Calculating RAM usage

# RAM usage in percent
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))

Disk usage

# Disk usage in percent
100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100)

Network traffic

# Incoming traffic in bytes per second (adjust the device name to your interface)
rate(node_network_receive_bytes_total{device="eth0"}[5m])

# Outgoing traffic in bytes per second
rate(node_network_transmit_bytes_total{device="eth0"}[5m])

Alert rules

Creating a rule file

# /etc/prometheus/rules/alerts.yml

groups:
  - name: node_alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} has been unreachable for more than 5 minutes."

      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage on {{ $labels.instance }} is above 80% (current: {{ $value }}%)"

      - alert: HighMemoryUsage
        expr: 100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High Memory on {{ $labels.instance }}"
          description: "RAM usage on {{ $labels.instance }} is above 85%"

      - alert: DiskSpaceLow
        expr: 100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk on {{ $labels.instance }} is more than 85% full"

      - alert: DiskSpaceCritical
        expr: 100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100) > 95
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Critical disk space on {{ $labels.instance }}"
          description: "Disk on {{ $labels.instance }} is more than 95% full!"
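
The `for:` field in these rules means the expression must hold continuously before the alert fires; until then the alert is pending. The following is a simplified model of that behaviour, not Prometheus code: `alert_states` is a hypothetical function, and the 60-second evaluation interval and 2-minute hold time are chosen only to keep the example short.

```python
def alert_states(condition_per_eval, eval_interval_s, for_s):
    """Return the alert state after each rule evaluation.

    condition_per_eval: booleans, True when the rule expression returned a result.
    """
    states = []
    active_since = None  # evaluation time at which the condition first held
    for index, condition in enumerate(condition_per_eval):
        now = index * eval_interval_s
        if not condition:
            active_since = None  # any gap resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# One healthy evaluation in the middle restarts the waiting period.
states = alert_states([True, True, False, True, True, True], eval_interval_s=60, for_s=120)
```

In this run `states` is pending, pending, inactive, pending, pending, firing: the short recovery in the third evaluation resets the timer, which is how `for:` filters out brief spikes.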

Adding it to Prometheus

# /etc/prometheus/prometheus.yml

rule_files:
  - "rules/*.yml"

# Check the rules
promtool check rules /etc/prometheus/rules/alerts.yml

# Reload Prometheus
curl -X POST http://localhost:9090/-/reload

Setting up Alertmanager

Installation

# Create the user
useradd --no-create-home --shell /bin/false alertmanager

# Download
cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xvfz alertmanager-0.26.0.linux-amd64.tar.gz

# Install
cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
cp alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/
chown alertmanager:alertmanager /usr/local/bin/alertmanager
chown alertmanager:alertmanager /usr/local/bin/amtool

mkdir /etc/alertmanager
mkdir /var/lib/alertmanager
chown alertmanager:alertmanager /etc/alertmanager
chown alertmanager:alertmanager /var/lib/alertmanager

Configuration

# /etc/alertmanager/alertmanager.yml

global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager@example.com'
  smtp_auth_password: 'password'

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email-notifications'

  routes:
    - matchers:
        - severity="critical"
      receiver: 'email-critical'
      repeat_interval: 1h

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'admin@example.com'

  - name: 'email-critical'
    email_configs:
      - to: 'admin@example.com'
        send_resolved: true
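
How an alert finds its receiver in this routing tree can be sketched in a few lines. This is a deliberately reduced model, not Alertmanager's implementation: `ROUTE` and `pick_receiver` are hypothetical names, the `equals` key stands in for plain equality matching on labels, and nested routes, regex matching, and the `continue` option are left out.

```python
# Simplified mirror of the route section above.
ROUTE = {
    "receiver": "email-notifications",  # default receiver
    "routes": [
        {"equals": {"severity": "critical"}, "receiver": "email-critical"},
    ],
}


def pick_receiver(alert_labels, route):
    """Return the receiver of the first child route whose labels all match."""
    for child in route.get("routes", []):
        if all(alert_labels.get(key) == value for key, value in child["equals"].items()):
            return child["receiver"]
    return route["receiver"]  # no child matched: fall back to the default
```

With the rules defined earlier, `InstanceDown` (severity critical) would go to `email-critical`, while `HighCPUUsage` (severity warning) falls through to the default `email-notifications`.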

With Slack

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
        channel: '#alerts'
        send_resolved: true
        title: '{{ .Status | toUpper }}: {{ .CommonLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

systemd service

# /etc/systemd/system/alertmanager.service

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager/

Restart=always

[Install]
WantedBy=multi-user.target

# Reload systemd, then enable and start Alertmanager
systemctl daemon-reload
systemctl enable --now alertmanager

Connecting Prometheus to Alertmanager

# /etc/prometheus/prometheus.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

Additional exporters

Key exporters

| Exporter | Port | Function |
|----------|------|----------|
| node_exporter | 9100 | System metrics |
| nginx_exporter | 9113 | Nginx statistics |
| mysqld_exporter | 9104 | MySQL metrics |
| postgres_exporter | 9187 | PostgreSQL metrics |
| redis_exporter | 9121 | Redis metrics |
| blackbox_exporter | 9115 | HTTP/TCP/ICMP checks |

Blackbox Exporter (HTTP checks)

# /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
          - https://api.example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
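
The three relabeling steps are easier to follow when traced on a single target. The sketch below replays them on a plain dictionary and then builds the URL Prometheus would request; `relabel` and `probe_url` are hypothetical helpers written for this illustration, not Prometheus functions.

```python
from urllib.parse import urlencode


def relabel(labels, exporter_address="localhost:9115"):
    """Apply the three relabel_configs steps from the job above, in order."""
    out = dict(labels)
    out["__param_target"] = out["__address__"]  # 1: listed target becomes ?target=
    out["instance"] = out["__param_target"]     # 2: keep the probed URL as instance label
    out["__address__"] = exporter_address       # 3: scrape the exporter, not the site
    return out


def probe_url(labels, metrics_path="/probe", module="http_2xx"):
    """Build the scrape URL that results from the relabeled labels."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}{metrics_path}?{query}"


labels = relabel({"__address__": "https://example.com"})
url = probe_url(labels)
```

After relabeling, Prometheus contacts the Blackbox Exporter on `localhost:9115` and passes the website as the `target` parameter, while the `instance` label still shows `https://example.com` in queries and alerts.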

Grafana integration

Prometheus as a data source

1. In Grafana: Configuration → Data Sources → Add
2. Select Prometheus
3. URL: http://localhost:9090
4. Save & Test

Useful dashboards

  • 1860: Node Exporter Full
  • 3662: Prometheus 2.0 Overview
  • 11074: Node Exporter for Prometheus Dashboard

Retention and storage

Storage configuration

# In the service file (old data is removed as soon as either limit is reached)
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --storage.tsdb.retention.time=30d \
    --storage.tsdb.retention.size=10GB

Storage usage

Rule of thumb: ~1-2 bytes per sample
- 1000 time series × 15s interval = 5.76 million samples/day
- About 6-12 MB/day for 1000 time series
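
The rule of thumb can be turned into a small calculator for sizing the data directory. This is a rough sketch: the function names are made up, the 1-2 bytes per sample come from the estimate above, and real usage also depends on label churn and index overhead, which are not modelled here.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(series, scrape_interval_s):
    """Each series produces one sample per scrape interval."""
    return series * SECONDS_PER_DAY // scrape_interval_s


def storage_bytes(series, scrape_interval_s, retention_days, bytes_per_sample):
    """Estimated on-disk size for the whole retention period."""
    return samples_per_day(series, scrape_interval_s) * retention_days * bytes_per_sample


daily = samples_per_day(1000, 15)           # 5,760,000 samples per day
low = storage_bytes(1000, 15, 30, 1)        # 172,800,000 bytes, about 173 MB
high = storage_bytes(1000, 15, 30, 2)       # 345,600,000 bytes, about 346 MB
```

For 1000 series at a 15-second interval and 30 days of retention, the estimate lands between roughly 173 MB and 346 MB, well below the 10GB size limit used in the storage configuration.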

Conclusion

Prometheus is a powerful monitoring system with a flexible query language. The pull-based architecture keeps the setup simple, and the Grafana integration provides polished dashboards. Start with Node Exporter for system metrics and add specialized exporters for your applications. Alertmanager makes sure you are notified promptly when problems occur.