Prometheus Basics - Setting Up a Monitoring System

Prometheus is an open-source monitoring system that collects metrics and stores them as time series. It is the de facto standard for container and Kubernetes monitoring, but it works just as well for classic servers.

Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Server 1  │     │   Server 2  │     │   Server 3  │
│  (Exporter) │     │  (Exporter) │     │  (Exporter) │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │ Pull (scrape)
                           ▼
                    ┌─────────────┐
                    │  Prometheus │
                    │   Server    │
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
       ┌──────────┐  ┌────────────┐ ┌──────────┐
       │ Grafana  │  │Alertmanager│ │   API    │
       └──────────┘  └────────────┘ └──────────┘

Installing Prometheus

User and Directories

# Create user
useradd --no-create-home --shell /bin/false prometheus

# Create directories
mkdir /etc/prometheus
mkdir /var/lib/prometheus
chown prometheus:prometheus /etc/prometheus
chown prometheus:prometheus /var/lib/prometheus

Download and Installation

# Download a release (v2.48.0 here; check the releases page for the current version)
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.48.0/prometheus-2.48.0.linux-amd64.tar.gz

# Extract
tar xvfz prometheus-2.48.0.linux-amd64.tar.gz
cd prometheus-2.48.0.linux-amd64

# Copy binaries
cp prometheus /usr/local/bin/
cp promtool /usr/local/bin/
chown prometheus:prometheus /usr/local/bin/prometheus
chown prometheus:prometheus /usr/local/bin/promtool

# Copy console templates and libraries
cp -r consoles /etc/prometheus
cp -r console_libraries /etc/prometheus
chown -R prometheus:prometheus /etc/prometheus/consoles
chown -R prometheus:prometheus /etc/prometheus/console_libraries

Basic Configuration

# /etc/prometheus/prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

Systemd Service

# /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.enable-lifecycle

Restart=always

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable --now prometheus
systemctl status prometheus

Web Interface

Open http://server-ip:9090 in your browser, replacing server-ip with the address of your server.

Installing Node Exporter

Node Exporter exposes system metrics (CPU, RAM, disk, network) over HTTP, by default on port 9100.

Installation

# User
useradd --no-create-home --shell /bin/false node_exporter

# Download
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz

# Install
cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
chown node_exporter:node_exporter /usr/local/bin/node_exporter

Systemd Service

# /etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

Restart=always

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable --now node_exporter

Adding It to Prometheus

# /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]

# Reload the configuration (requires the --web.enable-lifecycle flag from the service file)
curl -X POST http://localhost:9090/-/reload
# Or
systemctl restart prometheus
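
The reload call above can also be scripted, for example from a deployment tool. The following is a minimal Python sketch, assuming Prometheus listens on localhost:9090 and was started with --web.enable-lifecycle. The function names are illustrative and not part of any Prometheus client library.

```python
import urllib.request


def build_reload_request(base_url: str = "http://localhost:9090") -> urllib.request.Request:
    """Build the POST request for the /-/reload lifecycle endpoint."""
    url = f"{base_url.rstrip('/')}/-/reload"
    # An empty body is enough; the endpoint only looks at the HTTP method.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_prometheus(base_url: str = "http://localhost:9090", timeout: float = 10.0) -> int:
    """Send the reload request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx answers, for example
    when the lifecycle API is disabled or the new configuration is invalid.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as response:
        return response.status
```

Calling reload_prometheus() on the Prometheus host should have the same effect as the curl command. A full restart via systemctl remains the fallback when the lifecycle API is not enabled.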

Monitoring Multiple Servers

Prometheus Configuration

# /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: "nodes"
    static_configs:
      - targets:
          - "server1.example.com:9100"
          - "server2.example.com:9100"
          - "server3.example.com:9100"
        labels:
          env: "production"

      - targets:
          - "staging1.example.com:9100"
        labels:
          env: "staging"

With File-Based Service Discovery

# /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: "nodes"
    file_sd_configs:
      - files:
          - "/etc/prometheus/targets/*.yml"
        refresh_interval: 5m

# /etc/prometheus/targets/servers.yml

- targets:
    - "server1.example.com:9100"
    - "server2.example.com:9100"
  labels:
    env: "production"
    dc: "dc1"
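
Target files like this one are often generated from an inventory rather than written by hand. The sketch below renders one target group in the YAML layout shown above and writes it atomically, so Prometheus never reads a half-written file. The helper names are hypothetical, and the naive quoting assumes that targets and label values contain no double quotes.

```python
import os
import tempfile


def render_target_group(targets, labels):
    """Render one target group in the YAML layout used by file_sd_configs."""
    lines = ["- targets:"]
    lines += [f'    - "{target}"' for target in targets]
    lines.append("  labels:")
    lines += [f'    {name}: "{value}"' for name, value in labels.items()]
    return "\n".join(lines) + "\n"


def write_atomically(path, content):
    """Write to a temporary file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The .tmp suffix keeps the temporary file out of the "*.yml" glob.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(content)
    # mkstemp creates the file with mode 0600; make it readable for the prometheus user.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)
```

A call such as write_atomically("/etc/prometheus/targets/servers.yml", render_target_group([...], {...})) would then replace the file in one step. Prometheus picks up changed target files on its own, so no reload is needed after the write.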

PromQL - the Query Language

Basic Queries

# Scrape status of every target in a job (1 = up, 0 = down)
up{job="node"}

# CPU time spent idle (a counter, in seconds)
node_cpu_seconds_total{mode="idle"}

# Free memory
node_memory_MemFree_bytes

# Available filesystem space (bytes)
node_filesystem_avail_bytes
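
The same queries can be sent to the HTTP API instead of typing them into the web interface. The following sketch builds the URL for an instant query against /api/v1/query and extracts the values from a response. The helper names and the sample response are illustrative assumptions; the sample only mirrors the documented shape of an instant-vector result.

```python
import json
from urllib.parse import urlencode


def instant_query_url(expr, base_url="http://localhost:9090"):
    """Build the URL for an instant query; the expression is URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def parse_instant_vector(payload):
    """Map each series' instance label to its current value."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    # Each sample is [unix_timestamp, "value as string"].
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in body["data"]["result"]
    }


# Illustrative response for up{job="node"} with a single target (not real output).
SAMPLE_RESPONSE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"__name__": "up", "instance": "localhost:9100", "job": "node"},
                      "value": [1700000000.0, "1"]}]}}
"""
```

Fetching the URL with any HTTP client and passing the body to parse_instant_vector gives a plain dictionary that is easy to use in scripts.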

Functions

# Rate (per-second rate of change)
rate(node_cpu_seconds_total{mode="idle"}[5m])

# Increase (growth over the time range)
increase(node_network_receive_bytes_total[1h])

# Average
avg(node_load1)

# Maximum
max(node_load1)

# Sum across all instances
sum(rate(node_cpu_seconds_total{mode="idle"}[5m]))

Calculating CPU Utilization

# CPU utilization in percent
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
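
To see what this expression computes, the arithmetic can be reproduced by hand. The sketch below takes the idle counters of each core at the start and end of a window, turns them into per-second rates, averages them and subtracts the result from 100. It is a simplification: the real rate() also extrapolates to the window boundaries and handles counter resets. All numbers are made up for illustration.

```python
def cpu_utilization_percent(idle_start, idle_end, window_seconds):
    """Approximate 100 - avg(rate(idle counter)) * 100 for one instance.

    idle_start / idle_end hold one idle counter value (in seconds) per CPU core.
    """
    # Per-core idle rate: idle seconds gained per second of wall-clock time (0..1).
    idle_rates = [
        (end - start) / window_seconds
        for start, end in zip(idle_start, idle_end)
    ]
    average_idle = sum(idle_rates) / len(idle_rates)
    return 100 - average_idle * 100


# Two cores over a 5-minute window: core 0 was idle for 270 s, core 1 for 210 s.
usage = cpu_utilization_percent([1000.0, 1000.0], [1270.0, 1210.0], 300.0)
print(f"CPU utilization: {usage:.1f}%")
```

With these numbers the cores are idle 90% and 70% of the time, so the instance as a whole is about 20% busy.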

Calculating RAM Utilization

# RAM utilization in percent
100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))

Disk Utilization

# Disk utilization in percent
100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100)

Network Traffic

# Inbound traffic in bytes per second
rate(node_network_receive_bytes_total{device="eth0"}[5m])

# Outbound traffic in bytes per second
rate(node_network_transmit_bytes_total{device="eth0"}[5m])

Alert Rules

Creating a Rule File

# /etc/prometheus/rules/alerts.yml

groups:
  - name: node_alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} has been unreachable for more than 5 minutes."

      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU utilization on {{ $labels.instance }} is above 80% (current: {{ $value }}%)"

      - alert: HighMemoryUsage
        expr: 100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High Memory on {{ $labels.instance }}"
          description: "RAM utilization on {{ $labels.instance }} is above 85%"

      - alert: DiskSpaceLow
        expr: 100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk on {{ $labels.instance }} is more than 85% full"

      - alert: DiskSpaceCritical
        expr: 100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100) > 95
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Critical disk space on {{ $labels.instance }}"
          description: "Disk on {{ $labels.instance }} is more than 95% full!"
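
The for: clause in these rules is easy to misread: an alert does not fire the moment its expression matches, it first stays pending until the condition has held for the whole duration. The following sketch simulates that state machine for a single series. It is a simplified model that assumes one evaluation per interval and ignores details such as state restoration after a restart.

```python
def alert_states(condition_results, evaluation_interval_seconds, for_seconds):
    """Return the alert state after each rule evaluation.

    condition_results holds one boolean per evaluation: did the expression match?
    """
    states = []
    active_since = None  # time at which the condition started to hold
    for index, matched in enumerate(condition_results):
        now = index * evaluation_interval_seconds
        if not matched:
            # Any evaluation without a match resets the timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


# Evaluated every 15 s with for: 1m; the condition drops out once and returns.
print(alert_states([True, True, True, True, True, False, True], 15, 60))
```

The short dip at the sixth evaluation resets the timer, which is why flapping conditions with a long for: value may never reach Alertmanager.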

Adding the Rules to Prometheus

# /etc/prometheus/prometheus.yml

rule_files:
  - "rules/*.yml"

# Check the rules
promtool check rules /etc/prometheus/rules/alerts.yml

# Reload Prometheus
curl -X POST http://localhost:9090/-/reload

Setting Up Alertmanager

Installation

# User
useradd --no-create-home --shell /bin/false alertmanager

# Download
cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xvfz alertmanager-0.26.0.linux-amd64.tar.gz

# Install
cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
cp alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/
chown alertmanager:alertmanager /usr/local/bin/alertmanager
chown alertmanager:alertmanager /usr/local/bin/amtool

mkdir /etc/alertmanager
mkdir /var/lib/alertmanager
chown alertmanager:alertmanager /etc/alertmanager
chown alertmanager:alertmanager /var/lib/alertmanager

Configuration

# /etc/alertmanager/alertmanager.yml

global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
  smtp_auth_username: 'alertmanager@example.com'
  smtp_auth_password: 'password'

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email-notifications'

  routes:
    - matchers:
        - severity="critical"
      receiver: 'email-critical'
      repeat_interval: 1h

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'admin@example.com'

  - name: 'email-critical'
    email_configs:
      - to: 'admin@example.com'
        send_resolved: true

With Slack

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
        channel: '#alerts'
        send_resolved: true
        title: '{{ .Status | toUpper }}: {{ .CommonLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

Systemd Service

# /etc/systemd/system/alertmanager.service

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager/

Restart=always

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable --now alertmanager

Connecting Prometheus to Alertmanager

# /etc/prometheus/prometheus.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

More Exporters

Important Exporters

Exporter	Port	Purpose
node_exporter	9100	System metrics
nginx_exporter	9113	Nginx statistics
mysqld_exporter	9104	MySQL metrics
postgres_exporter	9187	PostgreSQL metrics
redis_exporter	9121	Redis metrics
blackbox_exporter	9115	HTTP/TCP/ICMP checks

Blackbox Exporter (HTTP Checks)

# /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
          - https://api.example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
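
The three relabeling rules are the part of this configuration that usually needs a second look. The sketch below replays them on a plain dictionary to show which labels each target ends up with and which URL Prometheus then scrapes. It only models these three rules, not the general relabeling engine, and the function names are made up for illustration.

```python
from urllib.parse import urlencode


def apply_blackbox_relabeling(target, exporter_address="localhost:9115"):
    """Replay the three relabel rules for one configured target."""
    labels = {"__address__": target}
    # Rule 1: copy the configured target into the ?target= URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the probed URL as the instance label shown in queries.
    labels["instance"] = labels["__param_target"]
    # Rule 3: scrape the blackbox exporter instead of the target itself.
    labels["__address__"] = exporter_address
    return labels


def probe_url(labels, module="http_2xx"):
    """Build the scrape URL resulting from metrics_path, params and the labels."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}/probe?{query}"


labels = apply_blackbox_relabeling("https://example.com")
print(labels["instance"])  # the probed site, not the exporter
print(probe_url(labels))
```

The effect is that every metric carries the probed site as instance, while the actual scrape goes to the blackbox exporter on port 9115.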

Grafana Integration

Prometheus as a Data Source

1. In Grafana: Configuration → Data Sources → Add
2. Select Prometheus
3. URL: http://localhost:9090
4. Save & Test

Useful Dashboards

1860: Node Exporter Full
3662: Prometheus 2.0 Overview
11074: Node Exporter for Prometheus Dashboard

Retention and Storage

Storage Configuration

# In the service file: add the two retention flags to the existing ExecStart and keep its other flags
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --storage.tsdb.retention.time=30d \
    --storage.tsdb.retention.size=10GB

Storage Consumption

Rule of thumb: ~1-2 bytes per sample
- 1000 time series × one sample every 15s = 5.76 million samples/day
- Approx. 6-12 MB/day for 1000 time series
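
The estimate can be recomputed for other setups with a few lines of Python. This is a rough sizing sketch based only on the rule of thumb above. The function names are illustrative, and real usage varies with how well the data compresses.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(series_count, scrape_interval_seconds):
    """Number of samples ingested per day for a given number of time series."""
    return series_count * SECONDS_PER_DAY // scrape_interval_seconds


def storage_mb(series_count, scrape_interval_seconds, bytes_per_sample, days=1):
    """Estimated disk usage in MB (10^6 bytes) over the given number of days."""
    total_samples = samples_per_day(series_count, scrape_interval_seconds) * days
    return total_samples * bytes_per_sample / 1_000_000


# 1000 series scraped every 15 s, at 1 and 2 bytes per sample.
low = storage_mb(1000, 15, 1)
high = storage_mb(1000, 15, 2)
print(f"{samples_per_day(1000, 15):,} samples/day, {low:.2f}-{high:.2f} MB/day")
print(f"30 days of retention: {low * 30:.0f}-{high * 30:.0f} MB")
```

For the example above this gives 5.76 to 11.52 MB per day, which matches the 6-12 MB figure, and roughly 170 to 350 MB for 30 days of retention.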

Conclusion

Prometheus is a powerful monitoring system with a flexible query language. Its pull-based architecture keeps the setup simple, and the Grafana integration provides polished dashboards. Start with Node Exporter for system metrics and add specialized exporters for your applications. Alertmanager makes sure you are notified promptly when problems occur.
