Alert when a Docker container pod is in Error or CrashLoopBackOff in Kubernetes

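I have a Kubernetes cluster set up on AWS, where I am trying to monitor several pods using cAdvisor, Prometheus, and Alertmanager. What I want to do is trigger an email alert (with the service/container name) if a container/pod goes down or gets stuck in Error, CrashLoopBackOff, or any state other than Running.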
Best answer
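Prometheus collects a wide range of metrics. For example, you can use the metric kube_pod_container_status_restarts_total (exposed by kube-state-metrics) to monitor restarts, which will reflect your problem.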

It carries labels that you can use in the alert:

- container = container name
- namespace = pod namespace
- pod = pod name

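So all you need to do is configure alertmanager.yaml with the correct SMTP settings and a receiver, and add an alerting rule to a Prometheus rule file, along these lines: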

global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: 'alertmanager@example.org'
  smtp_auth_username: 'alertmanager'
  smtp_auth_password: 'password'

receivers:
- name: 'team-X-mails'
  email_configs:
  - to: 'team-X+alerts@example.org'

# Only one default receiver
route:
  receiver: team-X-mails

# Prometheus rule file (loaded via rule_files in prometheus.yml, not part of alertmanager.yaml)
# Example group with one alert
groups:
- name: example-alert
  rules:
  # Alert about restarts; the labels are container, namespace and pod
  - alert: RestartAlerts
    expr: kube_pod_container_status_restarts_total > 5
    for: 10m
    annotations:
      summary: "More than 5 restarts in pod {{ $labels.pod }}"
      description: "{{ $labels.container }} restarted {{ $value }} times in pod {{ $labels.namespace }}/{{ $labels.pod }}"
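
The restart counter only catches crash loops indirectly. As a rough sketch, assuming kube-state-metrics is being scraped by Prometheus, rules along the following lines could alert on the pod states themselves. The alert names, the `for` durations, and the choice of phases are illustrative assumptions, not part of the original answer.

```yaml
# Hypothetical additional rule group for a Prometheus rule file.
# Alert names and durations are assumptions chosen for illustration.
groups:
- name: pod-state-alerts
  rules:
  # Container is waiting with reason CrashLoopBackOff.
  - alert: ContainerCrashLooping
    expr: kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1
    for: 5m
    annotations:
      summary: "Container {{ $labels.container }} is in CrashLoopBackOff"
      description: "Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} has been in CrashLoopBackOff for at least 5 minutes"
  # Container terminated with reason Error.
  - alert: ContainerTerminatedWithError
    expr: kube_pod_container_status_terminated_reason{reason="Error"} == 1
    for: 5m
    annotations:
      summary: "Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} terminated with Error"
  # Pod is in a phase other than Running or Succeeded.
  - alert: PodNotRunning
    expr: kube_pod_status_phase{phase=~"Pending|Failed|Unknown"} == 1
    for: 10m
    annotations:
      summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is in phase {{ $labels.phase }}"
```

With the single default route shown earlier, any of these alerts would be mailed to the same receiver, and the namespace, pod, and container labels are included in the notification.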
