
Configuring DingTalk alerts for a Prometheus + Grafana + Alertmanager monitoring system

admin, 雷火电竞, 2019-12-02

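Overview

Most of our day-to-day work now happens in DingTalk, so this post explains how to configure DingTalk alerts for Prometheus. It assumes that Alertmanager is already deployed.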


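I. Install Go

Prometheus is written in Go, so start by installing a Go environment (it is only needed if you build the webhook plugin from source). Go is cross-platform and supports Windows, Linux, Mac OS X and other systems; the source code is also available, so it can be compiled and installed.

Download: https://studygolang.com/dl

1. Extract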

# tar -xvf go1.13.linux-amd64.tar.gz -C /usr/local/

2. Configure environment variables

echo 'export PATH=$PATH:/usr/local/go/bin' >> /etc/profile
source /etc/profile
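
Running the echo command more than once appends a duplicate line to /etc/profile each time. A minimal sketch of an idempotent variant follows; the helper name ensure_line is hypothetical and not part of the original steps.

```shell
# Append a line to a file only if that exact line is not already present.
# ensure_line is a hypothetical helper introduced for illustration.
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (as root). Single quotes keep $PATH unexpanded until login time:
# ensure_line /etc/profile 'export PATH=$PATH:/usr/local/go/bin'
```

Because the line is written with single quotes, the stored text refers to $PATH literally rather than to the value PATH had when the command was run.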

3. Test

Verify that the installation succeeded by running go version:

# go version


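II. Configure the DingTalk robot

1. Open robot management

2. Choose Webhook

3. Choose the group

4. View the robot settings and copy the Webhook URL (it contains the access_token used later)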
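
Before wiring the robot into Alertmanager, it can help to confirm that the Webhook URL works on its own. The sketch below builds the JSON body for a DingTalk robot "text" message; the function name dingtalk_text_payload and the message text are illustrative assumptions, and the curl call is left commented out because it needs a real access_token.

```shell
# Build the JSON body for a DingTalk robot "text" message.
# Only backslashes and double quotes are escaped, so keep the text on one line.
# dingtalk_text_payload is a hypothetical helper name.
dingtalk_text_payload() {
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"msgtype":"text","text":{"content":"%s"}}\n' "$escaped"
}

# Usage (replace <token> with the access_token from the robot settings):
# curl -s -H 'Content-Type: application/json' \
#      -d "$(dingtalk_text_payload 'Prometheus test message')" \
#      'https://oapi.dingtalk.com/robot/send?access_token=<token>'
```

If the robot was created with a keyword security setting, the message text probably has to contain that keyword for the message to be accepted.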


III. Connect DingTalk to the Prometheus Alertmanager webhook

Plugin download: https://github.com/timonwong/prometheus-webhook-dingtalk

1. Install the webhook plugin

-- Build from source (note: create the directory under Go's src directory)
mkdir -p /usr/local/go/src/github.com/timonwong/
cd /usr/local/go/src/github.com/timonwong/
git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
cd prometheus-webhook-dingtalk
make
-- Install from the binary package
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

2. Extract

# tar -xvf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

After extraction, the package contains the template file that prometheus-webhook-dingtalk uses for DingTalk alert messages:

/usr/local/dingtalk/prometheus-webhook-dingtalk-0.3.0.linux-amd64/default.tmpl

3. Start prometheus-webhook-dingtalk

nohup ./prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=de544211xxxx96f" >dingding.log 2>&1 &

4. Configure a systemd service

# vim /etc/systemd/system/prometheus-webhook-dingtalk.service
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/usr/local/dingtalk/prometheus-webhook-dingtalk-0.3.0.linux-amd64/prometheus-webhook-dingtalk --ding.profile=sre=https://oapi.dingtalk.com/robot/send?access_token=de544xxx8ebc04e8da096f

[Install]
WantedBy=multi-user.target

# chmod u+x /etc/systemd/system/prometheus-webhook-dingtalk.service
# systemctl daemon-reload
# systemctl start prometheus-webhook-dingtalk
# systemctl status prometheus-webhook-dingtalk
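
The value passed to --ding.profile has the form NAME=URL, and the NAME part becomes a path segment of the URL that Alertmanager calls. The nohup command in step 3 registers the name ops_dingding while the systemd unit registers sre, so whichever one is actually running decides the path. The sketch below illustrates the relationship; the function names are hypothetical, and port 8060 is taken from the Alertmanager configuration used in this article.

```shell
# A --ding.profile value has the form NAME=URL. The URL itself contains "="
# (access_token=...), so split on the first "=" only.
profile_name() { printf '%s\n' "${1%%=*}"; }
profile_url()  { printf '%s\n' "${1#*=}"; }

# URL that Alertmanager should call for a profile.
# Host and port default to localhost:8060, the values used in this article.
dingtalk_webhook_url() {
    printf 'http://%s:%s/dingtalk/%s/send\n' "${2:-localhost}" "${3:-8060}" "$1"
}

# Placeholder token, not a real one.
value='sre=https://oapi.dingtalk.com/robot/send?access_token=<token>'
dingtalk_webhook_url "$(profile_name "$value")"
# prints http://localhost:8060/dingtalk/sre/send
```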


IV. Configure the Alertmanager email sender and the DingTalk webhook

/usr/local/alertmanager/alertmanager.yml

global:
  resolve_timeout: 5m
  # Email sender settings
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '1275758000@qq.com'
  smtp_auth_username: '1275758000@qq.com'
  smtp_auth_password: 'nxxxegb'
  smtp_require_tls: false
route:
  group_by: ['alertname', 'cluster', 'service']
  receiver: default-receiver
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 30m
receivers:
- name: 'default-receiver'
  email_configs:
  - to: '1430985018@qq.com,644642050@qq.com'
  # Connect to the service started by prometheus-webhook-dingtalk
  webhook_configs:
  # "sre" is the profile name defined with --ding.profile when the webhook service was started
  - url: 'http://localhost:8060/dingtalk/sre/send'
    send_resolved: true

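repeat_interval: how long Alertmanager waits before re-sending a notification for an alert that is still firing. Set it to suit your needs; a shorter value is convenient while debugging.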
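
To test the webhook path without waiting for a real alert, one option is to post an Alertmanager-style body directly to prometheus-webhook-dingtalk. The sketch below prints a minimal body with a single firing alert. The function name, the alert name, the instance and the localhost:9093 / localhost:9090 URLs are all assumptions for illustration, and whether this plugin version accepts such a reduced body has not been verified here.

```shell
# Print a minimal Alertmanager-style webhook body with one firing alert.
# Arguments: alert name, instance label (both hypothetical test values).
test_alert_payload() {
    cat <<EOF
{
  "version": "4",
  "status": "firing",
  "receiver": "default-receiver",
  "groupLabels": {"alertname": "$1"},
  "commonLabels": {"alertname": "$1", "instance": "$2"},
  "commonAnnotations": {"explain": "manual test"},
  "externalURL": "http://localhost:9093",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "$1", "instance": "$2"},
      "annotations": {"explain": "manual test"},
      "startsAt": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://localhost:9090"
    }
  ]
}
EOF
}

# Usage, assuming the service runs with the "sre" profile on port 8060:
# test_alert_payload TestAlert host1:19999 | \
#     curl -s -H 'Content-Type: application/json' -d @- \
#          http://localhost:8060/dingtalk/sre/send
#
# The Alertmanager file itself can be checked with amtool:
# amtool check-config /usr/local/alertmanager/alertmanager.yml
```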

Check the status:


V. Prometheus configuration (for reference)

Rules file rules.yml:

groups:
- name: host_monitoring
  rules:
  - alert: Memory alert
    expr: netdata_system_ram_MiB_average{chart="system.ram",dimension="free",family="ram"} < 800
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: Memory alert
      Server: '{{$labels.instance}}'
      #summary: "{{$labels.instance}}: High Memory usage detected"
      explain: "Memory usage exceeds 90%; currently free: {{ $value }}M"
      #description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
  - alert: CPU alert
    expr: netdata_system_cpu_percentage_average{chart="system.cpu",dimension="idle",family="cpu"} < 20
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: CPU alert
      Server: '{{$labels.instance}}'
      explain: "CPU usage exceeds 80%; currently idle: {{ $value }}"
      #summary: "{{$labels.instance}}: High CPU usage detected"
      #description: "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})"
  - alert: Disk alert
    expr: netdata_disk_space_GiB_average{chart="disk_space._",dimension="avail",family="/"} < 4
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: Disk alert
      Server: '{{$labels.instance}}'
      explain: "Disk usage exceeds 90%; currently available: {{ $value }}G"
  - alert: Service alert
    expr: up == 0
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: Service alert
      Server: '{{$labels.instance}}'
      explain: "The netdata service is down"
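
The memory and disk rules compare absolute values (free MiB below 800, available GiB below 4), while their messages speak of 90% usage. The two only agree for particular host sizes, which the article does not state. As a worked example, assuming a host with 8 GiB of RAM and a 40 GiB root disk (both assumed figures), the matching thresholds can be computed like this; free_threshold is a hypothetical helper.

```shell
# Free-space threshold for a given capacity and maximum usage percentage.
# Uses integer arithmetic, so the result is rounded down.
free_threshold() {
    total=$1
    max_used_pct=$2
    echo $(( total * (100 - max_used_pct) / 100 ))
}

free_threshold 8192 90   # 8 GiB of RAM expressed in MiB: prints 819
free_threshold 40 90     # 40 GiB disk: prints 4
```

On hosts of other sizes the constants 800 and 4 would need to be recalculated for the messages to stay accurate.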

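The configuration file above has already been corrected. YAML is stricter about formatting than most other file types (indentation is significant), so check the details yourself and validate the file after editing.

The following formatting tool can check whether your file is valid: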

http://www.bejson.com/validators/yaml_editor/
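
A local check avoids pasting configuration into a website. One common formatting mistake is a tab in the indentation, which YAML does not allow. The sketch below lists such lines; find_tab_indent is a hypothetical helper, and the promtool path in the usage comment is an assumption about where the rules file lives.

```shell
# Print lines (with line numbers) whose leading whitespace contains a tab.
# Exit status is 0 if any such line is found and 1 if none is found.
find_tab_indent() {
    grep -n "^ *$(printf '\t')" "$1"
}

# Usage:
# find_tab_indent rules.yml && echo "tabs found in indentation"
#
# Prometheus ships promtool, which validates rule files semantically as well:
# promtool check rules /path/to/rules.yml
```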

VI. View alerts

Stop cadvisor: docker stop cadvisor
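
The DingTalk message does not arrive the moment the container stops. The rule must stay true for its "for" duration (2m in rules.yml) before the alert fires, and Alertmanager then waits group_wait (30s in alertmanager.yml) before sending the first notification for a new group. A rough lower bound, ignoring scrape and rule evaluation intervals, can be sketched as follows; both function names are hypothetical.

```shell
# Convert a simple duration with a single unit suffix (30s, 2m, 1h) to seconds.
to_seconds() {
    n=${1%?}          # numeric part: everything except the last character
    case $1 in
        *s) echo "$n" ;;
        *m) echo $((n * 60)) ;;
        *h) echo $((n * 3600)) ;;
        *)  echo "unsupported duration: $1" >&2; return 1 ;;
    esac
}

# Lower bound on the delay before the first notification:
# the rule's "for" duration plus Alertmanager's group_wait.
min_notify_delay() {
    echo $(( $(to_seconds "$1") + $(to_seconds "$2") ))
}

min_notify_delay 2m 30s   # prints 150
```

With the values from this article, expect at least two and a half minutes between stopping cadvisor and the first message.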

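Logs:

After restarting the service:

The alert template is a bit ugly; I will improve it later. That is all for this test.


More Prometheus content will follow; if you are interested, follow along!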
