11. Prometheus Monitoring: Custom Monitoring Items with Prometheus Operator

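Custom Prometheus Operator monitoring items: nginx/mysql

Besides the resource objects, nodes, and components of the Kubernetes cluster itself, we sometimes need to add custom monitoring items to meet actual business requirements. Adding one takes only a few steps.

Step 1: create a ServiceMonitor object, which adds the monitoring item to Prometheus
Step 2: associate the ServiceMonitor with a Service object that exposes the metrics endpoint
Step 3: make sure the Service object can actually return the metrics data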

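1 Custom monitoring for Nginx

Steps

1. Deploy the collector (exporter) with a Deployment;
2. Create a Service for the collector ##clusterIP: None
3. Create the ServiceMonitor;

1.1 Deploy Nginx with helm first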

[root@k8s-master helm]# helm create nginx
Creating nginx

[root@k8s-master helm]# helm install nginx
NAME:   guiding-dachshund
LAST DEPLOYED: Fri Sep 27 11:37:08 2019
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Deployment
NAME                     READY  UP-TO-DATE  AVAILABLE  AGE
guiding-dachshund-nginx  0/1    0           0          0s

==> v1/Pod(related)
NAME                                      READY  STATUS             RESTARTS  AGE
guiding-dachshund-nginx-54475b65c8-sl78p  0/1    ContainerCreating  0         0s

==> v1/Service
NAME                     TYPE       CLUSTER-IP      EXTERNAL-IP  PORT(S)  AGE
guiding-dachshund-nginx  ClusterIP  10.101.205.141  <none>       80/TCP   0s

NOTES:
1. Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=nginx,app.kubernetes.io/instance=guiding-dachshund" -o jsonpath="{.items[0].metadata.name}")
  echo "Visit http://127.0.0.1:8080 to use your application"
  kubectl port-forward $POD_NAME 8080:80

[root@k8s-master helm]# kubectl get pod,svc | grep nginx
pod/guiding-dachshund-nginx-54475b65c8-sl78p   1/1     Running   0          15s
service/guiding-dachshund-nginx   ClusterIP   10.101.205.141   <none>        80/TCP    15s
[root@k8s-master helm]#
[root@k8s-master helm]# curl -I 10.101.205.141
HTTP/1.1 200 OK
Server: nginx/1.16.1
Date: Fri, 27 Sep 2019 03:37:56 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 13 Aug 2019 10:05:00 GMT
Connection: keep-alive
ETag: "5d528b4c-264"
Accept-Ranges: bytes

[root@k8s-master helm]#

1.2 Monitoring nginx

https://blog.51cto.com/billy98/2357919

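Prometheus requires collected data to follow a specific format, shown below. Only data in this format is scraped correctly, so the application has to expose its key monitoring data in this format.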

nginx_http_connections{state="active"} 2
nginx_http_connections{state="reading"} 0
nginx_http_connections{state="waiting"} 1
nginx_http_connections{state="writing"} 1
nginx_http_request_bytes_sent{host="10.46.0.4"} 11055968
nginx_http_request_bytes_sent{host="testservers"} 4640
nginx_http_request_time_bucket{host="10.46.0.4",le="00.005"} 3960
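
To make the format concrete, here is a minimal Python sketch that reads sample lines like the ones above into a metric name, a label dictionary, and a numeric value. The helper names `parse_sample` and `parse_metrics` are made up for this illustration; the sketch ignores optional timestamps and does not unescape label values, so it is a reading aid rather than a full parser.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# One label pair inside the braces: key="value".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse one sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a valid sample line: %r" % line)
    name, raw_labels, raw_value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(raw_value)


def parse_metrics(text):
    """Parse a whole /metrics body, skipping blank lines and '#' comment lines."""
    samples = []
    for line in text.splitlines():
        if line.strip() and not line.startswith("#"):
            samples.append(parse_sample(line))
    return samples


if __name__ == "__main__":
    print(parse_sample('nginx_http_connections{state="active"} 2'))
```

A line that does not fit the name, labels, value shape raises `ValueError` here, which mirrors the point of the paragraph above: data that is not in this format cannot be collected.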

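How the metrics get exposed usually has to be handled in the application itself. The Prometheus community also provides a large number of official and third-party exporters, which let adopters quickly set up monitoring for key business services and infrastructure.
For the list of official and third-party exporters, refer to the Prometheus documentation.

Here we recommend registering the metrics endpoint directly with a Prometheus client library. Prometheus client libraries exist for most programming languages.
See the Prometheus documentation for the list of languages supported by the client libraries.

This walkthrough uses an nginx demo image.

1. Create the Deployment and Service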

apiVersion: apps/v1beta1   # removed in Kubernetes 1.16; use apps/v1 on newer clusters
kind: Deployment
metadata:
  name: nginx-demo
  labels:
    app: nginx-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      containers:
      - name: nginx-demo
        image: billy98/nginx-prometheus-metrics:latest
        ports:
        - name: http-metrics
          containerPort: 9527
        - name: web
          containerPort: 80
        - name: test
          containerPort: 1314
        imagePullPolicy: IfNotPresent
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx-demo
  name: nginx-demo
  namespace: default
spec:
  ports:
  - name: http-metrics
    port: 9527
    protocol: TCP
    targetPort: 9527
  - name: web
    port: 80
    protocol: TCP
    targetPort: 80
  - name: test
    port: 1314
    protocol: TCP
    targetPort: 1314
  selector:
    app: nginx-demo
  type: ClusterIP

2. Create the ServiceMonitor

Because this Prometheus instance is configured to scrape only ServiceMonitors labelled release: p, the ServiceMonitor for our application must carry that label.

[root@node-01 ~]# kubectl -n monitoring get prometheus p-prometheus -o yaml

...
  serviceMonitorSelector:
    matchLabels:
      release: p
...(remaining output omitted)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: nginx-demo
    release: p
  name: nginx-demo
  namespace: monitoring   # the namespace Prometheus runs in
spec:
  endpoints:
  - interval: 15s
    port: http-metrics
  namespaceSelector:
    matchNames:
    - default   # the namespace of the nginx demo
  selector:
    matchLabels:
      app: nginx-demo

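This deserves some explanation. ServiceMonitor is an abstraction introduced by Prometheus Operator: it turns the configuration of Prometheus scrape targets into dynamic discovery. A ServiceMonitor hooks onto the Service that fronts a Deployment, selects that Service through a label selector, and automatically discovers the backing containers. Note that in this setup metadata.namespace is always monitoring (the namespace Prometheus runs in), while namespaceSelector names the namespace the application lives in.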
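
The selection logic can be pictured with a small Python sketch. It is a simplification under stated assumptions: the real operator generates Prometheus service-discovery configuration and resolves Endpoints rather than Service ports, and `select_targets` plus the dictionaries below are hypothetical stand-ins that mirror the manifests in this section. The `any: true` form of namespaceSelector is not handled.

```python
def select_targets(service_monitor, services):
    """Return (namespace, service name, port) for every Service the monitor selects."""
    spec = service_monitor["spec"]
    wanted_labels = spec["selector"].get("matchLabels", {})
    namespaces = spec.get("namespaceSelector", {}).get("matchNames")
    if namespaces is None:
        # Without a namespaceSelector only the ServiceMonitor's own namespace is searched.
        namespaces = [service_monitor["metadata"]["namespace"]]
    port_names = {endpoint["port"] for endpoint in spec["endpoints"]}

    targets = []
    for svc in services:
        if svc["namespace"] not in namespaces:
            continue
        # Every matchLabels entry must be present on the Service.
        if any(svc["labels"].get(k) != v for k, v in wanted_labels.items()):
            continue
        for port in svc["ports"]:
            if port["name"] in port_names:
                targets.append((svc["namespace"], svc["name"], port["port"]))
    return targets


NGINX_MONITOR = {
    "metadata": {"name": "nginx-demo", "namespace": "monitoring"},
    "spec": {
        "endpoints": [{"interval": "15s", "port": "http-metrics"}],
        "namespaceSelector": {"matchNames": ["default"]},
        "selector": {"matchLabels": {"app": "nginx-demo"}},
    },
}

SERVICES = [
    {"namespace": "default", "name": "nginx-demo", "labels": {"app": "nginx-demo"},
     "ports": [{"name": "http-metrics", "port": 9527},
               {"name": "web", "port": 80},
               {"name": "test", "port": 1314}]},
    # A Service without the app label is ignored even though it is in "default".
    {"namespace": "default", "name": "other", "labels": {},
     "ports": [{"name": None, "port": 3306}]},
]

if __name__ == "__main__":
    print(select_targets(NGINX_MONITOR, SERVICES))
```

Only the port named http-metrics on the nginx-demo Service is returned, which is why the port name in the ServiceMonitor has to match the port name declared in the Service.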

Once everything is created, the endpoints are visible:

[root@k8s-master mysql]# kubectl get ep| grep nginx
guiding-dachshund-nginx   10.254.2.251:80                                         119m
nginx-demo                10.254.1.189:9527,10.254.1.189:80,10.254.1.189:1314   115m
[root@k8s-master mysql]#

Next, request 10.254.1.189:1314 to generate some test metrics.

[root@k8s-master mysql]# curl 10.254.1.189:1314
hello world
[root@k8s-master mysql]#

Check the metrics data

[root@k8s-master mysql]# curl 10.254.1.189:9527/metrics
# HELP nginx_http_connections Number of HTTP connections
# TYPE nginx_http_connections gauge
nginx_http_connections{state="active"} 3
nginx_http_connections{state="reading"} 0
nginx_http_connections{state="waiting"} 2
nginx_http_connections{state="writing"} 1
# HELP nginx_http_request_bytes_sent Number of HTTP request bytes sent
# TYPE nginx_http_request_bytes_sent counter
nginx_http_request_bytes_sent{host="10.254.1.189"} 1165650
nginx_http_request_bytes_sent{host="testservers"} 160
# HELP nginx_http_request_time HTTP request time
# TYPE nginx_http_request_time histogram
nginx_http_request_time_bucket{host="10.254.1.189",le="00.005"} 417
nginx_http_request_time_bucket{host="10.254.1.189",le="00.010"} 417
nginx_http_request_time_bucket{host="10.254.1.189",le="00.020"} 417
nginx_http_request_time_bucket{host="10.254.1.189",le="00.030"} 417
nginx_http_request_time_bucket{host="10.254.1.189",le="00.050"} 417
nginx_http_request_time_bucket{host="10.254.1.189",le="00.075"} 417
nginx_http_request_time_bucket{host="10.254.1.189",le="00.100"} 417

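3. Verification

Open Prometheus and verify data collection: under Status, open Service Discovery. Collection is working when the number of active targets equals the number of Pods.

Open the Graph page, select the name of the metric we just exposed, and click Execute to see the collected data. For the query syntax, refer to the Prometheus query examples.

1.3 Adding Nginx to Grafana

There is no particularly good ready-made dashboard template for this. Create your own panel, choose Prometheus as the data source, paste in the PromQL query, and adjust as needed.

2 Custom monitoring for MySQL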

https://blog.csdn.net/travellersY/article/details/84632679

https://yunlzheng.gitbook.io/prometheus-book/part-ii-prometheus-jin-jie/exporter/commonly-eporter-usage/use-promethues-monitor-mysql

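The same three steps as before

Steps

1. Deploy the collector (exporter) with a Deployment;
2. Create a Service for the collector ##clusterIP: None
3. Create the ServiceMonitor;

2.1 Deploy a MySQL instance as the monitoring example

The simplest MySQL deployment on k8s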

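# cat mysql-deploy.yaml
apiVersion: extensions/v1beta1   # removed in Kubernetes 1.16; newer clusters need apps/v1 plus spec.selector
kind: Deployment                 # the Deployment controller
metadata:
  name: mysql                    # name of the Deployment, unique within its namespace
spec:
  replicas: 1                    # desired number of Pod replicas
  template:                      # template used to create the Pod replicas
    metadata:
      labels:
        app: mysql               # label carried by the Pods, matched by the Deployment's selector
    spec:
      containers:                # containers defined inside the Pod
      - name: mysql              # container name
        image: mysql:5.7         # Docker image
        ports:
        - containerPort: 3306    # port the containerized application listens on
        env:                     # environment variables injected into the container
        - name: MYSQL_ROOT_PASSWORD   # sets the initial root password
          value: "123456"

# cat mysql-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  type: NodePort
  ports:
  - port: 3306
    nodePort: 30001
  selector:
    app: mysql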

Connection test

[root@k8s-master ~]# kubectl get pod,svc | grep mysql
pod/mysql-94f6bbcfd-9nl7w                                   1/1     Running   0          116m
service/mysql                                  NodePort    10.106.33.138    <none>        3306:30001/TCP   110m
[root@k8s-master ~]# mysql -uroot -p123456 -h10.106.33.138 -P3306
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 656
Server version: 5.7.27 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> Bye
[root@k8s-master ~]# mysql -uroot -p123456 -h10.6.76.23 -P30001
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 661
Server version: 5.7.27 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]>

2.2 Download prometheus-mysql-exporter

[root@k8s-master helm]# mkdir helm_chart
[root@k8s-master helm]# cd helm_chart/
[root@k8s-master helm_chart]# ls
[root@k8s-master helm_chart]# git clone https://github.com/helm/charts.git
Cloning into 'charts'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 84313 (delta 2), reused 2 (delta 0), pack-reused 84306
Receiving objects: 100% (84313/84313), 23.63 MiB | 80.00 KiB/s, done.
Resolving deltas: 100% (61834/61834), done.
[root@k8s-master helm_chart]#
[root@k8s-master helm_chart]# cd charts/stable/prometheus-mysql-exporter/
[root@k8s-master prometheus-mysql-exporter]# ls
a.yaml  Chart.yaml  OWNERS  README.md  templates  values.yaml
[root@k8s-master prometheus-mysql-exporter]#

2.3 Create the prometheus-mysql-exporter service

Add the MySQL exporter, prometheus-mysql-exporter, to Kubernetes. Here it is installed with helm, following the steps on GitHub. In values.yaml, change the datasource settings to the address of the MySQL instance running in Kubernetes.

[root@k8s-master prometheus-mysql-exporter]# cat values.yaml
...
mysql:
  db: ""
  host: "10.106.33.138"
  param: ""
  pass: "123456"
  port: 3306
  protocol: ""
  user: "root"

Install it

helm install --name my-release -f values.yaml  ../prometheus-mysql-exporter
[root@k8s-master prometheus-mysql-exporter]# kubectl get pod,svc | grep mysql

pod/my-release-prometheus-mysql-exporter-75cb8bffc7-qqckz   1/1     Running   0          107m
pod/mysql-94f6bbcfd-9nl7w                                   1/1     Running   0          120m
service/my-release-prometheus-mysql-exporter   ClusterIP   10.104.90.123    <none>        9104/TCP         107m
service/mysql                                  NodePort    10.106.33.138    <none>        3306:30001/TCP   113m
[root@k8s-master prometheus-mysql-exporter]#

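Test the connection to MySQL

A value of 1 for the mysql_up metric means the monitoring data can be collected normally. If it is not 1, check the MySQL connection settings, permissions, logs, and so on.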

[root@k8s-master prometheus-mysql-exporter]# curl 10.104.90.123:9104/metrics|grep mysql_up
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  177k  100  177k    0     0  3176k      0 --:--:-- --:--:-- --:--:-- 3231k
# HELP mysql_up Whether the MySQL server is up.
# TYPE mysql_up gauge
mysql_up 1
[root@k8s-master prometheus-mysql-exporter]#
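
The same check can be scripted. The Python sketch below only inspects a /metrics body that has already been fetched; the function name `mysql_is_up` is a hypothetical helper, not part of the exporter.

```python
def mysql_is_up(metrics_text):
    """Return True when the exporter's /metrics body reports mysql_up 1."""
    for line in metrics_text.splitlines():
        parts = line.split()
        # Sample lines look like "mysql_up 1"; "# HELP" and "# TYPE" lines have more fields.
        if len(parts) == 2 and parts[0] == "mysql_up":
            return float(parts[1]) == 1.0
    # No mysql_up sample at all: treat the exporter as unhealthy.
    return False


SAMPLE_BODY = """# HELP mysql_up Whether the MySQL server is up.
# TYPE mysql_up gauge
mysql_up 1
"""

if __name__ == "__main__":
    print(mysql_is_up(SAMPLE_BODY))
```

In a real check the body would come from the exporter Service, for example by reading http://10.104.90.123:9104/metrics with `urllib.request.urlopen`, which is left out here so the sketch runs without a cluster.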

2.4 Create the ServiceMonitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor   # the resource type is ServiceMonitor
metadata:
  labels:
    # In this setup Prometheus discovers ServiceMonitors through the label
    # prometheus: kube-prometheus; with it set, Prometheus picks up this ServiceMonitor.
    prometheus: kube-prometheus
  name: prometheus-exporter-mysql
  namespace: monitoring
spec:
  # The value of the Service label named by jobLabel becomes the job_name under
  # scrape_config in the Prometheus configuration, i.e. the job shown for the target.
  # If omitted, it defaults to the Service name.
  jobLabel: app
  selector:
    # Labels of the Services this ServiceMonitor matches. Every entry under matchLabels
    # must match; requirements listed under matchExpressions are likewise ANDed together.
    matchLabels:
      # The mysql-exporter Service inspected earlier carries the label
      # app: prometheus-mysql-exporter, so this entry matches it.
      app: prometheus-mysql-exporter
  namespaceSelector:
    # any: true   # match Services in all namespaces; to restrict to specific namespaces, use matchNames: []
    matchNames:
    - default
  endpoints:
  - port: mysql-exporter   # the Service exposes the MySQL metrics on "Port: mysql-exporter 9104/TCP", so use mysql-exporter here
    interval: 30s          # scrape every 30s
    # path: /metrics       # HTTP path to scrape for metrics; defaults to /metrics
    honorLabels: true
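
Since the selector semantics are easy to get wrong, here is a hedged Python sketch of how a Kubernetes-style label selector is evaluated: matchLabels entries and matchExpressions requirements are all ANDed. The function `matches` is a hypothetical illustration, and the `release: my-release` label in the example is an assumption about what the chart puts on the Service.

```python
def matches(selector, labels):
    """Evaluate a Kubernetes-style label selector against a dict of labels."""
    # Every matchLabels entry must be present with exactly that value.
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    # Every matchExpressions requirement must hold as well.
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            # A missing key also satisfies NotIn.
            ok = labels.get(key) not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError("unknown operator: %s" % op)
        if not ok:
            return False
    return True


if __name__ == "__main__":
    service_labels = {"app": "prometheus-mysql-exporter", "release": "my-release"}
    print(matches({"matchLabels": {"app": "prometheus-mysql-exporter"}}, service_labels))
```

With two requirements, a Service that satisfies only one of them is not selected, which is the practical consequence of the AND semantics.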

[root@k8s-master prometheus-mysql-exporter]# kubectl get servicemonitors.monitoring.coreos.com  -n monitoring
NAME                        AGE
alertmanager                3d3h
coredns                     3d3h
grafana                     3d3h
kube-apiserver              3d3h
kube-controller-manager     3d3h
kube-scheduler              3d3h
kube-state-metrics          3d3h
kubelet                     3d2h
node-exporter               3d3h
prometheus-exporter-mysql   89m
prometheus-operator         3d3h
[root@k8s-master prometheus-mysql-exporter]#

Some of the monitoring metrics:

https://yunlzheng.gitbook.io/prometheus-book/part-ii-prometheus-jin-jie/exporter/commonly-eporter-usage/use-promethues-monitor-mysql

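Query rate (mysql_global_status_questions is a counter, so wrap it in rate()):
rate(mysql_global_status_questions[2m])

Write operation rate:
sum(rate(mysql_global_status_commands_total{command=~"insert|update|delete"}[2m])) without (command)

MySQL's default maximum number of connections is 151. To raise it temporarily, run:
SET GLOBAL max_connections = 200;
To make the setting permanent, add the following to the MySQL configuration file my.cnf:
max_connections = 200

Remaining connections:
mysql_global_variables_max_connections - mysql_global_status_threads_connected

Aborted connection attempts (a cumulative count of failed attempts to connect to the server):
mysql_global_status_aborted_connects

Rate of InnoDB reads that had to go to disk, over 2 minutes:
rate(mysql_global_status_innodb_buffer_pool_reads[2m])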
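
To show what a rate over a counter means, here is a simplified Python sketch. It is not Prometheus's exact algorithm (the real rate() also extrapolates to the window boundaries); `simple_rate` is a hypothetical helper that divides the counter increase by the elapsed time and treats any drop in value as a counter reset.

```python
def simple_rate(samples):
    """Per-second increase of a counter over the window covered by samples.

    samples is a time-ordered list of (timestamp_seconds, counter_value).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # The counter restarted from zero (for example after a MySQL restart),
            # so the whole new value counts as increase.
            increase += curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


if __name__ == "__main__":
    # Five scrapes, 30 seconds apart, spanning a 2 minute window.
    window = [(0, 100), (30, 160), (60, 220), (90, 280), (120, 340)]
    print(simple_rate(window))
```

This is why the raw counter value alone says little: it only ever grows, while the rate over a window shows how busy the server currently is.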

2.5 Adding to Grafana

We use dashboard template 7362.

2.6 Alerting rules

[root@k8s-master manifests]# pwd
/root/prometheus/operator/kube-prometheus/manifests
[root@k8s-master manifests]# tail -78 prometheus-rules.yaml

############
  - name: MySQL
    rules:
    # A restart was recorded within the last three minutes
    - alert: Mysql_Instance_Reboot
      expr: mysql_global_status_uptime < 180
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_Instance_Reboot detected"
        description: "{{$labels.instance}}: Mysql_Instance_Reboot in 3 minute (up to now is: {{ $value }} seconds)"
    # Queries per second
    - alert: Mysql_High_QPS
      expr: rate(mysql_global_status_questions[5m]) > 500
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_High_QPS detected"
        description: "{{$labels.instance}}: Mysql operation is more than 500 per second ,(current value is: {{ $value }})"
    # Connection rate
    - alert: Mysql_Too_Many_Connections
      expr: rate(mysql_global_status_connections[5m]) > 100
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql Too Many Connections detected"
        description: "{{$labels.instance}}: Mysql Connections is more than 100 per second ,(current value is: {{ $value }})"
    # MySQL receive rate in Mbps (bytes/s * 8 / 1024 / 1024)
    - alert: Mysql_High_Recv_Rate
      expr: rate(mysql_global_status_bytes_received[3m]) * 8 / 1024 / 1024 > 100
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_High_Recv_Rate detected"
        description: "{{$labels.instance}}: Mysql_Receive_Rate is more than 100Mbps ,(current value is: {{ $value }})"
    # MySQL send rate in Mbps (bytes/s * 8 / 1024 / 1024)
    - alert: Mysql_High_Send_Rate
      expr: rate(mysql_global_status_bytes_sent[3m]) * 8 / 1024 / 1024 > 100
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_High_Send_Rate detected"
        description: "{{$labels.instance}}: Mysql data Send Rate is more than 100Mbps ,(current value is: {{ $value }})"
    # Slow queries
    - alert: Mysql_Too_Many_Slow_Query
      expr: rate(mysql_global_status_slow_queries[30m]) > 3
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_Too_Many_Slow_Query detected"
        description: "{{$labels.instance}}: Mysql current Slow_Query Sql is more than 3 ,(current value is: {{ $value }})"
    # Deadlocks
    - alert: Mysql_Deadlock
      expr: mysql_global_status_innodb_deadlocks > 0
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_Deadlock detected"
        description: "{{$labels.instance}}: Mysql Deadlock was found ,(current value is: {{ $value }})"
    # Fewer than 30% of connected threads are running
    - alert: Mysql_Too_Many_sleep_threads
      expr: mysql_global_status_threads_running / mysql_global_status_threads_connected * 100 < 30
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_Too_Many_sleep_threads detected"
        description: "{{$labels.instance}}: Mysql_sleep_threads percent is more than {{ $value }}, please clean the sleeping threads"
    # InnoDB cache usage exceeds 80% of the buffer pool size
    - alert: Mysql_innodb_Cache_insufficient
      expr: (mysql_global_status_innodb_page_size * on (instance) mysql_global_status_buffer_pool_pages{state="data"} + on (instance) mysql_global_variables_innodb_log_buffer_size + on (instance) mysql_global_variables_innodb_additional_mem_pool_size + on (instance) mysql_global_status_innodb_mem_dictionary + on (instance) mysql_global_variables_key_buffer_size + on (instance) mysql_global_variables_query_cache_size + on (instance) mysql_global_status_innodb_mem_adaptive_hash ) / on (instance) mysql_global_variables_innodb_buffer_pool_size * 100 > 80
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_innodb_Cache_insufficient detected"
        description: "{{$labels.instance}}: Mysql innodb_Cache was used more than 80% ,(current value is: {{ $value }})"

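Every rule above uses for: 2m, so an alert does not fire the moment its expression becomes true. A rough Python sketch of that behaviour follows; `alert_state` is a hypothetical helper, the 30 second evaluation interval is an assumption, and Prometheus's own bookkeeping is more involved than this.

```python
def alert_state(condition_history, eval_interval_s, for_s):
    """Return 'inactive', 'pending' or 'firing' after a series of rule evaluations.

    condition_history holds one boolean per evaluation, oldest first:
    True when the alert expression returned a result at that evaluation.
    """
    held_s = None  # how long the condition has been continuously true
    for is_true in condition_history:
        if not is_true:
            held_s = None          # any gap resets the timer
        elif held_s is None:
            held_s = 0             # first evaluation where the condition holds
        else:
            held_s += eval_interval_s
    if held_s is None:
        return "inactive"
    return "firing" if held_s >= for_s else "pending"


if __name__ == "__main__":
    # Condition true for three evaluations, 30 seconds apart, with for: 2m.
    print(alert_state([True, True, True], 30, 120))
```

One consequence worth noting: a condition that holds for less than the for duration, then clears, never leaves the pending state and never sends a notification.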
3 Configuring alert delivery

https://www.qikqiak.com/k8s-book/docs/59.%E8%87%AA%E5%AE%9A%E4%B9%89Prometheus%20Operator%20%E7%9B%91%E6%8E%A7%E9%A1%B9.html

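3.1 Inspect the relevant configuration files

To add an alert delivery configuration, the various alert receivers are configured through AlertManager's configuration file.

First change the alertmanager-main Service to type NodePort. Once that is done, the AlertManager configuration can be viewed under the status path of its web page: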

[root@k8s-master manifests]# cat  alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9093
    targetPort: web
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP

This configuration actually comes from the alertmanager-secret.yaml file we created earlier under the prometheus-operator/contrib/kube-prometheus/manifests directory:

[root@k8s-master manifests]# cat alertmanager-secret.yaml
apiVersion: v1
data:
  alertmanager.yaml: Imdsb2JhbCI6CiAgInJlc29sdmVfdGltZW91dCI6ICI1bSIKInJlY2VpdmVycyI6Ci0gIm5hbWUiOiAibnVsbCIKInJvdXRlIjoKICAiZ3JvdXBfYnkiOgogIC0gImpvYiIKICAiZ3JvdXBfaW50ZXJ2YWwiOiAiNW0iCiAgImdyb3VwX3dhaXQiOiAiMzBzIgogICJyZWNlaXZlciI6ICJudWxsIgogICJyZXBlYXRfaW50ZXJ2YWwiOiAiMTJoIgogICJyb3V0ZXMiOgogIC0gIm1hdGNoIjoKICAgICAgImFsZXJ0bmFtZSI6ICJXYXRjaGRvZyIKICAgICJyZWNlaXZlciI6ICJudWxsIg==
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
type: Opaque
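
The value under data in a Secret is simply the file content in base64. As a small Python sketch of the round trip (the shortened `CONFIG` text and both helper names are made up for illustration):

```python
import base64

# Hypothetical, shortened alertmanager.yaml content used only for this sketch.
CONFIG = "global:\n  resolve_timeout: 5m\n"


def encode_secret_value(text):
    """Encode file content the way it appears under data in a Secret manifest."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_secret_value(value):
    """Decode a Secret data value back into readable text."""
    return base64.b64decode(value).decode("utf-8")


if __name__ == "__main__":
    encoded = encode_secret_value(CONFIG)
    print(encoded)
    print(decode_secret_value(encoded))
    # The first 12 characters of the value in the manifest above.
    print(decode_secret_value("Imdsb2JhbCI6"))
```

Base64 is an encoding, not encryption, so anyone who can read the Secret manifest can read the configuration.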

The value stored under alertmanager.yaml can be base64-decoded:

[root@k8s-master manifests]# echo "Imdsb2JhbCI6CiAgInJlc29sdmVfdGltZW91dCI6ICI1bSIKInJlY2VpdmVycyI6Ci0gIm5hbWUiOiAibnVsbCIKInJvdXRlIjoKICAiZ3JvdXBfYnkiOgogIC0gImpvYiIKICAiZ3JvdXBfaW50ZXJ2YWwiOiAiNW0iCiAgImdyb3VwX3dhaXQiOiAiMzBzIgogICJyZWNlaXZlciI6ICJudWxsIgogICJyZXBlYXRfaW50ZXJ2YWwiOiAiMTJoIgogICJyb3V0ZXMiOgogIC0gIm1hdGNoIjoKICAgICAgImFsZXJ0bmFtZSI6ICJXYXRjaGRvZyIKICAgICJyZWNlaXZlciI6ICJudWxsIg==" |base64 -d
"global":
  "resolve_timeout": "5m"
"receivers":
- "name": "null"
"route":
  "group_by":
  - "job"
  "group_interval": "5m"
  "group_wait": "30s"
  "receiver": "null"
  "repeat_interval": "12h"
  "routes":
  - "match":
      "alertname": "Watchdog"
    "receiver": "null"[root@k8s-master manifests]#

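3.2 Prepare a DingTalk robot

This is the same as before.

Unfortunately this coincided with a DingTalk upgrade and new robots could not be created, so we reuse the one left over from the earlier Jenkins setup.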

https://oapi.dingtalk.com/robot/send?access_token=17549607d838b3015d183384ffe53333b13df0a98563150df241535808e10781

3.3 Configure DingTalk alert delivery

[root@k8s-master manifests]# cat dingtalk.yaml
apiVersion: extensions/v1beta1   # removed in Kubernetes 1.16; newer clusters need apps/v1 plus spec.selector
kind: Deployment
metadata:
  name: dingtalk-hook
  namespace: monitoring
spec:
  template:
    metadata:
      labels:
        app: dingtalk-hook
    spec:
      containers:
      - name: dingtalk-hook
        image: cnych/alertmanager-dingtalk-hook:v0.2
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5000
          name: http
        env:
        - name: ROBOT_TOKEN
          valueFrom:
            secretKeyRef:
              name: dingtalk-secret
              key: token
        resources:
          requests:
            cpu: 50m
            memory: 100Mi
          limits:
            cpu: 50m
            memory: 100Mi

---
apiVersion: v1
kind: Service
metadata:
  name: dingtalk-hook
  namespace: monitoring
spec:
  selector:
    app: dingtalk-hook
  ports:
  - name: hook
    port: 5000
    targetPort: http

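Note that the manifest above declares a ROBOT_TOKEN environment variable. Since this is relatively sensitive information, it is read from a Secret object. Create a Secret named dingtalk-secret in the same namespace as the Deployment (monitoring) with the following command, then deploy the resources above:

[root@k8s-master alertmanager]# kubectl create secret generic dingtalk-secret --from-literal=token=17549607d838b3015d183384ffe53333b13df0a98563150df241535808e10781 -n monitoring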
secret/dingtalk-secret created
[root@k8s-master alertmanager]# kubectl create -f dingtalk-hook.yaml
deployment.extensions/dingtalk-hook created
service/dingtalk-hook created
[root@k8s-master manifests]# kubectl -n monitoring get secrets | grep dingtalk
dingtalk-secret                   Opaque                                1      61m
[root@k8s-master manifests]# kubectl -n monitoring get pod,svc | grep dingtalk
pod/dingtalk-hook-686ddd6976-pq4fk         1/1     Running   0          59m
service/dingtalk-hook           ClusterIP   10.111.250.130   <none>        5000/TCP                     59m
[root@k8s-master manifests]#
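
For orientation, here is a hedged Python sketch of the kind of translation such a hook performs: it takes the JSON that Alertmanager posts to a webhook and builds a text message for the DingTalk robot API. This is not the code of the cnych/alertmanager-dingtalk-hook image; `build_dingtalk_request` and the message layout are assumptions made for illustration, and nothing is sent over the network.

```python
import json

DINGTALK_SEND_URL = "https://oapi.dingtalk.com/robot/send"


def build_dingtalk_request(token, alertmanager_payload):
    """Turn an Alertmanager webhook payload into (url, JSON body) for a DingTalk robot."""
    lines = []
    for alert in alertmanager_payload.get("alerts", []):
        labels = alert.get("labels", {})
        summary = alert.get("annotations", {}).get("summary", "")
        lines.append("[%s] %s: %s" % (
            alert.get("status", "unknown"),
            labels.get("alertname", "unnamed"),
            summary,
        ))
    # DingTalk robots accept a JSON body with msgtype "text" and the text content.
    body = {"msgtype": "text", "text": {"content": "\n".join(lines)}}
    url = "%s?access_token=%s" % (DINGTALK_SEND_URL, token)
    return url, json.dumps(body, ensure_ascii=False)


if __name__ == "__main__":
    payload = {"alerts": [{
        "status": "firing",
        "labels": {"alertname": "CPUThrottlingHigh"},
        "annotations": {"summary": "throttled"},
    }]}
    # "TOKEN" stands in for the value read from the ROBOT_TOKEN environment variable.
    print(build_dingtalk_request("TOKEN", payload))
```

Reading the token from the environment rather than hard-coding it is what lets the Deployment take it from the dingtalk-secret Secret.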

3.4 Configure alert receivers

With the deployment in place, we can now give AlertManager a webhook by adding a route and a receiver to the configuration above:

[root@k8s-master manifests]# cat alertmanager.yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'w.jjwx@163.com'
  smtp_auth_username: 'w.jjwx@163.com'
  smtp_auth_password: '<password>'
  smtp_hello: '163.com'
  smtp_require_tls: false
route:
  group_by: ['job', 'severity']
  group_wait: 30s
  # intervals are kept short for testing
  group_interval: 1m
  repeat_interval: 2m
  #group_interval: 5m
  #repeat_interval: 12h
  receiver: default
  # receiver: webhook
  routes:
  - receiver: webhook
    match:
      alertname: CPUThrottlingHigh
receivers:
- name: 'default'
  email_configs:
  - to: '314144952@qq.com'
    send_resolved: true
- name: 'webhook'
  webhook_configs:
  - url: 'http://dingtalk-hook.monitoring:5000'
    send_resolved: true

Save the content above as alertmanager.yaml, then use the file to create a Secret object:

# Delete the previous secret object first

kubectl delete secret alertmanager-main -n monitoring
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring

Reload the configuration

[root@k8s-master manifests]# kubectl -n monitoring get svc
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       NodePort    10.109.59.250    <none>        9093:30583/TCP               13m
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   9m
dingtalk-hook           ClusterIP   10.111.250.130   <none>        5000/TCP                     151m
grafana                 NodePort    10.100.31.73     <none>        3000:32339/TCP               4d2h
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            4d2h
node-exporter           ClusterIP   None             <none>        9100/TCP                     4d2h
prometheus-adapter      ClusterIP   10.97.88.175     <none>        443/TCP                      4d2h
prometheus-k8s          NodePort    10.97.199.239    <none>        9090:31466/TCP               4d2h
prometheus-operated     ClusterIP   None             <none>        9090/TCP                     4d2h
prometheus-operator     ClusterIP   None             <none>        8080/TCP                     4d2h
[root@k8s-master manifests]# curl -X POST "http://10.109.59.250:9093/-/reload"
[root@k8s-master manifests]# curl -X POST "http://10.97.199.239:9090/-/reload"
[root@k8s-master manifests]#


If the configuration file contains an error, the reload reports it:

[root@k8s-master manifests]# curl -X POST "http://10.109.59.250:9093/-/reload"
failed to reload config: undefined receiver "webhook" used in route
[root@k8s-master manifests]#

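We added two receivers. By default alerts are sent by email; the CPUThrottlingHigh alert is sent through the webhook, which is the DingTalk receiving server defined earlier. Once the steps above are complete, a DingTalk message arrives shortly:

(Screenshot: DingTalk alert message)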
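
The routing described above (email by default, the webhook for CPUThrottlingHigh) can be sketched in Python as follows. `pick_receiver` is a hypothetical helper covering only exact match routes; it assumes every route names a receiver and ignores Alertmanager features such as continue and regex matching.

```python
def pick_receiver(route, alert_labels):
    """Return the receiver name for an alert, trying child routes in order."""
    for child in route.get("routes", []):
        match = child.get("match", {})
        # A child route applies when all of its match labels equal the alert's labels.
        if all(alert_labels.get(k) == v for k, v in match.items()):
            return pick_receiver(child, alert_labels)
    # No child matched: the route's own receiver handles the alert.
    return route["receiver"]


ROUTE = {
    "receiver": "default",
    "routes": [
        {"receiver": "webhook", "match": {"alertname": "CPUThrottlingHigh"}},
    ],
}

if __name__ == "__main__":
    print(pick_receiver(ROUTE, {"alertname": "CPUThrottlingHigh", "severity": "warning"}))
    print(pick_receiver(ROUTE, {"alertname": "Mysql_Deadlock", "severity": "warning"}))
```

Under these assumptions only CPUThrottlingHigh reaches DingTalk; to send the MySQL alerts there as well, the route would need a further match entry or a different default receiver.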