I. Preface
The previous article introduced k8s service probes;
this article covers how those probes are implemented.
II. ExecAction
1. Introduction
An exec probe runs a predefined shell command inside the Pod's container.
If the command exits without error (return code 0), the container is considered healthy; otherwise it is considered unhealthy.
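The health decision is nothing more than the command's exit status. A minimal sketch of that rule in Python (an illustration of the idea only, not kubelet's actual implementation, which runs the command through the container runtime):

```python
import subprocess
import sys

def exec_probe(command):
    """Run a probe command; exit code 0 means healthy, anything else unhealthy."""
    result = subprocess.run(command, capture_output=True)
    return result.returncode == 0

# A command that exits 0 is reported healthy:
print(exec_probe([sys.executable, "-c", "raise SystemExit(0)"]))  # True
# A command that exits non-zero (like `cat` on a missing file) is unhealthy:
print(exec_probe([sys.executable, "-c", "raise SystemExit(1)"]))  # False
```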
2. Implementation
1) Create the configuration file
shell-probe.yaml
vi shell-probe.yaml
// file contents
apiVersion: v1
kind: Pod                  # Pod resource
metadata:
  labels:                  # labels
    test: shell-probe
  name: shell-probe        # Pod name
spec:                      # specification
  containers:              # containers
  - name: shell-probe      # container name
    image: registry.aliyuncs.com/google_containers/busybox  # image to use
    args:                  # arguments passed to the container
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600  # create the file, sleep 30s, delete it, sleep another 600s
    livenessProbe:         # liveness probe
      exec:                # run a command
        command:           # the command to run
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5  # how long after container start before the first probe
      periodSeconds: 5        # interval between probes
This configures a liveness probe for the container: to decide whether the container is alive, it runs the command cat /tmp/healthy:
- if the file exists, the container is alive;
- if the file is missing, the container is considered dead.
Notes:
Since the file is created right at container start, the first probes are bound to find it;
it is deleted after 30 seconds by design, and probing is configured to start 5 seconds after the container is up.
Note: probes are configured per container, so different containers can have different probes.
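Given these numbers, we can work out roughly which probe runs pass. A back-of-the-envelope sketch (it ignores probe timeouts and scheduling jitter, and a restart actually only happens after failureThreshold consecutive failures, 3 by default):

```python
# Probes fire at t = initialDelaySeconds, then every periodSeconds after that.
# The file exists until about t = 30s, so early probes pass and later ones fail.
initial_delay, period, removed_at = 5, 5, 30
probe_times = [initial_delay + i * period for i in range(8)]  # 5, 10, ..., 40
outcomes = {t: "pass" if t < removed_at else "fail" for t in probe_times}
print(outcomes)
```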
2) Apply the configuration
Apply the configuration to start the container:
[root@k8s-master deployment]# kubectl apply -f shell-probe.yaml
pod/shell-probe created
[root@k8s-master deployment]# kubectl get pods
NAME READY STATUS RESTARTS AGE
shell-probe 1/1 Running 0 25s
user-v1-8cc9f4fb5-52hmd 1/1 Running 0 17h
user-v1-8cc9f4fb5-6l7mz 1/1 Running 0 17h
user-v1-8cc9f4fb5-zqj2l 1/1 Running 0 17h
Notes:
After the container starts, a probe runs every 5 seconds. Once the file is deleted at the 30-second mark the probe starts failing, and after 3 consecutive failures (the default failureThreshold) the kubelet restarts the container; the cycle then repeats, so with this configuration the container keeps getting restarted roughly every half minute plus restart time.
3) Check the Pod
Looking again a little later, the Pod has already restarted 4 times:
[root@k8s-master deployment]# kubectl get pods
NAME READY STATUS RESTARTS AGE
shell-probe 1/1 Running 4 5m6s
user-v1-8cc9f4fb5-52hmd 1/1 Running 0 17h
user-v1-8cc9f4fb5-6l7mz 1/1 Running 0 17h
user-v1-8cc9f4fb5-zqj2l 1/1 Running 0 17h
4) Check the container details
[root@k8s-master deployment]# kubectl describe pods shell-probe
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m25s default-scheduler Successfully assigned default/shell-probe to k8s-node
Normal Pulled 5m23s kubelet Successfully pulled image "registry.aliyuncs.com/google_containers/busybox" in 1.454670365s
Normal Pulled 4m8s kubelet Successfully pulled image "registry.aliyuncs.com/google_containers/busybox" in 465.285971ms
Normal Pulled 2m53s kubelet Successfully pulled image "registry.aliyuncs.com/google_containers/busybox" in 510.497057ms
Normal Created 2m52s (x3 over 5m23s) kubelet Created container shell-probe
Normal Started 2m52s (x3 over 5m23s) kubelet Started container shell-probe
Warning Unhealthy 2m8s (x9 over 4m48s) kubelet Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Normal Killing 2m8s (x3 over 4m38s) kubelet Container shell-probe failed liveness probe, will be restarted
Normal Pulling 23s (x5 over 5m24s) kubelet Pulling image "registry.aliyuncs.com/google_containers/busybox"
Note the Unhealthy event:
the liveness probe failed because the file /tmp/healthy could not be found;
and because the liveness probe failed, the container gets restarted.
III. TCPSocketAction
1. Introduction
This probe uses a TCP socket check: Kubernetes tries to open a connection to the specified port of the Pod.
If the connection can be established (the port is open), the container is considered healthy; if not, the Pod is considered unhealthy.
For example:
- for an nginx service, check whether port 80 is up;
- for a mysql service, check whether port 3306 is up.
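The check itself is just a TCP connect attempt against the Pod's IP and port. A minimal sketch of the same idea in Python (an illustration, not kubelet's code; the address in the usage comment is a placeholder):

```python
import socket

def tcp_probe(host, port, timeout=1.0):
    """Healthy if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. tcp_probe("10.244.1.72", 80) would return True while nginx is listening
```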
2. Implementation
1) Create the configuration file
tcp-probe.yaml
vi tcp-probe.yaml
// file contents
apiVersion: v1
kind: Pod
metadata:
  name: tcp-probe
  labels:
    app: tcp-probe
spec:
  containers:
  - name: tcp-probe
    image: nginx
    ports:
    - containerPort: 80
    readinessProbe:        # readiness probe
      tcpSocket:
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
2) Apply the configuration
[root@k8s-master deployment]# kubectl apply -f tcp-probe.yaml
pod/tcp-probe created
[root@k8s-master deployment]# kubectl get pods
NAME READY STATUS RESTARTS AGE
shell-probe 0/1 CrashLoopBackOff 7 17m
tcp-probe 1/1 Running 0 16s
user-v1-8cc9f4fb5-52hmd 1/1 Running 0 17h
user-v1-8cc9f4fb5-6l7mz 1/1 Running 0 17h
user-v1-8cc9f4fb5-zqj2l 1/1 Running 0 17h
// kubectl get pods | grep tcp-probe
3) Check the Pod details
[root@k8s-master deployment]# kubectl describe pods tcp-probe
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 106s default-scheduler Successfully assigned default/tcp-probe to k8s-node
Normal Pulling 106s kubelet Pulling image "nginx"
Normal Pulled 104s kubelet Successfully pulled image "nginx" in 1.480773049s
Normal Created 104s kubelet Created container tcp-probe
Normal Started 104s kubelet Started container tcp-probe
Option 1: stop the nginx service
A probe now checks port 80 every 5 seconds.
To exercise the probe, we can simply stop nginx; enter the container:
[root@k8s-master deployment]# kubectl exec -it tcp-probe -- bash
root@tcp-probe:/# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
root@tcp-probe:/# nginx -s stop
2021/12/25 03:59:44 [notice] 48#48: signal process started
root@tcp-probe:/# command terminated with exit code 137
Install the ps tool in the tcp-probe container with apt-get:
apt-get update
apt-get install procps
ps
// actual run
root@tcp-probe:/# apt-get update
Get:1 http://security.debian.org/debian-security bullseye-security InRelease [44.1 kB]
Get:2 http://deb.debian.org/debian bullseye InRelease [116 kB]
Get:3 http://security.debian.org/debian-security bullseye-security/main amd64 Packages [102 kB]
Get:4 http://deb.debian.org/debian bullseye-updates InRelease [39.4 kB]
Get:5 http://deb.debian.org/debian bullseye/main amd64 Packages [8183 kB]
Get:6 http://deb.debian.org/debian bullseye-updates/main amd64 Packages [2592 B]
Fetched 8487 kB in 4min 19s (32.8 kB/s)
Reading package lists... Done
root@tcp-probe:/# apt-get install procps
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
libgpm2 libncurses6 libncursesw6 libprocps8 psmisc
Suggested packages:
gpm
The following NEW packages will be installed:
libgpm2 libncurses6 libncursesw6 libprocps8 procps psmisc
0 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
Need to get 1034 kB of archives.
After this operation, 3474 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://deb.debian.org/debian bullseye/main amd64 libncurses6 amd64 6.2+20201114-2 [102 kB]
Get:2 http://deb.debian.org/debian bullseye/main amd64 libncursesw6 amd64 6.2+20201114-2 [132 kB]
Get:3 http://deb.debian.org/debian bullseye/main amd64 libprocps8 amd64 2:3.3.17-5 [63.9 kB]
Get:4 http://deb.debian.org/debian bullseye/main amd64 procps amd64 2:3.3.17-5 [502 kB]
Get:5 http://deb.debian.org/debian bullseye/main amd64 libgpm2 amd64 1.20.7-8 [35.6 kB]
Get:6 http://deb.debian.org/debian bullseye/main amd64 psmisc amd64 23.4-2 [198 kB]
Fetched 1034 kB in 20s (52.5 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libncurses6:amd64.
(Reading database ... 7815 files and directories currently installed.)
Preparing to unpack .../0-libncurses6_6.2+20201114-2_amd64.deb ...
Unpacking libncurses6:amd64 (6.2+20201114-2) ...
Selecting previously unselected package libncursesw6:amd64.
Preparing to unpack .../1-libncursesw6_6.2+20201114-2_amd64.deb ...
Unpacking libncursesw6:amd64 (6.2+20201114-2) ...
Selecting previously unselected package libprocps8:amd64.
Preparing to unpack .../2-libprocps8_2%3a3.3.17-5_amd64.deb ...
Unpacking libprocps8:amd64 (2:3.3.17-5) ...
Selecting previously unselected package procps.
Preparing to unpack .../3-procps_2%3a3.3.17-5_amd64.deb ...
Unpacking procps (2:3.3.17-5) ...
Selecting previously unselected package libgpm2:amd64.
Preparing to unpack .../4-libgpm2_1.20.7-8_amd64.deb ...
Unpacking libgpm2:amd64 (1.20.7-8) ...
Selecting previously unselected package psmisc.
Preparing to unpack .../5-psmisc_23.4-2_amd64.deb ...
Unpacking psmisc (23.4-2) ...
Setting up libgpm2:amd64 (1.20.7-8) ...
Setting up psmisc (23.4-2) ...
Setting up libncurses6:amd64 (6.2+20201114-2) ...
Setting up libncursesw6:amd64 (6.2+20201114-2) ...
Setting up libprocps8:amd64 (2:3.3.17-5) ...
Setting up procps (2:3.3.17-5) ...
Processing triggers for libc-bin (2.31-13+deb11u2) ...
root@tcp-probe:/# ps
PID TTY TIME CMD
39 pts/0 00:00:00 bash
385 pts/0 00:00:00 ps
Note: a container restart pulls the image fresh, so anything installed with apt-get has to be reinstalled.
Stop nginx:
// before
[root@k8s-master deployment]# kubectl get pod
NAME READY STATUS RESTARTS AGE
shell-probe 0/1 CrashLoopBackOff 13 37m
tcp-probe 1/1 Running 0 4m
user-v1-8cc9f4fb5-52hmd 1/1 Running 0 17h
user-v1-8cc9f4fb5-6l7mz 1/1 Running 0 17h
user-v1-8cc9f4fb5-zqj2l 1/1 Running 0 17h
// run
root@tcp-probe:/# pkill nginx
root@tcp-probe:/# command terminated with exit code 137
// after
[root@k8s-master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
shell-probe 0/1 CrashLoopBackOff 15 47m
tcp-probe 0/1 CrashLoopBackOff 1 7m
user-v1-8cc9f4fb5-52hmd 1/1 Running 0 17h
user-v1-8cc9f4fb5-6l7mz 1/1 Running 0 17h
user-v1-8cc9f4fb5-zqj2l 1/1 Running 0 17h
4) Check the container details
[root@k8s-master ~]# kubectl describe pod tcp-probe
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 35m default-scheduler Successfully assigned default/tcp-probe to k8s-node
Normal Pulled 35m kubelet Successfully pulled image "nginx" in 1.480773049s
Normal Pulled 30m kubelet Successfully pulled image "nginx" in 15.410293911s
Normal Pulled 13m kubelet Successfully pulled image "nginx" in 15.345629205s
Warning BackOff 5m27s kubelet Back-off restarting failed container
Normal Pulling 5m13s (x4 over 35m) kubelet Pulling image "nginx"
Normal Created 4m58s (x4 over 35m) kubelet Created container tcp-probe
Normal Started 4m58s (x4 over 35m) kubelet Started container tcp-probe
Normal Pulled 4m58s kubelet Successfully pulled image "nginx" in 15.360719175s
Note: these restarts were not triggered by the readiness probe, which never restarts a container. Stopping nginx killed the container's main process, so the kubelet restarted it under the Pod's restart policy; "Back-off restarting failed container" is the kubelet backing off between those restart attempts.
Option 2: enter the container and change the port nginx listens on
Recreate the tcp-probe Pod:
[root@k8s-master ~]# kubectl delete pod tcp-probe
pod "tcp-probe" deleted
[root@k8s-master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
user-v1-8cc9f4fb5-52hmd 1/1 Running 0 5d14h
user-v1-8cc9f4fb5-6l7mz 1/1 Running 0 5d14h
user-v1-8cc9f4fb5-zqj2l 1/1 Running 0 5d14h
[root@k8s-master deployment]# kubectl apply -f tcp-probe.yaml
pod/tcp-probe created
[root@k8s-master deployment]# kubectl get pods
NAME READY STATUS RESTARTS AGE
tcp-probe 1/1 Running 0 12m
user-v1-8cc9f4fb5-52hmd 1/1 Running 0 5d14h
user-v1-8cc9f4fb5-6l7mz 1/1 Running 0 5d14h
user-v1-8cc9f4fb5-zqj2l 1/1 Running 0 5d14h
Enter the container and change the nginx port to 9090 so the probe will fail:
[root@k8s-master deployment]# kubectl exec -it tcp-probe -- /bin/sh
# apt-get update
# apt-get install vim -y
Edit the nginx configuration file:
vi /etc/nginx/conf.d/default.conf
server {
    listen       9090;     # changed from 80 to 9090
    listen  [::]:80;
    server_name  localhost;

    #access_log  /var/log/nginx/host.access.log  main;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }

    #error_page  404              /404.html;

    # redirect server error pages to the static page /50x.html
    #
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }

    # proxy the PHP scripts to Apache listening on 127.0.0.1:80
Reload the configuration:
# nginx -s reload
2021/12/30 01:33:41 [notice] 425#425: signal process started
# exit
Check:
[root@k8s-master deployment]# kubectl get pods
NAME READY STATUS RESTARTS AGE
tcp-probe 0/1 Running 0 22m
user-v1-8cc9f4fb5-52hmd 1/1 Running 0 5d15h
user-v1-8cc9f4fb5-6l7mz 1/1 Running 0 5d15h
user-v1-8cc9f4fb5-zqj2l 1/1 Running 0 5d15h
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 22m default-scheduler Successfully assigned default/tcp-probe to k8s-node
Normal Pulling 22m kubelet Pulling image "nginx"
Normal Pulled 22m kubelet Successfully pulled image "nginx" in 19.889680609s
Normal Created 22m kubelet Created container tcp-probe
Normal Started 22m kubelet Started container tcp-probe
Warning Unhealthy 5s (x18 over 90s) kubelet Readiness probe failed: dial tcp 10.244.1.72:80: connect: connection refused
The readiness probe fails: connections to 10.244.1.72:80 are refused.
Even though the probe fails, the Pod remains in the Running state:
[root@k8s-master deployment]# kubectl get pods
NAME READY STATUS RESTARTS AGE
tcp-probe 0/1 Running 0 24m
user-v1-8cc9f4fb5-52hmd 1/1 Running 0 5d15h
user-v1-8cc9f4fb5-6l7mz 1/1 Running 0 5d15h
user-v1-8cc9f4fb5-zqj2l 1/1 Running 0 5d15h
So the container is not restarted; it simply stops receiving traffic (it is removed from the Service's endpoints).
IV. HTTPGetAction
1. Introduction
Kubernetes sends an HTTP GET request to a specified path on the Pod;
- if the response status is 200 (any code from 200 through 399 counts as success), the container is healthy;
- otherwise, the Pod is considered unhealthy.
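In other words, health is decided by the status code of a GET request. A minimal sketch of that rule in Python (an illustration only; the URL in the usage comment is a placeholder for the Pod's address):

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def http_probe(url, headers=None, timeout=1.0):
    """Healthy if the endpoint answers with a status in the 200-399 range."""
    try:
        req = Request(url, headers=headers or {})
        with urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except HTTPError:
        return False  # 4xx / 5xx responses raise HTTPError
    except (URLError, OSError):
        return False  # connection refused, timeout, DNS failure, ...

# e.g. http_probe("http://<pod-ip>:3000/liveness", headers={"source": "probe"})
```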
2. Implementation
1) Create the configuration file
http-probe.yaml
vi http-probe.yaml
// file contents
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: http-probe
  name: http-probe
spec:
  containers:
  - name: http-probe
    image: http-probe:1.0.0
    livenessProbe:         # liveness probe
      httpGet:
        path: /liveness
        port: 3000
        httpHeaders:
        - name: source
          value: probe
      initialDelaySeconds: 5
      periodSeconds: 5
Notes:
Every 5 seconds, the kubelet requests <pod-ip>:3000/liveness with the request header source=probe.
2) Apply the configuration
[root@k8s-master deployment]# kubectl apply -f http-probe.yaml
pod/http-probe created
[root@k8s-master deployment]# kubectl get pods
NAME READY STATUS RESTARTS AGE
http-probe 0/1 ContainerCreating 0 15s
tcp-probe 0/1 Running 0 31m
user-v1-8cc9f4fb5-52hmd 1/1 Running 0 5d15h
user-v1-8cc9f4fb5-6l7mz 1/1 Running 0 5d15h
user-v1-8cc9f4fb5-zqj2l 1/1 Running 0 5d15h
The http-probe container runs a Node.js service:
on the path /liveness it reads the source header; within the first 10 seconds it returns 200, and after that it returns 500.
let http = require('http');
let start = Date.now();
http.createServer(function (req, res) {
  if (req.url === '/liveness') {
    let value = req.headers['source'];
    if (value === 'probe') {
      let duration = Date.now() - start;
      if (duration > 10 * 1000) {
        res.statusCode = 500;
        res.end('error');
      } else {
        res.statusCode = 200;
        res.end('success');
      }
    } else {
      res.statusCode = 200;
      res.end('liveness');
    }
  } else {
    res.statusCode = 200;
    res.end('liveness');
  }
}).listen(3000, function () {
  console.log('http server started on 3000');
});
Dockerfile
FROM node
COPY ./app /app
WORKDIR /app
EXPOSE 3000
CMD node index.js
If the configuration file has a problem, delete the Pod and start it again:
[root@k8s-master deployment]# kubectl delete pod http-probe --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "http-probe" force deleted
[root@k8s-master deployment]# kubectl get pods
NAME READY STATUS RESTARTS AGE
tcp-probe 0/1 Running 0 38m
user-v1-8cc9f4fb5-52hmd 1/1 Running 0 5d15h
user-v1-8cc9f4fb5-6l7mz 1/1 Running 0 5d15h
user-v1-8cc9f4fb5-zqj2l 1/1 Running 0 5d15h
[root@k8s-master deployment]# kubectl apply -f http-probe.yaml
pod/http-probe created
[root@k8s-master deployment]# kubectl get pods
NAME READY STATUS RESTARTS AGE
http-probe 1/1 Running 0 11s
tcp-probe 0/1 Running 0 38m
user-v1-8cc9f4fb5-52hmd 1/1 Running 0 5d15h
user-v1-8cc9f4fb5-6l7mz 1/1 Running 0 5d15h
user-v1-8cc9f4fb5-zqj2l 1/1 Running 0 5d15h
[root@k8s-master deployment]# kubectl describe pods http-probe
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 24s default-scheduler Successfully assigned default/http-probe to k8s-node
Normal Pulled 24s kubelet Container image "http-probe:1.0.0" already present on machine
Normal Created 24s kubelet Created container http-probe
Normal Started 24s kubelet Started container http-probe
Warning Unhealthy 0s (x3 over 10s) kubelet Liveness probe failed: HTTP probe failed with statuscode: 500
Normal Killing 0s kubelet Container http-probe failed liveness probe, will be restarted
The container will be restarted.
V. Conclusion
This article covered the principles and implementation of the three kinds of service probes;
the next one covers building a private image registry.