I'm running an EKS cluster with a Fargate profile. I checked the node status with kubectl describe node,
and it shows disk pressure:
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 12 Jul 2022 03:10:33 +0000   Wed, 29 Jun 2022 13:21:17 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     True    Tue, 12 Jul 2022 03:10:33 +0000   Wed, 06 Jul 2022 19:46:54 +0000   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure      False   Tue, 12 Jul 2022 03:10:33 +0000   Wed, 29 Jun 2022 13:21:17 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 12 Jul 2022 03:10:33 +0000   Wed, 29 Jun 2022 13:21:27 +0000   KubeletReady                 kubelet is posting ready status
There is also a failed garbage-collection event:
Events:
  Type     Reason                 Age                     From     Message
  ----     ------                 ----                    ----     -------
  Warning  FreeDiskSpaceFailed    11m (x844 over 2d22h)   kubelet  failed to garbage collect required amount of images. Wanted to free 6314505830 bytes, but freed 0 bytes
  Warning  EvictionThresholdMet   65s (x45728 over 5d7h)  kubelet  Attempting to reclaim ephemeral-storage
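I think the disk is filling up quickly because of application logs. The application writes to stdout, which, according to the AWS documentation, is in turn written to log files by the container agent. I use the built-in Fluent Bit on Fargate to push the application logs to an OpenSearch cluster.
However, it looks like the EKS cluster is not deleting the old log files created by the container agent.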
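For reference, the built-in Fluent Bit on EKS Fargate is configured through a ConfigMap named `aws-logging` in a namespace named `aws-observability`, and it ships what containers write to stdout/stderr, not files the application writes on its own. The following is a minimal sketch of an OpenSearch output, assuming the Fluent Bit `es` output plugin with AWS request signing; the domain endpoint, region, and index name are placeholders, not values taken from this cluster.

```yaml
# Namespace watched by the Fargate log router; the label is required.
kind: Namespace
apiVersion: v1
metadata:
  name: aws-observability
  labels:
    aws-observability: enabled
---
# Log router configuration; the ConfigMap must be named aws-logging.
kind: ConfigMap
apiVersion: v1
metadata:
  name: aws-logging
  namespace: aws-observability
data:
  # Placeholder endpoint, region and index: replace with the real domain values.
  output.conf: |
    [OUTPUT]
        Name        es
        Match       *
        Host        search-my-domain.us-west-2.es.amazonaws.com
        Port        443
        Index       my-apis
        AWS_Auth    On
        AWS_Region  us-west-2
        tls         On
```

The Fargate pod execution role also needs permission to write to the OpenSearch domain; that IAM setup is not shown here.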
I wanted to SSH into the Fargate node to debug the problem further, but according to support, SSH access to Fargate nodes is not possible.
What can be done to remove disk pressure from Fargate nodes?
As suggested in the answer, I'm running logrotate in a sidecar. However, according to the logrotate container's logs, it cannot find the directory:
rotating pattern: /var/log/containers/*.log
52428800 bytes (5 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/containers/*.log
log /var/log/containers/*.log does not exist -- skipping
reading config file /etc/logrotate.conf
Reading state from file: /var/lib/logrotate.status
Allocating hash table for state file, size 64 entries
Creating new state
The YAML file is:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-apis
  namespace: kube-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: my-apis
          image: 111111xxxxx.dkr.ecr.us-west-2.amazonaws.com/my-apis:1.0.3
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: "1000m"
              memory: "1200Mi"
            requests:
              cpu: "1000m"
              memory: "1200Mi"
          readinessProbe:
            httpGet:
              path: "/ping"
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 2
          livenessProbe:
            httpGet:
              path: "/ping"
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 5
        - name: logrotate
          image: realz/logrotate
          volumeMounts:
            - mountPath: /var/log/containers
              name: my-app-logs
          env:
            - name: CRON_EXPR
              value: "*/5 * * * *"
            - name: LOGROTATE_LOGFILES
              value: "/var/log/containers/*.log"
            - name: LOGROTATE_FILESIZE
              value: "50M"
            - name: LOGROTATE_FILENUM
              value: "5"
      volumes:
        - name: my-app-logs
          emptyDir: {}
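The sidecar finds nothing because the `my-app-logs` emptyDir is mounted only in the `logrotate` container: an emptyDir starts out empty, and no container writes into it. The files the container agent creates from stdout live on the Fargate node, outside the pod, and any files the app writes itself stay in the `my-apis` container's own filesystem. For a sidecar to rotate the app's files, both containers have to mount the same volume and the app has to write its log files into it. A sketch of that wiring, assuming the app's file appender writes to `/app/logs` (a hypothetical path); only the changed part of `spec.template.spec` is shown.

```yaml
# Fragment of spec.template.spec (not a standalone manifest).
# /app/logs is a hypothetical directory the app's file appender writes to.
containers:
  - name: my-apis
    image: 111111xxxxx.dkr.ecr.us-west-2.amazonaws.com/my-apis:1.0.3
    volumeMounts:
      # The app writes its log files into the shared volume.
      - mountPath: /app/logs
        name: my-app-logs
  - name: logrotate
    image: realz/logrotate
    volumeMounts:
      # Same emptyDir, so the sidecar sees the same files.
      - mountPath: /app/logs
        name: my-app-logs
    env:
      - name: CRON_EXPR
        value: "*/5 * * * *"
      - name: LOGROTATE_LOGFILES
        value: "/app/logs/*.log"
      - name: LOGROTATE_FILESIZE
        value: "50M"
      - name: LOGROTATE_FILENUM
        value: "5"
volumes:
  - name: my-app-logs
    emptyDir: {}
```

The emptyDir still counts against the pod's ephemeral storage, so rotation only bounds how large it grows. Whether logrotate can safely rotate a file the app keeps open (for example with `copytruncate`) depends on how the `realz/logrotate` image builds its configuration, which is not shown here.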
What can be done to remove disk pressure from fargate nodes?
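There is no known configuration that makes Fargate automatically clean up a specific log location. You can run logrotate as a sidecar; there are plenty of options for that.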
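I quickly found the cause of the disk filling up: the logging library logback
was writing logs to both a file and the console, and the rolling policy in logback kept a large number of log files for a long time. Removing the appender that writes to a file from the logback configuration fixed the issue.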
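If the logback configuration is baked into the image, one way to apply this fix without rebuilding is to mount a console-only `logback.xml` from a ConfigMap and point logback at it with the `logback.configurationFile` system property. This is a sketch under assumptions: the app is a JVM process that honors `JAVA_TOOL_OPTIONS`, and the ConfigMap name `my-apis-logback` and mount path `/etc/logback` are hypothetical.

```yaml
# Console-only logback configuration; my-apis-logback is a hypothetical name.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-apis-logback
  namespace: kube-system   # same namespace as the Deployment
data:
  logback.xml: |
    <configuration>
      <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
          <pattern>%d{ISO8601} %-5level [%thread] %logger{36} - %msg%n</pattern>
        </encoder>
      </appender>
      <!-- No file appender: logs go only to stdout. -->
      <root level="INFO">
        <appender-ref ref="STDOUT" />
      </root>
    </configuration>
---
# Fragment of the Deployment's spec.template.spec (not a standalone manifest).
containers:
  - name: my-apis
    image: 111111xxxxx.dkr.ecr.us-west-2.amazonaws.com/my-apis:1.0.3
    env:
      # Tells logback which configuration file to load at JVM startup.
      - name: JAVA_TOOL_OPTIONS
        value: "-Dlogback.configurationFile=/etc/logback/logback.xml"
    volumeMounts:
      - name: logback-config
        mountPath: /etc/logback
        readOnly: true
volumes:
  - name: logback-config
    configMap:
      name: my-apis-logback
```

Setting `JAVA_TOOL_OPTIONS` here replaces any value the image already sets. For a Spring Boot application, Spring Boot's own `logging.config` property may be needed instead of the logback system property.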
I also found that the STDOUT log files written by the container agent
are rotated at a file size of 10 MB, with at most 5 files kept,
so they do not cause disk pressure.