How to resolve disk pressure on Fargate nodes

I am running an EKS cluster with a Fargate profile. I checked the node status with kubectl describe node, and it shows disk pressure:

Conditions:
Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
----             ------  -----------------                 ------------------                ------                       -------
MemoryPressure   False   Tue, 12 Jul 2022 03:10:33 +0000   Wed, 29 Jun 2022 13:21:17 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
DiskPressure     True    Tue, 12 Jul 2022 03:10:33 +0000   Wed, 06 Jul 2022 19:46:54 +0000   KubeletHasDiskPressure       kubelet has disk pressure
PIDPressure      False   Tue, 12 Jul 2022 03:10:33 +0000   Wed, 29 Jun 2022 13:21:17 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
Ready            True    Tue, 12 Jul 2022 03:10:33 +0000   Wed, 29 Jun 2022 13:21:27 +0000   KubeletReady                 kubelet is posting ready status

There is also a failed garbage collection event:

Events:
Type     Reason                Age                     From     Message
----     ------                ----                    ----     -------
Warning  FreeDiskSpaceFailed   11m (x844 over 2d22h)   kubelet  failed to garbage collect required amount of images. Wanted to free 6314505830 bytes, but freed 0 bytes
Warning  EvictionThresholdMet  65s (x45728 over 5d7h)  kubelet  Attempting to reclaim ephemeral-storage

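I believe the disk is filling up quickly because of application logs. The application writes to stdout, which, according to the AWS documentation, is in turn written to log files by the container agent. I am using Fargate's built-in Fluent Bit to push the application logs to an OpenSearch cluster.

But it looks like the EKS cluster is not deleting the old log files created by the container agent.

I wanted to SSH into the Fargate nodes to debug the issue further, but according to AWS support, SSH into Fargate nodes is not possible.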

What can be done to remove disk pressure from Fargate nodes?

As suggested in the answers, I am running logrotate in a sidecar. But according to the logs of the logrotate container, it cannot find the directory:

rotating pattern: /var/log/containers/*.log
52428800 bytes (5 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/containers/*.log
log /var/log/containers/*.log does not exist -- skipping
reading config file /etc/logrotate.conf
Reading state from file: /var/lib/logrotate.status
Allocating hash table for state file, size 64 entries
Creating new state

The YAML file is:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-apis
  namespace: kube-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: my-apis
          image: 111111xxxxx.dkr.ecr.us-west-2.amazonaws.com/my-apis:1.0.3
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: "1000m"
              memory: "1200Mi"
            requests:
              cpu: "1000m"
              memory: "1200Mi"
          readinessProbe:
            httpGet:
              path: "/ping"
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 2
          livenessProbe:
            httpGet:
              path: "/ping"
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 5
        - name: logrotate
          image: realz/logrotate
          volumeMounts:
            - mountPath: /var/log/containers
              name: my-app-logs
          env:
            - name: CRON_EXPR
              value: "*/5 * * * *"
            - name: LOGROTATE_LOGFILES
              value: "/var/log/containers/*.log"
            - name: LOGROTATE_FILESIZE
              value: "50M"
            - name: LOGROTATE_FILENUM
              value: "5"
      volumes:
        - name: my-app-logs
          emptyDir: {}

What can be done to remove disk pressure from fargate nodes?

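There is no known configuration that makes Fargate automatically clean up a specific log location. You can run logrotate as a sidecar. There are plenty of choices here.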
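A rough sketch of the sidecar approach follows. In the manifest from the question, the emptyDir volume is mounted only in the logrotate container, so that directory is empty from the sidecar's point of view, which likely explains the "does not exist -- skipping" message. A sidecar can only rotate files on a volume it shares with the application container. The path /app/logs below is a hypothetical example and not taken from the question; the image and environment variables are the ones already used there.

```yaml
# Sketch only: the app container and the logrotate sidecar mount the SAME
# emptyDir volume, so the sidecar can see the files the app writes.
# /app/logs is an assumed path; replace it with the directory your
# application actually writes its log files to.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-apis
  namespace: kube-system   # as in the question
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: my-apis
          image: 111111xxxxx.dkr.ecr.us-west-2.amazonaws.com/my-apis:1.0.3
          volumeMounts:
            - name: my-app-logs
              mountPath: /app/logs      # assumed log directory of the app
        - name: logrotate
          image: realz/logrotate
          volumeMounts:
            - name: my-app-logs
              mountPath: /app/logs      # same volume, same path
          env:
            - name: CRON_EXPR
              value: "*/5 * * * *"
            - name: LOGROTATE_LOGFILES
              value: "/app/logs/*.log"  # must point inside the shared volume
            - name: LOGROTATE_FILESIZE
              value: "50M"
            - name: LOGROTATE_FILENUM
              value: "5"
      volumes:
        - name: my-app-logs
          emptyDir: {}
```

Note that this only helps with files the application itself writes to the shared volume. The stdout log files managed by the node are not visible from inside the pod.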

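I found the cause of the disk filling up quickly. The logging library, logback, was writing logs to both files and the console, and the log rotation policy in logback was retaining a large number of log files for long periods. Removing the appender in the logback config that writes to file fixed the issue.

I also found that the stdout logs written to files by the container agent are rotated, with a file size of 10 MB and a maximum of 5 files. So they cannot be the cause of the disk pressure.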
