Intermittent problems connecting with cqlsh to nodes running on Kubernetes



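I'm having problems with what looks like authentication when I try to log in with cqlsh. I'm running a 3-node cluster, with the nodes spread across three different physical nodes in Kubernetes. It ran like a charm for the past month, but about a week ago it started to degrade. Below you can see what happens when I try to log in to each node, along with the responses. (FYI, it isn't always node 0 that has the problem. I've also seen node 1 fail in the same way while node 0 worked fine.) The problem looks similar to "Cassandra PasswordAuthenticator causing timeouts", but the suggestions there didn't help.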

I have no name!@cassandra-0:/$ cqlsh -u cassandra -p abc123
Connection error: ('Unable to connect to any servers', {'127.0.0.1:9042': OperationTimedOut('errors=Timed out creating connection (5 seconds), last_host=None',)}) 
I have no name!@cassandra-1:/$ cqlsh -u cassandra -p abc123
Python 2.7 support is deprecated. Install Python 3.6+ or set CQLSH_NO_WARN_PY2 to suppress this message.
Connected to mycluster at 127.0.0.1:9042
[cqlsh 6.0.0 | Cassandra 4.0.0 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.
cassandra@cqlsh>
I have no name!@cassandra-2:/$ cqlsh -u cassandra -p abc123
Python 2.7 support is deprecated. Install Python 3.6+ or set CQLSH_NO_WARN_PY2 to suppress this message.
Connected to mycluster at 127.0.0.1:9042
[cqlsh 6.0.0 | Cassandra 4.0.0 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.
cassandra@cqlsh>

Here is the log from node 0:

INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,090 MessagingMetrics.java:206 - HINT_RSP messages were dropped in last 5000 ms: 0 internal and 4 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 8582 ms
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,090 MessagingMetrics.java:206 - HINT_REQ messages were dropped in last 5000 ms: 0 internal and 2 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 8582 ms
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,090 StatusLogger.java:65 - Pool Name                       Active   Pending      Completed   Blocked  All Time Blocked
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,090 StatusLogger.java:69 - ReadStage                            0         0          17086         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,090 StatusLogger.java:69 - CompactionExecutor                   0         0         195522         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,090 StatusLogger.java:69 - MutationStage                        0         0          88515         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - MemtableReclaimMemory                0         0            210         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - PendingRangeCalculator               0         0             13         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - GossipStage                          0         0         660680         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - SecondaryIndexManagement             0         0              0         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - HintsDispatcher                      2         0          37776         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - Repair-Task                          0         0              5         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - Native-Transport-Requests            0         0          17111         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - RequestResponseStage                 0         0             20         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - MemtableFlushWriter                  0         0            210         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - PerDiskMemtableFlushWriter_0         0         0            210         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - MemtablePostFlush                    0         0            255         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - Sampler                              0         0              0         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - ValidationExecutor                   0         0             41         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - ViewBuildExecutor                    0         0              0         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,091 StatusLogger.java:69 - InternalResponseStage                0         0         151676         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,092 StatusLogger.java:69 - AntiEntropyStage                     0         0            249         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,092 StatusLogger.java:69 - CacheCleanupExecutor                 0         0              0         0                 0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,092 StatusLogger.java:79 - CompactionManager                 0         0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,092 StatusLogger.java:91 - MessagingService                n/a       0/0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,092 StatusLogger.java:101 - Cache Type                     Size                 Capacity               KeysToSave
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,092 StatusLogger.java:103 - KeyCache                      11452                 75497472                      all
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,092 StatusLogger.java:109 - RowCache                          0                        0                      all
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,092 StatusLogger.java:116 - Table                       Memtable ops,data
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,092 StatusLogger.java:119 - system_schema.columns                     0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system_schema.types                       0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system_schema.indexes                     0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system_schema.keyspaces                   0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system_schema.dropped_columns                 0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system_schema.aggregates                  0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system_schema.triggers                    0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system_schema.tables                      0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system_schema.views                       0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system_schema.functions                   0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system.compaction_history               3,634
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system.IndexInfo                          0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system.repairs                            0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system.size_estimates           49344,1087744
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system.table_estimates          98688,2339296
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system.paxos                              0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system.built_views                        0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system.peer_events                        0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system.peers_v2                           0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system.peers                              0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system.peer_events_v2                     0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system.batches                            0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,093 StatusLogger.java:119 - system.transferred_ranges                 0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,094 StatusLogger.java:119 - system.transferred_ranges_v2                 0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,094 StatusLogger.java:119 - system.view_builds_in_progress                 0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,094 StatusLogger.java:119 - system.local                              0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,094 StatusLogger.java:119 - system.sstable_activity              229,2816
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,094 StatusLogger.java:119 - system.available_ranges_v2                 0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,094 StatusLogger.java:119 - system.available_ranges                   0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,094 StatusLogger.java:119 - system.prepared_statements                 0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,095 StatusLogger.java:119 - system_auth.roles                         0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,095 StatusLogger.java:119 - system_auth.role_members                  0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,095 StatusLogger.java:119 - system_auth.resource_role_permissons_index                 0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,095 StatusLogger.java:119 - system_auth.network_permissions                 0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,095 StatusLogger.java:119 - system_auth.role_permissions                 0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,100 StatusLogger.java:119 - system_distributed.parent_repair_history           10,418916
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,100 StatusLogger.java:119 - system_distributed.repair_history         29495,15967
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,100 StatusLogger.java:119 - system_distributed.view_build_status                 0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,101 StatusLogger.java:119 - system_traces.sessions                    0,0
INFO  [ScheduledTasks:1] 2022-10-26 17:39:00,101 StatusLogger.java:119 - system_traces.events                      0,0

Keyspace replication:

cassandra@cqlsh> describe keyspace system_auth;
CREATE KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '3'} AND durable_writes = true;

nodetool status:

I have no name!@cassandra-prod-2:/$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns (effective)  Host ID                               Rack 
UN  10.233.92.33   3.63 MiB  256     100.0%            xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  rack1
UN  10.233.96.184  4.2 MiB   256     100.0%            xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  rack1
UN  10.233.90.48   3.24 MiB  256     100.0%            xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  rack1

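I tried nodetool repair, but without success.

Does anyone know what is going on?

I have tried restarting all the C* nodes, restarting the underlying nodes in the k8s cluster, and running nodetool repair, all without success.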

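The symptom you describe, where you sometimes can't connect to the cluster with cqlsh, is less an authentication problem and more a case of nodes being intermittently unresponsive.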

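The log entries you posted show that the node is dropping hint messages. If you recall, a coordinator (the node responsible for coordinating a write request by sending the mutation to all replicas) stores a "hint" when a replica does not acknowledge the write within CCD_ 1.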

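A "hint" contains the IP of the replica that missed the write plus the mutation payload. When the replica comes back online, the mutation is "replayed" from the hint to the replica (known as hinted handoff in Cassandra).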
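To make the hint mechanism concrete, here is a toy Python model of it. This is only a sketch of the idea, not how Cassandra implements it: the class names `Hint`, `Replica` and `Coordinator` are made up for illustration, and a replica that is "down" simply stands in for one that fails to acknowledge a write in time.

```python
from dataclasses import dataclass, field


@dataclass
class Hint:
    target_ip: str   # IP of the replica that missed the write
    mutation: dict   # the write payload to replay later


@dataclass
class Replica:
    ip: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def apply(self, mutation):
        """Apply a write and return True as the ack, or False if the node is down."""
        if not self.up:
            return False
        self.data.update(mutation)
        return True


class Coordinator:
    def __init__(self, replicas):
        self.replicas = {r.ip: r for r in replicas}
        self.hints = []

    def write(self, mutation):
        # Send the mutation to every replica; store a hint for each missing ack.
        for replica in self.replicas.values():
            if not replica.apply(mutation):
                self.hints.append(Hint(replica.ip, dict(mutation)))

    def replay_hints(self):
        # Replay stored hints; keep the ones whose target is still unreachable.
        remaining = []
        for hint in self.hints:
            if not self.replicas[hint.target_ip].apply(hint.mutation):
                remaining.append(hint)
        self.hints = remaining
```

With three replicas of which one is down, `write()` leaves one hint behind, and calling `replay_hints()` after that replica is back up delivers the missed mutation and empties the hint list.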

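Dropped messages are a symptom of a bigger problem. Nodes drop messages because they are overloaded, and dropping is a load-shedding mechanism for when they cannot take any more requests.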
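If you want to see how often this happens, you could pull the dropped-message counts out of the log. The Python below is a minimal sketch that assumes the lines look like the two MessagingMetrics entries in your post. The function names `parse_dropped` and `summarize` are just illustrative.

```python
import re

# Matches lines such as:
# "... - HINT_RSP messages were dropped in last 5000 ms: 0 internal and 4 cross node. ..."
DROPPED_RE = re.compile(
    r"(?P<verb>\w+) messages were dropped in last (?P<window_ms>\d+) ms: "
    r"(?P<internal>\d+) internal and (?P<cross_node>\d+) cross node\. "
    r"Mean internal dropped latency: (?P<internal_latency_ms>\d+) ms and "
    r"Mean cross-node dropped latency: (?P<cross_latency_ms>\d+) ms"
)


def parse_dropped(line):
    """Return a dict of fields for a dropped-message line, or None if it is not one."""
    match = DROPPED_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Everything except the message type is a number.
    return {k: (v if k == "verb" else int(v)) for k, v in fields.items()}


def summarize(lines):
    """Total dropped messages (internal plus cross-node) per message type."""
    totals = {}
    for line in lines:
        record = parse_dropped(line)
        if record is not None:
            dropped = record["internal"] + record["cross_node"]
            totals[record["verb"]] = totals.get(record["verb"], 0) + dropped
    return totals
```

Feeding it the lines of a node's log (for example `summarize(open(path))`, where `path` is wherever your pod writes its system log) gives a per-message-type total you can compare across nodes and over time.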

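Look for GC pauses, which are another symptom of overload. If the pods are resource-constrained, consider allocating more RAM so that you can increase the heap size. You can also increase the capacity of the cluster by adding more nodes (pods). Cheers!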
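As a starting point for finding GC pauses, something like the following Python sketch could scan a node's log. It rests on an assumption you should check against your own system.log: that GC activity is reported by `GCInspector` in the form "<collector name> GC in <N>ms". The function name `find_gc_pauses` and the 200 ms default threshold are arbitrary choices for illustration.

```python
import re

# Assumed line shape: "... GCInspector.java:<line> - <collector name> GC in <N>ms. ..."
GC_RE = re.compile(r"GCInspector\.java:\d+ - (?P<collector>.+?) GC in (?P<ms>\d+)ms")


def find_gc_pauses(lines, min_ms=200):
    """Return (collector, pause_ms) pairs for GC log lines at or above min_ms."""
    pauses = []
    for line in lines:
        match = GC_RE.search(line)
        if match is not None:
            pause_ms = int(match.group("ms"))
            if pause_ms >= min_ms:
                pauses.append((match.group("collector"), pause_ms))
    return pauses
```

Long or frequent pauses that line up in time with the dropped-message entries would support the theory that the node is short on heap.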
