dgraph Helm deployment on gcloud/Kubernetes no longer works: "While applying proposal: Invalid address"



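Task:

I am trying to deploy dgraph (one Zero and one Alpha) to Kubernetes (Google Cloud) via a Helm chart.

Problem: it used to work and now it no longer does. I cannot see what has changed. The specific errors are best described by the logs below. Essentially, it looks like a gRPC/connection issue. It first appeared after I set the gcloud cluster size (number of nodes) to 0 and then back to 4 a few days later, but I find it hard to believe that this is the cause. I am not very familiar with this kind of problem, and the person who originally set all of this up is no longer available.

I previously posted on the Dgraph forum, but since I am not sure whether this is a dgraph issue, I am posting here to reach a wider audience.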

What I have tried so far:

Deleting the release via Helm


helm delete --purge dgraph 

and recreating it

helm install --wait --name dgraph ./charts/dgraph/

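I also tried setting the gcloud cluster size to 0 and then back to 4 again. It made no difference. I looked at the configuration and it seems fine to me. I compared it with compose files I found in various places, including the dgraph repo.

I have another docker-compose file for testing locally. It is unrelated to the cloud deployment and works fine (not included in this post).

The logs and the chart specs are below.

Any help is much appreciated!

Thanks!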

Aurel

Zero logs:

I1204 21:27:51.539624       1 run.go:90] Setting up grpc listener at: 0.0.0.0:5080
I1204 21:27:51.539833       1 run.go:90] Setting up http listener at: 0.0.0.0:6080
badger2018/12/04 21:27:51 INFO: Replaying file id: 0 at offset: 1544608
badger2018/12/04 21:27:51 INFO: Replay took: 15.256µs
I1204 21:27:51.888823       1 node.go:152] Setting raft.Config to: &{ID:1 peers:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc00015de10 Applied:0 MaxSizePerMsg:1048576 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1d112c0}
I1204 21:27:51.892352       1 node.go:282] Found hardstate: {Term:27 Vote:1 Commit:6525 XXX_unrecognized:[]}
I1204 21:27:51.897997       1 node.go:291] Group 0 found 6526 entries
I1204 21:27:51.898218       1 raft.go:371] Restarting node for dgraphzero
I1204 21:27:51.898497       1 node.go:84] 1 became follower at term 27
I1204 21:27:51.898744       1 node.go:84] newRaft 1 [peers: [], term: 27, commit: 6525, applied: 0, lastindex: 6525, lastterm: 27]
I1204 21:27:51.902606       1 run.go:229] Running Dgraph Zero...
I1204 21:27:51.919236       1 node.go:174] Setting conf state to nodes:1
I1204 21:27:51.919599       1 raft.go:547] Done applying conf change at 1
E1204 21:27:51.921113       1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:7080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.11.6:7080: connect: connection refused"
I1204 21:27:51.921902       1 pool.go:118] CONNECTED to dgraph-0.dgraph.default.svc.cluster.local:7080
E1204 21:27:51.921301       1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:7080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.11.6:7080: connect: connection refused"
I1204 21:27:51.923212       1 raft.go:272] Removing tablet for attr: [value_date], gid: [1]
E1204 21:27:51.923984       1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924075       1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924149       1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924210       1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924265       1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924308       1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924366       1 raft.go:552] While applying proposal: Invalid address
...
E1204 21:27:52.207869       1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:52.207873       1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:52.205514       1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.11.6:9080: connect: connection refused"
I1204 21:27:52.207897       1 pool.go:118] CONNECTED to dgraph-0.dgraph.default.svc.cluster.local:9080
E1204 21:27:52.205566       1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.11.6:9080: connect: connection refused"
I1204 21:27:52.380095       1 zero.go:375] Got connection request: id:6062 addr:"dgraph-0.dgraph.default.svc.cluster.local:7080"
I1204 21:27:52.380886       1 zero.go:484] Connected: id:6062 addr:"dgraph-0.dgraph.default.svc.cluster.local:7080"
I1204 21:27:52.392898       1 node.go:84] 1 no leader at term 27; dropping index reading msg
I1204 21:27:54.480961       1 node.go:84] 1 is starting a new election at term 27
I1204 21:27:54.481005       1 node.go:84] 1 became pre-candidate at term 27
I1204 21:27:54.481017       1 node.go:84] 1 received MsgPreVoteResp from 1 at term 27
I1204 21:27:54.481102       1 node.go:84] 1 became candidate at term 28
I1204 21:27:54.481112       1 node.go:84] 1 received MsgVoteResp from 1 at term 28
I1204 21:27:54.481218       1 node.go:84] 1 became leader at term 28
I1204 21:27:54.481232       1 node.go:84] raft.node: 1 elected leader 1 at term 28
E1204 21:27:54.483865       1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:54.483928       1 zero.go:549] Error while applying proposal in update stream Invalid address
E1204 21:27:54.716975       1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:54.717231       1 zero.go:549] Error while applying proposal in update stream Invalid address
W1204 21:27:55.393083       1 node.go:551] [1] Read index context timed out
E1204 21:28:02.208789       1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unimplemented desc = unknown service pb.Raft
E1204 21:28:02.209086       1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unimplemented desc = unknown service pb.Raft
E1204 21:28:21.892166       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:28:51.893023       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:29:21.892887       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:29:51.892775       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:30:21.892814       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:30:51.892810       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:31:21.892858       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:31:51.892803       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:32:21.892885       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:32:51.892669       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:32:52.417618       1 raft.go:552] While applying proposal: Invalid address
E1204 21:32:52.417962       1 zero.go:549] Error while applying proposal in update stream Invalid address
E1204 21:33:21.892766       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:33:51.892865       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:34:21.892804       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:34:51.892788       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:35:21.892866       1 oracle.go:425] No healthy connection found to leader of group 2
I1204 21:35:51.892321       1 tablet.go:189]
Groups sorted by size: [{gid:2 size:0} {gid:1 size:80673}]
I1204 21:35:51.892359       1 tablet.go:194] size_diff 80673
I1204 21:35:51.892391       1 tablet.go:83] Going to move predicate: [_predicate_], size: [32 kB] from group 1 to 2
E1204 21:35:51.893181       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:35:51.917329       1 tablet.go:231] Got error during move: While calling MovePredicate: rpc error: code = Unknown desc = Group id doesn't match, received request for 1, my gid: 2
E1204 21:35:51.919971       1 tablet.go:70] Error while trying to move predicate _predicate_ from 1 to 2: While calling MovePredicate: rpc error: code = Unknown desc = Group id doesn't match, received request for 1, my gid: 2
E1204 21:36:21.892883       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:36:51.892766       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:37:21.892853       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:37:51.892927       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:37:52.420512       1 raft.go:552] While applying proposal: Invalid address
E1204 21:37:52.420817       1 zero.go:549] Error while applying proposal in update stream Invalid address
E1204 21:38:21.892801       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:38:51.892913       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:39:21.892727       1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:39:51.892272       1 oracle.go:425] No healthy connection found to leader of group 2
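
Because the same few errors repeat many times in a log like the one above, it can help to collapse them before reading. The following is a minimal sketch, not part of the original post: `summarize_errors` is a hypothetical helper name, and it assumes the glog-style line layout seen above (severity letter and date, time, thread id, `file:line]`, message).

```shell
#!/usr/bin/env sh
# summarize_errors (hypothetical helper): reads glog-style lines on stdin,
# keeps only E-level lines, strips the timestamp/thread/source prefix,
# and prints each distinct message with its count, most frequent first.
summarize_errors() {
  grep '^E[0-9]' \
    | sed -E 's/^E[0-9]{4} [0-9:.]+ +[0-9]+ [^]]+\] //' \
    | sort | uniq -c | sort -rn
}
```

Assuming the pod and container names from the chart in this post (`dgraph-0`, container `zero`), it could be used as `kubectl logs dgraph-0 -c zero | summarize_errors`.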

Alpha logs:

++ hostname -f
+ dgraph alpha --my=dgraph-0.dgraph.default.svc.cluster.local:7080 --lru_mb 2048 --zero dgraph-0.dgraph.default.svc.cluster.local:5080
I1204 21:27:52.274206       1 init.go:80]
Dgraph version   : v1.0.10
Commit SHA-1     : 8b801bd7
Commit timestamp : 2018-11-05 17:52:33 -0800
Branch           : HEAD
For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph     , visit https://discuss.dgraph.io.
To say hi to the community       , visit https://dgraph.slack.com.
Licensed under Apache 2.0. Copyright 2015-2018 Dgraph Labs, Inc.

I1204 21:27:52.295997       1 server.go:115] Setting Badger table load option: mmap
I1204 21:27:52.296163       1 server.go:127] Setting Badger value log load option: mmap
I1204 21:27:52.296229       1 server.go:155] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:true TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
badger2018/12/04 21:27:52 INFO: Replaying file id: 0 at offset: 12977
badger2018/12/04 21:27:52 INFO: Replay took: 10.567µs
I1204 21:27:52.322077       1 server.go:115] Setting Badger table load option: mmap
I1204 21:27:52.322103       1 server.go:127] Setting Badger value log load option: mmap
I1204 21:27:52.322108       1 server.go:169] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
badger2018/12/04 21:27:52 INFO: Replaying file id: 0 at offset: 0
badger2018/12/04 21:27:52 INFO: Replay took: 18.232µs
I1204 21:27:52.376726       1 run.go:338] gRPC server started.  Listening on port 9080
I1204 21:27:52.376848       1 run.go:339] HTTP server started.  Listening on port 8080
I1204 21:27:52.377184       1 groups.go:92] Current Raft Id: 6062
I1204 21:27:52.377898       1 worker.go:80] Worker listening at address: [::]:7080
I1204 21:27:52.379669       1 pool.go:118] CONNECTED to dgraph-0.dgraph.default.svc.cluster.local:5080
I1204 21:27:52.381207       1 groups.go:119] Connected to group zero. Assigned group: 0
E1204 21:27:52.382305       1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unimplemented desc = unknown service pb.Raft
I1204 21:27:52.382655       1 pool.go:118] CONNECTED to dgraph-0.dgraph.default.svc.cluster.local:9080
I1204 21:27:52.390886       1 draft.go:74] Node ID: 6062 with GroupID: 2
I1204 21:27:52.391199       1 node.go:152] Setting raft.Config to: &{ID:6062 peers:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc00008fe10 Applied:22 MaxSizePerMsg:1048576 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1d112c0}
I1204 21:27:52.391360       1 node.go:271] Found Snapshot.Metadata: {ConfState:{Nodes:[6062] XXX_unrecognized:[]} Index:22 Term:11 XXX_unrecognized:[]}
I1204 21:27:52.391445       1 node.go:282] Found hardstate: {Term:12 Vote:6062 Commit:25 XXX_unrecognized:[]}
I1204 21:27:52.391534       1 node.go:291] Group 2 found 4 entries
I1204 21:27:52.391574       1 draft.go:1047] Restarting node for group: 2
I1204 21:27:52.391638       1 node.go:174] Setting conf state to nodes:6062
I1204 21:27:52.391909       1 node.go:84] 17ae became follower at term 12
I1204 21:27:52.392015       1 node.go:84] newRaft 17ae [peers: [17ae], term: 12, commit: 25, applied: 22, lastindex: 25, lastterm: 12]
I1204 21:27:52.392285       1 groups.go:519] Got address of a Zero server: dgraph-0.dgraph.default.svc.cluster.local:5080
I1204 21:27:52.394939       1 draft.go:313] Skipping snapshot at 22, because found one at 22
I1204 21:27:54.712797       1 node.go:84] 17ae is starting a new election at term 12
I1204 21:27:54.713220       1 node.go:84] 17ae became pre-candidate at term 12
I1204 21:27:54.713303       1 node.go:84] 17ae received MsgPreVoteResp from 17ae at term 12
I1204 21:27:54.713474       1 node.go:84] 17ae became candidate at term 13
I1204 21:27:54.713564       1 node.go:84] 17ae received MsgVoteResp from 17ae at term 13
I1204 21:27:54.713821       1 node.go:84] 17ae became leader at term 13
I1204 21:27:54.713954       1 node.go:84] raft.node: 17ae elected leader 17ae at term 13
I1204 21:27:55.392399       1 groups.go:718] Leader idx=6062 of group=2 is connecting to Zero for txn updates
W1204 21:27:55.392803       1 groups.go:723] WARNING: We don't have address of any dgraphzero leader.
I1204 21:27:56.393134       1 groups.go:718] Leader idx=6062 of group=2 is connecting to Zero for txn updates
E1204 21:27:56.397090       1 draft.go:467] Lastcommit 10337 > current 10002. This would cause some commits to be lost.
E1204 21:28:02.383404       1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unimplemented desc = unknown service pb.Raft

The chart specs are as follows:

statefulset.yml:

# This StatefulSet runs 1 pod with one Zero, one Server
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph
spec:
  serviceName: "dgraph"
  replicas: 1
  selector:
    matchLabels:
      app: dgraph
  template:
    metadata:
      labels:
        app: dgraph
    spec:
      {{- if .Values.server.initData.image }}
      initContainers:
        - name: init-schema
          image: {{ .Values.server.initData.image }}
          command: ['curl', '-X', 'POST', '-H', 'X-Dgraph-CommitNow:true', '--data-binary', '@graph/schema.txt', '{{ .Values.service.name }}.default.svc.cluster.local/alter']
        - name: init-data
          image: {{ .Values.server.initData.image }}
          command: ['curl', '-X', 'POST', '-H', 'X-Dgraph-CommitNow:true', '--data-binary', '@graph/data.txt', '{{ .Values.service.name }}.default.svc.cluster.local/mutate']
      {{- end }}
      containers:
        - name: zero
          image: {{ template "dgraph.image" . }}
          imagePullPolicy: {{ .Values.image.pullPolicy | quote }}
          ports:
            - containerPort: {{ .Values.service.ports.zeroGrpc }}
              name: zero-grpc
            - containerPort: {{ .Values.service.ports.zeroHttp }}
              name: zero-http
          volumeMounts:
            - name: datadir
              mountPath: /dgraph
          command:
            - bash
            - "-c"
            - |
              set -ex
              dgraph zero --my=$(hostname -f):{{ .Values.service.ports.zeroGrpc }}
        - name: server
          image: {{ template "dgraph.image" . }}
          imagePullPolicy: {{ .Values.image.pullPolicy | quote }}
          ports:
            - containerPort: {{ .Values.service.ports.serverHttp }}
              name: server-http
            - containerPort: {{ .Values.service.ports.serverGrpc }}
              name: server-grpc
          volumeMounts:
            - name: datadir
              mountPath: /dgraph
          command:
            - bash
            - "-c"
            - |
              set -ex
              dgraph alpha --my=$(hostname -f):{{ .Values.server.port }} --lru_mb {{ .Values.server.lruSizeMB }} --zero {{ .Values.server.zeroDns }}:{{ .Values.service.ports.zeroGrpc }}
      terminationGracePeriodSeconds: 60
      volumes:
        - name: datadir
          persistentVolumeClaim:
            claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
    - metadata:
        name: datadir
        annotations:
          volume.alpha.kubernetes.io/storage-class: anything
      spec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: {{ .Values.storage.size }}

values.yml:

image:
  registry: docker.io
  repository: dgraph/dgraph
  tag: latest
  pullPolicy: Always
service:
  name: dgraph-service
  ports:
    zeroGrpc: 5080
    zeroHttp: 6080
    serverHttp: 8080
    serverGrpc: 9080
server:
  # Estimate of the LRU cache size in MB. It's recommended to set lru_mb to one-third the available RAM.
  lruSizeMB: 2048
  zeroDns: dgraph-0.dgraph.default.svc.cluster.local
  port: 7080
  initData:
    image: ""
    #image: "registry.gitlab.com/organisation/project/backend:latest"
storage:
  size: 5Gi

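I solved the problem. It was indeed a dgraph issue. I had overlooked the fact that a persistent volume claim is used for storage, so deleting and reinstalling the containers did not fix anything: the old state stayed on the volume. I wiped the storage volume (that is, I deleted the p, w and zw folders that dgraph had created), and voilà, everything works again!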
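
A possible alternative to deleting the folders by hand is to remove the claim itself, since `helm delete --purge` removes the StatefulSet but not the claims created from its `volumeClaimTemplates`. The following is only a sketch under assumptions, not what was actually run here: `run`, `pvc_name` and `reset_dgraph_storage` are hypothetical helper names, and the claim name `datadir-dgraph-0` is derived from the usual `<template>-<statefulset>-<ordinal>` naming, so it should be checked with `kubectl get pvc` first. Deleting the claim discards all stored data, which is why the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# run (hypothetical helper): echo the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Claims created from volumeClaimTemplates are named
# <template name>-<statefulset name>-<pod ordinal>.
pvc_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Assumes the names used in this post: release "dgraph", StatefulSet "dgraph",
# claim template "datadir", a single replica (ordinal 0).
reset_dgraph_storage() {
  run helm delete --purge "$1"                         # removes the pod, keeps the claim
  run kubectl delete pvc "$(pvc_name datadir dgraph 0)"  # drops the old p, w and zw data
  run helm install --wait --name "$1" ./charts/dgraph/
}
```

Calling `reset_dgraph_storage dgraph` in dry-run mode prints the three commands in order, so they can be reviewed before anything is deleted.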

The thread on the dgraph.io forum can be found here: https://discuss.dgraph.io/t/dgraph-deployment-via-helm-not-working-anymore/3692/2?u=aurel
