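Task:
I'm trying to deploy Dgraph (one Zero and one Alpha) to Kubernetes (Google Cloud) via a Helm chart.
Problem: it used to work, but it doesn't anymore, and I can't see what changed. The exact error is best described by the logs below; essentially, it looks like a gRPC/connection issue. It first appeared after I scaled the gcloud cluster down to 0 nodes and, a few days later, back up to 4, but I find it hard to believe that this is the cause. I'm not very familiar with this kind of problem, and the person who set all of this up has since left.
I posted this on the Dgraph forum earlier, but since I'm not sure whether this is a Dgraph problem, I'm posting here to reach a wider audience.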
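What I've tried so far:
Deleting the release via Helm
helm delete --purge dgraph
and recreating it
helm install --wait --name dgraph ./charts/dgraph/
I also tried scaling the gcloud cluster down to 0 and back up to 4. No difference. I've gone through the configuration and it looks fine to me. I compared it with the compose files I found in various places, including the Dgraph repo.
I have a separate Docker Compose file for testing locally. It's unrelated to the cloud deployment and works fine (not included in this post).
You can find the logs and the chart spec below.
Any help is much appreciated!
Thanks!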
Aurel
Zero log:
I1204 21:27:51.539624 1 run.go:90] Setting up grpc listener at: 0.0.0.0:5080
I1204 21:27:51.539833 1 run.go:90] Setting up http listener at: 0.0.0.0:6080
badger2018/12/04 21:27:51 INFO: Replaying file id: 0 at offset: 1544608
badger2018/12/04 21:27:51 INFO: Replay took: 15.256µs
I1204 21:27:51.888823 1 node.go:152] Setting raft.Config to: &{ID:1 peers:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc00015de10 Applied:0 MaxSizePerMsg:1048576 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1d112c0}
I1204 21:27:51.892352 1 node.go:282] Found hardstate: {Term:27 Vote:1 Commit:6525 XXX_unrecognized:[]}
I1204 21:27:51.897997 1 node.go:291] Group 0 found 6526 entries
I1204 21:27:51.898218 1 raft.go:371] Restarting node for dgraphzero
I1204 21:27:51.898497 1 node.go:84] 1 became follower at term 27
I1204 21:27:51.898744 1 node.go:84] newRaft 1 [peers: [], term: 27, commit: 6525, applied: 0, lastindex: 6525, lastterm: 27]
I1204 21:27:51.902606 1 run.go:229] Running Dgraph Zero...
I1204 21:27:51.919236 1 node.go:174] Setting conf state to nodes:1
I1204 21:27:51.919599 1 raft.go:547] Done applying conf change at 1
E1204 21:27:51.921113 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:7080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.11.6:7080: connect: connection refused"
I1204 21:27:51.921902 1 pool.go:118] CONNECTED to dgraph-0.dgraph.default.svc.cluster.local:7080
E1204 21:27:51.921301 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:7080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.11.6:7080: connect: connection refused"
I1204 21:27:51.923212 1 raft.go:272] Removing tablet for attr: [value_date], gid: [1]
E1204 21:27:51.923984 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924075 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924149 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924210 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924265 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924308 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:51.924366 1 raft.go:552] While applying proposal: Invalid address
...
E1204 21:27:52.207869 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:52.207873 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:52.205514 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.11.6:9080: connect: connection refused"
I1204 21:27:52.207897 1 pool.go:118] CONNECTED to dgraph-0.dgraph.default.svc.cluster.local:9080
E1204 21:27:52.205566 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.11.6:9080: connect: connection refused"
I1204 21:27:52.380095 1 zero.go:375] Got connection request: id:6062 addr:"dgraph-0.dgraph.default.svc.cluster.local:7080"
I1204 21:27:52.380886 1 zero.go:484] Connected: id:6062 addr:"dgraph-0.dgraph.default.svc.cluster.local:7080"
I1204 21:27:52.392898 1 node.go:84] 1 no leader at term 27; dropping index reading msg
I1204 21:27:54.480961 1 node.go:84] 1 is starting a new election at term 27
I1204 21:27:54.481005 1 node.go:84] 1 became pre-candidate at term 27
I1204 21:27:54.481017 1 node.go:84] 1 received MsgPreVoteResp from 1 at term 27
I1204 21:27:54.481102 1 node.go:84] 1 became candidate at term 28
I1204 21:27:54.481112 1 node.go:84] 1 received MsgVoteResp from 1 at term 28
I1204 21:27:54.481218 1 node.go:84] 1 became leader at term 28
I1204 21:27:54.481232 1 node.go:84] raft.node: 1 elected leader 1 at term 28
E1204 21:27:54.483865 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:54.483928 1 zero.go:549] Error while applying proposal in update stream Invalid address
E1204 21:27:54.716975 1 raft.go:552] While applying proposal: Invalid address
E1204 21:27:54.717231 1 zero.go:549] Error while applying proposal in update stream Invalid address
W1204 21:27:55.393083 1 node.go:551] [1] Read index context timed out
E1204 21:28:02.208789 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unimplemented desc = unknown service pb.Raft
E1204 21:28:02.209086 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unimplemented desc = unknown service pb.Raft
E1204 21:28:21.892166 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:28:51.893023 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:29:21.892887 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:29:51.892775 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:30:21.892814 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:30:51.892810 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:31:21.892858 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:31:51.892803 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:32:21.892885 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:32:51.892669 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:32:52.417618 1 raft.go:552] While applying proposal: Invalid address
E1204 21:32:52.417962 1 zero.go:549] Error while applying proposal in update stream Invalid address
E1204 21:33:21.892766 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:33:51.892865 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:34:21.892804 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:34:51.892788 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:35:21.892866 1 oracle.go:425] No healthy connection found to leader of group 2
I1204 21:35:51.892321 1 tablet.go:189]
Groups sorted by size: [{gid:2 size:0} {gid:1 size:80673}]
I1204 21:35:51.892359 1 tablet.go:194] size_diff 80673
I1204 21:35:51.892391 1 tablet.go:83] Going to move predicate: [_predicate_], size: [32 kB] from group 1 to 2
E1204 21:35:51.893181 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:35:51.917329 1 tablet.go:231] Got error during move: While calling MovePredicate: rpc error: code = Unknown desc = Group id doesn't match, received request for 1, my gid: 2
E1204 21:35:51.919971 1 tablet.go:70] Error while trying to move predicate _predicate_ from 1 to 2: While calling MovePredicate: rpc error: code = Unknown desc = Group id doesn't match, received request for 1, my gid: 2
E1204 21:36:21.892883 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:36:51.892766 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:37:21.892853 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:37:51.892927 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:37:52.420512 1 raft.go:552] While applying proposal: Invalid address
E1204 21:37:52.420817 1 zero.go:549] Error while applying proposal in update stream Invalid address
E1204 21:38:21.892801 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:38:51.892913 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:39:21.892727 1 oracle.go:425] No healthy connection found to leader of group 2
E1204 21:39:51.892272 1 oracle.go:425] No healthy connection found to leader of group 2
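Reading the Zero log line by line makes it hard to see which errors dominate. A minimal Python sketch that tallies warning/error messages with the timestamps stripped, assuming the glog line layout shown above; `summarize_zero_log` is a hypothetical helper, not part of Dgraph. It also records which ports answered a Raft echo with `unknown service pb.Raft`. Per the Alpha log, the internal worker listens on 7080 while 9080 is the client gRPC port, so my reading (an assumption) is that Raft echoes against 9080 mean Zero is dialing an address it remembers rather than the one the Alpha currently advertises.

```python
import re
import sys
from collections import Counter

# glog layout used by Dgraph, e.g.
# E1204 21:28:21.892166 1 oracle.go:425] No healthy connection found to leader of group 2
GLOG_LINE = re.compile(r"^([IWEF])\d{4} [\d:.]+ +\d+ ([\w.]+:\d+)\] (.*)$")
# "Echo error from <host>:<port>. Err: <grpc error>"
ECHO_ERROR = re.compile(r"Echo error from ([\w.-]+):(\d+)\. Err: (.*)$")


def summarize_zero_log(text):
    """Tally warning/error messages (timestamps stripped) and record which
    ports answered a Raft echo with 'unknown service pb.Raft'."""
    by_message = Counter()
    raft_misses = Counter()
    for line in text.splitlines():
        m = GLOG_LINE.match(line.strip())
        if m is None or m.group(1) == "I":
            continue  # skip info lines and non-glog lines (badger output, "...")
        message = m.group(3)
        by_message[message] += 1
        echo = ECHO_ERROR.search(message)
        if echo and "unknown service pb.Raft" in echo.group(3):
            raft_misses[int(echo.group(2))] += 1
    return by_message, raft_misses


if __name__ == "__main__":
    counts, misses = summarize_zero_log(sys.stdin.read())
    for message, n in counts.most_common(5):
        print(f"{n:5d}  {message}")
    print("ports answering 'unknown service pb.Raft':", dict(misses))
```

Saved under a hypothetical name such as `summarize_zero_log.py`, it could be fed with `kubectl logs dgraph-0 -c zero | python summarize_zero_log.py` (the container is named `zero` in the StatefulSet below).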
Alpha log:
++ hostname -f
+ dgraph alpha --my=dgraph-0.dgraph.default.svc.cluster.local:7080 --lru_mb 2048 --zero dgraph-0.dgraph.default.svc.cluster.local:5080
I1204 21:27:52.274206 1 init.go:80]
Dgraph version : v1.0.10
Commit SHA-1 : 8b801bd7
Commit timestamp : 2018-11-05 17:52:33 -0800
Branch : HEAD
For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph , visit https://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.
Licensed under Apache 2.0. Copyright 2015-2018 Dgraph Labs, Inc.
I1204 21:27:52.295997 1 server.go:115] Setting Badger table load option: mmap
I1204 21:27:52.296163 1 server.go:127] Setting Badger value log load option: mmap
I1204 21:27:52.296229 1 server.go:155] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:true TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
badger2018/12/04 21:27:52 INFO: Replaying file id: 0 at offset: 12977
badger2018/12/04 21:27:52 INFO: Replay took: 10.567µs
I1204 21:27:52.322077 1 server.go:115] Setting Badger table load option: mmap
I1204 21:27:52.322103 1 server.go:127] Setting Badger value log load option: mmap
I1204 21:27:52.322108 1 server.go:169] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
badger2018/12/04 21:27:52 INFO: Replaying file id: 0 at offset: 0
badger2018/12/04 21:27:52 INFO: Replay took: 18.232µs
I1204 21:27:52.376726 1 run.go:338] gRPC server started. Listening on port 9080
I1204 21:27:52.376848 1 run.go:339] HTTP server started. Listening on port 8080
I1204 21:27:52.377184 1 groups.go:92] Current Raft Id: 6062
I1204 21:27:52.377898 1 worker.go:80] Worker listening at address: [::]:7080
I1204 21:27:52.379669 1 pool.go:118] CONNECTED to dgraph-0.dgraph.default.svc.cluster.local:5080
I1204 21:27:52.381207 1 groups.go:119] Connected to group zero. Assigned group: 0
E1204 21:27:52.382305 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unimplemented desc = unknown service pb.Raft
I1204 21:27:52.382655 1 pool.go:118] CONNECTED to dgraph-0.dgraph.default.svc.cluster.local:9080
I1204 21:27:52.390886 1 draft.go:74] Node ID: 6062 with GroupID: 2
I1204 21:27:52.391199 1 node.go:152] Setting raft.Config to: &{ID:6062 peers:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc00008fe10 Applied:22 MaxSizePerMsg:1048576 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1d112c0}
I1204 21:27:52.391360 1 node.go:271] Found Snapshot.Metadata: {ConfState:{Nodes:[6062] XXX_unrecognized:[]} Index:22 Term:11 XXX_unrecognized:[]}
I1204 21:27:52.391445 1 node.go:282] Found hardstate: {Term:12 Vote:6062 Commit:25 XXX_unrecognized:[]}
I1204 21:27:52.391534 1 node.go:291] Group 2 found 4 entries
I1204 21:27:52.391574 1 draft.go:1047] Restarting node for group: 2
I1204 21:27:52.391638 1 node.go:174] Setting conf state to nodes:6062
I1204 21:27:52.391909 1 node.go:84] 17ae became follower at term 12
I1204 21:27:52.392015 1 node.go:84] newRaft 17ae [peers: [17ae], term: 12, commit: 25, applied: 22, lastindex: 25, lastterm: 12]
I1204 21:27:52.392285 1 groups.go:519] Got address of a Zero server: dgraph-0.dgraph.default.svc.cluster.local:5080
I1204 21:27:52.394939 1 draft.go:313] Skipping snapshot at 22, because found one at 22
I1204 21:27:54.712797 1 node.go:84] 17ae is starting a new election at term 12
I1204 21:27:54.713220 1 node.go:84] 17ae became pre-candidate at term 12
I1204 21:27:54.713303 1 node.go:84] 17ae received MsgPreVoteResp from 17ae at term 12
I1204 21:27:54.713474 1 node.go:84] 17ae became candidate at term 13
I1204 21:27:54.713564 1 node.go:84] 17ae received MsgVoteResp from 17ae at term 13
I1204 21:27:54.713821 1 node.go:84] 17ae became leader at term 13
I1204 21:27:54.713954 1 node.go:84] raft.node: 17ae elected leader 17ae at term 13
I1204 21:27:55.392399 1 groups.go:718] Leader idx=6062 of group=2 is connecting to Zero for txn updates
W1204 21:27:55.392803 1 groups.go:723] WARNING: We don't have address of any dgraphzero leader.
I1204 21:27:56.393134 1 groups.go:718] Leader idx=6062 of group=2 is connecting to Zero for txn updates
E1204 21:27:56.397090 1 draft.go:467] Lastcommit 10337 > current 10002. This would cause some commits to be lost.
E1204 21:28:02.383404 1 pool.go:178] Echo error from dgraph-0.dgraph.default.svc.cluster.local:9080. Err: rpc error: code = Unimplemented desc = unknown service pb.Raft
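The line that stands out in the Alpha log is `Lastcommit 10337 > current 10002. This would cause some commits to be lost.` My interpretation, which is an assumption and not something the log states, is that the Alpha's on-disk state is ahead of the value it is now being given, i.e. the persisted data and Zero's view of the cluster disagree. A minimal Python sketch to pull such warnings out of a longer log; `find_commit_regressions` is a hypothetical helper name.

```python
import re
import sys

# e.g. "E1204 21:27:56.397090 1 draft.go:467] Lastcommit 10337 > current 10002. This would cause some commits to be lost."
LASTCOMMIT = re.compile(r"Lastcommit (\d+) > current (\d+)")


def find_commit_regressions(text):
    """Return (last_commit, current, gap) for every 'Lastcommit X > current Y' warning."""
    results = []
    for m in LASTCOMMIT.finditer(text):
        last, current = int(m.group(1)), int(m.group(2))
        results.append((last, current, last - current))
    return results


if __name__ == "__main__":
    for last, current, gap in find_commit_regressions(sys.stdin.read()):
        print(f"Alpha is {gap} ahead: last commit {last}, current {current}")
```

For the log above this reports a single warning with a gap of 335; it could be run as `kubectl logs dgraph-0 -c server | python find_commit_regressions.py` (the Alpha container is named `server` in this chart; the script name is hypothetical).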
The chart spec is as follows:
statefulset.yml:
# This StatefulSet runs 1 pod with one Zero, one Server
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph
spec:
  serviceName: "dgraph"
  replicas: 1
  selector:
    matchLabels:
      app: dgraph
  template:
    metadata:
      labels:
        app: dgraph
    spec:
      {{- if .Values.server.initData.image }}
      initContainers:
      - name: init-schema
        image: {{ .Values.server.initData.image }}
        command: ['curl', '-X', 'POST', '-H', 'X-Dgraph-CommitNow:true', '--data-binary', '@graph/schema.txt', '{{ .Values.service.name }}.default.svc.cluster.local/alter']
      - name: init-data
        image: {{ .Values.server.initData.image }}
        command: ['curl', '-X', 'POST', '-H', 'X-Dgraph-CommitNow:true', '--data-binary', '@graph/data.txt', '{{ .Values.service.name }}.default.svc.cluster.local/mutate']
      {{- end }}
      containers:
      - name: zero
        image: {{ template "dgraph.image" . }}
        imagePullPolicy: {{ .Values.image.pullPolicy | quote }}
        ports:
        - containerPort: {{ .Values.service.ports.zeroGrpc }}
          name: zero-grpc
        - containerPort: {{ .Values.service.ports.zeroHttp }}
          name: zero-http
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        command:
          - bash
          - "-c"
          - |
            set -ex
            dgraph zero --my=$(hostname -f):{{ .Values.service.ports.zeroGrpc }}
      - name: server
        image: {{ template "dgraph.image" . }}
        imagePullPolicy: {{ .Values.image.pullPolicy | quote }}
        ports:
        - containerPort: {{ .Values.service.ports.serverHttp }}
          name: server-http
        - containerPort: {{ .Values.service.ports.serverGrpc }}
          name: server-grpc
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        command:
          - bash
          - "-c"
          - |
            set -ex
            dgraph alpha --my=$(hostname -f):{{ .Values.server.port }} --lru_mb {{ .Values.server.lruSizeMB }} --zero {{ .Values.server.zeroDns }}:{{ .Values.service.ports.zeroGrpc }}
      terminationGracePeriodSeconds: 60
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: {{ .Values.storage.size }}
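One detail of this spec matters for the problem: with `volumeClaimTemplates`, the StatefulSet controller (not Helm) creates one PersistentVolumeClaim per pod, and by default Kubernetes keeps those claims when the StatefulSet is deleted. `helm delete --purge` therefore leaves the `/dgraph` data in place, and a reinstall mounts the same volume again. A small Python sketch of the claim naming rule (`<template>-<statefulset>-<ordinal>`), which for this chart gives `datadir-dgraph-0`; both function names are hypothetical, and the script only prints the `kubectl` commands instead of running them.

```python
def statefulset_pvc_names(claim_template, statefulset, replicas):
    """Names of the PVCs a StatefulSet creates from one volumeClaimTemplate:
    <template>-<statefulset>-<ordinal>, one per replica."""
    return [f"{claim_template}-{statefulset}-{i}" for i in range(replicas)]


def cleanup_commands(claim_template, statefulset, replicas):
    """kubectl commands (printed, not executed) that would delete those PVCs.
    Depending on the StorageClass reclaim policy, deleting a PVC may also
    delete the underlying disk, so the data is gone for good."""
    return [f"kubectl delete pvc {name}"
            for name in statefulset_pvc_names(claim_template, statefulset, replicas)]


if __name__ == "__main__":
    for cmd in cleanup_commands("datadir", "dgraph", replicas=1):
        print(cmd)  # kubectl delete pvc datadir-dgraph-0
```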
values.yml:
image:
  registry: docker.io
  repository: dgraph/dgraph
  tag: latest
  pullPolicy: Always
service:
  name: dgraph-service
  ports:
    zeroGrpc: 5080
    zeroHttp: 6080
    serverHttp: 8080
    serverGrpc: 9080
server:
  # Estimate of the LRU cache size in MB. It’s recommended to set lru_mb to one-third the available RAM.
  lruSizeMB: 2048
  zeroDns: dgraph-0.dgraph.default.svc.cluster.local
  port: 7080
  initData:
    image: ""
    #image: "registry.gitlab.com/organisation/project/backend:latest"
storage:
  size: 5Gi
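To check that these values produce the command actually seen in the Alpha log (the `+ dgraph alpha ...` line), the template substitution can be replayed by hand. A minimal Python sketch, assuming the relevant values are copied into a plain dict (no YAML parser, no Helm involved); `render_alpha_command` and `port_problems` are hypothetical helpers. `server.port` (7080) is what `--my` advertises, and per the Alpha log that is the internal worker port, so it should not coincide with any of the service ports.

```python
# Only the keys from values.yml that the alpha command uses.
VALUES = {
    "service": {"ports": {"zeroGrpc": 5080, "zeroHttp": 6080,
                          "serverHttp": 8080, "serverGrpc": 9080}},
    "server": {"port": 7080,
               "lruSizeMB": 2048,
               "zeroDns": "dgraph-0.dgraph.default.svc.cluster.local"},
}


def render_alpha_command(values, hostname):
    """Mimic the `dgraph alpha ...` line of the server container's command."""
    s, ports = values["server"], values["service"]["ports"]
    return (f"dgraph alpha --my={hostname}:{s['port']} --lru_mb {s['lruSizeMB']} "
            f"--zero {s['zeroDns']}:{ports['zeroGrpc']}")


def port_problems(values):
    """Names of service ports that collide with the internal port used in --my."""
    internal = values["server"]["port"]
    return [name for name, port in values["service"]["ports"].items() if port == internal]


if __name__ == "__main__":
    print(render_alpha_command(VALUES, "dgraph-0.dgraph.default.svc.cluster.local"))
    print("port collisions:", port_problems(VALUES))
```

With the values above, the rendered command matches the one in the Alpha log and no collisions are reported, which suggests the chart renders as intended.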
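I solved the problem. It was indeed a Dgraph issue: I had overlooked the fact that a persistent volume claim is used for storage, so deleting and reinstalling the containers didn't change anything. I wiped the storage volume (that is, I deleted the p, w and zw folders created by Dgraph) and voilà, everything works again!
The thread on the dgraph.io forum can be found here: https://discuss.dgraph.io/t/dgraph-deployment-via-helm-not-working-anymore/3692/2?u=aurel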
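The fix comes down to removing the three directories Dgraph keeps on the volume: per the Alpha log, `p` is the postings store and `w` the Alpha's write-ahead log, and `zw` holds Zero's write-ahead log. A minimal Python sketch of that step, assuming it runs somewhere the volume is mounted (this chart mounts it at `/dgraph`) and that Python is available there, which the Dgraph image may not provide. `wipe_dgraph_state` is a hypothetical helper; it defaults to a dry run because the deletion is irreversible.

```python
import shutil
import tempfile
from pathlib import Path

# p = Alpha postings, w = Alpha write-ahead log, zw = Zero write-ahead log.
STATE_DIRS = ("p", "w", "zw")


def wipe_dgraph_state(data_root, dry_run=True):
    """Delete Dgraph's state directories under data_root and return the paths
    that were removed (or, with dry_run=True, would be removed).
    This destroys all data in the cluster."""
    removed = []
    for name in STATE_DIRS:
        path = Path(data_root) / name
        if path.is_dir():
            removed.append(str(path))
            if not dry_run:
                shutil.rmtree(path)
    return removed


if __name__ == "__main__":
    # Demo on a throwaway directory instead of the real volume.
    with tempfile.TemporaryDirectory() as demo_root:
        for name in STATE_DIRS:
            (Path(demo_root) / name).mkdir()
        print(wipe_dgraph_state(demo_root))  # dry run: lists the p, w and zw paths
        wipe_dgraph_state(demo_root, dry_run=False)
        print(sorted(p.name for p in Path(demo_root).iterdir()))  # []
```

Deleting the PersistentVolumeClaim itself (and letting the StatefulSet create a fresh one on reinstall) should have the same effect, at the cost of also discarding the disk.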