Pod Eviction & Replacement
This chapter specifies the rules around evicting pods from nodes and restarting or replacing them.
Eviction
Eviction is the process of removing a pod that is running on a node from that node.
This is typically the result of a drain action (kubectl drain) or of a taint being added to a node (either automatically by Kubernetes or manually by an operator).
Replacement
Replacement is the process of replacing a pod by another pod that takes over the responsibilities of the original pod.
The replacement pod has a new ID and new (that is, empty) persistent data.
Note that replacing a pod is different from restarting a pod. A pod is restarted when it has been reported to have terminated.
NoExecute Tolerations
NoExecute tolerations control how Kubernetes treats a pod when the node that the pod is running on is no longer reachable or becomes not-ready.
See the applicable Kubernetes documentation for more info.
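As a rough illustration of what such a toleration looks like, the sketch below builds the pod-spec form as a plain Python dictionary. The field names (key, operator, effect, tolerationSeconds) are the standard Kubernetes toleration fields; the helper name no_execute_toleration and the 15-second value are only examples chosen for this sketch.

```python
from typing import Optional


def no_execute_toleration(key: str, seconds: Optional[int]) -> dict:
    """Build a NoExecute toleration in Kubernetes pod-spec form.

    Passing seconds=None omits tolerationSeconds, which Kubernetes
    interprets as "tolerate the taint indefinitely".
    """
    toleration = {"key": key, "operator": "Exists", "effect": "NoExecute"}
    if seconds is not None:
        toleration["tolerationSeconds"] = seconds
    return toleration


# Example values only: tolerate each taint for 15 seconds before eviction.
unreachable = no_execute_toleration("node.kubernetes.io/unreachable", 15)
not_ready = no_execute_toleration("node.kubernetes.io/not-ready", 15)
```

A short toleration time makes Kubernetes evict the pod soon after the node is tainted; a long or absent one keeps the pod bound to the node while the outage is waited out.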
Rules
The rules for eviction & replacement are specified per type of pod.
Image ID Pods
Image ID pods are started to fetch the ArangoDB version of a specific ArangoDB image and the Docker sha256 of that image. They have no persistent state.
- Image ID pods can always be evicted from any node
- Image ID pods can always be restarted on a different node. There is no need to replace an image ID pod, nor will it cause problems when 2 image ID pods run at the same time.
- node.kubernetes.io/unreachable:NoExecute toleration time is set very low (5sec)
- node.kubernetes.io/not-ready:NoExecute toleration time is set very low (5sec)
Coordinator Pods
Coordinator pods run an ArangoDB coordinator as part of an ArangoDB cluster. They have no persistent state, but do have a unique ID.
- Coordinator pods can always be evicted from any node
- Coordinator pods can always be replaced with another coordinator pod with a different ID on a different node
- node.kubernetes.io/unreachable:NoExecute toleration time is set low (15sec)
- node.kubernetes.io/not-ready:NoExecute toleration time is set low (15sec)
DBServer Pods
DBServer pods run an ArangoDB dbserver as part of an ArangoDB cluster. They have persistent state potentially tied to the node they run on and they have a unique ID.
- DBServer pods can be evicted from any node as soon as:
  - It has been completely drained AND
  - It is no longer the shard master for any shard
- DBServer pods can be replaced with another dbserver pod with a different ID on a different node when:
  - It is not the shard master for any shard OR
  - For every shard it is the master for, there is an in-sync follower
- node.kubernetes.io/unreachable:NoExecute toleration time is set high to “wait it out a while” (5min)
- node.kubernetes.io/not-ready:NoExecute toleration time is set high to “wait it out a while” (5min)
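The DBServer eviction and replacement conditions above can be written as two small predicates. This is a minimal sketch, not the operator's implementation: the DBServerState type, its fields, and both function names are hypothetical and exist only to make the boolean logic explicit.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DBServerState:
    """Hypothetical snapshot of one dbserver (not an operator type)."""
    fully_drained: bool
    # Shards this server is master for, mapped to whether the shard
    # currently has at least one in-sync follower.
    mastered_shards: Dict[str, bool] = field(default_factory=dict)


def dbserver_can_evict(state: DBServerState) -> bool:
    # Evictable only when drained AND no longer master for any shard.
    return state.fully_drained and not state.mastered_shards


def dbserver_can_replace(state: DBServerState) -> bool:
    # Replaceable when it masters no shard, OR every mastered shard has
    # an in-sync follower. all() over an empty dict is True, which
    # covers the "no shards" case.
    return all(state.mastered_shards.values())
```

Note that replacement is the weaker condition: a server that still masters shards may be replaced as long as each of those shards has an in-sync follower, whereas eviction requires that it masters none.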
Agent Pods
Agent pods run an ArangoDB agent as part of an ArangoDB agency. They have persistent state potentially tied to the node they run on and they have a unique ID.
- Agent pods can be evicted from any node as soon as:
  - It is no longer the agency leader AND
  - There is at least one agency leader that is responding AND
  - There is at least one agency follower that is responding
- Agent pods can be replaced with another agent pod with the same ID but wiped persistent state on a different node when:
  - The old pod is known to be deleted (e.g. explicit eviction)
- node.kubernetes.io/unreachable:NoExecute toleration time is set high to “wait it out a while” (5min)
- node.kubernetes.io/not-ready:NoExecute toleration time is set high to “wait it out a while” (5min)
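The agent rules above can be sketched in the same way. The AgentState type and the function names are hypothetical. The sketch also assumes that the responding-leader and responding-follower counts refer to the other agents in the agency, which the rules do not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    """Hypothetical view of the agency from one agent pod."""
    is_leader: bool
    # Assumption: counts cover the *other* agents in the agency.
    responding_leaders: int
    responding_followers: int
    old_pod_known_deleted: bool = False


def agent_can_evict(state: AgentState) -> bool:
    # All three conditions from the rules must hold.
    return (
        not state.is_leader
        and state.responding_leaders >= 1
        and state.responding_followers >= 1
    )


def agent_can_replace(state: AgentState) -> bool:
    # Replacement reuses the same ID, so the old pod must be known
    # to be gone (for example after an explicit eviction).
    return state.old_pod_known_deleted
```

Because the replacement keeps the same ID, the only safeguard is certainty that the old pod no longer exists, which is why the replacement check does not look at agency health at all.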
Single Server Pods
Single server pods run an ArangoDB server as part of an ArangoDB single server deployment. They have persistent state potentially tied to the node.
- Single server pods cannot be evicted from any node.
- Single server pods cannot be replaced with another pod.
- node.kubernetes.io/unreachable:NoExecute toleration time is not set, to “wait it out forever”
- node.kubernetes.io/not-ready:NoExecute toleration time is not set, to “wait it out forever”
Single Pods in Active Failover Deployment
Single pods run an ArangoDB single server as part of an ArangoDB active failover deployment. They have persistent state potentially tied to the node they run on and they have a unique ID.
- Single pods can be evicted from any node as soon as:
  - It is a follower of an active-failover deployment (Q: can we trigger this failover to another server?)
- Single pods can always be replaced with another single pod with a different ID on a different node.
- node.kubernetes.io/unreachable:NoExecute toleration time is set high to “wait it out a while” (5min)
- node.kubernetes.io/not-ready:NoExecute toleration time is set high to “wait it out a while” (5min)
SyncMaster Pods
SyncMaster pods run ArangoSync as a master as part of an ArangoDB DC2DC cluster. They have no persistent state, but do have a unique address.
- SyncMaster pods can always be evicted from any node
- SyncMaster pods can always be replaced with another syncmaster pod on a different node
- node.kubernetes.io/unreachable:NoExecute toleration time is set low (15sec)
- node.kubernetes.io/not-ready:NoExecute toleration time is set low (15sec)
SyncWorker Pods
SyncWorker pods run ArangoSync as a worker as part of an ArangoDB DC2DC cluster. They have no persistent state, but do have in-memory state and a unique address.
- SyncWorker pods can always be evicted from any node
- SyncWorker pods can always be replaced with another syncworker pod on a different node
- node.kubernetes.io/unreachable:NoExecute toleration time is set a bit higher to try to avoid resynchronization (1min)
- node.kubernetes.io/not-ready:NoExecute toleration time is set a bit higher to try to avoid resynchronization (1min)
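For quick comparison, the toleration times listed per pod type in this chapter can be collected into one lookup table. The dictionary keys and the describe helper are hypothetical labels chosen for this sketch; only the durations come from the rules above, and the same value applies to both the unreachable and the not-ready taint.

```python
from typing import Dict, Optional

# Toleration time in seconds per pod type, as stated in this chapter.
# None means the toleration time is not set ("wait it out forever").
TOLERATION_SECONDS: Dict[str, Optional[int]] = {
    "image-id": 5,
    "coordinator": 15,
    "dbserver": 300,
    "agent": 300,
    "single-server": None,
    "single-active-failover": 300,
    "syncmaster": 15,
    "syncworker": 60,
}


def describe(pod_type: str) -> str:
    """Return a short human-readable summary for one pod type."""
    seconds = TOLERATION_SECONDS[pod_type]
    if seconds is None:
        return f"{pod_type}: tolerate indefinitely"
    return f"{pod_type}: tolerate for {seconds}s"
```

The pattern is that stateless pods get short times so they move quickly, while pods with node-bound persistent state get long or unlimited times so that a temporary node outage does not trigger data movement.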