Safe Node Restart for Nodes Hosting jiva-ctrl Pods

Service: OpenEBS Jiva iSCSI (pvek8s)
First documented: 2026-05-30
PIR: pvek8s Post-Power-Outage Recovery - kubelet Volume Manager Stall and KCM Stale terminatingReplicas
Linear: PGM-223

When to Use This Runbook

Use this runbook whenever you need to restart kubelite on a node, or drain or taint it, and the node may be hosting jiva-ctrl pods (iSCSI targets).

Why this matters: jiva-ctrl pods are iSCSI targets. When the node running them is restarted, those pods are evicted and the iSCSI target process exits. Any workload pod on another node that has an active iSCSI session to the controller will detect a TCP connection failure, enter 120-second session recovery, and — if the target does not reappear within that window — have its SCSI device go offline. The kernel's JBD2 journal then aborts and EXT4 remounts the filesystem read-only. This is a data-safe failure but requires manual recovery.
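
The 120-second window most likely corresponds to open-iscsi's `node.session.timeo.replacement_timeout` setting, whose stock default is 120. That mapping is an assumption; this runbook does not name the setting. A minimal sketch for reading the value out of a node record dump, where `replacement_timeout` is a hypothetical helper name:

```shell
# Hypothetical helper: extract the session recovery timeout (in seconds) from
# open-iscsi node record text supplied on stdin, for example the output of
#   iscsiadm -m node -T <target-iqn> -p <portal>
# Assumption: the 120 s recovery window described above is this setting.
replacement_timeout() {
  awk -F' *= *' '$1 == "node.session.timeo.replacement_timeout" {print $2; exit}'
}
```

If the value on a node differs from 120, the time available before the SCSI device goes offline differs accordingly.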

The pre-restart procedure below migrates affected workload pods before the restart, so the iSCSI sessions are already gone and there is nothing to fail over.

See jiva-ctrl-eviction-iscsi-ro-filesystem.md for recovery if the filesystem has already gone read-only.

Pre-Restart Procedure

Step 1: Identify jiva-ctrl pods on the target node

TARGET_NODE=<node>   # e.g. k8s01

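# custom-columns keeps field positions fixed ("-o wide" shifts them when RESTARTS reads "1 (5m ago)")
kubectl --context pvek8s get pods -n openebs --no-headers \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName | \
  awk -v n="$TARGET_NODE" '/jiva.*ctrl/ && $2==n {print $1, $2}'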

If the output is empty, no jiva-ctrl pods are on this node — skip to the Node Restart Procedure.

Example output:

pvc-746b2837-...-jiva-ctrl-0   k8s01
pvc-a3a7e012-...-jiva-ctrl-0   k8s01

Step 2: Find nodes with active iSCSI sessions to those controllers

For each jiva-ctrl pod, check whether any node has a live iSCSI session to its controller service:

# Get the ClusterIP of each controller's service
# The service name shares the PV prefix with the ctrl pod name
kubectl --context pvek8s get svc -n openebs | grep "jiva-ctrl"
# → pvc-746b2837-...-jiva-ctrl-svc   ClusterIP   10.152.183.57   ...
# → pvc-a3a7e012-...-jiva-ctrl-svc   ClusterIP   10.152.183.22   ...

# Check all nodes for active sessions to those IPs
for pod in $(kubectl --context pvek8s get pods -n openebs \
    -l app=openebs-jiva-csi-node -o name); do
  echo "=== $pod ==="
  kubectl --context pvek8s exec -n openebs "$pod" -c jiva-csi-plugin -- \
    iscsiadm -m session 2>/dev/null || echo "(no sessions)"
done

Note which nodes have sessions to each controller IP. Those are the nodes hosting workload pods that must be migrated before the restart.
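
Matching the session list against the controller IPs by eye is error-prone, because a plain substring match on 10.152.183.5 would also hit 10.152.183.57. The sketch below filters the output of the loop above by exact portal IP. `sessions_to_controllers` is a hypothetical helper name, and the parsing assumes IPv4 portals and the usual `iscsiadm -m session` line layout shown in the comment.

```shell
# Hypothetical helper: read the combined output of the loop above on stdin and
# print "<csi-node-pod> <portal-ip> <target-iqn>" for every session whose
# portal IP exactly equals one of the controller ClusterIPs given as arguments.
# Assumed line layout (IPv4 portals only):
#   tcp: [1] 10.152.183.57:3260,1 iqn.2016-09.com.openebs.jiva:pvc-... (non-flash)
sessions_to_controllers() {
  awk -v ips="$*" '
    BEGIN { n = split(ips, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
    /^=== / { pod = $2; next }            # header line printed by the loop
    $1 == "tcp:" {
      split($3, portal, ":")              # "10.152.183.57:3260,1" -> IP part
      if (portal[1] in want) print pod, portal[1], $4
    }
  '
}
```

Pipe the loop's output into it with the ClusterIPs from the service listing, for example `... | sessions_to_controllers 10.152.183.57 10.152.183.22`. Each output line names a csi-node pod, and therefore a node, that still holds a session to one of the controllers.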

Step 3: Migrate workload pods off the affected nodes

For each controller with active sessions on other nodes, find and delete the workload pod that holds that PVC:

# Derive the PV name from the ctrl pod name (strip -jiva-ctrl-N suffix)
CTRL_POD=pvc-746b2837-...-jiva-ctrl-0
PV_NAME=${CTRL_POD%-jiva-ctrl-*}

# Find the PVC bound to this PV (with -A the columns are NAMESPACE NAME STATUS VOLUME ...)
kubectl --context pvek8s get pvc -A --no-headers | awk -v pv="$PV_NAME" '$4==pv {print $1, $2}'
# → media   seerr-seerr-chart-config

# Find the pod in that namespace using that PVC
PVC_NS=media
PVC_NAME=seerr-seerr-chart-config
kubectl --context pvek8s get pods -n "$PVC_NS" -o json | \
  python3 -c "
import json,sys
data=json.load(sys.stdin)
pvc='$PVC_NAME'
for p in data['items']:
  for v in p['spec'].get('volumes',[]):
    if v.get('persistentVolumeClaim',{}).get('claimName')==pvc:
      print(p['metadata']['name'])
"

Once you have the pod name, delete it and wait for it to reschedule to a node that is not $TARGET_NODE:

kubectl --context pvek8s delete pod -n "$PVC_NS" <pod-name>

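# Watch until the replacement is Running on a different node.
# StatefulSet pods keep their name; a Deployment-managed replacement gets a new
# name, so omit <pod-name> and watch the whole namespace in that case.
kubectl --context pvek8s get pod -n "$PVC_NS" <pod-name> -o wide -w
# → 1/1 Running on k8s02 or k8s03 (not TARGET_NODE)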

StatefulSet pods do not reschedule automatically on cordoned nodes

If the node is already cordoned (or if you cordon it before deleting), StatefulSet pods will stay Pending until you uncordon another eligible node. Delete the pod before cordoning the target node so the scheduler can place it freely.

Repeat for every controller with active sessions.
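
When several controllers are involved, the controller pod → PV → PVC lookup in this step can be wrapped in two small functions. This is a sketch only: `pv_from_ctrl_pod` and `pvc_for_pv` are hypothetical names, and the pod names in the usage line are placeholders.

```shell
# Hypothetical helpers for the controller pod -> PV -> PVC lookup.

# Strip the "-jiva-ctrl-<suffix>" tail from a controller pod name.
pv_from_ctrl_pod() {
  printf '%s\n' "${1%-jiva-ctrl-*}"
}

# Read `kubectl get pvc -A --no-headers` output on stdin and print
# "<namespace> <claim>" for the claim bound to the PV named in $1.
# With -A the columns are NAMESPACE NAME STATUS VOLUME ..., so the PV name
# is the fourth field.
pvc_for_pv() {
  awk -v pv="$1" '$4 == pv {print $1, $2}'
}
```

Usage: `kubectl --context pvek8s get pvc -A --no-headers | pvc_for_pv "$(pv_from_ctrl_pod "$CTRL_POD")"`.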

Step 4: Verify all sessions have logged out

Confirm no node retains an iSCSI session to the controllers that were on $TARGET_NODE:

for pod in $(kubectl --context pvek8s get pods -n openebs \
    -l app=openebs-jiva-csi-node -o name); do
  echo "=== $pod ==="
  kubectl --context pvek8s exec -n openebs "$pod" -c jiva-csi-plugin -- \
    iscsiadm -m session 2>/dev/null | grep "<controller-ClusterIP>" || echo "(none)"
done
# All nodes should show "(none)" for the affected controller IPs

Only proceed once all sessions to the affected controllers are gone.

Node Restart Procedure

With iSCSI sessions safely cleared, restart the node using the standard dqlite → kubelite ordering:

Cordon the node (required — prevents the kubelet watch-race stall on restart):
```
kubectl --context pvek8s cordon "$TARGET_NODE"
```
See kubelet-silent-stall.md — Failure Mode 2 for why cordoning before restart is mandatory.

Restart k8s-dqlite first, wait for it to stabilise:

ssh "$TARGET_NODE" "sudo systemctl restart snap.microk8s.daemon-k8s-dqlite.service"
# Wait until active and no 'database is locked' errors for 30s
ssh "$TARGET_NODE" "sudo systemctl is-active snap.microk8s.daemon-k8s-dqlite.service"

Restart kubelite:

ssh "$TARGET_NODE" "sudo systemctl restart snap.microk8s.daemon-kubelite.service"

Wait for node Ready:

kubectl --context pvek8s wait node/"$TARGET_NODE" --for=condition=Ready --timeout=300s

Uncordon:

kubectl --context pvek8s uncordon "$TARGET_NODE"

See kubelet-volume-manager-stall.md — Option B for the full dqlite restart safety procedure and lock-contention checks.
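
The ordering above can be sketched as a single function that stops at the first failing step and leaves the node cordoned for inspection. `restart_node` is a hypothetical name, and reading the dqlite log through `journalctl` is an assumption; the linked runbooks remain the reference for the lock-contention checks.

```shell
# Sketch of the cordon -> dqlite -> kubelite -> Ready -> uncordon ordering.
# Any failing step returns 1 and leaves the node cordoned.
# Assumption (not from this runbook): journalctl is used to look for
# 'database is locked' messages in the 30 s after the dqlite restart.
restart_node() {
  node=$1
  dqlite=snap.microk8s.daemon-k8s-dqlite.service
  kubelite=snap.microk8s.daemon-kubelite.service

  kubectl --context pvek8s cordon "$node" || return 1
  ssh "$node" "sudo systemctl restart $dqlite" || return 1
  sleep 30
  ssh "$node" "sudo systemctl is-active $dqlite" || return 1
  # A lock error in the last 30 s means dqlite has not stabilised yet.
  if ssh "$node" "sudo journalctl -u $dqlite --since -30s --no-pager" \
      | grep -q 'database is locked'; then
    echo "dqlite still reports lock contention on $node" >&2
    return 1
  fi
  ssh "$node" "sudo systemctl restart $kubelite" || return 1
  kubectl --context pvek8s wait node/"$node" --for=condition=Ready --timeout=300s || return 1
  kubectl --context pvek8s uncordon "$node"
}
```

Usage: `restart_node "$TARGET_NODE"`. A non-zero exit means the node is still cordoned and needs manual follow-up.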

Post-Restart Verification

# Node is Ready and schedulable
kubectl --context pvek8s get node "$TARGET_NODE"
# → Ready (no SchedulingDisabled)

# jiva-ctrl pods have rescheduled and are Running
kubectl --context pvek8s get pods -n openebs -o wide | grep 'jiva.*ctrl'
# → all Running, spread across nodes

# Workload pods that were migrated are Running with rw filesystems
kubectl --context pvek8s get pods -n <namespace> <pod-name> -o wide
# → 1/1 Running on a node other than TARGET_NODE

# iSCSI sessions re-established on the workload node
NEW_NODE=$(kubectl --context pvek8s get pod -n <namespace> <pod-name> \
  -o jsonpath='{.spec.nodeName}')
NEW_JIVA_POD=$(kubectl --context pvek8s get pods -n openebs \
  -l app=openebs-jiva-csi-node \
  -o jsonpath="{.items[?(@.spec.nodeName=='$NEW_NODE')].metadata.name}")
kubectl --context pvek8s exec -n openebs "$NEW_JIVA_POD" -c jiva-csi-plugin -- \
  iscsiadm -m session
# → tcp: [...] iqn.2016-09.com.openebs.jiva:<pvc-name> (non-flash)

# Filesystem is rw
kubectl --context pvek8s exec -n openebs "$NEW_JIVA_POD" -c jiva-csi-plugin -- \
  grep "<pvc-name>" /proc/mounts
# → should show rw in mount options, not ro
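
Reading the mount options by eye can mislead, because ext4 entries often carry `errors=remount-ro` even when mounted read-write. The sketch below checks for an exact `ro` option instead. `ro_mounts_for` is a hypothetical helper name, and it assumes the PVC name appears in the device or mount-point field of `/proc/mounts`.

```shell
# Hypothetical helper: read /proc/mounts-style text on stdin and print the
# mount point of every entry whose device or mount point contains $1 and whose
# option list contains the exact option "ro". Exits 1 if any such entry is
# found and 0 otherwise.
ro_mounts_for() {
  awk -v pat="$1" '
    index($1, pat) || index($2, pat) {
      n = split($4, opts, ",")            # fourth field is the option list
      for (i = 1; i <= n; i++) if (opts[i] == "ro") { print $2; found = 1 }
    }
    END { exit (found ? 1 : 0) }
  '
}
```

Usage: pipe the contents of `/proc/mounts` from the csi-node pod into `ro_mounts_for <pvc-name>`. Empty output with exit status 0 means no matching mount is read-only.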

References

PIR: pvek8s Post-Power-Outage Recovery — Chain 4 root cause (batched jiva-ctrl eviction → EXT4 ro)
Linear: PGM-223 — this runbook
Related: jiva-ctrl-eviction-iscsi-ro-filesystem.md — recovery if the filesystem has already gone read-only (use when it's too late to migrate first)
Related: kubelet-volume-manager-stall.md — Option B: full dqlite+kubelite restart procedure and lock-contention safety checks
Related: kubelet-silent-stall.md — Failure Mode 2: why cordon-before-restart is required for kubelite restarts

Automated Option: Ansible

The manual procedure above remains the canonical reference; use it when the automation is unavailable or when a volume is already degraded. For routine maintenance, both the pre-restart migration and the reboot are automated.

Option A: full reboot flow (recommended for kernel updates)

k8s-reboot.yml runs the controller shuffle automatically as part of its pre-flight: if jiva-ctrl pods are found on the target node it announces the migration, moves the controllers off one at a time (waiting for each JivaVolume to return to Ready), re-validates volume quorum, and only then proceeds to cordon, drain, and reboot.

cd ansible
ansible-playbook -i inventory/hosts.ini k8s-reboot.yml --limit <node>

Option B: shuffle only (no reboot)

The migration is also available standalone via the ansible-role-microk8s role, gated behind an explicit tag so it can never run as part of a normal role application:

cd ansible
ansible-playbook -i inventory/hosts.ini update/home.yml \
  --limit <node> --tags jiva-ctrl-shuffle

Or from another playbook:

- name: Shuffle jiva-ctrl pods off the target node
  ansible.builtin.include_role:
    name: ansible-role-microk8s
    tasks_from: jiva_ctrl_shuffle

What the automation does (and does not do)

Finds controllers by label (openebs.io/controller=jiva-controller for legacy 2.12 volumes, openebs.io/component=jiva-controller for 3.6 CSI volumes) — do not rely on jiva.*ctrl pod-name matching; legacy controller pods are named pvc-...-ctrl-... without "jiva"
Aborts if any JivaVolume is Syncing/Error/Unknown before starting
Cordons the node so controllers cannot reschedule back, and leaves it cordoned — Option A's reboot flow uncordons at the end; after a standalone Option B run you must kubectl uncordon <node> yourself
Moves one controller at a time: delete pod → wait for the Deployment rollout → replacement 2/2 Running on another node → JivaVolume Ready (CSI volumes; legacy volumes have no JivaVolume CR and get a settle pause instead)
Fails if any controller remains on the node afterwards
It does not migrate workload pods (runbook Steps 2–3): moving the controller itself means the iSCSI target is re-homed before the node restart, so workload sessions reconnect to the same ClusterIP. If a filesystem has already gone read-only, use jiva-ctrl-eviction-iscsi-ro-filesystem.md instead — it is too late to shuffle
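
The label-based discovery described in the first item can also be done by hand. The sketch below queries both label selectors and restricts the result to one node with a field selector. `ctrl_pods_on_node` is a hypothetical name, and it assumes all controllers run in the openebs namespace, as the manual Step 1 does.

```shell
# Sketch: list controller pods on node $1 by label rather than by pod name.
# The two selectors are the ones named in the list above;
# --field-selector spec.nodeName limits the result to a single node.
# Assumption: every controller pod lives in the openebs namespace.
ctrl_pods_on_node() {
  for selector in openebs.io/controller=jiva-controller \
                  openebs.io/component=jiva-controller; do
    kubectl --context pvek8s get pods -n openebs -l "$selector" \
      --field-selector "spec.nodeName=$1" -o name
  done | sort -u
}
```

Usage: `ctrl_pods_on_node "$TARGET_NODE"`. Empty output means no controller of either generation is on the node.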

Implementation: tasks/jiva_ctrl_shuffle.yml in ansible-role-microk8s (PGM-240); reboot integration in ansible/k8s-reboot.yml (PGM-239/PGM-240).