
Safe Node Restart for Nodes Hosting jiva-ctrl Pods

Service: OpenEBS Jiva iSCSI (pvek8s)
First documented: 2026-05-30
PIR: pvek8s Post-Power-Outage Recovery — kubelet Volume Manager Stall and KCM Stale terminatingReplicas
Linear: PGM-223


When to Use This Runbook

Use this runbook whenever you need to restart kubelite on, drain, or taint a node that may be hosting jiva-ctrl pods (iSCSI targets).

Why this matters: jiva-ctrl pods are iSCSI targets. When the node running them is restarted, those pods are evicted and the iSCSI target process exits. Every workload pod on another node with an active iSCSI session to that controller sees its TCP connection fail, and the initiator enters a 120-second session recovery window. If the target does not reappear within that window, the SCSI device goes offline, the kernel's JBD2 journal aborts, and EXT4 remounts the filesystem read-only. The failure is data-safe, but recovery is manual.
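The 120-second window corresponds to open-iscsi's `node.session.timeo.replacement_timeout` setting (120 seconds by default). To confirm the value actually in effect on a node, a minimal sketch, assuming you have captured the text of `iscsiadm -m node -o show` from that node's jiva-csi-plugin container; the helper name `replacement_timeouts` and the sample IQN are hypothetical:

```python
from typing import Dict, Optional


def replacement_timeouts(node_show_output: str) -> Dict[str, int]:
    """Map each target IQN to its node.session.timeo.replacement_timeout in seconds.

    Expects the text printed by `iscsiadm -m node -o show`: one record per
    target, made of `key = value` lines wrapped in # BEGIN/END RECORD markers.
    """
    timeouts: Dict[str, int] = {}
    current_target: Optional[str] = None
    for raw_line in node_show_output.splitlines():
        line = raw_line.strip()
        if line.startswith("#") or " = " not in line:
            continue  # record markers, blank lines, error messages
        key, value = (part.strip() for part in line.split(" = ", 1))
        if key == "node.name":
            current_target = value  # the record's target IQN
        elif key == "node.session.timeo.replacement_timeout" and current_target:
            timeouts[current_target] = int(value)
    return timeouts


# Illustrative record; the IQN suffix is a placeholder, not a real volume.
sample = """# BEGIN RECORD
node.name = iqn.2016-09.com.openebs.jiva:pvc-example
node.conn[0].address = 10.152.183.57
node.session.timeo.replacement_timeout = 120
# END RECORD"""
print(replacement_timeouts(sample))  # {'iqn.2016-09.com.openebs.jiva:pvc-example': 120}
```

A value other than 120 changes how long a workload can ride out a missing target before its filesystem goes read-only.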

The pre-restart procedure below migrates affected workload pods before the restart, so the iSCSI sessions are already gone and there is nothing to fail over.

See jiva-ctrl-eviction-iscsi-ro-filesystem.md for recovery if the filesystem has already gone read-only.


Pre-Restart Procedure

Step 1 — Identify jiva-ctrl pods on the target node

TARGET_NODE=<node>   # e.g. k8s01

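for sel in openebs.io/component=jiva-controller openebs.io/controller=jiva-controller; do
  # CSI (3.6) controllers, then legacy (2.12) ones: match by label, not pod name,
  # and filter on spec.nodeName rather than a column position (RESTARTS like
  # "1 (3d ago)" shifts awk's $7 away from the NODE column)
  kubectl --context pvek8s get pods -n openebs -l "$sel" \
    --field-selector spec.nodeName="$TARGET_NODE" --no-headers \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
done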

If the output is empty, no jiva-ctrl pods are on this node — skip to the Node Restart Procedure.

Example output:

pvc-746b2837-...-jiva-ctrl-0   k8s01
pvc-a3a7e012-...-jiva-ctrl-0   k8s01
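The same label-based selection can be done offline against saved JSON, which is handy for double-checking the Step 1 result. A minimal sketch, assuming the two controller labels quoted in this runbook's automation notes; `controllers_on_node` is a hypothetical helper, not part of any OpenEBS tooling:

```python
import json
from typing import List

# Labels quoted in this runbook's automation notes:
# legacy 2.12 volumes use openebs.io/controller, 3.6 CSI volumes use openebs.io/component.
CONTROLLER_LABELS = {
    "openebs.io/controller": "jiva-controller",
    "openebs.io/component": "jiva-controller",
}


def controllers_on_node(pods_json: str, node: str) -> List[str]:
    """Return sorted names of jiva controller pods whose spec.nodeName is `node`.

    `pods_json` is the text of `kubectl get pods -n openebs -o json`. Matching is
    by label, so legacy pods named pvc-...-ctrl-... (no "jiva") are still found.
    """
    found = []
    for pod in json.loads(pods_json)["items"]:
        labels = pod["metadata"].get("labels") or {}
        is_controller = any(labels.get(k) == v for k, v in CONTROLLER_LABELS.items())
        if is_controller and pod["spec"].get("nodeName") == node:
            found.append(pod["metadata"]["name"])
    return sorted(found)
```

Feed it the output of `kubectl --context pvek8s get pods -n openebs -o json` (for example via a saved file) and compare with the list printed in Step 1.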

Step 2 — Find nodes with active iSCSI sessions to those controllers

For each jiva-ctrl pod, check whether any node has a live iSCSI session to its controller service:

# Get the ClusterIP of each controller's service
# The service name shares the PV prefix with the ctrl pod name
kubectl --context pvek8s get svc -n openebs | grep "jiva-ctrl"
# → pvc-746b2837-...-jiva-ctrl-svc   ClusterIP   10.152.183.57   ...
# → pvc-a3a7e012-...-jiva-ctrl-svc   ClusterIP   10.152.183.22   ...

# Check every node for active sessions to those IPs, labelled by node name
for entry in $(kubectl --context pvek8s get pods -n openebs \
    -l app=openebs-jiva-csi-node \
    -o jsonpath='{range .items[*]}{.metadata.name}:{.spec.nodeName}{"\n"}{end}'); do
  pod=${entry%%:*}; node=${entry#*:}
  echo "=== $node ($pod) ==="
  kubectl --context pvek8s exec -n openebs "$pod" -c jiva-csi-plugin -- \
    iscsiadm -m session 2>/dev/null || echo "(no sessions)"
done

Note which nodes have sessions to each controller IP. Those are the nodes hosting workload pods that must be migrated before the restart.
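If you save each node's `iscsiadm -m session` output, the "which nodes talk to which controller" table can be built mechanically. A minimal sketch, assuming IPv4 portals and the usual `tcp: [N] IP:PORT,TPGT IQN (non-flash)` line shape; `nodes_by_target_ip` is a hypothetical helper:

```python
import re
from collections import defaultdict
from typing import Dict, Set

# One line per session, e.g.
# tcp: [1] 10.152.183.57:3260,1 iqn.2016-09.com.openebs.jiva:pvc-example (non-flash)
SESSION_RE = re.compile(r"^\S+: \[\d+\] ([^:\s]+):\d+,\d+ ")


def nodes_by_target_ip(sessions_per_node: Dict[str, str]) -> Dict[str, Set[str]]:
    """Invert {node: iscsiadm -m session output} into {target IP: {nodes}}.

    Lines that are not session records ("(no sessions)", error text) are ignored.
    """
    result: Dict[str, Set[str]] = defaultdict(set)
    for node, output in sessions_per_node.items():
        for line in output.splitlines():
            match = SESSION_RE.match(line.strip())
            if match:
                result[match.group(1)].add(node)
    return dict(result)
```

Look up each controller ClusterIP from the service listing above in the returned dict; the node set is the list of workloads to migrate in Step 3.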

Step 3 — Migrate workload pods off the affected nodes

For each controller with active sessions on other nodes, find and delete the workload pod that holds that PVC:

# Derive the PV name from the ctrl pod name (strip -jiva-ctrl-N suffix)
CTRL_POD=pvc-746b2837-...-jiva-ctrl-0
PV_NAME=${CTRL_POD%-jiva-ctrl-*}

# Find the PVC bound to this PV (namespace and name from the PV's claimRef)
kubectl --context pvek8s get pv "$PV_NAME" \
  -o jsonpath='{.spec.claimRef.namespace} {.spec.claimRef.name}{"\n"}'
# → media seerr-seerr-chart-config

# Find the pod in that namespace using that PVC
PVC_NS=media
PVC_NAME=seerr-seerr-chart-config
kubectl --context pvek8s get pods -n "$PVC_NS" -o json | \
  python3 -c "
import json,sys
data=json.load(sys.stdin)
pvc='$PVC_NAME'
for p in data['items']:
  for v in p['spec'].get('volumes',[]):
    if v.get('persistentVolumeClaim',{}).get('claimName')==pvc:
      print(p['metadata']['name'])
"

Once you have the pod name, delete it and wait for its replacement to be Running on a node other than $TARGET_NODE:

kubectl --context pvek8s delete pod -n "$PVC_NS" <pod-name>

# Watch until the replacement is Running on a different node
# (StatefulSet replacements keep the same name; Deployment replacements get a new one)
kubectl --context pvek8s get pods -n "$PVC_NS" -o wide -w
# → 1/1 Running on k8s02 or k8s03 (not TARGET_NODE)

StatefulSet pods do not reschedule automatically on cordoned nodes

If the node is already cordoned (or if you cordon it before deleting), StatefulSet pods will stay Pending until you uncordon another eligible node. Delete the pod before cordoning the target node so the scheduler can place it freely.

Repeat for every controller with active sessions.
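The lookups in this step chain together: controller pod name to PV name, PV to PVC via `claimRef`, PVC to the pods that mount it. A minimal sketch of that chain over saved `kubectl ... -o json` output; the function names are hypothetical, and `pv_name_from_ctrl_pod` mirrors the shell expansion `${CTRL_POD%-jiva-ctrl-*}`:

```python
import json
from typing import List, Tuple


def pv_name_from_ctrl_pod(ctrl_pod: str) -> str:
    """Strip the shortest "-jiva-ctrl-*" suffix, like ${CTRL_POD%-jiva-ctrl-*}.

    If the pattern is absent the name is returned unchanged, as the shell does.
    """
    cut = ctrl_pod.rfind("-jiva-ctrl-")  # last occurrence = shortest suffix
    return ctrl_pod if cut == -1 else ctrl_pod[:cut]


def claim_for_pv(pv_json: str) -> Tuple[str, str]:
    """Return (namespace, name) of the bound PVC from `kubectl get pv <pv> -o json`."""
    ref = json.loads(pv_json)["spec"]["claimRef"]
    return ref["namespace"], ref["name"]


def pods_using_claim(pods_json: str, claim_name: str) -> List[str]:
    """Names of pods in `kubectl get pods -n <ns> -o json` that mount `claim_name`."""
    names = []
    for pod in json.loads(pods_json)["items"]:
        for volume in pod["spec"].get("volumes", []):
            if (volume.get("persistentVolumeClaim") or {}).get("claimName") == claim_name:
                names.append(pod["metadata"]["name"])
                break  # one match per pod is enough
    return names
```

Legacy controller pods (pvc-...-ctrl-... without "jiva") pass through `pv_name_from_ctrl_pod` unchanged, exactly as the shell expansion would leave them, so check those by hand.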

Step 4 — Verify all sessions have logged out

Confirm no node retains an iSCSI session to the controllers that were on $TARGET_NODE:

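for entry in $(kubectl --context pvek8s get pods -n openebs \
    -l app=openebs-jiva-csi-node \
    -o jsonpath='{range .items[*]}{.metadata.name}:{.spec.nodeName}{"\n"}{end}'); do
  pod=${entry%%:*}; node=${entry#*:}
  echo "=== $node ==="
  # Trailing ":" avoids prefix matches (e.g. .5 vs .57); add one -e per affected IP
  kubectl --context pvek8s exec -n openebs "$pod" -c jiva-csi-plugin -- \
    iscsiadm -m session 2>/dev/null | grep -F -e "<controller-ClusterIP>:" || echo "(none)"
done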
# All nodes should show "(none)" for the affected controller IPs

Only proceed once all sessions to the affected controllers are gone.
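Rather than re-running the check by hand, the wait can be expressed as a bounded poll. A minimal sketch, assuming a caller-supplied `fetch` that returns `{node: iscsiadm -m session output}` (for example by running the loop above through `subprocess`); `sessions_remaining` and `wait_for_logout` are hypothetical helpers, and the 300 s timeout and 10 s interval are arbitrary defaults:

```python
import time
from typing import Callable, Dict, Iterable, List


def sessions_remaining(sessions_per_node: Dict[str, str], ips: Iterable[str]) -> Dict[str, List[str]]:
    """Return {node: [session lines]} for sessions still pointing at any of `ips`."""
    needles = [f" {ip}:" for ip in ips]  # " IP:" so 10.0.0.5 does not match 10.0.0.57
    leftover: Dict[str, List[str]] = {}
    for node, output in sessions_per_node.items():
        hits = [line for line in output.splitlines() if any(n in line for n in needles)]
        if hits:
            leftover[node] = hits
    return leftover


def wait_for_logout(
    fetch: Callable[[], Dict[str, str]],
    ips: Iterable[str],
    timeout: float = 300.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `fetch` until no node has a session to `ips`; return False on timeout."""
    ips = list(ips)  # reused on every poll
    deadline = clock() + timeout
    while True:
        if not sessions_remaining(fetch(), ips):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A `False` result means some session is still logged in; stop and investigate before touching the node.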


Node Restart Procedure

With iSCSI sessions safely cleared, restart the node using the standard dqlite → kubelite ordering:

  1. Cordon the node (required — prevents the kubelet watch-race stall on restart):

    kubectl --context pvek8s cordon "$TARGET_NODE"
    

    See kubelet-silent-stall.md — Failure Mode 2 for why cordoning before restart is mandatory.

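  2. Restart k8s-dqlite first, then wait for it to stabilise:

    ssh "$TARGET_NODE" "sudo systemctl restart snap.microk8s.daemon-k8s-dqlite.service"
    # Wait until the unit is active and no 'database is locked' errors appear for 30s
    ssh "$TARGET_NODE" "sudo systemctl is-active snap.microk8s.daemon-k8s-dqlite.service"
    ssh "$TARGET_NODE" "sudo journalctl -u snap.microk8s.daemon-k8s-dqlite.service --since '-30s' | grep -c 'database is locked'"
    # → 0 once stable; re-run until it stays at 0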
    
  3. Restart kubelite:

    ssh "$TARGET_NODE" "sudo systemctl restart snap.microk8s.daemon-kubelite.service"
    
  4. Wait for node Ready:

    kubectl --context pvek8s wait node/"$TARGET_NODE" --for=condition=Ready --timeout=300s
    
  5. Uncordon:

    kubectl --context pvek8s uncordon "$TARGET_NODE"
    

See kubelet-volume-manager-stall.md — Option B for the full dqlite restart safety procedure and lock-contention checks.
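The ordering above can be captured as a fail-fast command sequence, so a failed dqlite check stops the run before kubelite is touched. A minimal sketch, assuming the same `pvek8s` context and SSH access as the manual steps; `restart_steps` and `run_steps` are hypothetical helpers, and the 30-second 'database is locked' soak is deliberately left to the operator:

```python
import subprocess
from typing import Callable, List, Sequence

DQLITE = "snap.microk8s.daemon-k8s-dqlite.service"
KUBELITE = "snap.microk8s.daemon-kubelite.service"


def restart_steps(node: str, context: str = "pvek8s") -> List[List[str]]:
    """Commands from the Node Restart Procedure, in the order they must run."""
    kubectl = ["kubectl", "--context", context]
    return [
        kubectl + ["cordon", node],
        ["ssh", node, f"sudo systemctl restart {DQLITE}"],
        ["ssh", node, f"sudo systemctl is-active {DQLITE}"],  # gate before kubelite
        ["ssh", node, f"sudo systemctl restart {KUBELITE}"],
        kubectl + ["wait", f"node/{node}", "--for=condition=Ready", "--timeout=300s"],
        kubectl + ["uncordon", node],
    ]


def run_steps(
    steps: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> None:
    """Run each command in order; check=True raises CalledProcessError and stops the sequence."""
    for cmd in steps:
        run(cmd)
```

Printing `restart_steps("k8s01")` is a quick dry run; passing a custom `run` lets you insert the journal check between the dqlite and kubelite steps.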


Post-Restart Verification

# Node is Ready and schedulable
kubectl --context pvek8s get node "$TARGET_NODE"
# → Ready (no SchedulingDisabled)

# jiva-ctrl pods have rescheduled and are Running
kubectl --context pvek8s get pods -n openebs -o wide | grep -- '-ctrl-'   # matches jiva-ctrl and legacy ctrl names
# → all Running, spread across nodes

# Workload pods that were migrated are Running with rw filesystems
kubectl --context pvek8s get pods -n <namespace> <pod-name> -o wide
# → 1/1 Running on a node other than TARGET_NODE

# iSCSI sessions re-established on the workload node
NEW_NODE=$(kubectl --context pvek8s get pod -n <namespace> <pod-name> \
  -o jsonpath='{.spec.nodeName}')
NEW_JIVA_POD=$(kubectl --context pvek8s get pods -n openebs \
  -l app=openebs-jiva-csi-node \
  -o jsonpath="{.items[?(@.spec.nodeName=='$NEW_NODE')].metadata.name}")
kubectl --context pvek8s exec -n openebs "$NEW_JIVA_POD" -c jiva-csi-plugin -- \
  iscsiadm -m session
# → tcp: [...] iqn.2016-09.com.openebs.jiva:<pvc-name> (non-flash)

# Filesystem is rw
kubectl --context pvek8s exec -n openebs "$NEW_JIVA_POD" -c jiva-csi-plugin -- \
  grep "<pvc-name>" /proc/mounts
# → should show rw in mount options, not ro
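Reading the rw/ro flag out of /proc/mounts by eye is error-prone when a volume appears twice (staging and pod mount). A minimal sketch, assuming the standard six-field /proc/mounts layout; `mount_modes` is a hypothetical helper that, like the grep above, matches any line mentioning the PVC name:

```python
from typing import Dict


def mount_modes(proc_mounts: str, pvc_name: str) -> Dict[str, str]:
    """Return {mount point: "rw" or "ro"} for /proc/mounts lines mentioning `pvc_name`.

    /proc/mounts fields: device, mount point, fs type, options, dump, pass.
    """
    modes: Dict[str, str] = {}
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) < 4 or pvc_name not in line:
            continue
        options = fields[3].split(",")
        modes[fields[1]] = "ro" if "ro" in options else "rw"
    return modes
```

Every value in the result should be "rw"; any "ro" entry means the volume needs the read-only recovery runbook.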



Automated Option — Ansible

The manual procedure above remains the canonical reference and should be used when the automation is unavailable or when a volume is already degraded. For routine maintenance, the whole pre-restart migration and reboot is automated.

Option A (shuffle and reboot)

k8s-reboot.yml runs the controller shuffle automatically as part of its pre-flight: if jiva-ctrl pods are found on the target node it announces the migration, moves the controllers off one at a time (waiting for each JivaVolume to return to Ready), re-validates volume quorum, and only then proceeds to cordon, drain, and reboot.

cd ansible
ansible-playbook -i inventory/hosts.ini k8s-reboot.yml --limit <node>

Option B — shuffle only (no reboot)

The migration is also available standalone via the ansible-role-microk8s role, gated behind an explicit tag so it can never run as part of a normal role application:

cd ansible
ansible-playbook -i inventory/hosts.ini update/home.yml \
  --limit <node> --tags jiva-ctrl-shuffle

Or from another playbook:

- name: Shuffle jiva-ctrl pods off the target node
  ansible.builtin.include_role:
    name: ansible-role-microk8s
    tasks_from: jiva_ctrl_shuffle

What the automation does (and does not do)

  • Finds controllers by label (openebs.io/controller=jiva-controller for legacy 2.12 volumes, openebs.io/component=jiva-controller for 3.6 CSI volumes) — do not rely on jiva.*ctrl pod-name matching; legacy controller pods are named pvc-...-ctrl-... without "jiva"
  • Aborts if any JivaVolume is Syncing/Error/Unknown before starting
  • Cordons the node so controllers cannot reschedule back, and leaves it cordoned — Option A's reboot flow uncordons at the end; after a standalone Option B run you must kubectl uncordon <node> yourself
  • Moves one controller at a time: delete pod → wait for the Deployment rollout → replacement 2/2 Running on another node → JivaVolume Ready (CSI volumes; legacy volumes have no JivaVolume CR and get a settle pause instead)
  • Fails if any controller remains on the node afterwards
  • Does not migrate workload pods (runbook Steps 2–3): moving the controller itself re-homes the iSCSI target before the node restart, so workload sessions reconnect to the same ClusterIP. If a filesystem has already gone read-only, use jiva-ctrl-eviction-iscsi-ro-filesystem.md instead; it is too late to shuffle
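The control flow in the list above can be sketched with injected callbacks. This is not the code in tasks/jiva_ctrl_shuffle.yml; `shuffle_controllers` and its callback parameters are hypothetical, and how you obtain each JivaVolume's state (the `volume_states` mapping) is left as an assumption:

```python
from typing import Callable, List, Mapping

# States that, per this runbook, make the automation abort before it starts.
BLOCKING_STATES = {"Syncing", "Error", "Unknown"}


class ShuffleAborted(RuntimeError):
    """Raised when the pre-check or the final verification fails."""


def shuffle_controllers(
    node: str,
    volume_states: Mapping[str, str],
    controllers_on: Callable[[str], List[str]],
    cordon: Callable[[str], None],
    move_one: Callable[[str], None],
) -> None:
    """Pre-check volumes, cordon, move controllers one at a time, then verify."""
    blocked = sorted(v for v, s in volume_states.items() if s in BLOCKING_STATES)
    if blocked:
        raise ShuffleAborted("unhealthy JivaVolumes: " + ", ".join(blocked))
    cordon(node)  # left cordoned on purpose; the caller uncordons later
    for ctrl in controllers_on(node):
        move_one(ctrl)  # hypothetical: delete pod, wait for rollout, wait for Ready
    leftover = controllers_on(node)
    if leftover:
        raise ShuffleAborted(f"controllers still on {node}: " + ", ".join(leftover))
```

Keeping `move_one` synchronous is what enforces "one controller at a time": the next deletion cannot start until the previous replacement is healthy.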

Implementation: tasks/jiva_ctrl_shuffle.yml in ansible-role-microk8s (PGM-240); reboot integration in ansible/k8s-reboot.yml (PGM-239/PGM-240).