Hi folks,

I'm running into trouble after moving an arbiter brick to another server (the move was prompted by I/O load issues). My setup is as follows:

# gluster volume info

Volume Name: myvol
Type: Distributed-Replicate
Volume ID: 43ba517a-ac09-461e-99da-a197759a7dc8
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: gv0:/data/glusterfs
Brick2: gv1:/data/glusterfs
Brick3: gv4:/data/gv01-arbiter (arbiter)
Brick4: gv2:/data/glusterfs
Brick5: gv3:/data/glusterfs
Brick6: gv1:/data/gv23-arbiter (arbiter)
Brick7: gv4:/data/glusterfs
Brick8: gv5:/data/glusterfs
Brick9: pluto:/var/gv45-arbiter (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
storage.owner-gid: 1000
storage.owner-uid: 1000
cluster.self-heal-daemon: enable

The gv23-arbiter is the brick that was recently moved from another server (chronos) using the following command:

# gluster volume replace-brick myvol chronos:/mnt/gv23-arbiter gv1:/data/gv23-arbiter commit force
volume replace-brick: success: replace-brick commit force operation successful

This isn't the first time I've moved an arbiter brick, and the heal-count was zero for all bricks before the change, so I didn't expect much trouble. What probably went wrong is that I then forced chronos out of the cluster with the gluster peer detach command. Ever since then, over the course of the last 3 days, I've been seeing this:

# gluster volume heal myvol statistics heal-count
Gathering count of entries to be healed on volume myvol has been successful

Brick gv0:/data/glusterfs
Number of entries: 0

Brick gv1:/data/glusterfs
Number of entries: 0

Brick gv4:/data/gv01-arbiter
Number of entries: 0

Brick gv2:/data/glusterfs
Number of entries: 64999

Brick gv3:/data/glusterfs
Number of entries: 64999

Brick gv1:/data/gv23-arbiter
Number of entries: 0

Brick gv4:/data/glusterfs
Number of entries: 0

Brick gv5:/data/glusterfs
Number of entries: 0

Brick pluto:/var/gv45-arbiter
Number of entries: 0

According to /var/log/glusterfs/glustershd.log, self-healing is underway, so it might be worth just sitting and waiting, but I'm wondering why this heal-count of 64999 persists (a limit on the counter? The gv2 and gv3 bricks contain roughly 30 million files), and I'm concerned about the following output:

# gluster volume heal myvol info heal-failed
Gathering list of heal failed entries on volume myvol has been unsuccessful on bricks that are down. Please check if all brick processes are running.

I attached the chronos server back to the cluster, with no noticeable effect. Any comments and suggestions would be much appreciated.

--
Best Regards,

Seva Gluschenko
CTO @ http://webkontrol.ru
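
P.S. In case it helps, here is the full sequence of operations. The replace-brick command is copied verbatim from above; the peer detach/probe invocations are written from memory using the standard syntax, so take the exact arguments with a grain of salt:

# gluster volume heal myvol statistics heal-count
(heal-count was zero for all bricks at this point)

# gluster volume replace-brick myvol chronos:/mnt/gv23-arbiter gv1:/data/gv23-arbiter commit force

# gluster peer detach chronos
(probably my mistake)

# gluster peer probe chronos
(done roughly 3 days later to re-attach chronos; no noticeable effect)

Following the advice in the heal-failed message, I'll also double-check that every brick process shows as Online in:

# gluster volume status myvol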