[Gluster-users] Problems after upgrade/volume expansion
btimm at energy.wisc.edu
Mon Feb 3 21:35:13 UTC 2014
I'm experiencing some major problems with my GlusterFS filesystem
after an upgrade/expansion, and I'm hoping I can get pointed in the
right direction for troubleshooting it.
I had a 5-server, 5-brick distributed volume on 3.3.1. I brought the
volume offline, stopped glusterd and glusterfsd on all servers, then
upgraded to 3.4.2 and brought glusterd and glusterfsd back online. So
far so good.
Once the volume was back online and healthy, I added a new server to the
trusted storage pool and added two bricks attached to that server to the
volume. Everything looked fine so far; gluster volume status showed all
six servers and seven bricks as online.
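A minimal sketch of what an expansion step like this typically looks like on the command line. The volume name `myvol`, the hostname `server6`, and the brick paths are hypothetical placeholders, not the values actually used here. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Placeholders (assumptions, not taken from the post above).
VOLNAME="${VOLNAME:-myvol}"
NEW_HOST="${NEW_HOST:-server6}"
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

expand_volume() {
    # Add the new server to the trusted storage pool.
    run gluster peer probe "$NEW_HOST"
    # Add the two new bricks (hypothetical paths) to the volume.
    run gluster volume add-brick "$VOLNAME" \
        "$NEW_HOST:/export/brick1" "$NEW_HOST:/export/brick2"
    # Confirm that every brick process is reported as online.
    run gluster volume status "$VOLNAME"
}

expand_volume
```

Running it with `DRY_RUN=0` on a node of the trusted pool would execute the commands instead of printing them.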
The problem came next when I tried to rebalance. I ran "gluster volume
rebalance <volname> start force", then once it returned ran "status" and
saw that the rebalance failed on all but one node, which showed in
progress. The node that it was running successfully on was a
pre-existing server, not the new server/brick(s). The other five
servers report "1 subvolume(s) are down. Skipping fix layout." Somebody
in the IRC channel suggested this means that one of my bricks is down,
but "gluster volume status <volname>" reports all servers and bricks as
being online. Full pastebin of the rebalance log (essentially the same
on all five failing servers) here: http://fpaste.org/74082/14615971/
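One way to confirm which nodes hit this condition is to search each node's rebalance log for the message. This is only a sketch: the volume name `myvol` is a placeholder, and the log path below is an assumption about the default location that may differ on a given install.

```shell
#!/bin/sh
# Placeholder volume name and assumed default log location; adjust both.
VOLNAME="${VOLNAME:-myvol}"
REBALANCE_LOG="${REBALANCE_LOG:-/var/log/glusterfs/${VOLNAME}-rebalance.log}"

# Print every line of the given log that reports a down subvolume.
find_down_subvol_msgs() {
    grep -F "subvolume(s) are down" "$1"
}

if [ -r "$REBALANCE_LOG" ]; then
    find_down_subvol_msgs "$REBALANCE_LOG" ||
        echo "no 'subvolume(s) are down' messages in $REBALANCE_LOG"
else
    echo "cannot read $REBALANCE_LOG (path is an assumption; adjust as needed)"
fi

# Commands worth comparing against the log on each node:
#   gluster peer status
#   gluster volume status "$VOLNAME"
#   gluster volume rebalance "$VOLNAME" status
```

Running this on every server and comparing the results with `gluster peer status` from the same node may show whether a particular node sees a peer or brick as disconnected even though the volume status looks healthy elsewhere.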
Currently, I have both missing files and files that report "Transport
endpoint not connected" when they are accessed. This really seems to be
related to the rebalance failures, and the layout seems incorrect as
well. I'm really hoping somebody can point me in the right direction of
where to look next. Thanks in advance for any help.