[Gluster-users] Problems after upgrade/volume expansion

Tue Feb 4 00:35:49 UTC 2014

Looks like you have a problem getting to one of your servers: 

[2014-02-03 21:03:06.231215] E [socket.c:2157:socket_connect_finish] 0-bigdata-client-0: connection to x.x.x.x:49153 failed (No route to host)
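
"No route to host" on a brick port often points to a firewall or routing problem rather than a stopped brick process. As far as I know, 3.4 moved brick ports into the 49152+ range, so firewall rules written for 3.3 may not cover them; treat that as something to verify on your systems, not a diagnosis. Below is a minimal sketch, assuming Python 3 is available on a client or peer node, that pulls the host and port out of a log line like the one above and tries a plain TCP connection to it. The function names and the 3-second timeout are my own, not anything from Gluster.

```python
import re
import socket

# Matches the "connection to HOST:PORT failed (REASON)" part of the log line.
LOG_RE = re.compile(
    r"connection to (?P<host>\S+):(?P<port>\d+) failed \((?P<reason>[^)]*)\)"
)


def parse_connect_failure(line):
    """Return (host, port, reason) from a connect-failure log line, or None."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return m.group("host"), int(m.group("port")), m.group("reason")


def can_connect(host, port, timeout=3.0):
    """Try a TCP connection; return (True, "ok") or (False, error text)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "ok"
    except OSError as exc:
        # Covers refused connections, unreachable hosts, timeouts, DNS errors.
        return False, str(exc)
```

Run parse_connect_failure() over the lines in the rebalance log on each failing server, then call can_connect() with the real address and port from that same server. If the connection fails there but succeeds from the one node where rebalance is running, the problem is most likely between the nodes rather than in Gluster itself.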


On Mon, 2014-02-03 at 16:15 -0600, Branden Timm wrote: 
> I should mention that the following line from the log is also worrying, 
> as each trusted server is running Gluster v. 3.4.2, as verified by 
> running /usr/sbin/glusterd -V:
> 
> Using Program GlusterFS 3.3, Num (1298437), Version (330)
> 
> Branden
> 
> On 2/3/2014 3:35 PM, Branden Timm wrote:
> > Hello,
> >   I'm experiencing some major problems with my GlusterFS filesystem 
> > after an upgrade/expansion, and I'm hoping I can get pointed in the 
> > right direction for troubleshooting it.
> >
> > I had a 5-server, 5-brick distributed volume on 3.3.1.  I brought the 
> > volume offline, stopped glusterd and glusterfsd on all servers, then 
> > upgraded to 3.4.2 and brought glusterd and glusterfsd back online.  So 
> > far so good.
> >
> > Once the volume was back online and healthy, I added a new server to 
> > the trusted storage pool and added two bricks attached to that server 
> > to the volume.  Everything looked fine so far; "gluster volume status" 
> > showed all six servers and seven bricks as online.
> >
> > The problem came next when I tried to rebalance.  I ran "gluster 
> > volume rebalance <volname> start force", then once it returned ran 
> > "status" and saw that the rebalance failed on all but one node, which 
> > showed in progress.  The node that it was running successfully on was 
> > a pre-existing server, not the new server/brick(s).  The other five 
> > servers report "1 subvolume(s) are down. Skipping fix layout."  
> > Somebody in the IRC channel suggested this means that one of my bricks 
> > is down, but "gluster volume status <volname>" reports all servers 
> > and bricks as being online.  Full pastebin of the rebalance log 
> > (essentially the same on all five failing servers) here: 
> > http://fpaste.org/74082/14615971/
> >
> > Currently, I have both missing files and files that report "Transport 
> > endpoint not connected" when they are accessed.  This seems to be 
> > related to the rebalance failures, and the layout seems incorrect as 
> > well.  I'm really hoping somebody can point me in the right direction on 
> > where to look next.  Thanks in advance for any help.
> >
> > -Branden
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> 