[Gluster-users] Gluster NFS crashing

Tue Apr 15 06:22:53 UTC 2014

The whole system came to a grinding halt today and no amount of
restarting daemons would make it work again. What was really odd was
that gluster vol status said everything was fine and yet all the client
mount points had hung.

On the node that was exporting Gluster NFS I had zombie processes, so I
decided to reboot. It took a while for the ZFS JBODs to sort themselves
out, but I was relieved when it all came back up, except that the df
size on the clients was wrong...

gluster vol info and gluster vol status said everything was fine, but it
was obvious that 2 of my bricks were missing. I restarted everything
and still had 2 missing bricks. I remounted the fuse clients and still no
good.
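
Since gluster vol status could not be trusted here, the df size seen on a
client was the only reliable signal. A minimal sketch of automating that
check, assuming a distribute-only volume where the client-side size is
roughly the sum of the brick sizes. The function name, the tolerance and
the expected size are hypothetical values you would supply yourself;
nothing here is part of Gluster.

```python
import os


def missing_capacity(mount_point, expected_bytes, tolerance=0.05,
                     statvfs=os.statvfs):
    """Return the shortfall in bytes if the mount looks too small, else 0.

    expected_bytes is an assumption supplied by the admin (for example the
    sum of all brick sizes); it is not queried from Gluster.
    """
    st = statvfs(mount_point)
    # Total size as df reports it: fragment size times total block count.
    reported = st.f_frsize * st.f_blocks
    shortfall = expected_bytes - reported
    # Ignore small differences (filesystem overhead, rounding).
    if shortfall > expected_bytes * tolerance:
        return shortfall
    return 0
```

Run against the fuse mount from cron, a non-zero result would have flagged
the two missing bricks even while the volume status looked healthy.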

Just out of sheer desperation and for no good reason I disabled the
Gluster NFS export and magically the missing 2 bricks reappeared and the
filesystem was back to its normal size. I turned NFS exports back on and
everything stayed working.

I'm not trying to belittle all the good work done by the Gluster
developers but this really doesn't look like a viable big data
filesystem at the moment. We've currently got 800TB and are about to add
another 400TB but quite honestly the prospect terrifies me.

On Tue, 2014-04-15 at 08:35 +0800, Franco Broi wrote: 
> On Mon, 2014-04-14 at 17:29 -0700, Harshavardhana wrote: 
> > >
> > > Just distributed.
> > >
> > 
> > With a pure distributed setup you have to take downtime, since the data
> > isn't replicated.
> 
> If I shut down the server processes, won't the clients just wait for it to
> come back up, i.e. like NFS hard mounts? I don't mind an interruption, I
> just want to avoid killing all jobs that are currently accessing the
> filesystem if at all possible. Our users have suffered a lot recently
> with filesystem outages.
> 
> By the way, how does one shut down the glusterfs processes without
> stopping a volume? It would be nice to have a quiesce or freeze option
> that just stalls all access while maintenance takes place.
> 
> > 
> > >>
> > >> > 3.4.1 to 3.4.3-3 shouldn't cause problems with existing clients and
> > >> > other servers, right?
> > >> >
> > >>
> > >> You mean 3.4.1 and 3.4.3 coexisting within a cluster?
> > >
> > > Yes, at least for the duration of the upgrade.
> > 
> > Yeah, releases in the 3.4.x series are backward compatible with each
> > other in any case.
> > 