[Gluster-users] After reboot, one brick is not being seen by clients
Patrick Haley
phaley at MIT.EDU
Thu Nov 28 16:06:03 UTC 2013
________________________________________
From: Patrick Haley
Sent: Thursday, November 28, 2013 11:00 AM
To: Ravishankar N
Subject: RE: [Gluster-users] After reboot, one brick is not being seen by clients
Hi Ravi,
I'm pretty sure the clients use fuse mounts. The relevant line from /etc/fstab is
mseas-data:/gdata /gdata glusterfs defaults,_netdev 0 0
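One quick way to confirm the mount type on a client is to look at /proc/mounts: a fuse client mount shows filesystem type fuse.glusterfs, while an NFS mount would show type nfs. A small sketch (the sample line below is hypothetical, not copied from the client):

```shell
# On the client, something like "grep /gdata /proc/mounts" would show the
# live entry; here we check a hypothetical sample of that entry instead.
sample='mseas-data:/gdata /gdata fuse.glusterfs rw,allow_other,max_read=131072 0 0'
# Field 3 of /proc/mounts is the filesystem type.
echo "$sample" | awk '$3 == "fuse.glusterfs" {print "fuse mount"}'
```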
gluster-data sees the other bricks as connected. The other bricks see each
other as connected, but see gluster-data as disconnected:
---------------
gluster-data:
---------------
[root at mseas-data ~]# gluster peer status
Number of Peers: 2
Hostname: gluster-0-1
Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
State: Peer in Cluster (Connected)
Hostname: gluster-0-0
Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
State: Peer in Cluster (Connected)
-------------
gluster-0-0:
--------------
[root at nas-0-0 ~]# gluster peer status
Number of Peers: 2
Hostname: gluster-data
Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
State: Peer in Cluster (Disconnected)
Hostname: gluster-0-1
Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
State: Peer in Cluster (Connected)
-------------
gluster-0-1:
--------------
[root at nas-0-1 ~]# gluster peer status
Number of Peers: 2
Hostname: gluster-data
Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
State: Peer in Cluster (Disconnected)
Hostname: gluster-0-0
Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
State: Peer in Cluster (Connected)
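The asymmetry above (gluster-data sees both peers as Connected, while both peers see gluster-data as Disconnected) can be spotted mechanically with a small filter over the `gluster peer status` output. A sketch, assuming the exact output format shown above (the sample text is copied from the nas-0-0 output):

```shell
# Print any peer whose state is Disconnected. Each peer is reported as a
# "Hostname:" line followed later by a "State:" line, so we remember the
# last hostname seen and emit it when a Disconnected state follows.
peer_status='Hostname: gluster-data
Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
State: Peer in Cluster (Disconnected)
Hostname: gluster-0-1
Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
State: Peer in Cluster (Connected)'
echo "$peer_status" | awk '
    /^Hostname:/            { host = $2 }
    /^State:.*Disconnected/ { print host ": DISCONNECTED" }
'
```

A one-sided Disconnected state like this often points at a firewall or glusterd problem on the rebooted node; verifying that TCP port 24007 (glusterd's default management port) on gluster-data is reachable from the other peers would be a reasonable next step.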
Does any of this suggest what I need to look at next?
Thanks.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley at mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
________________________________________
From: Ravishankar N [ravishankar at redhat.com]
Sent: Thursday, November 28, 2013 2:48 AM
To: Patrick Haley
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] After reboot, one brick is not being seen by clients
On 11/28/2013 12:52 PM, Patrick Haley wrote:
> Hi Ravi,
>
> Thanks for the reply. If I interpret the output of gluster volume status
> correctly, glusterfsd was running
>
> [root at mseas-data ~]# gluster volume status
> Status of volume: gdata
> Gluster process Port Online Pid
> ------------------------------------------------------------------------------
> Brick gluster-0-0:/mseas-data-0-0 24009 Y 27006
> Brick gluster-0-1:/mseas-data-0-1 24009 Y 7063
> Brick gluster-data:/data 24009 Y 2897
> NFS Server on localhost 38467 Y 2903
> NFS Server on gluster-0-1 38467 Y 7069
> NFS Server on gluster-0-0 38467 Y 27012
>
> For completeness, I tried both "service glusterd restart" and
> "gluster volume start gdata force". Neither solved the problem.
> Note that after "gluster volume start gdata force", "gluster volume status"
> failed:
>
> [root at mseas-data ~]# gluster volume status
> operation failed
>
> Failed to get names of volumes
>
> Doing another "service glusterd restart" let the "gluster volume status"
> command work, but the clients still don't see the files on mseas-data.
Are your clients using fuse mounts or NFS mounts?
>
> A second piece of data: on the other bricks, "gluster volume status" does not
> show gluster-data:/data:
Hmm, could you check if all 3 bricks are connected? `gluster peer
status` on each brick should show the others as connected.
>
> [root at nas-0-0 ~]# gluster volume status
> Status of volume: gdata
> Gluster process Port Online Pid
> ------------------------------------------------------------------------------
> Brick gluster-0-0:/mseas-data-0-0 24009 Y 27006
> Brick gluster-0-1:/mseas-data-0-1 24009 Y 7063
> NFS Server on localhost 38467 Y 27012
> NFS Server on gluster-0-1 38467 Y 8051
>
> Any thoughts on what I should look at next?
I also noticed that the NFS server process on gluster-0-1 (on which I guess
no commands were run) seems to have changed its pid from 7069 to 8051.
FWIW, I am able to observe a similar bug
(https://bugzilla.redhat.com/show_bug.cgi?id=1035586) which needs to be
investigated.
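That pid change can be detected mechanically by comparing two status captures. A sketch using the two NFS-server lines from the outputs above (a changed pid means the process was respawned in between):

```shell
# The same "NFS Server on gluster-0-1" line from two successive
# "gluster volume status" runs; the pid is the last field.
before='NFS Server on gluster-0-1  38467  Y  7069'
after='NFS Server on gluster-0-1   38467  Y  8051'
pid_before=$(echo "$before" | awk '{print $NF}')
pid_after=$(echo "$after" | awk '{print $NF}')
if [ "$pid_before" != "$pid_after" ]; then
    echo "NFS server respawned: pid $pid_before -> $pid_after"
fi
```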
Thanks,
Ravi
> Thanks again.
>
>
>
> ________________________________________
> From: Ravishankar N [ravishankar at redhat.com]
> Sent: Wednesday, November 27, 2013 11:21 PM
> To: Patrick Haley; gluster-users at gluster.org
> Subject: Re: [Gluster-users] After reboot, one brick is not being seen by clients
>
> On 11/28/2013 03:12 AM, Pat Haley wrote:
>> Hi,
>>
>> We are currently using gluster with 3 bricks. We just
>> rebooted one of the bricks (mseas-data, also identified
>> as gluster-data) which is actually the main server. After
>> rebooting this brick, our client machine (mseas) only sees
>> the files on the other 2 bricks. Note that if I mount
>> the gluster filespace (/gdata) on the brick I rebooted,
>> it sees the entire space.
>>
>> The last time I had this problem, there was an error in
>> one of our /etc/hosts file. This does not seem to be the
>> case now.
>>
>> What else can I look at to debug this problem?
>>
>> Some information I have from the gluster server
>>
>> [root at mseas-data ~]# gluster --version
>> glusterfs 3.3.1 built on Oct 11 2012 22:01:05
>>
>> [root at mseas-data ~]# gluster volume info
>>
>> Volume Name: gdata
>> Type: Distribute
>> Volume ID: eccc3a90-212d-4563-ae8d-10a77758738d
>> Status: Started
>> Number of Bricks: 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: gluster-0-0:/mseas-data-0-0
>> Brick2: gluster-0-1:/mseas-data-0-1
>> Brick3: gluster-data:/data
>>
>> [root at mseas-data ~]# ps -ef | grep gluster
>>
>> root 2781 1 0 15:16 ? 00:00:00 /usr/sbin/glusterd -p
>> /var/run/glusterd.pid
>> root 2897 1 0 15:16 ? 00:00:00 /usr/sbin/glusterfsd
>> -s localhost --volfile-id gdata.gluster-data.data -p
>> /var/lib/glusterd/vols/gdata/run/gluster-data-data.pid -S
>> /tmp/e3eac7ce95e786a3d909b8fc65ed2059.socket --brick-name /data -l
>> /var/log/glusterfs/bricks/data.log --xlator-option
>> *-posix.glusterd-uuid=22f1102a-08e6-482d-ad23-d8e063cf32ed
>> --brick-port 24009 --xlator-option gdata-server.listen-port=24009
>> root 2903 1 0 15:16 ? 00:00:00 /usr/sbin/glusterfs -s
>> localhost --volfile-id gluster/nfs -p
>> /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
>> /tmp/d5c892de43c28a1ee7481b780245b789.socket
>> root 4258 1 0 15:52 ? 00:00:00 /usr/sbin/glusterfs
>> --volfile-id=/gdata --volfile-server=mseas-data /gdata
>> root 4475 4033 0 16:35 pts/0 00:00:00 grep gluster
>>
> From the ps output, the brick process (glusterfsd) doesn't seem to be
> running on the gluster-data server. Run `gluster volume status` and
> check if that is indeed the case. If yes, you could either restart
> glusterd on the brick node (`service glusterd restart`) or restart the
> entire volume (`gluster volume start gdata force`), which should bring
> the brick process back online.
>
> I'm not sure why glusterd did not start the brick process when you
> rebooted the machine in the first place. You could perhaps check the
> glusterd log for clues.
>
> Hope this helps,
> Ravi
>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users