[Gluster-users] After reboot, one brick is not being seen by clients
Patrick Haley
phaley at MIT.EDU
Fri Nov 29 05:19:16 UTC 2013
Hi Ravi,
Success!
After flushing the iptables rules on gluster-data, I had to restart glusterd
on all three bricks. Now the clients see all the files on /gdata.
Thanks for all of your efforts in solving this issue.
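For the archive, the fix amounted to roughly the following (a sketch; the
exact rules to clear will vary per setup):

# on gluster-data: flush the iptables rules
iptables -F
# optional (RHEL/CentOS): persist, so a reboot doesn't bring the old rules back
service iptables save

# then, on each of the three bricks:
service glusterd restart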
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
________________________________________
From: Ravishankar N [ravishankar at redhat.com]
Sent: Thursday, November 28, 2013 9:15 PM
To: Patrick Haley; gluster-users at gluster.org
Cc: SPostma at ztechnet.com
Subject: Re: [Gluster-users] After reboot, one brick is not being seen by clients
On 11/29/2013 04:34 AM, Patrick Haley wrote:
> Hi Ravi,
>
> gluster-data is pingable from gluster-0-0, so I tried the detaching/
> reattaching. I had to use the "force" option on the detach on
> gluster-0-0. The first two steps seemed to work; however, step 3 failed.
>
> -----------------
> on gluster-0-0
> -----------------
> [root at nas-0-0 ~]# gluster peer probe gluster-data
> Probe unsuccessful
> Probe returned with unknown errno 107
>
>
> Now, on gluster-data, gluster isn't seeing the peers
> (although it can still ping them):
Most likely a firewall issue; you need to clear the iptables rules. This
link should help you:
http://thr3ads.net/gluster-users/2013/05/2639667-peer-probe-fails-107
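Errno 107 is ENOTCONN ("Transport endpoint is not connected"), which fits a
blocked connection. To rule out the firewall quickly, something like the
following should do (note that flushing removes all rules, so re-add your
normal ones afterwards):

# on gluster-data
iptables -F          # flush all iptables rules
iptables -L -n       # confirm the chains are now empty

# then retry from gluster-0-0
gluster peer probe gluster-data

If you'd rather keep the firewall up, opening glusterd's management port
(24007/tcp by default) and the brick ports (24009 in your volume status
output) between all the peers should also work.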
> [root at mseas-data ~]# gluster peer status
> No peers present
>
>
> [root at mseas-data ~]# ping gluster-0-1
> PING gluster-0-1 (10.1.1.11) 56(84) bytes of data.
> 64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=1 ttl=64 time=0.103 ms
> 64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=2 ttl=64 time=0.092 ms
> 64 bytes from gluster-0-1 (10.1.1.11): icmp_seq=3 ttl=64 time=0.094 ms
>
> --- gluster-0-1 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2000ms
> rtt min/avg/max/mdev = 0.092/0.096/0.103/0.009 ms
>
>
> Any further thoughts? Thanks.
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley                          Email:  phaley at mit.edu
> Center for Ocean Engineering       Phone:  (617) 253-6824
> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA 02139-4301
>
>
> ________________________________________
> From: Ravishankar N [ravishankar at redhat.com]
> Sent: Thursday, November 28, 2013 12:32 PM
> To: Patrick Haley; gluster-users at gluster.org
> Subject: Re: [Gluster-users] After reboot, one brick is not being seen by clients
>
> On 11/28/2013 09:30 PM, Patrick Haley wrote:
>> Hi Ravi,
>>
>> I'm pretty sure the clients use fuse mounts. The relevant line from /etc/fstab is
>>
>> mseas-data:/gdata /gdata glusterfs defaults,_netdev 0 0
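>>
>> (A quick way to double-check from a client, in case it helps: `mount | grep
>> gdata` should show the filesystem type as fuse.glusterfs for a native fuse
>> mount.)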
>>
>>
>> gluster-data sees the other bricks as connected. The other bricks see each
>> other as connected but gluster-data as disconnected:
>>
>> ---------------
>> gluster-data:
>> ---------------
>> [root at mseas-data ~]# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: gluster-0-1
>> Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
>> State: Peer in Cluster (Connected)
>>
>> Hostname: gluster-0-0
>> Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
>> State: Peer in Cluster (Connected)
>>
>> -------------
>> gluster-0-0:
>> --------------
>> [root at nas-0-0 ~]# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: gluster-data
>> Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
>> State: Peer in Cluster (Disconnected)
>>
>> Hostname: gluster-0-1
>> Uuid: 393fc4a6-1573-4564-971e-1b1aec434167
>> State: Peer in Cluster (Connected)
>>
>> -------------
>> gluster-0-1:
>> --------------
>> [root at nas-0-1 ~]# gluster peer status
>> Number of Peers: 2
>>
>> Hostname: gluster-data
>> Uuid: 22f1102a-08e6-482d-ad23-d8e063cf32ed
>> State: Peer in Cluster (Disconnected)
>>
>> Hostname: gluster-0-0
>> Uuid: 3619440a-4ca3-4151-b62e-d4d6bf2e0c03
>> State: Peer in Cluster (Connected)
>>
>> Does any of this suggest what I need to look at next?
> Hi Patrick,
> If gluster-data is pingable from the other bricks, you could try
> detaching and reattaching it from gluster-0-0 or 0-1.
> 1) On gluster-0-0:
> `gluster peer detach gluster-data`; if that fails, `gluster peer
> detach gluster-data force`
> 2) On gluster-data:
> `rm -rf /var/lib/glusterd`
> `service glusterd restart`
> 3) Again on gluster-0-0:
> `gluster peer probe gluster-data`
>
> Now check if things work.
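>
> To verify after step 3, assuming the probe succeeds this time, something like:
>
> gluster peer status      # on gluster-0-0: gluster-data should show Connected
> gluster volume status    # all three bricks should be listed with Online = Y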
> PS: You should really do a 'reply-to-all' so that your queries reach a
> wider audience, getting you faster responses from the community. Also
> serves as a double-check in case I goof up :)
>
> I'm off to sleep now.
>> Thanks.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley                          Email:  phaley at mit.edu
>> Center for Ocean Engineering       Phone:  (617) 253-6824
>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
>>
>> ________________________________________
>> From: Ravishankar N [ravishankar at redhat.com]
>> Sent: Thursday, November 28, 2013 2:48 AM
>> To: Patrick Haley
>> Cc: gluster-users at gluster.org
>> Subject: Re: [Gluster-users] After reboot, one brick is not being seen by clients
>>
>> On 11/28/2013 12:52 PM, Patrick Haley wrote:
>>> Hi Ravi,
>>>
>>> Thanks for the reply. If I interpret the output of "gluster volume status"
>>> correctly, glusterfsd was running:
>>>
>>> [root at mseas-data ~]# gluster volume status
>>> Status of volume: gdata
>>> Gluster process                            Port    Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick gluster-0-0:/mseas-data-0-0          24009   Y       27006
>>> Brick gluster-0-1:/mseas-data-0-1          24009   Y       7063
>>> Brick gluster-data:/data                   24009   Y       2897
>>> NFS Server on localhost                    38467   Y       2903
>>> NFS Server on gluster-0-1                  38467   Y       7069
>>> NFS Server on gluster-0-0                  38467   Y       27012
>>>
>>> For completeness, I tried both "service glusterd restart" and
>>> "gluster volume start gdata force". Neither solved the problem.
>>> Note that after "gluster volume start gdata force", the "gluster volume
>>> status" command failed:
>>>
>>> [root at mseas-data ~]# gluster volume status
>>> operation failed
>>>
>>> Failed to get names of volumes
>>>
>>> Doing another "service glusterd restart" let the "gluster volume status"
>>> command work, but the clients still don't see the files on mseas-data.
>> Are your clients using fuse mounts or NFS mounts?
>>> A second piece of data: on the other bricks, "gluster volume status" does not
>>> show gluster-data:/data:
>> Hmm, could you check if all 3 bricks are connected ? `gluster peer
>> status` on each brick should show the others as connected.
>>> [root at nas-0-0 ~]# gluster volume status
>>> Status of volume: gdata
>>> Gluster process                            Port    Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick gluster-0-0:/mseas-data-0-0          24009   Y       27006
>>> Brick gluster-0-1:/mseas-data-0-1          24009   Y       7063
>>> NFS Server on localhost                    38467   Y       27012
>>> NFS Server on gluster-0-1                  38467   Y       8051
>>>
>>> Any thoughts on what I should look at next?
>> I also noticed that the NFS server process on gluster-0-1 (on which I guess
>> no commands were run) seems to have changed its pid from 7069 to 8051.
>> FWIW, I am able to observe a similar bug
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1035586) which needs to be
>> investigated.
>>
>> Thanks,
>> Ravi
>>> Thanks again.
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>> Pat Haley                          Email:  phaley at mit.edu
>>> Center for Ocean Engineering       Phone:  (617) 253-6824
>>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>>> 77 Massachusetts Avenue
>>> Cambridge, MA 02139-4301
>>>
>>>
>>> ________________________________________
>>> From: Ravishankar N [ravishankar at redhat.com]
>>> Sent: Wednesday, November 27, 2013 11:21 PM
>>> To: Patrick Haley; gluster-users at gluster.org
>>> Subject: Re: [Gluster-users] After reboot, one brick is not being seen by clients
>>>
>>> On 11/28/2013 03:12 AM, Pat Haley wrote:
>>>> Hi,
>>>>
>>>> We are currently using gluster with 3 bricks. We just
>>>> rebooted one of the bricks (mseas-data, also identified
>>>> as gluster-data) which is actually the main server. After
>>>> rebooting this brick, our client machine (mseas) only sees
>>>> the files on the other 2 bricks. Note that if I mount
>>>> the gluster filespace (/gdata) on the brick I rebooted,
>>>> it sees the entire space.
>>>>
>>>> The last time I had this problem, there was an error in
>>>> one of our /etc/hosts file. This does not seem to be the
>>>> case now.
>>>>
>>>> What else can I look at to debug this problem?
>>>>
>>>> Some information I have from the gluster server
>>>>
>>>> [root at mseas-data ~]# gluster --version
>>>> glusterfs 3.3.1 built on Oct 11 2012 22:01:05
>>>>
>>>> [root at mseas-data ~]# gluster volume info
>>>>
>>>> Volume Name: gdata
>>>> Type: Distribute
>>>> Volume ID: eccc3a90-212d-4563-ae8d-10a77758738d
>>>> Status: Started
>>>> Number of Bricks: 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: gluster-0-0:/mseas-data-0-0
>>>> Brick2: gluster-0-1:/mseas-data-0-1
>>>> Brick3: gluster-data:/data
>>>>
>>>> [root at mseas-data ~]# ps -ef | grep gluster
>>>>
>>>> root 2781 1 0 15:16 ? 00:00:00 /usr/sbin/glusterd -p
>>>> /var/run/glusterd.pid
>>>> root 2897 1 0 15:16 ? 00:00:00 /usr/sbin/glusterfsd
>>>> -s localhost --volfile-id gdata.gluster-data.data -p
>>>> /var/lib/glusterd/vols/gdata/run/gluster-data-data.pid -S
>>>> /tmp/e3eac7ce95e786a3d909b8fc65ed2059.socket --brick-name /data -l
>>>> /var/log/glusterfs/bricks/data.log --xlator-option
>>>> *-posix.glusterd-uuid=22f1102a-08e6-482d-ad23-d8e063cf32ed
>>>> --brick-port 24009 --xlator-option gdata-server.listen-port=24009
>>>> root 2903 1 0 15:16 ? 00:00:00 /usr/sbin/glusterfs -s
>>>> localhost --volfile-id gluster/nfs -p
>>>> /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
>>>> /tmp/d5c892de43c28a1ee7481b780245b789.socket
>>>> root 4258 1 0 15:52 ? 00:00:00 /usr/sbin/glusterfs
>>>> --volfile-id=/gdata --volfile-server=mseas-data /gdata
>>>> root 4475 4033 0 16:35 pts/0 00:00:00 grep gluster
>>>>
>>> From the ps output, the brick process (glusterfsd) doesn't seem to be
>>> running on the gluster-data server. Run `gluster volume status` and
>>> check if that is indeed the case. If yes, you could either restart
>>> glusterd on the brick node (`service glusterd restart`) or restart the
>>> entire volume (`gluster volume start gdata force`), which should bring
>>> the brick process back online.
>>>
>>> I'm not sure why glusterd did not start the brick process when you
>>> rebooted the machine in the first place. You could perhaps check the
>>> glusterd log for clues.
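>>>
>>> On most installs the logs to check would be something like the following
>>> (the glusterd log path may vary with your packaging; the brick log path is
>>> the one from your ps output):
>>>
>>> less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log   # glusterd's own log
>>> less /var/log/glusterfs/bricks/data.log                  # brick log for /data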
>>>
>>> Hope this helps,
>>> Ravi
>>>
>>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>> Pat Haley                          Email:  phaley at mit.edu
>>>> Center for Ocean Engineering       Phone:  (617) 253-6824
>>>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>>>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>>>> 77 Massachusetts Avenue
>>>> Cambridge, MA 02139-4301
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users