[Gluster-users] brick offline after restart glusterd

Mon Jul 13 17:06:30 UTC 2015


On 07/13/2015 10:29 PM, Tiemen Ruiten wrote:
> OK, I found what's wrong. From the brick's log:
> 
> [2015-07-12 02:32:01.542934] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk]
> 0-glusterfs: No change in volfile, continuing
> [2015-07-13 14:21:06.722675] W [glusterfsd.c:1219:cleanup_and_exit] (-->
> 0-: received signum (15), shutting down
> [2015-07-13 14:21:35.168750] I [MSGID: 100030] [glusterfsd.c:2294:main]
> 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.1
> (args: /usr/sbin/glusterfsd -s 10.100.3.10 --volfile-id
> vmimage.10.100.3.10.export-gluster01-brick -p
> /var/lib/glusterd/vols/vmimage/run/10.100.3.10-export-gluster01-brick.pid
> -S /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket --brick-name
> /export/gluster01/brick -l
> /var/log/glusterfs/bricks/export-gluster01-brick.log --xlator-option
> *-posix.glusterd-uuid=26186ec6-a8c7-4834-bcaa-24e30289dba3 --brick-port
> 49153 --xlator-option vmimage-server.listen-port=49153)
> [2015-07-13 14:21:35.178558] E [socket.c:823:__socket_server_bind]
> 0-socket.glusterfsd: binding to  failed: Address already in use
> [2015-07-13 14:21:35.178624] E [socket.c:826:__socket_server_bind]
> 0-socket.glusterfsd: Port is already in use
> [2015-07-13 14:21:35.178649] W [rpcsvc.c:1602:rpcsvc_transport_create]
> 0-rpc-service: listening on transport failed
> 
> 
> ps aux | grep gluster
> root      6417  0.0  0.2 753080 175016 ?       Ssl  May21  25:25
> /usr/sbin/glusterfs --volfile-server=10.100.3.10 --volfile-id=/wwwdata
> /mnt/gluster/web/wwwdata
> root      6742  0.0  0.0 622012 17624 ?        Ssl  May21  22:31
> /usr/sbin/glusterfs --volfile-server=10.100.3.10 --volfile-id=/conf
> /mnt/gluster/conf
> root     36575  0.2  0.0 589956 19228 ?        Ssl  16:21   0:19
> /usr/sbin/glusterd --pid-file=/run/glusterd.pid
> root     36720  0.0  0.0 565140 55836 ?        Ssl  16:21   0:02
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
> /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
> /var/run/gluster/8b9ce8bebfa8c1d2fabb62654bdc550e.socket
> root     36730  0.0  0.0 451016 22936 ?        Ssl  16:21   0:01
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
> /var/lib/glusterd/glustershd/run/glustershd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/c0d7454986c96eef463d028dc8bce9fe.socket --xlator-option
> *replicate*.node-uuid=26186ec6-a8c7-4834-bcaa-24e30289dba3
> root     37398  0.0  0.0 103248   916 pts/2    S+   18:49   0:00 grep
> gluster
> root     40058  0.0  0.0 755216 60212 ?        Ssl  May21  22:06
> /usr/sbin/glusterfs --volfile-server=10.100.3.10 --volfile-id=/fl-webroot
> /mnt/gluster/web/flash/webroot
> 
> So several leftover processes. What will happen if I do a
> 
> /etc/init.d/glusterd stop
> /etc/init.d/glusterfsd stop
> 
> kill all remaining gluster processes and restart gluster on this node?
> 
> Will the volume stay online? What about split-brain? I suppose it would be
> best to disconnect all clients first...?
Can you double-check whether a brick process is already running? If so,
kill it and then try 'gluster volume start <volname> force'.
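
A rough sketch of that check, in case it helps. The helper name brick_pids
is made up for illustration, and the volume name and brick path are the ones
from this thread. The function only filters a process listing; nothing is
killed or started unless you run the commented commands yourself.

```shell
# Print the PIDs of glusterfsd (brick) processes started for a given brick path.
# Input: "PID ARGS..." lines on stdin, e.g. from `ps -eo pid=,args=`.
# brick_pids is a hypothetical helper name, not part of the gluster CLI.
brick_pids() {
    awk -v brick="$1" '
        $2 ~ /glusterfsd$/ {
            for (i = 3; i < NF; i++)
                if ($i == "--brick-name" && $(i + 1) == brick) { print $1; break }
        }'
}

# Intended use on the affected node (run as root; left commented on purpose):
#   ps -eo pid=,args= | brick_pids /export/gluster01/brick
#   kill <pid>                          # only for a stale brick process
#   gluster volume start vmimage force  # starts bricks that are not running
#   gluster volume status vmimage       # the brick should then show Online "Y"
```

If the listing comes back empty but the bind error persists, something else
may be holding port 49153; 'ss -ltnp' or 'netstat -tlnp' should show the owner.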
> 
> 
> On 13 July 2015 at 18:25, Tiemen Ruiten <t.ruiten at rdmedia.com> wrote:
> 
>> Hello,
>>
>> We have a two-node gluster cluster, running version 3.7.1, that hosts an
>> oVirt storage domain. This afternoon I tried creating a template in oVirt,
>> but within a minute VMs stopped responding and Gluster started generating
>> errors like the following:
>>
>> [2015-07-13 14:09:51.772629] W [rpcsvc.c:270:rpcsvc_program_actor]
>> 0-rpc-service: RPC program not available (req 1298437 330) for
>> 10.100.3.40:1021
>> [2015-07-13 14:09:51.772675] E [rpcsvc.c:565:rpcsvc_check_and_reply_error]
>> 0-rpcsvc: rpc actor failed to complete successfully
>>
>> I managed to get things in working order again by restarting glusterd and
>> glusterfsd, but now one brick is down:
>>
>> $ sudo gluster volume status vmimage
>> Status of volume: vmimage
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.100.3.10:/export/gluster01/brick   N/A       N/A        N       36736
>> Brick 10.100.3.11:/export/gluster01/brick   49153     0          Y       11897
>> NFS Server on localhost                     2049      0          Y       36720
>> Self-heal Daemon on localhost               N/A       N/A        Y       36730
>> NFS Server on 10.100.3.11                   2049      0          Y       11919
>> Self-heal Daemon on 10.100.3.11             N/A       N/A        Y       11924
>>
>> Task Status of Volume vmimage
>>
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> $ sudo gluster peer status
>> Number of Peers: 1
>>
>> Hostname: 10.100.3.11
>> Uuid: f9872fea-47f5-41f6-8094-c9fabd3c1339
>> State: Peer in Cluster (Connected)
>>
>> Additionally, in etc-glusterfs-glusterd.vol.log I see these messages
>> repeating every 3 seconds:
>>
>> [2015-07-13 16:15:21.737044] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>> The message "I [MSGID: 106005]
>> [glusterd-handler.c:4667:__glusterd_brick_rpc_notify] 0-management: Brick
>> 10.100.3.10:/export/gluster01/brick has disconnected from glusterd."
>> repeated 39 times between [2015-07-13 16:13:24.717611] and [2015-07-13
>> 16:15:21.737862]
>> [2015-07-13 16:15:24.737694] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>> [2015-07-13 16:15:24.738498] I [MSGID: 106005]
>> [glusterd-handler.c:4667:__glusterd_brick_rpc_notify] 0-management: Brick
>> 10.100.3.10:/export/gluster01/brick has disconnected from glusterd.
>> [2015-07-13 16:15:27.738194] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>> [2015-07-13 16:15:30.738991] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>> [2015-07-13 16:15:33.739735] W [socket.c:642:__socket_rwv] 0-management:
>> readv on /var/run/gluster/2bfe3a2242d586d0850775f601f1c3ee.socket failed
>> (Invalid argument)
>>
>> Can I get this brick back up without bringing the volume/cluster down?
>>
>> --
>> Tiemen Ruiten
>> Systems Engineer
>> R&D Media
>>

-- 
~Atin