[Gluster-users] Gluster volume brick keeps going offline

Kaamesh Kamalaaharan kaamesh at novocraft.com
Thu Mar 19 07:28:44 UTC 2015


Sorry, I forgot to include the attachment.

Thank You Kindly,
Kaamesh
Bioinformatician
Novocraft Technologies Sdn Bhd
C-23A-05, 3 Two Square, Section 19, 46300 Petaling Jaya
Selangor Darul Ehsan
Malaysia
Mobile: +60176562635
Ph: +60379600541
Fax: +60379600540

On Thu, Mar 19, 2015 at 2:40 PM, Kaamesh Kamalaaharan <kaamesh at novocraft.com> wrote:

> Hi Atin, thanks for the reply. I'm not sure which logs are relevant, so
> I'll just attach them all in a gz file.
>
> I ran "sudo gluster volume start gfsvolume force" at 2015-03-19 05:49.
> I hope this helps.
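> (Assuming the default /var/log/glusterfs log directory, something along
> these lines is enough to bundle them all; purely illustrative:)
>
>   sudo tar czf gluster_logs.tgz /var/log/glusterfs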
>
> Thank You Kindly,
> Kaamesh
>
> On Sun, Mar 15, 2015 at 11:41 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>> Could you attach the logs for the analysis?
>>
>> ~Atin
>>
>> On 03/13/2015 03:29 PM, Kaamesh Kamalaaharan wrote:
>> > Hi guys. I've been using Gluster for a while now and, despite a few
>> > hiccups, I find it a great system to use. One of my more persistent
>> > hiccups is an issue with one brick going offline.
>> >
>> > My setup is a 2-brick, 2-node replicated volume. My main brick, on gfs1,
>> > has not given me any problems. The brick on gfs2, however, keeps going
>> > offline. Following
>> > http://www.gluster.org/pipermail/gluster-users/2014-June/017583.html
>> > temporarily fixed the error, but the brick goes offline again within the
>> > hour.
>> >
>> > This is what I get from my volume status command:
>> >
>> > sudo gluster volume status
>> >
>> >> Status of volume: gfsvolume
>> >> Gluster process                          Port    Online  Pid
>> >> ------------------------------------------------------------------------------
>> >> Brick gfs1:/export/sda/brick             49153   Y       9760
>> >> Brick gfs2:/export/sda/brick             N/A     N       13461
>> >> NFS Server on localhost                  2049    Y       13473
>> >> Self-heal Daemon on localhost            N/A     Y       13480
>> >> NFS Server on gfs1                       2049    Y       16166
>> >> Self-heal Daemon on gfs1                 N/A     Y       16173
>> >>
>> >> Task Status of Volume gfsvolume
>> >> ------------------------------------------------------------------------------
>> >> There are no active volume tasks
>> >
>> > Running sudo gluster volume start gfsvolume force gives me this:
>> >
>> > sudo gluster volume status
>> >
>> >> Status of volume: gfsvolume
>> >> Gluster process                          Port    Online  Pid
>> >> ------------------------------------------------------------------------------
>> >> Brick gfs1:/export/sda/brick             49153   Y       9760
>> >> Brick gfs2:/export/sda/brick             49153   Y       13461
>> >> NFS Server on localhost                  2049    Y       13473
>> >> Self-heal Daemon on localhost            N/A     Y       13480
>> >> NFS Server on gfs1                       2049    Y       16166
>> >> Self-heal Daemon on gfs1                 N/A     Y       16173
>> >>
>> >> Task Status of Volume gfsvolume
>> >> ------------------------------------------------------------------------------
>> >> There are no active volume tasks
>> >
>> > Half an hour later, the brick goes down again.
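>> >
>> > A minimal watch-loop sketch (hypothetical, not something I have running;
>> > the volume and brick names are the ones from the status output above and
>> > the five-minute interval is arbitrary) that could record roughly when the
>> > brick drops:
>> >
>> >   #!/bin/sh
>> >   # Log a UTC timestamp whenever the gfs2 brick stops showing "Y" in the
>> >   # Online column of "gluster volume status".
>> >   VOL=gfsvolume
>> >   BRICK=gfs2:/export/sda/brick
>> >   while true; do
>> >       online=$(sudo gluster volume status "$VOL" | grep "$BRICK" | awk '{print $(NF-1)}')
>> >       if [ "$online" != "Y" ]; then
>> >           echo "$(date -u '+%F %T') brick $BRICK not online (got: ${online:-no line})" >> brick-watch.log
>> >       fi
>> >       sleep 300
>> >   done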
>> >
>> > This is my glustershd.log. I snipped it because the rest of the log is a
>> > repeat of the same error:
>> >
>> >>
>> >> [2015-03-13 02:09:41.951556] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.0 (/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/deac2f873d0ac5b6c3e84b23c4790172.socket --xlator-option *replicate*.node-uuid=adbb7505-3342-4c6d-be3d-75938633612c)
>> >> [2015-03-13 02:09:41.954173] I [socket.c:3561:socket_init] 0-socket.glusterfsd: SSL support is NOT enabled
>> >> [2015-03-13 02:09:41.954236] I [socket.c:3576:socket_init] 0-socket.glusterfsd: using system polling thread
>> >> [2015-03-13 02:09:41.954421] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
>> >> [2015-03-13 02:09:41.954443] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
>> >> [2015-03-13 02:09:41.956731] I [graph.c:254:gf_add_cmdline_options] 0-gfsvolume-replicate-0: adding option 'node-uuid' for volume 'gfsvolume-replicate-0' with value 'adbb7505-3342-4c6d-be3d-75938633612c'
>> >> [2015-03-13 02:09:41.960210] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-gfsvolume-client-1: setting frame-timeout to 90
>> >> [2015-03-13 02:09:41.960288] I [socket.c:3561:socket_init] 0-gfsvolume-client-1: SSL support is NOT enabled
>> >> [2015-03-13 02:09:41.960301] I [socket.c:3576:socket_init] 0-gfsvolume-client-1: using system polling thread
>> >> [2015-03-13 02:09:41.961095] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-gfsvolume-client-0: setting frame-timeout to 90
>> >> [2015-03-13 02:09:41.961134] I [socket.c:3561:socket_init] 0-gfsvolume-client-0: SSL support is NOT enabled
>> >> [2015-03-13 02:09:41.961145] I [socket.c:3576:socket_init] 0-gfsvolume-client-0: using system polling thread
>> >> [2015-03-13 02:09:41.961173] I [client.c:2273:notify] 0-gfsvolume-client-0: parent translators are ready, attempting connect on transport
>> >> [2015-03-13 02:09:41.961412] I [client.c:2273:notify] 0-gfsvolume-client-1: parent translators are ready, attempting connect on transport
>> >> Final graph:
>> >> +------------------------------------------------------------------------------+
>> >>   1: volume gfsvolume-client-0
>> >>   2:     type protocol/client
>> >>   3:     option remote-host gfs1
>> >>   4:     option remote-subvolume /export/sda/brick
>> >>   5:     option transport-type socket
>> >>   6:     option frame-timeout 90
>> >>   7:     option ping-timeout 30
>> >>   8: end-volume
>> >>   9:
>> >>  10: volume gfsvolume-client-1
>> >>  11:     type protocol/client
>> >>  12:     option remote-host gfs2
>> >>  13:     option remote-subvolume /export/sda/brick
>> >>  14:     option transport-type socket
>> >>  15:     option frame-timeout 90
>> >>  16:     option ping-timeout 30
>> >>  17: end-volume
>> >>  18:
>> >>  19: volume gfsvolume-replicate-0
>> >>  20:     type cluster/replicate
>> >>  21:     option node-uuid adbb7505-3342-4c6d-be3d-75938633612c
>> >>  22:     option background-self-heal-count 0
>> >>  23:     option metadata-self-heal on
>> >>  24:     option data-self-heal on
>> >>  25:     option entry-self-heal on
>> >>  26:     option self-heal-daemon on
>> >>  27:     option data-self-heal-algorithm diff
>> >>  28:     option quorum-type fixed
>> >>  29:     option quorum-count 1
>> >>  30:     option iam-self-heal-daemon yes
>> >>  31:     subvolumes gfsvolume-client-0 gfsvolume-client-1
>> >>  32: end-volume
>> >>  33:
>> >>  34: volume glustershd
>> >>  35:     type debug/io-stats
>> >>  36:     subvolumes gfsvolume-replicate-0
>> >>  37: end-volume
>> >> +------------------------------------------------------------------------------+
>> >> [2015-03-13 02:09:41.961871] I [rpc-clnt.c:1685:rpc_clnt_reconfig] 0-gfsvolume-client-1: changing port to 49153 (from 0)
>> >> [2015-03-13 02:09:41.962129] I [client-handshake.c:1659:select_server_supported_programs] 0-gfsvolume-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>> >> [2015-03-13 02:09:41.962344] I [client-handshake.c:1456:client_setvolume_cbk] 0-gfsvolume-client-1: Connected to 172.20.20.22:49153, attached to remote volume '/export/sda/brick'.
>> >> [2015-03-13 02:09:41.962363] I [client-handshake.c:1468:client_setvolume_cbk] 0-gfsvolume-client-1: Server and Client lk-version numbers are not same, reopening the fds
>> >> [2015-03-13 02:09:41.962416] I [afr-common.c:3922:afr_notify] 0-gfsvolume-replicate-0: Subvolume 'gfsvolume-client-1' came back up; going online.
>> >> [2015-03-13 02:09:41.962487] I [client-handshake.c:450:client_set_lk_version_cbk] 0-gfsvolume-client-1: Server lk version = 1
>> >> [2015-03-13 02:09:41.963109] E [afr-self-heald.c:1479:afr_find_child_position] 0-gfsvolume-replicate-0: getxattr failed on gfsvolume-client-0 - (Transport endpoint is not connected)
>> >> [2015-03-13 02:09:41.963502] I [afr-self-heald.c:1687:afr_dir_exclusive_crawl] 0-gfsvolume-replicate-0: Another crawl is in progress for gfsvolume-client-1
>> >> [2015-03-13 02:09:41.967478] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-gfsvolume-replicate-0: Non Blocking entrylks failed for <gfid:66af7dc1-a2e6-4919-9ea1-ad75fe2d40b9>.
>> >> [2015-03-13 02:09:41.968550] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-gfsvolume-replicate-0: Non Blocking entrylks failed for <gfid:8a7cfa39-9a12-43cd-a9f3-9142b7403d0e>.
>> >> [2015-03-13 02:09:41.969663] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-gfsvolume-replicate-0: Non Blocking entrylks failed for <gfid:3762920e-9631-4a52-9a9f-4f04d09e8d84>.
>> >> [2015-03-13 02:09:41.974345] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-gfsvolume-replicate-0: Non Blocking entrylks failed for <gfid:66af7dc1-a2e6-4919-9ea1-ad75fe2d40b9>.
>> >> [2015-03-13 02:09:41.975657] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-gfsvolume-replicate-0: Non Blocking entrylks failed for <gfid:8a7cfa39-9a12-43cd-a9f3-9142b7403d0e>.
>> >> [2015-03-13 02:09:41.977020] E [afr-self-heal-entry.c:2364:afr_sh_post_nonblocking_entry_cbk] 0-gfsvolume-replicate-0: Non Blocking entrylks failed for <gfid:3762920e-9631-4a52-9a9f-4f04d09e8d84>.
>> >> [2015-03-13 02:09:44.307219] I [rpc-clnt.c:1685:rpc_clnt_reconfig] 0-gfsvolume-client-0: changing port to 49153 (from 0)
>> >> [2015-03-13 02:09:44.307748] I [client-handshake.c:1659:select_server_supported_programs] 0-gfsvolume-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>> >> [2015-03-13 02:09:44.448377] I [client-handshake.c:1456:client_setvolume_cbk] 0-gfsvolume-client-0: Connected to 172.20.20.21:49153, attached to remote volume '/export/sda/brick'.
>> >> [2015-03-13 02:09:44.448418] I [client-handshake.c:1468:client_setvolume_cbk] 0-gfsvolume-client-0: Server and Client lk-version numbers are not same, reopening the fds
>> >> [2015-03-13 02:09:44.448713] I [client-handshake.c:450:client_set_lk_version_cbk] 0-gfsvolume-client-0: Server lk version = 1
>> >> [2015-03-13 02:09:44.515112] I [afr-self-heal-common.c:2859:afr_log_self_heal_completion_status] 0-gfsvolume-replicate-0: foreground data self heal is successfully completed, data self heal from gfsvolume-client-0 to sinks gfsvolume-client-1, with 892928 bytes on gfsvolume-client-0, 892928 bytes on gfsvolume-client-1, data - Pending matrix: [ [ 0 155762 ] [ 0 0 ] ] on <gfid:123536cc-c34b-43d7-b0c6-cf80eefa8322>
>> >> [2015-03-13 02:09:44.809988] I [afr-self-heal-common.c:2859:afr_log_self_heal_completion_status] 0-gfsvolume-replicate-0: foreground data self heal is successfully completed, data self heal from gfsvolume-client-0 to sinks gfsvolume-client-1, with 15998976 bytes on gfsvolume-client-0, 15998976 bytes on gfsvolume-client-1, data - Pending matrix: [ [ 0 36506 ] [ 0 0 ] ] on <gfid:b6dc0e74-31bf-469a-b629-ee51ab4cf729>
>> >> [2015-03-13 02:09:44.946050] W [client-rpc-fops.c:574:client3_3_readlink_cbk] 0-gfsvolume-client-0: remote operation failed: Stale NFS file handle
>> >> [2015-03-13 02:09:44.946097] I [afr-self-heal-entry.c:1538:afr_sh_entry_impunge_readlink_sink_cbk] 0-gfsvolume-replicate-0: readlink of <gfid:66af7dc1-a2e6-4919-9ea1-ad75fe2d40b9>/PB2_corrected.fastq on gfsvolume-client-1 failed (Stale NFS file handle)
>> >> [2015-03-13 02:09:44.951370] I [afr-self-heal-entry.c:2321:afr_sh_entry_fix] 0-gfsvolume-replicate-0: <gfid:8a7cfa39-9a12-43cd-a9f3-9142b7403d0e>: Performing conservative merge
>> >> [2015-03-13 02:09:45.149995] W [client-rpc-fops.c:574:client3_3_readlink_cbk] 0-gfsvolume-client-0: remote operation failed: Stale NFS file handle
>> >> [2015-03-13 02:09:45.150036] I [afr-self-heal-entry.c:1538:afr_sh_entry_impunge_readlink_sink_cbk] 0-gfsvolume-replicate-0: readlink of <gfid:8a7cfa39-9a12-43cd-a9f3-9142b7403d0e>/Rscript on gfsvolume-client-1 failed (Stale NFS file handle)
>> >> [2015-03-13 02:09:45.214253] W [client-rpc-fops.c:574:client3_3_readlink_cbk] 0-gfsvolume-client-0: remote operation failed: Stale NFS file handle
>> >> [2015-03-13 02:09:45.214295] I [afr-self-heal-entry.c:1538:afr_sh_entry_impunge_readlink_sink_cbk] 0-gfsvolume-replicate-0: readlink of <gfid:3762920e-9631-4a52-9a9f-4f04d09e8d84>/ananas_d_tmp on gfsvolume-client-1 failed (Stale NFS file handle)
>> >> [2015-03-13 02:13:27.324856] W [socket.c:522:__socket_rwv] 0-gfsvolume-client-1: readv on 172.20.20.22:49153 failed (No data available)
>> >> [2015-03-13 02:13:27.324961] I [client.c:2208:client_rpc_notify] 0-gfsvolume-client-1: disconnected from 172.20.20.22:49153. Client process will keep trying to connect to glusterd until brick's port is available
>> >> [2015-03-13 02:13:37.981531] I [rpc-clnt.c:1685:rpc_clnt_reconfig] 0-gfsvolume-client-1: changing port to 49153 (from 0)
>> >> [2015-03-13 02:13:37.981781] E [socket.c:2161:socket_connect_finish] 0-gfsvolume-client-1: connection to 172.20.20.22:49153 failed (Connection refused)
>> >> [2015-03-13 02:13:41.982125] I [rpc-clnt.c:1685:rpc_clnt_reconfig] 0-gfsvolume-client-1: changing port to 49153 (from 0)
>> >> [2015-03-13 02:13:41.982353] E [socket.c:2161:socket_connect_finish] 0-gfsvolume-client-1: connection to 172.20.20.22:49153 failed (Connection refused)
>> >> [2015-03-13 02:13:45.982693] I [rpc-clnt.c:1685:rpc_clnt_reconfig] 0-gfsvolume-client-1: changing port to 49153 (from 0)
>> >> [2015-03-13 02:13:45.982926] E [socket.c:2161:socket_connect_finish] 0-gfsvolume-client-1: connection to 172.20.20.22:49153 failed (Connection refused)
>> >> [2015-03-13 02:13:49.983309] I [rpc-clnt.c:1685:rpc_clnt_reconfig] 0-gfsvolume-client-1: changing port to 49153 (from 0)
>> >
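>> > For reference, a minimal sketch of brick-side checks on gfs2 itself
>> > (assuming the default GlusterFS log layout; the exact brick log file name
>> > may differ on your install), since the glustershd log above only shows
>> > the client side losing the connection:
>> >
>> >   # On gfs2: is the brick's glusterfsd process still running?
>> >   ps aux | grep '[g]lusterfsd'
>> >   # The brick's own log (file name is derived from the brick path):
>> >   sudo tail -n 200 /var/log/glusterfs/bricks/export-sda-brick.log
>> >   # Kernel messages, e.g. an OOM kill of the brick process:
>> >   dmesg | tail -n 50
>> >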
>> > Any help would be greatly appreciated.
>> > Thank You Kindly,
>> > Kaamesh
>> >
>> >
>> >
>> > _______________________________________________
>> > Gluster-users mailing list
>> > Gluster-users at gluster.org
>> > http://www.gluster.org/mailman/listinfo/gluster-users
>> >
>>
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gluster_logs.tgz
Type: application/x-gzip
Size: 4273876 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150319/5aa8e709/attachment-0001.tgz>

