[Gluster-users] Mount fails after outage

Sun Oct 5 08:47:28 UTC 2014

Hello Niels,

That was it.

Thanks for all your help.

Regards,
Guru.

On 5 Oct 2014, at 7:42 pm, Niels de Vos <ndevos at redhat.com> wrote:

> On Sun, Oct 05, 2014 at 07:00:04PM +1100, Gurdeep Singh (Guru) wrote:
>> Hello Niels,
>> 
>> Thanks for pointing me to the right log file. I see the following when I set the log-level option to DEBUG:
>> 
>> [2014-10-05 07:57:49.254925] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.1 (/usr/sbin/glusterfs --log-level=DEBUG --volfile-server=srv1 --volfile-id=/gv0 /var/www/html/image/)
>> [2014-10-05 07:57:49.255810] D [glusterfsd.c:410:set_fuse_mount_options] 0-glusterfsd: fopen-keep-cache mode 2
>> [2014-10-05 07:57:49.255844] D [glusterfsd.c:466:set_fuse_mount_options] 0-: fuse direct io type 2
>> [2014-10-05 07:57:49.256118] D [options.c:1112:xlator_option_init_double] 0-fuse: option negative-timeout using set value 0.000000
>> [2014-10-05 07:57:49.269774] D [rpc-clnt.c:975:rpc_clnt_connection_init] 0-glusterfs: defaulting frame-timeout to 30mins
>> [2014-10-05 07:57:49.269872] D [rpc-transport.c:262:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.5.1/rpc-transport/socket.so
>> [2014-10-05 07:57:49.274941] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
>> [2014-10-05 07:57:49.274962] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
>> [2014-10-05 07:57:49.274987] D [rpc-clnt.c:1427:rpcclnt_cbk_program_register] 0-glusterfs: New program registered: GlusterFS Callback, Num: 52743234, Ver: 1
>> [2014-10-05 07:57:49.281676] D [common-utils.c:248:gf_resolve_ip6] 0-resolver: returning ip-127.0.0.1 (port-24007) for hostname: srv1 and port: 24007
>> [2014-10-05 07:57:49.285116] D [io-stats.c:2700:init] 0-gv0: dangling volume. check volfile 
>> [2014-10-05 07:57:49.285160] D [options.c:1109:xlator_option_init_bool] 0-gv0: option count-fop-hits using set value off
>> [2014-10-05 07:57:49.285173] D [options.c:1109:xlator_option_init_bool] 0-gv0: option latency-measurement using set value off
>> [2014-10-05 07:57:49.285204] D [options.c:1106:xlator_option_init_size] 0-gv0-quick-read: option cache-size using set value 1GB
>> [2014-10-05 07:57:49.285311] D [quick-read.c:822:check_cache_size_ok] 0-gv0-quick-read: Max cache size is 1040621568
>> [2014-10-05 07:57:49.285326] E [quick-read.c:827:check_cache_size_ok] 0-gv0-quick-read: Cache size 1073741824 is greater than the max size of 1040621568
>> [2014-10-05 07:57:49.285336] E [xlator.c:403:xlator_init] 0-gv0-quick-read: Initialization of volume 'gv0-quick-read' failed, review your volfile again
>> [2014-10-05 07:57:49.285345] E [graph.c:307:glusterfs_graph_init] 0-gv0-quick-read: initializing translator failed
>> [2014-10-05 07:57:49.285354] E [graph.c:502:glusterfs_graph_activate] 0-graph: init failed
> 
> This ^^ is where the problem happens. This seems to be the most
> important error:
>    Cache size 1073741824 is greater than the max size of 1040621568
> 
> You should probably set performance.cache-size to something smaller, or
> reset it to the default:
> 
>    # gluster volume reset gv0 performance.cache-size
> 
> After changing this option, you can probably mount the volume again.
> 
> Good luck!
> Niels
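A note on the numbers in Niels's diagnosis. `1GB` is interpreted as 1073741824 bytes (1024³), as the `Cache size` error line itself shows. quick-read rejects any cache-size strictly larger than the maximum it reports, which here is 1040621568 bytes (roughly 992 MiB, and apparently the client machine's physical memory). The sketch below reproduces that comparison in shell. `to_bytes` and `fits_cache` are hypothetical helpers written for illustration and are not part of the gluster CLI. They assume only the B/KB/MB/GB suffixes with binary multipliers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a GlusterFS-style size string ("1GB", "512MB")
# to bytes, assuming binary multipliers (the log shows 1GB -> 1073741824).
to_bytes() {
  local size=$1 num unit
  num=${size%%[!0-9]*}          # leading digits
  unit=${size#"$num"}           # remaining suffix, e.g. "GB"
  case $unit in
    ""|B) echo "$num" ;;
    KB)   echo $(( num * 1024 )) ;;
    MB)   echo $(( num * 1024 * 1024 )) ;;
    GB)   echo $(( num * 1024 * 1024 * 1024 )) ;;
    *)    echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
}

# Hypothetical check mirroring the log: succeed if the requested size is not
# greater than the "Max cache size" value that quick-read printed.
fits_cache() {
  local requested max=$2
  requested=$(to_bytes "$1") || return 2
  (( requested <= max ))
}

# Example with the numbers from the log above:
fits_cache 1GB 1040621568 || echo "1GB does not fit"
fits_cache 512MB 1040621568 && echo "512MB fits"
```

Either a value that passes this check, for example `gluster volume set gv0 performance.cache-size 512MB`, or the `reset` Niels suggests should let the client build its graph again. Running `gluster volume info gv0` afterwards confirms the option changed before the mount is retried.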
> 
> 
>> [2014-10-05 07:57:49.285889] W [glusterfsd.c:1095:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x7fe1cb7766d5] (-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x320) [0x40bd50] (-->/usr/sbin/glusterfs(glusterfs_process_volfp+0x106) [0x405146]))) 0-: received signum (0), shutting down
>> [2014-10-05 07:57:49.285904] D [glusterfsd-mgmt.c:2025:glusterfs_mgmt_pmap_signout] 0-fsd-mgmt: portmapper signout arguments not given
>> [2014-10-05 07:57:49.285941] I [fuse-bridge.c:5475:fini] 0-fuse: Unmounting '/var/www/html/image/'.
>> 
>> No changes were made to the .vol file:
>> 
>> -bash-4.1# cat /etc/glusterfs/glusterd.vol 
>> volume management
>>    type mgmt/glusterd
>>    option working-directory /var/lib/glusterd
>>    option transport-type socket,rdma
>>    option transport.socket.keepalive-time 10
>>    option transport.socket.keepalive-interval 2
>>    option transport.socket.read-fail-log off
>>    option rpc-allow-insecure on
>> #   option base-port 49152
>> end-volume
>> -bash-4.1# 
>> 
>> I only added “option rpc-allow-insecure on” on both servers, as a troubleshooting step.
>> 
>> Any suggestions?
>> 
>> Thanks,
>> Guru.
>> 
>> 
>> On 5 Oct 2014, at 6:53 pm, Niels de Vos <ndevos at redhat.com> wrote:
>> 
>>> On Sun, Oct 05, 2014 at 03:09:21PM +1100, Gurdeep Singh (Guru) wrote:
>>>> Hello,
>>>> 
>>>> There was an outage on one of our servers, and since the reboot, mounting the volume on that server fails with this error:
>>>> 
>>>> -bash-4.1# mount -t glusterfs srv1:/gv0 /var/www/html/image/
>>>> Mount failed. Please check the log file for more details.
>>>> -bash-4.1# 
>>> 
>>> The log from this mount attempt would be called something like this:
>>> /var/log/glusterfs/var-www-html-image-.log
>>> 
>>> Maybe you can find some hints there. If there is no obvious error, you
>>> can add '-o log-level=DEBUG' to your mount command to make it more
>>> verbose.
>>> 
>>> HTH,
>>> Niels
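Niels's naming rule and the search for errors can be sketched in shell. Judging by his example path, the client log is named after the mount point, with the leading slash dropped and the remaining slashes replaced by dashes. `mount_log_name` assumes exactly that rule and the default `/var/log/glusterfs` directory. `log_errors` keeps only error-level (`E`) lines, which in this thread is where the actual cause showed up. Both are hypothetical helpers, not gluster tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the FUSE client log path for a mount point,
# e.g. /var/www/html/image/ -> /var/log/glusterfs/var-www-html-image-.log
mount_log_name() {
  local mnt=${1#/}                                     # drop the leading slash
  printf '/var/log/glusterfs/%s.log\n' "${mnt//\//-}"  # remaining "/" -> "-"
}

# Hypothetical helper: print only error-level lines, i.e. lines shaped like
# "[timestamp] E [file.c:line:function] ...". Reads the given files or stdin.
log_errors() {
  grep -E '^\[[^]]+] E \[' "$@"
}
```

For example, after retrying with `mount -t glusterfs -o log-level=DEBUG srv1:/gv0 /var/www/html/image/`, running `log_errors "$(mount_log_name /var/www/html/image/)"` prints the lines to look at first.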
>>> 
>>>> 
>>>> Looking at the glustershd.log file, I see the following:
>>>> 
>>>> Final graph:
>>>> +------------------------------------------------------------------------------+
>>>> 1: volume gv0-client-0
>>>> 2:     type protocol/client
>>>> 3:     option remote-host srv1
>>>> 4:     option remote-subvolume /root/gluster-vol0
>>>> 5:     option transport-type socket
>>>> 6:     option username 300c24e9-ac51-4735-b1ee-7acdd985ccd5
>>>> 7:     option password 989d61f9-8393-4402-8d3f-988d18e832a6
>>>> 8: end-volume
>>>> 9: 
>>>> 10: volume gv0-client-1
>>>> 11:     type protocol/client
>>>> 12:     option remote-host srv2
>>>> 13:     option remote-subvolume /root/gluster-vol0
>>>> 14:     option transport-type socket
>>>> 15:     option username 300c24e9-ac51-4735-b1ee-7acdd985ccd5
>>>> 16:     option password 989d61f9-8393-4402-8d3f-988d18e832a6
>>>> 17: end-volume
>>>> 18: 
>>>> 19: volume gv0-replicate-0
>>>> 20:     type cluster/replicate
>>>> 21:     option node-uuid c531d907-2f86-4bec-9ae7-8318e28295bc
>>>> 22:     option background-self-heal-count 0
>>>> 23:     option metadata-self-heal on
>>>> 24:     option data-self-heal on
>>>> 25:     option entry-self-heal on
>>>> 26:     option self-heal-daemon on
>>>> 27:     option iam-self-heal-daemon yes
>>>> 28:     subvolumes gv0-client-0 gv0-client-1
>>>> 29: end-volume
>>>> 30: 
>>>> 31: volume glustershd
>>>> 32:     type debug/io-stats
>>>> 33:     subvolumes gv0-replicate-0
>>>> 34: end-volume
>>>> 35: 
>>>> +------------------------------------------------------------------------------+
>>>> [2014-10-05 03:54:30.790905] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-gv0-client-0: changing port to 49152 (from 0)
>>>> [2014-10-05 03:54:30.798689] I [client-handshake.c:1659:select_server_supported_programs] 0-gv0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>> [2014-10-05 03:54:30.805120] I [client-handshake.c:1456:client_setvolume_cbk] 0-gv0-client-0: Connected to 127.0.0.1:49152, attached to remote volume '/root/gluster-vol0'.
>>>> [2014-10-05 03:54:30.805163] I [client-handshake.c:1468:client_setvolume_cbk] 0-gv0-client-0: Server and Client lk-version numbers are not same, reopening the fds
>>>> [2014-10-05 03:54:30.805250] I [afr-common.c:4120:afr_notify] 0-gv0-replicate-0: Subvolume 'gv0-client-0' came back up; going online.
>>>> [2014-10-05 03:54:30.807784] I [client-handshake.c:450:client_set_lk_version_cbk] 0-gv0-client-0: Server lk version = 1
>>>> [2014-10-05 03:54:30.808566] I [afr-self-heald.c:1687:afr_dir_exclusive_crawl] 0-gv0-replicate-0: Another crawl is in progress for gv0-client-0
>>>> [2014-10-05 03:54:30.808614] E [afr-self-heald.c:1479:afr_find_child_position] 0-gv0-replicate-0: getxattr failed on gv0-client-1 - (Transport endpoint is not connected)
>>>> [2014-10-05 03:54:30.818679] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-gv0-client-1: changing port to 49152 (from 0)
>>>> [2014-10-05 03:54:30.828616] I [client-handshake.c:1659:select_server_supported_programs] 0-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>> [2014-10-05 03:54:30.844354] I [client-handshake.c:1456:client_setvolume_cbk] 0-gv0-client-1: Connected to 10.8.0.6:49152, attached to remote volume '/root/gluster-vol0'.
>>>> [2014-10-05 03:54:30.844388] I [client-handshake.c:1468:client_setvolume_cbk] 0-gv0-client-1: Server and Client lk-version numbers are not same, reopening the fds
>>>> [2014-10-05 03:54:30.849128] I [client-handshake.c:450:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1
>>>> 
>>>> There is also an nfs.log file that shows this:
>>>> 
>>>> Final graph:
>>>> +------------------------------------------------------------------------------+
>>>> 1: volume gv0-client-0
>>>> 2:     type protocol/client
>>>> 3:     option remote-host srv1
>>>> 4:     option remote-subvolume /root/gluster-vol0
>>>> 5:     option transport-type socket
>>>> 6:     option username 300c24e9-ac51-4735-b1ee-7acdd985ccd5
>>>> 7:     option password 989d61f9-8393-4402-8d3f-988d18e832a6
>>>> 8:     option send-gids true
>>>> 9: end-volume
>>>> 10: 
>>>> 11: volume gv0-client-1
>>>> 12:     type protocol/client
>>>> 13:     option remote-host srv2
>>>> 14:     option remote-subvolume /root/gluster-vol0
>>>> 15:     option transport-type socket
>>>> 16:     option username 300c24e9-ac51-4735-b1ee-7acdd985ccd5
>>>> 17:     option password 989d61f9-8393-4402-8d3f-988d18e832a6
>>>> 18:     option send-gids true
>>>> 19: end-volume
>>>> 20: 
>>>> 21: volume gv0-replicate-0
>>>> 22:     type cluster/replicate
>>>> 23:     subvolumes gv0-client-0 gv0-client-1
>>>> 24: end-volume
>>>> 25: 
>>>> 26: volume gv0-dht
>>>> 27:     type cluster/distribute
>>>> 28:     option lookup-unhashed on
>>>> 29:     subvolumes gv0-replicate-0
>>>> 30: end-volume
>>>> 31: 
>>>> 32: volume gv0-write-behind
>>>> 33:     type performance/write-behind
>>>> 34:     subvolumes gv0-dht
>>>> 35: end-volume
>>>> 36: 
>>>> 37: volume gv0
>>>> 38:     type debug/io-stats
>>>> 39:     option latency-measurement off
>>>> 40:     option count-fop-hits off
>>>> 41:     subvolumes gv0-write-behind
>>>> 42: end-volume
>>>> 43: 
>>>> 44: volume nfs-server
>>>> 45:     type nfs/server
>>>> 46:     option rpc-auth.auth-glusterfs on
>>>> 47:     option rpc-auth.auth-unix on
>>>> 48:     option rpc-auth.auth-null on
>>>> 49:     option rpc-auth.ports.insecure on
>>>> 50:     option rpc-auth-allow-insecure on
>>>> 51:     option transport-type socket
>>>> 52:     option transport.socket.listen-port 2049
>>>> 53:     option nfs.dynamic-volumes on
>>>> 54:     option nfs.nlm on
>>>> 55:     option nfs.drc off
>>>> 56:     option rpc-auth.addr.gv0.allow *
>>>> 57:     option nfs3.gv0.volume-id dc8dc3f2-f5bd-4047-9101-acad04695442
>>>> 58:     subvolumes gv0
>>>> 59: end-volume
>>>> 60: 
>>>> +------------------------------------------------------------------------------+
>>>> [2014-10-05 03:54:30.832422] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-gv0-client-0: changing port to 49152 (from 0)
>>>> [2014-10-05 03:54:30.835888] I [client-handshake.c:1659:select_server_supported_programs] 0-gv0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>> [2014-10-05 03:54:30.836157] I [client-handshake.c:1456:client_setvolume_cbk] 0-gv0-client-0: Connected to 127.0.0.1:49152, attached to remote volume '/root/gluster-vol0'.
>>>> [2014-10-05 03:54:30.836174] I [client-handshake.c:1468:client_setvolume_cbk] 0-gv0-client-0: Server and Client lk-version numbers are not same, reopening the fds
>>>> [2014-10-05 03:54:30.836393] I [afr-common.c:4120:afr_notify] 0-gv0-replicate-0: Subvolume 'gv0-client-0' came back up; going online.
>>>> [2014-10-05 03:54:30.836430] I [client-handshake.c:450:client_set_lk_version_cbk] 0-gv0-client-0: Server lk version = 1
>>>> [2014-10-05 03:54:30.839191] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-gv0-client-1: changing port to 49152 (from 0)
>>>> [2014-10-05 03:54:30.850953] I [client-handshake.c:1659:select_server_supported_programs] 0-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>> [2014-10-05 03:54:30.851821] I [client-handshake.c:1456:client_setvolume_cbk] 0-gv0-client-1: Connected to 10.8.0.6:49152, attached to remote volume '/root/gluster-vol0'.
>>>> [2014-10-05 03:54:30.851843] I [client-handshake.c:1468:client_setvolume_cbk] 0-gv0-client-1: Server and Client lk-version numbers are not same, reopening the fds
>>>> [2014-10-05 03:54:30.853062] I [client-handshake.c:450:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1
>>>> 
>>>> srv1 (10.8.0.1) is also a VPN server that srv2 (10.8.0.6) connects to.
>>>> 
>>>> The volume seems to be up on both srv1 and srv2:
>>>> 
>>>> -bash-4.1# gluster volume info
>>>> 
>>>> Volume Name: gv0
>>>> Type: Replicate
>>>> Volume ID: dc8dc3f2-f5bd-4047-9101-acad04695442
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: srv1:/root/gluster-vol0
>>>> Brick2: srv2:/root/gluster-vol0
>>>> Options Reconfigured:
>>>> cluster.lookup-unhashed: on
>>>> performance.cache-refresh-timeout: 60
>>>> performance.cache-size: 1GB
>>>> storage.health-check-interval: 30
>>>> 
>>>> [guru@srv2 ~]$ sudo gluster volume info
>>>> 
>>>> Volume Name: gv0
>>>> Type: Replicate
>>>> Volume ID: dc8dc3f2-f5bd-4047-9101-acad04695442
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: srv1:/root/gluster-vol0
>>>> Brick2: srv2:/root/gluster-vol0
>>>> Options Reconfigured:
>>>> cluster.lookup-unhashed: on
>>>> performance.cache-refresh-timeout: 60
>>>> performance.cache-size: 1GB
>>>> storage.health-check-interval: 30
>>>> [guru@srv2 ~]$
>>>> 
>>>> 
>>>> But I am still not able to mount the volume on the folder.
>>>> 
>>>> Please suggest how we can troubleshoot this issue.
>>>> 
>>>> Regards,
>>>> Guru.
>>>> 
>>> 
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>