[Gluster-devel] gluster 3.5.1 beta
Pranith Kumar Karampuri
pkarampu at redhat.com
Sun Jun 15 02:05:13 UTC 2014
Including devel
Pranith
On 06/14/2014 02:37 AM, David F. Robinson wrote:
> Another update... The previous tests have shown that I can kill
> gluster with even a moderate load on the storage system. One thing we
> noticed with previous versions of gluster was that the failure was
> sensitive to TCP parameters. I have seen other postings on the web
> noting similar behavior along with recommendations for TCP tuning
> parameters.
>
> When I use the default TCP parameters, the job dies during i/o and
> gluster hangs during the heals with each of the bricks showing "crawl
> in progress". This never clears and the i/o gets killed...
>
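(A quick way to see whether a "crawl in progress" heal is actually moving is the heal sub-commands of the gluster CLI; the volume name here is taken from the logs below, and exact output varies by release:)

```
# On one of the server nodes:
gluster volume heal homegfs info        # entries still pending heal, per brick
gluster volume heal homegfs statistics  # crawl/heal counters per self-heal run
```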
> When I set the following parameters in /etc/sysctl.conf, the job runs
> to completion without any issues and I don't get hung heal processes...
>
> # Set by T. Young May 22 2014
> net.core.netdev_max_backlog = 2500
> net.ipv4.tcp_max_syn_backlog = 4096
> net.core.rmem_max=8388608
> net.core.wmem_max=8388608
> net.core.rmem_default=65536
> net.core.wmem_default=65536
> net.ipv4.tcp_rmem=4096 87380 8388608
> net.ipv4.tcp_wmem=4096 65536 8388608
> net.ipv4.tcp_mem=8388608 8388608 8388608
> net.ipv4.route.flush=1
>
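(For anyone reproducing this: a read-only spot check, no root needed, confirms which of those values are actually in force; `sysctl -p` run as root is what reloads /etc/sysctl.conf without a reboot. This assumes a Linux /proc layout:)

```shell
# tcp_rmem/tcp_wmem hold "min default max" triples, in bytes;
# the max column should match the 8388608 ceiling set above.
cat /proc/sys/net/ipv4/tcp_rmem /proc/sys/net/ipv4/tcp_wmem

# Global per-socket buffer ceilings, also set above.
cat /proc/sys/net/core/rmem_max /proc/sys/net/core/wmem_max
```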
> I do still get many thousands of the following messages in the log files:
>
> [2014-06-13 21:05:22.164073] I
> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241947:
> LOOKUP (null) (89371586-2e16-4623-bc9b-feb069b5c982) ==> (Stale file
> handle)
> [2014-06-13 21:05:22.165627] I
> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241948:
> LOOKUP (null) (8589b53e-f8b5-4bf9-9f54-f550e4e768c0) ==> (Stale file
> handle)
> [... six more LOOKUP entries with the same "(Stale file handle)" result
> follow, differing only in request id and GFID ...]
>
> David
>
>
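(When there are "many thousands" of these, tallying them per GFID shows whether one file or many are involved; a sketch, where the here-doc is a two-line stand-in for the real brick log under /var/log/glusterfs/bricks/:)

```shell
# Stand-in sample; point the greps at the real brick log instead.
cat > brick.log <<'EOF'
[2014-06-13 21:05:22.164073] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241947: LOOKUP (null) (89371586-2e16-4623-bc9b-feb069b5c982) ==> (Stale file handle)
[2014-06-13 21:05:22.165627] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241948: LOOKUP (null) (89371586-2e16-4623-bc9b-feb069b5c982) ==> (Stale file handle)
EOF

# Extract the GFID from each "Stale file handle" line and count occurrences.
grep 'Stale file handle' brick.log \
  | grep -o '[0-9a-f]\{8\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{4\}-[0-9a-f]\{12\}' \
  | sort | uniq -c | sort -rn
```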
> ------ Original Message ------
> From: "Justin Clift" <justin at gluster.org>
> To: "David F. Robinson" <david.robinson at corvidtec.com>
> Cc: "Ravishankar N" <ravishankar at redhat.com>; "Pranith Kumar
> Karampuri" <pkarampu at redhat.com>; "Tom Young" <tom.young at corvidtec.com>
> Sent: 6/13/2014 11:16:38 AM
> Subject: Re: gluster 3.5.1 beta
>
>> Thanks, that's good news on the positive progress front. :)
>>
>> + Justin
>>
>> On 13/06/2014, at 4:12 PM, David F. Robinson wrote:
>>> FYI... The 3.5.1beta2 completed the large rsync... The last time I
>>> tried this, the rsync died after about 3TB; this time it completed
>>> the 8TB transfer... The only messages that seem strange in the logs
>>> after the rsync completed are:
>>>
>>> [2014-06-13 15:09:30.080574] I
>>> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 227104:
>>> LOOKUP (null) (3cf20fd1-ce27-4fbd-aaa6-cd31aa6a13e5) ==> (Stale file
>>> handle)
>>> [2014-06-13 15:09:30.969218] I
>>> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 227105:
>>> LOOKUP (null) (b7353434-32a4-4674-9f62-f373d3d1d4f2) ==> (Stale file
>>> handle)
>>> [2014-06-13 15:10:32.814144] I
>>> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 227114:
>>> LOOKUP (null) (ad34cd69-0c90-4de9-9688-34199f6a3ae1) ==> (Stale file
>>> handle)
>>>
>>> David
>>>
>>>
>>> ------ Original Message ------
>>> From: "Ravishankar N" <ravishankar at redhat.com>
>>> To: "Justin Clift" <justin at gluster.org>
>>> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>; "Tom Young"
>>> <tom.young at corvidtec.com>; "David F. Robinson"
>>> <david.robinson at corvidtec.com>
>>> Sent: 6/13/2014 12:22:58 AM
>>> Subject: Re: gluster 3.5.1 beta
>>>
>>>> On 06/13/2014 04:03 AM, Justin Clift wrote:
>>>>> Testing feedback for 3.5.1 beta2 (was in a different email chain).
>>>>>
>>>>> Some strange looking messages in the logs (scroll down for the
>>>>> better details):
>>>>>
>>>>>> [2014-06-12 22:09:54.482481] E
>>>>>> [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base
>>>>>> index is not createdunder index/base_indices_holder
>>>> This would be fixed once http://review.gluster.org/#/c/7897/ gets
>>>> accepted.
>>>>> and:
>>>>>
>>>>>> [2014-06-12 21:49:54.326014] E
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc]
>>>>>> 0-Software-replicate-1: lookup failed on index dir on
>>>>>> Software-client-2 - (Stale file handle)
>>>> We still need to root cause this...
>>>>>
>>>>> + Justin
>>>>>
>>>>>
>>>>> Begin forwarded message:
>>>>>> From: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>> <snip>
>>>>>> FYI. I am retesting the gluster 3.5.1-beta2 using the same
>>>>>> approach as before. I gluster-mounted my homegfs partition on a
>>>>>> workstation and am doing an rsync of roughly 8TB of data. The
>>>>>> 3.5.1 version died after transferring roughly 3-4TB with the
>>>>>> errors shown in the previous emails. It seems to be doing fine and
>>>>>> has already transferred 2.5TB. The log messages that seemed
>>>>>> strange are:
>>>>>>
>>>>>> [2014-06-12 22:01:59.872521] W [dict.c:1055:data_to_str]
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec)
>>>>>> [0x7feb9f28f8ec]
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad)
>>>>>> [0x7feb9f293fcd]
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200)
>>>>>> [0x7feb9f293e80]))) 0-dict: data is NULL
>>>>>> [2014-06-12 22:01:59.872540] W [dict.c:1055:data_to_str]
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec)
>>>>>> [0x7feb9f28f8ec]
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad)
>>>>>> [0x7feb9f293fcd]
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b)
>>>>>> [0x7feb9f293e8b]))) 0-dict: data is NULL
>>>>>> [2014-06-12 22:01:59.872545] E
>>>>>> [name.c:147:client_fill_address_family] 0-glusterfs:
>>>>>> transport.address-family not specified. Could not guess default
>>>>>> value from (remote-host:(null) or
>>>>>> transport.unix.connect-path:(null)) options
>>>>>> [... the same three-message pattern (two data_to_str warnings plus
>>>>>> the address-family error) repeats every three seconds through
>>>>>> 22:02:11 ...]
>>>>>> [2014-06-12 22:02:46.073341] I [socket.c:3561:socket_init]
>>>>>> 0-glusterfs: SSL support is NOT enabled
>>>>>> [2014-06-12 22:02:46.073369] I [socket.c:3576:socket_init]
>>>>>> 0-glusterfs: using system polling thread
>>>>>>
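(The "transport.address-family not specified" errors above generally mean the address family was never pinned anywhere the transport layer could inherit it. The option named in the error can be set explicitly, e.g. in /etc/glusterfs/glusterd.vol on the servers; a trimmed sketch of that file, offered as a possible mitigation rather than a verified fix for this report, and glusterd would need a restart to pick it up:)

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
    option transport.address-family inet
end-volume
```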
>>>>>> [2014-06-12 21:29:54.225860] E
>>>>>> [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base
>>>>>> index is not createdunder index/base_indices_holder
>>>>>> [... the same error repeats at roughly ten-minute intervals
>>>>>> through 22:09:54 ...]
>>>>>>
>>>>>> I am also still seeing these messages (very strange because
>>>>>> there are no files on the Software volume. That volume is
>>>>>> completely empty...):
>>>>>>
>>>>>> [2014-06-12 21:49:54.326014] E
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc]
>>>>>> 0-Software-replicate-1: lookup failed on index dir on
>>>>>> Software-client-2 - (Stale file handle)
>>>>>> [2014-06-12 21:49:54.327077] E
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc]
>>>>>> 0-Software-replicate-0: lookup failed on index dir on
>>>>>> Software-client-0 - (Stale file handle)
>>>>>> [2014-06-12 21:59:54.373724] E
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc]
>>>>>> 0-Source-replicate-0: lookup failed on index dir on
>>>>>> Source-client-0 - (Stale file handle)
>>>>>> [2014-06-12 21:59:54.373950] E
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc]
>>>>>> 0-Source-replicate-1: lookup failed on index dir on
>>>>>> Source-client-2 - (Stale file handle)
>>>>>> [2014-06-12 21:59:54.375302] E
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc]
>>>>>> 0-Software-replicate-1: lookup failed on index dir on
>>>>>> Software-client-2 - (Stale file handle)
>>>>>> [2014-06-12 21:59:54.376673] E
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc]
>>>>>> 0-Software-replicate-0: lookup failed on index dir on
>>>>>> Software-client-0 - (Stale file handle)
>>>>>> [... the same four Source/Software index-dir lookup errors repeat
>>>>>> at 22:09:54 ...]
>>>>>> David
>>>>>
>>>>>
>>>>> On 12/06/2014, at 3:16 AM, David F. Robinson wrote:
>>>>>> Roger that. Thanks for the feedback. For testing, this approach
>>>>>> would work fine. If we put gluster into production, it would not
>>>>>> be optimal. Taking the entire data storage offline for the
>>>>>> upgrade would be difficult given the number of machines and the
>>>>>> cluster jobs that are always running.
>>>>>>
>>>>>> If you get the rolling upgrade working and need someone to test,
>>>>>> let me know. Happy to test and provide feedback.
>>>>>>
>>>>>> Thanks...
>>>>>>
>>>>>> David (Sent from mobile)
>>>>>>
>>>>>> ===============================
>>>>>> David F. Robinson, Ph.D.
>>>>>> President - Corvid Technologies
>>>>>> 704.799.6944 x101 [office]
>>>>>> 704.252.1310 [cell]
>>>>>> 704.799.7974 [fax]
>>>>>> David.Robinson at corvidtec.com
>>>>>> http://www.corvidtechnologies.com
>>>>> --
>>>>> GlusterFS - http://www.gluster.org
>>>>>
>>>>> An open source, distributed file system scaling to several
>>>>> petabytes, and handling thousands of clients.
>>>>>
>>>>> My personal twitter: twitter.com/realjustinclift
>>>>>
>>>>
>>>
>>
>> --
>> GlusterFS - http://www.gluster.org
>>
>> An open source, distributed file system scaling to several
>> petabytes, and handling thousands of clients.
>>
>> My personal twitter: twitter.com/realjustinclift
>>
>