[Gluster-devel] gluster 3.5.1 beta

Pranith Kumar Karampuri pkarampu at redhat.com
Sun Jun 15 02:05:13 UTC 2014


Including devel

Pranith
On 06/14/2014 02:37 AM, David F. Robinson wrote:
> Another update... The previous tests have shown that I can kill 
> gluster with even a moderate load on the storage system. One thing we 
> noticed with a previous version of gluster was that the failure was 
> sensitive to TCP parameters. I have seen other postings on the web 
> noting similar behavior, along with recommendations for TCP tuning 
> parameters.
>
> When I use the default TCP parameters, the job dies during I/O and 
> gluster hangs during the heals, with each of the bricks showing "crawl 
> in progress". This never clears, and the I/O gets killed...
>
> When I set the following parameters in /etc/sysctl.conf, the job runs 
> to completion without any issues and I don't get hung heal processes...
>
> # Set by T. Young May 22 2014
> net.core.netdev_max_backlog = 2500
> net.ipv4.tcp_max_syn_backlog = 4096
> net.core.rmem_max=8388608
> net.core.wmem_max=8388608
> net.core.rmem_default=65536
> net.core.wmem_default=65536
> net.ipv4.tcp_rmem=4096 87380 8388608
> net.ipv4.tcp_wmem=4096 65536 8388608
> net.ipv4.tcp_mem=8388608 8388608 8388608
> net.ipv4.route.flush=1
>
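[Editor's note: a minimal sketch of persisting and applying settings like the above. The drop-in file name is my own choice, not from the thread, and the sketch writes to /tmp so it can be run unprivileged; on a real brick server you would install the file under /etc/sysctl.d/ and run sysctl as root.]

```shell
# Stage the tuning values quoted above in a drop-in file.
# The file name 90-gluster-tuning.conf is an assumption, not from the thread.
cat > /tmp/90-gluster-tuning.conf <<'EOF'
net.core.netdev_max_backlog = 2500
net.ipv4.tcp_max_syn_backlog = 4096
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
net.ipv4.route.flush = 1
EOF

# Sanity check: all ten settings staged.
echo "$(grep -c '=' /tmp/90-gluster-tuning.conf) settings staged"

# On each brick server, as root:
#   cp /tmp/90-gluster-tuning.conf /etc/sysctl.d/
#   sysctl -p /etc/sysctl.d/90-gluster-tuning.conf
```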
> I do still get many thousands of the following messages in the log files:
>
> [2014-06-13 21:05:22.164073] I 
> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241947: 
> LOOKUP (null) (89371586-2e16-4623-bc9b-feb069b5c982) ==> (Stale file 
> handle)
> [2014-06-13 21:05:22.165627] I 
> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241948: 
> LOOKUP (null) (8589b53e-f8b5-4bf9-9f54-f550e4e768c0) ==> (Stale file 
> handle)
> [2014-06-13 21:05:22.166395] I 
> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241949: 
> LOOKUP (null) (2ad6bcce-4842-4c29-a319-39f276239b8b) ==> (Stale file 
> handle)
> [2014-06-13 21:05:22.166989] I 
> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241950: 
> LOOKUP (null) (71b013f7-d508-41ee-8bc8-c8b328ff9f3a) ==> (Stale file 
> handle)
> [2014-06-13 21:05:22.167653] I 
> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241951: 
> LOOKUP (null) (1d0c99a8-b2ab-402c-a8b2-33f55bcf6123) ==> (Stale file 
> handle)
> [2014-06-13 21:05:22.168270] I 
> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241952: 
> LOOKUP (null) (c4f8b979-cbf3-4d6b-bcf9-6d5150521e19) ==> (Stale file 
> handle)
> [2014-06-13 21:05:22.168797] I 
> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241953: 
> LOOKUP (null) (81da3d62-49fc-4465-9fb2-baa6a3278ce3) ==> (Stale file 
> handle)
> [2014-06-13 21:05:22.169420] I 
> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241954: 
> LOOKUP (null) (dc9e9c2b-f801-452c-8ef7-009e600d23ca) ==> (Stale file 
> handle)
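[Editor's note: messages in this format are easy to tally per gfid. A quick sketch, demonstrated on an inline sample so it is self-contained; on a real server you would point grep at the brick logs instead (typically under /var/log/glusterfs/bricks/ — that path is an assumption about the install, not from the thread).]

```shell
# Two sample lines in the format quoted above.
cat > /tmp/brick-sample.log <<'EOF'
[2014-06-13 21:05:22.164073] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241947: LOOKUP (null) (89371586-2e16-4623-bc9b-feb069b5c982) ==> (Stale file handle)
[2014-06-13 21:05:22.165627] I [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 241948: LOOKUP (null) (8589b53e-f8b5-4bf9-9f54-f550e4e768c0) ==> (Stale file handle)
EOF

# Total stale-handle LOOKUP failures:
grep -c 'Stale file handle' /tmp/brick-sample.log

# Distinct gfids involved (the UUIDs in parentheses):
grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' /tmp/brick-sample.log | sort -u
```

Comparing the total count against the distinct-gfid count shows whether a few files are failing repeatedly or many different files are affected.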
>
> David
>
>
> ------ Original Message ------
> From: "Justin Clift" <justin at gluster.org>
> To: "David F. Robinson" <david.robinson at corvidtec.com>
> Cc: "Ravishankar N" <ravishankar at redhat.com>; "Pranith Kumar 
> Karampuri" <pkarampu at redhat.com>; "Tom Young" <tom.young at corvidtec.com>
> Sent: 6/13/2014 11:16:38 AM
> Subject: Re: gluster 3.5.1 beta
>
>> Thanks, that's good news on the positive progress front. :)
>>
>> + Justin
>>
>> On 13/06/2014, at 4:12 PM, David F. Robinson wrote:
>>>  FYI... The 3.5.1beta2 completed the large rsync... The last time I 
>>> tried this, the rsync died after about 3 TB; this time it completed 
>>> the 8 TB transfer... The only messages that seem strange in the logs 
>>> after the rsync completed are:
>>>
>>>  [2014-06-13 15:09:30.080574] I 
>>> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 227104: 
>>> LOOKUP (null) (3cf20fd1-ce27-4fbd-aaa6-cd31aa6a13e5) ==> (Stale file 
>>> handle)
>>>  [2014-06-13 15:09:30.969218] I 
>>> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 227105: 
>>> LOOKUP (null) (b7353434-32a4-4674-9f62-f373d3d1d4f2) ==> (Stale file 
>>> handle)
>>>  [2014-06-13 15:10:32.814144] I 
>>> [server-rpc-fops.c:154:server_lookup_cbk] 0-homegfs-server: 227114: 
>>> LOOKUP (null) (ad34cd69-0c90-4de9-9688-34199f6a3ae1) ==> (Stale file 
>>> handle)
>>>
>>>  David
>>>
>>>
>>>  ------ Original Message ------
>>>  From: "Ravishankar N" <ravishankar at redhat.com>
>>>  To: "Justin Clift" <justin at gluster.org>
>>>  Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>; "Tom Young" 
>>> <tom.young at corvidtec.com>; "David F. Robinson" 
>>> <david.robinson at corvidtec.com>
>>>  Sent: 6/13/2014 12:22:58 AM
>>>  Subject: Re: gluster 3.5.1 beta
>>>
>>>>  On 06/13/2014 04:03 AM, Justin Clift wrote:
>>>>>  Testing feedback for 3.5.1 beta2 (was in a different email chain).
>>>>>
>>>>>  Some strange looking messages in the logs (scroll down for the
>>>>>  better details):
>>>>>
>>>>>>  [2014-06-12 22:09:54.482481] E 
>>>>>> [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base 
>>>>>> index is not createdunder index/base_indices_holder
>>>>  This would be fixed once http://review.gluster.org/#/c/7897/ gets 
>>>> accepted.
>>>>>  and:
>>>>>
>>>>>>  [2014-06-12 21:49:54.326014] E 
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc] 
>>>>>> 0-Software-replicate-1: lookup failed on index dir on 
>>>>>> Software-client-2 - (Stale file handle)
>>>>  We still need to root cause this...
>>>>>
>>>>>  + Justin
>>>>>
>>>>>
>>>>>  Begin forwarded message:
>>>>>>  From: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>>  <snip>
>>>>>>  FYI. I am retesting gluster 3.5.1-beta2 using the same 
>>>>>> approach as before. I gluster-mounted my homegfs partition to a 
>>>>>> workstation and am doing an rsync of roughly 8 TB of data. The 
>>>>>> 3.5.1 version died after transferring roughly 3-4 TB with the 
>>>>>> errors shown in the previous emails. It seems to be doing fine and 
>>>>>> has already transferred 2.5 TB. The log messages that seemed 
>>>>>> strange are:
>>>>>>
>>>>>>  [2014-06-12 22:01:59.872521] W [dict.c:1055:data_to_str] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) 
>>>>>> [0x7feb9f28f8ec] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) 
>>>>>> [0x7feb9f293fcd] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) 
>>>>>> [0x7feb9f293e80]))) 0-dict: data is NULL
>>>>>>  [2014-06-12 22:01:59.872540] W [dict.c:1055:data_to_str] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) 
>>>>>> [0x7feb9f28f8ec] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) 
>>>>>> [0x7feb9f293fcd] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) 
>>>>>> [0x7feb9f293e8b]))) 0-dict: data is NULL
>>>>>>  [2014-06-12 22:01:59.872545] E 
>>>>>> [name.c:147:client_fill_address_family] 0-glusterfs: 
>>>>>> transport.address-family not specified. Could not guess default 
>>>>>> value from (remote-host:(null) or 
>>>>>> transport.unix.connect-path:(null)) options
>>>>>>  [2014-06-12 22:02:02.872835] W [dict.c:1055:data_to_str] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) 
>>>>>> [0x7feb9f28f8ec] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) 
>>>>>> [0x7feb9f293fcd] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) 
>>>>>> [0x7feb9f293e80]))) 0-dict: data is NULL
>>>>>>  [2014-06-12 22:02:02.872855] W [dict.c:1055:data_to_str] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) 
>>>>>> [0x7feb9f28f8ec] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) 
>>>>>> [0x7feb9f293fcd] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) 
>>>>>> [0x7feb9f293e8b]))) 0-dict: data is NULL
>>>>>>  [2014-06-12 22:02:02.872860] E 
>>>>>> [name.c:147:client_fill_address_family] 0-glusterfs: 
>>>>>> transport.address-family not specified. Could not guess default 
>>>>>> value from (remote-host:(null) or 
>>>>>> transport.unix.connect-path:(null)) options
>>>>>>  [2014-06-12 22:02:05.873151] W [dict.c:1055:data_to_str] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) 
>>>>>> [0x7feb9f28f8ec] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) 
>>>>>> [0x7feb9f293fcd] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) 
>>>>>> [0x7feb9f293e80]))) 0-dict: data is NULL
>>>>>>  [2014-06-12 22:02:05.873171] W [dict.c:1055:data_to_str] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) 
>>>>>> [0x7feb9f28f8ec] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) 
>>>>>> [0x7feb9f293fcd] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) 
>>>>>> [0x7feb9f293e8b]))) 0-dict: data is NULL
>>>>>>  [2014-06-12 22:02:05.873176] E 
>>>>>> [name.c:147:client_fill_address_family] 0-glusterfs: 
>>>>>> transport.address-family not specified. Could not guess default 
>>>>>> value from (remote-host:(null) or 
>>>>>> transport.unix.connect-path:(null)) options
>>>>>>  [2014-06-12 22:02:08.873483] W [dict.c:1055:data_to_str] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) 
>>>>>> [0x7feb9f28f8ec] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) 
>>>>>> [0x7feb9f293fcd] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) 
>>>>>> [0x7feb9f293e80]))) 0-dict: data is NULL
>>>>>>  [2014-06-12 22:02:08.873504] W [dict.c:1055:data_to_str] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) 
>>>>>> [0x7feb9f28f8ec] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) 
>>>>>> [0x7feb9f293fcd] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) 
>>>>>> [0x7feb9f293e8b]))) 0-dict: data is NULL
>>>>>>  [2014-06-12 22:02:08.873509] E 
>>>>>> [name.c:147:client_fill_address_family] 0-glusterfs: 
>>>>>> transport.address-family not specified. Could not guess default 
>>>>>> value from (remote-host:(null) or 
>>>>>> transport.unix.connect-path:(null)) options
>>>>>>  [2014-06-12 22:02:11.873806] W [dict.c:1055:data_to_str] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) 
>>>>>> [0x7feb9f28f8ec] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) 
>>>>>> [0x7feb9f293fcd] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x200) 
>>>>>> [0x7feb9f293e80]))) 0-dict: data is NULL
>>>>>>  [2014-06-12 22:02:11.873827] W [dict.c:1055:data_to_str] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(+0x68ec) 
>>>>>> [0x7feb9f28f8ec] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0xad) 
>>>>>> [0x7feb9f293fcd] 
>>>>>> (-->/usr/lib64/glusterfs/3.5.1beta2/rpc-transport/socket.so(client_fill_address_family+0x20b) 
>>>>>> [0x7feb9f293e8b]))) 0-dict: data is NULL
>>>>>>  [2014-06-12 22:02:11.873832] E 
>>>>>> [name.c:147:client_fill_address_family] 0-glusterfs: 
>>>>>> transport.address-family not specified. Could not guess default 
>>>>>> value from (remote-host:(null) or 
>>>>>> transport.unix.connect-path:(null)) options
>>>>>>  [2014-06-12 22:02:46.073341] I [socket.c:3561:socket_init] 
>>>>>> 0-glusterfs: SSL support is NOT enabled
>>>>>>  [2014-06-12 22:02:46.073369] I [socket.c:3576:socket_init] 
>>>>>> 0-glusterfs: using system polling thread
>>>>>>
>>>>>>  [2014-06-12 21:29:54.225860] E 
>>>>>> [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base 
>>>>>> index is not createdunder index/base_indices_holder
>>>>>>  [2014-06-12 21:39:54.276236] E 
>>>>>> [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base 
>>>>>> index is not createdunder index/base_indices_holder
>>>>>>  [2014-06-12 21:49:54.325532] E 
>>>>>> [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base 
>>>>>> index is not createdunder index/base_indices_holder
>>>>>>  [2014-06-12 21:59:54.374955] E 
>>>>>> [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base 
>>>>>> index is not createdunder index/base_indices_holder
>>>>>>  [2014-06-12 22:09:54.482350] E 
>>>>>> [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base 
>>>>>> index is not createdunder index/base_indices_holder
>>>>>>  [2014-06-12 22:09:54.482481] E 
>>>>>> [index.c:267:check_delete_stale_index_file] 0-homegfs-index: Base 
>>>>>> index is not createdunder index/base_indices_holder
>>>>>>
>>>>>>  I am also still seeing these messages (very strange, because 
>>>>>> there are no files on the Software volume. That volume is 
>>>>>> completely empty...):
>>>>>>
>>>>>>  [2014-06-12 21:49:54.326014] E 
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc] 
>>>>>> 0-Software-replicate-1: lookup failed on index dir on 
>>>>>> Software-client-2 - (Stale file handle)
>>>>>>  [2014-06-12 21:49:54.327077] E 
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc] 
>>>>>> 0-Software-replicate-0: lookup failed on index dir on 
>>>>>> Software-client-0 - (Stale file handle)
>>>>>>  [2014-06-12 21:59:54.373724] E 
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc] 
>>>>>> 0-Source-replicate-0: lookup failed on index dir on 
>>>>>> Source-client-0 - (Stale file handle)
>>>>>>  [2014-06-12 21:59:54.373950] E 
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc] 
>>>>>> 0-Source-replicate-1: lookup failed on index dir on 
>>>>>> Source-client-2 - (Stale file handle)
>>>>>>  [2014-06-12 21:59:54.375302] E 
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc] 
>>>>>> 0-Software-replicate-1: lookup failed on index dir on 
>>>>>> Software-client-2 - (Stale file handle)
>>>>>>  [2014-06-12 21:59:54.376673] E 
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc] 
>>>>>> 0-Software-replicate-0: lookup failed on index dir on 
>>>>>> Software-client-0 - (Stale file handle)
>>>>>>  [2014-06-12 22:09:54.424471] E 
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc] 
>>>>>> 0-Source-replicate-0: lookup failed on index dir on 
>>>>>> Source-client-0 - (Stale file handle)
>>>>>>  [2014-06-12 22:09:54.424667] E 
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc] 
>>>>>> 0-Source-replicate-1: lookup failed on index dir on 
>>>>>> Source-client-2 - (Stale file handle)
>>>>>>  [2014-06-12 22:09:54.482812] E 
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc] 
>>>>>> 0-Software-replicate-1: lookup failed on index dir on 
>>>>>> Software-client-2 - (Stale file handle)
>>>>>>  [2014-06-12 22:09:54.482910] E 
>>>>>> [afr-self-heald.c:1189:afr_crawl_build_start_loc] 
>>>>>> 0-Software-replicate-0: lookup failed on index dir on 
>>>>>> Software-client-0 - (Stale file handle)
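[Editor's note: the AFR self-heal crawl starts from each brick's index directory, which in this gluster line lives at `<brick>/.glusterfs/indices/xattrop`. A stale handle there on an empty volume points at the directory itself, not at user files. A hedged sketch of the check, using a mock brick path so it runs anywhere; on a real server, set BRICK to the actual brick path from `gluster volume info Software`.]

```shell
# Mock a brick layout so this is self-contained; BRICK is a placeholder.
BRICK=/tmp/mock-software-brick
mkdir -p "$BRICK/.glusterfs/indices/xattrop"

# The self-heal crawl looks up this directory first. If it is missing or
# unreadable on a brick, that is consistent with the
# "lookup failed on index dir" errors quoted above.
if [ -d "$BRICK/.glusterfs/indices/xattrop" ]; then
    echo "index dir present"
else
    echo "index dir missing - matches the lookup failure"
fi
```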
>>>>>>  David
>>>>>
>>>>>
>>>>>  On 12/06/2014, at 3:16 AM, David F. Robinson wrote:
>>>>>>  Roger that. Thanks for the feedback. For testing, this approach 
>>>>>> would work fine. If we put gluster into production, it would not 
>>>>>> be optimal. Taking the entire data storage offline for the 
>>>>>> upgrade would be difficult given the number of machines and the 
>>>>>> cluster jobs that are always running.
>>>>>>
>>>>>>  If you get the rolling upgrade working and need someone to test, 
>>>>>> let me know. Happy to test and provide feedback.
>>>>>>
>>>>>>  Thanks...
>>>>>>
>>>>>>  David (Sent from mobile)
>>>>>>
>>>>>>  ===============================
>>>>>>  David F. Robinson, Ph.D.
>>>>>>  President - Corvid Technologies
>>>>>>  704.799.6944 x101 [office]
>>>>>>  704.252.1310 [cell]
>>>>>>  704.799.7974 [fax]
>>>>>>  David.Robinson at corvidtec.com
>>>>>>  http://www.corvidtechnologies.com
>>>>>  --
>>>>>  GlusterFS - http://www.gluster.org
>>>>>
>>>>>  An open source, distributed file system scaling to several
>>>>>  petabytes, and handling thousands of clients.
>>>>>
>>>>>  My personal twitter: twitter.com/realjustinclift
>>>>>
>>>>
>>>
>>
>>
>


