[Gluster-users] rebalance fix layout necessary

Amudhan P amudhan83 at gmail.com
Thu Apr 13 07:21:16 UTC 2017


I have another issue now: after expanding the cluster, folder listing time
has increased to 400% of what it was.

I also tried enabling readdir-ahead & parallel-readdir, but they did not show
any improvement in folder listing and instead introduced a new problem:
random folders disappeared from the listing and reading data returned IO
errors.

I tried disabling cluster.readdir-optimize and remounting the fuse client,
but the issue continued. So I disabled readdir-ahead & parallel-readdir and
re-enabled cluster.readdir-optimize, and everything works fine again.

How do I bring down folder listing time?
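
For reference, I have not yet run a rebalance fix-layout. If that turns out
to be what is needed, it would be started and monitored with something like
this (a sketch; volume name gfs-vol assumed):

  gluster volume rebalance gfs-vol fix-layout start
  gluster volume rebalance gfs-vol status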


Below is my volume configuration:
Options Reconfigured:
nfs.disable: yes
cluster.disperse-self-heal-daemon: enable
cluster.weighted-rebalance: off
cluster.rebal-throttle: aggressive
performance.readdir-ahead: off
cluster.min-free-disk: 10%
features.default-soft-limit: 80%
performance.force-readdirp: no
dht.force-readdirp: off
cluster.readdir-optimize: on
cluster.heal-timeout: 43200
cluster.data-self-heal: on

On Fri, Apr 7, 2017 at 7:35 PM, Amudhan P <amudhan83 at gmail.com> wrote:

> Volume type:
> Disperse Volume  8+2  = 1080 bricks
>
> The first time, I added 8+2 * 3 sets and it started giving issues with
> folder listing. After remounting the mount point it worked fine again.
>
> The second time, I added 8+2 * 13 sets and hit the same issue.
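>
> For reference, each expansion was done with add-brick, roughly along these
> lines (a sketch only; host names and brick paths are placeholders, one 8+2
> set = 10 bricks, more sets just extend the brick list):
>
>   gluster volume add-brick gfs-vol host{1..10}:/media/disk41/brick41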
>
> When listing a folder, it either returned an empty folder or did not show
> all the folders.
>
> Ongoing writes were interrupted with an error that the destination folder
> was not available.
>
> Adding a few more lines from the log; let me know if you need the full log file.
>
> [2017-04-05 13:40:03.702624] I [glusterfsd-mgmt.c:52:mgmt_cbk_spec]
> 0-mgmt: Volume file changed
> [2017-04-05 13:40:04.970055] I [MSGID: 122067] [ec-code.c:1046:ec_code_detect]
> 2-gfs-vol-disperse-123: Using 'sse' CPU extensions
> [2017-04-05 13:40:04.971194] I [MSGID: 122067] [ec-code.c:1046:ec_code_detect]
> 2-gfs-vol-disperse-122: Using 'sse' CPU extensions
> [2017-04-05 13:40:04.972144] I [MSGID: 122067] [ec-code.c:1046:ec_code_detect]
> 2-gfs-vol-disperse-121: Using 'sse' CPU extensions
> [2017-04-05 13:40:04.973131] I [MSGID: 122067] [ec-code.c:1046:ec_code_detect]
> 2-gfs-vol-disperse-120: Using 'sse' CPU extensions
> [2017-04-05 13:40:04.974072] I [MSGID: 122067] [ec-code.c:1046:ec_code_detect]
> 2-gfs-vol-disperse-119: Using 'sse' CPU extensions
> [2017-04-05 13:40:04.975005] I [MSGID: 122067] [ec-code.c:1046:ec_code_detect]
> 2-gfs-vol-disperse-118: Using 'sse' CPU extensions
> [2017-04-05 13:40:04.975936] I [MSGID: 122067] [ec-code.c:1046:ec_code_detect]
> 2-gfs-vol-disperse-117: Using 'sse' CPU extensions
> [2017-04-05 13:40:04.976905] I [MSGID: 122067] [ec-code.c:1046:ec_code_detect]
> 2-gfs-vol-disperse-116: Using 'sse' CPU extensions
> [2017-04-05 13:40:04.977825] I [MSGID: 122067] [ec-code.c:1046:ec_code_detect]
> 2-gfs-vol-disperse-115: Using 'sse' CPU extensions
> [2017-04-05 13:40:04.978755] I [MSGID: 122067] [ec-code.c:1046:ec_code_detect]
> 2-gfs-vol-disperse-114: Using 'sse' CPU extensions
> [2017-04-05 13:40:04.979689] I [MSGID: 122067] [ec-code.c:1046:ec_code_detect]
> 2-gfs-vol-disperse-113: Using 'sse' CPU extensions
> [2017-04-05 13:40:04.980626] I [MSGID: 122067] [ec-code.c:1046:ec_code_detect]
> 2-gfs-vol-disperse-112: Using 'sse' CPU extensions
> [2017-04-05 13:40:07.270412] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-736: changing port to 49153 (from 0)
> [2017-04-05 13:40:07.271902] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-746: changing port to 49154 (from 0)
> [2017-04-05 13:40:07.272076] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-756: changing port to 49155 (from 0)
> [2017-04-05 13:40:07.273154] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-766: changing port to 49156 (from 0)
> [2017-04-05 13:40:07.273193] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-776: changing port to 49157 (from 0)
> [2017-04-05 13:40:07.273371] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk]
> 2-gfs-vol-client-579: Connected to gfs-vol-client-579, attached to remote
> volume '/media/disk22/brick22'.
> [2017-04-05 13:40:07.273388] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk]
> 2-gfs-vol-client-579: Server and Client lk-version numbers are not same,
> reopening the fds
> [2017-04-05 13:40:07.273435] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk]
> 2-gfs-vol-client-433: Server lk version = 1
> [2017-04-05 13:40:07.275632] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-786: changing port to 49158 (from 0)
> [2017-04-05 13:40:07.275685] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk]
> 2-gfs-vol-client-589: Connected to gfs-vol-client-589, attached to remote
> volume '/media/disk23/brick23'.
> [2017-04-05 13:40:07.275707] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk]
> 2-gfs-vol-client-589: Server and Client lk-version numbers are not same,
> reopening the fds
> [2017-04-05 13:40:07.087011] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-811: changing port to 49161 (from 0)
> [2017-04-05 13:40:07.087031] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-420: changing port to 49158 (from 0)
> [2017-04-05 13:40:07.087045] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-521: changing port to 49168 (from 0)
> [2017-04-05 13:40:07.087060] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-430: changing port to 49159 (from 0)
> [2017-04-05 13:40:07.087074] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-531: changing port to 49169 (from 0)
> [2017-04-05 13:40:07.087098] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-440: changing port to 49160 (from 0)
> [2017-04-05 13:40:07.087105] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-821: changing port to 49162 (from 0)
> [2017-04-05 13:40:07.087117] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-450: changing port to 49161 (from 0)
> [2017-04-05 13:40:07.087131] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-831: changing port to 49163 (from 0)
> [2017-04-05 13:40:07.087134] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-460: changing port to 49162 (from 0)
> [2017-04-05 13:40:07.087157] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-841: changing port to 49164 (from 0)
> [2017-04-05 13:40:07.087181] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-541: changing port to 49170 (from 0)
> [2017-04-05 13:40:07.087185] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-470: changing port to 49163 (from 0)
> [2017-04-05 13:40:07.087202] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-851: changing port to 49165 (from 0)
> [2017-04-05 13:40:07.087241] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-480: changing port to 49164 (from 0)
> [2017-04-05 13:40:07.087240] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-551: changing port to 49171 (from 0)
> [2017-04-05 13:40:07.087263] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-861: changing port to 49166 (from 0)
> [2017-04-05 13:40:07.087281] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-571: changing port to 49173 (from 0)
> [2017-04-05 13:40:07.087284] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-561: changing port to 49172 (from 0)
> [2017-04-05 13:40:07.087318] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-581: changing port to 49174 (from 0)
> [2017-04-05 13:40:07.087318] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-490: changing port to 49165 (from 0)
> [2017-04-05 13:40:07.087344] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-500: changing port to 49166 (from 0)
> [2017-04-05 13:40:07.087352] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-871: changing port to 49167 (from 0)
> [2017-04-05 13:40:07.087372] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
> 2-gfs-vol-client-591: changing port to 49175 (from 0)
>
> [2017-04-05 13:40:07.681293] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk]
> 2-gfs-vol-client-755: Connected to gfs-vol-client-755, attached to remote
> volume '/media/disk4/brick4'.
> [2017-04-05 13:40:07.681312] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk]
> 2-gfs-vol-client-755: Server and Client lk-version numbers are not same,
> reopening the fds
> [2017-04-05 13:40:07.681317] I [MSGID: 122061] [ec.c:340:ec_up]
> 2-gfs-vol-disperse-74: Going UP
> [2017-04-05 13:40:07.681428] I [MSGID: 122061] [ec.c:340:ec_up]
> 2-gfs-vol-disperse-75: Going UP
> [2017-04-05 13:40:07.681454] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk]
> 2-gfs-vol-client-1049: Connected to gfs-vol-client-1049, attached to remote
> volume '/media/disk33/brick33'.
> [2017-04-05 13:45:10.689344] I [MSGID: 114018] [client.c:2276:client_rpc_notify]
> 0-gfs-vol-client-71: disconnected from gfs-vol-client-71. Client process
> will keep trying to connect to glusterd until brick's port is available
> [2017-04-05 13:45:10.689376] I [MSGID: 114021] [client.c:2361:notify]
> 0-gfs-vol-client-73: current graph is no longer active, destroying
> rpc_client
> [2017-04-05 13:45:10.689380] I [MSGID: 114018] [client.c:2276:client_rpc_notify]
> 0-gfs-vol-client-72: disconnected from gfs-vol-client-72. Client process
> will keep trying to connect to glusterd until brick's port is available
> [2017-04-05 13:45:10.689389] I [MSGID: 114021] [client.c:2361:notify]
> 0-gfs-vol-client-74: current graph is no longer active, destroying
> rpc_client
> [2017-04-05 13:45:10.689394] I [MSGID: 114018] [client.c:2276:client_rpc_notify]
> 0-gfs-vol-client-73: disconnected from gfs-vol-client-73. Client process
> will keep trying to connect to glusterd until brick's port is available
> [2017-04-05 13:45:10.689390] I [MSGID: 122062] [ec.c:354:ec_down]
> 0-gfs-vol-disperse-7: Going DOWN
> [2017-04-05 13:45:10.689428] I [MSGID: 114021] [client.c:2361:notify]
> 0-gfs-vol-client-75: current graph is no longer active, destroying
> rpc_client
> [2017-04-05 13:45:10.689443] I [MSGID: 114018] [client.c:2276:client_rpc_notify]
> 0-gfs-vol-client-74: disconnected from gfs-vol-client-74. Client process
> will keep trying to connect to glusterd until brick's port is available
>
> On Fri, Apr 7, 2017 at 11:05 AM, Nithya Balachandran <nbalacha at redhat.com>
> wrote:
>
>>
>>
>> On 6 April 2017 at 14:56, Amudhan P <amudhan83 at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I was able to add bricks to the volume successfully.
>>> The client was reading, writing and listing data from the mount point.
>>> But after adding bricks I had issues with folder listing (not listing all
>>> folders, or returning an empty folder list) and writes were interrupted.
>>>
>>
>> This is strange. The issue with listing folders you referred to earlier
>> was because of the rebalance, but this seems new.
>>
>> How many bricks did you add and what is your volume config? What errors
>> did you see while writing or listing folders?
>>
>>> Remounting the volume solved the issue and it is now working fine.
>>>
>>> I was under the impression that running a rebalance would cause the
>>> folder listing issue, but now adding bricks by itself created a problem.
>>> Whether the client is busy or idle is irrelevant; I need to remount to
>>> resolve the issue.
>>>
>>> Also, I would like to know whether using new bricks in a volume without
>>> running fix-layout causes folder listing slowness.
>>>
>>>
>>> Below is a snippet of the client log from when this happened. Let me
>>> know if you need any additional info.
>>>
>>> Client and servers are on 3.10.1; the volume is mounted through fuse.
>>>
>>> Machine busy downloading & uploading
>>>
>>> [2017-04-05 13:39:33.487176] I [MSGID: 114021] [client.c:2361:notify]
>>> 0-gfs-vol-client-1107: current graph is no longer active, destroying
>>> rpc_client
>>> [2017-04-05 13:39:33.487196] I [MSGID: 114021] [client.c:2361:notify]
>>> 0-gfs-vol-client-1108: current graph is no longer active, destroying
>>> rpc_client
>>> [2017-04-05 13:39:33.487201] I [MSGID: 114018]
>>> [client.c:2276:client_rpc_notify] 0-gfs-vol-client-1107: disconnected
>>> from gfs-vol-client-1107. Client process will keep trying to connect to
>>> glusterd until brick's port is available
>>> [2017-04-05 13:39:33.487212] I [MSGID: 114021] [client.c:2361:notify]
>>> 0-gfs-vol-client-1109: current graph is no longer active, destroying
>>> rpc_client
>>> [2017-04-05 13:39:33.487217] I [MSGID: 114018]
>>> [client.c:2276:client_rpc_notify] 0-gfs-vol-client-1108: disconnected
>>> from gfs-vol-client-1108. Client process will keep trying to connect to
>>> glusterd until brick's port is available
>>> [2017-04-05 13:39:33.487232] I [MSGID: 114018]
>>> [client.c:2276:client_rpc_notify] 0-gfs-vol-client-1109: disconnected
>>> from gfs-vol-client-1109. Client process will keep trying to connect to
>>> glusterd until brick's port is available
>>>
>>>
>>> Idle system
>>>
>>> [2017-04-05 13:40:07.692336] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>> 2-gfs-vol-client-1065: Server lk version = 1
>>> [2017-04-05 13:40:07.692383] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>> 2-gfs-vol-client-995: Server lk version = 1
>>> [2017-04-05 13:40:07.692430] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>> 2-gfs-vol-client-965: Server lk version = 1
>>> [2017-04-05 13:40:07.692485] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>> 2-gfs-vol-client-1075: Server lk version = 1
>>> [2017-04-05 13:40:07.692532] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>> 2-gfs-vol-client-1025: Server lk version = 1
>>> [2017-04-05 13:40:07.692569] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>> 2-gfs-vol-client-1055: Server lk version = 1
>>> [2017-04-05 13:40:07.692620] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>> 2-gfs-vol-client-955: Server lk version = 1
>>> [2017-04-05 13:40:07.692681] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>> 2-gfs-vol-client-1035: Server lk version = 1
>>> [2017-04-05 13:40:07.692870] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>> 2-gfs-vol-client-1045: Server lk version = 1
>>>
>>>
>>> Regards,
>>> Amudhan
>>>
>>> On Tue, Apr 4, 2017 at 4:31 PM, Amudhan P <amudhan83 at gmail.com> wrote:
>>>
>>>> I mean, will the time taken to list folders and files increase because
>>>> "rebalance fix-layout" was not done?
>>>>
>>>>
>>>> On Tue, Apr 4, 2017 at 1:51 PM, Amudhan P <amudhan83 at gmail.com> wrote:
>>>>
>>>>> Ok, good to hear.
>>>>>
>>>>> Will there be any impact on listing folders and files?
>>>>>
>>>>>
>>>>> On Tue, Apr 4, 2017 at 1:43 PM, Nithya Balachandran <
>>>>> nbalacha at redhat.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On 4 April 2017 at 12:33, Amudhan P <amudhan83 at gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a query on rebalancing.
>>>>>>>
>>>>>>> let's consider following is my folder hierarchy.
>>>>>>>
>>>>>>> parent1-fol (parent folder)
>>>>>>>               |_
>>>>>>>                  class-fol-1 ( 1 st level subfolder)
>>>>>>>                                |_
>>>>>>>                                   A ( 2 nd level subfolder)
>>>>>>>                                    |_
>>>>>>>                                       childfol-1 (child folder
>>>>>>> created every time before writing files)
>>>>>>>
>>>>>>>
>>>>>>> Now, I have a running cluster on 3.10.1 with a disperse volume, and I
>>>>>>> am planning to expand the cluster by adding bricks.
>>>>>>>
>>>>>>> Will there be a problem using the newly added bricks without doing a
>>>>>>> "rebalance fix-layout", other than that existing files cannot be rebalanced
>>>>>>> to the new bricks and files created under existing folders will not go to
>>>>>>> the new bricks?
>>>>>>>
>>>>>>> I tested the above case in my test setup and observed that files created
>>>>>>> under a new folder go to the new bricks, and I don't see any issue in
>>>>>>> listing files and folders.
>>>>>>>
>>>>>>> So, in my case, we create a child folder every time before creating
>>>>>>> files.
>>>>>>>
>>>>>>> The reason to avoid rebalance is that I have more than 10000 folders
>>>>>>> across 1080 bricks, so triggering a rebalance will take a long time, and
>>>>>>> during my previous expansion on 3.7 I was randomly unable to access some
>>>>>>> folders until fix-layout completed.
>>>>>>>
>>>>>>>
>>>>>> It sounds like you will not need to run a rebalance or fix-layout for
>>>>>> this. It should work fine.
>>>>>>
>>>>>> Regards,
>>>>>> Nithya
>>>>>>
>>>>>>>
>>>>>>> regards
>>>>>>> Amudhan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>