[Gluster-users] rebalance fix layout necessary

Amar Tumballi atumball at redhat.com
Wed Apr 19 16:16:59 UTC 2017


On Wed, 19 Apr 2017 at 4:58 PM, Amudhan P <amudhan83 at gmail.com> wrote:

> Hi,
>
> Does rebalance fix-layout trigger automatically by any chance?
>
> My cluster is currently showing a rebalance in progress: running the
> command "rebalance status" shows "fix-layout in progress" on the nodes
> added recently to the cluster and "fix-layout completed" on the old nodes.
>
> Checking the rebalance log on the new nodes shows it was started on 12th April.
>
> Strange; what would have triggered the rebalance process?
>

Have you done a remove-brick?
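
For reference, a remove-brick starts its own data migration using the
rebalance infrastructure, so "rebalance status" can report progress that
nobody started by hand. A quick sketch of how to check ("gfs-vol" stands in
for the volume name; <brick-list> is whatever was passed to
"remove-brick ... start"):

    # per-node rebalance / fix-layout state
    gluster volume rebalance gfs-vol status

    # a pending remove-brick migration has its own status output
    gluster volume remove-brick gfs-vol <brick-list> status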

>
> regards
> Amudhan
>
> On Thu, Apr 13, 2017 at 12:51 PM, Amudhan P <amudhan83 at gmail.com> wrote:
>
>> I have another issue now: after expanding the cluster, folder listing
>> time has increased to 400%.
>>
>> I also tried enabling readdir-ahead & parallel-readdir, but they showed
>> no improvement in folder listing and instead introduced a new issue:
>> random folders disappeared from listings and data reads returned I/O
>> errors.
>>
>> I tried disabling cluster.readdir-optimize and remounting the FUSE
>> client, but the issue continued. So I disabled readdir-ahead &
>> parallel-readdir and re-enabled cluster.readdir-optimize, and everything
>> works fine.
>>
>> How do I bring down folder listing time?
>>
>>
>> Below is my volume config:
>> Options Reconfigured:
>> nfs.disable: yes
>> cluster.disperse-self-heal-daemon: enable
>> cluster.weighted-rebalance: off
>> cluster.rebal-throttle: aggressive
>> performance.readdir-ahead: off
>> cluster.min-free-disk: 10%
>> features.default-soft-limit: 80%
>> performance.force-readdirp: no
>> dht.force-readdirp: off
>> cluster.readdir-optimize: on
>> cluster.heal-timeout: 43200
>> cluster.data-self-heal: on
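>>
>> For reference, the readdir settings described above are toggled per
>> volume; a sketch of the relevant commands, with "gfs-vol" standing in for
>> the volume name (option names as in glusterfs 3.10):
>>
>>     gluster volume set gfs-vol performance.readdir-ahead off
>>     gluster volume set gfs-vol performance.parallel-readdir off
>>     gluster volume set gfs-vol cluster.readdir-optimize on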
>>
>> On Fri, Apr 7, 2017 at 7:35 PM, Amudhan P <amudhan83 at gmail.com> wrote:
>>
>>> Volume type:
>>> Disperse volume 8+2 = 1080 bricks
>>>
>>> The first time, I added 8+2 * 3 sets and it started giving issues with
>>> folder listing; after remounting the mount point it worked fine.
>>>
>>> The second time, I added 8+2 * 13 sets and it had the same issue.
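>>>
>>> For context: one 8+2 set is ten bricks, and glusterfs expects add-brick
>>> on a disperse volume to be given a multiple of the set size. A sketch,
>>> with host names and brick paths as placeholders:
>>>
>>>     gluster volume add-brick gfs-vol \
>>>         node1:/media/diskN/brickN node2:/media/diskN/brickN \
>>>         ... node10:/media/diskN/brickN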
>>>
>>> When listing a folder, it returned an empty listing or did not show all
>>> the folders.
>>>
>>> Ongoing writes were interrupted with an error that the destination
>>> folder was not available.
>>>
>>> Adding a few more lines from the log; let me know if you need the full log file.
>>>
>>> [2017-04-05 13:40:03.702624] I [glusterfsd-mgmt.c:52:mgmt_cbk_spec]
>>> 0-mgmt: Volume file changed
>>> [2017-04-05 13:40:04.970055] I [MSGID: 122067]
>>> [ec-code.c:1046:ec_code_detect] 2-gfs-vol-disperse-123: Using 'sse' CPU
>>> extensions
>>> [2017-04-05 13:40:04.971194] I [MSGID: 122067]
>>> [ec-code.c:1046:ec_code_detect] 2-gfs-vol-disperse-122: Using 'sse' CPU
>>> extensions
>>> [2017-04-05 13:40:04.972144] I [MSGID: 122067]
>>> [ec-code.c:1046:ec_code_detect] 2-gfs-vol-disperse-121: Using 'sse' CPU
>>> extensions
>>> [2017-04-05 13:40:04.973131] I [MSGID: 122067]
>>> [ec-code.c:1046:ec_code_detect] 2-gfs-vol-disperse-120: Using 'sse' CPU
>>> extensions
>>> [2017-04-05 13:40:04.974072] I [MSGID: 122067]
>>> [ec-code.c:1046:ec_code_detect] 2-gfs-vol-disperse-119: Using 'sse' CPU
>>> extensions
>>> [2017-04-05 13:40:04.975005] I [MSGID: 122067]
>>> [ec-code.c:1046:ec_code_detect] 2-gfs-vol-disperse-118: Using 'sse' CPU
>>> extensions
>>> [2017-04-05 13:40:04.975936] I [MSGID: 122067]
>>> [ec-code.c:1046:ec_code_detect] 2-gfs-vol-disperse-117: Using 'sse' CPU
>>> extensions
>>> [2017-04-05 13:40:04.976905] I [MSGID: 122067]
>>> [ec-code.c:1046:ec_code_detect] 2-gfs-vol-disperse-116: Using 'sse' CPU
>>> extensions
>>> [2017-04-05 13:40:04.977825] I [MSGID: 122067]
>>> [ec-code.c:1046:ec_code_detect] 2-gfs-vol-disperse-115: Using 'sse' CPU
>>> extensions
>>> [2017-04-05 13:40:04.978755] I [MSGID: 122067]
>>> [ec-code.c:1046:ec_code_detect] 2-gfs-vol-disperse-114: Using 'sse' CPU
>>> extensions
>>> [2017-04-05 13:40:04.979689] I [MSGID: 122067]
>>> [ec-code.c:1046:ec_code_detect] 2-gfs-vol-disperse-113: Using 'sse' CPU
>>> extensions
>>> [2017-04-05 13:40:04.980626] I [MSGID: 122067]
>>> [ec-code.c:1046:ec_code_detect] 2-gfs-vol-disperse-112: Using 'sse' CPU
>>> extensions
>>> [2017-04-05 13:40:07.270412] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-736: changing port to 49153 (from 0)
>>> [2017-04-05 13:40:07.271902] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-746: changing port to 49154 (from 0)
>>> [2017-04-05 13:40:07.272076] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-756: changing port to 49155 (from 0)
>>> [2017-04-05 13:40:07.273154] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-766: changing port to 49156 (from 0)
>>> [2017-04-05 13:40:07.273193] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-776: changing port to 49157 (from 0)
>>> [2017-04-05 13:40:07.273371] I [MSGID: 114046]
>>> [client-handshake.c:1216:client_setvolume_cbk] 2-gfs-vol-client-579:
>>> Connected to gfs-vol-client-579, attached to remote volume
>>> '/media/disk22/brick22'.
>>> [2017-04-05 13:40:07.273388] I [MSGID: 114047]
>>> [client-handshake.c:1227:client_setvolume_cbk] 2-gfs-vol-client-579: Server
>>> and Client lk-version numbers are not same, reopening the fds
>>> [2017-04-05 13:40:07.273435] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk] 2-gfs-vol-client-433:
>>> Server lk version = 1
>>> [2017-04-05 13:40:07.275632] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-786: changing port to 49158 (from 0)
>>> [2017-04-05 13:40:07.275685] I [MSGID: 114046]
>>> [client-handshake.c:1216:client_setvolume_cbk] 2-gfs-vol-client-589:
>>> Connected to gfs-vol-client-589, attached to remote volume
>>> '/media/disk23/brick23'.
>>> [2017-04-05 13:40:07.275707] I [MSGID: 114047]
>>> [client-handshake.c:1227:client_setvolume_cbk] 2-gfs-vol-client-589: Server
>>> and Client lk-version numbers are not same, reopening the fds
>>> [2017-04-05 13:40:07.087011] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-811: changing port to 49161 (from 0)
>>> [2017-04-05 13:40:07.087031] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-420: changing port to 49158 (from 0)
>>> [2017-04-05 13:40:07.087045] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-521: changing port to 49168 (from 0)
>>> [2017-04-05 13:40:07.087060] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-430: changing port to 49159 (from 0)
>>> [2017-04-05 13:40:07.087074] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-531: changing port to 49169 (from 0)
>>> [2017-04-05 13:40:07.087098] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-440: changing port to 49160 (from 0)
>>> [2017-04-05 13:40:07.087105] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-821: changing port to 49162 (from 0)
>>> [2017-04-05 13:40:07.087117] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-450: changing port to 49161 (from 0)
>>> [2017-04-05 13:40:07.087131] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-831: changing port to 49163 (from 0)
>>> [2017-04-05 13:40:07.087134] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-460: changing port to 49162 (from 0)
>>> [2017-04-05 13:40:07.087157] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-841: changing port to 49164 (from 0)
>>> [2017-04-05 13:40:07.087181] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-541: changing port to 49170 (from 0)
>>> [2017-04-05 13:40:07.087185] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-470: changing port to 49163 (from 0)
>>> [2017-04-05 13:40:07.087202] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-851: changing port to 49165 (from 0)
>>> [2017-04-05 13:40:07.087241] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-480: changing port to 49164 (from 0)
>>> [2017-04-05 13:40:07.087240] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-551: changing port to 49171 (from 0)
>>> [2017-04-05 13:40:07.087263] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-861: changing port to 49166 (from 0)
>>> [2017-04-05 13:40:07.087281] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-571: changing port to 49173 (from 0)
>>> [2017-04-05 13:40:07.087284] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-561: changing port to 49172 (from 0)
>>> [2017-04-05 13:40:07.087318] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-581: changing port to 49174 (from 0)
>>> [2017-04-05 13:40:07.087318] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-490: changing port to 49165 (from 0)
>>> [2017-04-05 13:40:07.087344] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-500: changing port to 49166 (from 0)
>>> [2017-04-05 13:40:07.087352] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-871: changing port to 49167 (from 0)
>>> [2017-04-05 13:40:07.087372] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
>>> 2-gfs-vol-client-591: changing port to 49175 (from 0)
>>>
>>> [2017-04-05 13:40:07.681293] I [MSGID: 114046]
>>> [client-handshake.c:1216:client_setvolume_cbk] 2-gfs-vol-client-755:
>>> Connected to gfs-vol-client-755, attached to remote volume
>>> '/media/disk4/brick4'.
>>> [2017-04-05 13:40:07.681312] I [MSGID: 114047]
>>> [client-handshake.c:1227:client_setvolume_cbk] 2-gfs-vol-client-755: Server
>>> and Client lk-version numbers are not same, reopening the fds
>>> [2017-04-05 13:40:07.681317] I [MSGID: 122061] [ec.c:340:ec_up]
>>> 2-gfs-vol-disperse-74: Going UP
>>> [2017-04-05 13:40:07.681428] I [MSGID: 122061] [ec.c:340:ec_up]
>>> 2-gfs-vol-disperse-75: Going UP
>>> [2017-04-05 13:40:07.681454] I [MSGID: 114046]
>>> [client-handshake.c:1216:client_setvolume_cbk] 2-gfs-vol-client-1049:
>>> Connected to gfs-vol-client-1049, attached to remote volume
>>> '/media/disk33/brick33'.
>>> [2017-04-05 13:45:10.689344] I [MSGID: 114018]
>>> [client.c:2276:client_rpc_notify] 0-gfs-vol-client-71: disconnected from
>>> gfs-vol-client-71. Client process will keep trying to connect to glusterd
>>> until brick's port is available
>>> [2017-04-05 13:45:10.689376] I [MSGID: 114021] [client.c:2361:notify]
>>> 0-gfs-vol-client-73: current graph is no longer active, destroying
>>> rpc_client
>>> [2017-04-05 13:45:10.689380] I [MSGID: 114018]
>>> [client.c:2276:client_rpc_notify] 0-gfs-vol-client-72: disconnected from
>>> gfs-vol-client-72. Client process will keep trying to connect to glusterd
>>> until brick's port is available
>>> [2017-04-05 13:45:10.689389] I [MSGID: 114021] [client.c:2361:notify]
>>> 0-gfs-vol-client-74: current graph is no longer active, destroying
>>> rpc_client
>>> [2017-04-05 13:45:10.689394] I [MSGID: 114018]
>>> [client.c:2276:client_rpc_notify] 0-gfs-vol-client-73: disconnected from
>>> gfs-vol-client-73. Client process will keep trying to connect to glusterd
>>> until brick's port is available
>>> [2017-04-05 13:45:10.689390] I [MSGID: 122062] [ec.c:354:ec_down]
>>> 0-gfs-vol-disperse-7: Going DOWN
>>> [2017-04-05 13:45:10.689428] I [MSGID: 114021] [client.c:2361:notify]
>>> 0-gfs-vol-client-75: current graph is no longer active, destroying
>>> rpc_client
>>> [2017-04-05 13:45:10.689443] I [MSGID: 114018]
>>> [client.c:2276:client_rpc_notify] 0-gfs-vol-client-74: disconnected from
>>> gfs-vol-client-74. Client process will keep trying to connect to glusterd
>>> until brick's port is available
>>>
>>> On Fri, Apr 7, 2017 at 11:05 AM, Nithya Balachandran <
>>> nbalacha at redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On 6 April 2017 at 14:56, Amudhan P <amudhan83 at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I was able to add bricks to the volume successfully.
>>>>> The client was reading, writing, and listing data from the mount point.
>>>>> But after adding bricks I had issues with folder listing (not listing
>>>>> all folders, or returning an empty folder list) and writes were
>>>>> interrupted.
>>>>>
>>>>
>>>> This is strange. The issue with listing folders you referred to earlier
>>>> was because of the rebalance, but this seems new.
>>>>
>>>> How many bricks did you add and what is your volume config? What errors
>>>> did you see while writing or listing folders?
>>>>
>>>>> Remounting the volume has solved the issue and it is now working fine.
>>>>>
>>>>> I was under the impression that running rebalance would cause the
>>>>> folder listing issue, but now adding bricks itself created a problem.
>>>>> Whether the client is busy or idle is irrelevant; I need to remount to
>>>>> solve the issue.
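>>>>>
>>>>> Remounting the FUSE client is just the usual umount/mount pair; the
>>>>> server name, volume name, and mount point below are placeholders:
>>>>>
>>>>>     umount /mnt/gfs-vol
>>>>>     mount -t glusterfs node1:/gfs-vol /mnt/gfs-vol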
>>>>>
>>>>> Also, I would like to know whether using bricks in a volume without
>>>>> running fix-layout causes folder listing slowness.
>>>>>
>>>>>
>>>>> Below is a snippet of the client log from when this happened; let me
>>>>> know if you need any additional info.
>>>>>
>>>>> Client and servers are on 3.10.1; the volume is mounted through FUSE.
>>>>>
>>>>> Machine busy downloading & uploading:
>>>>>
>>>>> [2017-04-05 13:39:33.487176] I [MSGID: 114021] [client.c:2361:notify]
>>>>> 0-gfs-vol-client-1107: current graph is no longer active, destroying
>>>>> rpc_client
>>>>> [2017-04-05 13:39:33.487196] I [MSGID: 114021] [client.c:2361:notify]
>>>>> 0-gfs-vol-client-1108: current graph is no longer active, destroying
>>>>> rpc_client
>>>>> [2017-04-05 13:39:33.487201] I [MSGID: 114018]
>>>>> [client.c:2276:client_rpc_notify] 0-gfs-vol-client-1107: disconnected from
>>>>> gfs-vol-client-1107. Client process will keep trying to connect to glusterd
>>>>> until brick's port is available
>>>>> [2017-04-05 13:39:33.487212] I [MSGID: 114021] [client.c:2361:notify]
>>>>> 0-gfs-vol-client-1109: current graph is no longer active, destroying
>>>>> rpc_client
>>>>> [2017-04-05 13:39:33.487217] I [MSGID: 114018]
>>>>> [client.c:2276:client_rpc_notify] 0-gfs-vol-client-1108: disconnected from
>>>>> gfs-vol-client-1108. Client process will keep trying to connect to glusterd
>>>>> until brick's port is available
>>>>> [2017-04-05 13:39:33.487232] I [MSGID: 114018]
>>>>> [client.c:2276:client_rpc_notify] 0-gfs-vol-client-1109: disconnected from
>>>>> gfs-vol-client-1109. Client process will keep trying to connect to glusterd
>>>>> until brick's port is available
>>>>>
>>>>>
>>>>> Idle system:
>>>>>
>>>>> [2017-04-05 13:40:07.692336] I [MSGID: 114035]
>>>>> [client-handshake.c:202:client_set_lk_version_cbk] 2-gfs-vol-client-1065:
>>>>> Server lk version = 1
>>>>> [2017-04-05 13:40:07.692383] I [MSGID: 114035]
>>>>> [client-handshake.c:202:client_set_lk_version_cbk] 2-gfs-vol-client-995:
>>>>> Server lk version = 1
>>>>> [2017-04-05 13:40:07.692430] I [MSGID: 114035]
>>>>> [client-handshake.c:202:client_set_lk_version_cbk] 2-gfs-vol-client-965:
>>>>> Server lk version = 1
>>>>> [2017-04-05 13:40:07.692485] I [MSGID: 114035]
>>>>> [client-handshake.c:202:client_set_lk_version_cbk] 2-gfs-vol-client-1075:
>>>>> Server lk version = 1
>>>>> [2017-04-05 13:40:07.692532] I [MSGID: 114035]
>>>>> [client-handshake.c:202:client_set_lk_version_cbk] 2-gfs-vol-client-1025:
>>>>> Server lk version = 1
>>>>> [2017-04-05 13:40:07.692569] I [MSGID: 114035]
>>>>> [client-handshake.c:202:client_set_lk_version_cbk] 2-gfs-vol-client-1055:
>>>>> Server lk version = 1
>>>>> [2017-04-05 13:40:07.692620] I [MSGID: 114035]
>>>>> [client-handshake.c:202:client_set_lk_version_cbk] 2-gfs-vol-client-955:
>>>>> Server lk version = 1
>>>>> [2017-04-05 13:40:07.692681] I [MSGID: 114035]
>>>>> [client-handshake.c:202:client_set_lk_version_cbk] 2-gfs-vol-client-1035:
>>>>> Server lk version = 1
>>>>> [2017-04-05 13:40:07.692870] I [MSGID: 114035]
>>>>> [client-handshake.c:202:client_set_lk_version_cbk] 2-gfs-vol-client-1045:
>>>>> Server lk version = 1
>>>>>
>>>>>
>>>>> Regards,
>>>>> Amudhan
>>>>>
>>>>> On Tue, Apr 4, 2017 at 4:31 PM, Amudhan P <amudhan83 at gmail.com> wrote:
>>>>>
>>>>>> I mean the time it takes to list folders and files, because
>>>>>> "rebalance fix-layout" was not done.
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 4, 2017 at 1:51 PM, Amudhan P <amudhan83 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> OK, good to hear.
>>>>>>>
>>>>>>> Will there be any impact on listing folders and files?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 4, 2017 at 1:43 PM, Nithya Balachandran <
>>>>>>> nbalacha at redhat.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4 April 2017 at 12:33, Amudhan P <amudhan83 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have a query on rebalancing.
>>>>>>>>>
>>>>>>>>> Let's consider the following as my folder hierarchy.
>>>>>>>>>
>>>>>>>>> parent1-fol (parent folder)
>>>>>>>>>               |_
>>>>>>>>>                  class-fol-1 (1st-level subfolder)
>>>>>>>>>                                |_
>>>>>>>>>                                   A (2nd-level subfolder)
>>>>>>>>>                                    |_
>>>>>>>>>                                       childfol-1 (child folder
>>>>>>>>> created every time before writing files)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Now, I have a running cluster on 3.10.1 with a disperse volume,
>>>>>>>>> and I am planning to expand the cluster by adding bricks.
>>>>>>>>>
>>>>>>>>> Will there be a problem using the newly added bricks without doing
>>>>>>>>> a "rebalance fix-layout", other than that existing files cannot be
>>>>>>>>> rebalanced to the new bricks and files created under existing
>>>>>>>>> folders will not go to the new bricks?
>>>>>>>>>
>>>>>>>>> I tested the above case in my test setup and observed that files
>>>>>>>>> created under new folders go to the new bricks, and I don't see
>>>>>>>>> any issue with listing files and folders.
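>>>>>>>>>
>>>>>>>>> For what it's worth, the layout a directory has on a given brick
>>>>>>>>> can be inspected on the brick's backend path; a brick that has not
>>>>>>>>> been included in a directory's layout shows either no dht xattr for
>>>>>>>>> it or a zeroed range. A sketch, with the brick and directory paths
>>>>>>>>> as placeholders:
>>>>>>>>>
>>>>>>>>>     # hex-encoded dht layout range for this directory on this brick
>>>>>>>>>     getfattr -n trusted.glusterfs.dht -e hex \
>>>>>>>>>         /media/disk1/brick1/parent1-fol/class-fol-1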
>>>>>>>>>
>>>>>>>>> So, in my case we create a child folder every time before creating
>>>>>>>>> files.
>>>>>>>>>
>>>>>>>>> The reason to avoid rebalance is that I have more than 10000
>>>>>>>>> folders across 1080 bricks, so triggering a rebalance would take a
>>>>>>>>> long time; in my previous expansion on 3.7, I was randomly unable
>>>>>>>>> to access some folders until fix-layout completed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> It sounds like you will not need to run a rebalance or fix-layout
>>>>>>>> for this. It should work fine.
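>>>>>>>>
>>>>>>>> If you ever do need just the layout fix, without migrating existing
>>>>>>>> data, that is its own mode of the rebalance command (volume name is
>>>>>>>> a placeholder):
>>>>>>>>
>>>>>>>>     gluster volume rebalance gfs-vol fix-layout start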
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Nithya
>>>>>>>>
>>>>>>>>>
>>>>>>>>> regards
>>>>>>>>> Amudhan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

-- 
Amar Tumballi (amarts)