[Gluster-users] Fwd: Troubleshooting glusterfs
Nithya Balachandran
nbalacha at redhat.com
Thu Feb 15 08:03:04 UTC 2018
Hi Nikita,
Sorry for taking so long to get back to you. I will take a look at the logs
and get back.
Regards,
Nithya
On 7 February 2018 at 19:33, Nikita Yeryomin <nikyer at gmail.com> wrote:
> Hello Nithya! Thank you for your help on figuring this out!
> We changed our configuration, and after a successful test yesterday we have
> run into a new issue today.
> The test, which included moderate read/write (~20-30 Mb/s) and scaling the
> storage, had been running for about 3 hours when the system got stuck.
> At the user level we see errors like this when trying to work with the
> filesystem:
>
> OSError: [Errno 2] No such file or directory: '/home/public/data/outputs/
> merged/c0a91c500be311e8846eb2f7a7fdd356-video_audio_merge-2/
> c0a91c500be311e8846eb2f7a7fdd356-vi
> deo_join-2.mp4'
>
> I've checked the mount log and it seems there are issues with sharding:
>
> [2018-02-07 11:52:36.200554] E [MSGID: 133010] [shard.c:1724:shard_common_lookup_shards_cbk]
> 140-gv1-shard: Lookup on shard 1 failed. Base file gfid =
> b3a24312-c1fb-4fe0-b11c-0ca264233f62 [Stale file handle]
>
> So this time we started a distributed, non-replicated volume with 4 x 20Gb
> bricks. Per your advice to add more storage at a time, we were adding 2 more
> 20Gb bricks each time the total free space of the storage dropped below a
> threshold value (70Gb at the beginning of this test, later changed to 150Gb).
> I can say it was ~50-60% used all the time.
>
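> As a rough sketch, each scaling step looked something like this (the
> hostnames here are placeholders, not the actual nodes from the list below):
>
> # add two more 20Gb bricks in a single operation
> gluster volume add-brick gv1 \
>     new-node1.qencode.com:/var/storage/brick/gv1 \
>     new-node2.qencode.com:/var/storage/brick/gv1
> # spread the directory layout onto the new bricks
> gluster volume rebalance gv1 fix-layout start
>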
> When we stopped the test, the volume looked like this:
>
> Volume Name: gv1
>
> Type: Distribute
>
> Volume ID: fcdae350-cda9-4da3-bb70-63558ab11f56
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 22
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: dev-gluster1.qencode.com:/var/storage/brick/gv1
>
> Brick2: dev-gluster2.qencode.com:/var/storage/brick/gv1
>
> Brick3: master-59e8248a0ac511e892e90671029ed6b8.qencode.com:/var/storage/
> brick2/gv1
>
> Brick4: master-59e8248a0ac511e892e90671029ed6b8.qencode.com:/var/storage/
> brick1/gv1
>
> Brick5: encoder-9fe7821c0b8011e8af7e0671029ed6b8.qencode.com:/var/storage/
> brick/gv1
>
> Brick6: encoder-2d3a6d6a0be411e8a9470671029ed6b8.qencode.com:/var/storage/
> brick/gv1
>
> Brick7: encoder-2d3b4f960be411e88c7f0671029ed6b8.qencode.com:/var/storage/
> brick/gv1
>
> Brick8: encoder-327b832c0be411e8b3a80671029ed6b8.qencode.com:/var/storage/
> brick/gv1
>
> Brick9: encoder-3272cd540be411e88f120671029ed6b8.qencode.com:/var/storage/
> brick/gv1
>
> Brick10: encoder-327890720be411e8ba570671029ed6b8.qencode.com:
> /var/storage/brick/gv1
>
> Brick11: encoder-327065d20be411e899620671029ed6b8.qencode.com:
> /var/storage/brick/gv1
>
> Brick12: encoder-327570540be411e898da0671029ed6b8.qencode.com:
> /var/storage/brick/gv1
>
> Brick13: encoder-327e2a640be411e89fd40671029ed6b8.qencode.com:
> /var/storage/brick/gv1
>
> Brick14: encoder-328336080be411e8bbe70671029ed6b8.qencode.com:
> /var/storage/brick/gv1
>
> Brick15: encoder-3286494c0be411e88edb0671029ed6b8.qencode.com:
> /var/storage/brick/gv1
>
> Brick16: encoder-45c894060be411e895e00671029ed6b8.qencode.com:
> /var/storage/brick/gv1
>
> Brick17: encoder-49565b6c0be411e8b47d0671029ed6b8.qencode.com:
> /var/storage/brick/gv1
>
> Brick18: encoder-4b26e1c80be411e889ce0671029ed6b8.qencode.com:
> /var/storage/brick/gv1
>
> Brick19: encoder-4b30f8200be411e8b9770671029ed6b8.qencode.com:
> /var/storage/brick/gv1
>
> Brick20: encoder-4b3b2f160be411e886ec0671029ed6b8.qencode.com:
> /var/storage/brick/gv1
>
> Brick21: encoder-4b40827c0be411e89edd0671029ed6b8.qencode.com:
> /var/storage/brick/gv1
>
> Brick22: encoder-4b956ec20be411e8ac900671029ed6b8.qencode.com:
> /var/storage/brick/gv1
>
> Options Reconfigured:
>
> nfs.disable: on
>
> transport.address-family: inet
>
> features.shard: on
>
> cluster.min-free-disk: 10%
>
> performance.cache-max-file-size: 1048576
>
> performance.client-io-threads: on
>
>
> The test started at ~8:50 AM server time.
> Attaching the mount and rebalance logs.
>
> Looking forward to your advice!
>
> Thanks,
> Nikita
>
> 2018-02-05 14:32 GMT+02:00 Nikita Yeryomin <nikyer at gmail.com>:
>
>> Hello Nithya!
>> Thank you so much, I think we are close to building a stable storage
>> solution according to your recommendations. Here's our rebalance log -
>> please don't pay attention to the error messages after 9AM - that is when
>> we manually destroyed the volume to recreate it for further testing. Also,
>> all remove-brick operations you can see in the log were executed manually
>> when recreating the volume.
>> We are now changing our code to follow your advice and will do more
>> testing.
>>
>> Thanks,
>> Nikita
>>
>> 2018-02-05 12:20 GMT+02:00 Nithya Balachandran <nbalacha at redhat.com>:
>>
>>>
>>>
>>> On 5 February 2018 at 15:40, Nithya Balachandran <nbalacha at redhat.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> I see a lot of the following messages in the logs:
>>>> [2018-02-04 03:22:01.544446] I [glusterfsd-mgmt.c:1821:mgmt_getspec_cbk]
>>>> 0-glusterfs: No change in volfile,continuing
>>>> [2018-02-04 07:41:16.189349] W [MSGID: 109011]
>>>> [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash
>>>> (value) = 122440868
>>>> [2018-02-04 07:41:16.244261] W [fuse-bridge.c:2398:fuse_writev_cbk]
>>>> 0-glusterfs-fuse: 3615890: WRITE => -1 gfid=c73ca10f-e83e-42a9-9b0a-1de4e12c6798
>>>> fd=0x7ffa3802a5f0 (Ошибка ввода/вывода)
>>>> [2018-02-04 07:41:16.254503] W [fuse-bridge.c:1377:fuse_err_cbk]
>>>> 0-glusterfs-fuse: 3615891: FLUSH() ERR => -1 (Ошибка ввода/вывода)
>>>> The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search]
>>>> 48-gv0-dht: no subvolume for hash (value) = 122440868" repeated 81 times
>>>> between [2018-02-04 07:41:16.189349] and [2018-02-04 07:41:16.254480]
>>>> [2018-02-04 10:50:27.624283] W [MSGID: 109011]
>>>> [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash
>>>> (value) = 116958174
>>>> [2018-02-04 10:50:27.752107] W [fuse-bridge.c:2398:fuse_writev_cbk]
>>>> 0-glusterfs-fuse: 3997764: WRITE => -1 gfid=18e2adee-ff52-414f-aa37-506cff1472ee
>>>> fd=0x7ffa3801d7d0 (Ошибка ввода/вывода)
>>>> [2018-02-04 10:50:27.762331] W [fuse-bridge.c:1377:fuse_err_cbk]
>>>> 0-glusterfs-fuse: 3997765: FLUSH() ERR => -1 (Ошибка ввода/вывода)
>>>> The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search]
>>>> 48-gv0-dht: no subvolume for hash (value) = 116958174" repeated 147 times
>>>> between [2018-02-04 10:50:27.624283] and [2018-02-04 10:50:27.762292]
>>>> [2018-02-04 10:55:35.256018] W [MSGID: 109011]
>>>> [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash
>>>> (value) = 28918667
>>>> [2018-02-04 10:55:35.387073] W [fuse-bridge.c:2398:fuse_writev_cbk]
>>>> 0-glusterfs-fuse: 4006263: WRITE => -1 gfid=54e6f8ea-27d7-4e92-ae64-5e198bd3cb42
>>>> fd=0x7ffa38036bf0 (Ошибка ввода/вывода)
>>>> [2018-02-04 10:55:35.407554] W [fuse-bridge.c:1377:fuse_err_cbk]
>>>> 0-glusterfs-fuse: 4006264: FLUSH() ERR => -1 (Ошибка ввода/вывода)
>>>> [2018-02-04 10:55:59.677734] W [MSGID: 109011]
>>>> [dht-layout.c:186:dht_layout_search] 48-gv0-dht: no subvolume for hash
>>>> (value) = 69319528
>>>> [2018-02-04 10:55:59.827012] W [fuse-bridge.c:2398:fuse_writev_cbk]
>>>> 0-glusterfs-fuse: 4014645: WRITE => -1 gfid=ce700d9b-ef55-4e55-a371-9642e90555cb
>>>> fd=0x7ffa38036bf0 (Ошибка ввода/вывода)
>>>>
>>>>
>>>>
>>>> This is the reason for the I/O errors you are seeing. Gluster cannot
>>>> find the subvolume for the file in question, so it fails the write with
>>>> an I/O error. It looks like some bricks may not have been up at the time
>>>> the volume tried to get the layout.
>>>>
>>>> This is a problem, as this is a pure distributed volume. For some reason
>>>> the layout is not set on some bricks, or some bricks are unreachable.
>>>>
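>>>> If you want to verify this yourself, a quick check along these lines
>>>> should show whether all bricks are up and whether the layout xattr is
>>>> set on a given directory (the directory path inside the brick is just an
>>>> example; adjust it to one that holds the failing files):
>>>>
>>>> # confirm every brick process is online
>>>> gluster volume status gv0
>>>> # on each brick node, inspect the dht layout xattr for the directory
>>>> getfattr -n trusted.glusterfs.dht -e hex /var/storage/brick/gv0/outputs
>>>>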
>>>> There are a lot of graph changes in the logs - I would recommend
>>>> against making so many changes in such a short interval. There are no
>>>> logs for the preceding interval to find out why. Can you send me the
>>>> rebalance logs from the nodes?
>>>>
>>>
>>> To clarify, I see multiple graph changes in a few minutes. I would
>>> recommend adding/removing multiple bricks at a time when
>>> expanding/shrinking the volume instead of one at a time.
>>>
>>>>
>>>>
>>>> > In case we have too much capacity that's not needed at the moment we
>>>> > are going to remove-brick and fix-layout again in order to shrink
>>>> > storage.
>>>>
>>>>
>>>> I do see the number of bricks reducing in the graphs. Are you sure a
>>>> remove-brick has not been run? There is no need to run a fix-layout after
>>>> using "remove-brick start", as that will automatically rebalance the data.
>>>>
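>>>> For reference, the shrink sequence would be roughly the following (the
>>>> brick names are placeholders):
>>>>
>>>> # start migrating data off the bricks to be removed
>>>> gluster volume remove-brick gv0 \
>>>>     old-node1.qencode.com:/var/storage/brick/gv0 \
>>>>     old-node2.qencode.com:/var/storage/brick/gv0 start
>>>> # wait until this reports "completed" on all nodes
>>>> gluster volume remove-brick gv0 \
>>>>     old-node1.qencode.com:/var/storage/brick/gv0 \
>>>>     old-node2.qencode.com:/var/storage/brick/gv0 status
>>>> # then drop the bricks from the volume
>>>> gluster volume remove-brick gv0 \
>>>>     old-node1.qencode.com:/var/storage/brick/gv0 \
>>>>     old-node2.qencode.com:/var/storage/brick/gv0 commit
>>>>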
>>>>
>>>>
>>>> Regards,
>>>> Nithya
>>>>
>>>> On 5 February 2018 at 14:06, Nikita Yeryomin <nikyer at gmail.com> wrote:
>>>>
>>>>> I've attached the log. There are some errors in it like:
>>>>>
>>>>> [2018-02-04 18:50:41.112962] E [fuse-bridge.c:903:fuse_getattr_resume]
>>>>> 0-glusterfs-fuse: 9613852: GETATTR 140712792330896
>>>>> (7d39d329-c0e0-4997-85e6-0e66e0436315) resolution failed
>>>>>
>>>>> But when it occurs it does not seem to affect current file I/O
>>>>> operations.
>>>>> I already re-created the volume yesterday and was not able to reproduce
>>>>> the error during file download after that, but there are still errors
>>>>> like the above in the logs and the system seems a bit unstable.
>>>>> Let me share some more details on how we are trying to use glusterfs.
>>>>> So it's a distributed, NOT replicated volume with sharding enabled.
>>>>> We have many small servers (20GB each) in a cloud and need to work with
>>>>> rather large files (~300GB).
>>>>> We start the volume with one 15GB brick, which is a separate XFS
>>>>> partition on each server, and then add bricks one by one to reach the
>>>>> needed capacity. After each brick is added we do a rebalance fix-layout.
>>>>> In case we have more capacity than is needed at the moment, we are going
>>>>> to remove-brick and fix-layout again in order to shrink the storage. But
>>>>> we have not yet been able to test removing bricks, as the system does
>>>>> not behave stably after scaling out.
>>>>>
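>>>>> In practice the sequence we run is roughly the following (server names
>>>>> are placeholders):
>>>>>
>>>>> # create and start the volume with the first brick
>>>>> gluster volume create gv0 server1.qencode.com:/var/storage/brick/gv0
>>>>> gluster volume set gv0 features.shard on
>>>>> gluster volume start gv0
>>>>> # each time a new server is added:
>>>>> gluster volume add-brick gv0 serverN.qencode.com:/var/storage/brick/gv0
>>>>> gluster volume rebalance gv0 fix-layout start
>>>>>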
>>>>> Judging by what I've found here
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=875076 - it seems starting
>>>>> with one brick is not a good idea, so we are going to try starting with
>>>>> 2 bricks.
>>>>> Please let me know if there is anything else we should consider changing
>>>>> in our strategy.
>>>>>
>>>>> Many thanks in advance!
>>>>> Nikita Yeryomin
>>>>>
>>>>> 2018-02-05 7:53 GMT+02:00 Nithya Balachandran <nbalacha at redhat.com>:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Please provide the log for the mount process from the node on which
>>>>>> you have mounted the volume. This should be in /var/log/glusterfs, and
>>>>>> the name of the file will be the hyphenated path of the mount point.
>>>>>> For example, if the volume is mounted at /mnt/glustervol, the log file
>>>>>> will be /var/log/glusterfs/mnt-glustervol.log
>>>>>>
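>>>>>> For example, on the client node (the mount point here is just an
>>>>>> illustration):
>>>>>>
>>>>>> # list the glusterfs fuse mounts on this node
>>>>>> mount -t fuse.glusterfs
>>>>>> # the matching client log, named after the hyphenated mount path
>>>>>> ls -l /var/log/glusterfs/mnt-glustervol.log
>>>>>>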
>>>>>> Regards,
>>>>>> Nithya
>>>>>>
>>>>>> On 4 February 2018 at 21:09, Nikita Yeryomin <nikyer at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Please help troubleshoot glusterfs with the following setup:
>>>>>>> a distributed volume without replication, with sharding enabled.
>>>>>>>
>>>>>>> # cat /etc/centos-release
>>>>>>>
>>>>>>> CentOS release 6.9 (Final)
>>>>>>>
>>>>>>> # glusterfs --version
>>>>>>>
>>>>>>> glusterfs 3.12.3
>>>>>>>
>>>>>>> [root at master-5f81bad0054a11e8bf7d0671029ed6b8 uploads]# gluster
>>>>>>> volume info
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Volume Name: gv0
>>>>>>>
>>>>>>> Type: Distribute
>>>>>>>
>>>>>>> Volume ID: 1a7e05f6-4aa8-48d3-b8e3-300637031925
>>>>>>>
>>>>>>> Status: Started
>>>>>>>
>>>>>>> Snapshot Count: 0
>>>>>>>
>>>>>>> Number of Bricks: 27
>>>>>>>
>>>>>>> Transport-type: tcp
>>>>>>>
>>>>>>> Bricks:
>>>>>>>
>>>>>>> Brick1: gluster3.qencode.com:/var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick2: encoder-376cac0405f311e884700671029ed6b8.qencode.com:/var/st
>>>>>>> orage/brick/gv0
>>>>>>>
>>>>>>> Brick3: encoder-ee6761c0091c11e891ba0671029ed6b8.qencode.com:/var/st
>>>>>>> orage/brick/gv0
>>>>>>>
>>>>>>> Brick4: encoder-ee68b8ea091c11e89c2d0671029ed6b8.qencode.com:/var/st
>>>>>>> orage/brick/gv0
>>>>>>>
>>>>>>> Brick5: encoder-ee663700091c11e8b48f0671029ed6b8.qencode.com:/var/st
>>>>>>> orage/brick/gv0
>>>>>>>
>>>>>>> Brick6: encoder-efcf113e091c11e899520671029ed6b8.qencode.com:/var/st
>>>>>>> orage/brick/gv0
>>>>>>>
>>>>>>> Brick7: encoder-efcd5a24091c11e8963a0671029ed6b8.qencode.com:/var/st
>>>>>>> orage/brick/gv0
>>>>>>>
>>>>>>> Brick8: encoder-099f557e091d11e882f70671029ed6b8.qencode.com:/var/st
>>>>>>> orage/brick/gv0
>>>>>>>
>>>>>>> Brick9: encoder-099bdda4091d11e881090671029ed6b8.qencode.com:/var/st
>>>>>>> orage/brick/gv0
>>>>>>>
>>>>>>> Brick10: encoder-099dca56091d11e8b3410671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick11: encoder-09a1ba4e091d11e8a3c20671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick12: encoder-099a826a091d11e895940671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick13: encoder-0998aa8a091d11e8a8160671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick14: encoder-0b582724091d11e8b3b40671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick15: encoder-0dff527c091d11e896f20671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick16: encoder-0e0d5c14091d11e886cf0671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick17: encoder-7f1bf3d4093b11e8a3580671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick18: encoder-7f70378c093b11e885260671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick19: encoder-7f19528c093b11e88f100671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick20: encoder-7f76c048093b11e8a7470671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick21: encoder-7f7fc90e093b11e8a74e0671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick22: encoder-7f6bc382093b11e8b8a30671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick23: encoder-7f7b44d8093b11e8906f0671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick24: encoder-7f72aa30093b11e89a8e0671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick25: encoder-7f7d735c093b11e8b4650671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick26: encoder-7f1a5006093b11e89bcb0671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Brick27: encoder-95791076093b11e8af170671029ed6b8.qencode.com:
>>>>>>> /var/storage/brick/gv0
>>>>>>>
>>>>>>> Options Reconfigured:
>>>>>>>
>>>>>>> cluster.min-free-disk: 10%
>>>>>>>
>>>>>>> performance.cache-max-file-size: 1048576
>>>>>>>
>>>>>>> nfs.disable: on
>>>>>>>
>>>>>>> transport.address-family: inet
>>>>>>>
>>>>>>> features.shard: on
>>>>>>>
>>>>>>> performance.client-io-threads: on
>>>>>>>
>>>>>>> Each brick is 15Gb in size.
>>>>>>>
>>>>>>> After using the volume for several hours with intensive read/write
>>>>>>> operations (~300GB written and then deleted), an attempt to write to
>>>>>>> the volume results in an Input/output error:
>>>>>>>
>>>>>>> # wget https://speed.hetzner.de/1GB.bin
>>>>>>>
>>>>>>> --2018-02-04 12:02:34-- https://speed.hetzner.de/1GB.bin
>>>>>>>
>>>>>>> Resolving speed.hetzner.de... 88.198.248.254, 2a01:4f8:0:59ed::2
>>>>>>>
>>>>>>> Connecting to speed.hetzner.de|88.198.248.254|:443... connected.
>>>>>>>
>>>>>>> HTTP request sent, awaiting response... 200 OK
>>>>>>>
>>>>>>> Length: 1048576000 (1000M) [application/octet-stream]
>>>>>>>
>>>>>>> Saving to: `1GB.bin'
>>>>>>>
>>>>>>>
>>>>>>> 38% [=============================================================>     ] 403,619,518 27.8M/s  in 15s
>>>>>>>
>>>>>>> Cannot write to `1GB.bin' (Input/output error).
>>>>>>>
>>>>>>> I don't see anything written to glusterd.log, or to any other logs in
>>>>>>> /var/log/glusterfs/*, when this error occurs.
>>>>>>>
>>>>>>> Deleting the partially downloaded file works without error.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Nikita Yeryomin
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>