[Gluster-users] Self-heal Problems with gluster and nfs

Pranith Kumar Karampuri pkarampu at redhat.com
Tue Jul 8 11:02:23 UTC 2014


On 07/08/2014 04:23 PM, Norman Mähler wrote:
>
> Of course:
>
> The configuration is:
>
> Volume Name: gluster_dateisystem
> Type: Replicate
> Volume ID: 2766695c-b8aa-46fd-b84d-4793b7ce847a
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: filecluster1:/mnt/raid
> Brick2: filecluster2:/mnt/raid
> Options Reconfigured:
> nfs.enable-ino32: on
> performance.cache-size: 512MB
> diagnostics.brick-log-level: WARNING
> diagnostics.client-log-level: WARNING
> nfs.addr-namelookup: off
> performance.cache-refresh-timeout: 60
> performance.cache-max-file-size: 100MB
> performance.write-behind-window-size: 10MB
> performance.io-thread-count: 18
> performance.stat-prefetch: off
>
>
> The file count in xattrop is
Do "gluster volume set gluster_dateisystem cluster.self-heal-daemon off"
This should stop all the entry self-heals and should also get the CPU 
usage low. When you don't have a lot of activity you can enable it again 
using "gluster volume set gluster_dateisystem cluster.self-heal-daemon on"
If it doesn't get the CPU down execute "gluster volume set 
gluster_dateisystem cluster.entry-self-heal off". Let me know how it goes.
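Putting it together, the sequence would be something like this (volume name
taken from your configuration above):

  # temporarily stop the self-heal daemon to relieve the CPU
  gluster volume set gluster_dateisystem cluster.self-heal-daemon off

  # re-enable it later, during a quiet period
  gluster volume set gluster_dateisystem cluster.self-heal-daemon on

  # only if the CPU usage stays high, additionally turn off entry self-heal
  gluster volume set gluster_dateisystem cluster.entry-self-heal off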

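And if it helps, a rough sketch of the two checks discussed in this thread.
The brick path is taken from your volume info; the find expression for
locating stale gfid links is only an assumption based on your link-count
observation, not an official gluster tool:

  # pending self-heal entries per brick (run on each server)
  ls /mnt/raid/.glusterfs/indices/xattrop | wc -l

  # gfid hard links with a link count of 1, i.e. entries whose real file
  # is already gone (assumes the usual .glusterfs/XX/YY/<gfid> layout)
  find /mnt/raid/.glusterfs/[0-9a-f][0-9a-f] -type f -links 1
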
Pranith
>
> Brick 1: 2706
> Brick 2: 2687
>
> Norman
>
> On 08.07.2014 12:28, Pranith Kumar Karampuri wrote:
>> It seems like entry self-heal is happening. What is the volume
>> configuration? Could you give the output of
>> "ls <brick-path>/.glusterfs/indices/xattrop | wc -l" (the count)
>> for all the bricks?
>>
>> Pranith
>>
>> On 07/08/2014 03:36 PM, Norman Mähler wrote:
>>> Hello Pranith,
>>>
>>> here are the logs. I am only giving you the last 3000 lines, because
>>> the nfs.log from today is already 550 MB.
>>>
>>> There are the standard files from a user home on the gluster
>>> system. All you normally find in a user home. Config files,
>>> firefox and thunderbird files etc.
>>>
>>> Thanks in advance Norman
>>>
>>> On 08.07.2014 11:46, Pranith Kumar Karampuri wrote:
>>>> On 07/08/2014 02:46 PM, Norman Mähler wrote:
>>>> Hello again,
>>>>
>>>> I could resolve the self-heal problems with the missing gfid
>>>> files on one of the servers by deleting the corresponding gfid
>>>> files on the other server.
>>>>
>>>> They had a link count of 1, which means that the file the
>>>> gfid pointed to had already been deleted.
>>>>
>>>>
>>>> We have still these errors
>>>>
>>>> [2014-07-08 09:09:43.564488] W
>>>> [client-rpc-fops.c:2469:client3_3_link_cbk]
>>>> 0-gluster_dateisystem-client-0: remote operation failed: File
>>>> exists (00000000-0000-0000-0000-000000000000 ->
>>>> <gfid:b338b09e-2577-45b3-82bd-032f954dd083>/lock)
>>>>
>>>> which appear in the glusterfshd.log and these
>>>>
>>>> [2014-07-08 09:13:31.198462] E
>>>> [client-rpc-fops.c:5179:client3_3_inodelk]
>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(+0x466b8) [0x7f5d29d4e6b8]
>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(afr_lock_blocking+0x844) [0x7f5d29d4e2e4]
>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/protocol/client.so(client_inodelk+0x99) [0x7f5d29f8b3c9])))
>>>> 0-: Assertion failed: 0
>>>>
>>>> from the nfs.log.
>>>>> Could you attach mount (nfs.log) and brick logs please. Do
>>>>> you have files with lots of hard-links? Pranith
>>>> I think the error messages belong together but I don't have any
>>>> idea how to solve them.
>>>>
>>>> We still have a very bad performance issue. The system load
>>>> on the servers is above 20 and hardly anyone is able to work
>>>> on a client here...
>>>>
>>>> Hoping for your help, Norman
>>>>
>>>>
>>>> On 07.07.2014 15:39, Pranith Kumar Karampuri wrote:
>>>>>>> On 07/07/2014 06:58 PM, Norman Mähler wrote:
>>>>>>> Dear community,
>>>>>>>
>>>>>>> we have got some serious problems with our Gluster
>>>>>>> installation.
>>>>>>>
>>>>>>> Here is the setting:
>>>>>>>
>>>>>>> We have got 2 bricks (version 3.4.4) on Debian 7.5, one
>>>>>>> of them with an nfs export. There are about 120 clients
>>>>>>> connecting to the exported nfs. These clients are thin
>>>>>>> clients reading and writing their Linux home directories
>>>>>>> from the exported nfs.
>>>>>>>
>>>>>>> We want to change the access of these clients, one by
>>>>>>> one, to access via the gluster client.
>>>>>>>> I did not understand what you meant by this. Are you
>>>>>>>> moving to glusterfs-fuse based mounts?
>>>>>>> Here are our problems:
>>>>>>>
>>>>>>> At the moment we have got two types of error messages
>>>>>>> which come in bursts to our glusterfshd.log
>>>>>>>
>>>>>>> [2014-07-07 13:10:21.572487] W
>>>>>>> [client-rpc-fops.c:1538:client3_3_inodelk_cbk]
>>>>>>> 0-gluster_dateisystem-client-1: remote operation failed:
>>>>>>> No such file or directory [2014-07-07 13:10:21.573448] W
>>>>>>> [client-rpc-fops.c:471:client3_3_open_cbk]
>>>>>>> 0-gluster_dateisystem-client-1: remote operation failed:
>>>>>>> No such file or directory. Path:
>>>>>>> <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc>
>>>>>>> (00000000-0000-0000-0000-000000000000) [2014-07-07
>>>>>>> 13:10:21.573468] E
>>>>>>> [afr-self-heal-data.c:1270:afr_sh_data_open_cbk]
>>>>>>> 0-gluster_dateisystem-replicate-0: open of
>>>>>>> <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc> failed on
>>>>>>> child gluster_dateisystem-client-1 (No such file or
>>>>>>> directory)
>>>>>>>
>>>>>>>
>>>>>>> This looks like a missing gfid file on one of the bricks.
>>>>>>> I looked it up and yes the file is missing on the second
>>>>>>> brick.
>>>>>>>
>>>>>>> We got these messages the other way round, too (missing
>>>>>>> on client-0 and the first brick).
>>>>>>>
>>>>>>> Is it possible to repair this one by copying the gfid
>>>>>>> file to the brick where it was missing? Or is there
>>>>>>> another way to repair it?
>>>>>>>
>>>>>>>
>>>>>>> The second message is
>>>>>>>
>>>>>>> [2014-07-07 13:06:35.948738] W
>>>>>>> [client-rpc-fops.c:2469:client3_3_link_cbk]
>>>>>>> 0-gluster_dateisystem-client-1: remote operation failed:
>>>>>>> File exists (00000000-0000-0000-0000-000000000000 ->
>>>>>>> <gfid:aae47250-8f69-480c-ac75-2da2f4d21d7a>/lock)
>>>>>>>
>>>>>>> and I really do not know what to do with this one...
>>>>>>>> Did any of the bricks go offline and come back
>>>>>>>> online? Pranith
>>>>>>> I am really looking forward to your help because this is
>>>>>>> an active system and the system load on the nfs brick is
>>>>>>> about 25 (!!)
>>>>>>>
>>>>>>> Thanks in advance! Norman Maehler
>>>>>>>
>>>>>>>
> -- 
> Best regards,
>
> Norman Mähler
>
> Division Head, IT-Hochschulservice
> uni-assist e. V.
> Geneststr. 5
> Entrance H, 3rd floor
> 10829 Berlin
>
> Tel.: 030-66644382
> n.maehler at uni-assist.de



