[Gluster-users] Self-heal Problems with gluster and nfs

Pranith Kumar Karampuri pkarampu at redhat.com
Tue Jul 8 11:24:32 UTC 2014


On 07/08/2014 04:49 PM, Norman Mähler wrote:
> On 08.07.2014 13:02, Pranith Kumar Karampuri wrote:
>> On 07/08/2014 04:23 PM, Norman Mähler wrote:
>> Of course:
>>
>> The configuration is:
>>
>> Volume Name: gluster_dateisystem
>> Type: Replicate
>> Volume ID: 2766695c-b8aa-46fd-b84d-4793b7ce847a
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: filecluster1:/mnt/raid
>> Brick2: filecluster2:/mnt/raid
>> Options Reconfigured:
>> nfs.enable-ino32: on
>> performance.cache-size: 512MB
>> diagnostics.brick-log-level: WARNING
>> diagnostics.client-log-level: WARNING
>> nfs.addr-namelookup: off
>> performance.cache-refresh-timeout: 60
>> performance.cache-max-file-size: 100MB
>> performance.write-behind-window-size: 10MB
>> performance.io-thread-count: 18
>> performance.stat-prefetch: off
>>
>>
>> The file count in xattrop is
>>> Do "gluster volume set gluster_dateisystem
>>> cluster.self-heal-daemon off". This should stop all the entry
>>> self-heals and should also bring the CPU usage down. When you
>>> don't have a lot of activity you can enable it again using
>>> "gluster volume set gluster_dateisystem
>>> cluster.self-heal-daemon on". If that doesn't bring the CPU
>>> usage down, execute "gluster volume set gluster_dateisystem
>>> cluster.entry-self-heal off". Let me know how it goes.
>>> Pranith
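[Editor's note: the commands quoted above, collected into one place. The volume name is taken from the configuration in this thread; these are to be run on one of the gluster servers, and the snippet is a sketch of the suggested sequence, not an official procedure.]

```shell
# Temporarily stop the self-heal daemon to reduce CPU load:
gluster volume set gluster_dateisystem cluster.self-heal-daemon off

# If the load is still high, additionally disable entry self-heal:
gluster volume set gluster_dateisystem cluster.entry-self-heal off

# Later, when there is little activity, re-enable both:
gluster volume set gluster_dateisystem cluster.entry-self-heal on
gluster volume set gluster_dateisystem cluster.self-heal-daemon on
```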
> Thanks for your help so far, but stopping the self-heal daemon and
> the self-heal mechanism itself did not improve the situation.
>
> Do you have further suggestions?
> Is it simply the load on the system? NFS could handle it easily before...
Is it at least a little better or no improvement at all?

Pranith
>
> Norman
>
>> Brick 1: 2706
>> Brick 2: 2687
>>
>> Norman
>>
>> On 08.07.2014 12:28, Pranith Kumar Karampuri wrote:
>>>>> It seems like entry self-heal is happening. What is the
>>>>> volume configuration? Could you give the output of
>>>>> "ls <brick-path>/.glusterfs/indices/xattrop | wc -l"
>>>>> for all the bricks?
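[Editor's note: the count Pranith asks for can be wrapped in a tiny helper. A minimal sketch; on the real servers it would be run against the brick path /mnt/raid from the volume info, while the throwaway demo directory below is purely hypothetical.]

```shell
# Count pending self-heal index entries for a brick
# ("ls <brick-path>/.glusterfs/indices/xattrop | wc -l").
count_xattrop() {
    ls "$1/.glusterfs/indices/xattrop" | wc -l
}

# On the real servers, run on each node: count_xattrop /mnt/raid
# Demonstration against a throwaway directory standing in for a brick:
demo=$(mktemp -d)
mkdir -p "$demo/.glusterfs/indices/xattrop"
touch "$demo/.glusterfs/indices/xattrop/gfid-1" \
      "$demo/.glusterfs/indices/xattrop/gfid-2"
count_xattrop "$demo"    # prints 2
```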
>>>>>
>>>>> Pranith
>>>>> On 07/08/2014 03:36 PM, Norman Mähler wrote:
>>>>>> Hello Pranith,
>>>>>>
>>>>>> here are the logs. I am only giving you the last 3000 lines,
>>>>>> because the nfs.log from today is already 550 MB.
>>>>>>
>>>>>> These are the standard files from a user home on the
>>>>>> gluster system, everything you normally find in a user
>>>>>> home: config files, Firefox and Thunderbird files, etc.
>>>>>>
>>>>>> Thanks in advance Norman
>>>>>>
>>>>>> On 08.07.2014 11:46, Pranith Kumar Karampuri wrote:
>>>>>>> On 07/08/2014 02:46 PM, Norman Mähler wrote:
>>>>>>> Hello again,
>>>>>>>
>>>>>>> I could resolve the self-heal problems with the missing
>>>>>>> gfid files on one of the servers by deleting the gfid
>>>>>>> files on the other server.
>>>>>>>
>>>>>>> They had a link count of 1, which means that the file the
>>>>>>> gfid pointed to had already been deleted.
>>>>>>>
>>>>>>>
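[Editor's note: the link-count check Norman describes can be scripted. A minimal sketch, not an official gluster tool, assuming the usual brick layout in which each regular file's gfid entry under <brick>/.glusterfs/XX/YY is a hard link to the real file, so a link count of 1 marks an entry whose data file is already gone. The demo tree below is hypothetical; on a live brick, inspect the hits before deleting anything.]

```shell
# List gfid entries whose link count is 1, i.e. whose data file on
# the brick has already been deleted.
find_stale_gfids() {
    find "$1"/.glusterfs/[0-9a-f][0-9a-f] -type f -links 1 -print 2>/dev/null
}

# On the real servers: find_stale_gfids /mnt/raid
# Demonstration with a throwaway tree standing in for a brick:
brick=$(mktemp -d)
mkdir -p "$brick/.glusterfs/ab/cd"
echo data > "$brick/somefile"
ln "$brick/somefile" "$brick/.glusterfs/ab/cd/gfid-live"   # link count 2
echo gone > "$brick/.glusterfs/ab/cd/gfid-stale"           # link count 1
find_stale_gfids "$brick"    # prints only the gfid-stale path
```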
>>>>>>> We still have these errors
>>>>>>>
>>>>>>> [2014-07-08 09:09:43.564488] W
>>>>>>> [client-rpc-fops.c:2469:client3_3_link_cbk]
>>>>>>> 0-gluster_dateisystem-client-0: remote operation failed:
>>>>>>> File exists (00000000-0000-0000-0000-000000000000 ->
>>>>>>> <gfid:b338b09e-2577-45b3-82bd-032f954dd083>/lock)
>>>>>>>
>>>>>>> which appear in the glusterfshd.log and these
>>>>>>>
>>>>>>> [2014-07-08 09:13:31.198462] E
>>>>>>> [client-rpc-fops.c:5179:client3_3_inodelk]
>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(+0x466b8)
>>>>>>> [0x7f5d29d4e6b8]
>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(afr_lock_blocking+0x844)
>>>>>>> [0x7f5d29d4e2e4]
>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/protocol/client.so(client_inodelk+0x99)
>>>>>>> [0x7f5d29f8b3c9]))) 0-: Assertion failed: 0
>>>>>>> from the nfs.log.
>>>>>>>> Could you attach the mount (nfs.log) and brick logs, please?
>>>>>>>> Do you have files with lots of hard-links? Pranith
>>>>>>> I think the error messages belong together but I don't
>>>>>>> have any idea how to solve them.
>>>>>>>
>>>>>>> We still have a very bad performance issue. The
>>>>>>> system load on the servers is above 20, and almost no
>>>>>>> one is able to work on a client...
>>>>>>>
>>>>>>> Hoping for help, Norman
>>>>>>>
>>>>>>>
>>>>>>> On 07.07.2014 15:39, Pranith Kumar Karampuri wrote:
>>>>>>>>>> On 07/07/2014 06:58 PM, Norman Mähler wrote:
>>>>>>>>>> Dear community,
>>>>>>>>>>
>>>>>>>>>> we have got some serious problems with our Gluster
>>>>>>>>>> installation.
>>>>>>>>>>
>>>>>>>>>> Here is the setting:
>>>>>>>>>>
>>>>>>>>>> We have got 2 bricks (version 3.4.4) on Debian
>>>>>>>>>> 7.5, one of them with an NFS export. There are
>>>>>>>>>> about 120 clients connecting to the exported NFS.
>>>>>>>>>> These clients are thin clients reading and writing
>>>>>>>>>> their Linux home directories from the exported
>>>>>>>>>> NFS.
>>>>>>>>>>
>>>>>>>>>> We want to switch these clients one by one to
>>>>>>>>>> access via the gluster client.
>>>>>>>>>>> I did not understand what you meant by this. Are
>>>>>>>>>>> you moving to glusterfs-fuse based mounts?
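[Editor's note: if the move is indeed to glusterfs-fuse mounts, a client-side mount might look like the sketch below. The server and volume names are taken from this thread; the mount point /mnt/home and the fstab options are hypothetical.]

```shell
# Hypothetical fuse mount of the volume on a thin client:
mount -t glusterfs filecluster1:/gluster_dateisystem /mnt/home

# Equivalent /etc/fstab entry:
# filecluster1:/gluster_dateisystem  /mnt/home  glusterfs  defaults,_netdev  0 0
```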
>>>>>>>>>> Here are our problems:
>>>>>>>>>>
>>>>>>>>>> At the moment we have got two types of error
>>>>>>>>>> messages, which come in bursts, in our
>>>>>>>>>> glusterfshd.log:
>>>>>>>>>>
>>>>>>>>>> [2014-07-07 13:10:21.572487] W
>>>>>>>>>> [client-rpc-fops.c:1538:client3_3_inodelk_cbk]
>>>>>>>>>> 0-gluster_dateisystem-client-1: remote operation
>>>>>>>>>> failed: No such file or directory
>>>>>>>>>> [2014-07-07 13:10:21.573448] W
>>>>>>>>>> [client-rpc-fops.c:471:client3_3_open_cbk]
>>>>>>>>>> 0-gluster_dateisystem-client-1: remote operation
>>>>>>>>>> failed: No such file or directory. Path:
>>>>>>>>>> <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc>
>>>>>>>>>> (00000000-0000-0000-0000-000000000000)
>>>>>>>>>> [2014-07-07 13:10:21.573468] E
>>>>>>>>>> [afr-self-heal-data.c:1270:afr_sh_data_open_cbk]
>>>>>>>>>> 0-gluster_dateisystem-replicate-0: open of
>>>>>>>>>> <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc> failed
>>>>>>>>>> on child gluster_dateisystem-client-1 (No such file
>>>>>>>>>> or directory)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This looks like a missing gfid file on one of the
>>>>>>>>>> bricks. I looked it up and yes, the file is missing
>>>>>>>>>> on the second brick.
>>>>>>>>>>
>>>>>>>>>> We got these messages the other way round, too
>>>>>>>>>> (missing on client-0 and the first brick).
>>>>>>>>>>
>>>>>>>>>> Is it possible to repair this one by copying the
>>>>>>>>>> gfid file to the brick where it is missing? Or is
>>>>>>>>>> there another way to repair it?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The second message is
>>>>>>>>>>
>>>>>>>>>> [2014-07-07 13:06:35.948738] W
>>>>>>>>>> [client-rpc-fops.c:2469:client3_3_link_cbk]
>>>>>>>>>> 0-gluster_dateisystem-client-1: remote operation
>>>>>>>>>> failed: File exists
>>>>>>>>>> (00000000-0000-0000-0000-000000000000 ->
>>>>>>>>>> <gfid:aae47250-8f69-480c-ac75-2da2f4d21d7a>/lock)
>>>>>>>>>>
>>>>>>>>>> and I really do not know what to do with this
>>>>>>>>>> one...
>>>>>>>>>>> Did any of the bricks go offline and come back
>>>>>>>>>>> online? Pranith
>>>>>>>>>> I am really looking forward to your help, because
>>>>>>>>>> this is an active system and the system load on the
>>>>>>>>>> NFS brick is about 25 (!!)
>>>>>>>>>>
>>>>>>>>>> Thanks in advance! Norman Maehler
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> --
> Kind regards,
>
> Norman Mähler
>
> Head of IT University Services
> uni-assist e. V.
> Geneststr. 5
> Aufgang H, 3. Etage
> 10829 Berlin
>
> Tel.: 030-66644382
> n.maehler at uni-assist.de



