[Gluster-users] Self-heal Problems with gluster and nfs

Norman Mähler n.maehler at uni-assist.de
Tue Jul 8 11:19:57 UTC 2014





On 08.07.2014 13:02, Pranith Kumar Karampuri wrote:
> 
> On 07/08/2014 04:23 PM, Norman Mähler wrote:
> 
> Of course:
> 
> The configuration is:
> 
> Volume Name: gluster_dateisystem
> Type: Replicate
> Volume ID: 2766695c-b8aa-46fd-b84d-4793b7ce847a
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: filecluster1:/mnt/raid
> Brick2: filecluster2:/mnt/raid
> Options Reconfigured:
> nfs.enable-ino32: on
> performance.cache-size: 512MB
> diagnostics.brick-log-level: WARNING
> diagnostics.client-log-level: WARNING
> nfs.addr-namelookup: off
> performance.cache-refresh-timeout: 60
> performance.cache-max-file-size: 100MB
> performance.write-behind-window-size: 10MB
> performance.io-thread-count: 18
> performance.stat-prefetch: off
> 
> 
> The file count in xattrop is
>> Do "gluster volume set gluster_dateisystem
>> cluster.self-heal-daemon off". This should stop all the entry
>> self-heals and should also bring the CPU usage down. When you don't
>> have a lot of activity, you can enable it again using "gluster
>> volume set gluster_dateisystem cluster.self-heal-daemon on". If it
>> doesn't bring the CPU down, execute "gluster volume set
>> gluster_dateisystem cluster.entry-self-heal off". Let me know how
>> it goes.
> 
>> Pranith
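For reference, the steps Pranith suggests can be sketched as shell commands (the volume name is taken from the configuration above; run on either server):

```shell
# Temporarily stop the self-heal daemon to reduce CPU load
gluster volume set gluster_dateisystem cluster.self-heal-daemon off

# Re-enable it later, during a period of low activity
gluster volume set gluster_dateisystem cluster.self-heal-daemon on

# If CPU usage stays high, also disable entry self-heal
gluster volume set gluster_dateisystem cluster.entry-self-heal off
```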

Thanks for your help so far, but stopping the self-heal daemon and the
self-heal mechanism itself did not improve the situation.

Do you have further suggestions?
Is it simply the load on the system? NFS could handle it easily before...

Norman

> 
> Brick 1: 2706
> Brick 2: 2687
> 
> Norman
> 
> On 08.07.2014 12:28, Pranith Kumar Karampuri wrote:
>>>> It seems like entry self-heal is happening. What is the
>>>> volume configuration? Could you give the output of
>>>> "ls <brick-path>/.glusterfs/indices/xattrop | wc -l" for all
>>>> the bricks?
>>>> 
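The count Pranith asks for can be gathered per brick; a minimal sketch, assuming the bricks live at /mnt/raid as shown in the volume info:

```shell
# Count pending self-heal entries on a brick: each file under
# indices/xattrop names a gfid with unsynced changes, so the
# count approximates the self-heal backlog on that brick
BRICK=/mnt/raid   # brick path (assumption, from "gluster volume info")
ls "$BRICK/.glusterfs/indices/xattrop" | wc -l
```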
>>>> Pranith
>>>> 
>>>> On 07/08/2014 03:36 PM, Norman Mähler wrote:
>>>>> Hello Pranith,
>>>>> 
>>>>> here are the logs. I am only giving you the last 3000 lines,
>>>>> because today's nfs.log is already 550 MB.
>>>>> 
>>>>> These are the standard files from a user home on the
>>>>> gluster system, everything you normally find in a user home:
>>>>> config files, Firefox and Thunderbird files, etc.
>>>>> 
>>>>> Thanks in advance,
>>>>> Norman
>>>>> 
>>>>> On 08.07.2014 11:46, Pranith Kumar Karampuri wrote:
>>>>>> On 07/08/2014 02:46 PM, Norman Mähler wrote:
>>>>>> 
>>>>>> Hello again,
>>>>>> 
>>>>>> I could resolve the self-heal problems with the missing
>>>>>> gfid files on one of the servers by deleting the gfid
>>>>>> files on the other server.
>>>>>> 
>>>>>> They had a link count of 1, which means that the file the
>>>>>> gfid pointed to had already been deleted.
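The link-count check described above can be scripted. A sketch, with the brick path and the use of `find -links 1` as assumptions (the indices directory is pruned because its entries are not hard links to data files):

```shell
# List gfid files under .glusterfs whose hard-link count is 1:
# the user-visible file they once pointed to has been deleted
BRICK=/mnt/raid   # brick path (assumption)
if [ -d "$BRICK/.glusterfs" ]; then
  find "$BRICK/.glusterfs" -path "$BRICK/.glusterfs/indices" -prune -o \
       -type f -links 1 -print
fi
```

Review the output before deleting anything; each listed path is a candidate orphaned gfid file.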
>>>>>> 
>>>>>> 
>>>>>> We still have these errors:
>>>>>> 
>>>>>> [2014-07-08 09:09:43.564488] W 
>>>>>> [client-rpc-fops.c:2469:client3_3_link_cbk] 
>>>>>> 0-gluster_dateisystem-client-0: remote operation failed:
>>>>>> File exists (00000000-0000-0000-0000-000000000000 -> 
>>>>>> <gfid:b338b09e-2577-45b3-82bd-032f954dd083>/lock)
>>>>>> 
>>>>>> which appear in the glusterfshd.log and these
>>>>>> 
>>>>>> [2014-07-08 09:13:31.198462] E
>>>>>> [client-rpc-fops.c:5179:client3_3_inodelk]
>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(+0x466b8)
>>>>>> [0x7f5d29d4e6b8]
>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(afr_lock_blocking+0x844)
>>>>>> [0x7f5d29d4e2e4]
>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/protocol/client.so(client_inodelk+0x99)
>>>>>> [0x7f5d29f8b3c9]))) 0-: Assertion failed: 0
>>>>>> from the nfs.log.
>>>>>>> Could you attach mount (nfs.log) and brick logs please.
>>>>>>> Do you have files with lots of hard-links?
>>>>>>> Pranith
>>>>>> I think the error messages belong together but I don't
>>>>>> have any idea how to solve them.
>>>>>> 
>>>>>> We still have a very bad performance problem. The
>>>>>> system load on the servers is above 20 and hardly anyone
>>>>>> is able to work on a client...
>>>>>> 
>>>>>> Hoping for help,
>>>>>> Norman
>>>>>> 
>>>>>> 
>>>>>> On 07.07.2014 15:39, Pranith Kumar Karampuri wrote:
>>>>>>>>> On 07/07/2014 06:58 PM, Norman Mähler wrote:
>>>>>>>>> 
>>>>>>>>> Dear community,
>>>>>>>>> 
>>>>>>>>> we have got some serious problems with our Gluster 
>>>>>>>>> installation.
>>>>>>>>> 
>>>>>>>>> Here is the setting:
>>>>>>>>> 
>>>>>>>>> We have got 2 bricks (version 3.4.4) on Debian
>>>>>>>>> 7.5, one of them with an NFS export. There are
>>>>>>>>> about 120 clients connecting to the exported NFS.
>>>>>>>>> These clients are thin clients reading and writing
>>>>>>>>> their Linux home directories from the exported
>>>>>>>>> NFS.
>>>>>>>>> 
>>>>>>>>> We want to change the access of these clients one
>>>>>>>>> by one to access via gluster client.
>>>>>>>>>> I did not understand what you meant by this. Are
>>>>>>>>>> you moving to glusterfs-fuse based mounts?
>>>>>>>>> Here are our problems:
>>>>>>>>> 
>>>>>>>>> At the moment we have got two types of error
>>>>>>>>> messages which come in bursts to our
>>>>>>>>> glusterfshd.log
>>>>>>>>> 
>>>>>>>>> [2014-07-07 13:10:21.572487] W 
>>>>>>>>> [client-rpc-fops.c:1538:client3_3_inodelk_cbk] 
>>>>>>>>> 0-gluster_dateisystem-client-1: remote operation
>>>>>>>>> failed: No such file or directory [2014-07-07
>>>>>>>>> 13:10:21.573448] W 
>>>>>>>>> [client-rpc-fops.c:471:client3_3_open_cbk] 
>>>>>>>>> 0-gluster_dateisystem-client-1: remote operation
>>>>>>>>> failed: No such file or directory. Path: 
>>>>>>>>> <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc> 
>>>>>>>>> (00000000-0000-0000-0000-000000000000) [2014-07-07 
>>>>>>>>> 13:10:21.573468] E 
>>>>>>>>> [afr-self-heal-data.c:1270:afr_sh_data_open_cbk] 
>>>>>>>>> 0-gluster_dateisystem-replicate-0: open of 
>>>>>>>>> <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc> failed
>>>>>>>>> on child gluster_dateisystem-client-1 (No such file
>>>>>>>>> or directory)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> This looks like a missing gfid file on one of the
>>>>>>>>> bricks. I looked it up and yes the file is missing
>>>>>>>>> on the second brick.
>>>>>>>>> 
>>>>>>>>> We got these messages the other way round, too
>>>>>>>>> (missing on client-0, the first brick).
>>>>>>>>> 
>>>>>>>>> Is it possible to repair this one by copying the
>>>>>>>>> gfid file to the brick where it is missing? Or is
>>>>>>>>> there another way to repair it?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> The second message is
>>>>>>>>> 
>>>>>>>>> [2014-07-07 13:06:35.948738] W 
>>>>>>>>> [client-rpc-fops.c:2469:client3_3_link_cbk] 
>>>>>>>>> 0-gluster_dateisystem-client-1: remote operation
>>>>>>>>> failed: File exists
>>>>>>>>> (00000000-0000-0000-0000-000000000000 -> 
>>>>>>>>> <gfid:aae47250-8f69-480c-ac75-2da2f4d21d7a>/lock)
>>>>>>>>> 
>>>>>>>>> and I really do not know what to do with this
>>>>>>>>> one...
>>>>>>>>>> Did any of the bricks go offline and come back
>>>>>>>>>> online? Pranith
>>>>>>>>> I am really looking forward to your help because
>>>>>>>>> this is an active system and the system load on the
>>>>>>>>> NFS brick is about 25 (!!)
>>>>>>>>> 
>>>>>>>>> Thanks in advance!
>>>>>>>>> Norman Mähler
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> _______________________________________________ 
>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>> Gluster-users at gluster.org 
>>>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> -- 
> Kind regards,
> 
> Norman Mähler
> 
> Head of IT-Hochschulservice, uni-assist e. V.
> Geneststr. 5, Aufgang H, 3. Etage, 10829 Berlin
> 
> Tel.: 030-66644382 n.maehler at uni-assist.de
> 

-- 
Kind regards,

Norman Mähler

Head of IT-Hochschulservice
uni-assist e. V.
Geneststr. 5
Aufgang H, 3. Etage
10829 Berlin

Tel.: 030-66644382
n.maehler at uni-assist.de
