[Gluster-users] Self-heal Problems with gluster and nfs
Norman Mähler
n.maehler at uni-assist.de
Tue Jul 8 11:30:49 UTC 2014
On 08.07.2014 13:24, Pranith Kumar Karampuri wrote:
>
> On 07/08/2014 04:49 PM, Norman Mähler wrote:
>
>
> On 08.07.2014 13:02, Pranith Kumar Karampuri wrote:
>>>> On 07/08/2014 04:23 PM, Norman Mähler wrote:
>>>> Of course:
>>>>
>>>> The configuration is:
>>>>
>>>> Volume Name: gluster_dateisystem
>>>> Type: Replicate
>>>> Volume ID: 2766695c-b8aa-46fd-b84d-4793b7ce847a
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: filecluster1:/mnt/raid
>>>> Brick2: filecluster2:/mnt/raid
>>>> Options Reconfigured:
>>>> nfs.enable-ino32: on
>>>> performance.cache-size: 512MB
>>>> diagnostics.brick-log-level: WARNING
>>>> diagnostics.client-log-level: WARNING
>>>> nfs.addr-namelookup: off
>>>> performance.cache-refresh-timeout: 60
>>>> performance.cache-max-file-size: 100MB
>>>> performance.write-behind-window-size: 10MB
>>>> performance.io-thread-count: 18
>>>> performance.stat-prefetch: off
>>>>
>>>>
>>>> The file count in xattrop is
>>>>> Do "gluster volume set gluster_dateisystem
>>>>> cluster.self-heal-daemon off". This should stop all the
>>>>> entry self-heals and should also bring the CPU usage down.
>>>>> When you don't have a lot of activity you can enable it
>>>>> again using "gluster volume set gluster_dateisystem
>>>>> cluster.self-heal-daemon on". If that doesn't bring the CPU
>>>>> down, execute "gluster volume set gluster_dateisystem
>>>>> cluster.entry-self-heal off". Let me know how it goes.
>>>>> Pranith
> Thanks for your help so far, but stopping the self-heal daemon and
> the self-heal mechanism itself did not improve the situation.
>
> Do you have further suggestions? Is it simply the load on the
> system? NFS could handle it easily before...
>> Is it at least a little better or no improvement at all?
>
>> Pranith
There is a very small improvement of about 1 point in the 15-minute
load, which is now at about 20 to 22.
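For reference, the 15-minute figure quoted here is the third field of /proc/loadavg on Linux; a minimal sketch for reading it (standard procfs field layout assumed):

```shell
#!/bin/sh
# /proc/loadavg fields: 1min 5min 15min running/total last-pid
read one five fifteen rest < /proc/loadavg
echo "15-minute load: $fifteen"
```

On a host with 1-2 cores, a sustained 15-minute load of 20+ means processes are queueing roughly an order of magnitude faster than they can be served.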
Norman
>
> Norman
>
>>>> Brick 1: 2706
>>>> Brick 2: 2687
>>>>
>>>> Norman
>>>>
>>>> On 08.07.2014 12:28, Pranith Kumar Karampuri wrote:
>>>>>>> It seems like entry self-heal is happening. What is
>>>>>>> the volume configuration? Could you give the output of
>>>>>>> "ls <brick-path>/.glusterfs/indices/xattrop | wc -l"
>>>>>>> for all the bricks?
>>>>>>>
>>>>>>> Pranith
>>>>>>> On 07/08/2014 03:36 PM, Norman Mähler wrote:
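Pranith's check can be wrapped in a small helper and run on each server; a minimal sketch, assuming the brick path from the volume info above (`/mnt/raid`) and that the brick filesystem is mounted locally:

```shell
#!/bin/sh
# Count pending self-heal index entries for a brick.
# The xattrop directory holds one entry per file awaiting heal,
# so the count approximates the self-heal backlog.
count_pending() {
    brick="$1"
    ls "$brick/.glusterfs/indices/xattrop" 2>/dev/null | wc -l
}

count_pending /mnt/raid
```

A count in the thousands, as reported below (2706 and 2687), means the self-heal daemon has a large backlog to crawl, which is consistent with the sustained load.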
>>>>>>>> Hello Pranith,
>>>>>>>>
>>>>>>>> here are the logs. I am only sending you the last 3000
>>>>>>>> lines, because the nfs.log from today is already 550
>>>>>>>> MB.
>>>>>>>>
>>>>>>>> These are the standard files from a user home on the
>>>>>>>> gluster system, everything you normally find in a user
>>>>>>>> home: config files, Firefox and Thunderbird files,
>>>>>>>> etc.
>>>>>>>>
>>>>>>>> Thanks in advance Norman
>>>>>>>>
>>>>>>>> On 08.07.2014 11:46, Pranith Kumar Karampuri wrote:
>>>>>>>>> On 07/08/2014 02:46 PM, Norman Mähler wrote:
>>>>>>>>> Hello again,
>>>>>>>>>
>>>>>>>>> I could resolve the self-heal problems with the
>>>>>>>>> missing gfid files on one of the servers by
>>>>>>>>> deleting the gfid files on the other server.
>>>>>>>>>
>>>>>>>>> They had a link count of 1, which means that the
>>>>>>>>> file the gfid pointed to was already deleted.
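The link-count check Norman describes can be done mechanically: a healthy gfid file under `.glusterfs` is a hardlink to the user-visible file and therefore has at least two links, so a link count of 1 marks a stale entry. A sketch, with the brick path assumed from the volume info; this only lists candidates, so review the output before deleting anything:

```shell
#!/bin/sh
# List regular files under a brick's .glusterfs tree whose link
# count is 1, i.e. the user-visible file is already gone.
# The indices directory is skipped: its entries are not gfid hardlinks.
find_stale_gfids() {
    brick="$1"
    find "$brick/.glusterfs" -path "$brick/.glusterfs/indices" -prune \
         -o -type f -links 1 -print 2>/dev/null
}

find_stale_gfids /mnt/raid
```

Note that this heuristic does not apply to symlinks or directories, whose gfid entries are represented differently under `.glusterfs`.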
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We have still these errors
>>>>>>>>>
>>>>>>>>> [2014-07-08 09:09:43.564488] W
>>>>>>>>> [client-rpc-fops.c:2469:client3_3_link_cbk]
>>>>>>>>> 0-gluster_dateisystem-client-0: remote operation
>>>>>>>>> failed: File exists
>>>>>>>>> (00000000-0000-0000-0000-000000000000 ->
>>>>>>>>> <gfid:b338b09e-2577-45b3-82bd-032f954dd083>/lock)
>>>>>>>>>
>>>>>>>>> which appear in the glusterfshd.log and these
>>>>>>>>>
>>>>>>>>> [2014-07-08 09:13:31.198462] E
>>>>>>>>> [client-rpc-fops.c:5179:client3_3_inodelk]
>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(+0x466b8)
>>>>>>>>> [0x7f5d29d4e6b8]
>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(afr_lock_blocking+0x844)
>>>>>>>>> [0x7f5d29d4e2e4]
>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/protocol/client.so(client_inodelk+0x99)
>>>>>>>>> [0x7f5d29f8b3c9]))) 0-: Assertion failed: 0
>>>>>>>>> from the nfs.log.
>>>>>>>>>> Could you attach mount (nfs.log) and brick logs
>>>>>>>>>> please. Do you have files with lots of
>>>>>>>>>> hard-links? Pranith
>>>>>>>>> I think the error messages belong together but I
>>>>>>>>> don't have any idea how to solve them.
>>>>>>>>>
>>>>>>>>> We still have a very bad performance issue. The
>>>>>>>>> system load on the servers is above 20, and nearly
>>>>>>>>> no one is able to work on a client...
>>>>>>>>>
>>>>>>>>> Hoping for help, Norman
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 07.07.2014 15:39, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>> On 07/07/2014 06:58 PM, Norman Mähler wrote:
>>>>>>>>>>>> Dear community,
>>>>>>>>>>>>
>>>>>>>>>>>> we have got some serious problems with our
>>>>>>>>>>>> Gluster installation.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the setting:
>>>>>>>>>>>>
>>>>>>>>>>>> We have got 2 bricks (version 3.4.4) on
>>>>>>>>>>>> Debian 7.5, one of them with an NFS export.
>>>>>>>>>>>> There are about 120 clients connecting to the
>>>>>>>>>>>> exported NFS. These clients are thin clients
>>>>>>>>>>>> reading and writing their Linux home
>>>>>>>>>>>> directories from the exported NFS.
>>>>>>>>>>>>
>>>>>>>>>>>> We want to change the access of these clients
>>>>>>>>>>>> one by one to access via gluster client.
>>>>>>>>>>>>> I did not understand what you meant by
>>>>>>>>>>>>> this. Are you moving to glusterfs-fuse
>>>>>>>>>>>>> based mounts?
>>>>>>>>>>>> Here are our problems:
>>>>>>>>>>>>
>>>>>>>>>>>> At the moment we have got two types of error
>>>>>>>>>>>> messages which come in bursts to our
>>>>>>>>>>>> glusterfshd.log
>>>>>>>>>>>>
>>>>>>>>>>>> [2014-07-07 13:10:21.572487] W
>>>>>>>>>>>> [client-rpc-fops.c:1538:client3_3_inodelk_cbk]
>>>>>>>>>>>> 0-gluster_dateisystem-client-1: remote operation
>>>>>>>>>>>> failed: No such file or directory
>>>>>>>>>>>> [2014-07-07 13:10:21.573448] W
>>>>>>>>>>>> [client-rpc-fops.c:471:client3_3_open_cbk]
>>>>>>>>>>>> 0-gluster_dateisystem-client-1: remote
>>>>>>>>>>>> operation failed: No such file or directory.
>>>>>>>>>>>> Path:
>>>>>>>>>>>> <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc>
>>>>>>>>>>>> (00000000-0000-0000-0000-000000000000)
>>>>>>>>>>>> [2014-07-07 13:10:21.573468] E
>>>>>>>>>>>> [afr-self-heal-data.c:1270:afr_sh_data_open_cbk]
>>>>>>>>>>>> 0-gluster_dateisystem-replicate-0: open of
>>>>>>>>>>>> <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc>
>>>>>>>>>>>> failed on child gluster_dateisystem-client-1
>>>>>>>>>>>> (No such file or directory)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This looks like a missing gfid file on one of
>>>>>>>>>>>> the bricks. I looked it up and yes the file
>>>>>>>>>>>> is missing on the second brick.
>>>>>>>>>>>>
>>>>>>>>>>>> We got these messages the other way round,
>>>>>>>>>>>> too (missing on client-0 and the first
>>>>>>>>>>>> brick).
>>>>>>>>>>>>
>>>>>>>>>>>> Is it possible to repair this one by copying
>>>>>>>>>>>> the gfid file to the brick where it was
>>>>>>>>>>>> missing? Or is there another way to repair
>>>>>>>>>>>> it?
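On the copy question: a gfid maps deterministically to a path under the brick, `.glusterfs/<first two hex chars>/<next two>/<full gfid>`, so the entry behind a log message can be located and compared on both bricks. A sketch of the mapping (brick path assumed):

```shell
#!/bin/sh
# Map a gfid to its location under a brick's .glusterfs tree:
#   <brick>/.glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid>
gfid_path() {
    brick="$1"; gfid="$2"
    echo "$brick/.glusterfs/$(echo "$gfid" | cut -c1-2)/$(echo "$gfid" | cut -c3-4)/$gfid"
}

gfid_path /mnt/raid b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc
# -> /mnt/raid/.glusterfs/b0/c4/b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc
```

For regular files the gfid path is a hardlink to the data, so restoring it by hand is delicate; letting self-heal recreate it once the locking errors are resolved is the safer route.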
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The second message is
>>>>>>>>>>>>
>>>>>>>>>>>> [2014-07-07 13:06:35.948738] W
>>>>>>>>>>>> [client-rpc-fops.c:2469:client3_3_link_cbk]
>>>>>>>>>>>> 0-gluster_dateisystem-client-1: remote
>>>>>>>>>>>> operation failed: File exists
>>>>>>>>>>>> (00000000-0000-0000-0000-000000000000 ->
>>>>>>>>>>>> <gfid:aae47250-8f69-480c-ac75-2da2f4d21d7a>/lock)
>>>>>>>>>>>>
>>>>>>>>>>>> and I really do not know what to do with this
>>>>>>>>>>>> one...
>>>>>>>>>>>>> Did any of the bricks go offline and come
>>>>>>>>>>>>> back online? Pranith
>>>>>>>>>>>> I am really looking forward to your help
>>>>>>>>>>>> because this is an active system and the
>>>>>>>>>>>> system load on the nfs brick is about 25
>>>>>>>>>>>> (!!)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks in advance! Norman Maehler
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
--
Kind regards,
Norman Mähler
Bereichsleiter IT-Hochschulservice
uni-assist e. V.
Geneststr. 5
Aufgang H, 3. Etage
10829 Berlin
Tel.: 030-66644382
n.maehler at uni-assist.de