[Gluster-users] Self-heal Problems with gluster and nfs
Norman Mähler
n.maehler at uni-assist.de
Tue Jul 8 15:05:07 UTC 2014
On 08.07.2014 16:26, Pranith Kumar Karampuri wrote:
>
> On 07/08/2014 06:14 PM, Norman Mähler wrote:
>
>
> On 08.07.2014 14:34, Pranith Kumar Karampuri wrote:
>>>> On 07/08/2014 05:23 PM, Norman Mähler wrote:
>>>>
>>>>
>>>> On 08.07.2014 13:24, Pranith Kumar Karampuri wrote:
>>>>>>> On 07/08/2014 04:49 PM, Norman Mähler wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 08.07.2014 13:02, Pranith Kumar Karampuri wrote:
>>>>>>>>>> On 07/08/2014 04:23 PM, Norman Mähler wrote:
>>>>>>>>>> Of course:
>>>>>>>>>>
>>>>>>>>>> The configuration is:
>>>>>>>>>>
>>>>>>>>>> Volume Name: gluster_dateisystem
>>>>>>>>>> Type: Replicate
>>>>>>>>>> Volume ID: 2766695c-b8aa-46fd-b84d-4793b7ce847a
>>>>>>>>>> Status: Started
>>>>>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>>>>>> Transport-type: tcp
>>>>>>>>>> Bricks:
>>>>>>>>>> Brick1: filecluster1:/mnt/raid
>>>>>>>>>> Brick2: filecluster2:/mnt/raid
>>>>>>>>>> Options Reconfigured:
>>>>>>>>>> nfs.enable-ino32: on
>>>>>>>>>> performance.cache-size: 512MB
>>>>>>>>>> diagnostics.brick-log-level: WARNING
>>>>>>>>>> diagnostics.client-log-level: WARNING
>>>>>>>>>> nfs.addr-namelookup: off
>>>>>>>>>> performance.cache-refresh-timeout: 60
>>>>>>>>>> performance.cache-max-file-size: 100MB
>>>>>>>>>> performance.write-behind-window-size: 10MB
>>>>>>>>>> performance.io-thread-count: 18
>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The file count in xattrop is:
>>>>>>>>>>> Do "gluster volume set gluster_dateisystem
>>>>>>>>>>> cluster.self-heal-daemon off". This should stop
>>>>>>>>>>> all the entry self-heals and should also bring
>>>>>>>>>>> the CPU usage down. When you don't have a lot of
>>>>>>>>>>> activity you can enable it again using "gluster
>>>>>>>>>>> volume set gluster_dateisystem
>>>>>>>>>>> cluster.self-heal-daemon on". If it doesn't get
>>>>>>>>>>> the CPU down, execute "gluster volume set
>>>>>>>>>>> gluster_dateisystem cluster.entry-self-heal
>>>>>>>>>>> off". Let me know how it goes. Pranith
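For reference, the sequence suggested above would look roughly like
this when run on one of the servers (a sketch only; the volume name is
the one from the configuration quoted earlier):

  # temporarily stop the self-heal daemon to bring the CPU usage down
  gluster volume set gluster_dateisystem cluster.self-heal-daemon off

  # if the load stays high, additionally disable entry self-heal
  gluster volume set gluster_dateisystem cluster.entry-self-heal off

  # re-enable both once there is a quiet window for healing
  gluster volume set gluster_dateisystem cluster.entry-self-heal on
  gluster volume set gluster_dateisystem cluster.self-heal-daemon on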
>>>>>>> Thanks for your help so far, but stopping the self-heal
>>>>>>> daemon and the self-heal mechanism itself did not
>>>>>>> improve the situation.
>>>>>>>
>>>>>>> Do you have further suggestions? Is it simply the load
>>>>>>> on the system? NFS could handle it easily before...
>>>>>>>> Is it at least a little better or no improvement at
>>>>>>>> all?
>>>> After waiting half an hour more, the system load is falling
>>>> steadily. At the moment it is around 10, which is not good but
>>>> a lot better than before. There are no messages in the
>>>> nfs.log and the glusterfshd.log anymore. In the brick log
>>>> there are still "inode not found - anonymous fd creation
>>>> failed" messages.
>>>>> They should go away once the heal is complete and the
>>>>> system is back to normal. I believe you have directories
>>>>> with lots of files? When can you start the healing process
>>>>> again (i.e. a window where there won't be a lot of activity
>>>>> and you can afford the high CPU usage) so that things will
>>>>> be back to normal?
> We have got a window at night, but our admin has now decided to
> copy the files back to an NFS system, because even with disabled
> self-heal our colleagues cannot do their work on such a slow
> system.
>> This performance problem is addressed in 3.6 with a design change
>> in the replication module in glusterfs.
Ok, this sounds good.
>
> After that we may be able to start again with a new system. We are
> considering taking another network cluster system, but we are not
> quite sure what to do.
>
>> Things should be smooth again after the self-heals are complete,
>> IMO. What is the size of the volume? How many files, approximately?
>> It would be nice if you could give the complete logs at least later
>> to help with the analysis.
There are about 250 GB in approximately 650,000 files on the volume.
I will send you an additional mail with links to the complete logs later.
Norman
>
>> Pranith
>
>
> There are a lot of small files and lock files in these
> directories.
>
> Norman
>
>
>>>>> Pranith
>>>>
>>>>
>>>> Norman
>>>>
>>>>>>>> Pranith
>>>>>>> Norman
>>>>>>>
>>>>>>>>>> Brick 1: 2706
>>>>>>>>>> Brick 2: 2687
>>>>>>>>>>
>>>>>>>>>> Norman
>>>>>>>>>>
>>>>>>>>>> On 08.07.2014 12:28, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>> It seems like entry self-heal is happening.
>>>>>>>>>>>>> What is the volume configuration? Could you
>>>>>>>>>>>>> give the output of
>>>>>>>>>>>>> ls <brick-path>/.glusterfs/indices/xattrop | wc -l
>>>>>>>>>>>>> (the count) for all the bricks?
>>>>>>>>>>>>>
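A small sketch of how that count can be gathered (assuming the brick
path /mnt/raid from the volume info above; run on each brick host):

  # number of entries queued in the index, i.e. pending self-heals
  ls /mnt/raid/.glusterfs/indices/xattrop | wc -l

  # alternatively, ask gluster which files still need healing
  gluster volume heal gluster_dateisystem info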
>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>> On 07/08/2014 03:36 PM, Norman Mähler wrote:
>>>>>>>>>>>>>> Hello Pranith,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> here are the logs. I am only giving you the
>>>>>>>>>>>>>> last 3000 lines, because the nfs.log from
>>>>>>>>>>>>>> today is already 550 MB.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> They are the standard files from a user
>>>>>>>>>>>>>> home on the gluster system: everything you
>>>>>>>>>>>>>> normally find in a user home. Config
>>>>>>>>>>>>>> files, Firefox and Thunderbird files,
>>>>>>>>>>>>>> etc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance, Norman
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 08.07.2014 11:46, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>> On 07/08/2014 02:46 PM, Norman Mähler wrote:
>>>>>>>>>>>>>>> Hello again,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I could resolve the self-heal problems
>>>>>>>>>>>>>>> with the missing gfid files on one of
>>>>>>>>>>>>>>> the servers by deleting the gfid files
>>>>>>>>>>>>>>> on the other server.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> They had a link count of 1, which means
>>>>>>>>>>>>>>> that the file the gfid pointed to
>>>>>>>>>>>>>>> had already been deleted.
>>>>>>>>>>>>>>>
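A hedged sketch of how such stale gfid entries can be checked before
deleting anything (brick path /mnt/raid assumed from the volume info;
plain GNU coreutils/findutils, not a Gluster tool):

  # link count of the gfid file for one of the gfids seen in the logs
  stat -c '%h %n' \
      /mnt/raid/.glusterfs/b0/c4/b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc

  # list gfid entries on the brick whose link count is 1, meaning the
  # file they pointed to is already gone; review before removing
  find /mnt/raid/.glusterfs -mindepth 3 -maxdepth 3 -type f -links 1 \
      ! -path '*/indices/*'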
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We still have these errors:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [2014-07-08 09:09:43.564488] W
>>>>>>>>>>>>>>> [client-rpc-fops.c:2469:client3_3_link_cbk]
>>>>>>>>>>>>>>> 0-gluster_dateisystem-client-0: remote
>>>>>>>>>>>>>>> operation failed: File exists
>>>>>>>>>>>>>>> (00000000-0000-0000-0000-000000000000 ->
>>>>>>>>>>>>>>> <gfid:b338b09e-2577-45b3-82bd-032f954dd083>/lock)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> which appear in the glusterfshd.log, and these
>>>>>>>>>>>>>>> [2014-07-08 09:13:31.198462] E
>>>>>>>>>>>>>>> [client-rpc-fops.c:5179:client3_3_inodelk]
>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(+0x466b8)
>>>>>>>>>>>>>>> [0x7f5d29d4e6b8]
>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(afr_lock_blocking+0x844)
>>>>>>>>>>>>>>> [0x7f5d29d4e2e4]
>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/protocol/client.so(client_inodelk+0x99)
>>>>>>>>>>>>>>> [0x7f5d29f8b3c9]))) 0-: Assertion failed: 0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> from the nfs.log.
>>>>>>>>>>>>>>>> Could you attach the mount (nfs.log) and
>>>>>>>>>>>>>>>> brick logs, please. Do you have files
>>>>>>>>>>>>>>>> with lots of hard links? Pranith
>>>>>>>>>>>>>>> I think the error messages belong
>>>>>>>>>>>>>>> together but I don't have any idea how
>>>>>>>>>>>>>>> to solve them.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We still have a very bad
>>>>>>>>>>>>>>> performance issue. The system load on
>>>>>>>>>>>>>>> the servers is above 20, and nearly no
>>>>>>>>>>>>>>> one is able to work here on a
>>>>>>>>>>>>>>> client...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hoping for help, Norman
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 07.07.2014 15:39, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>>>>>>>> On 07/07/2014 06:58 PM, Norman Mähler wrote:
>>>>>>>>>>>>>>>>>> Dear community,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> we have got some serious problems
>>>>>>>>>>>>>>>>>> with our Gluster installation.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here is the setting:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We have got 2 bricks (version
>>>>>>>>>>>>>>>>>> 3.4.4) on Debian 7.5, one of
>>>>>>>>>>>>>>>>>> them with an NFS export. There
>>>>>>>>>>>>>>>>>> are about 120 clients connecting
>>>>>>>>>>>>>>>>>> to the exported NFS. These
>>>>>>>>>>>>>>>>>> clients are thin clients reading
>>>>>>>>>>>>>>>>>> and writing their Linux home
>>>>>>>>>>>>>>>>>> directories from the exported
>>>>>>>>>>>>>>>>>> NFS.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We want to switch the access of
>>>>>>>>>>>>>>>>>> these clients, one by one, to
>>>>>>>>>>>>>>>>>> access via the gluster client.
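A sketch of what the two access paths look like from a client (server
and volume name taken from the volume info quoted earlier in this
thread; the Gluster NFS server in 3.4 speaks NFSv3):

  # current access via the Gluster NFS server
  mount -t nfs -o vers=3 filecluster1:/gluster_dateisystem /home

  # planned access via the native glusterfs (FUSE) client
  mount -t glusterfs filecluster1:/gluster_dateisystem /home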
>>>>>>>>>>>>>>>>>>> I did not understand what you
>>>>>>>>>>>>>>>>>>> meant by this. Are you moving
>>>>>>>>>>>>>>>>>>> to glusterfs-fuse based
>>>>>>>>>>>>>>>>>>> mounts?
>>>>>>>>>>>>>>>>>> Here are our problems:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> At the moment we have got two
>>>>>>>>>>>>>>>>>> types of error messages which
>>>>>>>>>>>>>>>>>> come in bursts to our
>>>>>>>>>>>>>>>>>> glusterfshd.log:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [2014-07-07 13:10:21.572487] W
>>>>>>>>>>>>>>>>>> [client-rpc-fops.c:1538:client3_3_inodelk_cbk]
>>>>>>>>>>>>>>>>>> 0-gluster_dateisystem-client-1: remote
>>>>>>>>>>>>>>>>>> operation failed: No such file or directory
>>>>>>>>>>>>>>>>>> [2014-07-07 13:10:21.573448] W
>>>>>>>>>>>>>>>>>> [client-rpc-fops.c:471:client3_3_open_cbk]
>>>>>>>>>>>>>>>>>> 0-gluster_dateisystem-client-1: remote
>>>>>>>>>>>>>>>>>> operation failed: No such file or
>>>>>>>>>>>>>>>>>> directory. Path:
>>>>>>>>>>>>>>>>>> <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc>
>>>>>>>>>>>>>>>>>> (00000000-0000-0000-0000-000000000000)
>>>>>>>>>>>>>>>>>> [2014-07-07 13:10:21.573468] E
>>>>>>>>>>>>>>>>>> [afr-self-heal-data.c:1270:afr_sh_data_open_cbk]
>>>>>>>>>>>>>>>>>> 0-gluster_dateisystem-replicate-0: open of
>>>>>>>>>>>>>>>>>> <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc>
>>>>>>>>>>>>>>>>>> failed on child gluster_dateisystem-client-1
>>>>>>>>>>>>>>>>>> (No such file or directory)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This looks like a missing gfid
>>>>>>>>>>>>>>>>>> file on one of the bricks. I
>>>>>>>>>>>>>>>>>> looked it up, and yes, the file
>>>>>>>>>>>>>>>>>> is missing on the second brick.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We got these messages the other
>>>>>>>>>>>>>>>>>> way round, too (missing on
>>>>>>>>>>>>>>>>>> client-0, the first brick).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is it possible to repair this one
>>>>>>>>>>>>>>>>>> by copying the gfid file to the
>>>>>>>>>>>>>>>>>> brick where it was missing? Or is
>>>>>>>>>>>>>>>>>> there another way to repair it?
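As far as I know (an assumption on my part, not something confirmed in
this thread), the usual way to handle such a missing gfid entry is not
to copy gfid files between bricks by hand, but to let replication
recreate it by triggering a heal, roughly:

  # heal everything the index knows about
  gluster volume heal gluster_dateisystem

  # or force a full sweep of the volume (heavier)
  gluster volume heal gluster_dateisystem full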
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The second message is
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [2014-07-07 13:06:35.948738] W
>>>>>>>>>>>>>>>>>> [client-rpc-fops.c:2469:client3_3_link_cbk]
>>>>>>>>>>>>>>>>>> 0-gluster_dateisystem-client-1: remote
>>>>>>>>>>>>>>>>>> operation failed: File exists
>>>>>>>>>>>>>>>>>> (00000000-0000-0000-0000-000000000000 ->
>>>>>>>>>>>>>>>>>> <gfid:aae47250-8f69-480c-ac75-2da2f4d21d7a>/lock)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> and I really do not know what to do with this
>>>>>>>>>>>>>>>>>> one...
>>>>>>>>>>>>>>>>>>> Did any of the bricks go
>>>>>>>>>>>>>>>>>>> offline and come back online?
>>>>>>>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>>>>>>> I am really looking forward to
>>>>>>>>>>>>>>>>>> your help, because this is an
>>>>>>>>>>>>>>>>>> active system and the system load
>>>>>>>>>>>>>>>>>> on the nfs brick is about 25
>>>>>>>>>>>>>>>>>> (!!).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks in advance! Norman
>>>>>>>>>>>>>>>>>> Mähler
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
--
Kind regards,
Norman Mähler
Bereichsleiter IT-Hochschulservice
uni-assist e. V.
Geneststr. 5
Aufgang H, 3. Etage
10829 Berlin
Tel.: 030-66644382
n.maehler at uni-assist.de