[Gluster-users] Self-heal Problems with gluster and nfs

Pranith Kumar Karampuri pkarampu at redhat.com
Tue Jul 8 10:28:09 UTC 2014


It seems like entry self-heal is happening. What is the volume
configuration? Could you give the output of
ls <brick-path>/.glusterfs/indices/xattrop | wc -l
for each of the bricks?
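
If a server hosts more than one brick, the count can be collected for all
of them in one go. The following is only a minimal sketch: the function
name is made up for illustration, and you have to pass your own brick
paths as arguments.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): print "<brick-path> <count>"
# for every brick path given as an argument. The count is the number of
# entries in <brick-path>/.glusterfs/indices/xattrop, i.e. the same number
# that `ls ... | wc -l` reports for a single brick.
count_xattrop_entries() {
    for brick in "$@"; do
        dir="$brick/.glusterfs/indices/xattrop"
        if [ -d "$dir" ]; then
            # tr strips the padding that some wc implementations add
            count=$(ls "$dir" | wc -l | tr -d ' ')
            echo "$brick $count"
        else
            echo "$brick: no index directory found" >&2
        fi
    done
}
```

Run it on each server with that server's brick path(s) in place of
<brick-path>, e.g. count_xattrop_entries <brick-path>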

Pranith
On 07/08/2014 03:36 PM, Norman Mähler wrote:
> Hello Pranith,
>
> here are the logs. I am only giving you the last 3000 lines, because the
> nfs.log from today is already 550 MB.
>
> The files on the gluster system are the standard contents of a user
> home: everything you normally find there, such as config files, Firefox
> and Thunderbird files, etc.
>
> Thanks in advance
> Norman
>
> On 08.07.2014 11:46, Pranith Kumar Karampuri wrote:
>> On 07/08/2014 02:46 PM, Norman Mähler wrote:
>> Hello again,
>>
>> I was able to resolve the self-heal problems with the missing gfid
>> files on one of the servers by deleting the gfid files on the other
>> server.
>>
>> They had a link count of 1, which means that the file the gfid
>> pointed to had already been deleted.
>>
>>
>> We still have these errors
>>
>> [2014-07-08 09:09:43.564488] W
>> [client-rpc-fops.c:2469:client3_3_link_cbk]
>> 0-gluster_dateisystem-client-0: remote operation failed: File exists
>> (00000000-0000-0000-0000-000000000000 ->
>> <gfid:b338b09e-2577-45b3-82bd-032f954dd083>/lock)
>>
>> which appear in the glusterfshd.log and these
>>
>> [2014-07-08 09:13:31.198462] E
>> [client-rpc-fops.c:5179:client3_3_inodelk]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(+0x466b8)
>> [0x7f5d29d4e6b8]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(afr_lock_blocking+0x844)
>> [0x7f5d29d4e2e4]
>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/protocol/client.so(client_inodelk+0x99)
>> [0x7f5d29f8b3c9]))) 0-: Assertion failed: 0
>>
>> from the nfs.log.
>>> Could you attach mount (nfs.log) and brick logs please.
>>> Do you have files with lots of hard-links?
>>> Pranith
>> I think the error messages belong together but I don't have any idea
>> how to solve them.
>>
>> We also still have a very bad performance problem. The system load on
>> the servers is above 20 and almost no one is able to work on a client
>> here...
>>
>> Hoping for help
>> Norman
>>
>>
>> On 07.07.2014 15:39, Pranith Kumar Karampuri wrote:
>>>>> On 07/07/2014 06:58 PM, Norman Mähler wrote:
>>>>> Dear community,
>>>>>
>>>>> we have got some serious problems with our Gluster installation.
>>>>>
>>>>> Here is the setting:
>>>>>
>>>>> We have 2 bricks (version 3.4.4) on Debian 7.5, one of them
>>>>> with an NFS export. About 120 clients connect to the exported
>>>>> NFS share. These are thin clients that read and write their
>>>>> Linux home directories on the exported NFS share.
>>>>>
>>>>> We want to switch these clients over one by one so that they
>>>>> access the volume via the gluster client.
>>>>>> I did not understand what you meant by this. Are you moving to
>>>>>> glusterfs-fuse based mounts?
>>>>>
>>>>> Here are our problems:
>>>>>
>>>>> At the moment we get two types of error messages, which come
>>>>> in bursts in our glusterfshd.log
>>>>>
>>>>> [2014-07-07 13:10:21.572487] W
>>>>> [client-rpc-fops.c:1538:client3_3_inodelk_cbk]
>>>>> 0-gluster_dateisystem-client-1: remote operation failed: No such
>>>>> file or directory
>>>>> [2014-07-07 13:10:21.573448] W
>>>>> [client-rpc-fops.c:471:client3_3_open_cbk]
>>>>> 0-gluster_dateisystem-client-1: remote operation failed: No such
>>>>> file or directory. Path:
>>>>> <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc>
>>>>> (00000000-0000-0000-0000-000000000000)
>>>>> [2014-07-07 13:10:21.573468] E
>>>>> [afr-self-heal-data.c:1270:afr_sh_data_open_cbk]
>>>>> 0-gluster_dateisystem-replicate-0: open of
>>>>> <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc> failed on child
>>>>> gluster_dateisystem-client-1 (No such file or directory)
>>>>>
>>>>>
>>>>> This looks like a missing gfid file on one of the bricks. I looked
>>>>> it up and yes the file is missing on the second brick.
>>>>>
>>>>> We got these messages the other way round, too (missing on
>>>>> client-0 and the first brick).
>>>>>
>>>>> Is it possible to repair this by copying the gfid file to the
>>>>> brick where it is missing? Or is there another way to repair it?
>>>>>
>>>>>
>>>>> The second message is
>>>>>
>>>>> [2014-07-07 13:06:35.948738] W
>>>>> [client-rpc-fops.c:2469:client3_3_link_cbk]
>>>>> 0-gluster_dateisystem-client-1: remote operation failed: File
>>>>> exists (00000000-0000-0000-0000-000000000000 ->
>>>>> <gfid:aae47250-8f69-480c-ac75-2da2f4d21d7a>/lock)
>>>>>
>>>>> and I really do not know what to do with this one...
>>>>>> Did any of the bricks go offline and come back online?
>>>>>> Pranith
>>>>> I am really looking forward to your help because this is an active
>>>>> system and the system load on the nfs brick is about 25 (!!)
>>>>>
>>>>> Thanks in advance! Norman Maehler
>>>>>
>>>>>
>>>>>> _______________________________________________ Gluster-users
>>>>>> mailing list Gluster-users at gluster.org
>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users



