[Gluster-users] Empty info file preventing glusterd from starting

Wed May 31 10:17:14 UTC 2017

Hi Atin,

Could you please let us know the expected timeline for delivering this patch?

Regards,
Abhishek

On Tue, May 9, 2017 at 6:37 PM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com>
wrote:

> It would be very risky if this were to reproduce in production; that is why
> I said it is high priority. We want to resolve it before going to production.
>
> On Tue, May 9, 2017 at 6:20 PM, Atin Mukherjee <amukherj at redhat.com>
> wrote:
>
>>
>>
>> On Tue, May 9, 2017 at 6:10 PM, ABHISHEK PALIWAL <
>> abhishpaliwal at gmail.com> wrote:
>>
>>> Hi Atin,
>>>
>>> Thanks for your reply.
>>>
>>>
>>> It is urgent because, although this error is very rarely reproducible, we
>>> have already seen it two or three times on our system.
>>>
>>> We have a delivery in the near future, so we want the fix as soon as
>>> possible. Please try to review it internally.
>>>
>>
>> I don't think your statements justify the urgency, as (a) you have
>> mentioned that it is *rarely* reproducible and (b) I am still waiting
>> for a real use case where glusterd goes through multiple restarts in a
>> loop.
>>
>>
>>> Regards,
>>> Abhishek
>>>
>>> On Tue, May 9, 2017 at 5:58 PM, Atin Mukherjee <amukherj at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, May 9, 2017 at 3:37 PM, ABHISHEK PALIWAL <
>>>> abhishpaliwal at gmail.com> wrote:
>>>>
>>>>> + Muthu-vingeshwaran
>>>>>
>>>>> On Tue, May 9, 2017 at 11:30 AM, ABHISHEK PALIWAL <
>>>>> abhishpaliwal at gmail.com> wrote:
>>>>>
>>>>>> Hi Atin/Team,
>>>>>>
>>>>>> We are using gluster-3.7.6 with a two-brick setup, and on restarting
>>>>>> the system I have seen the glusterd daemon fail to start.
>>>>>>
>>>>>>
>>>>>> While analyzing the etc-glusterfs.......log file, I found the log
>>>>>> entries below:
>>>>>>
>>>>>>
>>>>>> [2017-05-06 03:33:39.798087] I [MSGID: 100030]
>>>>>> [glusterfsd.c:2348:main] 0-/usr/sbin/glusterd: Started running
>>>>>> /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd -p
>>>>>> /var/run/glusterd.pid --log-level INFO)
>>>>>> [2017-05-06 03:33:39.807859] I [MSGID: 106478] [glusterd.c:1350:init]
>>>>>> 0-management: Maximum allowed open file descriptors set to 65536
>>>>>> [2017-05-06 03:33:39.807974] I [MSGID: 106479] [glusterd.c:1399:init]
>>>>>> 0-management: Using /system/glusterd as working directory
>>>>>> [2017-05-06 03:33:39.826833] I [MSGID: 106513]
>>>>>> [glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd:
>>>>>> retrieved op-version: 30706
>>>>>> [2017-05-06 03:33:39.827515] E [MSGID: 106206]
>>>>>> [glusterd-store.c:2562:glusterd_store_update_volinfo] 0-management:
>>>>>> Failed to get next store iter
>>>>>> [2017-05-06 03:33:39.827563] E [MSGID: 106207]
>>>>>> [glusterd-store.c:2844:glusterd_store_retrieve_volume] 0-management:
>>>>>> Failed to update volinfo for c_glusterfs volume
>>>>>> [2017-05-06 03:33:39.827625] E [MSGID: 106201]
>>>>>> [glusterd-store.c:3042:glusterd_store_retrieve_volumes]
>>>>>> 0-management: Unable to restore volume: c_glusterfs
>>>>>> [2017-05-06 03:33:39.827722] E [MSGID: 101019]
>>>>>> [xlator.c:428:xlator_init] 0-management: Initialization of volume
>>>>>> 'management' failed, review your volfile again
>>>>>> [2017-05-06 03:33:39.827762] E [graph.c:322:glusterfs_graph_init]
>>>>>> 0-management: initializing translator failed
>>>>>> [2017-05-06 03:33:39.827784] E [graph.c:661:glusterfs_graph_activate]
>>>>>> 0-graph: init failed
>>>>>> [2017-05-06 03:33:39.828396] W [glusterfsd.c:1238:cleanup_and_exit]
>>>>>> (-->/usr/sbin/glusterd(glusterfs_volumes_init-0x1b0b8) [0x1000a648]
>>>>>> -->/usr/sbin/glusterd(glusterfs_process_volfp-0x1b210) [0x1000a4d8]
>>>>>> -->/usr/sbin/glusterd(cleanup_and_exit-0x1beac) [0x100097ac] ) 0-:
>>>>>> received signum (0), shutting down
>>>>>>
>>>>>
>>>> Abhishek,
>>>>
>>>> This patch needs to be thoroughly reviewed to ensure that it doesn't
>>>> cause any regression, given that it touches the core store management
>>>> functionality of glusterd. AFAICT, we end up with an empty info file only
>>>> when a volume set operation is executed while, in parallel, a glusterd
>>>> instance on another node is brought down, and this whole sequence of
>>>> operations happens in a loop. The test case through which you can get into
>>>> this situation is not something you'd hit in production. Please help me
>>>> understand the urgency here.
>>>>
>>>> Also, in one of the earlier threads, I mentioned the workaround for this
>>>> issue to Xin:
>>>> http://lists.gluster.org/pipermail/gluster-users/2017-January/029600.html
>>>>
>>>> "If you end up in having a 0 byte info file you'd need to copy the same info file from other node and put it there and restart glusterd"
>>>>
>>>>
>>>>>>
>>>>>> I have found an existing case for this issue, and a patch with the fix
>>>>>> is available, but the status of that patch is "cannot merge". Also, the
>>>>>> "info" file is empty and an "info.tmp" file is present in the
>>>>>> "lib/glusterd/vol" directory.
>>>>>>
>>>>>> Below is the link to the existing case.
>>>>>>
>>>>>> https://review.gluster.org/#/c/16279/5
>>>>>>
>>>>>> Please let me know the community's plan for providing a fix for this
>>>>>> problem, and in which version.
>>>>>>
>>>>>> Regards
>>>>>> Abhishek Paliwal
>>>>>>

-- 

Regards
Abhishek Paliwal
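
For reference, the workaround quoted in this thread starts with finding the 0-byte info file. Below is a minimal Python sketch of that detection step only. It assumes volume metadata lives at <workdir>/vols/<volume>/info (the working directory in the logs above is /system/glusterd); that layout and the function name find_empty_info_files are assumptions for illustration and are not part of gluster. Copying the info file from a healthy peer and restarting glusterd remain manual steps.

```python
from pathlib import Path


def find_empty_info_files(workdir):
    """List volumes whose 'info' file exists but is 0 bytes.

    Returns a list of (volume_name, info_path, info_tmp_present) tuples.
    ASSUMPTION: volumes are stored as <workdir>/vols/<volume>/info.
    """
    vols_dir = Path(workdir) / "vols"
    if not vols_dir.is_dir():
        # Nothing to inspect if the assumed layout is not present.
        return []

    results = []
    for vol_dir in sorted(p for p in vols_dir.iterdir() if p.is_dir()):
        info = vol_dir / "info"
        if info.is_file() and info.stat().st_size == 0:
            # A leftover info.tmp next to an empty info file matches the
            # symptom described in this thread.
            tmp_present = (vol_dir / "info.tmp").exists()
            results.append((vol_dir.name, str(info), tmp_present))
    return results
```

Calling find_empty_info_files("/system/glusterd") on the affected node would list each volume that needs its info file restored from another node before glusterd is restarted. The sketch only reads file sizes and does not modify anything.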