[Gluster-users] question about info and info.tmp

Tue Jan 10 04:02:12 UTC 2017

Xin,

There is a patch [1], currently under review, that attempts to handle this case.

[1] http://review.gluster.org/#/c/16279

On Tue, Jan 10, 2017 at 7:15 AM, songxin <songxin_1980 at 126.com> wrote:

> Hi Atin,
>
> Have you fixed this issue?
>
> Thanks,
> Xin
>
>
>
> At 2016-11-25 15:46:25, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>
>
>
> On Fri, Nov 25, 2016 at 1:14 PM, songxin <songxin_1980 at 126.com> wrote:
>
>> Hi Atin,
>> It seems that this workaround has to be applied manually.
>> Is that right?
>> The files in bricks/* may be empty too.
>>
>
> Yes, that's right
>
>
>>
>> Do you have a workaround that is implemented in the glusterfs code?
>>
>
> A workaround is by nature manual; anything done through code should be
> considered a fix, not a workaround :)
>
>
>>
>> Thanks,
>> Xin
>>
>>
>>
>>
>>
>> At 2016-11-25 15:36:29, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>
>>
>>
>> On Fri, Nov 25, 2016 at 12:06 PM, songxin <songxin_1980 at 126.com> wrote:
>>
>>> Hi Atin,
>>> Do you mean that you have a workaround available now?
>>> Or will it take time to design the workaround?
>>>
>>> If you have a workaround now, could you share it with me?
>>>
>>
>> If you end up with a 0-byte info file, you'd need to copy the same info
>> file from another node, put it in place, and restart glusterd.
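
To decide whether this workaround applies on a node, one could first check whether the info file really is zero bytes. The following is a minimal C sketch of such a check. `store_file_state` and `report_store_file` are hypothetical names chosen for illustration; they are not part of glusterd, and the printed messages are assumptions as well.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not a glusterd API).
 * Returns 1 if `path` exists and is zero bytes, 0 if it exists and has
 * content, and -1 if it cannot be stat'ed (for example, it is missing). */
int
store_file_state (const char *path)
{
        struct stat st;

        assert (path != NULL);

        if (stat (path, &st) != 0)
                return -1;
        return (st.st_size == 0) ? 1 : 0;
}

/* Print a one-line verdict for `path` and return the same state code. */
int
report_store_file (const char *path)
{
        int state = store_file_state (path);

        if (state == 1)
                printf ("%s: empty, restore it from another node\n", path);
        else if (state == 0)
                printf ("%s: has content\n", path);
        else
                printf ("%s: missing or unreadable\n", path);
        return state;
}
```

Running `report_store_file` on the volume's info file (and on the files under bricks/) in the glusterd working directory would show which files need to be copied over before glusterd is restarted.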
>>
>>
>>>
>>> Thanks,
>>> Xin,
>>>
>>>
>>>
>>>
>>>
>>> At 2016-11-24 19:12:07, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>
>>> Xin - I appreciate your patience. I'd need some more time to pick this
>>> item up from my backlog. I believe we have a workaround applicable here too.
>>>
>>> On Thu, 24 Nov 2016 at 14:24, songxin <songxin_1980 at 126.com> wrote:
>>>
>>>>
>>>>
>>>>
>>>> Hi Atin,
>>>> Actually, glusterfs is used in my project,
>>>> and our test team found this issue.
>>>> So I want to know whether you plan to fix it.
>>>> If you do, I will wait for your fix, because your method should be better
>>>> than mine.
>>>>
>>>> Thanks,
>>>> Xin
>>>>
>>>>
>>>> At 2016-11-21 10:00:36, "Atin Mukherjee" <atin.mukherjee83 at gmail.com> wrote:
>>>>
>>>> Hi Xin,
>>>>
>>>> I haven't had a chance to look into it yet. The delete stale volume
>>>> function is in place to wipe volume configuration data that has been
>>>> deleted from the cluster. However, we need to revisit this code to see
>>>> whether this function is still needed, given that we recently added a
>>>> validation that fails a delete request if one of the glusterd instances
>>>> is down. I'll get back to you on this.
>>>>
>>>> On Mon, 21 Nov 2016 at 07:24, songxin <songxin_1980 at 126.com> wrote:
>>>>
>>>> Hi Atin,
>>>> Thank you for your support.
>>>>
>>>> And any conclusions about this issue?
>>>>
>>>> Thanks,
>>>> Xin
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> At 2016-11-16 20:59:05, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Nov 15, 2016 at 1:53 PM, songxin <songxin_1980 at 126.com> wrote:
>>>>
>>>> ok, thank you.
>>>>
>>>>
>>>>
>>>>
>>>> At 2016-11-15 16:12:34, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Nov 15, 2016 at 12:47 PM, songxin <songxin_1980 at 126.com> wrote:
>>>>
>>>>
>>>> Hi Atin,
>>>>
>>>> I think the root cause is in the function glusterd_import_friend_volume
>>>> as below.
>>>>
>>>> int32_t
>>>> glusterd_import_friend_volume (dict_t *peer_data, size_t count)
>>>> {
>>>> ...
>>>>         ret = glusterd_volinfo_find (new_volinfo->volname,
>>>>                                      &old_volinfo);
>>>>         if (0 == ret) {
>>>>                 (void) gd_check_and_update_rebalance_info (old_volinfo,
>>>>                                                            new_volinfo);
>>>>                 (void) glusterd_delete_stale_volume (old_volinfo,
>>>>                                                      new_volinfo);
>>>>         }
>>>> ...
>>>>         ret = glusterd_store_volinfo (new_volinfo,
>>>>                                       GLUSTERD_VOLINFO_VER_AC_NONE);
>>>>         if (ret) {
>>>>                 gf_msg (this->name, GF_LOG_ERROR, 0,
>>>>                         GD_MSG_VOLINFO_STORE_FAIL, "Failed to store "
>>>>                         "volinfo for volume %s", new_volinfo->volname);
>>>>                 goto out;
>>>>         }
>>>> ...
>>>> }
>>>>
>>>> glusterd_delete_stale_volume will remove the info and bricks/* files, and
>>>> glusterd_store_volinfo will then create new ones.
>>>> But if glusterd is killed before the rename, the info file is left empty.
>>>>
>>>> And glusterd will fail to start the next time you start it, because the
>>>> info file is empty.
>>>>
>>>> Any idea, Atin?
>>>>
>>>>
>>>> Give me some time, I will check it out. Reading this analysis, it looks
>>>> very well possible: a volume is changed while glusterd is down on node
>>>> A, when that glusterd comes back up we update the volinfo during the peer
>>>> handshake, and during that time glusterd goes down once again. I'll
>>>> confirm it by tomorrow.
>>>>
>>>>
>>>> I checked the code and it does look like you have the right RCA for the
>>>> issue which you simulated through those two scripts. However, this can
>>>> happen even when you create a fresh volume: if glusterd goes down while
>>>> writing the content into the store, before renaming the info.tmp file,
>>>> you get into the same situation.
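
The sequence discussed here, writing the new content to a temporary file and then renaming it over the real one, can be sketched in C as follows. This is a minimal illustration and not glusterd's implementation: `atomic_replace_file` is a hypothetical helper, and the explicit fsync() before rename() is an assumption about what a more crash-tolerant variant might do. The point of the sketch is that the target is never unlinked or created empty beforehand, so an interrupted update should leave the previous file in place rather than a 0-byte one.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of glusterd: replace `path` with `data`
 * by writing to `path`.tmp, flushing it, and renaming it over `path`.
 * Returns 0 on success and -1 on any failure. */
int
atomic_replace_file (const char *path, const char *data)
{
        char    tmp[4096];
        size_t  len = 0;
        size_t  off = 0;
        ssize_t n = 0;
        int     r = 0;
        int     fd = -1;

        assert (path != NULL);
        assert (data != NULL);

        r = snprintf (tmp, sizeof (tmp), "%s.tmp", path);
        if (r < 0 || (size_t) r >= sizeof (tmp))
                return -1;

        fd = open (tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
                return -1;

        len = strlen (data);
        while (off < len) {
                n = write (fd, data + off, len - off);
                if (n < 0) {
                        close (fd);
                        unlink (tmp);
                        return -1;
                }
                off += (size_t) n;
        }

        /* Flush the new contents to disk before they become visible. */
        if (fsync (fd) != 0) {
                close (fd);
                unlink (tmp);
                return -1;
        }
        if (close (fd) != 0) {
                unlink (tmp);
                return -1;
        }

        /* rename() replaces an existing target in a single step on POSIX. */
        if (rename (tmp, path) != 0) {
                unlink (tmp);
                return -1;
        }
        return 0;
}
```

A fuller version would probably also fsync() the containing directory so that the rename itself survives a power loss; that step is left out here to keep the sketch short.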
>>>>
>>>> I'd really need to think through if this can be fixed. Suggestions are
>>>> always appreciated.
>>>>
>>>>
>>>>
>>>>
>>>> BTW, excellent work Xin!
>>>>
>>>>
>>>> Thanks,
>>>> Xin
>>>>
>>>>
>>>> At 2016-11-15 12:07:05, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Nov 15, 2016 at 8:58 AM, songxin <songxin_1980 at 126.com> wrote:
>>>>
>>>> Hi Atin,
>>>> I have some clues about this issue.
>>>> I could reproduce this issue using the script mentioned in
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487 .
>>>>
>>>>
>>>> I really appreciate your help in trying to nail down this issue. While
>>>> I am going through your email and the code to figure out the possible
>>>> cause, unfortunately I don't see any script attached to the bug. Could
>>>> you please cross check?
>>>>
>>>>
>>>>
>>>> After I added some debug prints to glusterd-store.c, like the ones below,
>>>> I found that /var/lib/glusterd/vols/xxx/info and
>>>> /var/lib/glusterd/vols/xxx/bricks/* are removed.
>>>> But the other files in /var/lib/glusterd/vols/xxx/ are not removed.
>>>>
>>>> int32_t
>>>> glusterd_store_volinfo (glusterd_volinfo_t *volinfo,
>>>>                         glusterd_volinfo_ver_ac_t ac)
>>>> {
>>>>         int32_t                 ret = -1;
>>>>         struct stat             buf = {0,};   /* needs <sys/stat.h> */
>>>>
>>>>         GF_ASSERT (volinfo);
>>>>
>>>>         /* Debug print: does the info file exist, and how big is it? */
>>>>         ret = access ("/var/lib/glusterd/vols/gv0/info", F_OK);
>>>>         if (ret < 0) {
>>>>                 gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
>>>>                         "info does not exist (%d)", errno);
>>>>         } else {
>>>>                 ret = stat ("/var/lib/glusterd/vols/gv0/info", &buf);
>>>>                 if (ret < 0) {
>>>>                         gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
>>>>                                 "stat info error");
>>>>                 } else {
>>>>                         gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
>>>>                                 "info size is %lu, inode num is %lu",
>>>>                                 (unsigned long) buf.st_size,
>>>>                                 (unsigned long) buf.st_ino);
>>>>                 }
>>>>         }
>>>>
>>>>         glusterd_perform_volinfo_version_action (volinfo, ac);
>>>>         ret = glusterd_store_create_volume_dir (volinfo);
>>>>         if (ret)
>>>>                 goto out;
>>>>
>>>> ...
>>>> }
>>>>
>>>> So it is easy to understand why the info or
>>>> 10.32.1.144.-opt-lvmdir-c2-brick file is sometimes empty.
>>>> It is because the info file does not exist, so it is created by "fd =
>>>> open (path, O_RDWR | O_CREAT | O_APPEND, 0600);" in the function
>>>> gf_store_handle_new.
>>>> The info file is then empty until the rename.
>>>> So the info file stays empty if glusterd shuts down before the rename.
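
The effect described above can be seen in isolation with a few lines of C. This sketch only exercises the open() flags quoted from gf_store_handle_new; `size_after_create` is a hypothetical helper written for illustration, not glusterd code.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustration only: open `path` with the flags quoted above and return the
 * size of the file as seen on disk right after the open, or -1 on error.
 * If `path` did not exist beforehand, the result is 0: an empty file that
 * stays empty until something is written to it or renamed over it. */
long
size_after_create (const char *path)
{
        struct stat st;
        int         fd = -1;

        assert (path != NULL);

        fd = open (path, O_RDWR | O_CREAT | O_APPEND, 0600);
        if (fd < 0)
                return -1;
        close (fd);   /* a process killed here leaves the file as it is */

        if (stat (path, &st) != 0)
                return -1;
        return (long) st.st_size;
}
```

When the file already exists, the same flags leave its contents alone (O_APPEND does not truncate), so the empty file only shows up if the original was removed first.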
>>>>
>>>>
>>>>
>>>> My questions are the following.
>>>> 1. I did not find the point where the info file is removed. Could you tell
>>>> me where info and bricks/* are removed?
>>>> 2. Why are info and bricks/* removed, while the other files in
>>>> /var/lib/glusterd/vols/xxx/ are not?
>>>>
>>>>
>>>> AFAIK, we never delete the info file and hence this file is opened with
>>>> O_APPEND flag. As I said I will go back and cross check the code once again.
>>>>
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Xin
>>>>
>>>>
>>>> At 2016-11-11 20:34:05, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On Fri, Nov 11, 2016 at 4:00 PM, songxin <songxin_1980 at 126.com> wrote:
>>>>
>>>> Hi Atin,
>>>>
>>>> Thank you for your support.
>>>> I look forward to your reply.
>>>>
>>>> By the way, could you confirm that the issue, the info file being empty,
>>>> is caused by the rename being interrupted in the kernel?
>>>>
>>>>
>>>> As per my RCA on that bug, it looked to be.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Xin
>>>>
>>>> At 2016-11-11 15:49:02, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On Fri, Nov 11, 2016 at 1:15 PM, songxin <songxin_1980 at 126.com> wrote:
>>>>
>>>> Hi Atin,
>>>> Thank you for your reply.
>>>> Actually it is very difficult to reproduce because I don't know when there
>>>> was an ongoing commit happening. It was just a coincidence.
>>>> But I want to be sure of the root cause.
>>>>
>>>>
>>>> I'll give it another try and see if this situation can be
>>>> simulated/reproduced and will keep you posted.
>>>>
>>>>
>>>>
>>>> So I would be grateful if you could answer my questions below.
>>>>
>>>> You said that "This issue is hit at part of the negative testing where
>>>> while gluster volume set was executed at the same point of time glusterd in
>>>> another instance was brought down. In the faulty node we could see
>>>> /var/lib/glusterd/vols/<volname>info file been empty whereas the
>>>> info.tmp file has the correct contents." in comment.
>>>>
>>>> I have two questions for you.
>>>>
>>>> 1. Could you reproduce this issue by running gluster volume set while glusterd was being brought down?
>>>> 2. Are you certain that this issue is caused by the rename being interrupted in the kernel?
>>>>
>>>> In my case two files, info and 10.32.1.144.-opt-lvmdir-c2-brick, are both empty.
>>>> But in my view only one rename can be running at a time because of the big lock.
>>>> Why are two files empty?
>>>>
>>>>
>>>> Could rename("info.tmp", "info") and rename("xxx-brick.tmp", "xxx-brick") be running in two threads?
>>>>
>>>> Thanks,
>>>> Xin
>>>>
>>>>
>>>> At 2016-11-11 15:27:03, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On Fri, Nov 11, 2016 at 12:38 PM, songxin <songxin_1980 at 126.com> wrote:
>>>>
>>>>
>>>> Hi Atin,
>>>> Thank you for your reply.
>>>>
>>>> As you said, the info file can only be changed in glusterd_store_volinfo(),
>>>> sequentially, because of the big lock.
>>>>
>>>> I have found the similar issue that you mentioned:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487
>>>>
>>>>
>>>> Great, so this is what I was actually trying to refer to in my first
>>>> email, where I said I had seen a similar issue. Have you had a chance to
>>>> look at https://bugzilla.redhat.com/show_bug.cgi?id=1308487#c4 ? But in
>>>> your case, did you try to bring down glusterd when there was an ongoing
>>>> commit happening?
>>>>
>>>>
>>>>
>>>> You said that "This issue is hit at part of the negative testing where
>>>> while gluster volume set was executed at the same point of time glusterd in
>>>> another instance was brought down. In the faulty node we could see
>>>> /var/lib/glusterd/vols/<volname>info file been empty whereas the
>>>> info.tmp file has the correct contents." in comment.
>>>>
>>>> I have two questions for you.
>>>>
>>>> 1. Could you reproduce this issue by running gluster volume set while glusterd was being brought down?
>>>> 2. Are you certain that this issue is caused by the rename being interrupted in the kernel?
>>>>
>>>> In my case two files, info and 10.32.1.144.-opt-lvmdir-c2-brick, are both empty.
>>>> But in my view only one rename can be running at a time because of the big lock.
>>>> Why are two files empty?
>>>>
>>>>
>>>> Could rename("info.tmp", "info") and rename("xxx-brick.tmp", "xxx-brick") be running in two threads?
>>>>
>>>> Thanks,
>>>> Xin
>>>>
>>>>
>>>>
>>>>
>>>> At 2016-11-11 14:36:40, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On Fri, Nov 11, 2016 at 8:33 AM, songxin <songxin_1980 at 126.com> wrote:
>>>>
>>>> Hi Atin,
>>>>
>>>> Thank you for your reply.
>>>> I have two questions for you.
>>>>
>>>> 1. Are the two files info and info.tmp created or changed only in the
>>>> function glusterd_store_volinfo()? I did not find any other place where
>>>> the two files are changed.
>>>>
>>>>
>>>> If we are talking about the info file of a volume then yes, the mentioned
>>>> function takes care of it.
>>>>
>>>>
>>>> 2. I found that glusterd_store_volinfo() is called from many places in
>>>> glusterd. Is there a thread synchronization problem? If so, one thread might
>>>> open the same info.tmp file with the O_TRUNC flag while another thread is
>>>> writing info.tmp. Could this happen?
>>>>
>>>>
>>>> In glusterd, threads are protected by the big lock, and I don't see a
>>>> possibility (theoretically) of having two glusterd_store_volinfo () calls
>>>> at a given point in time.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Xin
>>>>
>>>>
>>>> At 2016-11-10 21:41:06, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>>>
>>>> Did you run out of disk space by any chance? AFAIK, the code writes the
>>>> new content to a .tmp file and renames it back to the original file. In
>>>> case of a disk space issue I would expect both files to be of non-zero
>>>> size. Having said that, I vaguely remember a similar issue (in the form
>>>> of a bug or an email) came up once but we couldn't reproduce it, so my
>>>> guess is that something is wrong with the atomic update here. I'd be glad
>>>> if you have a reproducer for it, and then we can dig into it further.
>>>>
>>>> On Thu, Nov 10, 2016 at 1:32 PM, songxin <songxin_1980 at 126.com> wrote:
>>>>
>>>> Hi,
>>>> When I started glusterd, some errors occurred.
>>>> The log follows.
>>>>
>>>> [2016-11-08 07:58:34.989365] I [MSGID: 100030] [glusterfsd.c:2318:main]
>>>> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6
>>>> (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
>>>> [2016-11-08 07:58:34.998356] I [MSGID: 106478] [glusterd.c:1350:init]
>>>> 0-management: Maximum allowed open file descriptors set to 65536
>>>> [2016-11-08 07:58:35.000667] I [MSGID: 106479] [glusterd.c:1399:init]
>>>> 0-management: Using /system/glusterd as working directory
>>>> [2016-11-08 07:58:35.024508] I [MSGID: 106514]
>>>> [glusterd-store.c:2075:glusterd_restore_op_version] 0-management:
>>>> Upgrade detected. Setting op-version to minimum : 1
>>>> *[2016-11-08 07:58:35.025356] E [MSGID: 106206]
>>>> [glusterd-store.c:2562:glusterd_store_update_volinfo] 0-management: Failed
>>>> to get next store iter *
>>>> *[2016-11-08 07:58:35.025401] E [MSGID: 106207]
>>>> [glusterd-store.c:2844:glusterd_store_retrieve_volume] 0-management: Failed
>>>> to update volinfo for c_glusterfs volume *
>>>> *[2016-11-08 07:58:35.025463] E [MSGID: 106201]
>>>> [glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management:
>>>> Unable to restore volume: c_glusterfs *
>>>> *[2016-11-08 07:58:35.025544] E [MSGID: 101019]
>>>> [xlator.c:428:xlator_init] 0-management: Initialization of volume
>>>> 'management' failed, review your volfile again *
>>>> *[2016-11-08 07:58:35.025582] E [graph.c:322:glusterfs_graph_init]
>>>> 0-management: initializing translator failed *
>>>> *[2016-11-08 07:58:35.025629] E [graph.c:661:glusterfs_graph_activate]
>>>> 0-graph: init failed *
>>>> [2016-11-08 07:58:35.026109] W [glusterfsd.c:1236:cleanup_and_exit]
>>>> (-->/usr/sbin/glusterd(glusterfs_volumes_init-0x1b260) [0x1000a718]
>>>> -->/usr/sbin/glusterd(glusterfs_process_volfp-0x1b3b8) [0x1000a5a8]
>>>> -->/usr/sbin/glusterd(cleanup_and_exit-0x1c02c) [0x100098bc] ) 0-:
>>>> received signum (0), shutting down
>>>>
>>>>
>>>> Then I found that the size of vols/volume_name/info is 0. That causes
>>>> glusterd to shut down.
>>>> But I found that vols/volume_name/info.tmp is not 0.
>>>> I also found that a brick file, vols/volume_name/bricks/xxxx.brick,
>>>> is 0, but vols/volume_name/bricks/xxxx.brick.tmp is not 0.
>>>>
>>>> I read the code of the function glusterd_store_volinfo () in
>>>> glusterd-store.c.
>>>> I know that info.tmp is renamed to info in the function
>>>> glusterd_store_volume_atomic_update().
>>>>
>>>> But my question is why the info file is 0 bytes while info.tmp is not.
>>>>
>>>> Thanks,
>>>> Xin
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ~ Atin (atinm)

-- 

~ Atin (atinm)