[Gluster-users] question about info and info.tmp

Atin Mukherjee amukherj at redhat.com
Fri Nov 25 07:36:29 UTC 2016


On Fri, Nov 25, 2016 at 12:06 PM, songxin <songxin_1980 at 126.com> wrote:

> Hi Atin,
> Do you mean that you already have a workaround,
> or will it take time to design one?
>
> If you have a workaround now, could you share it with me?
>

If you end up with a 0-byte info file, you need to copy the same info file
from another node, put it in place, and restart glusterd.
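
As a rough aid for spotting this state before restarting, the check below
classifies a store file as missing, empty, or non-empty by size alone. It is
only a sketch: check_store_file and the store_state enum are hypothetical
names, not part of glusterd, and a non-empty file is not proof that its
contents are valid.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Possible states of a store file, as far as this sketch can tell. */
enum store_state { STORE_OK, STORE_EMPTY, STORE_MISSING };

/*
 * Hypothetical helper: classify a file by existence and size only.
 * The contents are not parsed or validated.
 */
enum store_state
check_store_file (const char *path)
{
        struct stat st;

        assert (path != NULL);
        if (stat (path, &st) != 0)
                return STORE_MISSING;
        return (st.st_size == 0) ? STORE_EMPTY : STORE_OK;
}
```

Calling it on the volume's info and info.tmp files on the faulty node would
show whether the node is in the empty-info state discussed in this thread.
Copying info from a healthy peer and restarting glusterd remain manual steps.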


>
> Thanks,
> Xin,
>
>
>
>
>
> At 2016-11-24 19:12:07, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>
> Xin - I appreciate your patience. I'd need some more time to pick this
> item up from my backlog. I believe we have a workaround applicable here too.
>
> On Thu, 24 Nov 2016 at 14:24, songxin <songxin_1980 at 126.com> wrote:
>
>>
>>
>>
>> Hi Atin,
>> GlusterFS is used in my project, and our test team found this issue.
>> So I want to know whether you plan to fix it.
>> If you do, I will wait for your fix, because your method should be better
>> than mine.
>>
>> Thanks,
>> Xin
>>
>>
>> At 2016-11-21 10:00:36, "Atin Mukherjee" <atin.mukherjee83 at gmail.com> wrote:
>>
>> Hi Xin,
>>
>> I haven't had a chance to look into it yet. The delete stale volume
>> function is in place to wipe the configuration data of a volume that has
>> been deleted from the cluster. However, we need to revisit this code to see
>> whether the function is still needed, given that we recently added a
>> validation that fails a delete request if one of the glusterd instances is
>> down. I'll get back to you on this.
>>
>> On Mon, 21 Nov 2016 at 07:24, songxin <songxin_1980 at 126.com> wrote:
>>
>> Hi Atin,
>> Thank you for your support.
>>
>> Are there any conclusions about this issue yet?
>>
>> Thanks,
>> Xin
>>
>>
>>
>>
>>
>> At 2016-11-16 20:59:05, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>
>>
>>
>> On Tue, Nov 15, 2016 at 1:53 PM, songxin <songxin_1980 at 126.com> wrote:
>>
>> ok, thank you.
>>
>>
>>
>>
>> At 2016-11-15 16:12:34, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>
>>
>>
>> On Tue, Nov 15, 2016 at 12:47 PM, songxin <songxin_1980 at 126.com> wrote:
>>
>>
>> Hi Atin,
>>
>> I think the root cause is in the function glusterd_import_friend_volume,
>> shown below.
>>
>> int32_t
>> glusterd_import_friend_volume (dict_t *peer_data, size_t count)
>> {
>> ...
>>         ret = glusterd_volinfo_find (new_volinfo->volname, &old_volinfo);
>>         if (0 == ret) {
>>                 (void) gd_check_and_update_rebalance_info (old_volinfo,
>>                                                            new_volinfo);
>>                 /* Removes the existing info and brick files. */
>>                 (void) glusterd_delete_stale_volume (old_volinfo,
>>                                                      new_volinfo);
>>         }
>> ...
>>         /* Recreates them; info stays empty until info.tmp is renamed. */
>>         ret = glusterd_store_volinfo (new_volinfo,
>>                                       GLUSTERD_VOLINFO_VER_AC_NONE);
>>         if (ret) {
>>                 gf_msg (this->name, GF_LOG_ERROR, 0,
>>                         GD_MSG_VOLINFO_STORE_FAIL, "Failed to store "
>>                         "volinfo for volume %s", new_volinfo->volname);
>>                 goto out;
>>         }
>> ...
>> }
>>
>> glusterd_delete_stale_volume removes info and bricks/*, and
>> glusterd_store_volinfo then creates new ones.
>> But if glusterd is killed before the rename, the info file is left empty.
>>
>> glusterd will then fail to start the next time, because the info file is
>> empty.
>>
>> Any idea, Atin?
>>
>>
>> Give me some time and I will check it out. Reading this analysis, it looks
>> quite possible: if a volume is changed while glusterd is down on node A,
>> then when that glusterd comes back up we update the volinfo during the peer
>> handshake, and glusterd may go down once again during that time. I'll
>> confirm it by tomorrow.
>>
>>
>> I checked the code, and it does look like you have the right RCA for the
>> issue you simulated with those two scripts. However, this can also happen
>> when you create a fresh volume: if glusterd goes down while writing the
>> content into the store, before renaming the info.tmp file, you end up in
>> the same situation.
>>
>> I really need to think through whether this can be fixed. Suggestions are
>> always appreciated.
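
One direction that might be worth weighing, sketched here under assumptions
rather than taken from the glusterd code: never remove or pre-create the
target, write the new contents to a temporary file, flush it to disk, and
only then rename it over the target. atomic_replace is a hypothetical name,
and the ".tmp" suffix simply mirrors the info.tmp naming in this thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace path with data without ever leaving path
 * empty. Writes "<path>.tmp", fsyncs it, then renames it over path.
 * Returns 0 on success, -1 on error.
 */
int
atomic_replace (const char *path, const char *data)
{
        char   tmp[4096];
        size_t len;
        int    fd;

        assert (path != NULL && data != NULL);
        if (snprintf (tmp, sizeof (tmp), "%s.tmp", path) >= (int) sizeof (tmp))
                return -1;

        fd = open (tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
                return -1;

        /* A short write is treated as an error to keep the sketch small. */
        len = strlen (data);
        if (write (fd, data, len) != (ssize_t) len || fsync (fd) != 0) {
                close (fd);
                unlink (tmp);
                return -1;
        }
        if (close (fd) != 0) {
                unlink (tmp);
                return -1;
        }

        /* The old file, if any, stays in place until this call succeeds. */
        if (rename (tmp, path) != 0) {
                unlink (tmp);
                return -1;
        }
        return 0;
}
```

With this ordering, a process killed at any point leaves either the old file
or the complete new one at path, never a 0-byte file. Surviving a power loss
would additionally need an fsync on the containing directory, which the
sketch leaves out.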
>>
>>
>>
>>
>> BTW, excellent work Xin!
>>
>>
>> Thanks,
>> Xin
>>
>>
>> At 2016-11-15 12:07:05, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>
>>
>>
>> On Tue, Nov 15, 2016 at 8:58 AM, songxin <songxin_1980 at 126.com> wrote:
>>
>> Hi Atin,
>> I have some clues about this issue.
>> I could reproduce it using the script mentioned in
>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487 .
>>
>>
>> I really appreciate your help in trying to nail down this issue. I am
>> going through your email and the code to figure out the possible cause,
>> but unfortunately I don't see any script attached to the bug. Could you
>> please cross-check?
>>
>>
>>
>> I added some debug prints, shown below, in glusterd-store.c and found
>> that /var/lib/glusterd/vols/xxx/info and
>> /var/lib/glusterd/vols/xxx/bricks/* are removed,
>> but the other files in /var/lib/glusterd/vols/xxx/ are not removed.
>>
>> int32_t
>> glusterd_store_volinfo (glusterd_volinfo_t *volinfo,
>>                         glusterd_volinfo_ver_ac_t ac)
>> {
>>         int32_t                 ret = -1;
>>         struct stat             buf = {0,};  /* for the debug stat() below */
>>
>>         GF_ASSERT (volinfo);
>>
>>         /* Debug print: does info exist, and what are its size and inode? */
>>         ret = access ("/var/lib/glusterd/vols/gv0/info", F_OK);
>>         if (ret < 0)
>>         {
>>                 gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
>>                         "info does not exist (%d)", errno);
>>         }
>>         else
>>         {
>>                 ret = stat ("/var/lib/glusterd/vols/gv0/info", &buf);
>>                 if (ret < 0)
>>                 {
>>                         gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
>>                                 "stat info error");
>>                 }
>>                 else
>>                 {
>>                         gf_msg (THIS->name, GF_LOG_ERROR, 0, 0,
>>                                 "info size is %lu, inode num is %lu",
>>                                 (unsigned long) buf.st_size,
>>                                 (unsigned long) buf.st_ino);
>>                 }
>>         }
>>
>>         glusterd_perform_volinfo_version_action (volinfo, ac);
>>         ret = glusterd_store_create_volume_dir (volinfo);
>>         if (ret)
>>                 goto out;
>>
>> ...
>> }
>>
>> So it is easy to understand why info or 10.32.1.144.-opt-lvmdir-c2-brick
>> is sometimes empty.
>> The info file does not exist, so it is created by "fd = open (path,
>> O_RDWR | O_CREAT | O_APPEND, 0600);" in the function gf_store_handle_new.
>> The info file stays empty until the rename,
>> so it is left empty if glusterd shuts down before the rename.
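
The sequence described above can be mimicked outside glusterd. The sketch
below is an illustration under assumptions, not the glusterd code:
simulate_store and file_size are hypothetical names, the O_TRUNC flag on the
temporary file is assumed, and the crash_before_rename flag stands in for
glusterd being killed at that point.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the size of path in bytes, or -1 if it cannot be stat'ed. */
long
file_size (const char *path)
{
        struct stat st;

        return (stat (path, &st) == 0) ? (long) st.st_size : -1L;
}

/*
 * Hypothetical stand-in for the store sequence, run inside directory dir:
 *   1. open "info" with O_CREAT (creates a 0-byte file if it is absent),
 *   2. write the new contents to "info.tmp",
 *   3. rename "info.tmp" to "info", unless crash_before_rename is set.
 * Returns 0 on success, -1 on error.
 */
int
simulate_store (const char *dir, const char *data, int crash_before_rename)
{
        char   info[4096], tmp[4096];
        int    fd, tmp_fd;
        size_t len;

        assert (dir != NULL && data != NULL);
        snprintf (info, sizeof (info), "%s/info", dir);
        snprintf (tmp, sizeof (tmp), "%s/info.tmp", dir);

        fd = open (info, O_RDWR | O_CREAT | O_APPEND, 0600);
        if (fd < 0)
                return -1;
        close (fd);

        tmp_fd = open (tmp, O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (tmp_fd < 0)
                return -1;
        len = strlen (data);
        if (write (tmp_fd, data, len) != (ssize_t) len) {
                close (tmp_fd);
                return -1;
        }
        close (tmp_fd);

        if (crash_before_rename)
                return 0;       /* stand-in for glusterd being killed here */
        return rename (tmp, info);
}
```

Run in an empty directory with crash_before_rename set, this leaves info at
0 bytes and info.tmp with the full contents, which matches the state seen on
the faulty node.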
>>
>>
>>
>> My questions are as follows.
>> 1. I did not find the point where the info file is removed. Could you tell
>> me where info and bricks/* are removed?
>> 2. Why are info and bricks/* removed, while the other files in
>> /var/lib/glusterd/vols/xxx/ are not?
>>
>>
>> AFAIK, we never delete the info file, and hence this file is opened with
>> the O_APPEND flag. As I said, I will go back and cross-check the code once
>> again.
>>
>>
>>
>>
>> Thanks,
>> Xin
>>
>>
>> At 2016-11-11 20:34:05, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>
>>
>>
>> On Fri, Nov 11, 2016 at 4:00 PM, songxin <songxin_1980 at 126.com> wrote:
>>
>> Hi Atin,
>>
>> Thank you for your support.
>> I look forward to your reply.
>>
>> By the way, are you sure that the empty info file is caused by the rename
>> being interrupted in the kernel?
>>
>>
>> As per my RCA on that bug, it looked that way.
>>
>>
>>
>> Thanks,
>> Xin
>>
>> At 2016-11-11 15:49:02, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>
>>
>>
>> On Fri, Nov 11, 2016 at 1:15 PM, songxin <songxin_1980 at 126.com> wrote:
>>
>> Hi Atin,
>> Thank you for your reply.
>> Actually, it is very difficult to reproduce, because I don't know whether
>> a commit was in progress at the time. It was just a coincidence.
>> But I want to be sure of the root cause.
>>
>>
>> I'll give it another try to see if this situation can be
>> simulated/reproduced, and will keep you posted.
>>
>>
>>
>> So I would be grateful if you could answer my questions below.
>>
>> You said that "This issue is hit at part of the negative testing where
>> while gluster volume set was executed at the same point of time glusterd in
>> another instance was brought down. In the faulty node we could see
>> /var/lib/glusterd/vols/<volname>info file been empty whereas the
>> info.tmp file has the correct contents." in comment.
>>
>> I have two questions for you.
>>
>> 1. Could you reproduce this issue by running gluster volume set while glusterd was being brought down?
>> 2. Are you certain that this issue is caused by the rename being interrupted in the kernel?
>>
>> In my case two files, info and 10.32.1.144.-opt-lvmdir-c2-brick, are both empty.
>> But in my view only one rename can be running at a time because of the big lock.
>> Why are two files empty?
>>
>>
>> Could rename("info.tmp", "info") and rename("xxx-brick.tmp", "xxx-brick") be running in two threads?
>>
>> Thanks,
>> Xin
>>
>>
>> At 2016-11-11 15:27:03, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>
>>
>>
>> On Fri, Nov 11, 2016 at 12:38 PM, songxin <songxin_1980 at 126.com> wrote:
>>
>>
>> Hi Atin,
>> Thank you for your reply.
>>
>> As you said, the info file can only be changed sequentially in
>> glusterd_store_volinfo() because of the big lock.
>>
>> I have found the similar issue that you mentioned:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487
>>
>>
>> Great, so this is what I was trying to refer to in my first email when I
>> said I had seen a similar issue. Have you had a chance to look at
>> https://bugzilla.redhat.com/show_bug.cgi?id=1308487#c4 ? But in your
>> case, did you try to bring down glusterd while a commit was in progress?
>>
>>
>>
>> You said that "This issue is hit at part of the negative testing where
>> while gluster volume set was executed at the same point of time glusterd in
>> another instance was brought down. In the faulty node we could see
>> /var/lib/glusterd/vols/<volname>info file been empty whereas the
>> info.tmp file has the correct contents." in comment.
>>
>> I have two questions for you.
>>
>> 1. Could you reproduce this issue by running gluster volume set while glusterd was being brought down?
>> 2. Are you certain that this issue is caused by the rename being interrupted in the kernel?
>>
>> In my case two files, info and 10.32.1.144.-opt-lvmdir-c2-brick, are both empty.
>> But in my view only one rename can be running at a time because of the big lock.
>> Why are two files empty?
>>
>>
>> Could rename("info.tmp", "info") and rename("xxx-brick.tmp", "xxx-brick") be running in two threads?
>>
>> Thanks,
>> Xin
>>
>>
>>
>>
>> At 2016-11-11 14:36:40, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>
>>
>>
>> On Fri, Nov 11, 2016 at 8:33 AM, songxin <songxin_1980 at 126.com> wrote:
>>
>> Hi Atin,
>>
>> Thank you for your reply.
>> I have two questions for you.
>>
>> 1. Are the two files info and info.tmp created or changed only in the
>> function glusterd_store_volinfo()? I did not find any other place where
>> the two files are changed.
>>
>>
>> If we are talking about the volume's info file, then yes, the mentioned
>> function takes care of it.
>>
>>
>> 2. I found that glusterd_store_volinfo() is called from many places in
>> glusterd. Is there a thread synchronization problem? If so, one thread
>> might open the same info.tmp file with the O_TRUNC flag while another
>> thread is writing to it. Could this happen?
>>
>>
>> In glusterd, threads are protected by the big lock, and I don't see a
>> possibility (theoretically) of two glusterd_store_volinfo () calls running
>> at the same point in time.
>>
>>
>>
>> Thanks,
>> Xin
>>
>>
>> At 2016-11-10 21:41:06, "Atin Mukherjee" <amukherj at redhat.com> wrote:
>>
>> Did you run out of disk space by any chance? AFAIK, the code writes the
>> new contents to a .tmp file and renames it back to the original file. In
>> case of a disk space issue I would expect both files to be of non-zero
>> size. Having said that, I vaguely remember a similar issue (in the form of
>> a bug or an email) coming up once, but we couldn't reproduce it, so my
>> guess is that something is wrong with the atomic update here. I'll be glad
>> if you have a reproducer for it, and then we can dig into it further.
>>
>> On Thu, Nov 10, 2016 at 1:32 PM, songxin <songxin_1980 at 126.com> wrote:
>>
>> Hi,
>> When I start glusterd, some errors occur.
>> The log is as follows.
>>
>> [2016-11-08 07:58:34.989365] I [MSGID: 100030] [glusterfsd.c:2318:main]
>> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6
>> (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
>> [2016-11-08 07:58:34.998356] I [MSGID: 106478] [glusterd.c:1350:init]
>> 0-management: Maximum allowed open file descriptors set to 65536
>> [2016-11-08 07:58:35.000667] I [MSGID: 106479] [glusterd.c:1399:init]
>> 0-management: Using /system/glusterd as working directory
>> [2016-11-08 07:58:35.024508] I [MSGID: 106514] [glusterd-store.c:2075:glusterd_restore_op_version]
>> 0-management: Upgrade detected. Setting op-version to minimum : 1
>> *[2016-11-08 07:58:35.025356] E [MSGID: 106206]
>> [glusterd-store.c:2562:glusterd_store_update_volinfo] 0-management: Failed
>> to get next store iter *
>> *[2016-11-08 07:58:35.025401] E [MSGID: 106207]
>> [glusterd-store.c:2844:glusterd_store_retrieve_volume] 0-management: Failed
>> to update volinfo for c_glusterfs volume *
>> *[2016-11-08 07:58:35.025463] E [MSGID: 106201]
>> [glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management:
>> Unable to restore volume: c_glusterfs *
>> *[2016-11-08 07:58:35.025544] E [MSGID: 101019]
>> [xlator.c:428:xlator_init] 0-management: Initialization of volume
>> 'management' failed, review your volfile again *
>> *[2016-11-08 07:58:35.025582] E [graph.c:322:glusterfs_graph_init]
>> 0-management: initializing translator failed *
>> *[2016-11-08 07:58:35.025629] E [graph.c:661:glusterfs_graph_activate]
>> 0-graph: init failed *
>> [2016-11-08 07:58:35.026109] W [glusterfsd.c:1236:cleanup_and_exit]
>> (-->/usr/sbin/glusterd(glusterfs_volumes_init-0x1b260) [0x1000a718]
>> -->/usr/sbin/glusterd(glusterfs_process_volfp-0x1b3b8) [0x1000a5a8]
>> -->/usr/sbin/glusterd(cleanup_and_exit-0x1c02c) [0x100098bc] ) 0-:
>> received signum (0), shutting down
>>
>>
>> I then found that the size of vols/volume_name/info is 0, which causes
>> glusterd to shut down.
>> But the size of vols/volume_name/info.tmp is not 0.
>> I also found that the brick file vols/volume_name/bricks/xxxx.brick
>> has size 0, while vols/volume_name/bricks/xxxx.brick.tmp does not.
>>
>> I read the code of the function glusterd_store_volinfo () in
>> glusterd-store.c. I know that info.tmp is renamed to info in the function
>> glusterd_store_volume_atomic_update().
>>
>> My question is: why is the info file empty while info.tmp is not?
>>
>>
>> Thanks,
>> Xin
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>> --
>>
>> ~ Atin (atinm)
>> --
>> --Atin
>>
>> --
> - Atin (atinm)
>
>
>
>
>



-- 

~ Atin (atinm)