[Gluster-devel] gfid and volume-id extended attributes lost
Pranith Kumar Karampuri
pkarampu at redhat.com
Sat Jul 8 01:36:08 UTC 2017
Ram,
      As per the code, self-heal is the only candidate that *can* do it.
Could you check the logs of the self-heal daemon and of the mount to see
whether there are any metadata heals on root?
+Sanoj
Sanoj,
Is there any systemtap script we can use to detect which process is
removing these xattrs?
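
Something along these lines might be a starting point (a rough, untested
sketch: it probes the common VFS entry point, needs kernel debuginfo
installed, and vfs_removexattr's signature may differ on other kernel
versions):

    #!/usr/bin/stap
    # Report any process that removes the gfid or volume-id xattrs.
    probe kernel.function("vfs_removexattr")
    {
        xattr = kernel_string($name)
        if (xattr == "trusted.gfid" ||
            xattr == "trusted.glusterfs.volume-id")
            printf("%s (pid %d, ppid %d) removing %s from %s\n",
                   execname(), pid(), ppid(), xattr,
                   kernel_string($dentry->d_name->name))
    }

Run it on each brick server, e.g. "stap -v trace-xattr.stap" (the file
name is just an example), and leave it running until the xattrs disappear
again.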
On Sat, Jul 8, 2017 at 2:58 AM, Ankireddypalle Reddy <areddy at commvault.com>
wrote:
> We lost the attributes on all the bricks on servers glusterfs2 and
> glusterfs3 again.
>
> [root@glusterfs2 Log_Files]# gluster volume info
>
> Volume Name: StoragePool
> Type: Distributed-Disperse
> Volume ID: 149e976f-4e21-451c-bf0f-f5691208531f
> Status: Started
> Number of Bricks: 20 x (2 + 1) = 60
> Transport-type: tcp
> Bricks:
> Brick1: glusterfs1sds:/ws/disk1/ws_brick
> Brick2: glusterfs2sds:/ws/disk1/ws_brick
> Brick3: glusterfs3sds:/ws/disk1/ws_brick
> Brick4: glusterfs1sds:/ws/disk2/ws_brick
> Brick5: glusterfs2sds:/ws/disk2/ws_brick
> Brick6: glusterfs3sds:/ws/disk2/ws_brick
> Brick7: glusterfs1sds:/ws/disk3/ws_brick
> Brick8: glusterfs2sds:/ws/disk3/ws_brick
> Brick9: glusterfs3sds:/ws/disk3/ws_brick
> Brick10: glusterfs1sds:/ws/disk4/ws_brick
> Brick11: glusterfs2sds:/ws/disk4/ws_brick
> Brick12: glusterfs3sds:/ws/disk4/ws_brick
> Brick13: glusterfs1sds:/ws/disk5/ws_brick
> Brick14: glusterfs2sds:/ws/disk5/ws_brick
> Brick15: glusterfs3sds:/ws/disk5/ws_brick
> Brick16: glusterfs1sds:/ws/disk6/ws_brick
> Brick17: glusterfs2sds:/ws/disk6/ws_brick
> Brick18: glusterfs3sds:/ws/disk6/ws_brick
> Brick19: glusterfs1sds:/ws/disk7/ws_brick
> Brick20: glusterfs2sds:/ws/disk7/ws_brick
> Brick21: glusterfs3sds:/ws/disk7/ws_brick
> Brick22: glusterfs1sds:/ws/disk8/ws_brick
> Brick23: glusterfs2sds:/ws/disk8/ws_brick
> Brick24: glusterfs3sds:/ws/disk8/ws_brick
> Brick25: glusterfs4sds.commvault.com:/ws/disk1/ws_brick
> Brick26: glusterfs5sds.commvault.com:/ws/disk1/ws_brick
> Brick27: glusterfs6sds.commvault.com:/ws/disk1/ws_brick
> Brick28: glusterfs4sds.commvault.com:/ws/disk10/ws_brick
> Brick29: glusterfs5sds.commvault.com:/ws/disk10/ws_brick
> Brick30: glusterfs6sds.commvault.com:/ws/disk10/ws_brick
> Brick31: glusterfs4sds.commvault.com:/ws/disk11/ws_brick
> Brick32: glusterfs5sds.commvault.com:/ws/disk11/ws_brick
> Brick33: glusterfs6sds.commvault.com:/ws/disk11/ws_brick
> Brick34: glusterfs4sds.commvault.com:/ws/disk12/ws_brick
> Brick35: glusterfs5sds.commvault.com:/ws/disk12/ws_brick
> Brick36: glusterfs6sds.commvault.com:/ws/disk12/ws_brick
> Brick37: glusterfs4sds.commvault.com:/ws/disk2/ws_brick
> Brick38: glusterfs5sds.commvault.com:/ws/disk2/ws_brick
> Brick39: glusterfs6sds.commvault.com:/ws/disk2/ws_brick
> Brick40: glusterfs4sds.commvault.com:/ws/disk3/ws_brick
> Brick41: glusterfs5sds.commvault.com:/ws/disk3/ws_brick
> Brick42: glusterfs6sds.commvault.com:/ws/disk3/ws_brick
> Brick43: glusterfs4sds.commvault.com:/ws/disk4/ws_brick
> Brick44: glusterfs5sds.commvault.com:/ws/disk4/ws_brick
> Brick45: glusterfs6sds.commvault.com:/ws/disk4/ws_brick
> Brick46: glusterfs4sds.commvault.com:/ws/disk5/ws_brick
> Brick47: glusterfs5sds.commvault.com:/ws/disk5/ws_brick
> Brick48: glusterfs6sds.commvault.com:/ws/disk5/ws_brick
> Brick49: glusterfs4sds.commvault.com:/ws/disk6/ws_brick
> Brick50: glusterfs5sds.commvault.com:/ws/disk6/ws_brick
> Brick51: glusterfs6sds.commvault.com:/ws/disk6/ws_brick
> Brick52: glusterfs4sds.commvault.com:/ws/disk7/ws_brick
> Brick53: glusterfs5sds.commvault.com:/ws/disk7/ws_brick
> Brick54: glusterfs6sds.commvault.com:/ws/disk7/ws_brick
> Brick55: glusterfs4sds.commvault.com:/ws/disk8/ws_brick
> Brick56: glusterfs5sds.commvault.com:/ws/disk8/ws_brick
> Brick57: glusterfs6sds.commvault.com:/ws/disk8/ws_brick
> Brick58: glusterfs4sds.commvault.com:/ws/disk9/ws_brick
> Brick59: glusterfs5sds.commvault.com:/ws/disk9/ws_brick
> Brick60: glusterfs6sds.commvault.com:/ws/disk9/ws_brick
> Options Reconfigured:
> performance.readdir-ahead: on
> diagnostics.client-log-level: INFO
> auth.allow: glusterfs1sds,glusterfs2sds,glusterfs3sds,glusterfs4sds.commvault.com,glusterfs5sds.commvault.com,glusterfs6sds.commvault.com
>
> Thanks and Regards,
>
> Ram
>
> *From:* Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
> *Sent:* Friday, July 07, 2017 12:15 PM
>
> *To:* Ankireddypalle Reddy
> *Cc:* Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
>
>
>
>
>
>
> On Fri, Jul 7, 2017 at 9:25 PM, Ankireddypalle Reddy <areddy at commvault.com>
> wrote:
>
> 3.7.19
>
>
>
> These are the only callers of removexattr, and of these only
> _posix_remove_xattr has the potential to remove the keys, because
> posix_removexattr already makes sure that the name is not
> gfid/volume-id. And surprise surprise, _posix_remove_xattr happens only
> in the healing code of afr/ec, and that can only happen if the source
> brick doesn't have the gfid, which doesn't match the situation you
> described.
>
> #  line  filename <<context>> / statement
> 1  1234  xlators/mgmt/glusterd/src/glusterd-quota.c <<glusterd_remove_quota_limit>>
>          ret = sys_lremovexattr (abspath, QUOTA_LIMIT_KEY);
> 2  1243  xlators/mgmt/glusterd/src/glusterd-quota.c <<glusterd_remove_quota_limit>>
>          ret = sys_lremovexattr (abspath, QUOTA_LIMIT_OBJECTS_KEY);
> 3  6102  xlators/mgmt/glusterd/src/glusterd-utils.c <<glusterd_check_and_set_brick_xattr>>
>          sys_lremovexattr (path, "trusted.glusterfs.test");
> 4    80  xlators/storage/posix/src/posix-handle.h <<REMOVE_PGFID_XATTR>>
>          op_ret = sys_lremovexattr (path, key); \
> 5  5026  xlators/storage/posix/src/posix.c <<_posix_remove_xattr>>
>          op_ret = sys_lremovexattr (filler->real_path, key);
> 6  5101  xlators/storage/posix/src/posix.c <<posix_removexattr>>
>          op_ret = sys_lremovexattr (real_path, name);
> 7  6811  xlators/storage/posix/src/posix.c <<init>>
>          sys_lremovexattr (dir_data->data, "trusted.glusterfs.test");
>
> So there are only two possibilities:
>
> 1) The source directory in ec/afr doesn't have the gfid
>
> 2) Something else removed these xattrs
>
> What is your volume info? Maybe that will give more clues.
>
>
>
> PS: sys_fremovexattr is called only from posix_fremovexattr(), so that
> doesn't seem to be the culprit either, as it also has checks to guard
> against gfid/volume-id removal.
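>
> In the meantime, to see the current state on a given brick root, something
> like this should work (run as root on the brick server; the brick path is
> just an example):
>
>     getfattr -d -m trusted -e hex /ws/disk1/ws_brick
>
> A healthy brick should list both trusted.gfid and
> trusted.glusterfs.volume-id in the output.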
>
>
>
> Thanks and Regards,
>
> Ram
>
> *From:* Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
> *Sent:* Friday, July 07, 2017 11:54 AM
>
>
> *To:* Ankireddypalle Reddy
> *Cc:* Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
>
>
>
>
>
>
> On Fri, Jul 7, 2017 at 9:20 PM, Ankireddypalle Reddy <areddy at commvault.com>
> wrote:
>
> Pranith,
>
>              Thanks for looking into the issue. The bricks were
> mounted after the reboot. One more thing I noticed: when the attributes
> were set manually while glusterd was up, they were lost again on starting
> the volume. We had to stop glusterd, set the attributes, and then start
> glusterd. After that the volume start succeeded.
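>
> Concretely, the fix amounted to something like the following on each
> affected brick (a reconstruction, not the exact commands; the volume-id
> hex is the Volume ID from gluster volume info with the dashes removed,
> and the root gfid is the fixed 00000000-0000-0000-0000-000000000001
> value):
>
>     systemctl stop glusterd
>     setfattr -n trusted.glusterfs.volume-id \
>              -v 0x149e976f4e21451cbf0ff5691208531f /ws/disk1/ws_brick
>     setfattr -n trusted.gfid \
>              -v 0x00000000000000000000000000000001 /ws/disk1/ws_brick
>     systemctl start glusterd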
>
>
>
> Which version is this?
>
>
>
>
>
> Thanks and Regards,
>
> Ram
>
>
>
> *From:* Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
> *Sent:* Friday, July 07, 2017 11:46 AM
> *To:* Ankireddypalle Reddy
> *Cc:* Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
>
>
> Did anything special happen on these two bricks? It can't happen in the
> I/O path:
> posix_removexattr() has:
> if (!strcmp (GFID_XATTR_KEY, name)) {
>         gf_msg (this->name, GF_LOG_WARNING, 0, P_MSG_XATTR_NOT_REMOVED,
>                 "Remove xattr called on gfid for file %s", real_path);
>         op_ret = -1;
>         goto out;
> }
>
> if (!strcmp (GF_XATTR_VOL_ID_KEY, name)) {
>         gf_msg (this->name, GF_LOG_WARNING, 0, P_MSG_XATTR_NOT_REMOVED,
>                 "Remove xattr called on volume-id for file %s",
>                 real_path);
>         op_ret = -1;
>         goto out;
> }
>
> As an aside, I noticed that op_errno is not set correctly here, but that
> is a separate issue. The point is that this can't happen in the I/O path,
> so self-heal/rebalance are off the hook.
>
> I also grepped for any removexattr of trusted.gfid from glusterd and
> didn't find any.
>
> One thing that used to happen is that, when machines reboot, the brick
> mounts sometimes wouldn't come up, and that leads to the absence of both
> trusted.gfid and volume-id. At the moment this is my wild guess.
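>
> A quick way to rule that out (a sketch; adjust the mount points to your
> layout) is to verify on each server that every brick's disk is actually
> mounted:
>
>     for d in /ws/disk*; do
>         mountpoint -q "$d" || echo "$d is NOT mounted"
>     done
>
> If a ws_brick directory sits on the root filesystem instead of its own
> mount, both xattrs would be missing exactly as you observed.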
>
>
>
>
>
> On Fri, Jul 7, 2017 at 8:39 PM, Ankireddypalle Reddy <areddy at commvault.com>
> wrote:
>
> Hi,
>
>          We faced an issue in production today. We had to stop the
> volume and reboot all the servers in the cluster. Once the servers
> rebooted, starting the volume failed because the following extended
> attributes were missing from all the bricks on 2 servers:
>
> 1) trusted.gfid
>
> 2) trusted.glusterfs.volume-id
>
>
>
> We had to set these extended attributes manually to start the volume. Are
> there any known issues like this?
>
>
>
> Thanks and Regards,
>
> Ram
>
> --
>
> Pranith
>
> --
>
> Pranith
>
> --
>
> Pranith
--
Pranith