[Gluster-devel] Bitrot stub forget()

Thu Feb 18 04:41:18 UTC 2016

I will take care of putting up the patch upstream.

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Venky Shankar" <vshankar at redhat.com>
> To: "FNU Raghavendra Manjunath" <rabhat at redhat.com>
> Cc: "Gluster Devel" <gluster-devel at gluster.org>, "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> Sent: Thursday, February 18, 2016 9:44:27 AM
> Subject: Re: Bitrot stub forget()
> 
> On Tue, Feb 16, 2016 at 11:38:24PM -0500, FNU Raghavendra Manjunath wrote:
> > Venky,
> > 
> > Yes. You are right. We should not remove the quarantine entry in forget.
> > 
> > We have to remove it upon getting -ve lookups in bit-rot-stub and upon
> > getting an unlink.
> > 
> > I have attached a patch for it.
> > 
> > Unfortunately rfc.sh is failing for me with the below error.
> 
> Thanks for the patch. One of us will take care of putting it up for review.
> 
> > 
> > 
> > ssh: connect to host git.gluster.com port 22: Connection timed out
> > fatal: Could not read from remote repository.
> > 
> > Please make sure you have the correct access rights
> > and the repository exists."
> > 
> > 
> > Regards,
> > Raghavendra
> > 
> > 
> > On Tue, Feb 16, 2016 at 10:53 AM, Venky Shankar <vshankar at redhat.com>
> > wrote:
> > 
> > > Hey Raghu,
> > >
> > > Bitrot stub inode forget implementation (br_stub_forget()) deletes the
> > > bad
> > > object
> > > marker (under quarantine directory) if present. This looks incorrect as
> > > ->forget()
> > > can be trigerred when inode table LRU size exceeeds configured limit -
> > > check bug
> > > #1308961 which tracks this issue. I recall that protocol/server calls
> > > inode_forget()
> > > on negative lookup (that might not invoke ->forget() though) and that's
> > > the reason
> > > why br_stub_forget() has this code.
> > >
> > > So, would it make sense to purge bad object marker just in lookup()?
> > > There
> > > might be
> > > a need to do the same in unlink() in case the object was removed by the
> > > client.
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > >
> > >                 Venky
> > >
> 
> > From a0cc49172df24e263e0db25c53b57f58c19d2cab Mon Sep 17 00:00:00 2001
> > From: Raghavendra Bhat <raghavendra at redhat.com>
> > Date: Tue, 16 Feb 2016 20:22:36 -0500
> > Subject: [PATCH] features/bitrot: do not remove the quarantine handle in
> >  forget
> > 
> > If an object is marked as bad, then an entry is corresponding to the
> > bad object is created in the .glusterfs/quarantine directory to help
> > scrub status. The entry name is the gfid of the corrupted object.
> > The quarantine handle is removed in below 2 cases.
> > 
> > 1) When protocol/server revceives the -ve lookup on an entry whose inode
> >    is there in the inode table (it can happen when the corrupted object
> >    is deleted directly from the backend for recovery purpose) it sends a
> >    forget on the inode and bit-rot-stub removes the quarantine handle in
> >    upon getting the forget.
> >    refer to the below commit
> >    f853ed9c61bf65cb39f859470a8ffe8973818868:
> >    http://review.gluster.org/12743)
> > 
> > 2) When bit-rot-stub itself realizes that lookup on a corrupted object
> >    has failed with ENOENT.
> > 
> > But with step1, there is a problem when the bit-rot-stub receives forget
> > due to lru limit exceeding in the inode table. In such cases, though the
> > corrupted object is not deleted (either from the mount point or from the
> > backend), the handle in the quarantine directory is removed and that object
> > is not shown in the bad objects list in the scrub status command.
> > 
> > So it is better to follow only 2nd step (i.e. bit-rot-stub removing the
> > handle
> > from the quarantine directory in -ve lookups). Also the handle has to be
> > removed
> > when a corrupted object is unlinked from the mount point itself.
> > 
> > Change-Id: Ibc3bbaf4bc8a5f8986085e87b729ab912cbf8cf9
> > Signed-off-by: Raghavendra Bhat <raghavendra at redhat.com>
> > ---
> >  xlators/features/bit-rot/src/stub/bit-rot-stub.c | 103
> >  +++++++++++++++++++++--
> >  1 file changed, 95 insertions(+), 8 deletions(-)
> > 
> > diff --git a/xlators/features/bit-rot/src/stub/bit-rot-stub.c
> > b/xlators/features/bit-rot/src/stub/bit-rot-stub.c
> > index 9c0f622..920aa7c 100644
> > --- a/xlators/features/bit-rot/src/stub/bit-rot-stub.c
> > +++ b/xlators/features/bit-rot/src/stub/bit-rot-stub.c
> > @@ -2841,19 +2841,46 @@ unwind:
> >  
> >  /** {{{ */
> >  
> > -/* forget() */
> > +/* unlink() */
> >  
> >  int
> > -br_stub_forget (xlator_t *this, inode_t *inode)
> > +br_stub_unlink_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
> > +                   int32_t op_ret, int32_t op_errno, struct iatt
> > *preparent,
> > +                   struct iatt *postparent, dict_t *xdata)
> >  {
> > -        uint64_t ctx_addr = 0;
> > -        br_stub_inode_ctx_t *ctx = NULL;
> > +        br_stub_local_t     *local    = NULL;
> > +        inode_t             *inode    = NULL;
> > +        uint64_t             ctx_addr = 0;
> > +        br_stub_inode_ctx_t *ctx      = NULL;
> > +        int32_t              ret      = -1;
> >  
> > -        inode_ctx_del (inode, this, &ctx_addr);
> > -        if (!ctx_addr)
> > -                return 0;
> > +        if (op_ret < 0)
> > +                goto unwind;
> >  
> > -        ctx = (br_stub_inode_ctx_t *) (long) ctx_addr;
> > +        local = frame->local;
> > +        frame->local = NULL;
> > +
> > +        inode = local->u.context.inode;
> > +
> > +        ret = br_stub_get_inode_ctx (this, inode, &ctx_addr);
> > +        if (ret) {
> > +                /**
> > +                 * If the inode is bad AND context is not there, then
> > there
> > +                 * is a possibility of the gfid of the object being listed
> > +                 * in the quarantine directory and will be shown in the
> > +                 * bad objects list. So continuing with the fop with a
> > +                 * warning log. The entry from the quarantine directory
> > +                 * has to be removed manually. Its not a good idea to fail
> > +                 * the fop, as the object has already been deleted.
> > +                 */
> > +                gf_msg (this->name, GF_LOG_WARNING, 0,
> > +                        BRS_MSG_GET_INODE_CONTEXT_FAILED,
> > +                        "failed to get the context for the inode %s",
> > +                        uuid_utoa (inode->gfid));
> > +                goto unwind;
> > +        }
> > +
> > +        ctx = (br_stub_inode_ctx_t *)(long)ctx_addr;
> >  
> >          LOCK (&inode->lock);
> >          {
> > @@ -2868,6 +2895,65 @@ br_stub_forget (xlator_t *this, inode_t *inode)
> >          }
> >          UNLOCK (&inode->lock);
> >  
> > +unwind:
> > +        STACK_UNWIND_STRICT (unlink, frame, op_ret, op_errno, preparent,
> > postparent, xdata);
> > +        br_stub_cleanup_local (local);
> > +        br_stub_dealloc_local (local);
> > +        return 0;
> > +}
> > +
> > +int
> > +br_stub_unlink (call_frame_t *frame, xlator_t *this, loc_t *loc, int flag,
> > +                dict_t *xdata)
> > +{
> > +        br_stub_local_t *local = NULL;
> > +        int32_t          op_ret = -1;
> > +        int32_t          op_errno = 0;
> > +
> > +        local = br_stub_alloc_local (this);
> > +        if (!local) {
> > +                op_ret = -1;
> > +                op_errno = ENOMEM;
> > +                gf_msg (this->name, GF_LOG_ERROR, ENOMEM,
> > BRS_MSG_NO_MEMORY,
> > +                        "failed to allocate memory for local (path: %s,
> > gfid: %s)",
> > +                        loc->path, uuid_utoa (loc->inode->gfid));
> > +                goto unwind;
> > +        }
> > +
> > +        br_stub_fill_local (local, NULL, NULL, loc->inode,
> > +                            loc->inode->gfid,
> > +                            BR_STUB_NO_VERSIONING, 0);
> > +
> > +        frame->local = local;
> > +
> > +        STACK_WIND (frame, br_stub_unlink_cbk, FIRST_CHILD (this),
> > +                    FIRST_CHILD (this)->fops->unlink, loc, flag, xdata);
> > +        return 0;
> > +
> > +unwind:
> > +        STACK_UNWIND_STRICT (unlink, frame, op_ret, op_errno, NULL, NULL,
> > NULL);
> > +        return 0;
> > +}
> > +
> > +
> > +/** }}} */
> > +
> > +/** {{{ */
> > +
> > +/* forget() */
> > +
> > +int
> > +br_stub_forget (xlator_t *this, inode_t *inode)
> > +{
> > +        uint64_t ctx_addr = 0;
> > +        br_stub_inode_ctx_t *ctx = NULL;
> > +
> > +        inode_ctx_del (inode, this, &ctx_addr);
> > +        if (!ctx_addr)
> > +                return 0;
> > +
> > +        ctx = (br_stub_inode_ctx_t *) (long) ctx_addr;
> > +
> >          GF_FREE (ctx);
> >  
> >          return 0;
> > @@ -3133,6 +3219,7 @@ struct xlator_fops fops = {
> >          .setxattr  = br_stub_setxattr,
> >          .opendir   = br_stub_opendir,
> >          .readdir   = br_stub_readdir,
> > +        .unlink    = br_stub_unlink,
> >  };
> >  
> >  struct xlator_cbks cbks = {
> > --
> > 2.5.0
> > 
> 
>