[Gluster-devel] a link issue maybe introduced in a bug fix " Don't let NFS cache stat after writes"

Thu Jan 18 10:03:19 UTC 2018

On Thu, Jan 18, 2018 at 12:17 PM, Lian, George (NSB - CN/Hangzhou) <
george.lian at nokia-sbell.com> wrote:

> Hi,
>
> >>>I actually tried it with replica-2 and replica-3 and then distributed
> replica-2 before replying to the earlier mail. We can have a debugging
> session if you are okay with it.
>
>
>
> It is fine if you can’t reproduce the issue in your environment.
>
> I have attached the detailed reproduction log to the Bugzilla, FYI.
>
>
>
> But I am sorry, I may be OOO on Monday and Tuesday next week, so a debug
> session would work for me next Wednesday.
>

Cool, this works for me too. Send me a mail off-list once you are available
and we can figure out a way to get into a call and work on this.

>
>
>
>
> Pasting the detailed reproduction log here, FYI:
>
> *root at ubuntu:~# gluster peer probe ubuntu*
>
> *peer probe: success. Probe on localhost not needed*
>
> *root at ubuntu:~# gluster v create test replica 2 ubuntu:/home/gfs/b1 ubuntu:/home/gfs/b2 force*
>
> *volume create: test: success: please start the volume to access data*
>
> *root at ubuntu:~# gluster v start test*
>
> *volume start: test: success*
>
> *root at ubuntu:~# gluster v info test*
>
>
>
> *Volume Name: test*
>
> *Type: Replicate*
>
> *Volume ID: fef5fca3-81d9-46d3-8847-74cde6f701a5*
>
> *Status: Started*
>
> *Snapshot Count: 0*
>
> *Number of Bricks: 1 x 2 = 2*
>
> *Transport-type: tcp*
>
> *Bricks:*
>
> *Brick1: ubuntu:/home/gfs/b1*
>
> *Brick2: ubuntu:/home/gfs/b2*
>
> *Options Reconfigured:*
>
> *transport.address-family: inet*
>
> *nfs.disable: on*
>
> *performance.client-io-threads: off*
>
> *root at ubuntu:~# gluster v status*
>
> *Status of volume: test*
>
> *Gluster process                             TCP Port  RDMA Port  Online  Pid*
>
> *------------------------------------------------------------------------------*
>
> *Brick ubuntu:/home/gfs/b1                   49152     0          Y       7798*
>
> *Brick ubuntu:/home/gfs/b2                   49153     0          Y       7818*
>
> *Self-heal Daemon on localhost               N/A       N/A        Y       7839*
>
>
>
> *Task Status of Volume test*
>
>
> *------------------------------------------------------------------------------*
>
> *There are no active volume tasks*
>
>
>
>
>
> *root at ubuntu:~# gluster v set test cluster.consistent-metadata on*
>
> *volume set: success*
>
>
>
> *root at ubuntu:~# ls /mnt/test*
>
> *ls: cannot access '/mnt/test': No such file or directory*
>
> *root at ubuntu:~# mkdir -p /mnt/test*
>
> *root at ubuntu:~# mount -t glusterfs ubuntu:/test /mnt/test*
>
>
>
> *root at ubuntu:~# cd /mnt/test*
>
> *root at ubuntu:/mnt/test# echo "abc">aaa*
>
> *root at ubuntu:/mnt/test# cp aaa bbb;link bbb ccc*
>
>
>
> *root at ubuntu:/mnt/test# kill -9 7818*
>
> *root at ubuntu:/mnt/test# cp aaa ddd;link ddd eee*
>
> *link: cannot create link 'eee' to 'ddd': No such file or directory*
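
For anyone who wants to retry the failing step without the coreutils binaries, here is a minimal C sketch. It creates a file and hard-links it straight away, returning the negated errno of whichever call fails. The helper names write_small_file, create_then_link and link_count are made up for this illustration, and it writes a fresh file instead of copying aaa, which is an assumption that may or may not exercise the same path as cp.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create `path` with a few bytes; returns 0 or -errno. */
static int write_small_file(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;
    ssize_t n = write(fd, "abc\n", 4);
    int err = (n == 4) ? 0 : (n < 0 ? -errno : -EIO);
    if (close(fd) != 0 && err == 0)
        err = -errno;
    return err;
}

/* Create a fresh file and hard-link it immediately, roughly like
 * `cp aaa ddd; link ddd eee`. Returns 0, or -errno of the failing call. */
static int create_then_link(const char *src, const char *dst)
{
    int err = write_small_file(src);
    if (err != 0)
        return err;
    if (link(src, dst) != 0)
        return -errno;
    return 0;
}

/* Hard link count as reported by stat(2), or -1 on error. */
static long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```

Calling create_then_link("/mnt/test/ddd", "/mnt/test/eee") after killing one brick should return -ENOENT if the behaviour in the log above is reproduced; on a healthy mount it should return 0 and link_count should report 2 for either name.
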
>
>
>
>
>
> Best Regards,
>
> George
>
>
>
> *From:* gluster-devel-bounces at gluster.org [mailto:gluster-devel-bounces@
> gluster.org] *On Behalf Of *Pranith Kumar Karampuri
> *Sent:* Thursday, January 18, 2018 2:40 PM
>
> *To:* Lian, George (NSB - CN/Hangzhou) <george.lian at nokia-sbell.com>
> *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>;
> Gluster-devel at gluster.org; Li, Deqian (NSB - CN/Hangzhou) <
> deqian.li at nokia-sbell.com>; Sun, Ping (NSB - CN/Hangzhou) <
> ping.sun at nokia-sbell.com>
> *Subject:* Re: [Gluster-devel] a link issue maybe introduced in a bug fix
> " Don't let NFS cache stat after writes"
>
>
>
>
>
>
>
> On Thu, Jan 18, 2018 at 6:33 AM, Lian, George (NSB - CN/Hangzhou) <
> george.lian at nokia-sbell.com> wrote:
>
> Hi,
>
> I suppose the number of bricks in your test is six, and you shut down
> only 3 of the brick processes.
>
> When I reproduce the issue, I create a replicate volume with only 2
> bricks, leave only ONE brick running, and set cluster.consistent-metadata on.
>
> With these 2 test conditions, the issue is 100% reproducible.
>
>
>
> Hi,
>
>       I actually tried it with replica-2 and replica-3 and then
> distributed replica-2 before replying to the earlier mail. We can have a
> debugging session if you are okay with it.
>
> I am in the middle of a customer issue myself (that is the reason for this
> delay :-( ) and am thinking of wrapping it up early next week. Would that be
> fine with you?
>
>
>
>
>
>
>
>
>
> 16:44:28 :) ⚡ gluster v status
>
> Status of volume: r2
>
> Gluster process                             TCP Port  RDMA Port  Online  Pid
>
> ------------------------------------------------------------------------------
>
> Brick localhost.localdomain:/home/gfs/r2_0  49152     0          Y       5309
>
> Brick localhost.localdomain:/home/gfs/r2_1  49154     0          Y       5330
>
> Brick localhost.localdomain:/home/gfs/r2_2  49156     0          Y       5351
>
> Brick localhost.localdomain:/home/gfs/r2_3  49158     0          Y       5372
>
> Brick localhost.localdomain:/home/gfs/r2_4  49159     0          Y       5393
>
> Brick localhost.localdomain:/home/gfs/r2_5  49160     0          Y       5414
>
> Self-heal Daemon on localhost               N/A       N/A        Y       5436
>
>
>
> Task Status of Volume r2
>
> ------------------------------------------------------------------------------
>
> There are no active volume tasks
>
>
>
> root at dhcp35-190 - ~
>
> 16:44:38 :) ⚡ kill -9 5309 5351 5393
>
>
>
> Best Regards,
>
> George
>
> *From:* gluster-devel-bounces at gluster.org [mailto:gluster-devel-bounces@
> gluster.org] *On Behalf Of *Pranith Kumar Karampuri
> *Sent:* Wednesday, January 17, 2018 7:27 PM
> *To:* Lian, George (NSB - CN/Hangzhou) <george.lian at nokia-sbell.com>
> *Cc:* Li, Deqian (NSB - CN/Hangzhou) <deqian.li at nokia-sbell.com>;
> Gluster-devel at gluster.org; Zhou, Cynthia (NSB - CN/Hangzhou) <
> cynthia.zhou at nokia-sbell.com>; Sun, Ping (NSB - CN/Hangzhou) <
> ping.sun at nokia-sbell.com>
>
>
> *Subject:* Re: [Gluster-devel] a link issue maybe introduced in a bug fix
> " Don't let NFS cache stat after writes"
>
>
>
>
>
>
>
> On Mon, Jan 15, 2018 at 1:55 PM, Pranith Kumar Karampuri <
> pkarampu at redhat.com> wrote:
>
>
>
>
>
> On Mon, Jan 15, 2018 at 8:46 AM, Lian, George (NSB - CN/Hangzhou) <
> george.lian at nokia-sbell.com> wrote:
>
> Hi,
>
>
>
> Have you reproduced this issue? If yes, could you please confirm whether
> it is an issue or not?
>
>
>
> Hi,
>
>        I tried recreating this on my laptop, on both master and 3.12,
> and I am not able to recreate the issue :-(.
>
> Here is the execution log: https://paste.fedoraproject.org/paste/-csXUKrwsbrZAVW1KzggQQ
>
> Since I was doing this on my laptop, I changed shutting down of the
> replica to killing the brick process to simulate this test.
>
> Let me know if I missed something.
>
>
>
>
>
> Sorry, I am held up with some issue at work, so I think I will get some
> time the day after tomorrow to look at this. In the meantime I am adding more
> people who know about afr to see if they get a chance to work on this
> before me.
>
>
>
>
>
> And if it is an issue,  do you have any solution for this issue?
>
>
>
> Thanks & Best Regards,
>
> George
>
>
>
> *From:* Lian, George (NSB - CN/Hangzhou)
> *Sent:* Thursday, January 11, 2018 2:01 PM
> *To:* Pranith Kumar Karampuri <pkarampu at redhat.com>
> *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>;
> Gluster-devel at gluster.org; Li, Deqian (NSB - CN/Hangzhou) <
> deqian.li at nokia-sbell.com>; Sun, Ping (NSB - CN/Hangzhou) <
> ping.sun at nokia-sbell.com>
> *Subject:* RE: [Gluster-devel] a link issue maybe introduced in a bug fix
> " Don't let NFS cache stat after writes"
>
>
>
> Hi,
>
>
>
> Please see the detailed test steps at https://bugzilla.redhat.com/show_bug.cgi?id=1531457
>
>
>
> How reproducible:
>
>
>
>
>
> Steps to Reproduce:
>
> 1. Create a replicated volume named "test"
>
> 2. Set the volume option cluster.consistent-metadata to on:
>
>    gluster v set test cluster.consistent-metadata on
>
> 3. Mount volume test on the client at /mnt/test
>
> 4. Create a file aaa larger than 1 byte:
>
>    echo "1234567890" >/mnt/test/aaa
>
> 5. Shut down one replica node, say sn-1, so that only sn-0 is working
>
> 6. cp /mnt/test/aaa /mnt/test/bbb; link /mnt/test/bbb /mnt/test/ccc
>
>
>
>
>
> BRs
>
> George
>
>
>
> *From:* gluster-devel-bounces at gluster.org [mailto:gluster-devel-bounces@
> gluster.org <gluster-devel-bounces at gluster.org>] *On Behalf Of *Pranith
> Kumar Karampuri
> *Sent:* Thursday, January 11, 2018 12:39 PM
> *To:* Lian, George (NSB - CN/Hangzhou) <george.lian at nokia-sbell.com>
> *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>;
> Gluster-devel at gluster.org; Li, Deqian (NSB - CN/Hangzhou) <
> deqian.li at nokia-sbell.com>; Sun, Ping (NSB - CN/Hangzhou) <
> ping.sun at nokia-sbell.com>
> *Subject:* Re: [Gluster-devel] a link issue maybe introduced in a bug fix
> " Don't let NFS cache stat after writes"
>
>
>
>
>
>
>
> On Thu, Jan 11, 2018 at 6:35 AM, Lian, George (NSB - CN/Hangzhou) <
> george.lian at nokia-sbell.com> wrote:
>
> Hi,
>
> >>> In which protocol are you seeing this issue? Fuse/NFS/SMB?
>
> It is FUSE, on a mount point created with the “mount -t glusterfs …” command.
>
>
>
> Could you let me know the test you did so that I can try to re-create and
> see what exactly is going on?
>
> Configuration of the volume and the steps to re-create the issue you are
> seeing would be helpful in debugging the issue further.
>
>
>
>
>
> Thanks & Best Regards,
>
> George
>
>
>
> *From:* gluster-devel-bounces at gluster.org [mailto:gluster-devel-bounces@
> gluster.org] *On Behalf Of *Pranith Kumar Karampuri
> *Sent:* Wednesday, January 10, 2018 8:08 PM
> *To:* Lian, George (NSB - CN/Hangzhou) <george.lian at nokia-sbell.com>
> *Cc:* Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou at nokia-sbell.com>;
> Zhong, Hua (NSB - CN/Hangzhou) <hua.zhong at nokia-sbell.com>; Li, Deqian
> (NSB - CN/Hangzhou) <deqian.li at nokia-sbell.com>; Gluster-devel at gluster.org;
> Sun, Ping (NSB - CN/Hangzhou) <ping.sun at nokia-sbell.com>
> *Subject:* Re: [Gluster-devel] a link issue maybe introduced in a bug fix
> " Don't let NFS cache stat after writes"
>
>
>
>
>
>
>
> On Wed, Jan 10, 2018 at 11:09 AM, Lian, George (NSB - CN/Hangzhou) <
> george.lian at nokia-sbell.com> wrote:
>
> Hi, Pranith Kumar,
>
>
>
> I have created a bug on Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1531457
>
> After investigating this link issue, I suspect your changes to
> afr-dir-write.c for the issue " Don't let NFS cache stat after writes". Your
> fix is like:
>
> --------------------------------------
>
>         if (afr_txn_nothing_failed (frame, this)) {
>                 /*if it did pre-op, it will do post-op changing ctime*/
>                 if (priv->consistent_metadata &&
>                     afr_needs_changelog_update (local))
>                         afr_zero_fill_stat (local);
>                 local->transaction.unwind (frame, this);
>         }
>
> With the above fix, ia_nlink is set to ‘0’ if the option
> consistent-metadata is set to “on”.
>
> Hard-linking a file that was just created then fails, and
> the error comes from the kernel function “vfs_link”:
>
> if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
>
>              error =  -ENOENT;
>
>
>
> Could you please check this and give some comments here?
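
To make the chain described above concrete, here is a small C model of it. This is a sketch under stated assumptions, not Gluster or kernel code: struct model_stat, model_zero_fill_stat and model_vfs_link_check are hypothetical names, the zero-fill step only mirrors what the mail says afr_zero_fill_stat does, and the check reproduces only the single vfs_link condition quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the stat data AFR returns. */
struct model_stat {
    unsigned int nlink;   /* hard link count (ia_nlink in the mail) */
    unsigned long size;   /* file size in bytes */
};

/* Models the effect attributed to afr_zero_fill_stat: every field becomes 0. */
static void model_zero_fill_stat(struct model_stat *st)
{
    assert(st != NULL);
    memset(st, 0, sizeof(*st));
}

/* Models only the quoted vfs_link condition; returns 0 or -ENOENT.
 * `linkable` stands in for (inode->i_state & I_LINKABLE). */
static int model_vfs_link_check(unsigned int i_nlink, bool linkable)
{
    if (i_nlink == 0 && !linkable)
        return -ENOENT;
    return 0;
}
```

If the zero-filled link count really does reach the kernel inode, the model gives the -ENOENT seen in the report; whether it reaches the inode at all is the open question in this thread.
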
>
>
>
> When stat is "zero filled", the understanding is that the higher layer
> protocol doesn't send the stat value to the kernel and a separate lookup is
> sent by the kernel to get the latest stat value. In which protocol are you
> seeing this issue? Fuse/NFS/SMB?
>
>
>
>
>
> Thanks & Best Regards,
>
> George
>
>
>
>
> --
>
> Pranith
>

-- 
Pranith