[Gluster-devel] Issue about the size of fstat is less than the really size of the syslog file

Wed Oct 19 06:35:01 UTC 2016

Hi, Raghavendra,

Thanks a lots for your quickly update!
In my case, there are so many process(write) is writing to the syslog file, it do involve the writer is in the same host and writing in same mount point while the tail(reader) is reading it.

The bug I just guess is: 
When a writer write the data with write-behind, it call the call-back function " mdc_writev_cbk" and called "mdc_inode_iatt_set_validate" to validate the "iatt" data, but with the code I mentioned last mail, it do nothing.
And in same time, the reader(tail) read the "iatt" data, but in case of the cache-time is not timeout, it will return the "iatt" data without the last change.

Do your think it is a possible bug?

Thanks & Best Regards,
George

-----Original Message-----
From: Raghavendra Gowdappa [mailto:rgowdapp at redhat.com] 
Sent: Wednesday, October 19, 2016 2:06 PM
To: Lian, George (Nokia - CN/Hangzhou) <george.lian at nokia.com>
Cc: Gluster-devel at gluster.org; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>; Zhang, Bingxuan (Nokia - CN/Hangzhou) <bingxuan.zhang at nokia.com>; Zizka, Jan (Nokia - CZ/Prague) <jan.zizka at nokia.com>
Subject: Re: [Gluster-devel] Issue about the size of fstat is less than the really size of the syslog file

----- Original Message -----
> From: "George Lian (Nokia - CN/Hangzhou)" <george.lian at nokia.com>
> To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> Cc: Gluster-devel at gluster.org, "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
> <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>, "Bingxuan Zhang (Nokia - CN/Hangzhou)"
> <bingxuan.zhang at nokia.com>, "Jan Zizka (Nokia - CZ/Prague)" <jan.zizka at nokia.com>
> Sent: Wednesday, October 19, 2016 10:51:24 AM
> Subject: RE: [Gluster-devel] Issue about the size of fstat is less than the really size of the syslog file
> 
> Hi, Raghavendra,
> 
> When we disable md-cache(gluster volume set log performance.md-cache-timeout
> 0),  the issue seems gone.
> (we can't disable with " gluster volume set log performance.md-cache off"
> why?)

Please use
#gluster volume set log performance.stat-prefetch off

> 
> So I double confuse that the code I abstract in last mail maybe have some
> issue for this case.
> Could you please share your comments?

Please find my comments below.

> 
> Thanks & Best Regards,
> George
> 
> -----Original Message-----
> From: Lian, George (Nokia - CN/Hangzhou)
> Sent: Friday, October 14, 2016 1:44 PM
> To: 'Raghavendra Gowdappa' <rgowdapp at redhat.com>
> Cc: Gluster-devel at gluster.org; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>; Zhang, Bingxuan (Nokia -
> CN/Hangzhou) <bingxuan.zhang at nokia.com>; Zizka, Jan (Nokia - CZ/Prague)
> <jan.zizka at nokia.com>
> Subject: RE: [Gluster-devel] Issue about the size of fstat is less than the
> really size of the syslog file
> 
> Hi, Raghavendra,
> 
> Our version of GlusterFS is 3.6.9, and I also check the newest code of main
> branch, the function of " mdc_inode_iatt_set_validate" is almost same, from
> the following code of this function,
> We could see a "TODO" comments inline, does it mean if we enhance
> write-behind feature, the "iatt" field in callback will be NULL, so that
> inode_invalidate will not be called? So the size of file will not update
> since "write behind" enabled ?
> Is it the root cause for "tail" application failed with "file truncated"
> issue ?
> 
>         LOCK (&mdc->lock);
>         {
>                 if (!iatt || !iatt->ia_ctime) {
>                         mdc->ia_time = 0;
>                         goto unlock;
>                 }
> 
> 		/*
> 		 * Invalidate the inode if the mtime or ctime has changed
> 		 * and the prebuf doesn't match the value we have cached.
> 		 * TODO: writev returns with a NULL iatt due to
> 		 * performance/write-behind, causing invalidation on writes.
> 		 */

The issue explained in this comment is hit only when writes are done. But, in your use-case only "tail" is the application running on the mount (If I am not wrong, the  writer is running on a different mountpoint). So, I doubt you are hitting this issue. But, you are saying that the issue goes away when write-behind/md-cache is turned off pointing to some interaction between md-cache and write-behind causing the issue. I need more time to look into this issue. Can you file a bug on this?

> 		if (IA_ISREG(inode->ia_type) &&
> 		    ((iatt->ia_mtime != mdc->md_mtime) ||
> 		    (iatt->ia_ctime != mdc->md_ctime)))
> 			if (!prebuf || (prebuf->ia_ctime != mdc->md_ctime) ||
> 			    (prebuf->ia_mtime != mdc->md_mtime))
> 				inode_invalidate(inode);
> 
>                 mdc_from_iatt (mdc, iatt);
> 
>                 time (&mdc->ia_time);
>         }
> 
> Best Regards,
> George
> -----Original Message-----
> From: Raghavendra Gowdappa [mailto:rgowdapp at redhat.com]
> Sent: Thursday, October 13, 2016 8:58 PM
> To: Lian, George (Nokia - CN/Hangzhou) <george.lian at nokia.com>
> Cc: Gluster-devel at gluster.org; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>; Zhang, Bingxuan (Nokia -
> CN/Hangzhou) <bingxuan.zhang at nokia.com>; Zizka, Jan (Nokia - CZ/Prague)
> <jan.zizka at nokia.com>
> Subject: Re: [Gluster-devel] Issue about the size of fstat is less than the
> really size of the syslog file
> 
> 
> 
> ----- Original Message -----
> > From: "George Lian (Nokia - CN/Hangzhou)" <george.lian at nokia.com>
> > To: Gluster-devel at gluster.org
> > Cc: "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
> > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>, "Bingxuan Zhang (Nokia
> > - CN/Hangzhou)" <bingxuan.zhang at nokia.com>, "Jan Zizka (Nokia - CZ/Prague)"
> > <jan.zizka at nokia.com>
> > Sent: Thursday, October 13, 2016 2:33:53 PM
> > Subject: [Gluster-devel] Issue about the size of fstat is less than the
> > really size of the syslog file
> > 
> > Hi, Dear Expert,
> > We have use glusterfs as a network filesystem, and syslog store in there,
> > some clients on different host may write the syslog file via “glusterfs”
> > mount point.
> > Now we encounter an issue when we “tail” the syslog file, it will
> > occasional
> > failed with error “ file truncated ”
> > As we study and trace with the “tail” source code, it failed with the
> > following code:
> > if ( S_ISREG (mode) && stats.st_size < f[i].size )
> > {
> > error (0, 0, _("%s: file truncated"), quotef (name));
> > /* Assume the file was truncated to 0,
> > and therefore output all "new" data. */
> > xlseek (fd, 0, SEEK_SET, name);
> > f[i].size = 0;
> > }
> > When stats.st_size < f[i].size, what mean the size report by fstat is less
> > than “tail” had read, it lead to “file truncated”, we also use “strace”
> > tools to trace the tail application, the related tail strace log as the
> > below:
> > nanosleep({1, 0}, NULL) = 0
> > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0
> > nanosleep({1, 0}, NULL) = 0
> > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0
> > nanosleep({1, 0}, NULL) = 0
> > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0
> > nanosleep({1, 0}, NULL) = 0
> > fstat(3, {st_mode=S_IFREG|0644, st_size=192544549, ...}) = 0
> > read(3, " Data … -"..., 8192) = 1444
> > read(3, " Data.. "..., 8192) = 720
> > read(3, "", 8192) = 0
> > fstat(3, {st_mode=S_IFREG|0644, st_size=192544789, ...}) = 0
> > write(1, “DATA…..” ) = 2164
> > write(2, "tail: ", 6tail: ) = 6
> > write(2, "/mnt/log/master/syslog: file tru"..., 38/mnt/log/master/syslog:
> > file truncated) = 38
> > as the above strace log, tail has read 1444+720=2164 bytes,
> > but fstat tell “tail” 192544789 – 192543105 = 1664 which less than 2164, so
> > it lead to “tail” application “file truncated”.
> > And if we turn off “write-behind” feature, the issue will not be reproduced
> > any more.
> 
> That seems strange. There are no writes happening on the fd/inode through
> which tail is reading/stating from. So, it seems strange that write-behind
> is involved here. I suspect whether any of md-cache/read-ahead/io-cache is
> causing the issue. Can you,
> 
> 1. Turn off md-cache, read-ahead, io-cache xlators
> 2. mount glusterfs with --attribute-timeout=0
> 3. set write-behind on
> 
> and rerun the tests? If you don't hit the issue, you can experiment by
> turning on/off of md-cache, read-ahead and io-cache translators and see what
> are the minimal number of xlators that need to be turned off to not hit the
> issue (with write-behind on)?
> 
> regards,
> Raghavendra
> 
> > So we think it may be related to cache consistence issue due to performance
> > consider, but we still have concern that:
> > The syslog file is used only with “Append” mode, so the size of file
> > shouldn’t be reduced, when a client read the file, why “fstat” can’t return
> > the really size match to the cache?
> > From current investigation, we doubt that the current implement of
> > “glusterfs” has a bug on “fstat” when cache is on.
> > Your comments is our highly appreciated!
> > Thanks & Best Regards
> > George
> > 
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
>