[Gluster-devel] Issue about the size of fstat is less than the really size of the syslog file

Raghavendra Gowdappa rgowdapp at redhat.com
Tue Nov 1 02:19:22 UTC 2016



----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: "George Lian (Nokia - CN/Hangzhou)" <george.lian at nokia.com>
> Cc: "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS" <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>, "Bingxuan Zhang (Nokia
> - CN/Hangzhou)" <bingxuan.zhang at nokia.com>, Gluster-devel at gluster.org, "Jan Zizka (Nokia - CZ/Prague)"
> <jan.zizka at nokia.com>
> Sent: Tuesday, November 1, 2016 7:46:47 AM
> Subject: Re: [Gluster-devel] Issue about the size of fstat is less than the really size of the syslog file
> 
> 
> 
> ----- Original Message -----
> > From: "George Lian (Nokia - CN/Hangzhou)" <george.lian at nokia.com>
> > To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>, "Jan Zizka (Nokia -
> > CZ/Prague)" <jan.zizka at nokia.com>, "Bingxuan
> > Zhang (Nokia - CN/Hangzhou)" <bingxuan.zhang at nokia.com>
> > Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>,
> > "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
> > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>,
> > Gluster-devel at gluster.org
> > Sent: Tuesday, November 1, 2016 6:35:10 AM
> > Subject: RE: [Gluster-devel] Issue about the size of fstat is less than the
> > really size of the syslog file
> > 
> > Hi, Raghavendra,
> > 
> > Thanks a lot for your update!
> > 
> > >IIUC, the "tail issue" can happen if 'tail -f' reads a stat with st_size
> > >smaller than a previously read value (and hence the complaint - file
> > >truncated). In this case, even though fstat at T2 doesn't account for the
> > >write at T0, it doesn't prove that st_size of fstat at T2 is smaller than
> > >that at any time before T2.
> > 
> > I just mean that the st_size returned by fstat may be less than the
> > previously read value at that time, which would lead to the "tail
> > truncated" issue. Do you agree with me?
> 
> Yes. But in your example there is only one fstat. For this to happen we need
> at least two fstats, and the latest st_size should be less than the oldest
> one. Am I missing anything here?
> 
> > 
> > >As to the relative ordering of write at T0 and fstat at T2, POSIX leaves
> > >it undefined. Unless write and fstat happen from same
> > >thread/single-threaded-application there is no requirement for maintaining
> > >that order (If they are issued from same thread fstat should account for
> > >the write at T0). Also note that it is not stated here that fstat at T2 is
> > >issued _after_ write at T0 is _complete_. If that is the case,
> > >mdc_writev_cbk would've updated correct stat in cache and fstat would get
> > >correct value. If it is not the case, then there is no well defined order
> > >here.
> > 
> > >So, I don't think there is a bug here, unless I've missed out something.
> > 
> > Do you mean that GlusterFS does not conflict with the requirement, so that
> > applications like "tail" should handle this case on a network file system?
> 
> No. Applications shouldn't do anything different to work on Glusterfs.
> Otherwise it's a bug :). What I am saying is that the issue with 'tail -f'
> might be because of a different bug than the example you gave. In other
> words, the RCA you posted may not be correct. It might be because of issues
> with write-behind (and other xlators) as I posted in other mail.
> 
> Preliminary testing by Pranith showed that Elasticsearch works fine with just
> write-behind. 

with patch http://review.gluster.org/15757 applied.

> So, that's progress. Will keep you posted with our efforts
> on getting Elasticsearch working on Gluster. I've a feeling that, it will
> solve your issue (tail -f) too.
> 
> regards,
> Raghavendra
> 
> > 
> > @Jan & @Bingxuan, do you have some comments for the above information?
> > 
> > 
> > Best Regards,
> > George
> > 
> > -----Original Message-----
> > From: Raghavendra Gowdappa [mailto:rgowdapp at redhat.com]
> > Sent: Monday, October 31, 2016 6:35 PM
> > To: Lian, George (Nokia - CN/Hangzhou) <george.lian at nokia.com>
> > Cc: Pranith Kumar Karampuri <pkarampu at redhat.com>;
> > I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>; Zhang, Bingxuan (Nokia
> > -
> > CN/Hangzhou) <bingxuan.zhang at nokia.com>; Gluster-devel at gluster.org; Zizka,
> > Jan (Nokia - CZ/Prague) <jan.zizka at nokia.com>
> > Subject: Re: [Gluster-devel] Issue about the size of fstat is less than the
> > really size of the syslog file
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "George Lian (Nokia - CN/Hangzhou)" <george.lian at nokia.com>
> > > To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, "Raghavendra
> > > Gowdappa"
> > > <rgowdapp at redhat.com>
> > > Cc: "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
> > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>, "Bingxuan Zhang
> > > (Nokia
> > > - CN/Hangzhou)" <bingxuan.zhang at nokia.com>, Gluster-devel at gluster.org,
> > > "Jan
> > > Zizka (Nokia - CZ/Prague)"
> > > <jan.zizka at nokia.com>
> > > Sent: Monday, October 31, 2016 2:32:34 PM
> > > Subject: RE: [Gluster-devel] Issue about the size of fstat is less than
> > > the
> > > really size of the syslog file
> > > 
> > > Hi,
> > > 
> > > I suspect there is a defect in mdc_writev_cbk and mdc_fstat.
> > > Let’s assume the application issues write and fstat operations at the
> > > following timestamps:
> > > T0: write (process a)
> > > T1: read (process b), which sees the data written at T0 by process a.
> > > T2: fstat (process c)
> > > In my view, mdc_write is a non-blocking operation that is protected by a
> > > lock in the afr xlator. Because mdc_fstat does not check that lock in the
> > > AFR xlator, mdc_writev_cbk, which calls “mdc_inode_iatt_set_validate”, may
> > > run later than mdc_fstat.
> > > For example:
> > > T3: the fstat issued at T2 returns without the
> > > “mdc_inode_iatt_set_validate” of T0 having run, when the stat-prefetch
> > > option is on.
> > > T4: “mdc_inode_iatt_set_validate” for T0 is called in mdc_writev_cbk.
> > > 
> > > Let’s assume T0<T1<T2<T3<T4. Is this a reasonable case in a multi-process
> > > environment when the CPU load is high?
> > > If it is reasonable, then the “tail issue” will happen.
> > 
> > IIUC, the "tail issue" can happen if 'tail -f' reads a stat with st_size
> > smaller than a previously read value (and hence the complaint - file
> > truncated). In this case, even though fstat at T2 doesn't account for the
> > write at T0, it doesn't prove that st_size of fstat at T2 is smaller than
> > that at any time before T2.
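
To make the ordering concrete, here is a rough single-threaded replay of the
T0..T4 sequence. It is only a model: struct replay and its helper functions
are hypothetical names, not GlusterFS code, and it assumes the cache held the
correct size before T0. It shows the fstat at T2 returning a stale size that
is still not smaller than any value the cache returned earlier.

```c
/* Hypothetical replay of the T0..T4 ordering; not GlusterFS code. */
struct replay {
    long long backend_size; /* size of the file on the brick */
    long long cached_size;  /* size held by the client-side stat cache */
};

/* T0: the written data reaches the backend; the cache is not yet updated. */
static void write_lands(struct replay *r, long long nbytes)
{
    r->backend_size += nbytes;
}

/* T2/T3: an fstat that is answered from the cache. */
static long long cached_fstat(const struct replay *r)
{
    return r->cached_size;
}

/* T4: the write callback refreshes the cached size. */
static void write_callback(struct replay *r)
{
    r->cached_size = r->backend_size;
}
```

In this model a cached fstat can lag behind the backend, but successive
cached values never decrease as long as the file is only appended to.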
> > 
> > As to the relative ordering of write at T0 and fstat at T2, POSIX leaves it
> > undefined. Unless write and fstat happen from same
> > thread/single-threaded-application there is no requirement for maintaining
> > that order (If they are issued from same thread fstat should account for
> > the write at T0). Also note that it is not stated here that fstat at T2 is
> > issued _after_ write at T0 is _complete_. If that is the case,
> > mdc_writev_cbk would've updated correct stat in cache and fstat would get
> > correct value. If it is not the case, then there is no well defined order
> > here.
> > 
> > So, I don't think there is a bug here, unless I've missed out something.
> > 
> > 
> > > 
> > > So a possible fix is in the mdc_fstat operation: we should add a check
> > > for whether a writev operation is ongoing and, if it is, go to the
> > > uncached label in the mdc_fstat function.
> > > 
> > > Could you please confirm the above assumption and suggestion?
> > > 
> > > 
> > > Thanks & Best Regards,
> > > George
> > > 
> > > 
> > > From: Lian, George (Nokia - CN/Hangzhou)
> > > Sent: Monday, October 31, 2016 4:25 PM
> > > To: Pranith Kumar Karampuri <pkarampu at redhat.com>; Raghavendra Gowdappa
> > > <rgowdapp at redhat.com>
> > > Cc: I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>; Zhang, Bingxuan
> > > (Nokia
> > > -
> > > CN/Hangzhou) <bingxuan.zhang at nokia.com>; Gluster-devel at gluster.org;
> > > Zizka,
> > > Jan (Nokia - CZ/Prague) <jan.zizka at nokia.com>
> > > Subject: RE: [Gluster-devel] Issue about the size of fstat is less than
> > > the
> > > really size of the syslog file
> > > 
> > > Hi,
> > > 
> > > How can we enable debug.trace so that we can inspect the debug data on
> > > different xlators?
> > > I just set “debug.trace on” and “debug.log-file yes”, but it does not
> > > seem to work.
> > > 
> > > And one more update for this issue: if we set performance.stat-prefetch
> > > to off, the issue does not occur. (Our previous test may not have been
> > > correct ☺)
> > > 
> > > Thanks & Best Regards,
> > > George
> > > 
> > > From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
> > > Sent: Friday, October 28, 2016 2:39 PM
> > > To: Lian, George (Nokia - CN/Hangzhou)
> > > <george.lian at nokia.com<mailto:george.lian at nokia.com>>
> > > Cc: Raghavendra Gowdappa
> > > <rgowdapp at redhat.com<mailto:rgowdapp at redhat.com>>;
> > > I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com<mailto:I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>>;
> > > Zhang, Bingxuan (Nokia - CN/Hangzhou)
> > > <bingxuan.zhang at nokia.com<mailto:bingxuan.zhang at nokia.com>>;
> > > Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>; Zizka, Jan
> > > (Nokia - CZ/Prague) <jan.zizka at nokia.com<mailto:jan.zizka at nokia.com>>
> > > Subject: Re: [Gluster-devel] Issue about the size of fstat is less than
> > > the
> > > really size of the syslog file
> > > 
> > > hi George,
> > >        It would help if we can identify the bare minimum xlators which
> > >        are contributing to the issue, like Raghavendra was mentioning
> > >        earlier. We were wondering if it is possible for you to help us in
> > >        identifying the issue by running the workload on a modified setup?
> > >        We can suggest testing out using custom volfiles so that we can
> > >        slowly build the graph which could be causing this issue. We would
> > >        like you guys to try out this problem with just posix-xlator and
> > >        fuse and nothing else.
> > > 
> > > On Thu, Oct 27, 2016 at 1:40 PM, Lian, George (Nokia - CN/Hangzhou)
> > > <george.lian at nokia.com<mailto:george.lian at nokia.com>> wrote:
> > > Hi, Raghavendra,
> > > 
> > > Could you please give some suggestions for this issue? We have been
> > > trying to find a clue for a long time, but have made no progress :(
> > > 
> > > Thanks & Best Regards,
> > > George
> > > 
> > > -----Original Message-----
> > > From: Lian, George (Nokia - CN/Hangzhou)
> > > Sent: Wednesday, October 19, 2016 4:40 PM
> > > To: 'Raghavendra Gowdappa'
> > > <rgowdapp at redhat.com<mailto:rgowdapp at redhat.com>>
> > > Cc: Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>;
> > > I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com<mailto:I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>>;
> > > Zhang, Bingxuan (Nokia - CN/Hangzhou)
> > > <bingxuan.zhang at nokia.com<mailto:bingxuan.zhang at nokia.com>>; Zizka, Jan
> > > (Nokia - CZ/Prague) <jan.zizka at nokia.com<mailto:jan.zizka at nokia.com>>
> > > Subject: RE: [Gluster-devel] Issue about the size of fstat is less than
> > > the
> > > really size of the syslog file
> > > 
> > > Hi, Raghavendra
> > > 
> > > Just now, we tested with the glusterfs log at debug level "TRACE" and let
> > > an application trigger "glusterfs" to produce a large log. In that case,
> > > with write-behind and stat-prefetch both OFF, tailing the glusterfs log
> > > (such as mnt-{VOLUME-NAME}.log) still failed with "file truncated".
> > > 
> > > So that means that if the file sees a huge amount of IO, the issue is
> > > still there even with write-behind and stat-prefetch both OFF.
> > > 
> > > Best Regards,
> > > George
> > > 
> > > -----Original Message-----
> > > From: Raghavendra Gowdappa
> > > [mailto:rgowdapp at redhat.com<mailto:rgowdapp at redhat.com>]
> > > Sent: Wednesday, October 19, 2016 2:54 PM
> > > To: Lian, George (Nokia - CN/Hangzhou)
> > > <george.lian at nokia.com<mailto:george.lian at nokia.com>>
> > > Cc: Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>;
> > > I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com<mailto:I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>>;
> > > Zhang, Bingxuan (Nokia - CN/Hangzhou)
> > > <bingxuan.zhang at nokia.com<mailto:bingxuan.zhang at nokia.com>>; Zizka, Jan
> > > (Nokia - CZ/Prague) <jan.zizka at nokia.com<mailto:jan.zizka at nokia.com>>
> > > Subject: Re: [Gluster-devel] Issue about the size of fstat is less than
> > > the
> > > really size of the syslog file
> > > 
> > > 
> > > 
> > > ----- Original Message -----
> > > > From: "George Lian (Nokia - CN/Hangzhou)"
> > > > <george.lian at nokia.com<mailto:george.lian at nokia.com>>
> > > > To: "Raghavendra Gowdappa"
> > > > <rgowdapp at redhat.com<mailto:rgowdapp at redhat.com>>
> > > > Cc: Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>,
> > > > "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
> > > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com<mailto:I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>>,
> > > > "Bingxuan Zhang (Nokia - CN/Hangzhou)"
> > > > <bingxuan.zhang at nokia.com<mailto:bingxuan.zhang at nokia.com>>, "Jan Zizka
> > > > (Nokia - CZ/Prague)" <jan.zizka at nokia.com<mailto:jan.zizka at nokia.com>>
> > > > Sent: Wednesday, October 19, 2016 12:05:01 PM
> > > > Subject: RE: [Gluster-devel] Issue about the size of fstat is less than
> > > > the
> > > > really size of the syslog file
> > > >
> > > > Hi, Raghavendra,
> > > >
> > > > > Thanks a lot for your quick update!
> > > > > In my case, many writer processes are writing to the syslog file, and
> > > > > this does involve writers on the same host, writing through the same
> > > > > mount point, while tail (the reader) is reading it.
> > > > >
> > > > > The bug I am guessing at is:
> > > > > When a writer writes data with write-behind, the callback function
> > > > > "mdc_writev_cbk" is called, which calls "mdc_inode_iatt_set_validate"
> > > > > to validate the "iatt" data, but with the code I mentioned in my last
> > > > > mail, it does nothing.
> > > 
> > > mdc_inode_iatt_set_validate has following code
> > > 
> > > <snippet>
> > >                 if (!iatt || !iatt->ia_ctime) {
> > >                         mdc->ia_time = 0;
> > >                         goto unlock;
> > >                 }
> > > </snippet>
> > > 
> > > > Which means a NULL iatt sets mdc->ia_time to 0. As a result, subsequent
> > > > lookup/stat calls are NOT served from md-cache. Instead, the stat is
> > > > served from backend bricks. So, I don't see an issue here.
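
A small model of this behaviour, as a sketch only. The field name ia_time and
the NULL / zero-ctime branch follow the snippet above; struct mdc_model,
struct iatt_model, the helper names and the exact expiry rule in
model_cache_valid are assumptions made for illustration and are not copied
from md-cache.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Simplified stand-ins for the md-cache structures (hypothetical). */
struct iatt_model { time_t ia_ctime; long long ia_size; };
struct mdc_model  { time_t ia_time;  long long md_size; };

/* Mirrors the quoted branch: a NULL iatt (or a zero ctime) zeroes ia_time,
 * otherwise the cached size and the cache timestamp are refreshed. */
static void model_set_validate(struct mdc_model *mdc,
                               const struct iatt_model *iatt, time_t now)
{
    if (!iatt || !iatt->ia_ctime) {
        mdc->ia_time = 0;
        return;
    }
    mdc->md_size = iatt->ia_size;
    mdc->ia_time = now;
}

/* Assumed rule: the cached stat is usable only if it was ever set and is
 * younger than the timeout. ia_time == 0 therefore forces a backend stat. */
static bool model_cache_valid(const struct mdc_model *mdc, time_t now,
                              time_t timeout)
{
    return mdc->ia_time != 0 && now < mdc->ia_time + timeout;
}
```

Under these assumptions a write callback that arrives with a NULL iatt does
not leave a stale size visible: it only makes the next stat bypass the cache.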
> > > 
> > > > However, one case where a NULL iatt is different from a valid iatt (which
> > > > differs from the value stored in md-cache) is that the latter results in
> > > > a call to inode_invalidate. This invalidation propagates to the kernel,
> > > > and all dentry and page cache corresponding to the file is purged. So, I
> > > > suspect the stale stat you saw was served from the kernel cache (not
> > > > from glusterfs). If this is the case, having mount options
> > > > "attribute-timeout=0" and "entry-timeout=0" should've helped.
> > > > 
> > > > I am still at a loss to point out the RCA for this issue.
> > > 
> > > 
> > > > > And at the same time, the reader (tail) reads the "iatt" data, but if
> > > > > the cache has not timed out, it gets the "iatt" data without the last
> > > > > change.
> > > > >
> > > > > Do you think it is a possible bug?
> > > >
> > > > Thanks & Best Regards,
> > > > George
> > > >
> > > > -----Original Message-----
> > > > From: Raghavendra Gowdappa
> > > > [mailto:rgowdapp at redhat.com<mailto:rgowdapp at redhat.com>]
> > > > Sent: Wednesday, October 19, 2016 2:06 PM
> > > > To: Lian, George (Nokia - CN/Hangzhou)
> > > > <george.lian at nokia.com<mailto:george.lian at nokia.com>>
> > > > Cc: Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>;
> > > > I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> > > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com<mailto:I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>>;
> > > > Zhang, Bingxuan (Nokia -
> > > > CN/Hangzhou)
> > > > <bingxuan.zhang at nokia.com<mailto:bingxuan.zhang at nokia.com>>;
> > > > Zizka, Jan (Nokia - CZ/Prague)
> > > > <jan.zizka at nokia.com<mailto:jan.zizka at nokia.com>>
> > > > Subject: Re: [Gluster-devel] Issue about the size of fstat is less than
> > > > the
> > > > really size of the syslog file
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > From: "George Lian (Nokia - CN/Hangzhou)"
> > > > > <george.lian at nokia.com<mailto:george.lian at nokia.com>>
> > > > > To: "Raghavendra Gowdappa"
> > > > > <rgowdapp at redhat.com<mailto:rgowdapp at redhat.com>>
> > > > > Cc: Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>,
> > > > > "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
> > > > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com<mailto:I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>>,
> > > > > "Bingxuan Zhang (Nokia
> > > > > - CN/Hangzhou)"
> > > > > <bingxuan.zhang at nokia.com<mailto:bingxuan.zhang at nokia.com>>, "Jan
> > > > > Zizka
> > > > > (Nokia - CZ/Prague)"
> > > > > <jan.zizka at nokia.com<mailto:jan.zizka at nokia.com>>
> > > > > Sent: Wednesday, October 19, 2016 10:51:24 AM
> > > > > Subject: RE: [Gluster-devel] Issue about the size of fstat is less
> > > > > than
> > > > > the
> > > > > really size of the syslog file
> > > > >
> > > > > Hi, Raghavendra,
> > > > >
> > > > > > When we disable md-cache (gluster volume set log
> > > > > > performance.md-cache-timeout 0), the issue seems to be gone.
> > > > > > (Why can't we disable it with "gluster volume set log
> > > > > > performance.md-cache off"?)
> > > >
> > > > Please use
> > > > #gluster volume set log performance.stat-prefetch off
> > > >
> > > > >
> > > > > > So I am doubly suspicious that the code I excerpted in my last mail
> > > > > > may have an issue in this case.
> > > > > > Could you please share your comments?
> > > >
> > > > Please find my comments below.
> > > >
> > > > >
> > > > > Thanks & Best Regards,
> > > > > George
> > > > >
> > > > > -----Original Message-----
> > > > > From: Lian, George (Nokia - CN/Hangzhou)
> > > > > Sent: Friday, October 14, 2016 1:44 PM
> > > > > To: 'Raghavendra Gowdappa'
> > > > > <rgowdapp at redhat.com<mailto:rgowdapp at redhat.com>>
> > > > > Cc: Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>;
> > > > > I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> > > > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com<mailto:I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>>;
> > > > > Zhang, Bingxuan (Nokia
> > > > > -
> > > > > CN/Hangzhou)
> > > > > <bingxuan.zhang at nokia.com<mailto:bingxuan.zhang at nokia.com>>;
> > > > > Zizka, Jan (Nokia - CZ/Prague)
> > > > > <jan.zizka at nokia.com<mailto:jan.zizka at nokia.com>>
> > > > > Subject: RE: [Gluster-devel] Issue about the size of fstat is less
> > > > > than
> > > > > the
> > > > > really size of the syslog file
> > > > >
> > > > > Hi, Raghavendra,
> > > > >
> > > > > > Our version of GlusterFS is 3.6.9, and I also checked the newest code
> > > > > > on the main branch; the function "mdc_inode_iatt_set_validate" is
> > > > > > almost the same. In the following code of this function, we can see
> > > > > > an inline "TODO" comment. Does it mean that with the write-behind
> > > > > > feature enabled, the "iatt" field in the callback will be NULL, so
> > > > > > that inode_invalidate will not be called? So the size of the file
> > > > > > will not be updated while "write behind" is enabled?
> > > > > > Is it the root cause of the "tail" application failing with the "file
> > > > > > truncated" issue?
> > > > >
> > > > >         LOCK (&mdc->lock);
> > > > >         {
> > > > >                 if (!iatt || !iatt->ia_ctime) {
> > > > >                         mdc->ia_time = 0;
> > > > >                         goto unlock;
> > > > >                 }
> > > > >
> > > > >             /*
> > > > >              * Invalidate the inode if the mtime or ctime has changed
> > > > >              * and the prebuf doesn't match the value we have cached.
> > > > >              * TODO: writev returns with a NULL iatt due to
> > > > >              * performance/write-behind, causing invalidation on
> > > > > >              * writes.
> > > > >              */
> > > >
> > > > > The issue explained in this comment is hit only when writes are done.
> > > > > But in your use-case only "tail" is the application running on the
> > > > > mount (if I am not wrong, the writer is running on a different
> > > > > mountpoint). So, I doubt you are hitting this issue. But you are saying
> > > > > that the issue goes away when write-behind/md-cache is turned off,
> > > > > pointing to some interaction between md-cache and write-behind causing
> > > > > the issue. I need more time to look into this issue. Can you file a bug
> > > > > on this?
> > > >
> > > > >             if (IA_ISREG(inode->ia_type) &&
> > > > >                 ((iatt->ia_mtime != mdc->md_mtime) ||
> > > > >                 (iatt->ia_ctime != mdc->md_ctime)))
> > > > > >                     if (!prebuf ||
> > > > > >                         (prebuf->ia_ctime != mdc->md_ctime) ||
> > > > > >                         (prebuf->ia_mtime != mdc->md_mtime))
> > > > >                             inode_invalidate(inode);
> > > > >
> > > > >                 mdc_from_iatt (mdc, iatt);
> > > > >
> > > > >                 time (&mdc->ia_time);
> > > > >         }
> > > > >
> > > > > Best Regards,
> > > > > George
> > > > > -----Original Message-----
> > > > > From: Raghavendra Gowdappa
> > > > > [mailto:rgowdapp at redhat.com<mailto:rgowdapp at redhat.com>]
> > > > > Sent: Thursday, October 13, 2016 8:58 PM
> > > > > To: Lian, George (Nokia - CN/Hangzhou)
> > > > > <george.lian at nokia.com<mailto:george.lian at nokia.com>>
> > > > > Cc: Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>;
> > > > > I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> > > > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com<mailto:I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>>;
> > > > > Zhang, Bingxuan (Nokia
> > > > > -
> > > > > CN/Hangzhou)
> > > > > <bingxuan.zhang at nokia.com<mailto:bingxuan.zhang at nokia.com>>;
> > > > > Zizka, Jan (Nokia - CZ/Prague)
> > > > > <jan.zizka at nokia.com<mailto:jan.zizka at nokia.com>>
> > > > > Subject: Re: [Gluster-devel] Issue about the size of fstat is less
> > > > > than
> > > > > the
> > > > > really size of the syslog file
> > > > >
> > > > >
> > > > >
> > > > > ----- Original Message -----
> > > > > > From: "George Lian (Nokia - CN/Hangzhou)"
> > > > > > <george.lian at nokia.com<mailto:george.lian at nokia.com>>
> > > > > > To: Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>
> > > > > > Cc: "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
> > > > > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com<mailto:I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com>>,
> > > > > > "Bingxuan Zhang
> > > > > > (Nokia
> > > > > > - CN/Hangzhou)"
> > > > > > <bingxuan.zhang at nokia.com<mailto:bingxuan.zhang at nokia.com>>, "Jan
> > > > > > Zizka (Nokia -
> > > > > > CZ/Prague)"
> > > > > > <jan.zizka at nokia.com<mailto:jan.zizka at nokia.com>>
> > > > > > Sent: Thursday, October 13, 2016 2:33:53 PM
> > > > > > Subject: [Gluster-devel] Issue about the size of fstat is less than
> > > > > > the
> > > > > > really size of the syslog file
> > > > > >
> > > > > > Hi, Dear Expert,
> > > > > > > We use glusterfs as a network filesystem and store the syslog
> > > > > > > there; clients on different hosts may write to the syslog file via
> > > > > > > the “glusterfs” mount point.
> > > > > > > Now we have encountered an issue: when we “tail” the syslog file,
> > > > > > > it occasionally fails with the error “file truncated”.
> > > > > > > Studying and tracing the “tail” source code, we found that it fails
> > > > > > > in the following code:
> > > > > > > if ( S_ISREG (mode) && stats.st_size < f[i].size )
> > > > > > >   {
> > > > > > >     error (0, 0, _("%s: file truncated"), quotef (name));
> > > > > > >     /* Assume the file was truncated to 0,
> > > > > > >        and therefore output all "new" data. */
> > > > > > >     xlseek (fd, 0, SEEK_SET, name);
> > > > > > >     f[i].size = 0;
> > > > > > >   }
> > > > > > > When stats.st_size < f[i].size, meaning the size reported by fstat
> > > > > > > is less than what “tail” has already read, it leads to “file
> > > > > > > truncated”. We also used the “strace” tool to trace the tail
> > > > > > > application; the related strace log is below:
> > > > > > nanosleep({1, 0}, NULL) = 0
> > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0
> > > > > > nanosleep({1, 0}, NULL) = 0
> > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0
> > > > > > nanosleep({1, 0}, NULL) = 0
> > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0
> > > > > > nanosleep({1, 0}, NULL) = 0
> > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192544549, ...}) = 0
> > > > > > read(3, " Data … -"..., 8192) = 1444
> > > > > > read(3, " Data.. "..., 8192) = 720
> > > > > > read(3, "", 8192) = 0
> > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192544789, ...}) = 0
> > > > > > write(1, “DATA…..” ) = 2164
> > > > > > write(2, "tail: ", 6tail: ) = 6
> > > > > > > write(2, "/mnt/log/master/syslog: file tru"..., 38/mnt/log/master/syslog: file truncated) = 38
> > > > > > > As the above strace log shows, tail has read 1444+720=2164 bytes,
> > > > > > > but fstat tells “tail” that the file grew by only 192544789 –
> > > > > > > 192543105 = 1684 bytes, which is less than 2164, so it leads to the
> > > > > > > “tail” application reporting “file truncated”.
> > > > > > > And if we turn off the “write-behind” feature, the issue cannot be
> > > > > > > reproduced any more.
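
The arithmetic in the strace excerpt can be checked with a short sketch. The
helper names are hypothetical, and it assumes tail's recorded size before the
two reads was 192543105 (the last st_size seen before the file grew). With
those figures fstat reports a growth of 1684 bytes while 2164 bytes were read.

```c
#include <stdbool.h>

/* Growth of the file as seen through two fstat results. */
static long long reported_growth(long long old_size, long long new_size)
{
    return new_size - old_size;
}

/* True when the reader has consumed more bytes than the new st_size allows,
 * which is the situation in which tail prints "file truncated". */
static bool reader_ahead_of_stat(long long old_size, long long bytes_read,
                                 long long new_size)
{
    return new_size < old_size + bytes_read;
}
```

Calling reader_ahead_of_stat(192543105, 1444 + 720, 192544789) returns true
under that assumption, which matches the "file truncated" message in the log.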
> > > > >
> > > > > > That seems strange. There are no writes happening on the fd/inode
> > > > > > through which tail is reading/stating. So, it seems strange that
> > > > > > write-behind is involved here. I suspect that one of
> > > > > > md-cache/read-ahead/io-cache is causing the issue. Can you,
> > > > >
> > > > > 1. Turn off md-cache, read-ahead, io-cache xlators
> > > > > 2. mount glusterfs with --attribute-timeout=0
> > > > > 3. set write-behind on
> > > > >
> > > > > > and rerun the tests? If you don't hit the issue, you can experiment by
> > > > > > turning the md-cache, read-ahead and io-cache translators on/off to
> > > > > > see the minimal set of xlators that needs to be turned off to not hit
> > > > > > the issue (with write-behind on).
> > > > >
> > > > > regards,
> > > > > Raghavendra
> > > > >
> > > > > > > So we think it may be a cache consistency issue introduced for
> > > > > > > performance reasons, but we still have a concern:
> > > > > > > The syslog file is used only in “Append” mode, so the size of the
> > > > > > > file shouldn’t shrink. When a client reads the file, why can’t
> > > > > > > “fstat” return the real size, matching the cache?
> > > > > > > From our current investigation, we suspect that the current
> > > > > > > implementation of “glusterfs” has a bug in “fstat” when the cache
> > > > > > > is on.
> > > > > > > Your comments are highly appreciated!
> > > > > > Thanks & Best Regards
> > > > > > George
> > > > > >
> > > > > > _______________________________________________
> > > > > > Gluster-devel mailing list
> > > > > > Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>
> > > > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > > >
> > > >
> > > 
> > > 
> > > 
> > > --
> > > Pranith
> > > 
> > 

