[Gluster-devel] [Gluster-users] gluster volume heal info split brain command not showing files in split-brain

Anuradha Talur atalur at redhat.com
Wed Mar 23 06:24:16 UTC 2016



----- Original Message -----
> From: "ABHISHEK PALIWAL" <abhishpaliwal at gmail.com>
> To: "Anuradha Talur" <atalur at redhat.com>
> Cc: gluster-users at gluster.org, gluster-devel at gluster.org
> Sent: Monday, March 21, 2016 10:44:53 AM
> Subject: Re: [Gluster-users] gluster volume heal info split brain command not showing files in split-brain
> 
> Hi Anuradha,
> 
> Have you got any pointer from the above scenarios.
> 
Hi Abhishek,

I went through all the logs you provided.
They contain the log of only one brick,
and for only one day. Where is the other brick's log file?

In that log file I see many connects and disconnects in quick succession,
which could cause a GFID mismatch if I/O was in progress at the time.
The other logs you provided also do not have enough information to
determine how your setup ended up with no pending markers.
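
To check for a GFID mismatch without waiting for an I/O error, the getfattr
output from both bricks can be compared directly. Below is a minimal sketch in
Python. It is not part of gluster, and the helper names parse_getfattr and
gfid_mismatch are made up for illustration. The sample values are the ones you
posted earlier in this thread for nameroot.ior.

```python
def parse_getfattr(dump):
    """Parse `getfattr -d -m . -e hex` output into {xattr_name: hex_string}."""
    attrs = {}
    for line in dump.splitlines():
        line = line.strip()
        # Skip blank lines, the "# file: ..." header and anything without "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


def gfid_mismatch(dump_a, dump_b):
    """Return True when both dumps carry a trusted.gfid and the values differ."""
    gfid_a = parse_getfattr(dump_a).get("trusted.gfid")
    gfid_b = parse_getfattr(dump_b).get("trusted.gfid")
    if gfid_a is None or gfid_b is None:
        return False  # cannot decide without both values
    return gfid_a.lower() != gfid_b.lower()


# getfattr output for the same file as seen on the two bricks.
REMOTE = """# file: opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x000000000000000256ded2f6000ad80f
trusted.gfid=0x771221a7bb3c4f1aade40ce9e38a95ee
"""

LOCAL = """# file: opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
trusted.bit-rot.version=0x000000000000000256ded38f000e3a51
trusted.gfid=0x8ea33f46703c4e2d95c09153c1b858fd
"""

if __name__ == "__main__":
    print("GFID mismatch:", gfid_mismatch(REMOTE, LOCAL))
```

This only compares the trusted.gfid values. It says nothing about how the
mismatch came about.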

I understand that the output of heal info split-brain is the easiest way to find
the files in split-brain, but without pending markers it cannot report them.
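
For reference, here is a minimal sketch in Python of how the trusted.afr.*
values in the getfattr output can be read. It is not part of gluster, and the
helper name decode_afr_changelog is made up for illustration. It assumes the
usual AFR layout, in which each value is 12 bytes holding three 32-bit
big-endian counters for pending data, metadata and entry operations.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* hex value into its three pending counters.

    Assumed layout: 12 bytes = data, metadata, entry (each 32-bit big-endian).
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def has_pending(hex_value):
    """Return True when any of the three counters is non-zero."""
    return any(decode_afr_changelog(hex_value).values())


if __name__ == "__main__":
    # Values taken from the getfattr output posted for CELLO_AVAILABILITY2_LOG.xml.
    samples = {
        "Brick A trusted.afr.c_glusterfs-client-0": "0x000000080000000000000000",
        "Brick B trusted.afr.c_glusterfs-client-1": "0x000000000000000000000000",
    }
    for label, value in samples.items():
        print(label, decode_afr_changelog(value), "pending:", has_pending(value))
```

A value that decodes to all zeros carries no pending marker, which is the
situation in which heal info split-brain has nothing to report.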

For the second scenario, is your self-heal daemon on?
> Regards,
> Abhishek
> 
> On Fri, Mar 18, 2016 at 11:18 AM, ABHISHEK PALIWAL <abhishpaliwal at gmail.com>
> wrote:
> 
> >
> >
> > On Fri, Mar 18, 2016 at 1:41 AM, Anuradha Talur <atalur at redhat.com> wrote:
> >
> >>
> >>
> >> ----- Original Message -----
> >> > From: "ABHISHEK PALIWAL" <abhishpaliwal at gmail.com>
> >> > To: "Anuradha Talur" <atalur at redhat.com>
> >> > Cc: gluster-users at gluster.org, gluster-devel at gluster.org
> >> > Sent: Thursday, March 17, 2016 4:00:58 PM
> >> > Subject: Re: [Gluster-users] gluster volume heal info split brain
> >> command not showing files in split-brain
> >> >
> >> > Hi Anuradha,
> >> >
> >> > Please confirm me, this is bug in glusterfs or we need to do something
> >> at
> >> > our end.
> >> >
> >> > Because this problem is stopping our development.
> >> Hi Abhishek,
> >>
> >> When you say file is not getting sync, do you mean that the files are not
> >> in sync after healing or that the existing GFID mismatch that you tried to
> >> heal failed?
> >> In one of the previous mails, you said that the GFID mismatch problem is
> >> resolved, is it not so?
> >>
> >
> > As I mentioned I have two scenario:
> > 1. First scenario is where files are in split-brain but not recognized by
> > the split-brain and heal info commands. So we are identifying those
> > file when I/O errors occurred on those files (the same method mentioned in
> > the link which you shared earlier) but this method is not reliable in our
> > case because other modules have the dependencies on this file and those
> > modules can't wait until heal in progress. In this case we required manual
> > identification of the file those are falling in I/O error which is somehow
> > not the correct way. It is better if the split-brain or heal info command
> > identify the files and based on the output we will perform the self healing
> > on those files only.
> >
> > 2. Second scenario in which we have one log file which have the fixed size
> > and wrapping of data properties and continuously written by the system even
> > when the other brick is down or rebooting. In this case, we have two brick
> > in replica mode and when one goes down and comes up but this file remains
> > out of sync. We are not getting any of the following on this file:
> > A. Not recognized by the split-brain and heal info command.
> > B. Not getting any I/O error
> > C. Do not have the GFID mismatch
> >
> > Here, are the getfattr output of this file
> >
> > Brick B which rebooted and have the file out of sync
> >
> > getfattr -d -m . -e hex
> > opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> >
> > # file:
> > opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> > trusted.afr.c_glusterfs-client-1=0x000000000000000000000000
> > trusted.afr.dirty=0x000000000000000000000000
> > trusted.bit-rot.version=0x000000000000000b56d6dd1d000ec7a9
> > trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
> >
> >
> > Brick A where file was getting updated when Brick B was rebooting
> >
> > getfattr -d -m . -e hex
> > opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> > trusted.afr.c_glusterfs-client-0=0x000000080000000000000000
> > trusted.afr.c_glusterfs-client-2=0x000000020000000000000000
> > trusted.afr.c_glusterfs-client-4=0x000000020000000000000000
> > trusted.afr.c_glusterfs-client-6=0x000000020000000000000000
> > trusted.afr.dirty=0x000000000000000000000000
> > trusted.bit-rot.version=0x000000000000000b56d6dcb7000c87e7
> > trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
> >
> > This scenario is not 100% reproducible but out of 20 cycle we can
> > reproduce it one or two times.
> >
> >
> >> To your question about finding the files in split-brain, can you try
> >> running gluster volume heal <volname> info? Heal info is also supposed to
> >> show
> >> the files in split-brain.
> >
> >
> > This heal info command is also not working.
> >
> >
> >> If the GFID mismatch is not resolved yet, it would really help understand
> >> the underlying problem if you give the output of getfattr -m. -de hex
> >> <path-to-parent-directory-of-the-file-in-GFID-mismatch>.
> >>
> >
> > It resolved the problem but we don't want to go with that solution
> > (mentioned in the link provided by) we want the consistency and out come
> > from the split-brain or heal info files.
> >
> > The output you are asking for parent directory I have already shared you
> > in this mail chain. I am also attaching one more time for your reference.
> >
> > Please let me know if you have any query regarding the above scenarios.
> >
> > Regards,
> > Abhishek
> >
> >> >
> >> > Regards,
> >> > Abhishek
> >> >
> >> > On Thu, Mar 17, 2016 at 1:54 PM, ABHISHEK PALIWAL <
> >> abhishpaliwal at gmail.com>
> >> > wrote:
> >> >
> >> > > Hi Anuradha,
> >> > >
> >> > > But in this case I need to do tail on each file which is time taking
> >> > > process and other end I can't pause my module until these file is
> >> getting
> >> > > healed.
> >> > >
> >> > > Any how I need the output of the split-brain to resolve this problem.
> >> > >
> >> > > Regards,
> >> > > Abhishek
> >> > >
> >> > > On Wed, Mar 16, 2016 at 6:21 PM, ABHISHEK PALIWAL <
> >> abhishpaliwal at gmail.com
> >> > > > wrote:
> >> > >
> >> > >> Hi Anuradha,
> >> > >>
> >> > >> The issue is resolved but we have one more issue something similar to
> >> > >> this one in which the file is not getting sync after the steps
> >> followed,
> >> > >> mentioned in the link which you shared in the previous mail.
> >> > >>
> >> > >> And problem is that why split-brain command is not showing
> >> split-brain
> >> > >> entries.
> >> > >>
> >> > >> Regards,
> >> > >> Abhishek
> >> > >>
> >> > >> On Wed, Mar 16, 2016 at 6:06 PM, Anuradha Talur <atalur at redhat.com>
> >> > >> wrote:
> >> > >>
> >> > >>>
> >> > >>>
> >> > >>> ----- Original Message -----
> >> > >>> > From: "Anuradha Talur" <atalur at redhat.com>
> >> > >>> > To: "ABHISHEK PALIWAL" <abhishpaliwal at gmail.com>
> >> > >>> > Cc: gluster-users at gluster.org, gluster-devel at gluster.org
> >> > >>> > Sent: Wednesday, March 16, 2016 5:32:26 PM
> >> > >>> > Subject: Re: [Gluster-users] gluster volume heal info split brain
> >> > >>> command not showing files in split-brain
> >> > >>> >
> >> > >>> >
> >> > >>> >
> >> > >>> > ----- Original Message -----
> >> > >>> > > From: "ABHISHEK PALIWAL" <abhishpaliwal at gmail.com>
> >> > >>> > > To: "Anuradha Talur" <atalur at redhat.com>
> >> > >>> > > Cc: gluster-users at gluster.org, gluster-devel at gluster.org
> >> > >>> > > Sent: Wednesday, March 16, 2016 4:39:26 PM
> >> > >>> > > Subject: Re: [Gluster-users] gluster volume heal info split
> >> brain
> >> > >>> command
> >> > >>> > > not showing files in split-brain
> >> > >>> > >
> >> > >>> > > Hi Anuradha,
> >> > >>> > >
> >> > >>> > > I am doing the same which is mentioned in the link you shared
> >> and It
> >> > >>> has
> >> > >>> > > been resolved the issue.
> >> > >>> > >
> >> > >>> > > But my question is if it is the split-brain scenario then why
> >> the
> >> > >>> command
> >> > >>> > > "gluster volume heal info split-brain"
> >> > >>> > >  not showing these files in the output even not the parent
> >> directory
> >> > >>> is
> >> > >>> > > present in split-brain.
> >> > >>> > >
> >> > >>> > > Please find the requested logs
> >> > >>>
> >> > >>> Abhishek,
> >> > >>>
> >> > >>> Yes, ideally it should show. I will look into it. The only reason I
> >> can
> >> > >>> think of, is when parent directory did not have any pending markers
> >> to
> >> > >>> indicate split-brain; which is why I asked getfattr output for the
> >> parent
> >> > >>> directory too. But if the issue is resolved, there isn't much info
> >> we can
> >> > >>> get out of it. Thanks for sharing the logs. Will see what could have
> >> > >>> caused
> >> > >>> this.
> >> > >>>
> >> > >>> > Abhishek,
> >> > >>> >
> >> > >>> > Yes, ideally it should show. I will look into it.
> >> > >>> > I saw another case with this issue. The parent directory did not
> >> have
> >> > >>> any pe
> >> > >>> > >
> >> > >>> > > On Wed, Mar 16, 2016 at 4:20 PM, Anuradha Talur <
> >> atalur at redhat.com>
> >> > >>> wrote:
> >> > >>> > >
> >> > >>> > > > Hi Abhishek,
> >> > >>> > > >
> >> > >>> > > > The files that are reporting i/o error have gfid-mismatch.
> >> This
> >> > >>> situation
> >> > >>> > > > is called directory or entry split-brain. You can find steps
> >> to
> >> > >>> resolve
> >> > >>> > > > this kind of split brain here :
> >> > >>> > > >
> >> > >>>
> >> https://gluster.readthedocs.org/en/latest/Troubleshooting/split-brain/ .
> >> > >>> > > >
> >> > >>> > > > Ideally, the parent directories of these files have to be
> >> listed
> >> > >>> in heal
> >> > >>> > > > info split-brain output. Can you please get extended
> >> attributes of
> >> > >>> parent
> >> > >>> > > > directories of the files that show i/o error (Same getfattr
> >> > >>> command that
> >> > >>> > > > you previously used.) ?
> >> > >>> > > >
> >> > >>> > > > ----- Original Message -----
> >> > >>> > > > > From: "ABHISHEK PALIWAL" <abhishpaliwal at gmail.com>
> >> > >>> > > > > To: "Anuradha Talur" <atalur at redhat.com>
> >> > >>> > > > > Cc: gluster-users at gluster.org, gluster-devel at gluster.org
> >> > >>> > > > > Sent: Thursday, March 10, 2016 11:22:35 AM
> >> > >>> > > > > Subject: Re: [Gluster-users] gluster volume heal info split
> >> brain
> >> > >>> > > > command not showing files in split-brain
> >> > >>> > > > >
> >> > >>> > > > > Hi Anuradha,
> >> > >>> > > > >
> >> > >>> > > > > Please find the glusterfs and glusterd logs directory as an
> >> > >>> attachment.
> >> > >>> > > > >
> >> > >>> > > > > Regards,
> >> > >>> > > > > Abhishek
> >> > >>> > > > >
> >> > >>> > > > >
> >> > >>> > > > >
> >> > >>> > > > > On Wed, Mar 9, 2016 at 5:54 PM, ABHISHEK PALIWAL <
> >> > >>> > > > abhishpaliwal at gmail.com>
> >> > >>> > > > > wrote:
> >> > >>> > > > >
> >> > >>> > > > > > Hi Anuradha,
> >> > >>> > > > > >
> >> > >>> > > > > > Sorry for late reply.
> >> > >>> > > > > >
> >> > >>> > > > > > Please find the requested logs below:
> >> > >>> > > > > >
> >> > >>> > > > > > Remote: 10.32.0.48
> >> > >>> > > > > > Local : 10.32.1.144
> >> > >>> > > > > >
> >> > >>> > > > > > Local:
> >> > >>> > > > > > #gluster volume heal c_glusterfs info split-brain
> >> > >>> > > > > > Brick 10.32.1.144:/opt/lvmdir/c2/brick
> >> > >>> > > > > > Number of entries in split-brain: 0
> >> > >>> > > > > >
> >> > >>> > > > > > Brick 10.32.0.48:/opt/lvmdir/c2/brick
> >> > >>> > > > > > Number of entries in split-brain: 0
> >> > >>> > > > > >
> >> > >>> > > > > > Remote:
> >> > >>> > > > > > #gluster volume heal c_glusterfs info split-brain
> >> > >>> > > > > > Brick 10.32.1.144:/opt/lvmdir/c2/brick
> >> > >>> > > > > > Number of entries in split-brain: 0
> >> > >>> > > > > >
> >> > >>> > > > > > Brick 10.32.0.48:/opt/lvmdir/c2/brick
> >> > >>> > > > > > Number of entries in split-brain: 0
> >> > >>> > > > > >
> >> > >>> > > > > > auto-sync.sh.
> >> > >>> > > > > > Here you can see that i/o error is detected. Below is the
> >> > >>> required
> >> > >>> > > > > > meta
> >> > >>> > > > > > data from both the bricks.
> >> > >>> > > > > >
> >> > >>> > > > > > 1)
> >> > >>> > > > > > stat: cannot stat
> >> > >>> '/mnt/c//public_html/cello/ior_files/nameroot.ior':
> >> > >>> > > > > > Input/output error
> >> > >>> > > > > > Remote:
> >> > >>> > > > > >
> >> > >>> > > > > > getfattr -d -m . -e hex
> >> > >>> > > > > >
> >> opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
> >> > >>> > > > > > # file:
> >> > >>> opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
> >> > >>> > > > > > trusted.afr.dirty=0x000000000000000000000000
> >> > >>> > > > > > trusted.bit-rot.version=0x000000000000000256ded2f6000ad80f
> >> > >>> > > > > > trusted.gfid=0x771221a7bb3c4f1aade40ce9e38a95ee
> >> > >>> > > > > >
> >> > >>> > > > > > Local:
> >> > >>> > > > > >
> >> > >>> > > > > > getfattr -d -m . -e hex
> >> > >>> > > > > >
> >> opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
> >> > >>> > > > > > # file:
> >> > >>> opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
> >> > >>> > > > > > trusted.bit-rot.version=0x000000000000000256ded38f000e3a51
> >> > >>> > > > > > trusted.gfid=0x8ea33f46703c4e2d95c09153c1b858fd
> >> > >>> > > > > >
> >> > >>> > > > > >
> >> > >>> > > > > > 2)
> >> > >>> > > > > > stat: cannot stat '/mnt/c//security/corbasecurity':
> >> > >>> Input/output
> >> > >>> > > > > > error
> >> > >>> > > > > > Remote:
> >> > >>> > > > > >
> >> > >>> > > > > > getfattr -d -m . -e hex
> >> > >>> opt/lvmdir/c2/brick/security/corbasecurity
> >> > >>> > > > > > # file: opt/lvmdir/c2/brick/security/corbasecurity
> >> > >>> > > > > > trusted.afr.dirty=0x000000000000000000000000
> >> > >>> > > > > > trusted.bit-rot.version=0x000000000000000256ded2f6000ad80f
> >> > >>> > > > > > trusted.gfid=0xd298b7a0c8834f3e99abb39741363013
> >> > >>> > > > > >
> >> > >>> > > > > > Local:
> >> > >>> > > > > >
> >> > >>> > > > > > getfattr -d -m . -e hex
> >> > >>> opt/lvmdir/c2/brick/security/corbasecurity
> >> > >>> > > > > > # file: opt/lvmdir/c2/brick/security/corbasecurity
> >> > >>> > > > > > trusted.bit-rot.version=0x000000000000000256ded38f000e3a51
> >> > >>> > > > > > trusted.gfid=0x890df0f706184b52803fac3242a2f15b
> >> > >>> > > > > >
> >> > >>> > > > > > I observed that getfattr command output doesn't show all
> >> the
> >> > >>> fields
> >> > >>> > > > > > all
> >> > >>> > > > > > the times.
> >> > >>> > > > > >
> >> > >>> > > > > > Here you can check that gluster split-brain command hasn't
> >> > >>> reported
> >> > >>> > > > > > any
> >> > >>> > > > > > split-brains but resulted in IO errors when accessed few
> >> files.
> >> > >>> > > > > >
> >> > >>> > > > > > Could you please tell me if "split-brain" command doesn't
> >> > >>> reported
> >> > >>> > > > > > any
> >> > >>> > > > > > entry as output, then is there any way through which we
> >> can
> >> > >>> find out
> >> > >>> > > > that
> >> > >>> > > > > > the files are in split-brain if we are getting the IO
> >> error on
> >> > >>> those
> >> > >>> > > > file.
> >> > >>> > > > > >
> >> > >>> > > > > >
> >> > >>> > > > > > Regards,
> >> > >>> > > > > > Abhishek
> >> > >>> > > > > >
> >> > >>> > > > > >
> >> > >>> > > > > >
> >> > >>> > > > > > On Thu, Mar 3, 2016 at 5:32 PM, Anuradha Talur <
> >> > >>> atalur at redhat.com>
> >> > >>> > > > wrote:
> >> > >>> > > > > >
> >> > >>> > > > > >>
> >> > >>> > > > > >>
> >> > >>> > > > > >> ----- Original Message -----
> >> > >>> > > > > >> > From: "ABHISHEK PALIWAL" <abhishpaliwal at gmail.com>
> >> > >>> > > > > >> > To: gluster-users at gluster.org,
> >> gluster-devel at gluster.org
> >> > >>> > > > > >> > Sent: Thursday, March 3, 2016 12:10:42 PM
> >> > >>> > > > > >> > Subject: [Gluster-users] gluster volume heal info split
> >> > >>> brain
> >> > >>> > > > command
> >> > >>> > > > > >> not     showing files in split-brain
> >> > >>> > > > > >> >
> >> > >>> > > > > >> >
> >> > >>> > > > > >> > Hello,
> >> > >>> > > > > >> >
> >> > >>> > > > > >> > In gluster, we use the command "gluster volume heal
> >> > >>> c_glusterfs
> >> > >>> > > > > >> > info
> >> > >>> > > > > >> > split-brain" to find the files that are in split-brain
> >> > >>> scenario.
> >> > >>> > > > > >> > We run heal script (developed by Windriver prime team)
> >> on
> >> > >>> the
> >> > >>> > > > > >> > files
> >> > >>> > > > > >> reported
> >> > >>> > > > > >> > by above command to resolve split-brain issue.
> >> > >>> > > > > >> >
> >> > >>> > > > > >> > But we observed that the above command is not showing
> >> all
> >> > >>> files
> >> > >>> > > > > >> > that
> >> > >>> > > > > >> are in
> >> > >>> > > > > >> > split-brain,
> >> > >>> > > > > >> > even though split brain scenario actually exists on the
> >> > >>> node.
> >> > >>> > > > > >> >
> >> > >>> > > > > >> > Now a days this issue is seen more often and IO errors
> >> are
> >> > >>> > > > > >> > reported
> >> > >>> > > > when
> >> > >>> > > > > >> > tried to access these files under split-brain.
> >> > >>> > > > > >> >
> >> > >>> > > > > >> > Can you please check why this gluster command is not
> >> > >>> showing files
> >> > >>> > > > under
> >> > >>> > > > > >> > split-brain?
> >> > >>> > > > > >> > We can provide you required logs and support to
> >> resolve this
> >> > >>> > > > > >> > issue.
> >> > >>> > > > > >> Hi,
> >> > >>> > > > > >>
> >> > >>> > > > > >> Could you paste the output of getfattr -m. -de hex
> >> > >>> > > > > >> <path-to-files-in-split-brain> from all the bricks that
> >> the
> >> > >>> files
> >> > >>> > > > > >> lie
> >> > >>> > > > in?
> >> > >>> > > > > >>
> >> > >>> > > > > >> >
> >> > >>> > > > > >> > Please reply on this because I am not getting any reply
> >> > >>> from the
> >> > >>> > > > > >> community.
> >> > >>> > > > > >> >
> >> > >>> > > > > >> > --
> >> > >>> > > > > >> >
> >> > >>> > > > > >> > Regards
> >> > >>> > > > > >> > Abhishek Paliwal
> >> > >>> > > > > >> >
> >> > >>> > > > > >> > _______________________________________________
> >> > >>> > > > > >> > Gluster-users mailing list
> >> > >>> > > > > >> > Gluster-users at gluster.org
> >> > >>> > > > > >> > http://www.gluster.org/mailman/listinfo/gluster-users
> >> > >>> > > > > >>
> >> > >>> > > > > >> --
> >> > >>> > > > > >> Thanks,
> >> > >>> > > > > >> Anuradha.
> >> > >>> > > > > >>
> >> > >>> > > > > >
> >> > >>> > > > >
> >> > >>> > > >
> >> > >>> > > > --
> >> > >>> > > > Thanks,
> >> > >>> > > > Anuradha.
> >> > >>> > > >
> >> > >>> > >
> >> > >>> > >
> >> > >>> > >
> >> > >>> > > --
> >> > >>> > >
> >> > >>> > >
> >> > >>> > >
> >> > >>> > >
> >> > >>> > > Regards
> >> > >>> > > Abhishek Paliwal
> >> > >>> > >
> >> > >>> >
> >> > >>> > --
> >> > >>> > Thanks,
> >> > >>> > Anuradha.
> >> > >>> >
> >> > >>>
> >> > >>> --
> >> > >>> Thanks,
> >> > >>> Anuradha.
> >> > >>>
> >> > >>
> >> > >>
> >> > >>
> >> > >> --
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> Regards
> >> > >> Abhishek Paliwal
> >> > >>
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > Regards
> >> > > Abhishek Paliwal
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> >
> >> >
> >> >
> >> > Regards
> >> > Abhishek Paliwal
> >> >
> >>
> >> --
> >> Thanks,
> >> Anuradha.
> >>
> >
> >
> >
> > --
> >
> >
> >
> >
> > Regards
> > Abhishek Paliwal
> >
> 
> 
> 
> --
> 
> 
> 
> 
> Regards
> Abhishek Paliwal
> 

-- 
Thanks,
Anuradha.

