[Gluster-devel] Strange Error

Ruslan Bondar r.bondar at ctncorp.com
Wed Feb 25 10:30:37 UTC 2009


Krishna,

After moving the server and client into separate processes, the bug went away.
The production servers have run for 2 weeks with no errors.

Thank you very much for your help.

BUT :) we found another bug. It looks like the previous one, but it is harder
to understand. In our web application, when one of our server scripts opens a
certain file, it gets the content of another file. But when we open this file
in _any_ other way, everything works OK. After some debugging we established
that whichever function the script uses to read this file, it always gets the
content of the other file.
It was 100% reproducible, and the file on the file system looked OK, so we
thought the problem was in the web server and restarted it many times.
In the end I restarted gluster, and this solved the problem.

As before, I found nothing in the logs connecting these two files.
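
For anyone who wants to check their own tree for the same symptom, here is a
minimal sketch in Python of the checksum comparison described in my first
message (quoted below). The function names md5_manifest and find_swapped are
hypothetical helpers, not part of GlusterFS, and the sketch assumes the
manifest from each server has already been collected in one place.

```python
import hashlib
from pathlib import Path


def md5_manifest(root):
    """Map each file path (relative to root) to the MD5 hex digest of its content."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.md5(path.read_bytes()).hexdigest()
    return manifest


def find_swapped(reference, suspect):
    """List files whose checksum in `suspect` differs from `reference`.

    Each entry is (path, names), where names are the files in `reference`
    whose checksum equals the content the suspect server returns for path.
    An empty names list means the content matches no known file.
    """
    # Index the reference manifest by digest so a wrong checksum can be
    # traced back to the file it actually belongs to.
    by_digest = {}
    for name, digest in reference.items():
        by_digest.setdefault(digest, []).append(name)

    report = []
    for name, digest in sorted(suspect.items()):
        if name in reference and reference[name] != digest:
            report.append((name, by_digest.get(digest, [])))
    return report
```

With the checksums from the quoted message, passing the web1 manifest as
reference and the web2 manifest as suspect reports joinus.php on web2 as
carrying the content of lib.php.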

On Monday 09 February 2009 21:31:08 Krishna Srinivas wrote:
> Ruslan,
>
> Indeed it is a strange error. Is it an easy bug to reproduce? By the
> way, don't use single process server and client, we found issues
> regarding locking. If the bug is easy to reproduce you can also check
> if it is seen if server and client are different processes.
>
> Krishna
>
> On Thu, Feb 5, 2009 at 4:25 PM, Ruslan Bondar <r.bondar at ctncorp.com> wrote:
> > Hi,
> >
> >  We found something strange with new glusterfs-2.0.0rc1. Previously, we
> > worked on 1.3.12, and some time ago upgraded to 2.0.0.
> >
> > Our configuration is 2 web servers, web1 and web2. On gluster we store
> > only web scripts, for high availability in case one server goes down.
> > Glusterfs is configured as a simple single-process AFR. All changes are
> > made on the first server (web1).
> >
> > After some time we found that some script files on the _second_ server
> > have the content of other files.
> > Comparing the md5sums of all files shows that some files have different
> > checksums on the two servers, and that each wrong checksum matches another
> > file in the glusterfs tree. For example:
> > Web1: /gluster/joinus.php 954fdb7686c1a8836b863fa8a09deeb8
> > Web1: /gluster/lib.php    9cd1fb4db6021d37f3098d361b089f65
> > Web2: /gluster/joinus.php 9cd1fb4db6021d37f3098d361b089f65
> > Web2: /gluster/lib.php    9cd1fb4db6021d37f3098d361b089f65
> >
> > If we open joinus.php and lib.php, they are identical on web2, but on
> > web1 the two files differ.
> >
> > The way that this conflict can be solved is:
> > web1# mv joinus.php joinus.php1
> > web2# mv joinus.php1 joinus.php
> >
> > after this we have:
> > Web1: /gluster/joinus.php 954fdb7686c1a8836b863fa8a09deeb8
> > Web2: /gluster/joinus.php 954fdb7686c1a8836b863fa8a09deeb8
> >
> > I found nothing in the logs.
> >
> > Our gluster config web1:
> > ++++++++++++++++++++++++++++++++++
> > # file: /etc/glusterfs/glusterfs-client.vol
> > volume store
> >  type storage/posix
> >  option directory /export/storage0/glusterfs-test
> > end-volume
> >
> > volume store-lock
> >  type features/posix-locks
> >  subvolumes store
> > end-volume
> >
> > volume brick1
> >  type performance/io-threads
> >  option thread-count 4
> >  subvolumes store-lock
> > end-volume
> >
> > volume outserver
> >  type protocol/server
> >  option transport-type tcp/server
> >  option auth.addr.brick1.allow *
> >  subvolumes brick1
> > end-volume
> >
> > volume remote2
> >  type protocol/client
> >  option transport-type tcp/client
> >  option transport-timeout 5
> >  option remote-host web2
> >  option remote-subvolume brick2
> > end-volume
> >
> > volume server
> >  type cluster/afr
> >  subvolumes remote2 brick1
> >  option read-subvolume brick1
> > end-volume
> > +++++++++++++++++++++++++++++++++++++
> >
> > gluster on web2:
> > +++++++++++++++++++++++++++++++++++++
> > # file: /etc/glusterfs/glusterfs-client.vol
> >
> > volume store
> >  type storage/posix
> >  option directory /export/storage0/glusterfs-test
> > end-volume
> >
> > volume store-lock
> >  type features/posix-locks
> >  subvolumes store
> > end-volume
> >
> > volume brick2
> >  type performance/io-threads
> >  option thread-count 4
> >  subvolumes store-lock
> > end-volume
> >
> > volume outserver
> >  type protocol/server
> >  option transport-type tcp/server
> >  option auth.addr.brick2.allow *
> >  subvolumes brick2
> > end-volume
> >
> > volume remote1
> >  type protocol/client
> >  option transport-type tcp/client
> >  option transport-timeout 5
> >  option remote-host web1
> >  option remote-subvolume brick1
> > end-volume
> >
> > volume server
> >  type cluster/afr
> >  subvolumes brick2 remote1
> >  option read-subvolume brick2
> > end-volume
> > +++++++++++++++++++++++++++++++++++++
> >


-- 
Best regards,
  Ruslan Bondar
  Skype: b0rland
  mailto:rus at iq-labs.net 




