[Gluster-users] Fwd: files not syncing up with glusterfs 3.1.2
Joe Landman
landman at scalableinformatics.com
Mon Feb 21 18:47:13 UTC 2011
On 02/21/2011 01:39 PM, Kon Wilms wrote:
> On Mon, Feb 21, 2011 at 9:45 AM, Steve Wilson <stevew at purdue.edu> wrote:
>> We had trouble with reliability for small, actively-accessed files on a
>> distribute-replicate volume in both GlusterFS 3.1.1 and 3.1.2. It seems that
>> the replicated servers would eventually get out of sync with each other on
>> these kinds of files. For a while, we dropped replication and only ran the
>> volume as distributed. This has worked reliably for the past week or so
>> without any errors that we were seeing before: no such file, invalid
>> argument, etc.
>
> I'm running thousands of small files over NFSv3 through NGINX with
> distribute and have had the opposite experience. Unfortunately when
> NGINX can't access a file over NFS it means a customer calling us, so
> right now gluster is basically sitting idle (posted my output to the
> list a while back with no response).
We've had lots of issues with files disappearing or being inaccessible
prior to 3.1.2 with the NFS client and server translator. After 3.1.2,
many of these problems *seem* to have been resolved, though all this
means in this instance is that the customer hasn't submitted a ticket yet.
I had thought it was originally a timebase issue ... as we had a minute
or two drift on some of the nodes (since fixed). But we had a pretty
consistent error in this regard.
We did open problem reports. Unfortunately, no action so far (they just
closed them this morning, though nothing has been solved per se; the
issue simply has not resurfaced yet). I'll leave those reports closed
for now.
This said, this error, or one with a very similar signature, has been in
the code since the 2.x series. I really ... really want to track it
down, but I can't create a simple replicator for it to present to the
team. If you have what you think is a simple replicator, please, email
me offline. We'll try it here, and if we can get it down to a very
simple replication case and test, we'll re-open the bugs.
I'd hate to think it's a heisenbug, but that is where I am leaning now.
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615