[Gluster-users] Gluster 3.3 Questions
Chris Bornmann
chris.bornmann at cyaninc.com
Tue Dec 18 19:33:40 UTC 2012
Hi All,
I am having a few problems with my Gluster configuration. The issues are:
1) Sometimes the gluster client running on ServerA stops serving files: an
"ls" on the mount point returns an empty directory. All the other clients
seem fine when this happens. Unmounting and remounting the gluster
directory temporarily "fixes" the problem, sometimes for a few minutes,
sometimes for a day. (The exact remount commands I use are shown after
this list.)
2) The log files in /var/log/glusterfs are not being rotated on ServerA,
though they are being rotated on ServerB. (My logrotate comparison
commands are also shown after this list.)
3) On ServerB I have both /etc/glusterd and /etc/glusterfs. ServerA and
the pure clients have only /etc/glusterfs.
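For reference, the workaround for problem 1 is just an unmount/remount,
roughly like this (the mount point /mnt/default is specific to my setup):

    # on ServerA, when the mount starts returning an empty directory
    umount /mnt/default
    mount -t glusterfs ServerA:/default /mnt/default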
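For problem 2, this is how I've been comparing the rotation setup on the
two servers (I believe the Ubuntu glusterfs packages put their logrotate
config at /etc/logrotate.d/glusterfs-common, but that path is from memory,
so correct me if it's wrong):

    # diff the logrotate configs between the two servers
    ssh ServerA cat /etc/logrotate.d/glusterfs-common > a.conf
    ssh ServerB cat /etc/logrotate.d/glusterfs-common > b.conf
    diff a.conf b.conf

    # force a rotation on ServerA to see if logrotate itself reports errors
    ssh -t ServerA sudo logrotate -f -v /etc/logrotate.d/glusterfs-common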
Here is some info on my setup; if anything is missing, please let me know
and I'll provide it.
Gluster version: 3.3.0
OS: Ubuntu 12.04 (running on EC2)
On ServerA the following is filling up the
/var/log/glusterfs/glustershd.log file:
[2012-12-18 19:20:46.819623] I [afr-common.c:1340:afr_launch_self_heal] 0-default-replicate-0: background entry self-heal triggered. path: <gfid:d88ad693-86fd-49eb-9360-7fe89d0e6cf6>, reason: lookup detected pending operations
[2012-12-18 19:20:46.831481] E [afr-self-heal-common.c:1087:afr_sh_common_lookup_resp_handler] 0-default-replicate-0: path <gfid:d88ad693-86fd-49eb-9360-7fe89d0e6cf6>/test_quote.pdf on subvolume default-client-1 => -1 (No such file or directory)
[2012-12-18 19:20:46.831512] I [afr-self-heal-entry.c:1904:afr_sh_entry_common_lookup_done] 0-default-replicate-0: <gfid:d88ad693-86fd-49eb-9360-7fe89d0e6cf6>/test_quote.pdf: Skipping entry self-heal because of gfid absence
[2012-12-18 19:20:46.833554] E [afr-self-heal-common.c:2156:afr_self_heal_completion_cbk] 0-default-replicate-0: background entry self-heal failed on <gfid:d88ad693-86fd-49eb-9360-7fe89d0e6cf6>
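As I understand the brick layout, every gfid has an entry at
.glusterfs/<first two hex chars>/<next two>/<full gfid> under the brick
root, so I've been trying to map the gfid from the log back to a real path
like this (please correct me if 3.3 does this differently):

    # run on each brick (ServerA and ServerB)
    GFID=d88ad693-86fd-49eb-9360-7fe89d0e6cf6
    # for a directory this should be a symlink to the real path;
    # for a regular file it is a hard link
    ls -l /ebs/gluster/default/.glusterfs/d8/8a/$GFID
    # if it is a hard link, find the real path by matching inodes
    find /ebs/gluster/default -samefile /ebs/gluster/default/.glusterfs/d8/8a/$GFID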
I have a single replicated volume called "default". There are two servers
each with one brick.
gluster> volume info
Volume Name: default
Type: Replicate
Volume ID: cb46f3ac-2ae1-4c9d-a2af-0df242b2acd3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ServerA:/ebs/gluster/default
Brick2: ServerB:/ebs/gluster/default
gluster> volume status all
Status of volume: default
Gluster process                                     Port    Online  Pid
------------------------------------------------------------------------------
Brick ServerA:/ebs/gluster/default                  24009   Y       3575
Brick ServerB:/ebs/gluster/default                  24009   Y       2241
NFS Server on localhost                             38467   Y       3581
Self-heal Daemon on localhost                       N/A     Y       3587
NFS Server on ServerB                               38467   Y       2247
Self-heal Daemon on ServerB                         N/A     Y       2253
In addition to ServerA and ServerB (which are also running the gluster
client) there are about 10 other systems acting as pure clients.
Does anybody have ideas about what might be causing these problems, or
additional things I should check?
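In case it helps, I'm happy to post the output of the 3.3 self-heal
queries as well, e.g. (assuming I have the CLI syntax right):

    gluster volume heal default info
    gluster volume heal default info split-brain
    gluster peer status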
Thanks in advance!
- chris