[Gluster-users] what action is required for this log entry?
Khoi Mai
KHOIMAI at UP.COM
Wed Dec 11 04:49:40 UTC 2013
Gluster community,
[2013-12-11 04:40:06.609091] W
[server-resolve.c:419:resolve_anonfd_simple] 0-server: inode for the gfid
(76240621-1362-494d-a70a-f5824c3ce56e) is not found. anonymous fd creation
failed
[2013-12-11 04:40:06.610588] W
[server-resolve.c:419:resolve_anonfd_simple] 0-server: inode for the gfid
(03ada1a2-ee51-4c85-a79f-a72aabde116d) is not found. anonymous fd creation
failed
[2013-12-11 04:40:06.616978] W
[server-resolve.c:419:resolve_anonfd_simple] 0-server: inode for the gfid
(64fbc834-e00b-4afd-800e-97d64a32de92) is not found. anonymous fd creation
failed
[2013-12-11 04:40:06.617069] W
[server-resolve.c:419:resolve_anonfd_simple] 0-server: inode for the gfid
(64fbc834-e00b-4afd-800e-97d64a32de92) is not found. anonymous fd creation
failed
[2013-12-11 04:40:06.624845] W
[server-resolve.c:419:resolve_anonfd_simple] 0-server: inode for the gfid
(27837527-5dea-4367-a050-248a6266b2db) is not found. anonymous fd creation
failed
followed by
[2013-12-11 04:40:10.462202] W
[marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker:
cannot add a new contribution node
[2013-12-11 04:40:29.331476] W
[marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker:
cannot add a new contribution node
[2013-12-11 04:40:53.125088] W
[marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker:
cannot add a new contribution node
[2013-12-11 04:41:00.975222] W
[marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker:
cannot add a new contribution node
[2013-12-11 04:41:01.517990] W
[marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker:
cannot add a new contribution node
Tue Dec 10 22:41:01 CST 2013
[2013-12-11 04:41:05.874819] W
[marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker:
cannot add a new contribution node
[2013-12-11 04:41:05.878135] W
[marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker:
cannot add a new contribution node
Tue Dec 10 22:42:01 CST 2013
[2013-12-11 04:42:05.136054] W
[marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker:
cannot add a new contribution node
[2013-12-11 04:42:05.330591] W
[marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker:
cannot add a new contribution node
[2013-12-11 04:42:41.224927] W
[marker-quota.c:2039:mq_inspect_directory_xattr] 0-devstatic-marker:
cannot add a new contribution node
Please help me understand what is being logged in the
/var/log/glusterfs/bricks/static-content.log file.
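For reference, one way to map a gfid from those warnings back to a path on the
brick, assuming the standard .glusterfs backend layout (the brick path is taken
from the volfile below):
# each gfid has an entry under <brick>/.glusterfs/<first 2 chars>/<next 2 chars>/<gfid>
ls -l /static/content/.glusterfs/76/24/76240621-1362-494d-a70a-f5824c3ce56e
# for a regular file this entry is a hard link, so the real path shares its inode
find /static/content -samefile \
  /static/content/.glusterfs/76/24/76240621-1362-494d-a70a-f5824c3ce56e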
Here is my config for this particular brick in a 4-node distributed/replicated design.
cat /var/lib/glusterd/vols/devstatic/devstatic.host2.static-content.vol
volume devstatic-posix
type storage/posix
option volume-id 75832afb-f20e-4018-8d74-8550a92233fc
option directory /static/content
end-volume
volume devstatic-access-control
type features/access-control
subvolumes devstatic-posix
end-volume
volume devstatic-locks
type features/locks
subvolumes devstatic-access-control
end-volume
volume devstatic-io-threads
type performance/io-threads
subvolumes devstatic-locks
end-volume
volume devstatic-index
type features/index
option index-base /static/content/.glusterfs/indices
subvolumes devstatic-io-threads
end-volume
volume devstatic-marker
type features/marker
option quota on
option xtime off
option timestamp-file /var/lib/glusterd/vols/devstatic/marker.tstamp
option volume-uuid 75832afb-f20e-4018-8d74-8550a92233fc
subvolumes devstatic-index
end-volume
volume /static/content
type debug/io-stats
option count-fop-hits off
option latency-measurement off
subvolumes devstatic-marker
end-volume
volume devstatic-server
type protocol/server
option auth.addr./static/content.allow *
option auth.login.6173ce00-d694-4793-a755-cd1d80f5001f.password
13702989-510c-44c1-9bc4-8f1f21b65403
option auth.login./static/content.allow
6173ce00-d694-4793-a755-cd1d80f5001f
option transport-type tcp
subvolumes /static/content
end-volume
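The mq_inspect_directory_xattr warnings appear to come from the quota marker
maintaining per-directory accounting xattrs. A hedged way to see what is
currently stored on a directory of this brick (the directory path is only a
placeholder, and trusted.glusterfs.quota is assumed to be the xattr namespace
the quota feature uses):
getfattr -d -m 'trusted.glusterfs.quota' -e hex /static/content/somedir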
Khoi Mai
From: gluster-users-request at gluster.org
To: gluster-users at gluster.org
Date: 12/10/2013 05:58 AM
Subject: Gluster-users Digest, Vol 68, Issue 11
Sent by: gluster-users-bounces at gluster.org
Send Gluster-users mailing list submissions to
gluster-users at gluster.org
To subscribe or unsubscribe via the World Wide Web, visit
http://supercolony.gluster.org/mailman/listinfo/gluster-users
or, via email, send a message with subject or body 'help' to
gluster-users-request at gluster.org
You can reach the person managing the list at
gluster-users-owner at gluster.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Gluster-users digest..."
Today's Topics:
1. Re: Testing failover and recovery (Per Hallsmark)
2. Gluster - replica - Unable to self-heal contents of '/'
(possible split-brain) (Alexandru Coseru)
3. Gluster infrastructure question (Heiko Krämer)
4. Re: How reliable is XFS under Gluster? (Kal Black)
5. Re: Gluster infrastructure question (Nux!)
6. Scalability - File system or Object Store (Randy Breunling)
7. Re: Scalability - File system or Object Store (Jay Vyas)
8. Re: Gluster infrastructure question (Joe Julian)
9. Re: [Gluster-devel] GlusterFest Test Weekend - 3.5 Test #1
(John Mark Walker)
10. Re: Gluster infrastructure question (Nux!)
11. compatibility between 3.3 and 3.4 (samuel)
12. Re: Gluster infrastructure question (bernhard glomm)
13. Re: Gluster infrastructure question (Ben Turner)
14. Re: Gluster infrastructure question (Ben Turner)
15. Re: Scalability - File system or Object Store (Jeff Darcy)
16. Re: Gluster infrastructure question (Dan Mons)
17. Re: Gluster infrastructure question (Joe Julian)
18. Re: Gluster infrastructure question (Dan Mons)
19. Re: [CentOS 6] Upgrade to the glusterfs version in base or in
glusterfs-epel (Diep Pham Van)
20. Where does the 'date' string in '/var/log/glusterfs/gl.log'
come from? (harry mangalam)
21. Re: Where does the 'date' string in
'/var/log/glusterfs/gl.log' come from? (Sharuzzaman Ahmat Raslan)
22. FW: Self Heal Issue GlusterFS 3.3.1 (Bobby Jacob)
23. Re: Self Heal Issue GlusterFS 3.3.1 (Joe Julian)
24. Pausing rebalance (Franco Broi)
25. Re: Where does the 'date' string in
'/var/log/glusterfs/gl.log' come from? (Vijay Bellur)
26. Re: Pausing rebalance (shishir gowda)
27. Re: replace-brick failing - transport.address-family not
specified (Vijay Bellur)
28. Re: [CentOS 6] Upgrade to the glusterfs version in base or in
glusterfs-epel (Vijay Bellur)
29. Re: Pausing rebalance (Franco Broi)
30. Re: replace-brick failing - transport.address-family not
specified (Vijay Bellur)
31. Re: Pausing rebalance (Kaushal M)
32. Re: Pausing rebalance (Franco Broi)
33. Re: Self Heal Issue GlusterFS 3.3.1 (Bobby Jacob)
34. Structure needs cleaning on some files (Johan Huysmans)
35. Re: replace-brick failing - transport.address-family not
specified (Bernhard Glomm)
36. Re: Structure needs cleaning on some files (Johan Huysmans)
37. Re: Gluster infrastructure question (Heiko Krämer)
38. Re: Errors from PHP stat() on files and directories in a
glusterfs mount (Johan Huysmans)
39. Re: Gluster infrastructure question (Andrew Lau)
40. Re: replace-brick failing - transport.address-family not
specified (Vijay Bellur)
41. Re: Gluster - replica - Unable to self-heal contents of '/'
(possible split-brain) (Vijay Bellur)
42. Error after crash of Virtual Machine during migration
(Mariusz Sobisiak)
43. Re: Structure needs cleaning on some files (Johan Huysmans)
----------------------------------------------------------------------
Message: 1
Date: Mon, 9 Dec 2013 14:12:22 +0100
From: Per Hallsmark <per at hallsmark.se>
To: gluster-users at gluster.org
Subject: Re: [Gluster-users] Testing failover and recovery
Message-ID:
<CAPaVuL-DL8R3GBNzv9fMJq-rTOYCs=NufTf-B5V7xKpoNML+7Q at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Hello,
Interesting, there seem to be several users with issues regarding recovery,
but there are few to no replies... ;-)
I did some more testing over the weekend. Same initial workload (two
glusterfs servers, one client that continuously
updates a file with timestamps) and then two easy testcases:
1. one of the glusterfs servers is constantly rebooting (just an initscript
that sleeps for 60 seconds before issuing "reboot")
2. similar to 1, but instead of rebooting itself, each server reboots the
other glusterfs server, so the result is that a server
comes up, waits for a bit and then reboots the other server.
During the whole weekend this has progressed nicely. The client is running
all the time without issues, and the glusterfs server
that comes back (either always the same one or alternating, depending on the
testcase shown above) is actively getting back into
sync and updating its copy of the file.
So it seems to me that we need to look deeper into the recovery case (of
course, but it is interesting to know about the
nice&easy use cases as well). I'm surprised that recovery from a
failover (to restore the redundancy) isn't getting
more attention here. Are we (and others who have difficulties in this
area) running an unusual use case?
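For reference, a quick way to check whether the returning server still has
entries pending self-heal (a sketch assuming the testvol volume from the setup
quoted below):
gluster volume heal testvol info
gluster volume heal testvol info split-brain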
BR,
Per
On Wed, Dec 4, 2013 at 12:17 PM, Per Hallsmark <per at hallsmark.se> wrote:
> Hello,
>
> I've found GlusterFS to be an interesting project. Not so much
experience
> of it
> (although from similar usecases with DRBD+NFS setups) so I setup some
> testcase to try out failover and recovery.
>
> For this I have a setup with two glusterfs servers (each is a VM) and
one
> client (also a VM).
> I'm using GlusterFS 3.4 btw.
>
> The servers manages a gluster volume created as:
>
> gluster volume create testvol rep 2 transport tcp gs1:/export/vda1/brick
> gs2:/export/vda1/brick
> gluster volume start testvol
> gluster volume set testvol network.ping-timeout 5
>
> Then the client mounts this volume as:
> mount -t glusterfs gs1:/testvol /import/testvol
>
> Everything seems to work well in normal use cases, I can write/read to the
> volume, take servers down and up again etc.
>
> As a fault scenario, I'm testing a fault injection like this:
>
> 1. continuously writing timestamps to a file on the volume from the
> client.
> It is automated in a smaller testscript like:
> :~/glusterfs-test$ cat scripts/test-gfs-client.sh
> #!/bin/sh
>
> gfs=/import/testvol
>
> while true; do
> date +%s >> $gfs/timestamp.txt
> ts=`tail -1 $gfs/timestamp.txt`
> md5sum=`md5sum $gfs/timestamp.txt | cut -f1 -d" "`
> echo "Timestamp = $ts, md5sum = $md5sum"
> sleep 1
> done
> :~/glusterfs-test$
>
> As can be seen, the client is a quite simple user of the glusterfs
volume.
> Low datarate and single user for example.
>
>
> 2. disabling ethernet in one of the VM (ifconfig eth0 down) to simulate
> like a broken network
>
> 3. After a short while, the failed server is brought alive again
(ifconfig
> eth0 up)
>
> Step 2 and 3 is also automated in a testscript like:
>
> :~/glusterfs-test$ cat scripts/fault-injection.sh
> #!/bin/sh
>
> # fault injection script tailored for two glusterfs nodes named gs1 and
gs2
>
> if [ "$HOSTNAME" == "gs1" ]; then
> peer="gs2"
> else
> peer="gs1"
> fi
>
> inject_eth_fault() {
> echo "network down..."
> ifconfig eth0 down
> sleep 10
> ifconfig eth0 up
> echo "... and network up again."
> }
>
> recover() {
> echo "recovering from fault..."
> service glusterd restart
> }
>
> while true; do
> sleep 60
> if [ ! -f /tmp/nofault ]; then
> if ping -c 1 $peer; then
> inject_eth_fault
> recover
> fi
> fi
> done
> :~/glusterfs-test$
>
>
> I then see that:
>
> A. This goes well the first time: one server leaves the cluster and the
> client hangs for like 8 seconds before being able to write to the volume
> again.
>
> B. When the failed server comes back, I can check that from both servers
> they see each other and "gluster peer status" shows they believe the
other
> is in connected state.
>
> C. When the failed server comes back, it is not automatically seeking
> active participation on syncing volume etc (the local storage timestamp
> file isn't updated).
>
> D. If I do restart of glusterd service (service glusterd restart) the
> failed node seems to get back like it was before. Not always though...
The
> chance is higher if I have long time between fault injections (long = 60
> sec or so, with a forced faulty state of 10 sec)
> With a period time of some minutes, I could have the cluster servicing
the
> client OK for up to 8+ hours at least.
> Shortening the period, I'm easily down to like 10-15 minutes.
>
> E. Sooner or later I enter a state where the two servers seems to be up,
> seeing it's peer (gluster peer status) and such but none is serving the
> volume to the client.
> I've tried to "heal" the volume in different way but it doesn't help.
> Sometimes it is just that one of the timestamp copies (one per server)
> is ahead, which is the simpler case, but sometimes both the timestamp
> files have data added at the end that the other doesn't have.
>
> To the questions:
>
> * Is it so that, from a design point of view, the choice of the
> glusterfs team is that one shouldn't rely solely on glusterfs daemons being
> able to recover from a faulty state? There is a need for cluster manager
> services (like heartbeat for example) to be part? That would make
> experience C understandable and one could then take heartbeat or similar
> packages to start/stop services.
>
> * What would then be the recommended procedure to recover from a faulty
> glusterfs node? (so that experience D and E is not happening)
>
> * What is the expected failover timing (of course depending on config,
but
> say with a give ping timeout etc)?
> and expected recovery timing (with similar dependency on config)?
>
> * What/how is glusterfs team testing to make sure that the failover,
> recovery/healing functionality etc works?
>
> Any opinion if the testcase is bad is of course also very welcome.
>
> Best regards,
> Per
>
------------------------------
Message: 2
Date: Mon, 9 Dec 2013 15:51:31 +0200
From: "Alexandru Coseru" <alex.coseru at simplus.ro>
To: <gluster-users at gluster.org>
Subject: [Gluster-users] Gluster - replica - Unable to self-heal
contents of '/' (possible split-brain)
Message-ID: <01fe01cef4e5$c3f2cb00$4bd86100$@coseru at simplus.ro>
Content-Type: text/plain; charset="us-ascii"
Hello,
I'm trying to build a replica volume on two servers.
The servers are blade6 and blade7 (there is another peer, blade1, but with
no volumes).
The volume seems ok, but I cannot mount it from NFS.
Here are some logs:
[root at blade6 stor1]# df -h
/dev/mapper/gluster_stor1 882G 200M 837G 1% /gluster/stor1
[root at blade7 stor1]# df -h
/dev/mapper/gluster_fast 846G 158G 646G 20% /gluster/stor_fast
/dev/mapper/gluster_stor1 882G 72M 837G 1% /gluster/stor1
[root at blade6 stor1]# pwd
/gluster/stor1
[root at blade6 stor1]# ls -lh
total 0
[root at blade7 stor1]# pwd
/gluster/stor1
[root at blade7 stor1]# ls -lh
total 0
[root at blade6 stor1]# gluster volume info
Volume Name: stor_fast
Type: Distribute
Volume ID: ad82b554-8ff0-4903-be32-f8dcb9420f31
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: blade7.xen:/gluster/stor_fast
Options Reconfigured:
nfs.port: 2049
Volume Name: stor1
Type: Replicate
Volume ID: 6bd88164-86c2-40f6-9846-b21e90303e73
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: blade7.xen:/gluster/stor1
Brick2: blade6.xen:/gluster/stor1
Options Reconfigured:
nfs.port: 2049
[root at blade7 stor1]# gluster volume info
Volume Name: stor_fast
Type: Distribute
Volume ID: ad82b554-8ff0-4903-be32-f8dcb9420f31
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: blade7.xen:/gluster/stor_fast
Options Reconfigured:
nfs.port: 2049
Volume Name: stor1
Type: Replicate
Volume ID: 6bd88164-86c2-40f6-9846-b21e90303e73
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: blade7.xen:/gluster/stor1
Brick2: blade6.xen:/gluster/stor1
Options Reconfigured:
nfs.port: 2049
[root at blade6 stor1]# gluster volume status
Status of volume: stor_fast
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick blade7.xen:/gluster/stor_fast 49152 Y 1742
NFS Server on localhost 2049 Y 20074
NFS Server on blade1.xen 2049 Y 22255
NFS Server on blade7.xen 2049 Y 7574
There are no active volume tasks
Status of volume: stor1
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick blade7.xen:/gluster/stor1 49154 Y 7562
Brick blade6.xen:/gluster/stor1 49154 Y 20053
NFS Server on localhost 2049 Y 20074
Self-heal Daemon on localhost N/A Y 20079
NFS Server on blade1.xen 2049 Y 22255
Self-heal Daemon on blade1.xen N/A Y 22260
NFS Server on blade7.xen 2049 Y 7574
Self-heal Daemon on blade7.xen N/A Y 7578
There are no active volume tasks
[root at blade7 stor1]# gluster volume status
Status of volume: stor_fast
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick blade7.xen:/gluster/stor_fast 49152 Y 1742
NFS Server on localhost 2049 Y 7574
NFS Server on blade6.xen 2049 Y 20074
NFS Server on blade1.xen 2049 Y 22255
There are no active volume tasks
Status of volume: stor1
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick blade7.xen:/gluster/stor1 49154 Y 7562
Brick blade6.xen:/gluster/stor1 49154 Y 20053
NFS Server on localhost 2049 Y 7574
Self-heal Daemon on localhost N/A Y 7578
NFS Server on blade1.xen 2049 Y 22255
Self-heal Daemon on blade1.xen N/A Y 22260
NFS Server on blade6.xen 2049 Y 20074
Self-heal Daemon on blade6.xen N/A Y 20079
There are no active volume tasks
[root at blade6 stor1]# gluster peer status
Number of Peers: 2
Hostname: blade1.xen
Port: 24007
Uuid: 194a57a7-cb0e-43de-a042-0ac4026fd07b
State: Peer in Cluster (Connected)
Hostname: blade7.xen
Port: 24007
Uuid: 574eb256-30d2-4639-803e-73d905835139
State: Peer in Cluster (Connected)
[root at blade7 stor1]# gluster peer status
Number of Peers: 2
Hostname: blade6.xen
Port: 24007
Uuid: a65cadad-ef79-4821-be41-5649fb204f3e
State: Peer in Cluster (Connected)
Hostname: blade1.xen
Uuid: 194a57a7-cb0e-43de-a042-0ac4026fd07b
State: Peer in Cluster (Connected)
[root at blade6 stor1]# gluster volume heal stor1 info
Gathering Heal info on volume stor1 has been successful
Brick blade7.xen:/gluster/stor1
Number of entries: 0
Brick blade6.xen:/gluster/stor1
Number of entries: 0
[root at blade7 stor1]# gluster volume heal stor1 info
Gathering Heal info on volume stor1 has been successful
Brick blade7.xen:/gluster/stor1
Number of entries: 0
Brick blade6.xen:/gluster/stor1
Number of entries: 0
When I try to mount the volume over NFS, I get the following errors:
[2013-12-09 13:20:52.066978] E
[afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
0-stor1-replicate-0: Unable to self-heal contents of '/' (possible
split-brain). Please delete the file from all but the preferred
subvolume.-
Pending matrix: [ [ 0 2 ] [ 2 0 ] ]
[2013-12-09 13:20:52.067386] E
[afr-self-heal-common.c:2212:afr_self_heal_completion_cbk]
0-stor1-replicate-0: background meta-data self-heal failed on /
[2013-12-09 13:20:52.067452] E [mount3.c:290:mnt3svc_lookup_mount_cbk]
0-nfs: error=Input/output error
[2013-12-09 13:20:53.092039] E
[afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
0-stor1-replicate-0: Unable to self-heal contents of '/' (possible
split-brain). Please delete the file from all but the preferred
subvolume.-
Pending matrix: [ [ 0 2 ] [ 2 0 ] ]
[2013-12-09 13:20:53.092497] E
[afr-self-heal-common.c:2212:afr_self_heal_completion_cbk]
0-stor1-replicate-0: background meta-data self-heal failed on /
[2013-12-09 13:20:53.092559] E [mount3.c:290:mnt3svc_lookup_mount_cbk]
0-nfs: error=Input/output error
What am I doing wrong?
PS: Volume stor_fast works like a charm.
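Since the complaint here is about '/' itself, which cannot simply be deleted,
one approach people use is to inspect the AFR changelog xattrs on the brick
roots and clear them on the copy considered stale. This is only a hedged
sketch; the xattr names assume the usual <volume>-client-<index> naming
convention and are not taken from this setup:
# on both blade6 and blade7, look at the pending-change counters for /
getfattr -d -m trusted.afr -e hex /gluster/stor1
# on the brick whose copy of '/' you trust least, zero the counters, e.g.:
setfattr -n trusted.afr.stor1-client-0 -v 0x000000000000000000000000 /gluster/stor1
setfattr -n trusted.afr.stor1-client-1 -v 0x000000000000000000000000 /gluster/stor1
# then let self-heal use the other brick as the source
gluster volume heal stor1 full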
Best Regards,
------------------------------
Message: 3
Date: Mon, 09 Dec 2013 14:18:28 +0100
From: Heiko Krämer <hkraemer at anynines.de>
To: "gluster-users at gluster.org List" <gluster-users at gluster.org>
Subject: [Gluster-users] Gluster infrastructure question
Message-ID: <52A5C324.4090408 at anynines.de>
Content-Type: text/plain; charset="iso-8859-1"
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Heyho guys,
I've been running glusterfs for years in a small environment without big
problems.
Now I'm going to use glusterFS for a bigger cluster, but I have some
questions :)
Environment:
* 4 Servers
* 20 x 2TB HDD, each
* Raidcontroller
* Raid 10
* 4x bricks => Replicated, Distributed volume
* Gluster 3.4
1)
I'm asking myself if I can delete the RAID 10 on each server and create
a separate brick for each HDD.
In that case the volume would have 80 bricks, i.e. 4 servers x 20 HDDs. Is there
any experience with the write throughput in a production system with
this many bricks? In addition I'll get double the HDD
capacity.
2)
I've heard a talk about glusterFS and scaling out. The main point was
that if more bricks are in use, the scale-out process will take a long
time; the problem was/is the hash algorithm. So I'm asking: if
I have one very big brick (RAID 10, 20TB on each server) or many more
bricks, what's faster and are there any issues?
Is there any experience with this?
3)
Failover of an HDD is not a big deal for a RAID controller with a hot-spare
HDD. GlusterFS will rebuild automatically if a brick fails and no data is
present; this action will generate a lot of network traffic
between the mirror bricks, but it will handle it just like the RAID
controller does, right?
Thanks and cheers
Heiko
- --
Anynines.com
Avarteq GmbH
B.Sc. Informatik
Heiko Krämer
CIO
Twitter: @anynines
- ----
Geschäftsführer: Alexander Faißt, Dipl.-Inf.(FH) Julian Fischer
Handelsregister: AG Saarbrücken HRB 17413, Ust-IdNr.: DE262633168
Sitz: Saarbrücken
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B
lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0
GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ
N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs
TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6
Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY=
=bDly
-----END PGP SIGNATURE-----
------------------------------
Message: 4
Date: Mon, 9 Dec 2013 09:51:41 -0500
From: Kal Black <kaloblak at gmail.com>
To: Paul Robert Marino <prmarino1 at gmail.com>
Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] How reliable is XFS under Gluster?
Message-ID:
<CADZk1LMcRjn=qG-mWbc5S8SeJtkFB2AZica2NKuU3Z7mwQ=2kQ at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Thank you all for the wonderful input,
I haven't used XFS extensively so far, and my concerns primarily came from
reading an article (mostly the discussion after it) by Jonathan Corbet
on LWN (http://lwn.net/Articles/476263/) and another one,
http://toruonu.blogspot.ca/2012/12/xfs-vs-ext4.html. They are both
relatively recent, and I was under the impression that XFS still has
problems, in certain cases of power loss, where the metadata and the
actual data end up out of sync, which might lead to existing data being
corrupted.
But again, as Paul Robert Marino pointed out, choosing the right IO
scheduler might greatly reduce the risk of this happening.
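For anyone wanting to experiment with that, a minimal sketch of checking and
switching the scheduler at runtime (the device name is just an example):
# show the available schedulers; the active one is shown in brackets
cat /sys/block/sda/queue/scheduler
# switch this disk to deadline until the next reboot
echo deadline > /sys/block/sda/queue/scheduler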
On Sun, Dec 8, 2013 at 11:04 AM, Paul Robert Marino
<prmarino1 at gmail.com> wrote:
> XFS is fine. I've been using it on various distros in production for
> over a decade now and I've rarely had any problems with it, and when I
> have, they have been trivial to fix, which is something I honestly can't
> say about ext3 or ext4.
>
> Usually when there is a power failure during a write, if the
> transaction wasn't completely committed to the disk it is rolled back
> via the journal. The one exception to this is when you have a battery
> backed cache where the battery discharges before power is restored, or
> a very cheap consumer grade disk which uses its cache for writes and
> lies about the sync state.
> In either of these scenarios any file system will have problems.
>
> Out of any of the filesystems I've worked with, in general XFS handles
> the battery discharge scenario the cleanest and is the easiest to
> recover.
> If you have the second scenario, with the cheap disks with a cache that
> lies, nothing will help you, not even an fsync, because the hardware lies.
> Also the subject of fsync is a little more complicated than most
> people think there are several kinds of fsync and each behaves
> differently on different filesystems. PostgreSQL has documentation
> about it here
> http://www.postgresql.org/docs/9.1/static/runtime-config-wal.html
> Look at wal_sync_method if you would like to get a better idea of how
> fsync works without getting too deep into the subject.
>
> By the way, most apps don't need to do fsyncs, and it would bring your
> system to a crawl if they all did, so take people saying
> all programs should fsync with a grain of salt.
>
> In most cases when these problems come up its really that they didn't
> set the right IO scheduler for what the server does. For example CFQ
> which is the EL default can leave your write in ram cache for quite a
> while before sending it to disk in an attempt to optimize your IO;
> however the deadline scheduler will attempt to optimize your IO but
> will predictably sync it to disk after a period of time regardless of
> whether it was able to fully optimize it or not. Also there is noop
> which does no optimization at all and leaves every thing to the
> hardware, this is common and recommended for VM's and there is some
> argument to use it with high end raid controllers for things like
> financial data where you need to absolutely ensure the writes happen
> ASAP because there may be fines or other large penalties if you lose
> any data.
>
>
>
> On Sat, Dec 7, 2013 at 3:04 AM, Franco Broi <Franco.Broi at iongeo.com>
> wrote:
> > Been using ZFS for about 9 months and am about to add as other 400TB,
no
> > issues so far.
> >
> > On 7 Dec 2013 04:23, Brian Foster <bfoster at redhat.com> wrote:
> > On 12/06/2013 01:57 PM, Kal Black wrote:
> >> Hello,
> >> I am at the point of picking a FS for new brick nodes. I was used to
> >> like and use ext4 until now, but I recently read about an issue
> >> introduced by a patch in ext4 that breaks the distributed translator.
> >> At the same time, it looks like the recommended FS for a brick is no
> >> longer ext4 but XFS, which apparently will also be the default FS in
> >> the upcoming RedHat7. On the other hand, XFS is known as a file system
> >> that can be easily corrupted (zeroing files) in case of a power
> >> failure. Supporters of the file system claim that this should never
> >> happen if an application has been properly coded (properly
> >> committing/fsync-ing data to storage) and the storage itself has been
> >> properly configured (disk cache disabled on individual disks and
> >> battery backed cache used on the controllers). My question is, should
> >> I be worried about losing data in a power failure or similar scenarios
> >> (or any) using GlusterFS and XFS? Are there best practices for setting
> >> up a Gluster brick + XFS? Has the ext4 issue been reliably fixed? (My
> >> understanding is that this will be impossible unless ext4 is modified
> >> to allow proper work with Gluster.)
> >>
> >
> > Hi Kal,
> >
> > You are correct in that Red Hat recommends using XFS for gluster
bricks.
> > I'm sure there are plenty of ext4 (and other fs) users as well, so
other
> > users should chime in as far as real experiences with various brick
> > filesystems goes. Also, I believe the dht/ext issue has been resolved
> > for some time now.
> >
> > With regard to "XFS zeroing files on power failure," I'd suggest you
> > check out the following blog post:
> >
> >
>
http://sandeen.net/wordpress/computers/xfs-does-not-null-files-and-requires-no-flux/
> >
> > My cursory understanding is that there were apparently situations
where
> > the inode size of a recently extended file would be written to the log
> > before the actual extending data is written to disk, thus creating a
> > crash window where the updated size would be seen, but not the actual
> > data. In other words, this isn't a "zeroing files" behavior in as much
> > as it is an ordering issue with logging the inode size. This is
probably
> > why you've encountered references to fsync(), because with the fix
your
> > data is still likely lost (unless/until you've run an fsync to flush
to
> > disk), you just shouldn't see the extended inode size unless the
actual
> > data made it to disk.
> >
> > Also note that this was fixed in 2007. ;)
> >
> > Brian
> >
> >> Best regards
> >>
> >>
> >>
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users at gluster.org
> >> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
------------------------------
Message: 5
Date: Mon, 09 Dec 2013 15:44:24 +0000
From: Nux! <nux at li.nux.ro>
To: gluster-users at gluster.org
Subject: Re: [Gluster-users] Gluster infrastructure question
Message-ID: <9775f8114ebbc392472010f2d9bdf432 at li.nux.ro>
Content-Type: text/plain; charset=UTF-8; format=flowed
On 09.12.2013 13:18, Heiko Krämer wrote:
> 1)
> I'm asking me, if I can delete the raid10 on each server and create
> for each HDD a separate brick.
> In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there
> any experience about the write throughput in a production system with
> many of bricks like in this case? In addition i'll get double of HDD
> capacity.
I have found brick problems to be disruptive, whereas replacing a
RAID member is quite trivial. I would recommend against dropping RAID.
> 3)
> Failover of a HDD is for a raid controller with HotSpare HDD not a big
> deal. Glusterfs will rebuild automatically if a brick fails and there
> are no data present, this action will perform a lot of network traffic
> between the mirror bricks but it will handle it equal as the raid
> controller right ?
Gluster will not "rebuild automatically" a brick, you will need to
manually add/remove it.
Additionally, if a brick goes bad gluster won't do anything about it,
the affected volumes will just slow down or stop working at all.
Again, my advice is KEEP THE RAID and set up good monitoring of drives.
:)
HTH
Lucian
--
Sent from the Delta quadrant using Borg technology!
Nux!
www.nux.ro
------------------------------
Message: 6
Date: Mon, 9 Dec 2013 07:57:47 -0800
From: Randy Breunling <rbreunling at gmail.com>
To: gluster-users at gluster.org
Cc: Randy Breunling <rbreunling at gmail.com>
Subject: [Gluster-users] Scalability - File system or Object Store
Message-ID:
<CAJwwApQ5-SvboWV_iRGC+HJSuT25xSoz_9CBJfGDmpqT4tDJzw at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
From any experience...which has shown to scale better...a file system or
an object store?
--Randy
San Jose CA
------------------------------
Message: 7
Date: Mon, 9 Dec 2013 11:07:58 -0500
From: Jay Vyas <jayunit100 at gmail.com>
To: Randy Breunling <rbreunling at gmail.com>
Cc: "Gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Scalability - File system or Object Store
Message-ID:
<CAAu13zE4kYJ1Dt9ypOMt=M=ps7QfyPSn4LSqZ3YLYBnW5pE4yA at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
in object stores you sacrifice the consistency guaranteed by filesystems
for **higher** availability. probably by "scale" you mean higher
availability, so... the answer is probably object storage.
That said, gluster is an interesting file system in that it is
"object-like" --- it is really fast for lookups.... and so if you aren't
really sure you need objects, you might be able to do just fine with
gluster out of the box.
One really cool idea that is permeating the gluster community nowadays is
this "UFO" concept, -- you can easily start with regular gluster, and then
layer an object store on top at a later date if you want to sacrifice
posix operations for (even) higher availability.
"Unified File and Object Storage - Unified file and object storage allows
admins to utilize the same data store for both POSIX-style mounts as well
as S3 or Swift-compatible APIs." (from
http://gluster.org/community/documentation/index.php/3.3beta)
On Mon, Dec 9, 2013 at 10:57 AM, Randy Breunling
<rbreunling at gmail.com> wrote:
> From any experience...which has shown to scale better...a file system or
> an object store?
>
> --Randy
> San Jose CA
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
--
Jay Vyas
http://jayunit100.blogspot.com
------------------------------
Message: 8
Date: Mon, 09 Dec 2013 08:09:24 -0800
From: Joe Julian <joe at julianfamily.org>
To: Nux! <nux at li.nux.ro>,gluster-users at gluster.org
Subject: Re: [Gluster-users] Gluster infrastructure question
Message-ID: <698ab788-9f27-44a6-bd98-a53eb25f4573 at email.android.com>
Content-Type: text/plain; charset=UTF-8
Nux! <nux at li.nux.ro> wrote:
>On 09.12.2013 13:18, Heiko Krämer wrote:
>> 1)
>> I'm asking me, if I can delete the raid10 on each server and create
>> for each HDD a separate brick.
>> In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there
>> any experience about the write throughput in a production system with
>> many of bricks like in this case? In addition i'll get double of HDD
>> capacity.
>
>I have found problems with bricks to be disruptive whereas replacing a
>RAID member is quite trivial. I would recommend against dropping RAID.
>
Brick disruption has been addressed in 3.4.
>> 3)
>> Failover of a HDD is for a raid controller with HotSpare HDD not a
>big
>> deal. Glusterfs will rebuild automatically if a brick fails and there
>> are no data present, this action will perform a lot of network
>traffic
>> between the mirror bricks but it will handle it equal as the raid
>> controller right ?
>
>Gluster will not "rebuild automatically" a brick, you will need to
>manually add/remove it.
Not exactly, but you will have to manually add an attribute and
"heal...full" to re-mirror the replacement.
>Additionally, if a brick goes bad gluster won't do anything about it,
>the affected volumes will just slow down or stop working at all.
>
Again, addressed in 3.4.
>Again, my advice is KEEP THE RAID and set up good monitoring of drives.
>
I'm not arguing for or against RAID. It's another tool in our tool box. I,
personally, use JBOD. Our use case has a lot of different files being used
by different clients. JBOD maximizes our use of cache.
------------------------------
Message: 9
Date: Mon, 9 Dec 2013 11:28:05 -0500 (EST)
From: John Mark Walker <johnmark at gluster.org>
To: "Kaleb S. KEITHLEY" <kkeithle at redhat.com>
Cc: "Gluster-users at gluster.org List" <gluster-users at gluster.org>,
Gluster Devel <gluster-devel at nongnu.org>
Subject: Re: [Gluster-users] [Gluster-devel] GlusterFest Test Weekend
- 3.5 Test #1
Message-ID:
<1654421306.26844542.1386606485161.JavaMail.root at redhat.com>
Content-Type: text/plain; charset=utf-8
Incidentally, we're wrapping this up today. If you want to be included in
the list of swag-receivers (t-shirt, USB car charger, and stickers), you
still have a couple of hours to file a bug and have it verified by the dev
team.
Thanks, everyone :)
-JM
----- Original Message -----
> On 12/05/2013 09:31 PM, John Mark Walker wrote:
> > Greetings,
> >
> > If you've been keeping up with our weekly meetings and the 3.5
planning
> > page, then you know that tomorrow, December 6, is the first testing
"day"
> > for 3.5. But since this is a Friday, we're going to make the party
last
> > all weekend, through mid-day Monday.
> >
>
> YUM repos with 3.5.0qa3 RPMs for EPEL-6 and Fedora 18, 19, and 20 are
> available at
> http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.5.0qa3/
>
>
> --
>
> Kaleb
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>
------------------------------
Message: 10
Date: Mon, 09 Dec 2013 16:43:42 +0000
From: Nux! <nux at li.nux.ro>
To: Joe Julian <joe at julianfamily.org>
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Gluster infrastructure question
Message-ID: <b48aa7ed1b14432fc4047c934320e941 at li.nux.ro>
Content-Type: text/plain; charset=UTF-8; format=flowed
On 09.12.2013 16:09, Joe Julian wrote:
>>
>
> Brick disruption has been addressed in 3.4.
Good to know! What exactly happens when the brick goes unresponsive?
>> Additionally, if a brick goes bad gluster won't do anything about it,
>> the affected volumes will just slow down or stop working at all.
>>
>
> Again, addressed in 3.4.
How? What is the expected behaviour now?
Thanks!
--
Sent from the Delta quadrant using Borg technology!
Nux!
www.nux.ro
------------------------------
Message: 11
Date: Mon, 9 Dec 2013 18:03:59 +0100
From: samuel <samu60 at gmail.com>
To: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: [Gluster-users] compatibility between 3.3 and 3.4
Message-ID:
<CAOg=WDc-JT=CfqE39qWSPTjP2OqKj4L_oCfDG8icQKVTpi+0JQ at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi all,
We're playing around with new versions and upgrading options. We currently
have a 2x2x2 striped-distributed-replicated volume based on 3.3.0 and
we're planning to upgrade to version 3.4.
We've tried upgrading the clients first, and we've tried 3.4.0, 3.4.1
and 3.4.2qa2, but all of them caused the same error:
Failed to get stripe-size
So it seems as if 3.4 clients are not compatible with 3.3 volumes. Is this
assumption right?
Is there any procedure to upgrade the gluster from 3.3 to 3.4 without
stopping the service?
Where are the compatibility limitations between these 2 versions?
Any hint or link to documentation would be highly appreciated.
Thank you in advance,
Samuel.
------------------------------
Message: 12
Date: Mon, 9 Dec 2013 19:52:57 +0100
From: bernhard glomm <bernhard.glomm at ecologic.eu>
To: Heiko Krämer <hkraemer at anynines.de>
Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Gluster infrastructure question
Message-ID: <E2AB54DC-4D82-4734-9BE2-E7B0B700BBA3 at ecologic.eu>
Content-Type: text/plain; charset="windows-1252"
Hi Heiko,
some years ago I had to deliver a reliable storage that should be easy to
grow in size over time.
For that I was in close contact with
presto prime who produced a lot of interesting research results accessible
to the public.
http://www.prestoprime.org/project/public.en.html
What struck me was the general concern about how and when and with
which pattern hard drives will fail,
and the rebuild time in case a "big" (i.e. 2TB+) drive fails (one of
the papers at PrestoPrime dealt with that in detail).
From that background my approach was to build relatively small raid6
bricks (9 * 2 TB + 1 hot spare)
and connect them together with a distributed glusterfs.
I never experienced any problems with that and felt quite comfortable
about it.
That was for just a lot of big file data exported via samba.
At the same time I used another, mirrored, glusterfs as a storage backend
for
my VM-images; same there, no problem and much less hassle and headache than
drbd and ocfs2,
which I run on another system.
hth
best
Bernhard
Bernhard Glomm
IT Administration
Phone: +49 (30) 86880 134
Fax: +49 (30) 86880 100
Skype: bernhard.glomm.ecologic
Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 | 10717
Berlin | Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.:
DE811963464
Ecologic® is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH
On Dec 9, 2013, at 2:18 PM, Heiko Krämer <hkraemer at anynines.de> wrote:
> Signed PGP part
> Heyho guys,
>
> I'm running since years glusterfs in a small environment without big
> problems.
>
> Now I'm going to use glusterFS for a bigger cluster but I've some
> questions :)
>
> Environment:
> * 4 Servers
> * 20 x 2TB HDD, each
> * Raidcontroller
> * Raid 10
> * 4x bricks => Replicated, Distributed volume
> * Gluster 3.4
>
> 1)
> I'm asking me, if I can delete the raid10 on each server and create
> for each HDD a separate brick.
> In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there
> any experience about the write throughput in a production system with
> many of bricks like in this case? In addition i'll get double of HDD
> capacity.
>
> 2)
> I've heard a talk about glusterFS and out scaling. The main point was
> if more bricks are in use, the scale out process will take a long
> time. The problem was/is the Hash-Algo. So I'm asking me how is it if
> I've one very big brick (Raid10 20TB on each server) or I've much more
> bricks, what's faster and is there any issues?
> Is there any experiences ?
>
> 3)
> Failover of a HDD is for a raid controller with HotSpare HDD not a big
> deal. Glusterfs will rebuild automatically if a brick fails and there
> are no data present, this action will perform a lot of network traffic
> between the mirror bricks but it will handle it equal as the raid
> controller right ?
>
>
>
> Thanks and cheers
> Heiko
>
>
>
> --
> Anynines.com
>
> Avarteq GmbH
> B.Sc. Informatik
> Heiko Krämer
> CIO
> Twitter: @anynines
>
> ----
> Geschäftsführer: Alexander Faißt, Dipl.-Inf.(FH) Julian Fischer
> Handelsregister: AG Saarbrücken HRB 17413, Ust-IdNr.: DE262633168
> Sitz: Saarbrücken
>
> <hkraemer.vcf>_______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
------------------------------
Message: 13
Date: Mon, 9 Dec 2013 14:26:45 -0500 (EST)
From: Ben Turner <bturner at redhat.com>
To: Heiko Krämer <hkraemer at anynines.de>
Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Gluster infrastructure question
Message-ID: <124648027.2334242.1386617205234.JavaMail.root at redhat.com>
Content-Type: text/plain; charset=utf-8
----- Original Message -----
> From: "Heiko Kr?mer" <hkraemer at anynines.de>
> To: "gluster-users at gluster.org List" <gluster-users at gluster.org>
> Sent: Monday, December 9, 2013 8:18:28 AM
> Subject: [Gluster-users] Gluster infrastructure question
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Heyho guys,
>
> I'm running since years glusterfs in a small environment without big
> problems.
>
> Now I'm going to use glusterFS for a bigger cluster but I've some
> questions :)
>
> Environment:
> * 4 Servers
> * 20 x 2TB HDD, each
> * Raidcontroller
> * Raid 10
> * 4x bricks => Replicated, Distributed volume
> * Gluster 3.4
>
> 1)
> I'm asking me, if I can delete the raid10 on each server and create
> for each HDD a separate brick.
> In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there
> any experience about the write throughput in a production system with
> many of bricks like in this case? In addition i'll get double of HDD
> capacity.
Have a look at:
http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf
Specifically:
• RAID arrays
• More RAID LUNs for better concurrency
• For RAID6, 256-KB stripe size
I use a single RAID 6 that is divided into several LUNs for my bricks. For
example, on my Dell servers (with PERC6 RAID controllers) each server has
12 disks that I put into RAID 6. Then I break the RAID 6 into 6 LUNs and
create a new PV/VG/LV for each brick. From there I follow the
recommendations listed in the presentation.
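For one of those per-brick LVs that works out to roughly the sketch below
(device, size and names are placeholders; the 512-byte inode size follows the
usual gluster guidance so the xattrs fit in the inode):
# one PV/VG/LV per LUN exported by the RAID controller
pvcreate /dev/sdb
vgcreate vg_bricks /dev/sdb
lvcreate -L 2T -n brick1 vg_bricks
# XFS with larger inodes, mounted as the brick directory
mkfs.xfs -i size=512 /dev/vg_bricks/brick1
mkdir -p /bricks/brick1
mount /dev/vg_bricks/brick1 /bricks/brick1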
HTH!
-b
> 2)
> I've heard a talk about glusterFS and out scaling. The main point was
> if more bricks are in use, the scale out process will take a long
> time. The problem was/is the Hash-Algo. So I'm asking me how is it if
> I've one very big brick (Raid10 20TB on each server) or I've much more
> bricks, what's faster and is there any issues?
> Is there any experiences ?
>
> 3)
> Failover of a HDD is for a raid controller with HotSpare HDD not a big
> deal. Glusterfs will rebuild automatically if a brick fails and there
> are no data present, this action will perform a lot of network traffic
> between the mirror bricks but it will handle it equal as the raid
> controller right ?
>
>
>
> Thanks and cheers
> Heiko
>
>
>
> - --
> Anynines.com
>
> Avarteq GmbH
> B.Sc. Informatik
> Heiko Krämer
> CIO
> Twitter: @anynines
>
> - ----
> Geschäftsführer: Alexander Faißt, Dipl.-Inf.(FH) Julian Fischer
> Handelsregister: AG Saarbrücken HRB 17413, Ust-IdNr.: DE262633168
> Sitz: Saarbrücken
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.14 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B
> lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0
> GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ
> N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs
> TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6
> Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY=
> =bDly
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
------------------------------
Message: 14
Date: Mon, 9 Dec 2013 14:31:00 -0500 (EST)
From: Ben Turner <bturner at redhat.com>
To: Heiko Krämer <hkraemer at anynines.de>
Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Gluster infrastructure question
Message-ID:
<1676822821.2336090.1386617460049.JavaMail.root at redhat.com>
Content-Type: text/plain; charset=utf-8
----- Original Message -----
> From: "Ben Turner" <bturner at redhat.com>
> To: "Heiko Kr?mer" <hkraemer at anynines.de>
> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
> Sent: Monday, December 9, 2013 2:26:45 PM
> Subject: Re: [Gluster-users] Gluster infrastructure question
>
> ----- Original Message -----
> > From: "Heiko Kr?mer" <hkraemer at anynines.de>
> > To: "gluster-users at gluster.org List" <gluster-users at gluster.org>
> > Sent: Monday, December 9, 2013 8:18:28 AM
> > Subject: [Gluster-users] Gluster infrastructure question
> >
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Heyho guys,
> >
> > I'm running since years glusterfs in a small environment without big
> > problems.
> >
> > Now I'm going to use glusterFS for a bigger cluster but I've some
> > questions :)
> >
> > Environment:
> > * 4 Servers
> > * 20 x 2TB HDD, each
> > * Raidcontroller
> > * Raid 10
> > * 4x bricks => Replicated, Distributed volume
> > * Gluster 3.4
> >
> > 1)
> > I'm asking me, if I can delete the raid10 on each server and create
> > for each HDD a separate brick.
> > In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there
> > any experience about the write throughput in a production system with
> > many of bricks like in this case? In addition i'll get double of HDD
> > capacity.
>
> Have a look at:
>
> http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf
That one was from 2012, here is the latest:
http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
-b
> Specifically:
>
> • RAID arrays
> • More RAID LUNs for better concurrency
> • For RAID6, 256-KB stripe size
>
> I use a single RAID 6 that is divided into several LUNs for my bricks.
For
> example, on my Dell servers(with PERC6 RAID controllers) each server has
12
> disks that I put into raid 6. Then I break the RAID 6 into 6 LUNs and
> create a new PV/VG/LV for each brick. From there I follow the
> recommendations listed in the presentation.
>
> HTH!
>
> -b
>
> > 2)
> > I've heard a talk about glusterFS and out scaling. The main point was
> > if more bricks are in use, the scale out process will take a long
> > time. The problem was/is the Hash-Algo. So I'm asking me how is it if
> > I've one very big brick (Raid10 20TB on each server) or I've much more
> > bricks, what's faster and is there any issues?
> > Is there any experiences ?
> >
> > 3)
> > Failover of a HDD is for a raid controller with HotSpare HDD not a big
> > deal. Glusterfs will rebuild automatically if a brick fails and there
> > are no data present, this action will perform a lot of network traffic
> > between the mirror bricks but it will handle it equal as the raid
> > controller right ?
> >
> >
> >
> > Thanks and cheers
> > Heiko
> >
> >
> >
> > - --
> > Anynines.com
> >
> > Avarteq GmbH
> > B.Sc. Informatik
> > Heiko Krämer
> > CIO
> > Twitter: @anynines
> >
> > - ----
> > Geschäftsführer: Alexander Faißt, Dipl.-Inf.(FH) Julian Fischer
> > Handelsregister: AG Saarbrücken HRB 17413, Ust-IdNr.: DE262633168
> > Sitz: Saarbrücken
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.14 (GNU/Linux)
> > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> >
> > iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B
> > lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0
> > GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ
> > N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs
> > TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6
> > Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY=
> > =bDly
> > -----END PGP SIGNATURE-----
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
------------------------------
Message: 15
Date: Mon, 09 Dec 2013 14:57:08 -0500
From: Jeff Darcy <jdarcy at redhat.com>
To: Randy Breunling <rbreunling at gmail.com>, gluster-users at gluster.org
Subject: Re: [Gluster-users] Scalability - File system or Object Store
Message-ID: <52A62094.1000507 at redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 12/09/2013 10:57 AM, Randy Breunling wrote:
> From any experience...which has shown to scale better...a file system
> or an object store?
In terms of numbers of files/objects, I'd have to say object stores. S3
and Azure are both over a *trillion* objects, and I've never heard of a
filesystem that size. In terms of performance it might go the other
way. More importantly, I think the object stores give up too much in
terms of semantics - e.g. hierarchical directories and rename, byte
granularity, consistency/durability guarantees. It saddens me to see so
many people working around these limitations in their apps based on
object stores - duplicating each others' work, creating
incompatibility (e.g. with a half dozen "conventions" for simulating
hierarchical directories), and sometimes even losing data to subtle
distributed-coordination bugs. An app that uses a subset of an
underlying filesystem's functionality is far more likely to be correct
and portable than one that tries to build extra abstractions on top of a
bare-bones object store.
------------------------------
Message: 16
Date: Tue, 10 Dec 2013 07:58:25 +1000
From: Dan Mons <dmons at cuttingedge.com.au>
To: Ben Turner <bturner at redhat.com>
Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>,
Heiko Krämer <hkraemer at anynines.de>
Subject: Re: [Gluster-users] Gluster infrastructure question
Message-ID:
<CACa6TycgVYLNOWkk7eO2L80hhEdQLJpgk-+Bav_dfL2gPVGpjw at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
I went with big RAID on each node (16x 3TB SATA disks in RAID6 with a
hot spare per node) rather than brick-per-disk. The simple reason
being that I wanted to configure distribute+replicate at the GlusterFS
level, and be 100% guaranteed that the replication happened across to
another node, and not to another brick on the same node. As each node
only has one giant brick, the cluster is forced to replicate to a
separate node each time.
Some careful initial setup could probably have done the same, but I
wanted to avoid the dramas of my employer expanding the cluster one
node at a time later on, causing that design goal to fail as the new
single node with many bricks found replication partners on itself.
On a different topic, I find no real-world difference between RAID10 and
RAID6 with GlusterFS. Most of the access delay in Gluster has little
to do with the speed of the disk. The only downside to RAID6 is a
long rebuild time if you're unlucky enough to blow a couple of drives
at once. RAID50 might be a better choice if you're up at 20 drives
per node.
We invested in SSD caching on our nodes, and to be honest it was
rather pointless. Certainly not bad, but the real-world speed boost
is not noticed by end users.
-Dan
----------------
Dan Mons
R&D SysAdmin
Unbreaker of broken things
Cutting Edge
http://cuttingedge.com.au
On 10 December 2013 05:31, Ben Turner <bturner at redhat.com> wrote:
> ----- Original Message -----
>> From: "Ben Turner" <bturner at redhat.com>
>> To: "Heiko Kr?mer" <hkraemer at anynines.de>
>> Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
>> Sent: Monday, December 9, 2013 2:26:45 PM
>> Subject: Re: [Gluster-users] Gluster infrastructure question
>>
>> ----- Original Message -----
>> > From: "Heiko Krämer" <hkraemer at anynines.de>
>> > To: "gluster-users at gluster.org List" <gluster-users at gluster.org>
>> > Sent: Monday, December 9, 2013 8:18:28 AM
>> > Subject: [Gluster-users] Gluster infrastructure question
>> >
>> > Heyho guys,
>> >
>> > I've been running GlusterFS for years in a small environment without big
>> > problems.
>> >
>> > Now I'm going to use GlusterFS for a bigger cluster, but I have some
>> > questions :)
>> >
>> > Environment:
>> > * 4 Servers
>> > * 20 x 2TB HDD, each
>> > * Raidcontroller
>> > * Raid 10
>> > * 4x bricks => Replicated, Distributed volume
>> > * Gluster 3.4
>> >
>> > 1)
>> > I'm wondering whether I can drop the RAID10 on each server and create
>> > a separate brick for each HDD.
>> > In that case the volume would have 80 bricks (4 servers x 20 HDDs). Is
>> > there any experience with write throughput in a production system with
>> > this many bricks? In addition, I'd get double the HDD capacity.
>>
>> Have a look at:
>>
>> http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf
>
> That one was from 2012, here is the latest:
>
>
http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
>
> -b
>
>> Specifically:
>>
>> - RAID arrays
>> - More RAID LUNs for better concurrency
>> - For RAID6, 256-KB stripe size
>>
>> I use a single RAID 6 that is divided into several LUNs for my bricks. For
>> example, on my Dell servers (with PERC6 RAID controllers) each server has 12
>> disks that I put into RAID 6. Then I break the RAID 6 into 6 LUNs and
>> create a new PV/VG/LV for each brick. From there I follow the
>> recommendations listed in the presentation. (See the sketch after this
>> quoted thread.)
>>
>> HTH!
>>
>> -b
>>
>> > 2)
>> > I've heard a talk about GlusterFS and scaling out. The main point was
>> > that if more bricks are in use, the scale-out process will take a long
>> > time. The problem was/is the hash algorithm. So I'm wondering: with one
>> > very big brick (RAID10, 20TB on each server) versus many more bricks,
>> > which is faster, and are there any issues?
>> > Is there any experience with this?
>> >
>> > 3)
>> > Failover of an HDD is not a big deal for a RAID controller with a
>> > hot-spare HDD. GlusterFS will rebuild automatically if a brick fails and
>> > no data is present; this will generate a lot of network traffic between
>> > the mirror bricks, but it will handle it just like the RAID controller,
>> > right?
>> >
>> >
>> >
>> > Thanks and cheers
>> > Heiko
>> >
>> >
>> >
>> > - --
>> > Anynines.com
>> >
>> > Avarteq GmbH
>> > B.Sc. Informatik
>> > Heiko Krämer
>> > CIO
>> > Twitter: @anynines
>> >
>> > - ----
>> > Geschäftsführer: Alexander Faißt, Dipl.-Inf.(FH) Julian Fischer
>> > Handelsregister: AG Saarbrücken HRB 17413, Ust-IdNr.: DE262633168
>> > Sitz: Saarbrücken
>> >
>> > _______________________________________________
>> > Gluster-users mailing list
>> > Gluster-users at gluster.org
>> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
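A minimal sketch of the per-LUN brick layout Ben Turner describes above. The
device name, VG/LV names, mount point and the XFS inode-size option are
assumptions, not taken from his mail; treat this as an illustration, not a
validated procedure. Each RAID LUN shows up as its own block device, and each
gets its own PV/VG/LV and filesystem before being used as a brick:

  # repeat once per LUN/brick; /dev/sdb and all names below are hypothetical
  pvcreate /dev/sdb
  vgcreate vg_brick1 /dev/sdb
  lvcreate -l 100%FREE -n lv_brick1 vg_brick1
  mkfs.xfs -i size=512 /dev/vg_brick1/lv_brick1
  mkdir -p /bricks/brick1
  mount /dev/vg_brick1/lv_brick1 /bricks/brick1

Keeping one LV per brick makes it easy to grow, monitor and replace bricks
independently, which is the point of carving the RAID set into several LUNs.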
------------------------------
Message: 17
Date: Mon, 09 Dec 2013 14:09:11 -0800
From: Joe Julian <joe at julianfamily.org>
To: Dan Mons <dmons at cuttingedge.com.au>
Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Gluster infrastructure question
Message-ID: <52A63F87.8070107 at julianfamily.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Replicas are defined in the order bricks are listed in the volume create
command. So
gluster volume create myvol replica 2 server1:/data/brick1
server2:/data/brick1 server3:/data/brick1 server4:/data/brick1
will replicate between server1 and server2 and replicate between server3
and server4.
Bricks added to a replica 2 volume after it's been created will need to
be added in pairs.
The best way to "force" replication to happen on another server is to
just define it that way.
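For example (a hypothetical sketch - the myvol volume and the server5/server6
brick paths are assumptions, not from this thread): to keep each replica pair
on separate servers when expanding a replica 2 volume, add bricks in
cross-server pairs, listed in order:

  gluster volume add-brick myvol server5:/data/brick1 server6:/data/brick1

The two bricks in each added pair become replicas of each other, so listing
them on different servers preserves cross-node replication.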
On 12/09/2013 01:58 PM, Dan Mons wrote:
> [Dan Mons's reply and the thread below it are quoted in full in Message 16
> above - snipped]
------------------------------
Message: 18
Date: Tue, 10 Dec 2013 09:38:03 +1000
From: Dan Mons <dmons at cuttingedge.com.au>
To: Joe Julian <joe at julianfamily.org>
Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Gluster infrastructure question
Message-ID:
<CACa6TyenCTAgoKKsXCmrvd0G191VdBPkdNf3j4yROkT_9jTyhQ at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On 10 December 2013 08:09, Joe Julian <joe at julianfamily.org> wrote:
> Replicas are defined in the order bricks are listed in the volume create
> command. So
> gluster volume create myvol replica 2 server1:/data/brick1
> server2:/data/brick1 server3:/data/brick1 server4:/data/brick1
> will replicate between server1 and server2 and replicate between server3
> and server4.
>
> Bricks added to a replica 2 volume after it's been created will need to
> be added in pairs.
>
> The best way to "force" replication to happen on another server is to
> just define it that way.
Yup, that's understood. The problem is when (for argument's sake):
* We've defined 4 hosts with 10 disks each
* Each individual disk is a brick
* Replication is defined correctly when creating the volume initially
* I'm on holidays, my employer buys a single node, configures it
brick-per-disk, and the IT junior adds it to the cluster
All good up until that final point, and then I've got that fifth node
at the end replicating to itself. Node goes down some months later,
chaos ensues.
Not a GlusterFS/technology problem, but a problem with what frequently
happens at a human level. As a sysadmin, these are also things I need
to work around, even if it means deviating from best practices. :)
-Dan
------------------------------
Message: 19
Date: Tue, 10 Dec 2013 11:06:06 +0700
From: Diep Pham Van <imeo at favadi.com>
To: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] [CentOS 6] Upgrade to the glusterfs
version in base or in glusterfs-epel
Message-ID: <20131210110606.2e217dc6 at debbox>
Content-Type: text/plain; charset=US-ASCII
On Mon, 9 Dec 2013 19:53:20 +0900
Nguyen Viet Cuong <mrcuongnv at gmail.com> wrote:
> There is no glusterfs-server in the "base" repository, just client.
Silly me.
After installing and attempting to mount with the base version of
glusterfs-fuse, I realized that I have to change the 'backupvolfile-server'
mount option to 'backup-volfile-servers'[1].
Links:
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1023950
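For reference, a hedged example of the two spellings (server and volume names
here are assumptions): the newer mount helper takes a colon-separated list,

  mount -t glusterfs -o backup-volfile-servers=server2:server3 \
        server1:/myvol /mnt/myvol

whereas the older clients used a single backupvolfile-server=server2 option.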
--
PHAM Van Diep
------------------------------
Message: 20
Date: Mon, 09 Dec 2013 20:44:06 -0800
From: harry mangalam <harry.mangalam at uci.edu>
To: "gluster-users at gluster.org List" <gluster-users at gluster.org>
Subject: [Gluster-users] Where does the 'date' string in
'/var/log/glusterfs/gl.log' come from?
Message-ID: <34671480.j6DT7uby7B at stunted>
Content-Type: text/plain; charset="us-ascii"
Admittedly I should search the source, but I wonder if anyone knows this
offhand.
Background: of our 84 ROCKS (6.1)-provisioned compute nodes, 4 have picked
up an 'advanced date' in the /var/log/glusterfs/gl.log file - that date
string is running about 5-6 hours ahead of the system date and all the
Gluster servers (which are identical and correct). The time advancement does
not appear to be identical, though it's hard to tell since it only shows on
errors and those update irregularly.
All the clients are the same version and all the servers are the same
(gluster v 3.4.0-8.el6.x86_64).
This would not be of interest except that those 4 clients are losing files,
unable to reliably do IO, etc. on the gluster fs. They don't appear to be
having problems with NFS mounts, nor with a Fraunhofer FS that is also
mounted on each node.
Rebooting 2 of them has no effect - they come right back with an advanced
date.
---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <
http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/9cde5ba3/attachment-0001.html
>
------------------------------
Message: 21
Date: Tue, 10 Dec 2013 12:49:25 +0800
From: Sharuzzaman Ahmat Raslan <sharuzzaman at gmail.com>
To: harry mangalam <harry.mangalam at uci.edu>
Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Where does the 'date' string in
'/var/log/glusterfs/gl.log' come from?
Message-ID:
<CAK+zuc=5SY7wuFXUe-i2nUXAhGr+Ddaahr_7TKYgMxgtWKh1zg at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi Harry,
Did you set up NTP on each of the nodes, and sync the time to one single
source?
Thanks.
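A minimal sketch of what that suggestion would look like on a CentOS 6 node
(the pool server name is an assumption - substitute your site's time source):

  yum install -y ntp
  echo "server 0.pool.ntp.org iburst" >> /etc/ntp.conf
  service ntpd start
  chkconfig ntpd on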
On Tue, Dec 10, 2013 at 12:44 PM, harry mangalam
<harry.mangalam at uci.edu>wrote:
> Admittedly I should search the source, but I wonder if anyone knows
this
> offhand.
>
>
>
> Background: of our 84 ROCKS (6.1)-provisioned compute nodes, 4 have
> picked up an 'advanced date' in the /var/log/glusterfs/gl.log file - that
> date string is running about 5-6 hours ahead of the system date and all the
> Gluster servers (which are identical and correct). The time advancement
> does not appear to be identical, though it's hard to tell since it only
> shows on errors and those update irregularly.
>
>
>
> All the clients are the same version and all the servers are the same
> (gluster v 3.4.0-8.el6.x86_64).
>
>
>
> This would not be of interest except that those 4 clients are losing
> files, unable to reliably do IO, etc. on the gluster fs. They don't appear
> to be having problems with NFS mounts, nor with a Fraunhofer FS that is
> also mounted on each node.
>
>
>
> Rebooting 2 of them has no effect - they come right back with an advanced
> date.
>
>
>
>
>
> ---
>
> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
>
> [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
>
> 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
>
> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
>
> ---
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
--
Sharuzzaman Ahmat Raslan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <
http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/d0de4ecd/attachment-0001.html
>
------------------------------
Message: 22
Date: Tue, 10 Dec 2013 04:49:50 +0000
From: Bobby Jacob <bobby.jacob at alshaya.com>
To: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: [Gluster-users] FW: Self Heal Issue GlusterFS 3.3.1
Message-ID:
<AC3305F9C186F849B835A3E6D3C9BEFEB5A763 at KWTPRMBX001.mha.local>
Content-Type: text/plain; charset="iso-8859-1"
Hi,
Can someone please advise on this issue? It is urgent. Self-heal is working
only every 10 minutes.
Thanks & Regards,
Bobby Jacob
From: Bobby Jacob
Sent: Tuesday, December 03, 2013 8:51 AM
To: gluster-users at gluster.org
Subject: FW: Self Heal Issue GlusterFS 3.3.1
Just an addition: on the node where the self-heal is not working, when I
check /var/log/glusterd/glustershd.log, I see the following:
[2013-12-03 05:49:18.348637] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.350273] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.354813] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.355893] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.356901] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.357730] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.359136] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.360276] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.361168] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.362135] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.363569] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.364232] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.364872] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.365777] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.367383] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.368075] E
[afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0:
inode link failed on the inode (00000000-0000-0000-0000-000000000000)
Thanks & Regards,
Bobby Jacob
From: gluster-users-bounces at gluster.org [
mailto:gluster-users-bounces at gluster.org] On Behalf Of Bobby Jacob
Sent: Tuesday, December 03, 2013 8:48 AM
To: gluster-users at gluster.org
Subject: [Gluster-users] Self Heal Issue GlusterFS 3.3.1
Hi,
I'm running glusterFS 3.3.1 on Centos 6.4.
- Gluster volume status:
Status of volume: glustervol
Gluster process                          Port    Online  Pid
------------------------------------------------------------------------------
Brick KWTOCUATGS001:/mnt/cloudbrick      24009   Y       20031
Brick KWTOCUATGS002:/mnt/cloudbrick      24009   Y       1260
NFS Server on localhost                  38467   Y       43320
Self-heal Daemon on localhost            N/A     Y       43326
NFS Server on KWTOCUATGS002              38467   Y       5842
Self-heal Daemon on KWTOCUATGS002        N/A     Y       5848
The self-heal stops working and the application writes only to 1 brick, and
it doesn't replicate. When I check /var/log/glusterfs/glustershd.log I see
the following:
[2013-12-03 05:42:32.033563] W [socket.c:410:__socket_keepalive] 0-socket:
failed to set keep idle on socket 8
[2013-12-03 05:42:32.033646] W [socket.c:1876:socket_server_event_handler]
0-socket.glusterfsd: Failed to set keep-alive: Operation not supported
[2013-12-03 05:42:32.790473] I
[client-handshake.c:1614:select_server_supported_programs]
0-glustervol-client-1: Using Program GlusterFS 3.3.2, Num (1298437),
Version (330)
[2013-12-03 05:42:32.790840] I
[client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-1:
Connected to 172.16.95.153:24009, attached to remote volume
'/mnt/cloudbrick'.
[2013-12-03 05:42:32.790884] I
[client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-1:
Server and Client lk-version numbers are not same, reopening the fds
[2013-12-03 05:42:32.791003] I [afr-common.c:3685:afr_notify]
0-glustervol-replicate-0: Subvolume 'glustervol-client-1' came back up;
going online.
[2013-12-03 05:42:32.791161] I
[client-handshake.c:453:client_set_lk_version_cbk] 0-glustervol-client-1:
Server lk version = 1
[2013-12-03 05:42:32.795103] E
[afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0:
open of <gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6> failed on child
glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.798064] E
[afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0:
open of <gfid:081c6657-301a-42a4-9f95-6eeba6c67413> failed on child
glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.799278] E
[afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0:
open of <gfid:565f1358-449c-45e2-8535-93b5632c0d1e> failed on child
glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.800636] E
[afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0:
open of <gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e> failed on child
glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.802223] E
[afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0:
open of <gfid:25fd406f-63e0-4037-bb01-da282cbe4d76> failed on child
glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.803339] E
[afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0:
open of <gfid:a109c429-5885-499e-8711-09fdccd396f2> failed on child
glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.804308] E
[afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0:
open of <gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6> failed on child
glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.804877] I
[client-handshake.c:1614:select_server_supported_programs]
0-glustervol-client-0: Using Program GlusterFS 3.3.2, Num (1298437),
Version (330)
[2013-12-03 05:42:32.807517] I
[client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-0:
Connected to 172.16.107.154:24009, attached to remote volume
'/mnt/cloudbrick'.
[2013-12-03 05:42:32.807562] I
[client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-0:
Server and Client lk-version numbers are not same, reopening the fds
[2013-12-03 05:42:32.810357] I
[client-handshake.c:453:client_set_lk_version_cbk] 0-glustervol-client-0:
Server lk version = 1
[2013-12-03 05:42:32.827437] E
[afr-self-heal-data.c:764:afr_sh_data_fxattrop_fstat_done]
0-glustervol-replicate-0: Unable to self-heal contents of
'<gfid:1262d40d-46a3-4e57-b07b-0fcc972c8403>' (possible split-brain).
Please delete the file from all but the preferred subvolume.
[2013-12-03 05:42:39.205157] E
[afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
0-glustervol-replicate-0: Unable to self-heal permissions/ownership of
'<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>' (possible split-brain).
Please fix the file on all backend volumes
[2013-12-03 05:42:39.215793] E
[afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
0-glustervol-replicate-0: Unable to self-heal permissions/ownership of
'<gfid:c0660768-289f-48ac-b8e5-e5b5a3a4b965>' (possible split-brain).
Please fix the file on all backend volumes
PLEASE ADVISE.
Thanks & Regards,
Bobby Jacob
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <
http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/8fa935eb/attachment-0001.html
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ATT00001.txt
URL: <
http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/8fa935eb/attachment-0001.txt
>
------------------------------
Message: 23
Date: Mon, 09 Dec 2013 20:59:21 -0800
From: Joe Julian <joe at julianfamily.org>
To: Bobby Jacob <bobby.jacob at alshaya.com>
Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Self Heal Issue GlusterFS 3.3.1
Message-ID: <1386651561.2455.12.camel at bunion-ii.julianfamily.org>
Content-Type: text/plain; charset="UTF-8"
On Tue, 2013-12-03 at 05:47 +0000, Bobby Jacob wrote:
> Hi,
>
>
>
> I'm running glusterFS 3.3.1 on Centos 6.4.
>
> [volume status and glustershd.log output identical to Message 22 above -
> snipped]
>
> [2013-12-03 05:42:32.827437] E
> [afr-self-heal-data.c:764:afr_sh_data_fxattrop_fstat_done]
> 0-glustervol-replicate-0: Unable to self-heal contents of
> '<gfid:1262d40d-46a3-4e57-b07b-0fcc972c8403>' (possible split-brain).
> Please delete the file from all but the preferred subvolume.
That file is at
$brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403
Try picking one to remove like it says.
>
> [2013-12-03 05:42:39.205157] E
> [afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
> 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of
> '<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>' (possible split-brain).
> Please fix the file on all backend volumes
>
> [2013-12-03 05:42:39.215793] E
> [afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
> 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of
> '<gfid:c0660768-289f-48ac-b8e5-e5b5a3a4b965>' (possible split-brain).
> Please fix the file on all backend volumes
>
>
If that doesn't allow it to heal, you may need to find which filename
that's hardlinked to. 'ls -li' the gfid file at the path I demonstrated
earlier. With that inode number in hand, run 'find $brick -inum
$inode_number'. Once you know which filenames it's linked with, remove all
linked copies from all but one replica. Then the self-heal can continue
successfully.
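Putting those steps together as a sketch (the brick path /mnt/cloudbrick is
taken from the volume status above; the inode number is a placeholder):

  ls -li /mnt/cloudbrick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403
  # note the inode number in the first column, then find its other hard links:
  find /mnt/cloudbrick -inum <inode-number>
  # on all but one replica, remove the gfid file and the linked filename(s)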
------------------------------
Message: 24
Date: Tue, 10 Dec 2013 13:09:38 +0800
From: Franco Broi <franco.broi at iongeo.com>
To: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: [Gluster-users] Pausing rebalance
Message-ID: <1386652178.1682.110.camel at tc1>
Content-Type: text/plain; charset="UTF-8"
Before attempting a rebalance on my existing distributed Gluster volume
I thought I'd do some testing with my new storage. I created a volume
consisting of 4 bricks on the same server and wrote some data to it. I
then added a new brick from another server. I ran the fix-layout and
wrote some new files and could see them on the new brick. All good so
far, so I started the data rebalance. After it had been running for a
while I wanted to add another brick, which I obviously couldn't do while
it was running, so I stopped it. Even with it stopped it wouldn't let me
add a brick so I tried restarting it, but it wouldn't let me do that
either. I presume you just reissue the start command as there's no
restart?
[root at nas3 ~]# gluster vol rebalance test-volume status
     Node   Rebalanced-files      size   scanned   failures   skipped      status   run time in secs
---------   ----------------   -------   -------   --------   -------   ---------   ----------------
localhost                  7   611.7GB      1358          0        10     stopped            4929.00
localhost                  7   611.7GB      1358          0        10     stopped            4929.00
 nas4-10g                  0    0Bytes      1506          0         0   completed               8.00
volume rebalance: test-volume: success:
[root at nas3 ~]# gluster vol add-brick test-volume nas4-10g:/data14/gvol
volume add-brick: failed: Volume name test-volume rebalance is in
progress. Please retry after completion
[root at nas3 ~]# gluster vol rebalance test-volume start
volume rebalance: test-volume: failed: Rebalance on test-volume is already
started
In the end I used the force option to make it start but was that the
right thing to do?
glusterfs 3.4.1 built on Oct 28 2013 11:01:59
Volume Name: test-volume
Type: Distribute
Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: nas3-10g:/data9/gvol
Brick2: nas3-10g:/data10/gvol
Brick3: nas3-10g:/data11/gvol
Brick4: nas3-10g:/data12/gvol
Brick5: nas4-10g:/data13/gvol
------------------------------
Message: 25
Date: Tue, 10 Dec 2013 10:42:28 +0530
From: Vijay Bellur <vbellur at redhat.com>
To: harry mangalam <harry.mangalam at uci.edu>,
"gluster-users at gluster.org List"
<gluster-users at gluster.org>
Subject: Re: [Gluster-users] Where does the 'date' string in
'/var/log/glusterfs/gl.log' come from?
Message-ID: <52A6A2BC.7010501 at redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 12/10/2013 10:14 AM, harry mangalam wrote:
> Admittedly I should search the source, but I wonder if anyone knows this
> offhand.
>
> Background: of our 84 ROCKS (6.1) -provisioned compute nodes, 4 have
> picked up an 'advanced date' in the /var/log/glusterfs/gl.log file -
> that date string is running about 5-6 hours ahead of the system date and
> all the Gluster servers (which are identical and correct). The time
> advancement does not appear to be identical tho it's hard to tell since
> it only shows on errors and those update irregularly.
The timestamps in the log file are by default in UTC. That could
possibly explain why the timestamps look advanced in the log file.
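A quick way to check whether the offset is just the UTC difference (no
assumptions here beyond the log path already mentioned in the thread):

  date; date -u; tail -1 /var/log/glusterfs/gl.log

If the log timestamps track 'date -u', the "advanced date" is simply UTC
rather than a clock problem on those clients.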
>
> All the clients are the same version and all the servers are the same
> (gluster v 3.4.0-8.el6.x86_64
>
> This would not be of interest except that those 4 clients are losing
> files, unable to reliably do IO, etc on the gluster fs. They don't
> appear to be having problems with NFS mounts, nor with a Fraunhofer FS
> that is also mounted on each node,
Do you observe anything in the client log files of these machines that
indicate I/O problems?
Thanks,
Vijay
------------------------------
Message: 26
Date: Tue, 10 Dec 2013 10:56:52 +0530
From: shishir gowda <gowda.shishir at gmail.com>
To: Franco Broi <franco.broi at iongeo.com>
Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Pausing rebalance
Message-ID:
<CAMYy+hVgyiPMYiDtkKtA1EBbbcpJAyp3O1_1=oAqKq1dc4NN+g at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi Franco,
If a file is under migration and a rebalance stop is encountered, the
rebalance process exits only after the completion of that migration.
That might be one of the reasons why you saw the "rebalance is in progress"
message while trying to add the brick.
Could you please share the average file size in your setup?
You could always check the rebalance status command to ensure rebalance has
indeed completed/stopped before proceeding with the add-brick. Using
add-brick force while rebalance is ongoing should not be done in normal
scenarios. I do see that in your case, they show stopped/completed.
Glusterd logs would help in triaging the issue.
Rebalance re-writes layouts and migrates data. While this is happening, if
an add-brick is done, the cluster might go into an imbalanced state. Hence
the check for whether rebalance is in progress while doing add-brick.
With regards,
Shishir
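As an illustration, a conservative sequence before adding a brick might look
like this (volume and brick names are taken from Franco's test setup; this is
a sketch, not an official procedure):

  gluster volume rebalance test-volume status
  # proceed only once every node reports completed or stopped
  gluster volume add-brick test-volume nas4-10g:/data14/gvol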
On 10 December 2013 10:39, Franco Broi <franco.broi at iongeo.com> wrote:
>
> [Franco's message is quoted in full in Message 24 above - snipped]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <
http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/1944e9e8/attachment-0001.html
>
------------------------------
Message: 27
Date: Tue, 10 Dec 2013 11:02:52 +0530
From: Vijay Bellur <vbellur at redhat.com>
To: Alex Pearson <alex at apics.co.uk>
Cc: gluster-users Discussion List <Gluster-users at gluster.org>
Subject: Re: [Gluster-users] replace-brick failing -
transport.address-family not specified
Message-ID: <52A6A784.6070404 at redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 12/08/2013 05:44 PM, Alex Pearson wrote:
> Hi All,
> Just to assist anyone else having this issue, and so people can correct
> me if I'm wrong...
>
> It would appear that replace-brick is 'horribly broken' and should not
> be used in Gluster 3.4. Instead a combination of "remove-brick ... count
> X ... start" should be used to remove the resilience from a volume and the
> brick, then "add-brick ... count X" to add the new brick.
>
> This does beg the question of why the hell a completely broken command
> was left in the 'stable' release of the software. This sort of thing
> really hurts Gluster's credibility.
A mention of replace-brick not being functional was made in the release
note for 3.4.0:
https://github.com/gluster/glusterfs/blob/release-3.4/doc/release-notes/3.4.0.md
>
> Ref:
> http://www.gluster.org/pipermail/gluster-users/2013-August/036936.html
This discussion happened after the release of GlusterFS 3.4. However, I
do get the point you are trying to make here. We can have an explicit
warning in CLI when operations considered broken are attempted. There is
a similar plan to add a warning for rdma volumes:
https://bugzilla.redhat.com/show_bug.cgi?id=1017176
There is a patch under review currently to remove the replace-brick
command from CLI:
http://review.gluster.org/6031
This is intended for master. If you can open a bug report indicating an
appropriate warning message that you would like to see when
replace-brick is attempted, I would be happy to get such a fix into
both 3.4 and 3.5.
Thanks,
Vijay
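As a hedged illustration of the remove-brick/add-brick route Alex describes
(brick names are taken from his volume info below; the exact replica counts
and the follow-up heal are assumptions, so treat this as a sketch rather than
a validated procedure):

  # drop the failing brick, reducing the replica count to 1
  gluster volume remove-brick media replica 1 osh1.apics.co.uk:/export/sdc/media force
  # add the replacement brick, restoring replica 2
  gluster volume add-brick media replica 2 osh1.apics.co.uk:/export/WCASJ2055681/media
  # trigger a full self-heal so the new brick gets populated
  gluster volume heal media full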
>
> Cheers
>
> Alex
>
> ----- Original Message -----
> From: "Alex Pearson" <alex at apics.co.uk>
> To: gluster-users at gluster.org
> Sent: Friday, 6 December, 2013 5:25:43 PM
> Subject: [Gluster-users] replace-brick failing -
> transport.address-family not specified
>
> Hello,
> I have what I think is a fairly basic Gluster setup, however when I try
> to carry out a replace-brick operation it consistently fails...
>
> Here are the command line options:
>
> root at osh1:~# gluster volume info media
>
> Volume Name: media
> Type: Replicate
> Volume ID: 4c290928-ba1c-4a45-ac05-85365b4ea63a
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: osh1.apics.co.uk:/export/sdc/media
> Brick2: osh2.apics.co.uk:/export/sdb/media
>
> root at osh1:~# gluster volume replace-brick media
> osh1.apics.co.uk:/export/sdc/media
> osh1.apics.co.uk:/export/WCASJ2055681/media start
> volume replace-brick: success: replace-brick started successfully
> ID: 60bef96f-a5c7-4065-864e-3e0b2773d7bb
> root at osh1:~# gluster volume replace-brick media
> osh1.apics.co.uk:/export/sdc/media
> osh1.apics.co.uk:/export/WCASJ2055681/media status
> volume replace-brick: failed: Commit failed on localhost. Please check
> the log file for more details.
>
> root at osh1:~# tail /var/log/glusterfs/bricks/export-sdc-media.log
> [2013-12-06 17:24:54.795754] E [name.c:147:client_fill_address_family]
0-media-replace-brick: transport.address-family not specified. Could not
guess default value from (remote-host:(null) or
transport.unix.connect-path:(null)) options
> [2013-12-06 17:24:57.796422] W [dict.c:1055:data_to_str]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b)
[0x7fb826e3428b]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
[0x7fb826e3a25e]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x200)
[0x7fb826e39f50]))) 0-dict: data is NULL
> [2013-12-06 17:24:57.796494] W [dict.c:1055:data_to_str]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b)
[0x7fb826e3428b]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
[0x7fb826e3a25e]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x20b)
[0x7fb826e39f5b]))) 0-dict: data is NULL
> [2013-12-06 17:24:57.796519] E [name.c:147:client_fill_address_family]
0-media-replace-brick: transport.address-family not specified. Could not
guess default value from (remote-host:(null) or
transport.unix.connect-path:(null)) options
> [2013-12-06 17:25:00.797153] W [dict.c:1055:data_to_str]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b)
[0x7fb826e3428b]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
[0x7fb826e3a25e]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x200)
[0x7fb826e39f50]))) 0-dict: data is NULL
> [2013-12-06 17:25:00.797226] W [dict.c:1055:data_to_str]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b)
[0x7fb826e3428b]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
[0x7fb826e3a25e]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x20b)
[0x7fb826e39f5b]))) 0-dict: data is NULL
> [2013-12-06 17:25:00.797251] E [name.c:147:client_fill_address_family]
0-media-replace-brick: transport.address-family not specified. Could not
guess default value from (remote-host:(null) or
transport.unix.connect-path:(null)) options
> [2013-12-06 17:25:03.797811] W [dict.c:1055:data_to_str]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b)
[0x7fb826e3428b]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
[0x7fb826e3a25e]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x200)
[0x7fb826e39f50]))) 0-dict: data is NULL
> [2013-12-06 17:25:03.797883] W [dict.c:1055:data_to_str]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b)
[0x7fb826e3428b]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
[0x7fb826e3a25e]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x20b)
[0x7fb826e39f5b]))) 0-dict: data is NULL
> [2013-12-06 17:25:03.797909] E [name.c:147:client_fill_address_family]
0-media-replace-brick: transport.address-family not specified. Could not
guess default value from (remote-host:(null) or
transport.unix.connect-path:(null)) options
>
>
> I've tried placing the transport.address-family option in various
> places, however it hasn't helped.
>
> Any help would be very much appreciated.
>
> Thanks in advance
>
> Alex
>
------------------------------
Message: 28
Date: Tue, 10 Dec 2013 11:04:49 +0530
From: Vijay Bellur <vbellur at redhat.com>
To: Diep Pham Van <imeo at favadi.com>, "gluster-users at gluster.org"
<gluster-users at gluster.org>
Subject: Re: [Gluster-users] [CentOS 6] Upgrade to the glusterfs
version in base or in glusterfs-epel
Message-ID: <52A6A7F9.2090009 at redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 12/10/2013 09:36 AM, Diep Pham Van wrote:
> On Mon, 9 Dec 2013 19:53:20 +0900
> Nguyen Viet Cuong <mrcuongnv at gmail.com> wrote:
>
>> There is no glusterfs-server in the "base" repository, just client.
> Silly me.
> After install and attempt to mount with base version of glusterfs-fuse,
> I realize that I have to change 'backupvolfile-server' mount option to
> 'backup-volfile-servers'[1].
And a patch to provide backward compatibility for 'backupvolfile-server'
is available now [1].
-Vijay
[1] http://review.gluster.org/6464
>
> Links:
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1023950
>
------------------------------
Message: 29
Date: Tue, 10 Dec 2013 13:39:38 +0800
From: Franco Broi <franco.broi at iongeo.com>
To: shishir gowda <gowda.shishir at gmail.com>
Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Pausing rebalance
Message-ID: <1386653978.1682.125.camel at tc1>
Content-Type: text/plain; charset="utf-8"
On Tue, 2013-12-10 at 10:56 +0530, shishir gowda wrote:
> Hi Franco,
>
>
> If a file is under migration, and a rebalance stop is encountered,
> then rebalance process exits only after the completion of the
> migration.
>
> That might be one of the reasons why you saw rebalance in progress
> message while trying to add the brick
The status said it was stopped. I didn't do a top on the machine but are
you saying that it was still rebalancing despite saying it had stopped?
>
> Could you please share the average file size in your setup?
>
Bit hard to say, I just copied some data from our main processing
system. The sizes range from very small to 10's of gigabytes.
>
> You could always check the rebalance status command to ensure
> rebalance has indeed completed/stopped before proceeding with the
> add-brick. Using add-brick force while rebalance is on-going should
> not be used in normal scenarios. I do see that in your case, they show
> stopped/completed. Glusterd logs would help in triaging the issue.
See attached.
>
>
> Rebalance re-writes layouts, and migrates data. While this is
> happening, if a add-brick is done, then the cluster might go into a
> imbalanced stated. Hence, the check if rebalance is in progress while
> doing add-brick
I can see that but as far as I could tell, the rebalance had stopped
according to the status.
Just to be clear, what command restarts the rebalancing?
>
>
> With regards,
> Shishir
>
>
>
> [Franco's original message re-quoted here - see Message 24 above - snipped]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: etc-glusterfs-glusterd.vol.log.gz
Type: application/gzip
Size: 7209 bytes
Desc: not available
URL: <
http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/adc5d486/attachment-0001.bin
>
------------------------------
Message: 30
Date: Tue, 10 Dec 2013 11:09:47 +0530
From: Vijay Bellur <vbellur at redhat.com>
To: Nguyen Viet Cuong <mrcuongnv at gmail.com>
Cc: "Gluster-users at gluster.org List" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] replace-brick failing -
transport.address-family not specified
Message-ID: <52A6A923.4030208 at redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 12/08/2013 07:06 PM, Nguyen Viet Cuong wrote:
> Thanks for sharing.
>
> Btw, I do believe that GlusterFS 3.2.x is much more stable than 3.4.x in
> production.
>
This is quite contrary to what we have seen in the community. From a
development perspective too, we feel much better about 3.4.1. Are there
specific instances that worked well with 3.2.x which do not work fine
for you in 3.4.x?
Cheers,
Vijay
------------------------------
Message: 31
Date: Tue, 10 Dec 2013 11:30:21 +0530
From: Kaushal M <kshlmster at gmail.com>
To: Franco Broi <franco.broi at iongeo.com>
Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Pausing rebalance
Message-ID:
<CAOujamU0J4Tam9ojFAmCoPqSzd5Tm1FeyfMYEBv2znMX9yN=4A at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Tue, Dec 10, 2013 at 11:09 AM, Franco Broi <franco.broi at iongeo.com>
wrote:
> On Tue, 2013-12-10 at 10:56 +0530, shishir gowda wrote:
>> Hi Franco,
>>
>>
>> If a file is under migration, and a rebalance stop is encountered,
>> then rebalance process exits only after the completion of the
>> migration.
>>
>> That might be one of the reasons why you saw rebalance in progress
>> message while trying to add the brick
>
> The status said it was stopped. I didn't do a top on the machine but are
> you saying that it was still rebalancing despite saying it had stopped?
>
The 'stopped' status is a little bit misleading. The rebalance process
could have been migrating a large file when the stop command was
issued, so the process would continue migrating that file and quit
once it finished. In this time period, though the status says
'stopped' the rebalance process is actually running, which prevents
other operations from happening. Ideally, we would have a 'stopping'
status which would convey the correct meaning. But for now we can only
verify that a rebalance process has actually stopped by monitoring the
actual rebalance process. The rebalance process is a 'glusterfs'
process with some arguments containing rebalance.
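A sketch of that check (nothing gluster-specific is assumed beyond what is
described above - just look for the rebalance glusterfs process):

  ps aux | grep -i '[r]ebalance'
  # or:
  pgrep -fl rebalance

An empty result means the rebalance process has really exited and it should
be safe to run add-brick.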
>
> [remainder of the quoted exchange snipped - see Messages 24, 26 and 29
> above]
------------------------------
Message: 32
Date: Tue, 10 Dec 2013 14:32:46 +0800
From: Franco Broi <franco.broi at iongeo.com>
To: Kaushal M <kshlmster at gmail.com>
Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Pausing rebalance
Message-ID: <1386657166.1682.130.camel at tc1>
Content-Type: text/plain; charset="UTF-8"
Thanks for clearing that up. I had to wait about 30 minutes for all
rebalancing activity to cease, then I was able to add a new brick.
What does it use to migrate the files? The copy rate was pretty slow
considering both bricks were on the same server; I only saw about
200MB/sec. Each brick is a 16-disk ZFS raidz2, and copying with dd I can get
well over 500MB/sec.
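(To be clear about the dd comparison: it was just a straight sequential write
to the underlying ZFS filesystem, something along these lines, with /data9
being one of the pools in my setup:)

# rough sequential write test, roughly 10GB, flushed to disk at the end
dd if=/dev/zero of=/data9/ddtest bs=1M count=10000 conv=fdatasync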
On Tue, 2013-12-10 at 11:30 +0530, Kaushal M wrote:
> On Tue, Dec 10, 2013 at 11:09 AM, Franco Broi <franco.broi at iongeo.com>
wrote:
> > On Tue, 2013-12-10 at 10:56 +0530, shishir gowda wrote:
> >> Hi Franco,
> >>
> >>
> >> If a file is under migration, and a rebalance stop is encountered,
> >> then rebalance process exits only after the completion of the
> >> migration.
> >>
> >> That might be one of the reasons why you saw rebalance in progress
> >> message while trying to add the brick
> >
> > The status said it was stopped. I didn't do a top on the machine but
are
> > you saying that it was still rebalancing despite saying it had
stopped?
> >
>
> The 'stopped' status is a little bit misleading. The rebalance process
> could have been migrating a large file when the stop command was
> issued, so the process would continue migrating that file and quit
> once it finished. In this time period, though the status says
> 'stopped' the rebalance process is actually running, which prevents
> other operations from happening. Ideally, we would have a 'stopping'
> status which would convey the correct meaning. But for now we can only
> verify that a rebalance process has actually stopped by monitoring the
> actual rebalance process. The rebalance process is a 'glusterfs'
> process with some arguments containing rebalance.
>
> >>
> >> Could you please share the average file size in your setup?
> >>
> >
> > Bit hard to say, I just copied some data from our main processing
> > system. The sizes range from very small to 10's of gigabytes.
> >
> >>
> >> You could always check the rebalance status command to ensure
> >> rebalance has indeed completed/stopped before proceeding with the
> >> add-brick. Using add-brick force while rebalance is on-going should
> >> not be used in normal scenarios. I do see that in your case, they
show
> >> stopped/completed. Glusterd logs would help in triaging the issue.
> >
> > See attached.
> >
> >>
> >>
> >> Rebalance re-writes layouts, and migrates data. While this is
> >> happening, if a add-brick is done, then the cluster might go into a
> >> imbalanced stated. Hence, the check if rebalance is in progress while
> >> doing add-brick
> >
> > I can see that but as far as I could tell, the rebalance had stopped
> > according to the status.
> >
> > Just to be clear, what command restarts the rebalancing?
> >
> >>
> >>
> >> With regards,
> >> Shishir
> >>
> >>
> >>
> >> On 10 December 2013 10:39, Franco Broi <franco.broi at iongeo.com>
wrote:
> >>
> >> Before attempting a rebalance on my existing distributed
> >> Gluster volume
> >> I thought I'd do some testing with my new storage. I created
a
> >> volume
> >> consisting of 4 bricks on the same server and wrote some data
> >> to it. I
> >> then added a new brick from a another server. I ran the
> >> fix-layout and
> >> wrote some new files and could see them on the new brick. All
> >> good so
> >> far, so I started the data rebalance. After it had been
> >> running for a
> >> while I wanted to add another brick, which I obviously
> >> couldn't do while
> >> it was running so I stopped it. Even with it stopped It
> >> wouldn't let me
> >> add a brick so I tried restarting it, but it wouldn't let me
> >> do that
> >> either. I presume you just reissue the start command as
> >> there's no
> >> restart?
> >>
> >> [root at nas3 ~]# gluster vol rebalance test-volume status
> >> Node       Rebalanced-files     size  scanned  failures  skipped     status  run time in secs
> >> ---------  ----------------  -------  -------  --------  -------  ---------  ----------------
> >> localhost                 7  611.7GB     1358         0       10    stopped           4929.00
> >> localhost                 7  611.7GB     1358         0       10    stopped           4929.00
> >> nas4-10g                  0   0Bytes     1506         0        0  completed              8.00
> >> volume rebalance: test-volume: success:
> >> [root at nas3 ~]# gluster vol add-brick test-volume
> >> nas4-10g:/data14/gvol
> >> volume add-brick: failed: Volume name test-volume rebalance
is
> >> in progress. Please retry after completion
> >> [root at nas3 ~]# gluster vol rebalance test-volume start
> >> volume rebalance: test-volume: failed: Rebalance on
> >> test-volume is already started
> >>
> >> In the end I used the force option to make it start but was
> >> that the
> >> right thing to do?
> >>
> >> glusterfs 3.4.1 built on Oct 28 2013 11:01:59
> >> Volume Name: test-volume
> >> Type: Distribute
> >> Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066
> >> Status: Started
> >> Number of Bricks: 5
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: nas3-10g:/data9/gvol
> >> Brick2: nas3-10g:/data10/gvol
> >> Brick3: nas3-10g:/data11/gvol
> >> Brick4: nas3-10g:/data12/gvol
> >> Brick5: nas4-10g:/data13/gvol
> >>
> >>
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users at gluster.org
> >> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>
> >>
> >
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
------------------------------
Message: 33
Date: Tue, 10 Dec 2013 07:42:57 +0000
From: Bobby Jacob <bobby.jacob at alshaya.com>
To: Joe Julian <joe at julianfamily.org>
Cc: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Self Heal Issue GlusterFS 3.3.1
Message-ID:
<AC3305F9C186F849B835A3E6D3C9BEFEB5A841 at KWTPRMBX001.mha.local>
Content-Type: text/plain; charset="utf-8"
Hi,
Thanks Joe, the split brain files have been removed as you recommended.
How can we deal with this situation, as there is no document that covers
such issues?
[root at KWTOCUATGS001 83]# gluster volume heal glustervol info
Gathering Heal info on volume glustervol has been successful
Brick KWTOCUATGS001:/mnt/cloudbrick
Number of entries: 14
/Tommy Kolega
<gfid:10429dd5-180c-432e-aa4a-8b1624b86f4b>
<gfid:7883309e-8764-4cf6-82a6-d8d81cb60dd7>
<gfid:3e3d77d6-2818-4766-ae3b-4f582118321b>
<gfid:8bd03482-025c-4c09-8704-60be9ddfdfd8>
<gfid:2685e11a-4eb9-4a92-883e-faa50edfa172>
<gfid:24d83cbd-e621-4330-b0c1-ae1f0fd2580d>
<gfid:197e50fa-bfc0-4651-acaa-1f3d2d73936f>
<gfid:3e094ee9-c9cf-4010-82f4-6d18c1ab9ca0>
<gfid:77783245-4e03-4baf-8cb4-928a57b266cb>
<gfid:70340eaa-7967-41d0-855f-36add745f16f>
<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>
<gfid:b1651457-175a-43ec-b476-d91ae8b52b0b>
/Tommy Kolega/lucene_index
Brick KWTOCUATGS002:/mnt/cloudbrick
Number of entries: 15
<gfid:7883309e-8764-4cf6-82a6-d8d81cb60dd7>
<gfid:0454d0d2-d432-4ac8-8476-02a8522e4a6a>
<gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6>
<gfid:00389876-700f-4351-b00e-1c57496eed89>
<gfid:0cd48d89-1dd2-47f6-9311-58224b19446e>
<gfid:081c6657-301a-42a4-9f95-6eeba6c67413>
<gfid:565f1358-449c-45e2-8535-93b5632c0d1e>
<gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e>
<gfid:25fd406f-63e0-4037-bb01-da282cbe4d76>
<gfid:a109c429-5885-499e-8711-09fdccd396f2>
<gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6>
/Tommy Kolega
/Tommy Kolega/lucene_index
<gfid:c49e9d76-e5d4-47dc-9cf1-3f858f6d07ea>
<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>
Thanks & Regards,
Bobby Jacob
-----Original Message-----
From: Joe Julian [mailto:joe at julianfamily.org]
Sent: Tuesday, December 10, 2013 7:59 AM
To: Bobby Jacob
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Self Heal Issue GlusterFS 3.3.1
On Tue, 2013-12-03 at 05:47 +0000, Bobby Jacob wrote:
> Hi,
>
>
>
> I'm running GlusterFS 3.3.1 on CentOS 6.4.
>
> - Gluster volume status
>
>
>
> Status of volume: glustervol
>
> Gluster process Port Online
> Pid
>
> ----------------------------------------------------------------------
> --------
>
> Brick KWTOCUATGS001:/mnt/cloudbrick 24009 Y
> 20031
>
> Brick KWTOCUATGS002:/mnt/cloudbrick 24009 Y
> 1260
>
> NFS Server on localhost
> 38467 Y 43320
>
> Self-heal Daemon on localhost N/A
> Y 43326
>
> NFS Server on KWTOCUATGS002 38467 Y
> 5842
>
> Self-heal Daemon on KWTOCUATGS002 N/A Y
> 5848
>
>
>
> The self heal stops working and applications write only to 1 brick and
> it doesn't replicate. When I check /var/log/glusterfs/glustershd.log I
> see the following:
>
>
>
> [2013-12-03 05:42:32.033563] W [socket.c:410:__socket_keepalive]
> 0-socket: failed to set keep idle on socket 8
>
> [2013-12-03 05:42:32.033646] W
> [socket.c:1876:socket_server_event_handler] 0-socket.glusterfsd:
> Failed to set keep-alive: Operation not supported
>
> [2013-12-03 05:42:32.790473] I
> [client-handshake.c:1614:select_server_supported_programs]
> 0-glustervol-client-1: Using Program GlusterFS 3.3.2, Num (1298437),
> Version (330)
>
> [2013-12-03 05:42:32.790840] I
> [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-1:
> Connected to 172.16.95.153:24009, attached to remote volume
> '/mnt/cloudbrick'.
>
> [2013-12-03 05:42:32.790884] I
> [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-1:
> Server and Client lk-version numbers are not same, reopening the fds
>
> [2013-12-03 05:42:32.791003] I [afr-common.c:3685:afr_notify]
> 0-glustervol-replicate-0: Subvolume 'glustervol-client-1' came back
> up; going online.
>
> [2013-12-03 05:42:32.791161] I
> [client-handshake.c:453:client_set_lk_version_cbk]
> 0-glustervol-client-1: Server lk version = 1
>
> [2013-12-03 05:42:32.795103] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
>
> [2013-12-03 05:42:32.798064] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:081c6657-301a-42a4-9f95-6eeba6c67413> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
>
> [2013-12-03 05:42:32.799278] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:565f1358-449c-45e2-8535-93b5632c0d1e> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
>
> [2013-12-03 05:42:32.800636] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
>
> [2013-12-03 05:42:32.802223] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:25fd406f-63e0-4037-bb01-da282cbe4d76> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
>
> [2013-12-03 05:42:32.803339] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:a109c429-5885-499e-8711-09fdccd396f2> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
>
> [2013-12-03 05:42:32.804308] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
>
> [2013-12-03 05:42:32.804877] I
> [client-handshake.c:1614:select_server_supported_programs]
> 0-glustervol-client-0: Using Program GlusterFS 3.3.2, Num (1298437),
> Version (330)
>
> [2013-12-03 05:42:32.807517] I
> [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-0:
> Connected to 172.16.107.154:24009, attached to remote volume
> '/mnt/cloudbrick'.
>
> [2013-12-03 05:42:32.807562] I
> [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-0:
> Server and Client lk-version numbers are not same, reopening the fds
>
> [2013-12-03 05:42:32.810357] I
> [client-handshake.c:453:client_set_lk_version_cbk]
> 0-glustervol-client-0: Server lk version = 1
>
> [2013-12-03 05:42:32.827437] E
> [afr-self-heal-data.c:764:afr_sh_data_fxattrop_fstat_done]
> 0-glustervol-replicate-0: Unable to self-heal contents of
> '<gfid:1262d40d-46a3-4e57-b07b-0fcc972c8403>' (possible split-brain).
> Please delete the file from all but the preferred subvolume.
That file is at
$brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403
Try picking one to remove like it says.
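Roughly (with $brick standing in for your actual brick path on each server):

# compare the two copies first and decide which one to keep
ls -l $brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403
# then, only on the replica you decided is the bad copy:
rm $brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403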
>
> [2013-12-03 05:42:39.205157] E
> [afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
> 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of
> '<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>' (possible split-brain).
> Please fix the file on all backend volumes
>
> [2013-12-03 05:42:39.215793] E
> [afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
> 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of
> '<gfid:c0660768-289f-48ac-b8e5-e5b5a3a4b965>' (possible split-brain).
> Please fix the file on all backend volumes
>
>
If that doesn't allow it to heal, you may need to find which filename
that's hardlinked to. ls -li the gfid file at the path I demonstrated
earlier. With that inode number in hand, find $brick -inum $inode_number
Once you know which filenames it's linked with, remove all linked copies
from all but one replica. Then the self-heal can continue successfully.
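A sketch of that sequence, again with $brick as a placeholder for your real
brick path:

# the first column of the -li output is the inode number
ls -li $brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403
# list every path on this brick sharing that inode (i.e. all its hardlinks)
find $brick -inum <inode_number_from_above>
# remove all of the listed paths on every replica except the one you keep,
# then re-trigger healing:
gluster volume heal glustervol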
------------------------------
Message: 34
Date: Tue, 10 Dec 2013 09:30:22 +0100
From: Johan Huysmans <johan.huysmans at inuits.be>
To: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: [Gluster-users] Structure needs cleaning on some files
Message-ID: <52A6D11E.4030406 at inuits.be>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi All,
When reading some files we get this error:
md5sum: /path/to/file.xml: Structure needs cleaning
in /var/log/glusterfs/mnt-sharedfs.log we see these errors:
[2013-12-10 08:07:32.256910] W
[client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0: remote
operation failed: No such file or directory
[2013-12-10 08:07:32.257436] W
[client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1: remote
operation failed: No such file or directory
[2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk]
0-glusterfs-fuse: 8230: STAT() /path/to/file.xml => -1 (Structure needs
cleaning)
We are using gluster 3.4.1-3 on CentOS6.
Our servers are 64-bit, our clients 32-bit (we are already using
--enable-ino32 on the mountpoint)
This is my gluster configuration:
Volume Name: testvolume
Type: Replicate
Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: SRV-1:/gluster/brick1
Brick2: SRV-2:/gluster/brick2
Options Reconfigured:
performance.force-readdirp: on
performance.stat-prefetch: off
network.ping-timeout: 5
And this is how the applications work:
We have 2 client nodes who both have a fuse.glusterfs mountpoint.
On 1 client node we have a application which writes files.
On the other client node we have a application which reads these files.
On the node where the files are written we don't see any problem, and
can read that file without problems.
On the other node we have problems (error messages above) reading that
file.
The problem occurs when we run md5sum on that specific file; when we
run md5sum on all files in that directory there is no problem.
How can we solve this problem, as it is quite annoying?
The problem occurs after some time (it can be days); an umount and mount of
the mountpoint solves it for some days.
Once it occurs (and we don't remount), it occurs every time.
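For reference, the remount workaround we use looks roughly like this (the
mount point below is only an example of our setup):

umount /mnt/sharedfs
mount -t glusterfs SRV-1:/testvolume /mnt/sharedfs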
I hope someone can help me with this problem.
Thanks,
Johan Huysmans
------------------------------
Message: 35
Date: Tue, 10 Dec 2013 08:56:56 +0000
From: "Bernhard Glomm" <bernhard.glomm at ecologic.eu>
To: vbellur at redhat.com, mrcuongnv at gmail.com
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] replace-brick failing -
transport.address-family not specified
Message-ID: <03a55549428f5909f0b3db1dee93d8c55e3ba3c3 at ecologic.eu>
Content-Type: text/plain; charset="utf-8"
On 10.12.2013 06:39:47, Vijay Bellur wrote:
> On 12/08/2013 07:06 PM, Nguyen Viet Cuong wrote:
> > Thanks for sharing.
> >
> > Btw, I do believe that GlusterFS 3.2.x is much more stable than 3.4.x
in
> > production.
> >
> This is quite contrary to what we have seen in the community. From a
> development perspective too, we feel much better about 3.4.1. Are there
> specific instances that worked well with 3.2.x which does not work fine
> for you in 3.4.x?
987555 - is that fixed in 3.5? Or did it even make it into 3.4.2? Couldn't
find a note on that. Show stopper for moving from 3.2.x to anywhere for me!
cheers, b
>
> Cheers,
> Vijay
>
>
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
--
Bernhard Glomm
IT Administration
Phone:
+49 (30) 86880 134
Fax:
+49 (30) 86880 100
Skype:
bernhard.glomm.ecologic
Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 |
10717 Berlin | Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 |
USt/VAT-IdNr.: DE811963464
Ecologic is a Trade Mark (TM) of Ecologic Institut
gemeinnützige GmbH
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <
http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/475454d4/attachment-0001.html
>
------------------------------
Message: 36
Date: Tue, 10 Dec 2013 10:02:14 +0100
From: Johan Huysmans <johan.huysmans at inuits.be>
To: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Structure needs cleaning on some files
Message-ID: <52A6D896.1020404 at inuits.be>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
I could reproduce this problem while my mount point was running in
debug mode.
logfile is attached.
gr.
Johan Huysmans
On 10-12-13 09:30, Johan Huysmans wrote:
> Hi All,
>
> When reading some files we get this error:
> md5sum: /path/to/file.xml: Structure needs cleaning
>
> in /var/log/glusterfs/mnt-sharedfs.log we see these errors:
> [2013-12-10 08:07:32.256910] W
> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0:
> remote operation failed: No such file or directory
> [2013-12-10 08:07:32.257436] W
> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1:
> remote operation failed: No such file or directory
> [2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk]
> 0-glusterfs-fuse: 8230: STAT() /path/to/file.xml => -1 (Structure
> needs cleaning)
>
> We are using gluster 3.4.1-3 on CentOS6.
> Our servers are 64-bit, our clients 32-bit (we are already using
> --enable-ino32 on the mountpoint)
>
> This is my gluster configuration:
> Volume Name: testvolume
> Type: Replicate
> Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: SRV-1:/gluster/brick1
> Brick2: SRV-2:/gluster/brick2
> Options Reconfigured:
> performance.force-readdirp: on
> performance.stat-prefetch: off
> network.ping-timeout: 5
>
> And this is how the applications work:
> We have 2 client nodes who both have a fuse.glusterfs mountpoint.
> On 1 client node we have a application which writes files.
> On the other client node we have a application which reads these files.
> On the node where the files are written we don't see any problem, and
> can read that file without problems.
> On the other node we have problems (error messages above) reading that
> file.
> The problem occurs when we perform a md5sum on the exact file, when
> perform a md5sum on all files in that directory there is no problem.
>
>
> How can we solve this problem as this is annoying.
> The problem occurs after some time (can be days), an umount and mount
> of the mountpoint solves it for some days.
> Once it occurs (and we don't remount) it occurs every time.
>
>
> I hope someone can help me with this problems.
>
> Thanks,
> Johan Huysmans
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gluster_debug.log
Type: text/x-log
Size: 16600 bytes
Desc: not available
URL: <
http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/bdf626dc/attachment-0001.bin
>
------------------------------
Message: 37
Date: Tue, 10 Dec 2013 10:08:43 +0100
From: Heiko Krämer <hkraemer at anynines.com>
To: gluster-users at gluster.org
Subject: Re: [Gluster-users] Gluster infrastructure question
Message-ID: <52A6DA1B.3030209 at anynines.com>
Content-Type: text/plain; charset="iso-8859-1"
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi guys,
thanks for all these reports. Well, I think I'll change my RAID level
to 6, let the RAID controller handle building and rebuilding all RAID members,
and replicate again with GlusterFS. I get more capacity, but I need to
check whether the write throughput is acceptable.
I don't think I can take advantage of using GlusterFS with a lot of
bricks, because I've found more cons than pros in my case.
@Ben thx for this very detailed document!
Cheers and Thanks
Heiko
On 10.12.2013 00:38, Dan Mons wrote:
> On 10 December 2013 08:09, Joe Julian <joe at julianfamily.org>
> wrote:
>> Replicas are defined in the order bricks are listed in the volume
>> create command. So gluster volume create myvol replica 2
>> server1:/data/brick1 server2:/data/brick1 server3:/data/brick1
>> server4:/data/brick1 will replicate between server1 and server2
>> and replicate between server3 and server4.
>>
>> Bricks added to a replica 2 volume after it's been created will
>> require pairs of bricks,
>>
>> The best way to "force" replication to happen on another server
>> is to just define it that way.
>
> Yup, that's understood. The problem is when (for argument's sake)
> :
>
> * We've defined 4 hosts with 10 disks each * Each individual disk
> is a brick * Replication is defined correctly when creating the
> volume initially * I'm on holidays, my employer buys a single node,
> configures it brick-per-disk, and the IT junior adds it to the
> cluster
>
> All good up until that final point, and then I've got that fifth
> node at the end replicating to itself. Node goes down some months
> later, chaos ensues.
>
> Not a GlusterFS/technology problem, but a problem with what
> frequently happens at a human level. As a sysadmin, these are also
> things I need to work around, even if it means deviating from best
> practices. :)
>
> -Dan _______________________________________________ Gluster-users
> mailing list Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
- --
Anynines.com
Avarteq GmbH
B.Sc. Informatik
Heiko Krämer
CIO
Twitter: @anynines
- ----
Geschäftsführer: Alexander Faißt, Dipl.-Inf.(FH) Julian Fischer
Handelsregister: AG Saarbrücken HRB 17413, Ust-IdNr.: DE262633168
Sitz: Saarbrücken
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQEcBAEBAgAGBQJSptoTAAoJELxFogM4ixOFJTsIAJBWed3AGiiI+PDC2ubfboKc
UPkMc+zuirRh2+QJBAoZ4CsAv9eIZ5NowclSSby9PTq2XRjjLvMdKuI+IbXCRT4j
AbMLYfP3g4Q+agXnY6N6WJ6ZIqXQ8pbCK3shYp9nBfVYkiDUT1bGk0WcgQmEWTCw
ta1h17LYkworIDRtqWQAl4jr4JR4P3x4cmwOZiHCVCtlyOP02x/fN4dji6nyOtuB
kQPBVsND5guQNU8Blg5cQoES5nthtuwJdkWXB+neaCZd/u3sexVSNe5m15iWbyYg
mAoVvlBJ473IKATlxM5nVqcUhmjFwNcc8MMwczXxTkwniYzth53BSoltPn7kIx4=
=epys
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hkraemer.vcf
Type: text/x-vcard
Size: 277 bytes
Desc: not available
URL: <
http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/f663943d/attachment-0001.vcf
>
------------------------------
Message: 38
Date: Tue, 10 Dec 2013 10:42:43 +0100
From: Johan Huysmans <johan.huysmans at inuits.be>
To: gluster-users at gluster.org, bill.mair at web.de
Subject: Re: [Gluster-users] Errors from PHP stat() on files and
directories in a glusterfs mount
Message-ID: <52A6E213.3000109 at inuits.be>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
Hi,
It seems I have a related problem (just posted this on the mailing list).
Do you already have a solution for this problem?
gr.
Johan Huysmans
On 05-12-13 20:05, Bill Mair wrote:
> Hi,
>
> I'm trying to use glusterfs to mirror the ownCloud "data" area between
> 2 servers.
>
> They are using debian jessie due to some dependancies that I have for
> other components.
>
> This is where my issue rears it's ugly head. This is failing because I
> can't stat the files and directories on my glusterfs mount.
>
> /var/www/owncloud/data is where I am mounting the volume and I can
> reproduce the error using a simple php test application, so I don't
> think that it is apache or owncloud related.
>
> I'd be grateful for any pointers on how to resolve this problem.
>
> Thanks,
>
> Bill
>
> Attached is "simple.php" test and the results of executing "strace
> php5 simple.php" twice, once with the glusterfs mounted
> (simple.php.strace-glusterfs) and once against the file system when
> unmounted (simple.php.strace-unmounted).
>
> ------------------------------------------------------------------------
>
> Here is what I get in the gluster log when I run the test (as root):
>
> /var/log/glusterfs/var-www-owncloud-data.log
>
> [2013-12-05 18:33:50.802250] D
> [client-handshake.c:185:client_start_ping] 0-gv-ocdata-client-0:
> returning as transport is already disconnected OR there are no frames
> (0 || 0)
> [2013-12-05 18:33:50.825132] D
> [afr-self-heal-common.c:138:afr_sh_print_pending_matrix]
> 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ]
> [2013-12-05 18:33:50.825322] D
> [afr-self-heal-common.c:138:afr_sh_print_pending_matrix]
> 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ]
> [2013-12-05 18:33:50.825393] D
> [afr-self-heal-common.c:887:afr_mark_sources] 0-gv-ocdata-replicate-0:
> Number of sources: 0
> [2013-12-05 18:33:50.825456] D
> [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type]
> 0-gv-ocdata-replicate-0: returning read_child: 0
> [2013-12-05 18:33:50.825511] D
> [afr-common.c:1380:afr_lookup_select_read_child]
> 0-gv-ocdata-replicate-0: Source selected as 0 for /
> [2013-12-05 18:33:50.825579] D
> [afr-common.c:1117:afr_lookup_build_response_params]
> 0-gv-ocdata-replicate-0: Building lookup response from 0
> [2013-12-05 18:33:50.827069] D
> [afr-common.c:131:afr_lookup_xattr_req_prepare]
> 0-gv-ocdata-replicate-0: /check.txt: failed to get the gfid from dict
> [2013-12-05 18:33:50.829409] D
> [client-handshake.c:185:client_start_ping] 0-gv-ocdata-client-0:
> returning as transport is already disconnected OR there are no frames
> (0 || 0)
> [2013-12-05 18:33:50.836719] D
> [afr-self-heal-common.c:138:afr_sh_print_pending_matrix]
> 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ]
> [2013-12-05 18:33:50.836870] D
> [afr-self-heal-common.c:138:afr_sh_print_pending_matrix]
> 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ]
> [2013-12-05 18:33:50.836941] D
> [afr-self-heal-common.c:887:afr_mark_sources] 0-gv-ocdata-replicate-0:
> Number of sources: 0
> [2013-12-05 18:33:50.837002] D
> [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type]
> 0-gv-ocdata-replicate-0: returning read_child: 0
> [2013-12-05 18:33:50.837058] D
> [afr-common.c:1380:afr_lookup_select_read_child]
> 0-gv-ocdata-replicate-0: Source selected as 0 for /check.txt
> [2013-12-05 18:33:50.837129] D
> [afr-common.c:1117:afr_lookup_build_response_params]
> 0-gv-ocdata-replicate-0: Building lookup response from 0
>
> Other bits of information
>
> root at bbb-1:/var/www/owncloud# uname -a
> Linux bbb-1 3.8.13-bone30 #1 SMP Thu Nov 14 02:59:07 UTC 2013 armv7l
> GNU/Linux
>
> root at bbb-1:/var/www/owncloud# dpkg -l glusterfs-*
> Desired=Unknown/Install/Remove/Purge/Hold
> |
>
Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name Version Architecture Description
>
+++-============================================-===========================-===========================-==============================================================================================
> ii glusterfs-client 3.4.1-1 armhf clustered
> file-system (client package)
> ii glusterfs-common 3.4.1-1 armhf GlusterFS
> common libraries and translator modules
> ii glusterfs-server 3.4.1-1 armhf clustered
> file-system (server package)
>
> mount
>
> bbb-1:gv-ocdata on /var/www/owncloud/data type fuse.glusterfs
>
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>
> /etc/fstab
>
> UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /sdhc ext4 defaults 0 0
> bbb-1:gv-ocdata /var/www/owncloud/data glusterfs
> defaults,_netdev,log-level=DEBUG 0 0
>
> ls -al on the various paths
>
> root at bbb-1:/var/log/glusterfs# ll -d /sdhc/
> drwxrwxr-x 7 root root 4096 Nov 28 19:15 /sdhc/
>
> root at bbb-1:/var/log/glusterfs# ll -d /sdhc/gv-ocdata/
> drwxrwx--- 5 www-data www-data 4096 Dec 5 00:50 /sdhc/gv-ocdata/
>
> root at bbb-1:/var/log/glusterfs# ll -d /sdhc/gv-ocdata/check.txt
> -rw-r--r-- 2 root root 10 Dec 5 00:50 /sdhc/gv-ocdata/check.txt
>
> root at bbb-1:/var/www/owncloud# ll -d /var/www/owncloud/data/
> drwxrwx--- 5 www-data www-data 4096 Dec 5 00:50 /var/www/owncloud/data/
>
> root at bbb-1:/var/www/owncloud# ll -d /var/www/owncloud/data/check.txt
> -rw-r--r-- 1 root root 10 Dec 5 00:50 /var/www/owncloud/data/check.txt
>
> file & dir attr information:
>
> root at bbb-1:/var/www/owncloud# attr -l /var/www/owncloud/data
> Attribute "glusterfs.volume-id" has a 16 byte value for
> /var/www/owncloud/data
>
> root at bbb-1:/var/www/owncloud# attr -l /var/www/owncloud/data/check.txt
> root at bbb-1:/var/www/owncloud#
>
> root at bbb-1:/var/www/owncloud# attr -l /sdhc/gv-ocdata/
> Attribute "glusterfs.volume-id" has a 16 byte value for /sdhc/gv-ocdata/
> Attribute "gfid" has a 16 byte value for /sdhc/gv-ocdata/
> Attribute "glusterfs.dht" has a 16 byte value for /sdhc/gv-ocdata/
> Attribute "afr.gv-ocdata-client-0" has a 12 byte value for
> /sdhc/gv-ocdata/
> Attribute "afr.gv-ocdata-client-1" has a 12 byte value for
> /sdhc/gv-ocdata/
>
> root at bbb-1:/var/www/owncloud# attr -l /sdhc/gv-ocdata/check.txt
> Attribute "gfid" has a 16 byte value for /sdhc/gv-ocdata/check.txt
> Attribute "afr.gv-ocdata-client-0" has a 12 byte value for
> /sdhc/gv-ocdata/check.txt
> Attribute "afr.gv-ocdata-client-1" has a 12 byte value for
> /sdhc/gv-ocdata/check.txt
> root at bbb-1:/var/www/owncloud#
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <
http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/d77e25bb/attachment-0001.html
>
------------------------------
Message: 39
Date: Tue, 10 Dec 2013 21:03:36 +1100
From: Andrew Lau <andrew at andrewklau.com>
To: Ben Turner <bturner at redhat.com>
Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Gluster infrastructure question
Message-ID:
<CAD7dF9c3uexEG++1YEHwh3zw7a1Xy+=Co_xO+zrDrggDuV2DJQ at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi Ben,
For glusterfs would you recommend the enterprise-storage
or throughput-performance tuned profile?
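(For context, whichever one you suggest I would apply with something like the
following; just a sketch assuming the tuned package is installed:)

tuned-adm list                              # show the available profiles
tuned-adm profile throughput-performance    # or: tuned-adm profile enterprise-storage
tuned-adm active                            # confirm which profile is in use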
Thanks,
Andrew
On Tue, Dec 10, 2013 at 6:31 AM, Ben Turner <bturner at redhat.com> wrote:
> ----- Original Message -----
> > From: "Ben Turner" <bturner at redhat.com>
> > To: "Heiko Kr?mer" <hkraemer at anynines.de>
> > Cc: "gluster-users at gluster.org List" <gluster-users at gluster.org>
> > Sent: Monday, December 9, 2013 2:26:45 PM
> > Subject: Re: [Gluster-users] Gluster infrastructure question
> >
> > ----- Original Message -----
> > > From: "Heiko Kr?mer" <hkraemer at anynines.de>
> > > To: "gluster-users at gluster.org List" <gluster-users at gluster.org>
> > > Sent: Monday, December 9, 2013 8:18:28 AM
> > > Subject: [Gluster-users] Gluster infrastructure question
> > >
> > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA1
> > >
> > > Heyho guys,
> > >
> > > I'm running since years glusterfs in a small environment without big
> > > problems.
> > >
> > > Now I'm going to use glusterFS for a bigger cluster but I've some
> > > questions :)
> > >
> > > Environment:
> > > * 4 Servers
> > > * 20 x 2TB HDD, each
> > > * Raidcontroller
> > > * Raid 10
> > > * 4x bricks => Replicated, Distributed volume
> > > * Gluster 3.4
> > >
> > > 1)
> > > I'm asking me, if I can delete the raid10 on each server and create
> > > for each HDD a separate brick.
> > > In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is
there
> > > any experience about the write throughput in a production system
with
> > > many of bricks like in this case? In addition i'll get double of HDD
> > > capacity.
> >
> > Have a look at:
> >
> >
http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf
>
> That one was from 2012, here is the latest:
>
>
>
http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
>
> -b
>
> > Specifically:
> >
> > - RAID arrays
> > - More RAID LUNs for better concurrency
> > - For RAID6, 256-KB stripe size
> >
> > I use a single RAID 6 that is divided into several LUNs for my bricks.
> For
> > example, on my Dell servers(with PERC6 RAID controllers) each server
has
> 12
> > disks that I put into raid 6. Then I break the RAID 6 into 6 LUNs and
> > create a new PV/VG/LV for each brick. From there I follow the
> > recommendations listed in the presentation.
> >
> > HTH!
> >
> > -b
> >
> > > 2)
> > > I've heard a talk about glusterFS and out scaling. The main point
was
> > > if more bricks are in use, the scale out process will take a long
> > > time. The problem was/is the Hash-Algo. So I'm asking me how is it
if
> > > I've one very big brick (Raid10 20TB on each server) or I've much
more
> > > bricks, what's faster and is there any issues?
> > > Is there any experiences ?
> > >
> > > 3)
> > > Failover of a HDD is for a raid controller with HotSpare HDD not a
big
> > > deal. Glusterfs will rebuild automatically if a brick fails and
there
> > > are no data present, this action will perform a lot of network
traffic
> > > between the mirror bricks but it will handle it equal as the raid
> > > controller right ?
> > >
> > >
> > >
> > > Thanks and cheers
> > > Heiko
> > >
> > >
> > >
> > > - --
> > > Anynines.com
> > >
> > > Avarteq GmbH
> > > B.Sc. Informatik
> > > Heiko Krämer
> > > CIO
> > > Twitter: @anynines
> > >
> > > - ----
> > > Geschäftsführer: Alexander Faißt, Dipl.-Inf.(FH) Julian Fischer
> > > Handelsregister: AG Saarbrücken HRB 17413, Ust-IdNr.: DE262633168
> > > Sitz: Saarbrücken
> > > -----BEGIN PGP SIGNATURE-----
> > > Version: GnuPG v1.4.14 (GNU/Linux)
> > > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> > >
> > > iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B
> > > lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0
> > > GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ
> > > N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs
> > > TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6
> > > Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY=
> > > =bDly
> > > -----END PGP SIGNATURE-----
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users at gluster.org
> > > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <
http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/b19779ff/attachment-0001.html
>
------------------------------
Message: 40
Date: Tue, 10 Dec 2013 15:34:56 +0530
From: Vijay Bellur <vbellur at redhat.com>
To: Bernhard Glomm <bernhard.glomm at ecologic.eu>, mrcuongnv at gmail.com
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] replace-brick failing -
transport.address-family not specified
Message-ID: <52A6E748.5070300 at redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
On 12/10/2013 02:26 PM, Bernhard Glomm wrote:
> On 10.12.2013 06:39:47, Vijay Bellur wrote:
>
> On 12/08/2013 07:06 PM, Nguyen Viet Cuong wrote:
>
> Thanks for sharing.
>
> Btw, I do believe that GlusterFS 3.2.x is much more stable than
> 3.4.x in
> production.
>
>
> This is quite contrary to what we have seen in the community. From a
> development perspective too, we feel much better about 3.4.1. Are
there
> specific instances that worked well with 3.2.x which does not work
fine
> for you in 3.4.x?
>
>
> 987555 - is that fixed in 3.5?
>
> Or did it even make it into 3.4.2
>
> couldn't find a note on that.
>
Yes, this will be part of 3.4.2. Note that the original problem was due
to libvirt being rigid about the ports that it needs to use for
migrations. AFAIK this has been addressed in upstream libvirt as well.
Through this bug fix, glusterfs provides a mechanism whereby it can use a
separate range of ports for bricks. This configuration can be enabled to
coexist with other applications that do not adhere to the guidelines laid
out by IANA.
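If I remember right, the brick port range is controlled from
/etc/glusterfs/glusterd.vol, along these lines; treat the exact option name
and value as an assumption on my part and verify it against the 3.4.2 release
notes before relying on it:

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option base-port 50152
end-volume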
Cheers,
Vijay
------------------------------
Message: 41
Date: Tue, 10 Dec 2013 15:38:16 +0530
From: Vijay Bellur <vbellur at redhat.com>
To: Alexandru Coseru <alex.coseru at simplus.ro>,
gluster-users at gluster.org
Subject: Re: [Gluster-users] Gluster - replica - Unable to self-heal
contents of '/' (possible split-brain)
Message-ID: <52A6E810.9050900 at redhat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
On 12/09/2013 07:21 PM, Alexandru Coseru wrote:
>
> [2013-12-09 13:20:52.066978] E
> [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
> 0-stor1-replicate-0: Unable to self-heal contents of '/' (possible
> split-brain). Please delete the file from all but the preferred
> subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ]
>
> [2013-12-09 13:20:52.067386] E
> [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk]
> 0-stor1-replicate-0: background meta-data self-heal failed on /
>
> [2013-12-09 13:20:52.067452] E [mount3.c:290:mnt3svc_lookup_mount_cbk]
> 0-nfs: error=Input/output error
>
> [2013-12-09 13:20:53.092039] E
> [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
> 0-stor1-replicate-0: Unable to self-heal contents of '/' (possible
> split-brain). Please delete the file from all but the preferred
> subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ]
>
> [2013-12-09 13:20:53.092497] E
> [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk]
> 0-stor1-replicate-0: background meta-data self-heal failed on /
>
> [2013-12-09 13:20:53.092559] E [mount3.c:290:mnt3svc_lookup_mount_cbk]
> 0-nfs: error=Input/output error
>
> What am I doing wrong?
Looks like there is a metadata split-brain on /.
The split-brain resolution document at [1] can possibly be of help here.
-Vijay
[1] https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md
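In short, the manual approach amounts to something like this (a sketch only;
the brick path below is a placeholder and the client index depends on which
copy you decide to keep, so please read [1] before touching anything):

# on each brick, inspect the AFR changelog xattrs on the brick root
getfattr -d -m trusted.afr -e hex /path/to/stor1/brick
# on the brick you treat as the bad copy, clear its accusation against the
# copy you are keeping (here: keeping brick 0, so run this on brick 1)
setfattr -n trusted.afr.stor1-client-0 -v 0x000000000000000000000000 /path/to/stor1/brick
# then trigger a heal
gluster volume heal stor1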
>
> PS: Volume stor_fast works like a charm.
>
Good to know, thanks!
------------------------------
Message: 42
Date: Tue, 10 Dec 2013 11:59:44 +0100
From: "Mariusz Sobisiak" <MSobisiak at ydp.pl>
To: <gluster-users at gluster.org>
Subject: [Gluster-users] Error after crash of Virtual Machine during
migration
Message-ID:
<507D8C234E515F4F969362F9666D7EBBE875D1 at nagato1.intranet.ydp>
Content-Type: text/plain; charset="us-ascii"
Greetings,
Legend:
storage-gfs-3-prd - the first gluster.
storage-1-saas - the new gluster to which "the first gluster" had to be
migrated.
storage-gfs-4-prd - the second gluster (which had to be migrated later).
I've started the replace-brick command:
'gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared
storage-1-saas:/ydp/shared start'
During that, the Virtual Machine (Xen) crashed. Now I can't abort the
migration or continue it again.
When I try:
'# gluster volume replace-brick sa_bookshelf
storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort'
The command runs for about 5 minutes and then finishes with no result. Apart
from that, after that command Gluster starts to behave very strangely.
For example, I can't run '# gluster volume heal sa_bookshelf info' because
it takes about 5 minutes and returns nothing (the same as with abort).
Then I restart the Gluster server and Gluster returns to normal operation
except for the replace-brick commands.
'# gluster volume replace-brick sa_bookshelf
storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared status'
I get:
Number of files migrated = 0 Current file=
I can do 'volume heal info' commands etc. until I call the command:
'# gluster volume replace-brick sa_bookshelf
storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort'.
# gluster --version
glusterfs 3.3.1 built on Oct 22 2012 07:54:24 Repository revision:
git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com> GlusterFS
comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU
General Public License.
Brick (/ydp/shared) logs (repeats the same constantly):
[2013-12-06 11:29:44.790299] W [dict.c:995:data_to_str]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab
) [0x7ff4a5d35fcb]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r
emote_sockaddr+0x15d) [0x7ff4a5d3d64d]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address
_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
[2013-12-06 11:29:44.790402] W [dict.c:995:data_to_str]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab
) [0x7ff4a5d35fcb]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r
emote_sockaddr+0x15d) [0x7ff4a5d3d64d]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address
_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
[2013-12-06 11:29:44.790465] E [name.c:141:client_fill_address_family]
0-sa_bookshelf-replace-brick: transport.address-family not specified.
Could not guess default value from (remote-host:(null) or
transport.unix.connect-path:(null)) options
[2013-12-06 11:29:47.791037] W [dict.c:995:data_to_str]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab
) [0x7ff4a5d35fcb]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r
emote_sockaddr+0x15d) [0x7ff4a5d3d64d]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address
_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
[2013-12-06 11:29:47.791141] W [dict.c:995:data_to_str]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab
) [0x7ff4a5d35fcb]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r
emote_sockaddr+0x15d) [0x7ff4a5d3d64d]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address
_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
[2013-12-06 11:29:47.791174] E [name.c:141:client_fill_address_family]
0-sa_bookshelf-replace-brick: transport.address-family not specified.
Could not guess default value from (remote-host:(null) or
transport.unix.connect-path:(null)) options
[2013-12-06 11:29:50.791775] W [dict.c:995:data_to_str]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab
) [0x7ff4a5d35fcb]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r
emote_sockaddr+0x15d) [0x7ff4a5d3d64d]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address
_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
[2013-12-06 11:29:50.791986] W [dict.c:995:data_to_str]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab
) [0x7ff4a5d35fcb]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r
emote_sockaddr+0x15d) [0x7ff4a5d3d64d]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address
_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
[2013-12-06 11:29:50.792046] E [name.c:141:client_fill_address_family]
0-sa_bookshelf-replace-brick: transport.address-family not specified.
Could not guess default value from (remote-host:(null) or
transport.unix.connect-path:(null)) options
# gluster volume info
Volume Name: sa_bookshelf
Type: Distributed-Replicate
Volume ID: 74512f52-72ec-4538-9a54-4e50c4691722
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: storage-gfs-3-prd:/ydp/shared
Brick2: storage-gfs-4-prd:/ydp/shared
Brick3: storage-gfs-3-prd:/ydp/shared2
Brick4: storage-gfs-4-prd:/ydp/shared2
# gluster volume status
Status of volume: sa_bookshelf
Gluster process Port Online
Pid
------------------------------------------------------------------------
------
Brick storage-gfs-3-prd:/ydp/shared 24009 Y
758
Brick storage-gfs-4-prd:/ydp/shared 24009 Y
730
Brick storage-gfs-3-prd:/ydp/shared2 24010 Y
764
Brick storage-gfs-4-prd:/ydp/shared2 24010 Y
4578
NFS Server on localhost 38467 Y
770
Self-heal Daemon on localhost N/A Y
776
NFS Server on storage-1-saas 38467 Y
840
Self-heal Daemon on storage-1-saas N/A Y
846
NFS Server on storage-gfs-4-prd 38467 Y
4584
Self-heal Daemon on storage-gfs-4-prd N/A Y
4590
storage-gfs-3-prd:~# gluster peer status Number of Peers: 2
Hostname: storage-1-saas
Uuid: 37b9d881-ce24-4550-b9de-6b304d7e9d07
State: Peer in Cluster (Connected)
Hostname: storage-gfs-4-prd
Uuid: 4c384f45-873b-4c12-9683-903059132c56
State: Peer in Cluster (Connected)
(from storage-1-saas)# gluster peer status Number of Peers: 2
Hostname: 172.16.3.60
Uuid: 1441a7b0-09d2-4a40-a3ac-0d0e546f6884
State: Peer in Cluster (Connected)
Hostname: storage-gfs-4-prd
Uuid: 4c384f45-873b-4c12-9683-903059132c56
State: Peer in Cluster (Connected)
Clients work properly.
I googled for that and found it was a bug, but in version 3.3.0. How
can I repair that and continue my migration? Thank you for any help.
BTW: I moved the Gluster server by following the "Gluster 3.4: Brick
Restoration - Replace Crashed Server" how-to.
Regards,
Mariusz
------------------------------
Message: 43
Date: Tue, 10 Dec 2013 12:52:29 +0100
From: Johan Huysmans <johan.huysmans at inuits.be>
To: "gluster-users at gluster.org" <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Structure needs cleaning on some files
Message-ID: <52A7007D.6020005 at inuits.be>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
Hi All,
It seems I can easily reproduce the problem.
* on node 1, create a file (touch, cat, ...)
* on node 2, take the md5sum of the file directly (md5sum /path/to/file)
* on node 1, move the file to another name (mv file file1)
* on node 2, take the md5sum of the file directly (md5sum /path/to/file); this
still works although the file is not really there any more
* on node 1, change the file content
* on node 2, take the md5sum of the file directly (md5sum /path/to/file); this
still works and the md5sum has changed (concrete commands are sketched below)
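With the volume mounted at /mnt/sharedfs on both client nodes (the path is
only an example of our setup), the sequence looks like:

# node 1
touch /mnt/sharedfs/file
# node 2
md5sum /mnt/sharedfs/file
# node 1
mv /mnt/sharedfs/file /mnt/sharedfs/file1
# node 2 - still succeeds even though 'file' no longer exists
md5sum /mnt/sharedfs/file
# node 1
echo changed >> /mnt/sharedfs/file1
# node 2 - still succeeds, and the checksum has changed
md5sum /mnt/sharedfs/file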
This is really strange behaviour.
Is this normal? Can this be altered with a setting?
Thanks for any info,
gr.
Johan
On 10-12-13 10:02, Johan Huysmans wrote:
> I could reproduce this problem with while my mount point is running in
> debug mode.
> logfile is attached.
>
> gr.
> Johan Huysmans
>
> On 10-12-13 09:30, Johan Huysmans wrote:
>> Hi All,
>>
>> When reading some files we get this error:
>> md5sum: /path/to/file.xml: Structure needs cleaning
>>
>> in /var/log/glusterfs/mnt-sharedfs.log we see these errors:
>> [2013-12-10 08:07:32.256910] W
>> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0:
>> remote operation failed: No such file or directory
>> [2013-12-10 08:07:32.257436] W
>> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1:
>> remote operation failed: No such file or directory
>> [2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk]
>> 0-glusterfs-fuse: 8230: STAT() /path/to/file.xml => -1 (Structure
>> needs cleaning)
>>
>> We are using gluster 3.4.1-3 on CentOS6.
>> Our servers are 64-bit, our clients 32-bit (we are already using
>> --enable-ino32 on the mountpoint)
>>
>> This is my gluster configuration:
>> Volume Name: testvolume
>> Type: Replicate
>> Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: SRV-1:/gluster/brick1
>> Brick2: SRV-2:/gluster/brick2
>> Options Reconfigured:
>> performance.force-readdirp: on
>> performance.stat-prefetch: off
>> network.ping-timeout: 5
>>
>> And this is how the applications work:
>> We have 2 client nodes who both have a fuse.glusterfs mountpoint.
>> On 1 client node we have a application which writes files.
>> On the other client node we have a application which reads these files.
>> On the node where the files are written we don't see any problem, and
>> can read that file without problems.
>> On the other node we have problems (error messages above) reading
>> that file.
>> The problem occurs when we perform a md5sum on the exact file, when
>> perform a md5sum on all files in that directory there is no problem.
>>
>>
>> How can we solve this problem as this is annoying.
>> The problem occurs after some time (can be days), an umount and mount
>> of the mountpoint solves it for some days.
>> Once it occurs (and we don't remount) it occurs every time.
>>
>>
>> I hope someone can help me with this problems.
>>
>> Thanks,
>> Johan Huysmans
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <
http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/32f9069c/attachment-0001.html
>
------------------------------
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
End of Gluster-users Digest, Vol 68, Issue 11
*********************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/d921f3e9/attachment.html>
More information about the Gluster-users
mailing list