[Gluster-users] libgfapi failover problem on replica bricks

Joe Julian joe at julianfamily.org
Wed Apr 9 14:24:53 UTC 2014


I've asked before and not had any luck getting someone to try this:

On the server you're about to reboot, run "killall glusterfsd" and let me know if you still see the same problem at the client.
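
A minimal sketch of what I mean (assuming the usual packaging, where glusterfsd is the per-brick server process and glusterd plus the self-heal daemon are separate processes that keep running):

# on the server you are about to reboot, as root
killall glusterfsd

# from another node, the bricks on that host should now show as offline
gluster volume status

# then watch the VM / libgfapi client for I/O errors before actually rebooting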

On April 9, 2014 1:19:43 AM PDT, Fabio Rosati <fabio.rosati at geminformatica.it> wrote:
>Hi Paul,
>
>You're not alone: I get the same issue after rebooting a brick
>belonging to a 2 x 2 volume, and the same is true for João P. and Nick
>M. (added in CC).
>
>[root at networker ~]# gluster volume info gv_pri
> 
>Volume Name: gv_pri
>Type: Distributed-Replicate
>Volume ID: 3d91b91e-4d72-484f-8655-e5ed8d38bb28
>Status: Started
>Number of Bricks: 2 x 2 = 4
>Transport-type: tcp
>Bricks:
>Brick1: nw1glus.gem.local:/glustexp/pri1/brick
>Brick2: nw2glus.gem.local:/glustexp/pri1/brick
>Brick3: nw3glus.gem.local:/glustexp/pri2/brick
>Brick4: nw4glus.gem.local:/glustexp/pri2/brick
>Options Reconfigured:
>storage.owner-gid: 107
>storage.owner-uid: 107
>server.allow-insecure: on
>network.remote-dio: on
>performance.write-behind-window-size: 16MB
>performance.cache-size: 128MB
>
>
>I hope someone will address this problem in the near future, since not
>being able to shut down a server hosting a brick is a big limitation.
>It seems someone solved the problem using cgroups:
>http://www.gluster.org/author/andrew-lau/
>Anyway, I don't think it's easy to implement here because cgroups is
>already configured and in use by libvirt; if I had a test environment
>and some spare time, I would have tried it.
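>
>I haven't tried the steps from that post, but as a rough illustration of
>the general cgroups idea (assuming cgroup v1 with the libcgroup-tools
>utilities, and that the goal is simply to cap the CPU share of the gluster
>processes so they don't starve the VMs), it might look something like:
>
># create a cpu cgroup for gluster with a reduced share (default is 1024)
>cgcreate -g cpu:/glusterfs
>cgset -r cpu.shares=256 glusterfs
>
># move the running brick / self-heal processes into it
>for pid in $(pidof glusterfsd glusterfs); do cgclassify -g cpu:/glusterfs $pid; done
>
>This would have to coexist with the cgroup hierarchy libvirt already
>manages, which is exactly the part I'm unsure about.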
>
>
>Regards,
>Fabio Rosati 
>
>
>----- Original Message -----
>From: "Paul Penev" <ppquant at gmail.com>
>To: Gluster-users at gluster.org
>Sent: Sunday, April 6, 2014 17:52:53
>Subject: [Gluster-users] libgfapi failover problem on replica bricks
>
>Hello,
>
>I'm having an issue with rebooting bricks holding images for live KVM
>machines (using libgfapi).
>
>I have a replicated+distributed setup of 4 bricks (2x2). The cluster
>contains images for a couple of kvm virtual machines.
>
>My problem is that when I reboot a brick containing an image of a
>VM, the VM will start throwing disk errors and eventually die.
>
>The gluster volume is made like this:
>
># gluster vol info pool
>
>Volume Name: pool
>Type: Distributed-Replicate
>Volume ID: xxxxxxxxxxxxxxxxxxxx
>Status: Started
>Number of Bricks: 2 x 2 = 4
>Transport-type: tcp
>Bricks:
>Brick1: srv10g:/data/gluster/brick
>Brick2: srv11g:/data/gluster/brick
>Brick3: srv12g:/data/gluster/brick
>Brick4: srv13g:/data/gluster/brick
>Options Reconfigured:
>network.ping-timeout: 10
>cluster.server-quorum-type: server
>diagnostics.client-log-level: WARNING
>auth.allow: 192.168.0.*,127.*
>nfs.disable: on
>
>The KVM instances run on the same servers as the gluster bricks, with
>disks attached as:
>file=gluster://localhost/pool/images/vm-xxx-disk-1.raw,.......,cache=writethrough,aio=native
>
>My self-heal backlog is not always 0. It looks like some writes are
>not going to all bricks at the same time (?).
>
>gluster vol heal pool info
>
>sometimes shows the images needing sync on one brick, the other, or both.
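>
>For reference, one way to keep an eye on this (a rough sketch; the brick
>path and image name are the ones from above, and the AFR changelog xattrs
>on a brick show which replica thinks it has pending writes for the other):
>
># repeat from any server and see whether the file list drains to zero
>watch -n 5 'gluster volume heal pool info'
>
># on a brick, as root, inspect the pending-heal changelog xattrs for one image
>getfattr -m . -d -e hex /data/gluster/brick/images/vm-xxx-disk-1.raw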
>
>There are no network problems or errors on the wire.
>
>Any ideas what could be causing this?
>
>Thanks.
>_______________________________________________
>Gluster-users mailing list
>Gluster-users at gluster.org
>http://supercolony.gluster.org/mailman/listinfo/gluster-users

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

