[Gluster-users] Disbalanced load
Miloš Kozák
milos.kozak at lejmr.com
Fri Sep 5 12:15:07 UTC 2014
The libvirt log does not contain anything related. The error
messages come from dmesg inside the virtual machine.
What I can see in the Gluster logs is that the connection between
the two peers was lost. The physical connection, however, was up
the whole time.
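
For completeness, this is roughly how I looked for the disconnect
(the log locations assume a default install; the grep patterns are
only an illustration):

    # peer membership state as glusterd sees it
    gluster peer status

    # disconnect messages around the time of the incident
    grep -i disconnect /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
    grep -i disconnect /var/log/glusterfs/bricks/*.log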
On 5.9.2014 at 0:47, Joe Julian wrote:
> That is about as far removed from anything useful for troubleshooting
> as possible. You're reporting a symptom from within a virtualized
> environment. It's the real systems that have the useful logs. Any
> errors on the client or brick logs? Libvirt logs? dmesg on the server?
> Is either cpu bound? In swap?
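> A quick sketch of how to check the last two on each node:
>
>     vmstat 1 5   # run queue, swap in/out, iowait
>     free -m      # memory and swap usage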
>
>
> On September 4, 2014 9:12:16 PM PDT, "Miloš Kozák"
> <milos.kozak at lejmr.com> wrote:
>
> Hi,
>
> I ran a few more tests. I moved a file containing a VM image onto the
> GlusterFS mount, and along with the load I got this on the console of a
> running VM:
>
> lost page write due to I/O error on vda1
> Buffer I/O error on device vda1, logical block 1049638
> lost page write due to I/O error on vda1
> Buffer I/O error on device vda1, logical block 1049646
> lost page write due to I/O error on vda1
> Buffer I/O error on device vda1, logical block 1049647
> lost page write due to I/O error on vda1
> Buffer I/O error on device vda1, logical block 1049649
> lost page write due to I/O error on vda1
> end_request: I/O error, dev vda, sector 8399688
> end_request: I/O error, dev vda, sector 8399728
> end_request: I/O error, dev vda, sector 8399736
> end_request: I/O error, dev vda, sector 8399776
> end_request: I/O error, dev vda, sector 8399792
> __ratelimit: 5 callbacks suppressed
> EXT4-fs error (device vda1): ext4_find_entry: reading directory #398064 offset 0
> EXT4-fs error (device vda1): ext4_find_entry: reading directory #398064 offset 0
> EXT4-fs error (device vda1): ext4_find_entry: reading directory #132029 offset 0
>
> Do you think this is related to the options that are set on the volume?
>
> storage.owner-gid: 498
> storage.owner-uid: 498
> network.ping-timeout: 2
> performance.io-thread-count: 3
> cluster.server-quorum-type: server
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
>
> Thanks Milos
>
>
> On 14-09-03 at 04:01 PM, Milos Kozak wrote:
>
> I have just tried to copy a VM image (raw) and it causes the
> same problem. I have GlusterFS 3.5.2.
>
> On 9/3/2014 9:14 AM, Roman wrote:
>
> Hi, I had some issues with files generated from /dev/zero
> too. Try real files or /dev/urandom :) I don't know whether
> there is a real issue/bug with files generated from /dev/zero;
> the devs should check that out, /me thinks.
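> Something along these lines, just to illustrate the idea (file
> name and size are arbitrary):
>
>     dd if=/dev/urandom of=test-rand.img bs=1M count=20000 conv=fdatasync
>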
> On 2014-09-03 at 16:11 GMT+03:00, Milos Kozak
> <milos.kozak at lejmr.com> wrote:
> Hi, I am facing a quite strange problem with two servers that have
> the same configuration and the same hardware. The servers are
> connected by bonded 1GE. I have one volume:
>
> [root@nodef02i 103]# gluster volume info
> Volume Name: ph-fs-0
> Type: Replicate
> Volume ID: f8f569ea-e30c-43d0-bb94-b2f1164a7c9a
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.11.100.1:/gfs/s3-sata-10k/fs
> Brick2: 10.11.100.2:/gfs/s3-sata-10k/fs
> Options Reconfigured:
> storage.owner-gid: 498
> storage.owner-uid: 498
> network.ping-timeout: 2
> performance.io-thread-count: 3
> cluster.server-quorum-type: server
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
>
> The volume is intended to host virtual servers (KVM), and the
> configuration follows the Gluster blog. Currently I have only one
> virtual server deployed on top of this volume, in order to see the
> effects of my stress tests. During the tests I write to the volume,
> mounted through FUSE, with dd (currently only one write at a time):
>
>     dd if=/dev/zero of=test2.img bs=1M count=20000 conv=fdatasync
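>
> As a side note, the same test could also be run with direct I/O
> to take the client page cache out of the picture (a sketch):
>
>     dd if=/dev/zero of=test2.img bs=1M count=20000 oflag=direct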
>
> Test 1) I run dd on nodef02i. The load on nodef02i is at most
> 1 erl, but on nodef01i it is around 14 erl (I have a 12-thread
> CPU). After the write is done, the load on nodef02i goes down,
> but on nodef01i it climbs to 28 erl and stays there for 20
> minutes. In the meantime I can see:
>
> [root@nodef01i 103]# gluster volume heal ph-fs-0 info
> Volume ph-fs-0 is not started (Or) All the bricks are not running.
> Volume heal failed
>
> [root@nodef02i 103]# gluster volume heal ph-fs-0 info
> Brick nodef01i.czprg:/gfs/s3-sata-10k/fs/
> /3706a2cb0bb27ba5787b3c12388f4ebb - Possibly undergoing heal
> /test.img - Possibly undergoing heal
> Number of entries: 2
>
> Brick nodef02i.czprg:/gfs/s3-sata-10k/fs/
> /3706a2cb0bb27ba5787b3c12388f4ebb - Possibly undergoing heal
> /test.img - Possibly undergoing heal
> Number of entries: 2
>
> [root@nodef01i 103]# gluster volume status
> Status of volume: ph-fs-0
> Gluster process                                Port   Online  Pid
> ------------------------------------------------------------------------
> Brick 10.11.100.1:/gfs/s3-sata-10k/fs          49152  Y       56631
> Brick 10.11.100.2:/gfs/s3-sata-10k/fs          49152  Y       3372
> NFS Server on localhost                        2049   Y       56645
> Self-heal Daemon on localhost                  N/A    Y       56649
> NFS Server on 10.11.100.2                      2049   Y       3386
> Self-heal Daemon on 10.11.100.2                N/A    Y       3387
>
> Task Status of Volume ph-fs-0
> ------------------------------------------------------------------------
> There are no active volume tasks
>
> This very high load lasts another 20-30 minutes. During the first
> test I restarted the glusterd service after 10 minutes, because it
> seemed to me that the service was not working, yet I could still
> see very high load on nodef01i. Consequently, the virtual server
> reported errors about problems with its EXT4 filesystem and MySQL
> stopped.
>
> When the load culminated, I ran the same test in the opposite
> direction and wrote (dd) test2 from nodef01i. More or less the
> same thing happened: extremely high load on nodef01i and minimal
> load on nodef02i. The outputs from heal were more or less the
> same.
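>
> For the record, this is roughly how I watched the load and the
> heal queue side by side during the tests (only a sketch):
>
>     watch -n 10 'uptime; gluster volume heal ph-fs-0 info'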
>
> I would like to tweak this, but I don't know what I should focus
> on. Thank you for your help.
>
> Milos
> --
> Best regards,
> Roman.
>
>
>
> ------------------------------------------------------------------------
>
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test3-nodef02i.tar.bz2
Type: application/octet-stream
Size: 16962 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140905/96d373f1/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test3-nodef01i.tar.bz2
Type: application/octet-stream
Size: 23738 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140905/96d373f1/attachment-0001.obj>