[Gluster-users] Disbalanced load
Miloš Kozák
milos.kozak at lejmr.com
Fri Sep 5 12:15:07 UTC 2014
The libvirt log does not contain anything related. The error
messages come from dmesg inside the virtual machine.
What I can see in the Gluster logs is that the connection between
the two peers was lost. The physical connection, however, was up
the whole time.
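
For completeness, this is roughly how I looked for the disconnect
(the log locations assume a default install; the grep patterns are
only an illustration):

    # peer membership state as glusterd sees it
    gluster peer status

    # disconnect messages around the time of the incident
    grep -i disconnect /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
    grep -i disconnect /var/log/glusterfs/bricks/*.log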
On 5.9.2014 at 0:47, Joe Julian wrote:
> That is about as far removed from anything useful for troubleshooting
> as possible. You're reporting a symptom from within a virtualized
> environment. It's the real systems that have the useful logs. Any
> errors on the client or brick logs? Libvirt logs? dmesg on the server?
> Is either cpu bound? In swap?
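> A quick sketch of how to check the last two on each node:
>
>     vmstat 1 5   # run queue, swap in/out, iowait
>     free -m      # memory and swap usage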
>
>
> On September 4, 2014 9:12:16 PM PDT, "Miloš Kozák"
> <milos.kozak at lejmr.com> wrote:
>
> Hi,
>
> I ran a few more tests. I moved a file containing a VM image onto the
> GlusterFS mount, and along with the load I got this on the console of a
> running VM:
>
> lost page write due to I/O error on vda1
> Buffer I/O error on device vda1, logical block 1049638
> lost page write due to I/O error on vda1
> Buffer I/O error on device vda1, logical block 1049646
> lost page write due to I/O error on vda1
> Buffer I/O error on device vda1, logical block 1049647
> lost page write due to I/O error on vda1
> Buffer I/O error on device vda1, logical block 1049649
> lost page write due to I/O error on vda1
> end_request: I/O error, dev vda, sector 8399688
> end_request: I/O error, dev vda, sector 8399728
> end_request: I/O error, dev vda, sector 8399736
> end_request: I/O error, dev vda, sector 8399776
> end_request: I/O error, dev vda, sector 8399792
> __ratelimit: 5 callbacks suppressed
> EXT4-fs error (device vda1): ext4_find_entry: reading directory #398064 offset 0
> EXT4-fs error (device vda1): ext4_find_entry: reading directory #398064 offset 0
> EXT4-fs error (device vda1): ext4_find_entry: reading directory #132029 offset 0
>
> Do you think this is related to the options that are set on the volume?
>
> storage.owner-gid: 498
> storage.owner-uid: 498
> network.ping-timeout: 2
> performance.io-thread-count: 3
> cluster.server-quorum-type: server
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
>
> Thanks Milos
>
>
> On 14-09-03 at 04:01 PM, Milos Kozak wrote:
>
> I have just tried to copy a VM image (raw) and it causes the
> same problem. I have GlusterFS 3.5.2.
>
> On 9/3/2014 9:14 AM, Roman wrote:
>
> Hi, I had some issues with files generated from /dev/zero
> too. Try real files or /dev/urandom :) I don't know whether
> there is a real issue/bug with files generated from /dev/zero;
> the devs should check that out, /me thinks.
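> Something along these lines, just to illustrate the idea (file
> name and size are arbitrary):
>
>     dd if=/dev/urandom of=test-rand.img bs=1M count=20000 conv=fdatasync
>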
> On 2014-09-03 at 16:11 GMT+03:00, Milos Kozak
> <milos.kozak at lejmr.com> wrote:
> Hi, I am facing a quite strange problem with two servers that have
> the same configuration and the same hardware. The servers are
> connected by bonded 1GE. I have one volume:
>
> [root@nodef02i 103]# gluster volume info
> Volume Name: ph-fs-0
> Type: Replicate
> Volume ID: f8f569ea-e30c-43d0-bb94-b2f1164a7c9a
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.11.100.1:/gfs/s3-sata-10k/fs
> Brick2: 10.11.100.2:/gfs/s3-sata-10k/fs
> Options Reconfigured:
> storage.owner-gid: 498
> storage.owner-uid: 498
> network.ping-timeout: 2
> performance.io-thread-count: 3
> cluster.server-quorum-type: server
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
>
> The volume is intended to host virtual servers (KVM), and the
> configuration follows the Gluster blog. Currently I have only one
> virtual server deployed on top of this volume, in order to see the
> effects of my stress tests. During the tests I write to the volume,
> mounted through FUSE, with dd (currently only one write at a time):
>
>     dd if=/dev/zero of=test2.img bs=1M count=20000 conv=fdatasync
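>
> As a side note, the same test could also be run with direct I/O
> to take the client page cache out of the picture (a sketch):
>
>     dd if=/dev/zero of=test2.img bs=1M count=20000 oflag=direct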
>
> Test 1) I run dd on nodef02i. The load on nodef02i is at most
> 1 erl, but on nodef01i it is around 14 erl (I have a 12-thread
> CPU). After the write is done, the load on nodef02i goes down,
> but on nodef01i it climbs to 28 erl and stays there for 20
> minutes. In the meantime I can see:
>
> [root@nodef01i 103]# gluster volume heal ph-fs-0 info
> Volume ph-fs-0 is not started (Or) All the bricks are not running.
> Volume heal failed
>
> [root@nodef02i 103]# gluster volume heal ph-fs-0 info
> Brick nodef01i.czprg:/gfs/s3-sata-10k/fs/
> /3706a2cb0bb27ba5787b3c12388f4ebb - Possibly undergoing heal
> /test.img - Possibly undergoing heal
> Number of entries: 2
>
> Brick nodef02i.czprg:/gfs/s3-sata-10k/fs/
> /3706a2cb0bb27ba5787b3c12388f4ebb - Possibly undergoing heal
> /test.img - Possibly undergoing heal
> Number of entries: 2
>
> [root@nodef01i 103]# gluster volume status
> Status of volume: ph-fs-0
> Gluster process                                Port   Online  Pid
> ------------------------------------------------------------------------
> Brick 10.11.100.1:/gfs/s3-sata-10k/fs          49152  Y       56631
> Brick 10.11.100.2:/gfs/s3-sata-10k/fs          49152  Y       3372
> NFS Server on localhost                        2049   Y       56645
> Self-heal Daemon on localhost                  N/A    Y       56649
> NFS Server on 10.11.100.2                      2049   Y       3386
> Self-heal Daemon on 10.11.100.2                N/A    Y       3387
>
> Task Status of Volume ph-fs-0
> ------------------------------------------------------------------------
> There are no active volume tasks
>
> This very high load lasts another 20-30 minutes. During the first
> test I restarted the glusterd service after 10 minutes, because it
> seemed to me that the service was not working, yet I could still
> see very high load on nodef01i. Consequently, the virtual server
> reported errors about problems with its EXT4 filesystem and MySQL
> stopped.
>
> When the load culminated, I ran the same test in the opposite
> direction and wrote (dd) test2 from nodef01i. More or less the
> same thing happened: extremely high load on nodef01i and minimal
> load on nodef02i. The outputs from heal were more or less the
> same.
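>
> For the record, this is roughly how I watched the load and the
> heal queue side by side during the tests (only a sketch):
>
>     watch -n 10 'uptime; gluster volume heal ph-fs-0 info'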
>
> I would like to tweak this, but I don't know what I should focus
> on. Thank you for your help.
>
> Milos
> --
> Best regards,
> Roman.
>
>
>
> ------------------------------------------------------------------------
>
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test3-nodef02i.tar.bz2
Type: application/octet-stream
Size: 16962 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140905/96d373f1/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test3-nodef01i.tar.bz2
Type: application/octet-stream
Size: 23738 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20140905/96d373f1/attachment-0001.obj>