Re: [Gluster-devel] successive bonnie++ tests taking longer and longer to run (system load steadily increasing)
Gordan Bobic
gordan at bobich.net
Tue Feb 23 10:05:00 UTC 2010
Sounds like you are seeing the same disconnection issues I reported a while back.
Gordan
"Daniel Maher" <dma+gluster at witbe.net> wrote:
>Raghavendra G wrote:
>> * What does the top output corresponding to glusterfs say? what is the
>> memory usage and cpu usage?
>> * Do you find anything interesting in glusterfs client log files? Can we
>> get the log files?
>
>Snapshot :
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 1185 root 20 0 2478m 909m 564 D 11.3 90.1 1064:50 glusterfs
>
>
>Over a monitored period of 30 minutes, the CPU usage hovered between 7
>and 12, and Mem usage hovered at 90 to 91.
>
>
>
>================================================================================
>Version : glusterfs 3.0.0 built on Feb 3 2010 14:39:23
>git: 2.0.1-886-g8379edd
>Starting Time: 2010-02-17 13:58:41
>Command line : /usr/sbin/glusterfs --log-level=NORMAL
>--volfile=/etc/glusterfs/replicated-tcp.vol /opt/gluster
>PID : 1185
>System name : Linux
>Nodename : 169.install.pxe
>Kernel Release : 2.6.26.8-57.fc8
>Hardware Identifier: i686
>
>Given volfile:
>+------------------------------------------------------------------------------+
> 1: ## file auto generated by /usr/bin/glusterfs-volgen (mount.vol)
> 2: # Cmd line:
> 3: # $ /usr/bin/glusterfs-volgen --name replicated --raid 1
>s01:/opt/gluster s02:/opt/gluster
> 4:
> 5: # RAID 1
> 6: # TRANSPORT-TYPE tcp
> 7: volume s01-1
> 8: type protocol/client
> 9: option transport-type tcp
> 10: option remote-host s01
> 11: option transport.socket.nodelay on
> 12: option transport.remote-port 6996
> 13: option remote-subvolume brick1
> 14: end-volume
> 15:
> 16: volume s02-1
> 17: type protocol/client
> 18: option transport-type tcp
> 19: option remote-host s02
> 20: option transport.socket.nodelay on
> 21: option transport.remote-port 6996
> 22: option remote-subvolume brick1
> 23: end-volume
> 24:
> 25: volume mirror-0
> 26: type cluster/replicate
> 27: subvolumes s01-1 s02-1
> 28: end-volume
> 29:
> 30: volume writebehind
> 31: type performance/write-behind
> 32: option cache-size 4MB
> 33: subvolumes mirror-0
> 34: end-volume
> 35:
> 36: volume readahead
> 37: type performance/read-ahead
> 38: option page-count 4
> 39: subvolumes writebehind
> 40: end-volume
> 41:
> 42: volume iocache
> 43: type performance/io-cache
> 44: option cache-size 1GB
> 45: option cache-timeout 1
> 46: subvolumes readahead
> 47: end-volume
> 48:
> 49: volume quickread
> 50: type performance/quick-read
> 51: option cache-timeout 1
> 52: option max-file-size 64kB
> 53: subvolumes iocache
> 54: end-volume
> 55:
> 56: volume statprefetch
> 57: type performance/stat-prefetch
> 58: subvolumes quickread
> 59: end-volume
> 60:
>
>+------------------------------------------------------------------------------+
>[2010-02-17 13:58:41] W [xlator.c:655:validate_xlator_volume_options]
>s02-1: option 'transport.remote-port' is deprecated, preferred is
>'remote-port', continuing with correction
>[2010-02-17 13:58:41] W [xlator.c:655:validate_xlator_volume_options]
>s01-1: option 'transport.remote-port' is deprecated, preferred is
>'remote-port', continuing with correction
>[2010-02-17 13:58:52] E [socket.c:760:socket_connect_finish] s01-1:
>connection to failed (Connection refused)
>[2010-02-17 13:58:52] E [socket.c:760:socket_connect_finish] s01-1:
>connection to failed (Connection refused)
>[2010-02-17 13:58:58] E [socket.c:760:socket_connect_finish] s02-1:
>connection to failed (No route to host)
>[2010-02-17 13:58:58] E [socket.c:760:socket_connect_finish] s02-1:
>connection to failed (No route to host)
>[2010-02-17 14:06:23] N [client-protocol.c:6224:client_setvolume_cbk]
>s01-1: Connected to 10.0.0.38:6996, attached to remote volume 'brick1'.
>[2010-02-17 14:06:23] N [afr.c:2625:notify] mirror-0: Subvolume 's01-1'
>came back up; going online.
>[2010-02-17 14:06:23] N [client-protocol.c:6224:client_setvolume_cbk]
>s01-1: Connected to 10.0.0.38:6996, attached to remote volume 'brick1'.
>[2010-02-17 14:06:23] N [afr.c:2625:notify] mirror-0: Subvolume 's01-1'
>came back up; going online.
>[2010-02-17 14:06:33] N [client-protocol.c:6224:client_setvolume_cbk]
>s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
>[2010-02-17 14:06:33] N [client-protocol.c:6224:client_setvolume_cbk]
>s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
>[2010-02-20 20:02:43] E
>[client-protocol.c:415:client_ping_timer_expired] s02-1: Server
>10.0.0.39:6996 has not responded in the last 42 seconds, disconnecting.
>[2010-02-20 20:02:47] E [saved-frames.c:165:saved_frames_unwind] s02-1:
>forced unwinding frame type(2) op(PING)
>[2010-02-20 20:02:48] N [client-protocol.c:6972:notify] s02-1: disconnected
>[2010-02-20 20:04:01] N [client-protocol.c:6224:client_setvolume_cbk]
>s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
>[2010-02-20 20:04:01] N [client-protocol.c:6224:client_setvolume_cbk]
>s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
>[2010-02-22 02:37:28] E
>[client-protocol.c:415:client_ping_timer_expired] s02-1: Server
>10.0.0.39:6996 has not responded in the last 42 seconds, disconnecting.
>[2010-02-22 02:37:56] E [saved-frames.c:165:saved_frames_unwind] s02-1:
>forced unwinding frame type(1) op(READ)
>[2010-02-22 02:37:56] E [saved-frames.c:165:saved_frames_unwind] s02-1:
>forced unwinding frame type(2) op(PING)
>[2010-02-22 02:37:56] N [client-protocol.c:6972:notify] s02-1: disconnected
>[2010-02-22 02:38:02] N [client-protocol.c:6224:client_setvolume_cbk]
>s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
>[2010-02-22 02:38:02] N [client-protocol.c:6224:client_setvolume_cbk]
>s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
>[2010-02-22 04:19:02] E
>[client-protocol.c:415:client_ping_timer_expired] s02-1: Server
>10.0.0.39:6996 has not responded in the last 42 seconds, disconnecting.
>[2010-02-22 04:19:50] E [saved-frames.c:165:saved_frames_unwind] s02-1:
>forced unwinding frame type(1) op(READ)
>[2010-02-22 04:19:50] E [saved-frames.c:165:saved_frames_unwind] s02-1:
>forced unwinding frame type(2) op(PING)
>[2010-02-22 04:19:51] N [client-protocol.c:6972:notify] s02-1: disconnected
>[2010-02-22 04:20:33] E
>[client-protocol.c:415:client_ping_timer_expired] s01-1: Server
>10.0.0.38:6996 has not responded in the last 42 seconds, disconnecting.
>[2010-02-22 04:20:34] E [saved-frames.c:165:saved_frames_unwind] s01-1:
>forced unwinding frame type(1) op(READ)
>[2010-02-22 04:20:41] E [saved-frames.c:165:saved_frames_unwind] s01-1:
>forced unwinding frame type(2) op(PING)
>[2010-02-22 04:20:41] N [client-protocol.c:6224:client_setvolume_cbk]
>s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
>[2010-02-22 04:20:41] N [client-protocol.c:6972:notify] s01-1: disconnected
>[2010-02-22 04:20:41] N [client-protocol.c:6224:client_setvolume_cbk]
>s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
>[2010-02-22 04:20:41] N [afr.c:2625:notify] mirror-0: Subvolume 's02-1'
>came back up; going online.
>[2010-02-22 04:20:41] N [client-protocol.c:6224:client_setvolume_cbk]
>s01-1: Connected to 10.0.0.38:6996, attached to remote volume 'brick1'.
>[2010-02-22 04:20:41] N [client-protocol.c:6224:client_setvolume_cbk]
>s01-1: Connected to 10.0.0.38:6996, attached to remote volume 'brick1'.
>
>
>That is the entire client log file. If it would be useful, i can stop
>the tests, put the Gluster logging into the debug level, and start
>again. Just let me know.
>
>
>--
>Daniel Maher <dma+gluster AT witbe DOT net>
>
>
>_______________________________________________
>Gluster-devel mailing list
>Gluster-devel at nongnu.org
>http://lists.nongnu.org/mailman/listinfo/gluster-devel
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
More information about the Gluster-devel
mailing list