[Gluster-devel] successive bonnie++ tests taking longer and longer to run (system load steadily increasing)

Raghavendra G raghavendra at gluster.com
Tue Feb 23 11:22:54 UTC 2010


Hi Daniel,

Can you decrease the cache size in io-cache (say, to 128MB)? If the files
being served are larger than this, you could also remove io-cache from the
configuration entirely. Do let us know whether this solves your issue.
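
For example (a minimal sketch, assuming the rest of your generated volfile
stays as posted), the io-cache section would then read:

    volume iocache
        type performance/io-cache
        option cache-size 128MB      # reduced from 1GB
        option cache-timeout 1
        subvolumes readahead
    end-volume

If you remove io-cache entirely, point the quickread volume at readahead
instead (i.e. "subvolumes readahead" in the quickread section).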

regards,
On Tue, Feb 23, 2010 at 1:25 PM, Daniel Maher <dma+gluster at witbe.net> wrote:

> Raghavendra G wrote:
>
>> * What does the top output corresponding to glusterfs say? what is the
>> memory usage and cpu usage?
>> * Do you find anything interesting in glusterfs client log files? Can we
>> get the log files?
>>
>
> Snapshot :
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  1185 root      20   0 2478m 909m  564 D 11.3 90.1   1064:50 glusterfs
>
>
> Over a monitored period of 30 minutes, CPU usage hovered between 7% and
> 12%, and memory usage hovered at 90% to 91%.
>
>
>
>
> ================================================================================
> Version      : glusterfs 3.0.0 built on Feb  3 2010 14:39:23
> git: 2.0.1-886-g8379edd
> Starting Time: 2010-02-17 13:58:41
> Command line : /usr/sbin/glusterfs --log-level=NORMAL
> --volfile=/etc/glusterfs/replicated-tcp.vol /opt/gluster
> PID          : 1185
> System name  : Linux
> Nodename     : 169.install.pxe
> Kernel Release : 2.6.26.8-57.fc8
> Hardware Identifier: i686
>
> Given volfile:
>
> +------------------------------------------------------------------------------+
>  1: ## file auto generated by /usr/bin/glusterfs-volgen (mount.vol)
>  2: # Cmd line:
>  3: # $ /usr/bin/glusterfs-volgen --name replicated --raid 1
> s01:/opt/gluster s02:/opt/gluster
>  4:
>  5: # RAID 1
>  6: # TRANSPORT-TYPE tcp
>  7: volume s01-1
>  8:     type protocol/client
>  9:     option transport-type tcp
>  10:     option remote-host s01
>  11:     option transport.socket.nodelay on
>  12:     option transport.remote-port 6996
>  13:     option remote-subvolume brick1
>  14: end-volume
>  15:
>  16: volume s02-1
>  17:     type protocol/client
>  18:     option transport-type tcp
>  19:     option remote-host s02
>  20:     option transport.socket.nodelay on
>  21:     option transport.remote-port 6996
>  22:     option remote-subvolume brick1
>  23: end-volume
>  24:
>  25: volume mirror-0
>  26:     type cluster/replicate
>  27:     subvolumes s01-1 s02-1
>  28: end-volume
>  29:
>  30: volume writebehind
>  31:     type performance/write-behind
>  32:     option cache-size 4MB
>  33:     subvolumes mirror-0
>  34: end-volume
>  35:
>  36: volume readahead
>  37:     type performance/read-ahead
>  38:     option page-count 4
>  39:     subvolumes writebehind
>  40: end-volume
>  41:
>  42: volume iocache
>  43:     type performance/io-cache
>  44:     option cache-size 1GB
>  45:     option cache-timeout 1
>  46:     subvolumes readahead
>  47: end-volume
>  48:
>  49: volume quickread
>  50:     type performance/quick-read
>  51:     option cache-timeout 1
>  52:     option max-file-size 64kB
>  53:     subvolumes iocache
>  54: end-volume
>  55:
>  56: volume statprefetch
>  57:     type performance/stat-prefetch
>  58:     subvolumes quickread
>  59: end-volume
>  60:
>
>
> +------------------------------------------------------------------------------+
> [2010-02-17 13:58:41] W [xlator.c:655:validate_xlator_volume_options]
> s02-1: option 'transport.remote-port' is deprecated, preferred is
> 'remote-port', continuing with correction
> [2010-02-17 13:58:41] W [xlator.c:655:validate_xlator_volume_options]
> s01-1: option 'transport.remote-port' is deprecated, preferred is
> 'remote-port', continuing with correction
> [2010-02-17 13:58:52] E [socket.c:760:socket_connect_finish] s01-1:
> connection to  failed (Connection refused)
> [2010-02-17 13:58:52] E [socket.c:760:socket_connect_finish] s01-1:
> connection to  failed (Connection refused)
> [2010-02-17 13:58:58] E [socket.c:760:socket_connect_finish] s02-1:
> connection to  failed (No route to host)
> [2010-02-17 13:58:58] E [socket.c:760:socket_connect_finish] s02-1:
> connection to  failed (No route to host)
> [2010-02-17 14:06:23] N [client-protocol.c:6224:client_setvolume_cbk]
> s01-1: Connected to 10.0.0.38:6996, attached to remote volume 'brick1'.
> [2010-02-17 14:06:23] N [afr.c:2625:notify] mirror-0: Subvolume 's01-1'
> came back up; going online.
> [2010-02-17 14:06:23] N [client-protocol.c:6224:client_setvolume_cbk]
> s01-1: Connected to 10.0.0.38:6996, attached to remote volume 'brick1'.
> [2010-02-17 14:06:23] N [afr.c:2625:notify] mirror-0: Subvolume 's01-1'
> came back up; going online.
> [2010-02-17 14:06:33] N [client-protocol.c:6224:client_setvolume_cbk]
> s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
> [2010-02-17 14:06:33] N [client-protocol.c:6224:client_setvolume_cbk]
> s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
> [2010-02-20 20:02:43] E [client-protocol.c:415:client_ping_timer_expired]
> s02-1: Server 10.0.0.39:6996 has not responded in the last 42 seconds,
> disconnecting.
> [2010-02-20 20:02:47] E [saved-frames.c:165:saved_frames_unwind] s02-1:
> forced unwinding frame type(2) op(PING)
> [2010-02-20 20:02:48] N [client-protocol.c:6972:notify] s02-1: disconnected
> [2010-02-20 20:04:01] N [client-protocol.c:6224:client_setvolume_cbk]
> s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
> [2010-02-20 20:04:01] N [client-protocol.c:6224:client_setvolume_cbk]
> s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
> [2010-02-22 02:37:28] E [client-protocol.c:415:client_ping_timer_expired]
> s02-1: Server 10.0.0.39:6996 has not responded in the last 42 seconds,
> disconnecting.
> [2010-02-22 02:37:56] E [saved-frames.c:165:saved_frames_unwind] s02-1:
> forced unwinding frame type(1) op(READ)
> [2010-02-22 02:37:56] E [saved-frames.c:165:saved_frames_unwind] s02-1:
> forced unwinding frame type(2) op(PING)
> [2010-02-22 02:37:56] N [client-protocol.c:6972:notify] s02-1: disconnected
> [2010-02-22 02:38:02] N [client-protocol.c:6224:client_setvolume_cbk]
> s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
> [2010-02-22 02:38:02] N [client-protocol.c:6224:client_setvolume_cbk]
> s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
> [2010-02-22 04:19:02] E [client-protocol.c:415:client_ping_timer_expired]
> s02-1: Server 10.0.0.39:6996 has not responded in the last 42 seconds,
> disconnecting.
> [2010-02-22 04:19:50] E [saved-frames.c:165:saved_frames_unwind] s02-1:
> forced unwinding frame type(1) op(READ)
> [2010-02-22 04:19:50] E [saved-frames.c:165:saved_frames_unwind] s02-1:
> forced unwinding frame type(2) op(PING)
> [2010-02-22 04:19:51] N [client-protocol.c:6972:notify] s02-1: disconnected
> [2010-02-22 04:20:33] E [client-protocol.c:415:client_ping_timer_expired]
> s01-1: Server 10.0.0.38:6996 has not responded in the last 42 seconds,
> disconnecting.
> [2010-02-22 04:20:34] E [saved-frames.c:165:saved_frames_unwind] s01-1:
> forced unwinding frame type(1) op(READ)
> [2010-02-22 04:20:41] E [saved-frames.c:165:saved_frames_unwind] s01-1:
> forced unwinding frame type(2) op(PING)
> [2010-02-22 04:20:41] N [client-protocol.c:6224:client_setvolume_cbk]
> s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
> [2010-02-22 04:20:41] N [client-protocol.c:6972:notify] s01-1: disconnected
> [2010-02-22 04:20:41] N [client-protocol.c:6224:client_setvolume_cbk]
> s02-1: Connected to 10.0.0.39:6996, attached to remote volume 'brick1'.
> [2010-02-22 04:20:41] N [afr.c:2625:notify] mirror-0: Subvolume 's02-1'
> came back up; going online.
> [2010-02-22 04:20:41] N [client-protocol.c:6224:client_setvolume_cbk]
> s01-1: Connected to 10.0.0.38:6996, attached to remote volume 'brick1'.
> [2010-02-22 04:20:41] N [client-protocol.c:6224:client_setvolume_cbk]
> s01-1: Connected to 10.0.0.38:6996, attached to remote volume 'brick1'.
>
>
> That is the entire client log file.  If it would be useful, I can stop the
> tests, set the Gluster logging to debug level, and start again.  Just let
> me know.
>
>
>
> --
> Daniel Maher <dma+gluster AT witbe DOT net>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>



-- 
Raghavendra G