[Gluster-users] 3.6.6 issues
David Robinson
david.robinson at corvidtec.com
Tue Oct 20 15:16:13 UTC 2015
Ben,
It is no longer in this state. It seems to have settled out over the
weekend and is behaving normally. I am guessing here, but I think the
issue might have been a problem with 3.6.3 clients connecting to the
3.6.6 server. We have upgraded all of our clients to 3.6.6 and are no
longer having the problem.
Is there a way to check the client version of all clients connected to a
gluster server?
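The best I have come up with so far is grepping the brick logs for the
version string each client reports when it connects, something like the
following (assuming I am reading those handshake log lines correctly):

[root@gfs01a glusterfs]# grep "accepted client" /var/log/glusterfs/bricks/*.log | grep -o "version: [0-9.]*" | sort | uniq -c

"gluster volume status homegfs clients" lists the connected clients, but
it doesn't seem to show their versions, and the log grep only catches
clients that have connected since the logs were last rotated.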
David
------ Original Message ------
From: "Ben Turner" <bturner at redhat.com>
To: "David Robinson" <drobinson at corvidtec.com>
Cc: gluster-users at gluster.org; "Gluster Devel"
<gluster-devel at gluster.org>
Sent: 10/19/2015 8:20:10 PM
Subject: Re: [Gluster-users] 3.6.6 issues
>Hi David. Is the cluster still in this state? If so, can you grab a
>couple of stack traces from the offending brick (gfs01a) process with
>gstack? Make sure that it's the brick process spinning your CPUs with
>top or something; we want to be sure the stack traces are from the
>offending process. That will give us an idea of what it is chewing on.
>Other than that, maybe you could take a couple of sosreports on the
>servers and open a BZ. It may be a good idea to roll back versions
>until we can get this sorted; I don't know how long you can leave the
>cluster in this state. Once you have a Bugzilla open, I'll try to
>reproduce what you are seeing.
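>
>Something along these lines should capture what we need (the brick PID
>is just an example; take it from the last column of "gluster volume
>status homegfs" for the busy brick):
>
># top -b -n 1 | head -20    # confirm it's the glusterfsd brick process eating CPU
># gluster volume status homegfs | grep brick01a/homegfs
># for i in 1 2 3; do gstack <brick-pid> > /tmp/gstack-brick01a.$i; sleep 10; done
># sosreport    # run on each server and attach the tarballs to the BZ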
>
>-b
>
>----- Original Message -----
>> From: "David Robinson" <david.robinson at corvidtec.com>
>> To: gluster-users at gluster.org, "Gluster Devel"
>><gluster-devel at gluster.org>
>> Sent: Saturday, October 17, 2015 12:19:36 PM
>> Subject: [Gluster-users] 3.6.6 issues
>>
>> I upgraded my storage servers from 3.6.3 to 3.6.6 and am now having
>> issues. My setup (4x2) is shown below. One of the bricks (gfs01a) has
>> a very high CPU load even though the load on the other 3 bricks
>> (gfs01b, gfs02a, gfs02b) is almost zero. The FUSE-mounted partition
>> is extremely slow and basically unusable since the upgrade. I am
>> getting a lot of the messages shown below in the logs on gfs01a and
>> gfs01b. Nothing out of the ordinary is showing up on the
>> gfs02a/gfs02b bricks.
>> Can someone help?
>> [root@gfs01b glusterfs]# gluster volume info homegfs
>>
>> Volume Name: homegfs
>> Type: Distributed-Replicate
>> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>> Status: Started
>> Number of Bricks: 4 x 2 = 8
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
>> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
>> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
>> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
>> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
>> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
>> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
>> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
>> Options Reconfigured:
>> changelog.rollover-time: 15
>> changelog.fsync-interval: 3
>> changelog.changelog: on
>> geo-replication.ignore-pid-check: on
>> geo-replication.indexing: off
>> storage.owner-gid: 100
>> network.ping-timeout: 10
>> server.allow-insecure: on
>> performance.write-behind-window-size: 128MB
>> performance.cache-size: 128MB
>> performance.io-thread-count: 32
>> server.manage-gids: on
>> [root@gfs01a glusterfs]# tail -f cli.log
>> [2015-10-17 16:05:44.299933] I [socket.c:2353:socket_event_handler]
>> 0-transport: disconnecting now
>> [2015-10-17 16:05:44.331233] I [input.c:36:cli_batch] 0-: Exiting
>>with: 0
>> [2015-10-17 16:06:33.397631] I [socket.c:2353:socket_event_handler]
>> 0-transport: disconnecting now
>> [2015-10-17 16:06:33.432970] I [input.c:36:cli_batch] 0-: Exiting
>>with: 0
>> [2015-10-17 16:11:22.441290] I [socket.c:2353:socket_event_handler]
>> 0-transport: disconnecting now
>> [2015-10-17 16:11:22.472227] I [input.c:36:cli_batch] 0-: Exiting
>>with: 0
>> [2015-10-17 16:15:44.176391] I [socket.c:2353:socket_event_handler]
>> 0-transport: disconnecting now
>> [2015-10-17 16:15:44.205064] I [input.c:36:cli_batch] 0-: Exiting
>>with: 0
>> [2015-10-17 16:16:33.366424] I [socket.c:2353:socket_event_handler]
>> 0-transport: disconnecting now
>> [2015-10-17 16:16:33.377160] I [input.c:36:cli_batch] 0-: Exiting
>>with: 0
>> [root@gfs01a glusterfs]# tail etc-glusterfs-glusterd.vol.log
>> [2015-10-17 15:56:33.177207] I
>> [glusterd-handler.c:3836:__glusterd_handle_status_volume]
>>0-management:
>> Received status volume req for volume Source
>> [2015-10-17 16:01:22.303635] I
>> [glusterd-handler.c:3836:__glusterd_handle_status_volume]
>>0-management:
>> Received status volume req for volume Software
>> [2015-10-17 16:05:44.320555] I
>> [glusterd-handler.c:3836:__glusterd_handle_status_volume]
>>0-management:
>> Received status volume req for volume homegfs
>> [2015-10-17 16:06:17.204783] W [rpcsvc.c:254:rpcsvc_program_actor]
>> 0-rpc-service: RPC program not available (req 1298437 330)
>> [2015-10-17 16:06:17.204811] E
>>[rpcsvc.c:544:rpcsvc_check_and_reply_error]
>> 0-rpcsvc: rpc actor failed to complete successfully
>> [2015-10-17 16:06:33.408695] I
>> [glusterd-handler.c:3836:__glusterd_handle_status_volume]
>>0-management:
>> Received status volume req for volume Source
>> [2015-10-17 16:11:22.462374] I
>> [glusterd-handler.c:3836:__glusterd_handle_status_volume]
>>0-management:
>> Received status volume req for volume Software
>> [2015-10-17 16:12:30.608092] E
>>[glusterd-op-sm.c:207:glusterd_get_txn_opinfo]
>> 0-: Unable to get transaction opinfo for transaction ID :
>> d143b66b-2ac9-4fd9-8635-fe1eed41d56b
>> [2015-10-17 16:15:44.198292] I
>> [glusterd-handler.c:3836:__glusterd_handle_status_volume]
>>0-management:
>> Received status volume req for volume homegfs
>> [2015-10-17 16:16:33.368170] I
>> [glusterd-handler.c:3836:__glusterd_handle_status_volume]
>>0-management:
>> Received status volume req for volume Source
>> [root@gfs01b glusterfs]# tail -f glustershd.log
>> [2015-10-17 16:11:45.996447] I
>> [afr-self-heal-metadata.c:54:__afr_selfheal_metadata_do]
>> 0-homegfs-replicate-1: performing metadata selfheal on
>> 0a65d73a-a416-418e-92f0-5cec7d240433
>> [2015-10-17 16:11:46.030947] I
>>[afr-self-heal-common.c:476:afr_log_selfheal]
>> 0-homegfs-replicate-1: Completed metadata selfheal on
>> 0a65d73a-a416-418e-92f0-5cec7d240433. source=1 sinks=0
>> [2015-10-17 16:11:46.031241] W
>>[client-rpc-fops.c:2772:client3_3_lookup_cbk]
>> 0-homegfs-client-3: remote operation failed: No such file or
>>directory.
>> Path: <gfid:d2714957-0c83-4ab2-8cfc-1931c8e9d0bf>
>> (d2714957-0c83-4ab2-8cfc-1931c8e9d0bf)
>> [2015-10-17 16:11:46.031633] W
>>[client-rpc-fops.c:2772:client3_3_lookup_cbk]
>> 0-homegfs-client-3: remote operation failed: No such file or
>>directory.
>> Path: <gfid:87c5f875-c3e7-4b14-807a-4e6d940750fc>
>> (87c5f875-c3e7-4b14-807a-4e6d940750fc)
>> [2015-10-17 16:11:47.043367] W
>>[client-rpc-fops.c:2772:client3_3_lookup_cbk]
>> 0-homegfs-client-3: remote operation failed: No such file or
>>directory.
>> Path: <gfid:d2714957-0c83-4ab2-8cfc-1931c8e9d0bf>
>> (d2714957-0c83-4ab2-8cfc-1931c8e9d0bf)
>> [2015-10-17 16:11:47.054199] W
>>[client-rpc-fops.c:2772:client3_3_lookup_cbk]
>> 0-homegfs-client-3: remote operation failed: No such file or
>>directory.
>> Path: <gfid:87c5f875-c3e7-4b14-807a-4e6d940750fc>
>> (87c5f875-c3e7-4b14-807a-4e6d940750fc)
>> [2015-10-17 16:12:48.001869] W
>>[client-rpc-fops.c:2772:client3_3_lookup_cbk]
>> 0-homegfs-client-3: remote operation failed: No such file or
>>directory.
>> Path: <gfid:d2714957-0c83-4ab2-8cfc-1931c8e9d0bf>
>> (d2714957-0c83-4ab2-8cfc-1931c8e9d0bf)
>> [2015-10-17 16:12:48.012671] W
>>[client-rpc-fops.c:2772:client3_3_lookup_cbk]
>> 0-homegfs-client-3: remote operation failed: No such file or
>>directory.
>> Path: <gfid:87c5f875-c3e7-4b14-807a-4e6d940750fc>
>> (87c5f875-c3e7-4b14-807a-4e6d940750fc)
>> [2015-10-17 16:13:49.011591] W
>>[client-rpc-fops.c:2772:client3_3_lookup_cbk]
>> 0-homegfs-client-3: remote operation failed: No such file or
>>directory.
>> Path: <gfid:d2714957-0c83-4ab2-8cfc-1931c8e9d0bf>
>> (d2714957-0c83-4ab2-8cfc-1931c8e9d0bf)
>> [2015-10-17 16:13:49.018600] W
>>[client-rpc-fops.c:2772:client3_3_lookup_cbk]
>> 0-homegfs-client-3: remote operation failed: No such file or
>>directory.
>> Path: <gfid:87c5f875-c3e7-4b14-807a-4e6d940750fc>
>> (87c5f875-c3e7-4b14-807a-4e6d940750fc)
>> [root@gfs01b glusterfs]# tail cli.log
>> [2015-10-16 10:52:16.002922] I [input.c:36:cli_batch] 0-: Exiting
>>with: 0
>> [2015-10-16 10:52:16.167432] I [socket.c:2353:socket_event_handler]
>> 0-transport: disconnecting now
>> [2015-10-16 10:52:18.248024] I [input.c:36:cli_batch] 0-: Exiting
>>with: 0
>> [2015-10-17 16:12:30.607603] I [socket.c:2353:socket_event_handler]
>> 0-transport: disconnecting now
>> [2015-10-17 16:12:30.628810] I [input.c:36:cli_batch] 0-: Exiting
>>with: 0
>> [2015-10-17 16:12:33.992818] I [socket.c:2353:socket_event_handler]
>> 0-transport: disconnecting now
>> [2015-10-17 16:12:33.998944] I [input.c:36:cli_batch] 0-: Exiting
>>with: 0
>> [2015-10-17 16:12:38.604461] I [socket.c:2353:socket_event_handler]
>> 0-transport: disconnecting now
>> [2015-10-17 16:12:38.605532] I
>>[cli-rpc-ops.c:588:gf_cli_get_volume_cbk]
>> 0-cli: Received resp to get vol: 0
>> [2015-10-17 16:12:38.605659] I [input.c:36:cli_batch] 0-: Exiting
>>with: 0
>> [root@gfs01b glusterfs]# tail etc-glusterfs-glusterd.vol.log
>> [2015-10-16 14:29:56.495120] E [rpcsvc.c:617:rpcsvc_handle_rpc_call]
>> 0-rpc-service: Request received from non-privileged port. Failing
>>request
>> [2015-10-16 14:29:59.369109] E [rpcsvc.c:617:rpcsvc_handle_rpc_call]
>> 0-rpc-service: Request received from non-privileged port. Failing
>>request
>> [2015-10-16 14:29:59.512093] E [rpcsvc.c:617:rpcsvc_handle_rpc_call]
>> 0-rpc-service: Request received from non-privileged port. Failing
>>request
>> [2015-10-16 14:30:02.383574] E [rpcsvc.c:617:rpcsvc_handle_rpc_call]
>> 0-rpc-service: Request received from non-privileged port. Failing
>>request
>> [2015-10-16 14:30:02.529206] E [rpcsvc.c:617:rpcsvc_handle_rpc_call]
>> 0-rpc-service: Request received from non-privileged port. Failing
>>request
>> [2015-10-16 16:01:20.389100] E [rpcsvc.c:617:rpcsvc_handle_rpc_call]
>> 0-rpc-service: Request received from non-privileged port. Failing
>>request
>> [2015-10-17 16:12:30.611161] W
>> [glusterd-op-sm.c:4066:glusterd_op_modify_op_ctx] 0-management:
>>op_ctx
>> modification failed
>> [2015-10-17 16:12:30.612433] I
>> [glusterd-handler.c:3836:__glusterd_handle_status_volume]
>>0-management:
>> Received status volume req for volume Software
>> [2015-10-17 16:12:30.618444] I
>> [glusterd-handler.c:3836:__glusterd_handle_status_volume]
>>0-management:
>> Received status volume req for volume Source
>> [2015-10-17 16:12:30.624005] I
>> [glusterd-handler.c:3836:__glusterd_handle_status_volume]
>>0-management:
>> Received status volume req for volume homegfs
>> [2015-10-17 16:12:33.993869] I
>> [glusterd-handler.c:3836:__glusterd_handle_status_volume]
>>0-management:
>> Received status volume req for volume homegfs
>> [2015-10-17 16:12:38.605389] I
>> [glusterd-handler.c:1296:__glusterd_handle_cli_get_volume]
>>0-glusterd:
>> Received get vol req
>> [root@gfs01b glusterfs]# gluster volume status homegfs
>> Status of volume: homegfs
>> Gluster process Port Online Pid
>>
>>------------------------------------------------------------------------------
>> Brick gfsib01a.corvidtec.com:/data/brick01a/homegfs 49152 Y 3820
>> Brick gfsib01b.corvidtec.com:/data/brick01b/homegfs 49152 Y 3808
>> Brick gfsib01a.corvidtec.com:/data/brick02a/homegfs 49153 Y 3825
>> Brick gfsib01b.corvidtec.com:/data/brick02b/homegfs 49153 Y 3813
>> Brick gfsib02a.corvidtec.com:/data/brick01a/homegfs 49152 Y 3967
>> Brick gfsib02b.corvidtec.com:/data/brick01b/homegfs 49152 Y 3952
>> Brick gfsib02a.corvidtec.com:/data/brick02a/homegfs 49153 Y 3972
>> Brick gfsib02b.corvidtec.com:/data/brick02b/homegfs 49153 Y 3957
>> NFS Server on localhost 2049 Y 3822
>> Self-heal Daemon on localhost N/A Y 3827
>> NFS Server on 10.200.70.1 2049 Y 3834
>> Self-heal Daemon on 10.200.70.1 N/A Y 3839
>> NFS Server on gfsib02a.corvidtec.com 2049 Y 3981
>> Self-heal Daemon on gfsib02a.corvidtec.com N/A Y 3986
>> NFS Server on gfsib02b.corvidtec.com 2049 Y 3966
>> Self-heal Daemon on gfsib02b.corvidtec.com N/A Y 3971
>>
>> Task Status of Volume homegfs
>>
>>------------------------------------------------------------------------------
>> Task : Rebalance
>> ID : 58b6cc76-c29c-4695-93fe-c42b1112e171
>> Status : completed
>>
>>
>>
>> ========================
>>
>>
>>
>> David F. Robinson, Ph.D.
>>
>> President - Corvid Technologies
>>
>> 145 Overhill Drive
>>
>> Mooresville, NC 28117
>>
>> 704.799.6944 x101 [Office]
>>
>> 704.252.1310 [Cell]
>>
>> 704.799.7974 [Fax]
>>
>> david.robinson at corvidtec.com
>>
>> http://www.corvidtec.com
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users