[Gluster-users] one brick one volume process dies?
lejeczek
peljasz at yahoo.co.uk
Thu Sep 28 16:05:18 UTC 2017
On 13/09/17 20:47, Ben Werthmann wrote:
> These symptoms appear to be the same as I've recorded in
> this post:
>
> http://lists.gluster.org/pipermail/gluster-users/2017-September/032435.html
>
> On Wed, Sep 13, 2017 at 7:01 AM, Atin Mukherjee
> <atin.mukherjee83 at gmail.com> wrote:
>
>     Additionally, the brick log file of the same brick
>     would be required. Please check whether the brick
>     process went down or crashed. Doing a volume start
>     force should resolve the issue.
>
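If I read that right, the suggested sequence is roughly this (a
sketch only; the brick log name is my guess from the default naming,
where the brick path gets flattened with '-' under
/var/log/glusterfs/bricks/):

  # on the node with the failed brick: did the process crash,
  # or was it shut down cleanly?
  less /var/log/glusterfs/bricks/__.aLocalStorages-0-0-GLUSTERs-0GLUSTER-CYTO-DATA.log

  # restart only bricks that are down; running bricks are untouched
  gluster volume start CYTO-DATA force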
When I do: vol start force, I see this among the log lines:
[2017-09-28 16:00:55.120726] I [MSGID: 106568]
[glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management:
Stopping glustershd daemon running in pid: 308300
[2017-09-28 16:00:55.128867] W [socket.c:593:__socket_rwv]
0-glustershd: readv on
/var/run/gluster/0853a4555820d3442b1c3909f1cb8466.socket
failed (No data available)
[2017-09-28 16:00:56.122687] I [MSGID: 106568]
[glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management:
glustershd service is stopped
Funnily (or not), a week later I now see:
$ gluster vol status CYTO-DATA
Status of volume: CYTO-DATA
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-CYTO-DATA                     49161     0          Y       1743719
Brick 10.5.6.100:/__.aLocalStorages/0/0-GLU
STERs/0GLUSTER-CYTO-DATA                    49152     0          Y       20438
Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUS
TERs/0GLUSTER-CYTO-DATA                     49152     0          Y       5607
Self-heal Daemon on localhost               N/A       N/A        Y       41106
Quota Daemon on localhost                   N/A       N/A        Y       41117
Self-heal Daemon on 10.5.6.17               N/A       N/A        Y       19088
Quota Daemon on 10.5.6.17                   N/A       N/A        Y       19097
Self-heal Daemon on 10.5.6.32               N/A       N/A        Y       1832978
Quota Daemon on 10.5.6.32                   N/A       N/A        Y       1832987
Self-heal Daemon on 10.5.6.49               N/A       N/A        Y       320291
Quota Daemon on 10.5.6.49                   N/A       N/A        Y       320303

Task Status of Volume CYTO-DATA
------------------------------------------------------------------------------
There are no active volume tasks
$ gluster vol heal CYTO-DATA info
Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA
....
....
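A quick way to tell whether glusterd has merely lost its connection
or the brick process is really gone (a sketch, assuming the pid and
socket locations from the log excerpt above):

  # on 10.5.6.49: is a glusterfsd still serving this brick path?
  pgrep -af 'glusterfsd.*0GLUSTER-CYTO-DATA'

  # does the socket glusterd complained about still exist?
  ls -l /var/run/gluster/0853a4555820d3442b1c3909f1cb8466.socket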
> On Wed, 13 Sep 2017 at 16:28, Gaurav Yadav
> <gyadav at redhat.com> wrote:
>
>     Please send me the logs as well, i.e. glusterd.log
>     and cmd_history.log.
>
>
> On Wed, Sep 13, 2017 at 1:45 PM, lejeczek
> <peljasz at yahoo.co.uk> wrote:
>
>
>
> On 13/09/17 06:21, Gaurav Yadav wrote:
>
> Please provide the output of gluster
> volume info, gluster volume status and
> gluster peer status.
>
> Apart from the above info, please provide
> the glusterd logs and cmd_history.log.
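>
> Something like this should capture it all (a sketch;
> default log locations, and note that on some versions
> the glusterd log is named etc-glusterfs-glusterd.vol.log
> instead):
>
>     gluster volume info   > vol-info.txt
>     gluster volume status > vol-status.txt
>     gluster peer status   > peer-status.txt
>     tar czf gluster-logs.tar.gz \
>         /var/log/glusterfs/glusterd.log \
>         /var/log/glusterfs/cmd_history.log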
>
> Thanks
> Gaurav
>
> On Tue, Sep 12, 2017 at 2:22 PM, lejeczek
> <peljasz at yahoo.co.uk> wrote:
>
> hi everyone
>
> I have a 3-peer cluster with 9 volumes, all
> in replica mode.
> What I see, unfortunately, is that one brick
> fails in one volume, and when it happens it
> is always the same volume on the same brick.
> Command: gluster vol status $vol - would show
> the brick as not online.
> Restarting glusterd with systemctl does not
> help; only a system reboot seems to help,
> until it happens the next time.
>
> How to troubleshoot this weird misbehaviour?
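> The only extra checks I can think of, roughly
> (assuming a systemd distro; coredumpctl may need
> the systemd-coredump package installed):
>
>     # did the brick process leave a core dump behind?
>     coredumpctl list | grep glusterfsd
>     # capture brick state for the devs while the volume is up
>     gluster volume statedump $vol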
> many thanks, L.
>
> .
>
>
>
> hi, here:
>
> $ gluster vol info C-DATA
>
> Volume Name: C-DATA
> Type: Replicate
> Volume ID: 18ffba73-532e-4a4d-84da-fceea52f8c2e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA
> Brick2: 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA
> Brick3: 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA
> Options Reconfigured:
> performance.md-cache-timeout: 600
> performance.cache-invalidation: on
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> performance.io-thread-count: 64
> performance.cache-size: 128MB
> cluster.self-heal-daemon: enable
> features.quota-deem-statfs: on
> changelog.changelog: on
> geo-replication.ignore-pid-check: on
> geo-replication.indexing: on
> features.inode-quota: on
> features.quota: on
> performance.readdir-ahead: on
> nfs.disable: on
> transport.address-family: inet
> performance.cache-samba-metadata: on
>
>
> $ gluster vol status C-DATA
> Status of volume: C-DATA
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUS
> TERs/0GLUSTER-C-DATA                        N/A       N/A        N       N/A
> Brick 10.5.6.100:/__.aLocalStorages/0/0-GLU
> STERs/0GLUSTER-C-DATA                       49152     0          Y       9376
> Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUS
> TERs/0GLUSTER-C-DATA                        49152     0          Y       8638
> Self-heal Daemon on localhost               N/A       N/A        Y       387879
> Quota Daemon on localhost                   N/A       N/A        Y       387891
> Self-heal Daemon on rider.private.ccnr.ceb.
> private.cam.ac.uk                           N/A       N/A        Y       16439
> Quota Daemon on rider.private.ccnr.ceb.priv
> ate.cam.ac.uk                               N/A       N/A        Y       16451
> Self-heal Daemon on 10.5.6.32               N/A       N/A        Y       7708
> Quota Daemon on 10.5.6.32                   N/A       N/A        Y       8623
> Self-heal Daemon on 10.5.6.17               N/A       N/A        Y       20549
> Quota Daemon on 10.5.6.17                   N/A       N/A        Y       9337
>
> Task Status of Volume C-DATA
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
>
>
>
> .
> --
> --Atin
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
.