[Gluster-users] one brick one volume process dies?

lejeczek peljasz at yahoo.co.uk
Thu Sep 28 16:14:26 UTC 2017



On 28/09/17 17:05, lejeczek wrote:
>
>
> On 13/09/17 20:47, Ben Werthmann wrote:
>> These symptoms appear to be the same as I've recorded in 
>> this post:
>>
>> http://lists.gluster.org/pipermail/gluster-users/2017-September/032435.html 
>>
>>
>> On Wed, Sep 13, 2017 at 7:01 AM, Atin Mukherjee
>> <atin.mukherjee83 at gmail.com> wrote:
>>
>>     Additionally, the brick log file of the same brick
>>     would be required. Please check whether the brick
>>     process went down or crashed. Doing a volume start
>>     force should resolve the issue.
>>
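[Inline note, in case it helps others hitting this: the brick log Atin means
should be under /var/log/glusterfs/bricks/ on the node whose brick died, named
after the brick path with slashes turned into dashes - at least that is how it
looks on my boxes, so the exact file name below is guessed from my layout.
Roughly what I do, together with the force start:]

$ ls /var/log/glusterfs/bricks/
$ grep -iE 'crash|signal received|shutting down' \
    /var/log/glusterfs/bricks/__.aLocalStorages-0-0-GLUSTERs-0GLUSTER-C-DATA.log
$ gluster volume start C-DATA force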
>
> When I do a "vol start force" I see this, among other lines:
>
> [2017-09-28 16:00:55.120726] I [MSGID: 106568] [glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 308300
> [2017-09-28 16:00:55.128867] W [socket.c:593:__socket_rwv] 0-glustershd: readv on /var/run/gluster/0853a4555820d3442b1c3909f1cb8466.socket failed (No data available)
> [2017-09-28 16:00:56.122687] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: glustershd service is stopped
>
> Funnily (or not), a week later I now see:
>
> $ gluster vol status CYTO-DATA
> Status of volume: CYTO-DATA
> Gluster process                                                        TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA     49161     0          Y       1743719
> Brick 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA    49152     0          Y       20438
> Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA     49152     0          Y       5607
> Self-heal Daemon on localhost                                          N/A       N/A        Y       41106
> Quota Daemon on localhost                                              N/A       N/A        Y       41117
> Self-heal Daemon on 10.5.6.17                                          N/A       N/A        Y       19088
> Quota Daemon on 10.5.6.17                                              N/A       N/A        Y       19097
> Self-heal Daemon on 10.5.6.32                                          N/A       N/A        Y       1832978
> Quota Daemon on 10.5.6.32                                              N/A       N/A        Y       1832987
> Self-heal Daemon on 10.5.6.49                                          N/A       N/A        Y       320291
> Quota Daemon on 10.5.6.49                                              N/A       N/A        Y       320303
>
> Task Status of Volume CYTO-DATA
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> $ gluster vol heal CYTO-DATA info
> Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA
> Status: Transport endpoint is not connected
> Number of entries: -
>
> Brick 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA
> ...
> ...

And if I trace pid 1743719, yes, it's up & running, but that
port - 49161 - is not open.
I do not see any segfaults nor obvious crashes.
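In case it helps, this is roughly how I checked (pid and port taken from the
status output above; ss may be netstat on older systems):

$ ps -fp 1743719           # the brick process is still alive
$ ss -ltnp | grep 49161    # ...but nothing is listening on the advertised port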

>
>>     On Wed, 13 Sep 2017 at 16:28, Gaurav Yadav
>>     <gyadav at redhat.com> wrote:
>>
>>         Please send me the logs as well, i.e. glusterd.log
>>         and cmd_history.log.
>>
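[For anyone reading this later: on my nodes those files sit under
/var/log/glusterfs/ - the glusterd log is glusterd.log or
etc-glusterfs-glusterd.vol.log depending on version - so collecting them was
roughly:]

$ tar czf gluster-logs.tgz /var/log/glusterfs/*glusterd*.log \
    /var/log/glusterfs/cmd_history.log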
>>
>>         On Wed, Sep 13, 2017 at 1:45 PM, lejeczek
>>         <peljasz at yahoo.co.uk> wrote:
>>
>>
>>
>>             On 13/09/17 06:21, Gaurav Yadav wrote:
>>
>>                 Please provide the output of gluster
>>                 volume info, gluster volume status and
>>                 gluster peer status.
>>
>>                 Apart from the above info, please provide
>>                 glusterd logs and cmd_history.log.
>>
>>                 Thanks
>>                 Gaurav
>>
>>                 On Tue, Sep 12, 2017 at 2:22 PM, lejeczek
>>                 <peljasz at yahoo.co.uk> wrote:
>>
>>                     hi everyone
>>
>>                     I have a 3-peer cluster with all vols
>>                     in replica mode, 9 vols.
>>                     What I see, unfortunately, is that one
>>                     brick fails in one vol; when it happens
>>                     it's always the same vol on the same
>>                     brick.
>>                     Command: gluster vol status $vol -
>>                     would show the brick not online.
>>                     Restarting glusterd with systemctl does
>>                     not help; only a system reboot seems to
>>                     help, until it happens the next time.
>>
>>                     How to troubleshoot this weird
>>                     misbehaviour?
>>                     many thanks, L.
>>
>>                     .
>>
>>                     _______________________________________________
>>                     Gluster-users mailing list
>>                     Gluster-users at gluster.org
>>                     http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>             hi, here:
>>
>>             $ gluster vol info C-DATA
>>
>>             Volume Name: C-DATA
>>             Type: Replicate
>>             Volume ID: 18ffba73-532e-4a4d-84da-fceea52f8c2e
>>             Status: Started
>>             Snapshot Count: 0
>>             Number of Bricks: 1 x 3 = 3
>>             Transport-type: tcp
>>             Bricks:
>>             Brick1: 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA
>>             Brick2: 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA
>>             Brick3: 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA
>>             Options Reconfigured:
>>             performance.md-cache-timeout: 600
>>             performance.cache-invalidation: on
>>             performance.stat-prefetch: on
>>             features.cache-invalidation-timeout: 600
>>             features.cache-invalidation: on
>>             performance.io-thread-count: 64
>>             performance.cache-size: 128MB
>>             cluster.self-heal-daemon: enable
>>             features.quota-deem-statfs: on
>>             changelog.changelog: on
>>             geo-replication.ignore-pid-check: on
>>             geo-replication.indexing: on
>>             features.inode-quota: on
>>             features.quota: on
>>             performance.readdir-ahead: on
>>             nfs.disable: on
>>             transport.address-family: inet
>>             performance.cache-samba-metadata: on
>>
>>
>>             $ gluster vol status C-DATA
>>             Status of volume: C-DATA
>>             Gluster process                                                     TCP Port  RDMA Port  Online  Pid
>>             ------------------------------------------------------------------------------
>>             Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA    N/A       N/A        N       N/A
>>             Brick 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA   49152     0          Y       9376
>>             Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA    49152     0          Y       8638
>>             Self-heal Daemon on localhost                                       N/A       N/A       Y       387879
>>             Quota Daemon on localhost                                           N/A       N/A       Y       387891
>>             Self-heal Daemon on rider.private.ccnr.ceb.private.cam.ac.uk        N/A       N/A       Y       16439
>>             Quota Daemon on rider.private.ccnr.ceb.private.cam.ac.uk            N/A       N/A       Y       16451
>>             Self-heal Daemon on 10.5.6.32                                        N/A       N/A       Y       7708
>>             Quota Daemon on 10.5.6.32                                            N/A       N/A       Y       8623
>>             Self-heal Daemon on 10.5.6.17                                        N/A       N/A       Y       20549
>>             Quota Daemon on 10.5.6.17                                            N/A       N/A       Y       9337
>>
>>             Task Status of Volume C-DATA
>> ------------------------------------------------------------------------------
>>             There are no active volume tasks
>>
>>
>>
>>
>>
>>             .
>>             _______________________________________________
>>             Gluster-users mailing list
>>             Gluster-users at gluster.org
>>             http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>         _______________________________________________
>>         Gluster-users mailing list
>>         Gluster-users at gluster.org
>>         http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>     --
>>     --Atin
>>
>>     _______________________________________________
>>     Gluster-users mailing list
>>     Gluster-users at gluster.org
>>     http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>


.

