[Gluster-users] one brick one volume process dies?
lejeczek
peljasz at yahoo.co.uk
Thu Sep 28 16:14:26 UTC 2017
On 28/09/17 17:05, lejeczek wrote:
>
>
> On 13/09/17 20:47, Ben Werthmann wrote:
>> These symptoms appear to be the same as I've recorded in
>> this post:
>>
>> http://lists.gluster.org/pipermail/gluster-users/2017-September/032435.html
>>
>>
>> On Wed, Sep 13, 2017 at 7:01 AM, Atin Mukherjee
>> <atin.mukherjee83 at gmail.com> wrote:
>>
>> Additionally, the brick log file of that same brick
>> would be required. Please check whether the brick
>> process went down or crashed. Doing a volume start
>> force should resolve the issue.
>>
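>> For example, a minimal sketch of both steps (the
>> brick log file name is an assumption based on the
>> usual /var/log/glusterfs/bricks/<brick-path>.log
>> naming, with CYTO-DATA as the volume):
>>
>> # look for a crash or shutdown message in the brick log
>> grep -iE 'crash|signal received|shutting down' \
>>   /var/log/glusterfs/bricks/__.aLocalStorages-0-0-GLUSTERs-0GLUSTER-CYTO-DATA.log
>> # then try to bring the brick back without a reboot
>> gluster volume start CYTO-DATA force
>>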
>
> When I do a vol start force, I see this among the log lines:
>
> [2017-09-28 16:00:55.120726] I [MSGID: 106568]
> [glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management:
> Stopping glustershd daemon running in pid: 308300
> [2017-09-28 16:00:55.128867] W [socket.c:593:__socket_rwv]
> 0-glustershd: readv on
> /var/run/gluster/0853a4555820d3442b1c3909f1cb8466.socket
> failed (No data available)
> [2017-09-28 16:00:56.122687] I [MSGID: 106568]
> [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management:
> glustershd service is stopped
>
> Funnily (or not), a week later I now see:
>
> gluster vol status CYTO-DATA
> Status of volume: CYTO-DATA
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUS
> TERs/0GLUSTER-CYTO-DATA                     49161     0          Y       1743719
> Brick 10.5.6.100:/__.aLocalStorages/0/0-GLU
> STERs/0GLUSTER-CYTO-DATA                    49152     0          Y       20438
> Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUS
> TERs/0GLUSTER-CYTO-DATA                     49152     0          Y       5607
> Self-heal Daemon on localhost               N/A       N/A        Y       41106
> Quota Daemon on localhost                   N/A       N/A        Y       41117
> Self-heal Daemon on 10.5.6.17               N/A       N/A        Y       19088
> Quota Daemon on 10.5.6.17                   N/A       N/A        Y       19097
> Self-heal Daemon on 10.5.6.32               N/A       N/A        Y       1832978
> Quota Daemon on 10.5.6.32                   N/A       N/A        Y       1832987
> Self-heal Daemon on 10.5.6.49               N/A       N/A        Y       320291
> Quota Daemon on 10.5.6.49                   N/A       N/A        Y       320303
>
> Task Status of Volume CYTO-DATA
> ------------------------------------------------------------------------------
>
> There are no active volume tasks
>
>
> $ gluster vol heal CYTO-DATA info
> Brick
> 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA
> Status: Transport endpoint is not connected
> Number of entries: -
>
> Brick
> 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-CYTO-DATA
> ...
> ...
>
And if I trace pid 1743719: yes, it is up and running, but
that port, 49161, is not open.
I do not see any segfaults nor any obvious crashes.
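
A quick way to cross-check that (pid and port taken from
the status output above; ss assumed available):

$ ps -fp 1743719            # is the brick process really alive?
$ ss -tlnp | grep 49161     # is anything listening on the advertised port?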
>
>> On Wed, 13 Sep 2017 at 16:28, Gaurav Yadav
>> <gyadav at redhat.com> wrote:
>>
>> Please send me the logs as well, i.e. glusterd.log
>> and cmd_history.log.
>>
>>
>> On Wed, Sep 13, 2017 at 1:45 PM, lejeczek
>> <peljasz at yahoo.co.uk> wrote:
>>
>>
>>
>> On 13/09/17 06:21, Gaurav Yadav wrote:
>>
>> Please provide the output of gluster
>> volume info, gluster volume status and
>> gluster peer status.
>>
>> Apart from the above info, please provide
>> the glusterd logs and cmd_history.log.
>>
>> Thanks
>> Gaurav
>>
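>> For reference, a minimal sketch of gathering
>> those, assuming default log locations under
>> /var/log/glusterfs/:
>>
>> gluster volume info
>> gluster volume status
>> gluster peer status
>> # glusterd log and command history
>> less /var/log/glusterfs/glusterd.log
>> less /var/log/glusterfs/cmd_history.log
>>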
>> On Tue, Sep 12, 2017 at 2:22 PM, lejeczek
>> <peljasz at yahoo.co.uk> wrote:
>>
>> hi everyone
>>
>> I have a 3-peer cluster with all vols in
>> replica mode, 9 vols.
>> What I see, unfortunately, is one brick
>> failing in one vol, and when it happens
>> it is always the same vol on the same
>> brick.
>> The command gluster vol status $vol
>> shows that brick as not online.
>> Restarting glusterd with systemctl does
>> not help; only a system reboot seems to
>> help, until it happens again the next
>> time.
>>
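>> Roughly, the checks when it happens look
>> like this (volume name is just an example):
>>
>> gluster vol status myvol
>> # is the brick daemon (glusterfsd) for that volume still running?
>> pgrep -af 'glusterfsd.*myvol'
>> # note: this restarts only the management daemon, not the bricks
>> systemctl restart glusterd
>>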
>> How to troubleshoot this weird
>> misbehaviour?
>> many thanks, L.
>>
>> .
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>> hi, here:
>>
>> $ gluster vol info C-DATA
>>
>> Volume Name: C-DATA
>> Type: Replicate
>> Volume ID: 18ffba73-532e-4a4d-84da-fceea52f8c2e
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1:
>> 10.5.6.49:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA
>> Brick2:
>> 10.5.6.100:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA
>> Brick3:
>> 10.5.6.32:/__.aLocalStorages/0/0-GLUSTERs/0GLUSTER-C-DATA
>> Options Reconfigured:
>> performance.md-cache-timeout: 600
>> performance.cache-invalidation: on
>> performance.stat-prefetch: on
>> features.cache-invalidation-timeout: 600
>> features.cache-invalidation: on
>> performance.io-thread-count: 64
>> performance.cache-size: 128MB
>> cluster.self-heal-daemon: enable
>> features.quota-deem-statfs: on
>> changelog.changelog: on
>> geo-replication.ignore-pid-check: on
>> geo-replication.indexing: on
>> features.inode-quota: on
>> features.quota: on
>> performance.readdir-ahead: on
>> nfs.disable: on
>> transport.address-family: inet
>> performance.cache-samba-metadata: on
>>
>>
>> $ gluster vol status C-DATA
>> Status of volume: C-DATA
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick 10.5.6.49:/__.aLocalStorages/0/0-GLUS
>> TERs/0GLUSTER-C-DATA                        N/A       N/A        N       N/A
>> Brick 10.5.6.100:/__.aLocalStorages/0/0-GLU
>> STERs/0GLUSTER-C-DATA                       49152     0          Y       9376
>> Brick 10.5.6.32:/__.aLocalStorages/0/0-GLUS
>> TERs/0GLUSTER-C-DATA                        49152     0          Y       8638
>> Self-heal Daemon on localhost               N/A       N/A        Y       387879
>> Quota Daemon on localhost                   N/A       N/A        Y       387891
>> Self-heal Daemon on rider.private.ccnr.ceb.
>> private.cam.ac.uk                           N/A       N/A        Y       16439
>> Quota Daemon on rider.private.ccnr.ceb.priv
>> ate.cam.ac.uk                               N/A       N/A        Y       16451
>> Self-heal Daemon on 10.5.6.32               N/A       N/A        Y       7708
>> Quota Daemon on 10.5.6.32                   N/A       N/A        Y       8623
>> Self-heal Daemon on 10.5.6.17               N/A       N/A        Y       20549
>> Quota Daemon on 10.5.6.17                   N/A       N/A        Y       9337
>>
>> Task Status of Volume C-DATA
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>>
>>
>>
>>
>> .
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> --Atin
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>
.
More information about the Gluster-users mailing list