[Gluster-users] Failed Volume

Jarsulic, Michael [CRI] mjarsulic at bsd.uchicago.edu
Fri May 26 14:59:03 UTC 2017


Here is some further information on this issue:

The version of gluster we are using is 3.7.6.

Also, the error I found in the cmd history is:
[2017-05-26 04:28:28.332700]  : volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch commit : FAILED : Commit failed on cri16fs003-ib. Please check log file for details.
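
For reference, that commit would have been the last step of the standard remove-brick sequence; only the commit line above comes from the cmd history, and the start/status steps below are a sketch of what normally precedes it:

gluster volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch start
gluster volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch status
gluster volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch commit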

I did not notice this at the time and attempted to remove the next brick in order to migrate its data off the system. This left the servers in the following state.

fs001 - /var/lib/glusterd/vols/hpcscratch/info

type=0
count=3
status=1
sub_count=0
stripe_count=1
replica_count=1
disperse_count=0
redundancy_count=0
version=42
transport-type=0
volume-id=80b8eeed-1e72-45b9-8402-e01ae0130105
…
op-version=30700
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
server.event-threads=8
performance.client-io-threads=on
client.event-threads=8
performance.cache-size=32MB
performance.readdir-ahead=on
brick-0=cri16fs001-ib:-data-brick2-scratch
brick-1=cri16fs003-ib:-data-brick5-scratch
brick-2=cri16fs003-ib:-data-brick6-scratch


fs003 - /var/lib/glusterd/vols/hpcscratch/info

type=0
count=4
status=1
sub_count=0
stripe_count=1
replica_count=1
disperse_count=0
redundancy_count=0
version=35
transport-type=0
volume-id=80b8eeed-1e72-45b9-8402-e01ae0130105
…
op-version=30700
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
performance.readdir-ahead=on
performance.cache-size=32MB
client.event-threads=8
performance.client-io-threads=on
server.event-threads=8
brick-0=cri16fs001-ib:-data-brick1-scratch
brick-1=cri16fs001-ib:-data-brick2-scratch
brick-2=cri16fs003-ib:-data-brick5-scratch
brick-3=cri16fs003-ib:-data-brick6-scratch
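
On fs001 the info file is at version=42 with three bricks, while fs003 is still at version=35 with the original four, including the brick whose removal was committed (cri16fs001-ib:/data/brick1/scratch). A direct comparison across the peers makes the divergence easy to see (a sketch; it assumes root ssh between the nodes and uses the InfiniBand hostnames from the brick entries):

diff <(ssh cri16fs001-ib cat /var/lib/glusterd/vols/hpcscratch/info) \
     <(ssh cri16fs003-ib cat /var/lib/glusterd/vols/hpcscratch/info)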


fs001 - /var/lib/glusterd/vols/hpcscratch/node_state.info

rebalance_status=5
status=4
rebalance_op=0
rebalance-id=00000000-0000-0000-0000-000000000000
brick1=cri16fs001-ib:/data/brick2/scratch
count=1


fs003 - /var/lib/glusterd/vols/hpcscratch/node_state.info

rebalance_status=1
status=0
rebalance_op=9
rebalance-id=0184577f-eb64-4af9-924d-91ead0605a1e
brick1=cri16fs001-ib:/data/brick1/scratch
count=1
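
The node_state.info files have diverged in the same way: fs001's entry appears to record the second removal attempt (/data/brick2/scratch), while fs003 still carries the rebalance-id of the original brick1 removal. They can be pulled side by side with a quick loop (again only a sketch that assumes ssh access between the peers):

for h in cri16fs001-ib cri16fs003-ib; do
    echo "== $h =="
    ssh "$h" cat /var/lib/glusterd/vols/hpcscratch/node_state.info
done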


-- 
Mike Jarsulic


On 5/26/17, 8:22 AM, "gluster-users-bounces at gluster.org on behalf of Jarsulic, Michael [CRI]" <gluster-users-bounces at gluster.org on behalf of mjarsulic at bsd.uchicago.edu> wrote:

    Recently, I had some problems with the OS hard drives in my glusterd servers and took one of my systems down for maintenance. The first step was to remove one of the bricks (brick1) hosted on that server (fs001). The data migration completed successfully last night. After that, I went to commit the change and the commit failed. Since then, glusterd will not start on one of my servers (fs003). Whenever glusterd starts on fs003, its log shows the following errors:
    
    [2017-05-26 04:37:21.358932] I [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)
    [2017-05-26 04:37:21.382630] I [MSGID: 106478] [glusterd.c:1350:init] 0-management: Maximum allowed open file descriptors set to 65536
    [2017-05-26 04:37:21.382712] I [MSGID: 106479] [glusterd.c:1399:init] 0-management: Using /var/lib/glusterd as working directory
    [2017-05-26 04:37:21.422858] I [MSGID: 106228] [glusterd.c:433:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
    [2017-05-26 04:37:21.450123] I [MSGID: 106513] [glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30706
    [2017-05-26 04:37:21.463812] E [MSGID: 101032] [store.c:434:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/hpcscratch/bricks/cri16fs001-ib:-data-brick1-scratch. [No such file or directory]
    [2017-05-26 04:37:21.463866] E [MSGID: 106201] [glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: hpcscratch
    [2017-05-26 04:37:21.463919] E [MSGID: 101019] [xlator.c:428:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
    [2017-05-26 04:37:21.463943] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
    [2017-05-26 04:37:21.463970] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
    [2017-05-26 04:37:21.466703] W [glusterfsd.c:1236:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xda) [0x405cba] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x116) [0x405b96] -->/usr/sbin/glusterd(cleanup_and_exit+0x65) [0x4059d5] ) 0-: received signum (0), shutting down
    
    The volume is distribute-only. To me, it looks like fs003 is still expecting brick1 on fs001 to be part of the volume. Is there any way to recover from this? Is there any more information that I can provide?
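
    One quick check that may be relevant: the per-brick store file the init path is failing on can be listed directly on each server (a sketch, using the path from the error above):

    ls -l /var/lib/glusterd/vols/hpcscratch/bricks/
    ls -l '/var/lib/glusterd/vols/hpcscratch/bricks/cri16fs001-ib:-data-brick1-scratch'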
    
    
    --
    Mike Jarsulic
    
    _______________________________________________
    Gluster-users mailing list
    Gluster-users at gluster.org
    http://lists.gluster.org/mailman/listinfo/gluster-users
    


