[Gluster-users] Failed Volume

Atin Mukherjee amukherj at redhat.com
Fri May 26 15:39:13 UTC 2017


You'd basically have to copy the content of /var/lib/glusterd from fs001 to
fs003 without overwriting fs003's node-specific details. Please make sure you
don't touch the glusterd.info file or the contents of /var/lib/glusterd/peers
on fs003; the rest can be copied. After that I expect glusterd will come up.
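
A minimal sketch of one way to do that (hostnames shortened as in this
thread; verify the paths, stop glusterd on fs003, and back up its existing
/var/lib/glusterd before copying anything):

  # on fs003, with glusterd stopped; keep a backup of the current store
  cp -a /var/lib/glusterd /var/lib/glusterd.bak

  # pull the volume configuration from fs001, leaving fs003's
  # node-specific glusterd.info and peers/ untouched
  rsync -av --exclude=glusterd.info --exclude=peers/ \
      fs001:/var/lib/glusterd/ /var/lib/glusterd/

  # then start glusterd again (service glusterd start on older init systems)
  systemctl start glusterd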

On Fri, 26 May 2017 at 20:30, Jarsulic, Michael [CRI] <
mjarsulic at bsd.uchicago.edu> wrote:

> Here is some further information on this issue:
>
> The version of gluster we are using is 3.7.6.
>
> Also, the error I found in the cmd history is:
> [2017-05-26 04:28:28.332700]  : volume remove-brick hpcscratch
> cri16fs001-ib:/data/brick1/scratch commit : FAILED : Commit failed on
> cri16fs003-ib. Please check log file for details.
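>
> For context, that commit was the last step of the usual remove-brick
> sequence, which went roughly like this (reconstructed from memory, so treat
> it as approximate):
>
>     gluster volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch start
>     gluster volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch status
>     gluster volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch commit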
>
> I did not notice this at the time and attempted to remove the next brick to
> migrate its data off the system. This left the servers in the following
> state (a sketch for diffing the two nodes' copies of the volume store
> follows the dumps).
>
> fs001 - /var/lib/glusterd/vols/hpcscratch/info
>
> type=0
> count=3
> status=1
> sub_count=0
> stripe_count=1
> replica_count=1
> disperse_count=0
> redundancy_count=0
> version=42
> transport-type=0
> volume-id=80b8eeed-1e72-45b9-8402-e01ae0130105
> op-version=30700
> client-op-version=3
> quota-version=0
> parent_volname=N/A
> restored_from_snap=00000000-0000-0000-0000-000000000000
> snap-max-hard-limit=256
> server.event-threads=8
> performance.client-io-threads=on
> client.event-threads=8
> performance.cache-size=32MB
> performance.readdir-ahead=on
> brick-0=cri16fs001-ib:-data-brick2-scratch
> brick-1=cri16fs003-ib:-data-brick5-scratch
> brick-2=cri16fs003-ib:-data-brick6-scratch
>
>
> fs003 - cat /var/lib/glusterd/vols/hpcscratch/info
>
> type=0
> count=4
> status=1
> sub_count=0
> stripe_count=1
> replica_count=1
> disperse_count=0
> redundancy_count=0
> version=35
> transport-type=0
> volume-id=80b8eeed-1e72-45b9-8402-e01ae0130105
> op-version=30700
> client-op-version=3
> quota-version=0
> parent_volname=N/A
> restored_from_snap=00000000-0000-0000-0000-000000000000
> snap-max-hard-limit=256
> performance.readdir-ahead=on
> performance.cache-size=32MB
> client.event-threads=8
> performance.client-io-threads=on
> server.event-threads=8
> brick-0=cri16fs001-ib:-data-brick1-scratch
> brick-1=cri16fs001-ib:-data-brick2-scratch
> brick-2=cri16fs003-ib:-data-brick5-scratch
> brick-3=cri16fs003-ib:-data-brick6-scratch
>
>
> fs001 - /var/lib/glusterd/vols/hpcscratch/node_state.info
>
> rebalance_status=5
> status=4
> rebalance_op=0
> rebalance-id=00000000-0000-0000-0000-000000000000
> brick1=cri16fs001-ib:/data/brick2/scratch
> count=1
>
>
> fs003 - /var/lib/glusterd/vols/hpcscratch/node_state.info
>
> rebalance_status=1
> status=0
> rebalance_op=9
> rebalance-id=0184577f-eb64-4af9-924d-91ead0605a1e
> brick1=cri16fs001-ib:/data/brick1/scratch
> count=1
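>
> A quick way to see the drift between the two nodes' copies of the volume
> store (just a sketch; assumes root ssh from fs003 to fs001 works):
>
>     diff <(ssh fs001 cat /var/lib/glusterd/vols/hpcscratch/info) \
>          /var/lib/glusterd/vols/hpcscratch/info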
>
>
> --
> Mike Jarsulic
>
>
> On 5/26/17, 8:22 AM, "gluster-users-bounces at gluster.org on behalf of
> Jarsulic, Michael [CRI]" <gluster-users-bounces at gluster.org on behalf of
> mjarsulic at bsd.uchicago.edu> wrote:
>
>     Recently, I had some problems with the OS hard drives in my glusterd
> servers and took one of my systems down for maintenance. The first step was
> to remove one of the bricks (brick1) hosted on the server (fs001). The data
> migration completed successfully last night. After that, I went to commit
> the changes and the commit failed. Since then, glusterd will not start on
> one of my servers (fs003). When I check the glusterd logs on fs003, I see
> the following errors whenever glusterd starts:
>
>     [2017-05-26 04:37:21.358932] I [MSGID: 100030]
> [glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running
> /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd
> --pid-file=/var/run/glusterd.pid)
>     [2017-05-26 04:37:21.382630] I [MSGID: 106478] [glusterd.c:1350:init]
> 0-management: Maximum allowed open file descriptors set to 65536
>     [2017-05-26 04:37:21.382712] I [MSGID: 106479] [glusterd.c:1399:init]
> 0-management: Using /var/lib/glusterd as working directory
>     [2017-05-26 04:37:21.422858] I [MSGID: 106228]
> [glusterd.c:433:glusterd_check_gsync_present] 0-glusterd: geo-replication
> module not installed in the system [No such file or directory]
>     [2017-05-26 04:37:21.450123] I [MSGID: 106513]
> [glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd: retrieved
> op-version: 30706
>     [2017-05-26 04:37:21.463812] E [MSGID: 101032]
> [store.c:434:gf_store_handle_retrieve] 0-: Path corresponding to
> /var/lib/glusterd/vols/hpcscratch/bricks/cri16fs001-ib:-data-brick1-scratch.
> [No such file or directory]
>     [2017-05-26 04:37:21.463866] E [MSGID: 106201]
> [glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management:
> Unable to restore volume: hpcscratch
>     [2017-05-26 04:37:21.463919] E [MSGID: 101019]
> [xlator.c:428:xlator_init] 0-management: Initialization of volume
> 'management' failed, review your volfile again
>     [2017-05-26 04:37:21.463943] E [graph.c:322:glusterfs_graph_init]
> 0-management: initializing translator failed
>     [2017-05-26 04:37:21.463970] E [graph.c:661:glusterfs_graph_activate]
> 0-graph: init failed
>     [2017-05-26 04:37:21.466703] W [glusterfsd.c:1236:cleanup_and_exit]
> (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xda) [0x405cba]
> -->/usr/sbin/glusterd(glusterfs_process_volfp+0x116) [0x405b96]
> -->/usr/sbin/glusterd(cleanup_and_exit+0x65) [0x4059d5] ) 0-: received
> signum (0), shutting down
>
>     The volume is distribute-only. It looks to me like glusterd still
> expects brick1 on fs001 to be part of the volume. Is there any way to
> recover from this? Is there any more information I can provide?
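>
>     If it helps, the per-brick store files on fs003 can be listed to see
> which bricks glusterd there still expects (diagnostic sketch only; the path
> comes from the log above):
>
>         ls -l /var/lib/glusterd/vols/hpcscratch/bricks/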
>
>
>     --
>     Mike Jarsulic
>
>     _______________________________________________
>     Gluster-users mailing list
>     Gluster-users at gluster.org
>
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

-- 
- Atin (atinm)