You'd basically have to copy the content of /var/lib/glusterd from fs001 to fs003 without overwriting fs003's node-specific details. Please make sure you don't touch the glusterd.info file or the contents of /var/lib/glusterd/peers on fs003; the rest can be copied. After that I expect glusterd will come up. (A rough command sketch for this copy is at the end of this mail.)

On Fri, 26 May 2017 at 20:30, Jarsulic, Michael [CRI] <mjarsulic@bsd.uchicago.edu> wrote:

Here is some further information on this issue:

The version of gluster we are using is 3.7.6.

Also, the error I found in the cmd history is:
[2017-05-26 04:28:28.332700]  : volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch commit : FAILED : Commit failed on cri16fs003-ib. Please check log file for details.

I did not notice this at the time and attempted to remove the next brick in order to migrate its data off the system. This left the servers in the following state:
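(The failed command above came from glusterd's command history; if you need to pull it again, something like the following should work, assuming the default log location:)

    grep 'remove-brick' /var/log/glusterfs/cmd_history.log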

fs001 - /var/lib/glusterd/vols/hpcscratch/info

type=0
count=3
status=1
sub_count=0
stripe_count=1
replica_count=1
disperse_count=0
redundancy_count=0
version=42
transport-type=0
volume-id=80b8eeed-1e72-45b9-8402-e01ae0130105
…
op-version=30700
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
server.event-threads=8
performance.client-io-threads=on
client.event-threads=8
performance.cache-size=32MB
performance.readdir-ahead=on
brick-0=cri16fs001-ib:-data-brick2-scratch
brick-1=cri16fs003-ib:-data-brick5-scratch
brick-2=cri16fs003-ib:-data-brick6-scratch


fs003 - cat /var/lib/glusterd/vols/hpcscratch/info

type=0
count=4
status=1
sub_count=0
stripe_count=1
replica_count=1
disperse_count=0
redundancy_count=0
version=35
transport-type=0
volume-id=80b8eeed-1e72-45b9-8402-e01ae0130105
…
op-version=30700
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
performance.readdir-ahead=on
performance.cache-size=32MB
client.event-threads=8
performance.client-io-threads=on
server.event-threads=8
brick-0=cri16fs001-ib:-data-brick1-scratch
brick-1=cri16fs001-ib:-data-brick2-scratch
brick-2=cri16fs003-ib:-data-brick5-scratch
brick-3=cri16fs003-ib:-data-brick6-scratch
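(Side by side, the two copies disagree on count, version, and the brick list. A quick way to diff them, assuming both hosts are reachable over ssh under the brick hostnames used above:)

    diff <(ssh cri16fs001-ib cat /var/lib/glusterd/vols/hpcscratch/info) \
         <(ssh cri16fs003-ib cat /var/lib/glusterd/vols/hpcscratch/info)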


fs001 - /var/lib/glusterd/vols/hpcscratch/node_state.info

rebalance_status=5
status=4
rebalance_op=0
rebalance-id=00000000-0000-0000-0000-000000000000
brick1=cri16fs001-ib:/data/brick2/scratch
count=1


fs003 - /var/lib/glusterd/vols/hpcscratch/node_state.info

rebalance_status=1
status=0
rebalance_op=9
rebalance-id=0184577f-eb64-4af9-924d-91ead0605a1e
brick1=cri16fs001-ib:/data/brick1/scratch
count=1


--
Mike Jarsulic


On 5/26/17, 8:22 AM, "gluster-users-bounces@gluster.org on behalf of Jarsulic, Michael [CRI]" <gluster-users-bounces@gluster.org on behalf of mjarsulic@bsd.uchicago.edu> wrote:

    Recently, I had some problems with the OS hard drives in my glusterd servers and took one of my systems down for maintenance. The first step was to remove one of the bricks (brick1) hosted on that server (fs001). The data migration completed successfully last night. After that, I went to commit the changes and the commit failed. Since then, glusterd will not start on one of my servers (fs003). Whenever glusterd starts on fs003, I see the following errors in its log:
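    (For context, the decommission sequence for the brick would roughly have been the standard start/status/commit cycle; the volume and brick names below are the ones from the failed commit shown earlier in this thread.)

        gluster volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch start
        gluster volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch status
        gluster volume remove-brick hpcscratch cri16fs001-ib:/data/brick1/scratch commit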

    [2017-05-26 04:37:21.358932] I [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)
    [2017-05-26 04:37:21.382630] I [MSGID: 106478] [glusterd.c:1350:init] 0-management: Maximum allowed open file descriptors set to 65536
    [2017-05-26 04:37:21.382712] I [MSGID: 106479] [glusterd.c:1399:init] 0-management: Using /var/lib/glusterd as working directory
    [2017-05-26 04:37:21.422858] I [MSGID: 106228] [glusterd.c:433:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
    [2017-05-26 04:37:21.450123] I [MSGID: 106513] [glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30706
    [2017-05-26 04:37:21.463812] E [MSGID: 101032] [store.c:434:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/vols/hpcscratch/bricks/cri16fs001-ib:-data-brick1-scratch. [No such file or directory]
    [2017-05-26 04:37:21.463866] E [MSGID: 106201] [glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: hpcscratch
    [2017-05-26 04:37:21.463919] E [MSGID: 101019] [xlator.c:428:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
    [2017-05-26 04:37:21.463943] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
    [2017-05-26 04:37:21.463970] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
    [2017-05-26 04:37:21.466703] W [glusterfsd.c:1236:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xda) [0x405cba] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x116) [0x405b96] -->/usr/sbin/glusterd(cleanup_and_exit+0x65) [0x4059d5] ) 0-: received signum (0), shutting down
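    (The store error above points at a per-brick file that glusterd expects under the volume's bricks/ directory; it can be checked directly on fs003, for example:)

        # list the brick store files glusterd will try to load for hpcscratch
        ls -l /var/lib/glusterd/vols/hpcscratch/bricks/
        # fs003's info file still lists brick-0=cri16fs001-ib:-data-brick1-scratch,
        # but the matching file here appears to be missing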

    The volume is distribute-only. It looks to me like glusterd still expects brick1 on fs001 to be part of the volume. Is there any way to recover from this? Is there any more information I can provide?


    --
    Mike Jarsulic

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

--
- Atin (atinm)
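A rough sketch of the copy described at the top of this mail, assuming fs001's /var/lib/glusterd is reachable over ssh as "fs001" and that glusterd is stopped on fs003 first (hostnames, backup path, and init commands are illustrative; adjust to your environment):

    # on fs003: stop glusterd and keep a backup of the whole working directory first
    service glusterd stop                     # or: systemctl stop glusterd
    cp -a /var/lib/glusterd /root/glusterd.backup

    # pull the volume configuration from fs001, leaving fs003's own
    # glusterd.info and peers/ untouched (both excluded, and no --delete is used)
    rsync -av --exclude=glusterd.info --exclude=peers \
        fs001:/var/lib/glusterd/ /var/lib/glusterd/

    service glusterd start                    # or: systemctl start glusterd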