[Gluster-users] Error after crash of Virtual Machine during migration
Joe Julian
joe at julianfamily.org
Tue Jan 21 15:42:57 UTC 2014
On 12/10/2013 02:59 AM, Mariusz Sobisiak wrote:
> Greetings,
>
> Legend:
> storage-gfs-3-prd - the first gluster.
What's a "gluster"?
> storage-1-saas - new gluster where "the first gluster" had to be
> migrated.
> storage-gfs-4-prd - the second gluster (which had to be migrated later).
What do you mean "migrated"?
> I've started command replace-brick:
> 'gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared
> storage-1-saas:/ydp/shared start'
>
> During that, the Virtual Machine (Xen) crashed. Now I can't abort the
> migration or resume it.
I don't know what state that leaves your files in. I think the original
brick, "storage-gfs-3-prd:/ydp/shared", should still have all the data.
The rest of the problem has to do with settings in
/var/lib/glusterd/vols/sa_bookshelf/info. Make a backup of that file and
edit it, removing anything to do with replace-brick or rebalance. Feel
free to put the info file on fpaste.org and ping me on IRC if you need
help with that. Stop the volume and glusterd, copy that same edited info
file to the same path on both servers, and start glusterd again. That
should clear the replace-brick state so you can try again with 3.4.2.
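The steps above can be sketched roughly like this. This is a minimal sketch, not a tested procedure: the vols/ path, the sed patterns, and the exact keys that glusterd writes all vary by release, so inspect your own info file before deleting anything.

```shell
#!/bin/sh
# Sketch of the replace-brick state cleanup described above.
# VOL and INFO are assumptions for this volume; verify the path on your
# install (3.3.x/3.4.x keep volume state under /var/lib/glusterd/vols/).
VOL=${VOL:-sa_bookshelf}
INFO=${INFO:-/var/lib/glusterd/vols/$VOL/info}

strip_replace_brick_state() {
    # Keep a backup alongside the original, then delete any lines that
    # look like replace-brick or rebalance bookkeeping. These patterns
    # are a guess; check what your info file actually contains first.
    cp "$1" "$1.bak"
    sed -i -e '/replace/d' -e '/rb_/d' -e '/rebalance/d' "$1"
}

# Usage (with the volume and glusterd stopped on every server):
#   gluster volume stop $VOL
#   service glusterd stop
#   strip_replace_brick_state "$INFO"
#   scp "$INFO" storage-gfs-4-prd:"$INFO"
#   service glusterd start
#   gluster volume start $VOL
```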
> When I try:
> '# gluster volume replace-brick sa_bookshelf
> storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort'
> The command lasts about 5 minutes, then finishes with no results. Apart
> from that, after that command Gluster starts to behave very strangely.
> For example I can't do '# gluster volume heal sa_bookshelf info' because
> it lasts about 5 minutes and returns a blank screen (the same as abort).
>
> Then I restart the Gluster server and Gluster returns to normal work,
> except for the replace-brick commands. When I do:
> '# gluster volume replace-brick sa_bookshelf
> storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared status'
> I get:
> Number of files migrated = 0 Current file=
> I can do 'volume heal info' commands etc. until I call the command:
> '# gluster volume replace-brick sa_bookshelf
> storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort'.
>
>
>
> # gluster --version
> glusterfs 3.3.1 built on Oct 22 2012 07:54:24
> Repository revision: git://git.gluster.com/glusterfs.git
> Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com> GlusterFS
> comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU
> General Public License.
>
> Brick (/ydp/shared) logs (repeats the same constantly):
> [2013-12-06 11:29:44.790299] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
> [2013-12-06 11:29:44.790402] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
> [2013-12-06 11:29:44.790465] E [name.c:141:client_fill_address_family] 0-sa_bookshelf-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
> [2013-12-06 11:29:47.791037] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
> [2013-12-06 11:29:47.791141] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
> [2013-12-06 11:29:47.791174] E [name.c:141:client_fill_address_family] 0-sa_bookshelf-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
> [2013-12-06 11:29:50.791775] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
> [2013-12-06 11:29:50.791986] W [dict.c:995:data_to_str] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
> [2013-12-06 11:29:50.792046] E [name.c:141:client_fill_address_family] 0-sa_bookshelf-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
>
>
> # gluster volume info
>
> Volume Name: sa_bookshelf
> Type: Distributed-Replicate
> Volume ID: 74512f52-72ec-4538-9a54-4e50c4691722
> Status: Started
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: storage-gfs-3-prd:/ydp/shared
> Brick2: storage-gfs-4-prd:/ydp/shared
> Brick3: storage-gfs-3-prd:/ydp/shared2
> Brick4: storage-gfs-4-prd:/ydp/shared2
>
>
> # gluster volume status
> Status of volume: sa_bookshelf
> Gluster process                                Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick storage-gfs-3-prd:/ydp/shared            24009   Y       758
> Brick storage-gfs-4-prd:/ydp/shared            24009   Y       730
> Brick storage-gfs-3-prd:/ydp/shared2           24010   Y       764
> Brick storage-gfs-4-prd:/ydp/shared2           24010   Y       4578
> NFS Server on localhost                        38467   Y       770
> Self-heal Daemon on localhost                  N/A     Y       776
> NFS Server on storage-1-saas                   38467   Y       840
> Self-heal Daemon on storage-1-saas             N/A     Y       846
> NFS Server on storage-gfs-4-prd                38467   Y       4584
> Self-heal Daemon on storage-gfs-4-prd          N/A     Y       4590
>
> storage-gfs-3-prd:~# gluster peer status
> Number of Peers: 2
>
> Hostname: storage-1-saas
> Uuid: 37b9d881-ce24-4550-b9de-6b304d7e9d07
> State: Peer in Cluster (Connected)
>
> Hostname: storage-gfs-4-prd
> Uuid: 4c384f45-873b-4c12-9683-903059132c56
> State: Peer in Cluster (Connected)
>
>
> (from storage-1-saas)# gluster peer status
> Number of Peers: 2
>
> Hostname: 172.16.3.60
> Uuid: 1441a7b0-09d2-4a40-a3ac-0d0e546f6884
> State: Peer in Cluster (Connected)
>
> Hostname: storage-gfs-4-prd
> Uuid: 4c384f45-873b-4c12-9683-903059132c56
> State: Peer in Cluster (Connected)
>
>
>
> Clients work properly.
> I googled this and found it was a bug, but in version 3.3.0. How can I
> repair that and continue my migration? Thank you for any help.
>
> BTW: I moved the Gluster server by following the "Gluster 3.4: Brick
> Restoration - Replace Crashed Server" how-to.
>
> Regards,
> Mariusz
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users