[Gluster-users] Error after crash of Virtual Machine during migration

Mariusz Sobisiak MSobisiak at ydp.pl
Wed Jan 22 12:18:50 UTC 2014


>> Legend:
>> storage-gfs-3-prd - the first gluster.
> What's a "gluster"?

Thank you for the answer. It's the GlusterFS server - the machine where
GlusterFS is installed.

>> storage-1-saas - new gluster where "the first gluster" had to be 
>> migrated.
>> storage-gfs-4-prd - the second gluster (which had to be migrated
>> later).
> What do you mean "migrated"?

By "migrated" I mean moving all the data to another GlusterFS server with
the replace-brick command.
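
In other words, the standard replace-brick sequence, roughly like this (just a
sketch of the intended flow as I understand it from the 3.3 documentation; the
commit step is the one I never got to):

# gluster volume replace-brick sa_bookshelf \
    storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared start
# gluster volume replace-brick sa_bookshelf \
    storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared status
# gluster volume replace-brick sa_bookshelf \
    storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared commit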

>> I've started command replace-brick:
>> 'gluster volume replace-brick sa_bookshelf 
>> storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared start'
>>
>> During that, the Virtual Machine (Xen) crashed. Now I can't abort the
>> migration and continue it again.
> I don't know what state that leaves your files in. I think the original
> brick, "storage-gfs-3-prd:/ydp/shared", should still have all the data.

Yes, /ydp/shared still has all the data.

> The rest of the problem has to do with settings in
> /var/lib/glusterd/sa_bookshelf/info. Make a backup of that file and edit
> it, removing anything to do with replace-brick, or rebalance. Feel free
> to put the info file on fpaste.org and ping me on IRC if you need help
> with that. Stop the volume and glusterd. Copy that same edited info file
> to the same path on both servers. Start glusterd again. That should
> clear the replace-brick status so you can try again with 3.4.2.

In that info file I haven't got any lines with "replace-brick" or
"rebalance". Earlier I edited the file
/var/lib/glusterd/vols/sa_bookshelf/sa_bookshelf.storage-gfs-3-prd.ydp-shared.vol
on both nodes (removed the lines concerning replace-brick) to get
GlusterFS working again (because of the behavior I described below).
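
If I understand the suggestion correctly, the full cleanup would look roughly
like this (only a sketch; I'm assuming the default /var/lib/glusterd/vols
layout, that glusterd is started via an init script at /etc/init.d/glusterd,
and that the edited info file can simply be copied to the other nodes):

# gluster volume stop sa_bookshelf
# /etc/init.d/glusterd stop                      (on every node)
# cp /var/lib/glusterd/vols/sa_bookshelf/info /root/info.backup
# vi /var/lib/glusterd/vols/sa_bookshelf/info    (remove replace-brick/rebalance entries)
# scp /var/lib/glusterd/vols/sa_bookshelf/info \
      storage-gfs-4-prd:/var/lib/glusterd/vols/sa_bookshelf/info
# /etc/init.d/glusterd start                     (on every node)
# gluster volume start sa_bookshelf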

> When I try:
> '# gluster volume replace-brick sa_bookshelf 
> storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort'
> The command lasts about 5 minutes and then finishes with no result.
> Apart from that, after that command Gluster starts to behave very strangely.
> For example, I can't do '# gluster volume heal sa_bookshelf info'
> because it lasts about 5 minutes and returns a blank screen (the same
> as abort).
>
> Then I restart the Gluster server and Gluster returns to normal operation,
> except for the replace-brick commands. When I do:
> '# gluster volume replace-brick sa_bookshelf 
> storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared status'
> I get:
> Number of files migrated = 0       Current file=
> I can do 'volume heal info' commands etc. until I call the command:
> '# gluster volume replace-brick sa_bookshelf 
> storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort'.
>
>
>
> # gluster --version
> glusterfs 3.3.1 built on Oct 22 2012 07:54:24
> Repository revision: git://git.gluster.com/glusterfs.git
> Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com> 
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU 
> General Public License.
>
> Brick (/ydp/shared) logs (repeats the same constantly):
> [2013-12-06 11:29:44.790299] W [dict.c:995:data_to_str]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
> [2013-12-06 11:29:44.790402] W [dict.c:995:data_to_str]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
> [2013-12-06 11:29:44.790465] E [name.c:141:client_fill_address_family]
> 0-sa_bookshelf-replace-brick: transport.address-family not specified.
> Could not guess default value from (remote-host:(null) or
> transport.unix.connect-path:(null)) options
> [2013-12-06 11:29:47.791037] W [dict.c:995:data_to_str]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
> [2013-12-06 11:29:47.791141] W [dict.c:995:data_to_str]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
> [2013-12-06 11:29:47.791174] E [name.c:141:client_fill_address_family]
> 0-sa_bookshelf-replace-brick: transport.address-family not specified.
> Could not guess default value from (remote-host:(null) or
> transport.unix.connect-path:(null)) options
> [2013-12-06 11:29:50.791775] W [dict.c:995:data_to_str]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
> [2013-12-06 11:29:50.791986] W [dict.c:995:data_to_str]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d]
> (-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
> [2013-12-06 11:29:50.792046] E [name.c:141:client_fill_address_family]
> 0-sa_bookshelf-replace-brick: transport.address-family not specified.
> Could not guess default value from (remote-host:(null) or
> transport.unix.connect-path:(null)) options
>
>
> # gluster volume info
>
> Volume Name: sa_bookshelf
> Type: Distributed-Replicate
> Volume ID: 74512f52-72ec-4538-9a54-4e50c4691722
> Status: Started
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: storage-gfs-3-prd:/ydp/shared
> Brick2: storage-gfs-4-prd:/ydp/shared
> Brick3: storage-gfs-3-prd:/ydp/shared2
> Brick4: storage-gfs-4-prd:/ydp/shared2
>
>
> # gluster volume status
> Status of volume: sa_bookshelf
> Gluster process                               Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick storage-gfs-3-prd:/ydp/shared           24009   Y       758
> Brick storage-gfs-4-prd:/ydp/shared           24009   Y       730
> Brick storage-gfs-3-prd:/ydp/shared2          24010   Y       764
> Brick storage-gfs-4-prd:/ydp/shared2          24010   Y       4578
> NFS Server on localhost                       38467   Y       770
> Self-heal Daemon on localhost                 N/A     Y       776
> NFS Server on storage-1-saas                  38467   Y       840
> Self-heal Daemon on storage-1-saas            N/A     Y       846
> NFS Server on storage-gfs-4-prd               38467   Y       4584
> Self-heal Daemon on storage-gfs-4-prd         N/A     Y       4590
>
> storage-gfs-3-prd:~# gluster peer status
> Number of Peers: 2
>
> Hostname: storage-1-saas
> Uuid: 37b9d881-ce24-4550-b9de-6b304d7e9d07
> State: Peer in Cluster (Connected)
>
> Hostname: storage-gfs-4-prd
> Uuid: 4c384f45-873b-4c12-9683-903059132c56
> State: Peer in Cluster (Connected)
>
>
> (from storage-1-saas)# gluster peer status
> Number of Peers: 2
>
> Hostname: 172.16.3.60
> Uuid: 1441a7b0-09d2-4a40-a3ac-0d0e546f6884
> State: Peer in Cluster (Connected)
>
> Hostname: storage-gfs-4-prd
> Uuid: 4c384f45-873b-4c12-9683-903059132c56
> State: Peer in Cluster (Connected)
>
>
>
> Clients work properly.
> I googled for it and found that it was a bug, but in version 3.3.0.
> How can I repair this and continue my migration? Thank you for any help.
>
> BTW: I moved the Gluster server following the "Gluster 3.4: Brick
> Restoration - Replace Crashed Server" how-to.
>
> Regards,
> Mariusz
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users



