[Gluster-users] libgfapi failover problem on replica bricks
Pranith Kumar Karampuri
pkarampu at redhat.com
Thu Aug 7 01:07:07 UTC 2014
Hi Roman,
Does the md5 sum match when the VMs are paused?
Pranith
On 08/07/2014 03:11 AM, Roman wrote:
> I don't know if it makes any sense, but I'll add this kind of
> information: after I stop the VM (in a situation where one glusterfs
> server was down for a while and then came back up) and start it again,
> glusterfs treats those VM disk files as if they are the same now.
> Meanwhile they are not: the sizes are different. I think there is some
> kind of problem with striped-file checks in glusterfs.
>
>
> root at stor1:~# getfattr -d -m. -e hex
> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
> getfattr: Removing leading '/' from absolute path names
> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa
>
>
> root at stor1:~# du -sh /exports/pve1/1T/images/125/
> 1.6G /exports/pve1/1T/images/125/
>
>
> getfattr: Removing leading '/' from absolute path names
> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa
>
> root at stor2:~# du -sh /exports/pve1/1T/images/125/
> 2.6G /exports/pve1/1T/images/125/
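A note on the comparison above: identical all-zero trusted.afr changelogs combined with different du output is not, by itself, proof that the replicas diverged. du reports allocated blocks, and a sparse qcow2 image can be allocated differently on each brick while holding the same bytes. A minimal illustration with plain coreutils (the file name sparse.img is made up for the demo):

```shell
# A 1 GiB sparse file: large logical size, almost nothing allocated.
truncate -s 1G sparse.img

du -sh sparse.img                  # allocated size (small)
du -sh --apparent-size sparse.img  # logical size: 1.0G
rm sparse.img
```

Comparing `du --apparent-size`, or md5sum as suggested later in the thread, is the more reliable divergence check.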
>
>
>
> 2014-08-06 12:49 GMT+03:00 Humble Chirammal <hchiramm at redhat.com>:
>
>
>
>
> ----- Original Message -----
> | From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> | To: "Roman" <romeo.r at gmail.com>
> | Cc: gluster-users at gluster.org, "Niels de Vos" <ndevos at redhat.com>,
> | "Humble Chirammal" <hchiramm at redhat.com>
> | Sent: Wednesday, August 6, 2014 12:09:57 PM
> | Subject: Re: [Gluster-users] libgfapi failover problem on replica bricks
> |
> | Roman,
> |     The file went into split-brain. I think we should do these tests
> | with 3.5.2, where monitoring the heals is easier. Let me also come up
> | with a document about how to do this testing you are trying to do.
> |
> | Humble/Niels,
> |     Do we have debs available for 3.5.2? In 3.5.1 there was a packaging
> | issue where /usr/bin/glfsheal is not packaged along with the deb. I
> | think that should be fixed now as well?
> |
> Pranith,
>
> The 3.5.2 packages for Debian are not available yet. We are
> coordinating internally to get them processed.
> I will update the list once they are available.
>
> --Humble
> |
> | On 08/06/2014 11:52 AM, Roman wrote:
> | > good morning,
> | >
> | > root at stor1:~# getfattr -d -m. -e hex
> | > /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | > getfattr: Removing leading '/' from absolute path names
> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
> | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
> | > trusted.gfid=0x23c79523075a4158bea38078da570449
> | >
> | > getfattr: Removing leading '/' from absolute path names
> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
> | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
> | > trusted.gfid=0x23c79523075a4158bea38078da570449
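For reference, each trusted.afr value above packs three big-endian 32-bit counters: pending data, metadata, and entry operations that the brick holding the xattr has recorded against the named client/brick. A small decoding sketch (decode_afr is our helper name, not a gluster tool):

```shell
# decode_afr: split a trusted.afr hex value into its three 32-bit
# big-endian counters: pending data, metadata and entry operations.
decode_afr() {
    local hex=${1#0x}           # strip the 0x prefix
    printf 'data=%d metadata=%d entry=%d\n' \
        "$((16#${hex:0:8}))" "$((16#${hex:8:8}))" "$((16#${hex:16:8}))"
}

decode_afr 0x000001320000000000000000   # data=306 metadata=0 entry=0
decode_afr 0x000000040000000000000000   # data=4 metadata=0 entry=0
```

Read this way, the first brick holds 306 pending data operations against client-1 and the second holds 4 against client-0: each brick blames the other, the mutual-accusation pattern behind the split-brain reported later in the thread.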
> | >
> | >
> | >
> | > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
> | >
> | >
> | > On 08/06/2014 11:30 AM, Roman wrote:
> | >> Also, this time files are not the same!
> | >>
> | >> root at stor1:~# md5sum
> | >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | >> 32411360c53116b96a059f17306caeda
> | >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | >>
> | >> root at stor2:~# md5sum
> | >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | >> 65b8a6031bcb6f5fb3a11cb1e8b1c9c9
> | >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | > What is the getfattr output?
> | >
> | > Pranith
> | >
> | >>
> | >>
> | >> 2014-08-05 16:33 GMT+03:00 Roman <romeo.r at gmail.com>:
> | >>
> | >> Nope, it is not working. But this time it went a bit
> | >> differently.
> | >>
> | >> root at gluster-client:~# dmesg
> | >> Segmentation fault
> | >>
> | >>
> | >> I was not able even to start the VM after I did the tests:
> | >>
> | >> Could not read qcow2 header: Operation not permitted
> | >>
> | >> And it seems it never starts to sync files after the first
> | >> disconnect. The VM survives the first disconnect, but not the
> | >> second (I waited around 30 minutes). Also, I've got
> | >> network.ping-timeout: 2 in the volume settings, but the logs
> | >> reacted to the first disconnect in around 30 seconds. The
> | >> second was faster: 2 seconds.
> | >>
> | >> The reaction was also different:
> | >>
> | >> slower one:
> | >> [2014-08-05 13:26:19.558435] W
> [socket.c:514:__socket_rwv]
> | >> 0-glusterfs: readv failed (Connection timed out)
> | >> [2014-08-05 13:26:19.558485] W
> | >> [socket.c:1962:__socket_proto_state_machine] 0-glusterfs:
> | >> reading from socket failed. Error (Connection timed out),
> | >> peer (10.250.0.1:24007)
> | >> [2014-08-05 13:26:21.281426] W
> [socket.c:514:__socket_rwv]
> | >> 0-HA-fast-150G-PVE1-client-0: readv failed
> (Connection timed out)
> | >> [2014-08-05 13:26:21.281474] W
> | >> [socket.c:1962:__socket_proto_state_machine]
> | >> 0-HA-fast-150G-PVE1-client-0: reading from socket failed.
> | >> Error (Connection timed out), peer (10.250.0.1:49153)
> | >> [2014-08-05 13:26:21.281507] I
> | >> [client.c:2098:client_rpc_notify]
> | >> 0-HA-fast-150G-PVE1-client-0: disconnected
> | >>
> | >> the fast one:
> | >> [2014-08-05 12:52:44.607389] C
> | >> [client-handshake.c:127:rpc_client_ping_timer_expired]
> | >> 0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153
> | >> has not responded in the last 2
> | >> seconds, disconnecting.
> | >> [2014-08-05 12:52:44.607491] W
> [socket.c:514:__socket_rwv]
> | >> 0-HA-fast-150G-PVE1-client-1: readv failed (No data
> available)
> | >> [2014-08-05 12:52:44.607585] E
> | >> [rpc-clnt.c:368:saved_frames_unwind]
> | >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
> | >> [0x7fcb1b4b0558]
> | >>
> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
> | >> [0x7fcb1b4aea63]
> | >>
> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
> | >> [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
> | >> unwinding frame type(GlusterFS 3.3) op(LOOKUP(27))
> called at
> | >> 2014-08-05 12:52:42.463881 (xid=0x381883x)
> | >> [2014-08-05 12:52:44.607604] W
> | >> [client-rpc-fops.c:2624:client3_3_lookup_cbk]
> | >> 0-HA-fast-150G-PVE1-client-1: remote operation failed:
> | >> Transport endpoint is not connected. Path: /
> | >> (00000000-0000-0000-0000-000000000001)
> | >> [2014-08-05 12:52:44.607736] E
> | >> [rpc-clnt.c:368:saved_frames_unwind]
> | >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
> | >> [0x7fcb1b4b0558]
> | >>
> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
> | >> [0x7fcb1b4aea63]
> | >>
> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
> | >> [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
> | >> unwinding frame type(GlusterFS Handshake) op(PING(3))
> called
> | >> at 2014-08-05 12:52:42.463891 (xid=0x381884x)
> | >> [2014-08-05 12:52:44.607753] W
> | >> [client-handshake.c:276:client_ping_cbk]
> | >> 0-HA-fast-150G-PVE1-client-1: timer must have expired
> | >> [2014-08-05 12:52:44.607776] I
> | >> [client.c:2098:client_rpc_notify]
> | >> 0-HA-fast-150G-PVE1-client-1: disconnected
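On the two reaction times above: network.ping-timeout covers the case where an established connection goes silent, which is why the second disconnect was detected in 2 seconds; the ~30-second first reaction looks like a lower-level TCP/socket timeout rather than the ping timer. As a sanity check that the option really took effect, the volume's reconfigured options can be listed (volume name taken from this thread; these commands are a sketch and need a live gluster deployment):

```shell
gluster volume info HA-fast-150G-PVE1 | grep ping-timeout
# and, to change it explicitly:
gluster volume set HA-fast-150G-PVE1 network.ping-timeout 2
```

Note that the default is 42 seconds, and very low values are usually discouraged because every reconnect forces lock and fd recovery.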
> | >>
> | >>
> | >>
> | >> I've got SSD disks (just for info).
> | >> Should I go and give 3.5.2 a try?
> | >>
> | >>
> | >>
> | >> 2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
> | >>
> | >> Reply along with gluster-users please :-). Maybe you are
> | >> hitting 'reply' instead of 'reply all'?
> | >>
> | >> Pranith
> | >>
> | >> On 08/05/2014 03:35 PM, Roman wrote:
> | >>> To make sure and keep it clean, I've created another VM
> | >>> with raw format and am going to repeat those steps. So now
> | >>> I've got two VMs: one with qcow2 format and the other with
> | >>> raw format. I will send another e-mail shortly.
> | >>>
> | >>>
> | >>> 2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
> | >>>
> | >>>
> | >>> On 08/05/2014 03:07 PM, Roman wrote:
> | >>>> Really, it seems to be the same file:
> | >>>>
> | >>>> stor1:
> | >>>> a951641c5230472929836f9fcede6b04
> | >>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | >>>>
> | >>>> stor2:
> | >>>> a951641c5230472929836f9fcede6b04
> | >>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | >>>>
> | >>>>
> | >>>> One thing I've seen from the logs: somehow Proxmox
> | >>>> VE is connecting to the servers with the wrong version?
> | >>>> [2014-08-05 09:23:45.218550] I
> | >>>> [client-handshake.c:1659:select_server_supported_programs]
> | >>>> 0-HA-fast-150G-PVE1-client-0: Using Program
> | >>>> GlusterFS 3.3, Num (1298437), Version (330)
> | >>> It is the rpc (over-the-network data structures)
> | >>> version, which has not changed at all since 3.3, so
> | >>> that's not a problem. So what is the conclusion? Is
> | >>> your test case working now or not?
> | >>>
> | >>> Pranith
> | >>>
> | >>>> but if I issue:
> | >>>> root at pve1:~# glusterfs -V
> | >>>> glusterfs 3.4.4 built on Jun 28 2014 03:44:57
> | >>>> seems ok.
> | >>>>
> | >>>> the server uses 3.4.4 meanwhile:
> | >>>> [2014-08-05 09:23:45.117875] I
> | >>>> [server-handshake.c:567:server_setvolume]
> | >>>> 0-HA-fast-150G-PVE1-server: accepted client from
> | >>>>
> stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0
> | >>>> (version: 3.4.4)
> | >>>> [2014-08-05 09:23:49.103035] I
> | >>>> [server-handshake.c:567:server_setvolume]
> | >>>> 0-HA-fast-150G-PVE1-server: accepted client from
> | >>>>
> stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0
> | >>>> (version: 3.4.4)
> | >>>>
> | >>>> if this could be the reason, of course.
> | >>>> I did restart Proxmox VE yesterday (just for
> | >>>> information).
> | >>>>
> | >>>>
> | >>>>
> | >>>>
> | >>>>
> | >>>> 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri
> | >>>> <pkarampu at redhat.com>:
> | >>>>
> | >>>>
> | >>>> On 08/05/2014 02:33 PM, Roman wrote:
> | >>>>> Waited long enough for now; still different
> | >>>>> sizes and no logs about healing :(
> | >>>>>
> | >>>>> stor1
> | >>>>> # file:
> | >>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | >>>>>
> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
> | >>>>>
> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
> | >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
> | >>>>>
> | >>>>> root at stor1:~# du -sh
> | >>>>> /exports/fast-test/150G/images/127/
> | >>>>> 1.2G /exports/fast-test/150G/images/127/
> | >>>>>
> | >>>>>
> | >>>>> stor2
> | >>>>> # file:
> | >>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> | >>>>>
> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
> | >>>>>
> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
> | >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
> | >>>>>
> | >>>>>
> | >>>>> root at stor2:~# du -sh
> | >>>>> /exports/fast-test/150G/images/127/
> | >>>>> 1.4G /exports/fast-test/150G/images/127/
> | >>>> According to the changelogs, the file
> doesn't
> | >>>> need any healing. Could you stop the
> operations
> | >>>> on the VMs and take md5sum on both
> these machines?
> | >>>>
> | >>>> Pranith
> | >>>>
> | >>>>>
> | >>>>>
> | >>>>>
> | >>>>>
> | >>>>> 2014-08-05 11:49 GMT+03:00 Pranith Kumar
> | >>>>> Karampuri <pkarampu at redhat.com>:
> | >>>>>
> | >>>>>
> | >>>>> On 08/05/2014 02:06 PM, Roman wrote:
> | >>>>>> Well, it seems like it doesn't see that
> | >>>>>> changes were made to the volume? I
> | >>>>>> created two files, 200 and 100 MB (from
> | >>>>>> /dev/zero), after I disconnected the first
> | >>>>>> brick. Then I connected it back and got
> | >>>>>> these logs:
> | >>>>>>
> | >>>>>> [2014-08-05 08:30:37.830150] I
> | >>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
> | >>>>>> 0-glusterfs: No change in volfile, continuing
> | >>>>>> [2014-08-05 08:30:37.830207] I
> | >>>>>> [rpc-clnt.c:1676:rpc_clnt_reconfig]
> | >>>>>> 0-HA-fast-150G-PVE1-client-0: changing
> | >>>>>> port to 49153 (from 0)
> | >>>>>> [2014-08-05 08:30:37.830239] W
> | >>>>>> [socket.c:514:__socket_rwv]
> | >>>>>> 0-HA-fast-150G-PVE1-client-0: readv
> | >>>>>> failed (No data available)
> | >>>>>> [2014-08-05 08:30:37.831024] I
> | >>>>>> [client-handshake.c:1659:select_server_supported_programs]
> | >>>>>> 0-HA-fast-150G-PVE1-client-0: Using
> | >>>>>> Program GlusterFS 3.3, Num (1298437),
> | >>>>>> Version (330)
> | >>>>>> [2014-08-05 08:30:37.831375] I
> | >>>>>> [client-handshake.c:1456:client_setvolume_cbk]
> | >>>>>> 0-HA-fast-150G-PVE1-client-0: Connected
> | >>>>>> to 10.250.0.1:49153, attached to
> | >>>>>> remote volume '/exports/fast-test/150G'.
> | >>>>>> [2014-08-05 08:30:37.831394] I
> | >>>>>> [client-handshake.c:1468:client_setvolume_cbk]
> | >>>>>> 0-HA-fast-150G-PVE1-client-0: Server and
> | >>>>>> Client lk-version numbers are not same,
> | >>>>>> reopening the fds
> | >>>>>> [2014-08-05 08:30:37.831566] I
> | >>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
> | >>>>>> 0-HA-fast-150G-PVE1-client-0: Server lk
> | >>>>>> version = 1
> | >>>>>>
> | >>>>>>
> | >>>>>> [2014-08-05 08:30:37.830150] I
> | >>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
> | >>>>>> 0-glusterfs: No change in volfile, continuing
> | >>>>>> this line seems weird to me, tbh.
> | >>>>>> I do not see any traffic on the switch
> | >>>>>> interfaces between the gluster servers,
> | >>>>>> which means there is no syncing between
> | >>>>>> them. I tried to ls -l the files on the
> | >>>>>> client and servers to trigger the healing,
> | >>>>>> but seemingly with no success. Should I
> | >>>>>> wait more?
> | >>>>> Yes, it should take around 10-15 minutes.
> | >>>>> Could you provide 'getfattr -d -m. -e hex
> | >>>>> <file-on-brick>' output from both bricks?
> | >>>>>
> | >>>>> Pranith
> | >>>>>
> | >>>>>>
> | >>>>>>
> | >>>>>> 2014-08-05 11:25 GMT+03:00 Pranith Kumar
> | >>>>>> Karampuri <pkarampu at redhat.com>:
> | >>>>>>
> | >>>>>>
> | >>>>>> On 08/05/2014 01:10 PM, Roman wrote:
> | >>>>>>> Ahha! For some reason I was not able
> | >>>>>>> to start the VM anymore; Proxmox VE
> | >>>>>>> told me that it is not able to read
> | >>>>>>> the qcow2 header because permission
> | >>>>>>> was denied. So I just deleted that
> | >>>>>>> file and created a new VM. And the
> | >>>>>>> next message I got was this:
> | >>>>>> Seems like these are the messages
> | >>>>>> where you took down the bricks before
> | >>>>>> self-heal. Could you restart the run
> | >>>>>> waiting for self-heals to complete
> | >>>>>> before taking down the next brick?
> | >>>>>>
> | >>>>>> Pranith
> | >>>>>>
> | >>>>>>>
> | >>>>>>>
> | >>>>>>> [2014-08-05 07:31:25.663412] E
> | >>>>>>> [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
> | >>>>>>> 0-HA-fast-150G-PVE1-replicate-0:
> | >>>>>>> Unable to self-heal contents of
> | >>>>>>> '/images/124/vm-124-disk-1.qcow2'
> | >>>>>>> (possible split-brain). Please
> | >>>>>>> delete the file from all but the
> | >>>>>>> preferred subvolume.- Pending
> | >>>>>>> matrix: [ [ 0 60 ] [ 11 0 ] ]
> | >>>>>>> [2014-08-05 07:31:25.663955] E
> | >>>>>>> [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk]
> | >>>>>>> 0-HA-fast-150G-PVE1-replicate-0:
> | >>>>>>> background data self-heal failed on
> | >>>>>>> /images/124/vm-124-disk-1.qcow2
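The "Pending matrix: [ [ 0 60 ] [ 11 0 ] ]" in the log above encodes who blames whom: row i holds the pending-operation counts brick i has recorded against each brick, so brick 0 blames brick 1 for 60 operations and brick 1 blames brick 0 for 11. When both off-diagonal entries are nonzero, neither copy is authoritative and AFR declares split-brain. A sketch of that decision (afr_verdict is our helper name, not part of gluster):

```shell
# afr_verdict: given the two off-diagonal pending counters of a 2-brick
# replica (what brick 0 holds against brick 1, and vice versa), report
# the AFR outcome: mutual accusation means split-brain.
afr_verdict() {
    if [ "$1" -gt 0 ] && [ "$2" -gt 0 ]; then
        echo "split-brain"
    elif [ "$1" -gt 0 ]; then
        echo "heal brick 1 from brick 0"
    elif [ "$2" -gt 0 ]; then
        echo "heal brick 0 from brick 1"
    else
        echo "in sync"
    fi
}

afr_verdict 60 11   # split-brain
afr_verdict 306 0   # heal brick 1 from brick 0
```

With the matrix from the log, afr_verdict 60 11 reports split-brain, matching the "Unable to self-heal contents" error.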
> | >>>>>>>
> | >>>>>>>
> | >>>>>>>
> | >>>>>>> 2014-08-05 10:13 GMT+03:00 Pranith
> | >>>>>>> Kumar Karampuri <pkarampu at redhat.com>:
> | >>>>>>>
> | >>>>>>> I just responded to your earlier
> | >>>>>>> mail about how the log looks.
> | >>>>>>> The log appears in the mount's logfile.
> | >>>>>>>
> | >>>>>>> Pranith
> | >>>>>>>
> | >>>>>>> On 08/05/2014 12:41 PM, Roman wrote:
> | >>>>>>>> Ok, so I've waited enough, I
> | >>>>>>>> think. There was no traffic on the
> | >>>>>>>> switch ports between the servers.
> | >>>>>>>> I could not find any suitable log
> | >>>>>>>> message about a completed
> | >>>>>>>> self-heal (waited about 30
> | >>>>>>>> minutes). I unplugged the other
> | >>>>>>>> server's UTP cable this time
> | >>>>>>>> and got into the same situation:
> | >>>>>>>> root at gluster-test1:~# cat
> | >>>>>>>> /var/log/dmesg
> | >>>>>>>> -bash: /bin/cat: Input/output error
> | >>>>>>>>
> | >>>>>>>> brick logs:
> | >>>>>>>> [2014-08-05 07:09:03.005474] I
> | >>>>>>>> [server.c:762:server_rpc_notify]
> | >>>>>>>> 0-HA-fast-150G-PVE1-server:
> | >>>>>>>> disconnecting connectionfrom
> | >>>>>>>>
> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
> | >>>>>>>> [2014-08-05 07:09:03.005530] I
> | >>>>>>>> [server-helpers.c:729:server_connection_put]
> | >>>>>>>> 0-HA-fast-150G-PVE1-server:
> | >>>>>>>> Shutting down connection
> | >>>>>>>>
> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
> | >>>>>>>> [2014-08-05 07:09:03.005560] I
> | >>>>>>>> [server-helpers.c:463:do_fd_cleanup]
> | >>>>>>>> 0-HA-fast-150G-PVE1-server: fd
> | >>>>>>>> cleanup on
> | >>>>>>>> /images/124/vm-124-disk-1.qcow2
> | >>>>>>>> [2014-08-05 07:09:03.005797] I
> | >>>>>>>> [server-helpers.c:617:server_connection_destroy]
> | >>>>>>>> 0-HA-fast-150G-PVE1-server:
> | >>>>>>>> destroyed connection of
> | >>>>>>>>
> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
> | >>>>>>>>
> | >>>>>>>>
> | >>>>>>>>
> | >>>>>>>>
> | >>>>>>>>
> | >>>>>>>> 2014-08-05 9:53 GMT+03:00
> | >>>>>>>> Pranith Kumar Karampuri
> | >>>>>>>> <pkarampu at redhat.com>:
> | >>>>>>>>
> | >>>>>>>> Do you think it is possible
> | >>>>>>>> for you to do these tests
> | >>>>>>>> on the latest version,
> | >>>>>>>> 3.5.2? 'gluster volume heal
> | >>>>>>>> <volname> info' would give
> | >>>>>>>> you that information in
> | >>>>>>>> versions > 3.5.1.
> | >>>>>>>> Otherwise you will have to
> | >>>>>>>> check it either from the
> | >>>>>>>> logs (there will be a
> | >>>>>>>> self-heal completed message
> | >>>>>>>> in the mount logs) or by
> | >>>>>>>> observing 'getfattr -d -m.
> | >>>>>>>> -e hex <image-file-on-bricks>'.
> | >>>>>>>>
> | >>>>>>>> Pranith
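The getfattr criterion Pranith describes above reduces to one rule: a file needs no more healing once every trusted.afr value on every brick is all zeroes. A sketch of that test on already-collected values (afr_clean is our helper name; feed it the hex values getfattr printed on each brick):

```shell
# afr_clean: succeed only when every supplied trusted.afr hex value is
# all zeroes, i.e. no brick records pending operations for the file.
afr_clean() {
    local v
    for v in "$@"; do
        case "${v#0x}" in
            *[!0]*) return 1 ;;   # any nonzero digit: heal still pending
        esac
    done
    return 0
}

afr_clean 0x000000000000000000000000 0x000000000000000000000000 \
    && echo "healed" || echo "pending"
afr_clean 0x000001320000000000000000 0x000000000000000000000000 \
    && echo "healed" || echo "pending"
```

The first call reports healed because both counters are zero; the second reports pending because one brick still records 0x132 pending data operations.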
> | >>>>>>>>
> | >>>>>>>>
> | >>>>>>>> On 08/05/2014 12:09 PM,
> | >>>>>>>> Roman wrote:
> | >>>>>>>>> Ok, I understand. I will
> | >>>>>>>>> try this shortly.
> | >>>>>>>>> How can I be sure that the
> | >>>>>>>>> healing process is done if
> | >>>>>>>>> I am not able to see its
> | >>>>>>>>> status?
> | >>>>>>>>>
> | >>>>>>>>>
> | >>>>>>>>> 2014-08-05 9:30 GMT+03:00
> | >>>>>>>>> Pranith Kumar Karampuri
> | >>>>>>>>> <pkarampu at redhat.com>:
> | >>>>>>>>>
> | >>>>>>>>> Mounts will do the healing, not
> | >>>>>>>>> the self-heal-daemon. The
> | >>>>>>>>> problem, I feel, is that
> | >>>>>>>>> whichever process does the
> | >>>>>>>>> healing has the latest
> | >>>>>>>>> information about the good
> | >>>>>>>>> bricks in this usecase. Since
> | >>>>>>>>> for the VM usecase the mounts
> | >>>>>>>>> should have the latest
> | >>>>>>>>> information, we should let the
> | >>>>>>>>> mounts do the healing. If the
> | >>>>>>>>> mount accesses the VM image,
> | >>>>>>>>> either by someone doing
> | >>>>>>>>> operations inside the VM or by
> | >>>>>>>>> an explicit stat on the file,
> | >>>>>>>>> it should do the healing.
> | >>>>>>>>>
> | >>>>>>>>> Pranith.
> | >>>>>>>>>
> | >>>>>>>>>
> | >>>>>>>>> On 08/05/2014 10:39 AM, Roman wrote:
> | >>>>>>>>>> Hmmm, you told me to turn it
> | >>>>>>>>>> off. Did I misunderstand
> | >>>>>>>>>> something? After I issued the
> | >>>>>>>>>> command you sent me, I was not
> | >>>>>>>>>> able to watch the healing
> | >>>>>>>>>> process; it said it won't be
> | >>>>>>>>>> healed, because it's turned off.
> | >>>>>>>>>>
> | >>>>>>>>>>
> | >>>>>>>>>> 2014-08-05 5:39 GMT+03:00
> | >>>>>>>>>> Pranith Kumar Karampuri
> | >>>>>>>>>> <pkarampu at redhat.com>:
> | >>>>>>>>>>
> | >>>>>>>>>> You didn't mention anything
> | >>>>>>>>>> about self-healing. Did you
> | >>>>>>>>>> wait until the self-heal was
> | >>>>>>>>>> complete?
> | >>>>>>>>>>
> | >>>>>>>>>> Pranith
> | >>>>>>>>>>
> | >>>>>>>>>> On 08/04/2014 05:49 PM, Roman wrote:
> | >>>>>>>>>>> Hi!
> | >>>>>>>>>>> The result is pretty much the
> | >>>>>>>>>>> same. I set the switch port
> | >>>>>>>>>>> down for the 1st server; it
> | >>>>>>>>>>> was ok. Then I set it back up
> | >>>>>>>>>>> and set the other server's
> | >>>>>>>>>>> port off, and it triggered an
> | >>>>>>>>>>> IO error on two virtual
> | >>>>>>>>>>> machines: one with a local
> | >>>>>>>>>>> root FS but network-mounted
> | >>>>>>>>>>> storage, and the other with a
> | >>>>>>>>>>> network root FS. The 1st gave
> | >>>>>>>>>>> an error on copying to or from
> | >>>>>>>>>>> the mounted network disk; the
> | >>>>>>>>>>> other just gave me an error
> | >>>>>>>>>>> for even reading log files:
> | >>>>>>>>>>>
> | >>>>>>>>>>> cat: /var/log/alternatives.log:
> | >>>>>>>>>>> Input/output error
> | >>>>>>>>>>>
> | >>>>>>>>>>> Then I reset the kvm VM and it
> | >>>>>>>>>>> told me there is no boot
> | >>>>>>>>>>> device. Next I virtually
> | >>>>>>>>>>> powered it off and then back
> | >>>>>>>>>>> on, and it booted.
> | >>>>>>>>>>>
> | >>>>>>>>>>> By the way, did I have to
> | >>>>>>>>>>> start/stop the volume?
> | >>>>>>>>>>>
> | >>>>>>>>>>> >> Could you do the following
> | >>>>>>>>>>> >> and test it again?
> | >>>>>>>>>>> >> gluster volume set <volname>
> | >>>>>>>>>>> >> cluster.self-heal-daemon off
> | >>>>>>>>>>>
> | >>>>>>>>>>> >> Pranith
> | >>>>>>>>>>>
> | >>>>>>>>>>>
> | >>>>>>>>>>>
> | >>>>>>>>>>>
> | >>>>>>>>>>> 2014-08-04 14:10 GMT+03:00
> | >>>>>>>>>>> Pranith Kumar Karampuri
> | >>>>>>>>>>> <pkarampu at redhat.com>:
> | >>>>>>>>>>>
> | >>>>>>>>>>> On 08/04/2014 03:33 PM, Roman wrote:
> | >>>>>>>>>>>> Hello!
> | >>>>>>>>>>>>
> | >>>>>>>>>>>> Facing the same problem as
> | >>>>>>>>>>>> mentioned here:
> | >>>>>>>>>>>>
> http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
> | >>>>>>>>>>>>
> | >>>>>>>>>>>> My setup is up and running,
> | >>>>>>>>>>>> so I'm ready to help you
> | >>>>>>>>>>>> back with feedback.
> | >>>>>>>>>>>>
> | >>>>>>>>>>>> setup:
> | >>>>>>>>>>>> proxmox server as client
> | >>>>>>>>>>>> 2 gluster physical servers
> | >>>>>>>>>>>>
> | >>>>>>>>>>>> Server side and client side
> | >>>>>>>>>>>> are both running glusterfs
> | >>>>>>>>>>>> 3.4.4 atm, from the gluster
> | >>>>>>>>>>>> repo.
> | >>>>>>>>>>>>
> | >>>>>>>>>>>> the problem is:
> | >>>>>>>>>>>>
> | >>>>>>>>>>>> 1. created replica bricks.
> | >>>>>>>>>>>> 2. mounted in proxmox (tried
> | >>>>>>>>>>>>    both proxmox ways: via GUI
> | >>>>>>>>>>>>    and fstab (with backup
> | >>>>>>>>>>>>    volume line); btw, while
> | >>>>>>>>>>>>    mounting via fstab I'm
> | >>>>>>>>>>>>    unable to launch a VM
> | >>>>>>>>>>>>    without cache, even though
> | >>>>>>>>>>>>    direct-io-mode is enabled
> | >>>>>>>>>>>>    in the fstab line)
> | >>>>>>>>>>>> 3. installed a VM
> | >>>>>>>>>>>> 4. brought one volume down - ok
> | >>>>>>>>>>>> 5. brought it up, waited for
> | >>>>>>>>>>>>    the sync to finish.
> | >>>>>>>>>>>> 6. brought the other volume
> | >>>>>>>>>>>>    down - got IO errors on
> | >>>>>>>>>>>>    the VM guest and was not
> | >>>>>>>>>>>>    able to restore the VM
> | >>>>>>>>>>>>    after I reset the VM via
> | >>>>>>>>>>>>    the host. It says (no
> | >>>>>>>>>>>>    bootable media). After I
> | >>>>>>>>>>>>    shut it down (forced) and
> | >>>>>>>>>>>>    brought it back up, it boots.
> | >>>>>>>>>>> Could you do the following
> | >>>>>>>>>>> and test it again?
> | >>>>>>>>>>> gluster volume set <volname>
> | >>>>>>>>>>> cluster.self-heal-daemon off
> | >>>>>>>>>>>
> | >>>>>>>>>>> Pranith
> | >>>>>>>>>>>>
> | >>>>>>>>>>>> Need help. Tried 3.4.3,
> | >>>>>>>>>>>> 3.4.4. Still missing pkgs
> | >>>>>>>>>>>> for 3.4.5 for Debian, and
> | >>>>>>>>>>>> 3.5.2 (3.5.1 always gives a
> | >>>>>>>>>>>> healing error for some
> | >>>>>>>>>>>> reason).
> | >>>>>>>>>>>>
> | >>>>>>>>>>>> --
> | >>>>>>>>>>>> Best regards,
> | >>>>>>>>>>>> Roman.
> | >>>>>>>>>>>>
> | >>>>>>>>>>>>
> | >>>>>>>>>>>> _______________________________________________
> | >>>>>>>>>>>> Gluster-users mailing list
> | >>>>>>>>>>>> Gluster-users at gluster.org
> | >>>>>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> --
> Best regards,
> Roman.