[Gluster-users] Migrating a VM makes its gluster storage inaccessible
Paul Boven
boven at jive.nl
Wed Jan 22 14:38:58 UTC 2014
Hi Josh, everyone,
I've just tried the server.allow-insecure option, and it makes no
difference.
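For reference, this is how I understand the option is set (the glusterd.vol
part is my own reading of the docs, and I'm not certain it even applies to a
fuse-only setup like ours):
gluster volume set gv0 server.allow-insecure on
# and possibly, in /etc/glusterfs/glusterd.vol on both servers (restart glusterd afterwards):
# option rpc-auth-allow-insecure on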
You can find a summary and the logfiles at this URL:
http://epboven.home.xs4all.nl/gluster-migrate.html
The migration itself happens at 14:00:00, with the first write attempt by the
migrated guest at 14:00:25, which results in the 'permission denied' errors in
gluster.log. Some highlights from gluster.log:
[2014-01-22 14:00:00.779741] D
[afr-common.c:131:afr_lookup_xattr_req_prepare] 0-gv0-replicate-0:
/kvmtest.raw: failed to get the gfid from dict
[2014-01-22 14:00:00.780458] D
[afr-common.c:1380:afr_lookup_select_read_child] 0-gv0-replicate-0:
Source selected as 1 for /kvmtest.raw
[2014-01-22 14:00:25.176181] W
[client-rpc-fops.c:471:client3_3_open_cbk] 0-gv0-client-1: remote
operation failed: Permission denied. Path: /kvmtest.raw
(f7ed9edd-c6bd-4e86-b448-1d98bb38314b)
[2014-01-22 14:00:25.176322] W [fuse-bridge.c:2167:fuse_writev_cbk]
0-glusterfs-fuse: 2494829: WRITE => -1 (Permission denied)
Regards, Paul Boven.
On 01/21/2014 05:35 PM, Josh Boon wrote:
> Hey Paul,
>
>
> Have you tried server.allow-insecure: on as a volume option? If that doesn't work, we'll need the logs for both bricks.
>
> Best,
> Josh
>
> ----- Original Message -----
> From: "Paul Boven" <boven at jive.nl>
> To: gluster-users at gluster.org
> Sent: Tuesday, January 21, 2014 11:12:03 AM
> Subject: Re: [Gluster-users] Migrating a VM makes its gluster storage inaccessible
>
> Hi Josh, everyone,
>
> Glad you're trying to help, so no need to apologize at all.
>
> mount output:
> /dev/sdb1 on /export/brick0 type xfs (rw)
>
> localhost:/gv0 on /gluster type fuse.glusterfs
> (rw,default_permissions,allow_other,max_read=131072)
>
> gluster volume info all:
> Volume Name: gv0
> Type: Replicate
> Volume ID: ee77a036-50c7-4a41-b10d-cc0703769df9
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.88.4.0:/export/brick0/sdb1
> Brick2: 10.88.4.1:/export/brick0/sdb1
> Options Reconfigured:
> diagnostics.client-log-level: INFO
> diagnostics.brick-log-level: INFO
>
> Regards, Paul Boven.
>
>
>
>
> On 01/21/2014 05:02 PM, Josh Boon wrote:
>> Hey Paul,
>>
>> Definitely looks to be gluster. Sorry about the wrong guess on UID/GID. What's the output of "mount" and "gluster volume info all"?
>>
>> Best,
>> Josh
>>
>>
>> ----- Original Message -----
>> From: "Paul Boven" <boven at jive.nl>
>> To: gluster-users at gluster.org
>> Sent: Tuesday, January 21, 2014 10:56:34 AM
>> Subject: Re: [Gluster-users] Migrating a VM makes its gluster storage inaccessible
>>
>> Hi Josh,
>>
>> I've taken great care that /etc/passwd and /etc/group are the same on
>> both machines. When the problem occurs, even root gets 'permission
>> denied' when trying to read /gluster/guest.raw. So my first reaction was
>> that it cannot be a uid problem.
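>> (For completeness, a quick way to confirm that on both machines -- assuming
>> ssh access to both, and the libvirt-qemu:kvm ownership mentioned below:)
>> for h in cl0 cl1; do ssh $h 'getent passwd libvirt-qemu; getent group kvm'; done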
>>
>> In the normal situation, the storage for a running guest is owned by
>> libvirt-qemu:kvm. When I shut a guest down (virsh destroy), the
>> ownership changes to root:root on both cluster servers.
>>
>> During a (failing) migration, the ownership also ends up as root:root on
>> both, which I hadn't noticed before. The file mode is 0644.
>>
>> On the originating server, root can still read /gluster/guest.raw,
>> whereas on the destination, this gives me 'permission denied'.
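>> (These are the sort of checks I mean, run on both servers -- comparing the
>> file as seen through the fuse mount with the same file on the brick itself;
>> the brick path is from our volume layout, and getfattr needs root:)
>> ls -ln /gluster/guest.raw
>> ls -ln /export/brick0/sdb1/guest.raw
>> getfattr -d -m . -e hex /export/brick0/sdb1/guest.raw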
>>
>> The qemu logfile for the guest doesn't show much interesting
>> information, merely 'shutting down' on the originating server, and the
>> startup on the destination server. Libvirt/qemu does not seem to be aware
>> of the situation that the guest ends up in. I'll post the gluster logs
>> somewhere, too.
>>
>> From the destination server:
>>
>> LC_ALL=C
>> PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
>> /usr/bin/kvm -name kvmtest -S -M pc-i440fx-1.4 -m 1024 -smp
>> 1,sockets=1,cores=1,threads=1 -uuid 97db2d3f-c8e4-31de-9f89-848356b20da5
>> -nographic -no-user-config -nodefaults -chardev
>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/kvmtest.monitor,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
>> -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
>> file=/gluster/kvmtest.raw,if=none,id=drive-virtio-disk0,format=raw,cache=none
>> -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>> -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device
>> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:01:01:11,bus=pci.0,addr=0x3
>> -chardev pty,id=charserial0 -device
>> isa-serial,chardev=charserial0,id=serial0 -incoming tcp:0.0.0.0:49166
>> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
>> W: kvm binary is deprecated, please use qemu-system-x86_64 instead
>> char device redirected to /dev/pts/4 (label charserial0)
>>
>> Regards, Paul Boven.
>>
>>
>>
>>
>>
>>
>> On 01/21/2014 04:22 PM, Josh Boon wrote:
>>>
>>> Paul,
>>>
>>> Sounds like a potential uid/gid problem. Would you be able to follow up with the logs from /var/log/libvirt/qemu/ for the guest, from both source and destination? Also, the gluster logs for the volume would be awesome.
>>>
>>>
>>> Best,
>>> Josh
>>>
>>> ----- Original Message -----
>>> From: "Paul Boven" <boven at jive.nl>
>>> To: gluster-users at gluster.org
>>> Sent: Tuesday, January 21, 2014 9:36:06 AM
>>> Subject: Re: [Gluster-users] Migrating a VM makes its gluster storage inaccessible
>>>
>>> Hi James,
>>>
>>> Thanks for the quick reply.
>>>
>>> We are only using the fuse-mounted paths at the moment. So libvirt/qemu
>>> simply knows of these files as /gluster/guest.raw, and the guests are not
>>> aware of libgluster.
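>>> (In other words, the disk reaches qemu as a plain file path on the fuse
>>> mount; with libgfapi it would presumably be a gluster:// URL instead --
>>> roughly the difference below, the second line being hypothetical:)
>>> -drive file=/gluster/guest.raw,if=none,id=drive-virtio-disk0,format=raw,cache=none
>>> -drive file=gluster://10.88.4.0/gv0/guest.raw,if=none,id=drive-virtio-disk0,format=raw,cache=none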
>>>
>>> Some version numbers:
>>>
>>> Kernel: Ubuntu 3.8.0-35-generic (13.04, Raring)
>>> Glusterfs: 3.4.1-ubuntu1~raring1
>>> qemu: 1.4.0+dfsg-1expubuntu4
>>> libvirt0: 1.0.2-0ubuntu11.13.04.4
>>> The gluster bricks are on xfs.
>>>
>>> Regards, Paul Boven.
>>>
>>>
>>> On 01/21/2014 03:25 PM, James wrote:
>>>> Are you using the qemu gluster:// storage or are you using a fuse
>>>> mounted file path?
>>>>
>>>> I would actually expect it to work with either, however I haven't had
>>>> a chance to test this yet.
>>>>
>>>> It's probably also useful if you post your qemu versions...
>>>>
>>>> James
>>>>
>>>> On Tue, Jan 21, 2014 at 9:15 AM, Paul Boven <boven at jive.nl> wrote:
>>>>> Hi everyone
>>>>>
>>>>> We've been running glusterfs-3.4.0 on Ubuntu 13.04, using semiosis'
>>>>> packages. We're using kvm (libvirt) to host guest installs, and thanks to
>>>>> gluster and libvirt, we can live-migrate guests between the two hosts.
>>>>>
>>>>> Recently I ran an apt-get update/upgrade to stay up-to-date with security
>>>>> patches, and this also upgraded our glusterfs to the 3.4.1 version of the
>>>>> packages.
>>>>>
>>>>> Since this upgrade (which updated the gluster packages, but also the Ubuntu
>>>>> kernel package), kvm live migration fails in a most unusual manner. The live
>>>>> migration itself succeeds, but on the receiving machine the vm-storage for
>>>>> that guest becomes inaccessible. This in turn leaves the guest OS unable to
>>>>> read or write its filesystem, with of course fairly disastrous consequences
>>>>> for such a guest.
>>>>>
>>>>> So before a migration, everything is running smoothly. The two cluster nodes
>>>>> are 'cl0' and 'cl1', and we do the migration like this:
>>>>>
>>>>> virsh migrate --live --persistent --undefinesource <guest>
>>>>> qemu+tls://cl1/system
>>>>>
>>>>> The migration itself works, but as soon as it completes, the
>>>>> /gluster/guest.raw file (which holds the filesystem for the guest) becomes
>>>>> completely inaccessible: trying to read it (e.g. with dd or md5sum) results
>>>>> in a 'permission denied' on the destination cluster node, whereas the file
>>>>> is still perfectly fine on the machine that the migration originated from.
>>>>>
>>>>> As soon as I stop the guest (virsh destroy), the /gluster/guest.raw file
>>>>> becomes readable again and I can start up the guest on either server without
>>>>> further issues. It does not affect any of the other files in /gluster/.
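>>>>> To make the sequence concrete (taking a guest named 'guest' as an example):
>>>>> # on cl0:
>>>>> virsh migrate --live --persistent --undefinesource guest qemu+tls://cl1/system
>>>>> # on cl1 (destination), as root:
>>>>> md5sum /gluster/guest.raw     # 'Permission denied'
>>>>> # on cl0 (source), as root:
>>>>> md5sum /gluster/guest.raw     # still reads fine
>>>>> # on cl1:
>>>>> virsh destroy guest           # after this, the file is readable again on both servers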
>>>>>
>>>>> The problem seems to be in the gluster or fuse part, because once this error
>>>>> condition is triggered, the /gluster/guest.raw cannot be read by any
>>>>> application on the destination server. This situation is 100% reproducible:
>>>>> every attempted live migration fails in this way.
>>>>>
>>>>> Has anyone else experienced this? Is this a known or new bug?
>>>>>
>>>>> We've done some troubleshooting already in the irc channel (thanks to
>>>>> everyone for their help) but haven't found the smoking gun yet. I would
>>>>> appreciate any help in debugging and resolving this.
>>>>>
>>>>> Regards, Paul Boven.
>>>>> --
>>>>> Paul Boven <boven at jive.nl> +31 (0)521-596547
>>>>> Unix/Linux/Networking specialist
>>>>> Joint Institute for VLBI in Europe - www.jive.nl
>>>>> VLBI - It's a fringe science
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>
>>
>
>
--
Paul Boven <boven at jive.nl> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science