[Gluster-users] Fwd: vm paused unknown storage error one node out of 3 only

Krutika Dhananjay kdhananj at redhat.com
Sat Aug 13 05:26:20 UTC 2016


Could you share the following?

1. The output of `gluster volume heal <VOL> info`
2. The output of `gluster volume info`
3. The fuse mount logs of the affected volume(s)
4. The glustershd logs
5. The brick logs
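
On a default install these can be collected roughly as follows (the fuse
mount log is named after the mount point and brick logs after the brick
paths, so adjust for your layout):

    gluster volume heal <VOL> info
    gluster volume info <VOL>
    # fuse mount log (mount point with slashes replaced by dashes), e.g.
    /var/log/glusterfs/rhev-data-center-mnt-glusterSD-<server>:_<VOL>.log
    # self-heal daemon log:
    /var/log/glusterfs/glustershd.log
    # one log per brick:
    /var/log/glusterfs/bricks/*.log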

-Krutika


On Sat, Aug 13, 2016 at 3:10 AM, David Gossage <dgossage at carouselchecks.com>
wrote:

> On Fri, Aug 12, 2016 at 4:25 PM, Dan Lavu <dan at redhat.com> wrote:
>
>> David,
>>
>> I'm seeing similar behavior in my lab, but in my case it has been caused
>> by files healing in the gluster cluster, which I attribute to problems
>> with the storage fabric. See if 'gluster volume heal $VOL info' lists
>> files that are being healed and, as that number drops, whether the VM
>> can start.
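>>
>> For reference, a quiet volume shows zero entries per brick, roughly like
>> this (hostnames and brick paths are only illustrative):
>>
>>   Brick host1:/bricks/GLUSTER1
>>   Number of entries: 0
>>
>>   Brick host2:/bricks/GLUSTER1
>>   Number of entries: 0
>>
>>   Brick host3:/bricks/GLUSTER1
>>   Number of entries: 0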
>>
>>
> I haven't had any files in a state of being healed according to any of
> the 3 storage nodes.
>
> A moment ago I shut down one VM that has been around for a while, then
> told it to start on the one oVirt server that complained previously.  It
> ran fine, and I was able to migrate it off and back onto the host with no
> issues.
>
> I told one of the new VMs to migrate to that one node, and within seconds
> it paused from unknown storage errors: no shards showing heals, and
> nothing with an error on a storage node.  Same stale file handle issues.
>
> I'll probably put this node in maintenance later and reboot it.  Other
> than that I may re-clone those 2 recent VMs.  Maybe the images just got
> corrupted, though I'm not sure why it would only fail on one node of 3 if
> an image were bad.
>
>
>> Dan
>>
>> On Thu, Aug 11, 2016 at 7:52 AM, David Gossage <
>> dgossage at carouselchecks.com> wrote:
>>
>>> Figured I would repost here as well.  One client out of 3 is complaining
>>> of stale file handles on a few new VMs I migrated over.  No errors on the
>>> storage nodes, just the client.  Maybe just put that one in maintenance
>>> and restart the gluster mount?
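>>>
>>> Putting the host in maintenance should drop the fuse mount, and
>>> activating it again should give a fresh one; for reference, the mount I
>>> mean is the oVirt gluster storage domain, which shows up roughly like
>>> this (server and volume names illustrative):
>>>
>>>   mount | grep glusterSD
>>>   # <server>:/GLUSTER1 on /rhev/data-center/mnt/glusterSD/<server>:_GLUSTER1
>>>   #   type fuse.glusterfs (rw,...)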
>>>
>>> *David Gossage*
>>> *Carousel Checks Inc. | System Administrator*
>>> *Office* 708.613.2284
>>>
>>> ---------- Forwarded message ----------
>>> From: David Gossage <dgossage at carouselchecks.com>
>>> Date: Thu, Aug 11, 2016 at 12:17 AM
>>> Subject: vm paused unknown storage error one node out of 3 only
>>> To: users <users at ovirt.org>
>>>
>>>
>>> In a 3-node cluster running oVirt 3.6.6.2-1.el7.centos with a replica 3
>>> gluster 3.7.14 volume, starting a VM I just copied in produces the
>>> following errors on one node of the 3.  On the other 2 the VM starts
>>> fine.  All oVirt and gluster hosts are CentOS 7 based.  On start the VM
>>> defaults on its own accord to the one node, and is immediately put into
>>> paused for an unknown reason.  Telling it to start on a different node
>>> works fine.  The node with the issue already has 5 VMs running fine on
>>> it from the same gluster storage, plus the hosted engine on a different
>>> volume.
>>>
>>> The gluster nodes' logs did not have any errors for the volume.
>>> The node's own gluster (fuse mount) log had the following.
>>>
>>> I couldn't find dfb8777a-7e8c-40ff-8faa-252beabba5f8 in .glusterfs,
>>> .shard, or images/.
>>>
>>> 7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the bootable drive
>>> of the VM.
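>>>
>>> (For reference, a gfid lives under .glusterfs on each brick, bucketed by
>>> its first two byte pairs, so a check looks roughly like this; the brick
>>> path below is only illustrative:)
>>>
>>>   ls -l /bricks/GLUSTER1/.glusterfs/df/b8/dfb8777a-7e8c-40ff-8faa-252beabba5f8
>>>   # and, if it exists, find the named file it hard-links to:
>>>   find /bricks/GLUSTER1 -samefile \
>>>       /bricks/GLUSTER1/.glusterfs/df/b8/dfb8777a-7e8c-40ff-8faa-252beabba5f8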
>>>
>>> [2016-08-11 04:31:39.982952] W [MSGID: 114031]
>>> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-2:
>>> remote operation failed [No such file or directory]
>>> [2016-08-11 04:31:39.983683] W [MSGID: 114031]
>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2:
>>> remote operation failed [No such file or directory]
>>> [2016-08-11 04:31:39.984182] W [MSGID: 114031]
>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0:
>>> remote operation failed [No such file or directory]
>>> [2016-08-11 04:31:39.984221] W [MSGID: 114031]
>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1:
>>> remote operation failed [No such file or directory]
>>> [2016-08-11 04:31:39.985941] W [MSGID: 108008]
>>> [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
>>> subvolume -1 found with event generation 3 for gfid
>>> dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
>>> [2016-08-11 04:31:39.986633] W [MSGID: 114031]
>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2:
>>> remote operation failed [No such file or directory]
>>> [2016-08-11 04:31:39.987644] E [MSGID: 109040]
>>> [dht-helper.c:1190:dht_migration_complete_check_task] 0-GLUSTER1-dht:
>>> (null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
>>> [2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk]
>>> 0-glusterfs-fuse: 15152930: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
>>> fd=0x7f00a80bdb64 (Stale file handle)
>>> [2016-08-11 04:31:39.986567] W [MSGID: 114031]
>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0:
>>> remote operation failed [No such file or directory]
>>> [2016-08-11 04:31:39.986567] W [MSGID: 114031]
>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1:
>>> remote operation failed [No such file or directory]
>>> [2016-08-11 04:35:21.210145] W [MSGID: 108008]
>>> [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
>>> subvolume -1 found with event generation 3 for gfid
>>> dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
>>> [2016-08-11 04:35:21.210873] W [MSGID: 114031]
>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1:
>>> remote operation failed [No such file or directory]
>>> [2016-08-11 04:35:21.210888] W [MSGID: 114031]
>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2:
>>> remote operation failed [No such file or directory]
>>> [2016-08-11 04:35:21.210947] W [MSGID: 114031]
>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0:
>>> remote operation failed [No such file or directory]
>>> [2016-08-11 04:35:21.213270] E [MSGID: 109040]
>>> [dht-helper.c:1190:dht_migration_complete_check_task] 0-GLUSTER1-dht:
>>> (null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
>>> [2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk]
>>> 0-glusterfs-fuse: 15156910: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43
>>> fd=0x7f00a80bf6d0 (Stale file handle)
>>> [2016-08-11 04:35:21.211516] W [MSGID: 108008]
>>> [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable
>>> subvolume -1 found with event generation 3 for gfid
>>> dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
>>> [2016-08-11 04:35:21.212013] W [MSGID: 114031]
>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0:
>>> remote operation failed [No such file or directory]
>>> [2016-08-11 04:35:21.212081] W [MSGID: 114031]
>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1:
>>> remote operation failed [No such file or directory]
>>> [2016-08-11 04:35:21.212121] W [MSGID: 114031]
>>> [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2:
>>> remote operation failed [No such file or directory]
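>>>
>>> Given the "Possible split-brain" warnings above, the split-brain view of
>>> heal info seems worth checking as well, i.e. something like:
>>>
>>>   gluster volume heal GLUSTER1 info split-brain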
>>>
>>> I attached vdsm.log starting from when I spun up the VM on the offending
>>> node.
>>>
>>>
>>>
>>>
>>
>>
>
>