[Gluster-users] gluster warning remote operation failed during recovery from backups

Steve Dainard sdainard at spd1.com
Fri Jun 17 22:16:00 UTC 2016


Yes, same message on gluster03's brick log:
[2016-06-16 10:07:55.619621] E [MSGID: 115059]
[server-rpc-fops.c:811:server_getxattr_cbk] 0-storage-server: 23783173:
GETXATTR
/data/climate/ANUSPLIN/ANUSPLIN300/monthly/pcp_grids/1918/pcp300_08.asc
(0e98a94b-7b86-4a72-88a9-a99a787e059d) ((null)) ==> (Numerical result out
of range) [Numerical result out of range]

Nothing relevant shows up in etc-glusterfs-glusterd.vol.log.

Also, it seems hard to believe TSM would be sending back a larger value than
was sent to it by the initial backup taken from gluster storage, i.e. file on
gluster fuse mount -> TSM backup ... TSM restore -> file restored to gluster
fuse mount.

I can't actually get the xattrs from the file, because the file doesn't
exist after TSM errors out. My guess is that TSM restores the file, then
tries to verify the xattrs and removes the file on failure. But I suppose
if there was some corruption on the TSM side, it might be trying to send
garbage too large to store in the xattr (if I'm understanding the issue).

If I restore just the single file above after the failure, I don't get any
errors, which is why I started to suspect gluster as the culprit.

I'm capturing the strace output now; hopefully it shows something useful.

Thanks

On Thu, Jun 16, 2016 at 7:31 PM, Vijay Bellur <vbellur at redhat.com> wrote:

> On Thu, Jun 16, 2016 at 3:05 PM, Steve Dainard <sdainard at spd1.com> wrote:
> > I'm restoring some data to gluster from TSM backups and the client errors
> > out trying to retrieve xattrs at some point during the restore, killing
> > progress:
> > ...
> > Restoring       8,118,878 /storage/data/climate/ANUSPLIN/ANUSPLIN300/monthly/pcp_grids/1918/pcp300_04.asc [Done]
> > ANS1587W Unable to read extended attributes for object
> > /storage/data/climate/ANUSPLIN/ANUSPLIN300/monthly/pcp_grids/1918/pcp300_08.asc
> > due to errno: 34, reason: Numerical result out of range
> >  ** Unsuccessful **
> > ...
> >
> > In the gluster fuse logs for the volume I see this:
> > [2016-06-16 10:07:55.622020] W [MSGID: 114031]
> > [client-rpc-fops.c:1161:client3_3_getxattr_cbk] 0-storage-client-2: remote
> > operation failed. Path:
> > /data/climate/ANUSPLIN/ANUSPLIN300/monthly/pcp_grids/1918/pcp300_08.asc
> > (0e98a94b-7b86-4a72-88a9-a99a787e059d). Key: (null) [Numerical result out of
> > range]
> > [2016-06-16 10:07:55.622110] W [fuse-bridge.c:3353:fuse_xattr_cbk]
> > 0-glusterfs-fuse: 76197165: GETXATTR((null))
> > /data/climate/ANUSPLIN/ANUSPLIN300/monthly/pcp_grids/1918/pcp300_08.asc =>
> > -1 (Numerical result out of range)
> >
> > I'm trying to understand if gluster is bubbling up errors to the TSM client
> > (gluster fault), or reporting errors the TSM client is generating (TSM
> > fault).
> >
>
> Do you happen to see the same error reported by posix translator(s) in
> any of the brick(s)? Doing that might help in figuring out where the
> problem could be stemming from.
>
> As per man (2) getxattr, ERANGE is seen when the size of the value
> buffer is too small to hold the result. Would it be possible to strace
> the TSM client and see the size of the value buffer being passed?
> Also, doing an extended attribute dump of the file on the brick
> directory (either through attr or getfattr) can help in determining
> the size necessary to hold all attributes.
>
> HTH,
> Vijay
>
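As a rough illustration of the ERANGE behaviour described above (errno 34
in the TSM message), here is a minimal Python sketch of the usual "probe the
size, then read" pattern for getxattr(2). It is not TSM's or gluster's
actual code. The names make_fake_getxattr and read_xattr are hypothetical,
the fake call only mimics the documented semantics (size 0 returns the
required length, a too-small buffer fails with ERANGE), and the 300-byte
value in the usage lines is made up.

```python
import errno
import os
from typing import Callable

# Hypothetical stand-in for getxattr(2) on one file/attribute pair:
# it receives the caller's value buffer and returns the number of bytes used.
XattrCall = Callable[[bytearray], int]


def make_fake_getxattr(stored: bytes) -> XattrCall:
    """Build a fake getxattr that follows the documented size semantics."""

    def getxattr(buf: bytearray) -> int:
        if len(buf) == 0:
            # size == 0: report how many bytes the value needs.
            return len(stored)
        if len(buf) < len(stored):
            # Value buffer too small to hold the result.
            raise OSError(errno.ERANGE, os.strerror(errno.ERANGE))
        buf[:len(stored)] = stored
        return len(stored)

    return getxattr


def read_xattr(getxattr: XattrCall, max_attempts: int = 5) -> bytes:
    """Probe the required size first, then read into a buffer that fits."""
    for _ in range(max_attempts):
        needed = getxattr(bytearray(0))
        buf = bytearray(needed)
        try:
            used = getxattr(buf)
        except OSError as exc:
            if exc.errno == errno.ERANGE:
                continue  # value grew between the two calls; probe again
            raise
        return bytes(buf[:used])
    raise OSError(errno.ERANGE, os.strerror(errno.ERANGE))


# Usage: a fixed 255-byte buffer fails on a 300-byte value, the probe succeeds.
call = make_fake_getxattr(b"x" * 300)
try:
    call(bytearray(255))
except OSError as exc:
    print("fixed buffer failed with errno", exc.errno)
print("probed read returned", len(read_xattr(call)), "bytes")
```

If the strace shows the client passing a fixed-size value buffer that is
smaller than what the brick holds for the file, that would be consistent
with the first case in the usage lines.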