[Gluster-users] Stale file handle

Amar Tumballi amar at kadalu.io
Sat Mar 14 03:28:43 UTC 2020


Thanks for the update!

On Fri, 13 Mar, 2020, 9:40 PM Pat Haley, <phaley at mit.edu> wrote:

>
> Hi All,
>
> After performing Strahil's checks and poking around some more, we found
> that the problem was the underlying filesystem thinking it was full when
> it wasn't. Following the information in the links below, we found that
> mounting with 64-bit inodes fixed this problem.
>
>
> https://serverfault.com/questions/357367/xfs-no-space-left-on-device-but-i-have-850gb-available
>
> https://support.microfocus.com/kb/doc.php?id=7014318
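>
> For anyone hitting the same thing, a minimal sketch of that change
> (assuming XFS bricks; on older kernels inode64 cannot be changed by a
> remount, so a full unmount/mount may be needed):
>
>     # remount the brick filesystem with 64-bit inode allocation
>     mount -o remount,inode64 /mnt/brick1
>
>     # make it persistent across reboots, e.g. in /etc/fstab:
>     # /dev/sda  /mnt/brick1  xfs  defaults,inode64  0 0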
>
> Thanks
>
> Pat
>
>
> On 3/12/20 4:24 PM, Strahil Nikolov wrote:
> > On March 12, 2020 8:06:14 PM GMT+02:00, Pat Haley <phaley at mit.edu> wrote:
> >> Hi
> >>
> >> Yesterday we seemed to clear an issue with erroneous "No space left on
> >> device" messages
> >> (https://lists.gluster.org/pipermail/gluster-users/2020-March/037848.html)
> >>
> >> I am now seeing "Stale file handle" messages coming from directories
> >> I've just created.
> >>
> >> We are running gluster 3.7.11 in a distributed volume across 2 servers
> >> (2 bricks each). For a newly created directory that returns "Stale file
> >> handle", I've noticed that the directory does not appear on brick1 (it
> >> is on the other 3 bricks).
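> >>
> >> A rough sketch of how one might compare that directory across the bricks
> >> (using the path from the mkdir error further down as an example; getfattr
> >> comes from the attr package):
> >>
> >>     # on mseas-data2: is the directory present on each local brick?
> >>     ls -ld /mnt/brick1/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001
> >>     ls -ld /mnt/brick2/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001
> >>
> >>     # where it exists, the gfid and dht layout xattrs should agree
> >>     getfattr -d -m . -e hex \
> >>         /mnt/brick2/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001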
> >>
> >> In the cli.log on the server with brick1 I'm seeing messages like
> >>
> >> --------------------------------------------------------
> >> [2020-03-12 17:21:36.596908] I [cli.c:721:main] 0-cli: Started running
> >> gluster with version 3.7.11
> >> [2020-03-12 17:21:36.604587] I
> >> [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not
> >> installed
> >> [2020-03-12 17:21:36.605100] I [MSGID: 101190]
> >> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
> >> with index 1
> >> [2020-03-12 17:21:36.605155] I [socket.c:2356:socket_event_handler]
> >> 0-transport: disconnecting now
> >> [2020-03-12 17:21:36.617433] I [input.c:36:cli_batch] 0-: Exiting with:
> >> 0
> >> --------------------------------------------------------
> >>
> >> I'm not sure why I would be getting any geo-replication messages; we
> >> aren't using replication. The cli.log on the other server is showing
> >>
> >> --------------------------------------------------------
> >> [2020-03-12 17:27:08.172573] I [cli.c:721:main] 0-cli: Started running
> >> gluster with version 3.7.11
> >> [2020-03-12 17:27:08.302564] I [MSGID: 101190]
> >> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
> >> with index 1
> >> [2020-03-12 17:27:08.302716] I [socket.c:2356:socket_event_handler]
> >> 0-transport: disconnecting now
> >> [2020-03-12 17:27:08.304557] I [input.c:36:cli_batch] 0-: Exiting with:
> >> 0
> >> --------------------------------------------------------
> >>
> >>
> >> On the server with brick1, the etc-glusterfs-glusterd.vol.log is
> >> showing
> >>
> >> --------------------------------------------------------
> >> [2020-03-12 17:21:25.925394] I [MSGID: 106499]
> >> [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management:
> >> Received status volume req for volume data-volume
> >> [2020-03-12 17:21:25.946240] W [MSGID: 106217]
> >> [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed
> >> uuid to hostname conversion
> >> [2020-03-12 17:21:25.946282] W [MSGID: 106387]
> >> [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx
> >> modification failed
> >> [2020-03-12 17:21:36.617090] I [MSGID: 106487]
> >> [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd:
> >> Received cli list req
> >> [2020-03-12 17:21:15.577829] I [MSGID: 106488]
> >> [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd:
> >> Received get vol req
> >> --------------------------------------------------------
> >>
> >> On the other server I'm seeing similar messages
> >>
> >> --------------------------------------------------------
> >> [2020-03-12 17:26:57.024168] I [MSGID: 106499]
> >> [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management:
> >> Received status volume req for volume data-volume
> >> [2020-03-12 17:26:57.037269] W [MSGID: 106217]
> >> [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed
> >> uuid to hostname conversion
> >> [2020-03-12 17:26:57.037299] W [MSGID: 106387]
> >> [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx
> >> modification failed
> >> [2020-03-12 17:26:42.025200] I [MSGID: 106488]
> >> [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd:
> >> Received get vol req
> >> [2020-03-12 17:27:08.304267] I [MSGID: 106487]
> >> [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd:
> >> Received cli list req
> >> --------------------------------------------------------
> >>
> >> And I've just noticed that I'm again seeing "No space left on device"
> >> in
> >> the logs of brick1 (although there is 3.5 TB free)
> >>
> >> --------------------------------------------------------
> >> [2020-03-12 17:19:54.576597] E [MSGID: 113027]
> >> [posix.c:1427:posix_mkdir] 0-data-volume-posix: mkdir of
> >> /mnt/brick1/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001
> >> failed [No space left on device]
> >> [2020-03-12 17:19:54.576681] E [MSGID: 115056]
> >> [server-rpc-fops.c:512:server_mkdir_cbk] 0-data-volume-server: 5001698:
> >> MKDIR /projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001
> >> (96e0b7e4-6b43-42ef-9896-86097b4208fe/ccfzR75deg_001) ==> (No space left
> >> on device) [No space left on device]
> >> --------------------------------------------------------
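> >>
> >> For reference, a quick way to check whether this is the 32-bit inode
> >> allocation limit rather than an actual lack of blocks (assuming XFS and
> >> stock xfsprogs):
> >>
> >>     # is the brick mounted with inode64?
> >>     grep brick1 /proc/mounts
> >>
> >>     # filesystem geometry; without inode64, new inodes can only be
> >>     # allocated low in the device, which can return ENOSPC even
> >>     # though df shows terabytes free
> >>     xfs_info /mnt/brick1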
> >>
> >> Any thoughts would be greatly appreciated.  (Some additional
> >> information
> >> below)
> >>
> >> Thanks
> >>
> >> Pat
> >>
> >> --------------------------------------------------------
> >> server 1:
> >> [root at mseas-data2 ~]# df -h
> >> Filesystem      Size  Used Avail Use% Mounted on
> >> /dev/sdb        164T  161T  3.5T  98% /mnt/brick2
> >> /dev/sda        164T  159T  5.4T  97% /mnt/brick1
> >>
> >> [root at mseas-data2 ~]# df -i
> >> Filesystem         Inodes    IUsed      IFree IUse% Mounted on
> >> /dev/sdb       7031960320 31213790 7000746530    1% /mnt/brick2
> >> /dev/sda       7031960320 28707456 7003252864    1% /mnt/brick1
> >> --------------------------------------------------------
> >>
> >> --------------------------------------------------------
> >> server 2:
> >> [root at mseas-data3 ~]# df -h
> >> Filesystem            Size  Used Avail Use% Mounted on
> >> /dev/sda               91T   88T  3.9T  96% /export/sda/brick3
> >> /dev/mapper/vg_Data4-lv_Data4
> >>                         91T   89T  2.6T  98% /export/sdc/brick4
> >>
> >> [root at mseas-data3 glusterfs]# df -i
> >> Filesystem               Inodes    IUsed      IFree IUse% Mounted on
> >> /dev/sda             1953182464 10039172 1943143292    1%
> >> /export/sda/brick3
> >> /dev/mapper/vg_Data4-lv_Data4
> >>                       3906272768 11917222 3894355546    1%
> >> /export/sdc/brick4
> >> --------------------------------------------------------
> >>
> >> --------------------------------------------------------
> >> [root at mseas-data2 ~]# gluster volume info
> >> --------------------------------------------------------
> >> Volume Name: data-volume
> >> Type: Distribute
> >> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >> Status: Started
> >> Number of Bricks: 4
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: mseas-data2:/mnt/brick1
> >> Brick2: mseas-data2:/mnt/brick2
> >> Brick3: mseas-data3:/export/sda/brick3
> >> Brick4: mseas-data3:/export/sdc/brick4
> >> Options Reconfigured:
> >> cluster.min-free-disk: 1%
> >> nfs.export-volumes: off
> >> nfs.disable: on
> >> performance.readdir-ahead: on
> >> diagnostics.brick-sys-log-level: WARNING
> >> nfs.exports-auth-enable: on
> >> server.allow-insecure: on
> >> auth.allow: *
> >> disperse.eager-lock: off
> >> performance.open-behind: off
> >> performance.md-cache-timeout: 60
> >> network.inode-lru-limit: 50000
> >> diagnostics.client-log-level: ERROR
> >>
> >> --------------------------------------------------------
> >> [root at mseas-data2 ~]# gluster volume status data-volume detail
> >> --------------------------------------------------------
> >> Status of volume: data-volume
> >>
> >> ------------------------------------------------------------------------------
> >> Brick                : Brick mseas-data2:/mnt/brick1
> >> TCP Port             : 49154
> >> RDMA Port            : 0
> >> Online               : Y
> >> Pid                  : 4601
> >> File System          : xfs
> >> Device               : /dev/sda
> >> Mount Options        : rw
> >> Inode Size           : 256
> >> Disk Space Free      : 5.4TB
> >> Total Disk Space     : 163.7TB
> >> Inode Count          : 7031960320
> >> Free Inodes          : 7003252864
> >>
> >> ------------------------------------------------------------------------------
> >> Brick                : Brick mseas-data2:/mnt/brick2
> >> TCP Port             : 49155
> >> RDMA Port            : 0
> >> Online               : Y
> >> Pid                  : 7949
> >> File System          : xfs
> >> Device               : /dev/sdb
> >> Mount Options        : rw
> >> Inode Size           : 256
> >> Disk Space Free      : 3.4TB
> >> Total Disk Space     : 163.7TB
> >> Inode Count          : 7031960320
> >> Free Inodes          : 7000746530
> >>
> >> ------------------------------------------------------------------------------
> >> Brick                : Brick mseas-data3:/export/sda/brick3
> >> TCP Port             : 49153
> >> RDMA Port            : 0
> >> Online               : Y
> >> Pid                  : 4650
> >> File System          : xfs
> >> Device               : /dev/sda
> >> Mount Options        : rw
> >> Inode Size           : 512
> >> Disk Space Free      : 3.9TB
> >> Total Disk Space     : 91.0TB
> >> Inode Count          : 1953182464
> >> Free Inodes          : 1943143292
> >>
> >> ------------------------------------------------------------------------------
> >> Brick                : Brick mseas-data3:/export/sdc/brick4
> >> TCP Port             : 49154
> >> RDMA Port            : 0
> >> Online               : Y
> >> Pid                  : 23772
> >> File System          : xfs
> >> Device               : /dev/mapper/vg_Data4-lv_Data4
> >> Mount Options        : rw
> >> Inode Size           : 256
> >> Disk Space Free      : 2.6TB
> >> Total Disk Space     : 90.9TB
> >> Inode Count          : 3906272768
> >> Free Inodes          : 3894355546
> >>
> >> --
> >>
> >> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> >> Pat Haley                          Email:  phaley at mit.edu
> >> Center for Ocean Engineering       Phone:  (617) 253-6824
> >> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> >> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> >> 77 Massachusetts Avenue
> >> Cambridge, MA  02139-4301
> >>
> > Hey Pat,
> >
> > The logs are not providing much information, but the following seems
> > strange:
> > 'Failed uuid to hostname conversion'
> >
> > Have you checked DNS resolution (both short name and FQDN)?
> > Also, check that ntp/chrony is in sync on all systems, and check
> > 'gluster peer status' on all nodes.
> >
> > Is it possible that the client is not reaching all bricks?
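> >
> > Something along these lines should cover those checks (hostnames are the
> > ones from 'gluster volume info'; use whichever time daemon is installed):
> >
> >     # name resolution for each peer -- repeat with the FQDNs as well
> >     getent hosts mseas-data2 mseas-data3
> >
> >     # time sync, depending on the daemon in use
> >     chronyc tracking
> >     ntpstat
> >
> >     # peer and brick health as gluster sees it
> >     gluster peer status
> >     gluster volume status data-volume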
> >
> >
> > P.S.: Consider increasing the log level, as the current level is not
> > sufficient.
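> >
> > For example (a sketch; DEBUG is very verbose, so revert once the issue
> > is captured):
> >
> >     gluster volume set data-volume diagnostics.client-log-level DEBUG
> >     gluster volume set data-volume diagnostics.brick-log-level INFO
> >
> >     # put the client log level back afterwards
> >     gluster volume set data-volume diagnostics.client-log-level ERROR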
> >
> > Best Regards,
> > Strahil Nikolov
>
> --
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Pat Haley                          Email:  phaley at mit.edu
> Center for Ocean Engineering       Phone:  (617) 253-6824
> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> 77 Massachusetts Avenue
> Cambridge, MA  02139-4301
>
> ________
>
>
>
> Community Meeting Calendar:
>
> Schedule -
> Every Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://bluejeans.com/441850968
>
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>