[Gluster-users] Stale file handle

Fri Mar 13 16:10:15 UTC 2020

Hi All,

After performing Strahil's checks and poking around some more, we found
that the problem was the underlying XFS filesystem reporting that it was
full when it wasn't. Following the information in the links below, we
found that mounting with 64-bit inodes (the XFS inode64 mount option)
fixed this problem.
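A rough illustration of the limit described in those links. When XFS allocates inodes with 32-bit inode numbers (the behaviour you get without inode64, which was the default on older kernels), the inode number encodes the inode's location on disk. New inodes can therefore only go in roughly the first 2^32 x (inode size) bytes of the device: about 1 TiB with 256-byte inodes and 2 TiB with 512-byte inodes. Once that region has no room left, mkdir and file creation can fail with "No space left on device" even though plenty of space remains elsewhere. The Python sketch below applies that rule of thumb to the sizes and inode sizes from `gluster volume status data-volume detail`. It ignores allocation-group rounding, so the numbers are approximate upper bounds, and it treats the reported "TB" figures as TiB (an assumption).

```python
"""Rough estimate of where XFS can place inodes with 32-bit inode numbers."""

TIB = 2 ** 40

# (total size in TiB, inode size in bytes) per brick, copied from
# `gluster volume status data-volume detail`; "TB" is treated as TiB here.
BRICK_SIZES = {
    "mseas-data2:/mnt/brick1": (163.7, 256),
    "mseas-data2:/mnt/brick2": (163.7, 256),
    "mseas-data3:/export/sda/brick3": (91.0, 512),
    "mseas-data3:/export/sdc/brick4": (90.9, 256),
}


def inode32_region_bytes(inode_size: int) -> int:
    """Approximate size of the region that 32-bit inode numbers can address.

    Rule of thumb: 2**32 inode numbers, each standing for one inode-sized
    slot on disk. Allocation-group rounding makes the real limit somewhat
    smaller, so treat this as an upper bound.
    """
    return (2 ** 32) * inode_size


def report(bricks):
    """Return (name, inode32 region in TiB, fraction of the brick) per brick."""
    rows = []
    for name, (size_tib, inode_size) in bricks.items():
        region_tib = inode32_region_bytes(inode_size) / TIB
        rows.append((name, region_tib, region_tib / size_tib))
    return rows


if __name__ == "__main__":
    for name, region_tib, fraction in report(BRICK_SIZES):
        print(f"{name}: inodes limited to the first ~{region_tib:.0f} TiB "
              f"({fraction:.1%} of the brick) without inode64")
```

For brick1 this works out to about 1 TiB of a 163.7 TB filesystem (roughly 0.6%), which is consistent with mkdir failing there while df still shows terabytes free.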

https://serverfault.com/questions/357367/xfs-no-space-left-on-device-but-i-have-850gb-available

https://support.microfocus.com/kb/doc.php?id=7014318

Thanks

Pat

On 3/12/20 4:24 PM, Strahil Nikolov wrote:
> On March 12, 2020 8:06:14 PM GMT+02:00, Pat Haley <phaley at mit.edu> wrote:
>> Hi
>>
>> Yesterday we seemed to clear an issue with erroneous "No space left on
>> device" messages
>> (https://lists.gluster.org/pipermail/gluster-users/2020-March/037848.html).
>>
>> I am now seeing "Stale file handle" messages coming from directories
>> I've just created.
>>
>> We are running gluster 3.7.11 in a distributed volume across 2 servers
>> (2 bricks each). For a newly created directory that gives "Stale file
>> handle", I've noticed that the directory does not exist on brick1 (it
>> is present on the other 3 bricks).
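One way to confirm which bricks are missing a directory is to look at each brick's backend path directly. A minimal read-only Python sketch, assuming it is run on each server with that server's own brick roots (for example /mnt/brick1 and /mnt/brick2 on mseas-data2). `missing_on_bricks` is a hypothetical helper, not a Gluster tool, and it only inspects paths; nothing should be created or modified on the bricks directly.

```python
from pathlib import Path
from typing import Iterable, List


def missing_on_bricks(brick_roots: Iterable[str], volume_path: str) -> List[str]:
    """Return the brick roots on this server where volume_path is not a directory.

    volume_path is the path as seen from the volume root, e.g.
    "/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001".
    The check is read-only: it only stats paths under each brick root.
    """
    rel = Path(volume_path.lstrip("/"))
    return [root for root in brick_roots if not (Path(root) / rel).is_dir()]


if __name__ == "__main__":
    # Brick roots for mseas-data2 as listed in `gluster volume info`.
    roots = ["/mnt/brick1", "/mnt/brick2"]
    target = "/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001"
    for root in missing_on_bricks(roots, target):
        print(f"missing on {root}")
```

Running the same check on mseas-data3 with /export/sda/brick3 and /export/sdc/brick4 covers the other two bricks.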
>>
>> In the cli.log on the server with brick1 I'm seeing messages like
>>
>> --------------------------------------------------------
>> [2020-03-12 17:21:36.596908] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11
>> [2020-03-12 17:21:36.604587] I [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not installed
>> [2020-03-12 17:21:36.605100] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>> [2020-03-12 17:21:36.605155] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
>> [2020-03-12 17:21:36.617433] I [input.c:36:cli_batch] 0-: Exiting with: 0
>> --------------------------------------------------------
>>
>> I'm not sure why I would be getting any geo-replication messages; we
>> aren't using any replication. The cli.log on the other server shows:
>>
>> --------------------------------------------------------
>> [2020-03-12 17:27:08.172573] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11
>> [2020-03-12 17:27:08.302564] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>> [2020-03-12 17:27:08.302716] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
>> [2020-03-12 17:27:08.304557] I [input.c:36:cli_batch] 0-: Exiting with: 0
>> --------------------------------------------------------
>>
>>
>> On the server with brick1, etc-glusterfs-glusterd.vol.log shows:
>>
>> --------------------------------------------------------
>> [2020-03-12 17:21:25.925394] I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume
>> [2020-03-12 17:21:25.946240] W [MSGID: 106217] [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed uuid to hostname conversion
>> [2020-03-12 17:21:25.946282] W [MSGID: 106387] [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>> [2020-03-12 17:21:36.617090] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
>> [2020-03-12 17:21:15.577829] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
>> --------------------------------------------------------
>>
>> On the other server I'm seeing similar messages:
>>
>> --------------------------------------------------------
>> [2020-03-12 17:26:57.024168] I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume
>> [2020-03-12 17:26:57.037269] W [MSGID: 106217] [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed uuid to hostname conversion
>> [2020-03-12 17:26:57.037299] W [MSGID: 106387] [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>> [2020-03-12 17:26:42.025200] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
>> [2020-03-12 17:27:08.304267] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
>> --------------------------------------------------------
>>
>> And I've just noticed that I'm again seeing "No space left on device"
>> in the logs of brick1 (although df reports 5.4 TB free on that brick):
>>
>> --------------------------------------------------------
>> [2020-03-12 17:19:54.576597] E [MSGID: 113027] [posix.c:1427:posix_mkdir] 0-data-volume-posix: mkdir of /mnt/brick1/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001 failed [No space left on device]
>> [2020-03-12 17:19:54.576681] E [MSGID: 115056] [server-rpc-fops.c:512:server_mkdir_cbk] 0-data-volume-server: 5001698: MKDIR /projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001 (96e0b7e4-6b43-42ef-9896-86097b4208fe/ccfzR75deg_001) ==> (No space left on device) [No space left on device]
>> --------------------------------------------------------
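With logging turned down on this volume (diagnostics.client-log-level is ERROR), it can help to pull only the warnings and errors out of the glusterd and brick logs when comparing the two servers. A minimal Python sketch, assuming each log entry occupies a single line in the log file and follows the layout seen in these excerpts (`[timestamp] LEVEL [MSGID: n] [file:line:function] domain: message`, with the MSGID part optional). The regex and the helper names `parse_line` and `entries_at_levels` are illustrative, not a Gluster API.

```python
import re
from typing import Dict, Iterable, Iterator, Optional

# Assumed single-line layout, taken from the log excerpts in this thread.
LOG_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"                  # one-letter severity: I, W, E, ...
    r"(?:\[MSGID: (?P<msgid>\d+)\]\s+)?"    # MSGID is absent on some entries
    r"\[(?P<src>[^\]]+)\]\s+"               # file.c:line:function
    r"(?P<domain>\S+?):\s+"                 # e.g. 0-management, 0-data-volume-posix
    r"(?P<msg>.*)$"
)


def parse_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split one log entry into named fields, or return None if it does not match."""
    m = LOG_RE.match(line.rstrip("\n"))
    return m.groupdict() if m else None


def entries_at_levels(lines: Iterable[str],
                      levels=("W", "E")) -> Iterator[Dict[str, Optional[str]]]:
    """Yield parsed entries whose one-letter severity is in `levels`."""
    for line in lines:
        rec = parse_line(line)
        if rec is not None and rec["level"] in levels:
            yield rec


if __name__ == "__main__":
    sample = [
        "[2020-03-12 17:21:25.946240] W [MSGID: 106217] "
        "[glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: "
        "Failed uuid to hostname conversion",
        "[2020-03-12 17:21:36.617090] I [MSGID: 106487] "
        "[glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: "
        "Received cli list req",
    ]
    for rec in entries_at_levels(sample):
        print(rec["ts"], rec["level"], rec["msgid"], rec["domain"], rec["msg"])
```

To scan a real log instead of the sample list, pass an open file object (for example `entries_at_levels(open(path))`), since iterating a file yields one line at a time.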
>>
>> Any thoughts would be greatly appreciated. (Some additional information
>> is below.)
>>
>> Thanks
>>
>> Pat
>>
>> --------------------------------------------------------
>> server 1:
>> [root@mseas-data2 ~]# df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/sdb        164T  161T  3.5T  98% /mnt/brick2
>> /dev/sda        164T  159T  5.4T  97% /mnt/brick1
>>
>> [root@mseas-data2 ~]# df -i
>> Filesystem         Inodes    IUsed      IFree IUse% Mounted on
>> /dev/sdb       7031960320 31213790 7000746530    1% /mnt/brick2
>> /dev/sda       7031960320 28707456 7003252864    1% /mnt/brick1
>> --------------------------------------------------------
>>
>> --------------------------------------------------------
>> server 2:
>> [root@mseas-data3 ~]# df -h
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/sda               91T   88T  3.9T  96% /export/sda/brick3
>> /dev/mapper/vg_Data4-lv_Data4
>>                         91T   89T  2.6T  98% /export/sdc/brick4
>>
>> [root@mseas-data3 glusterfs]# df -i
>> Filesystem               Inodes    IUsed      IFree IUse% Mounted on
>> /dev/sda             1953182464 10039172 1943143292    1% /export/sda/brick3
>> /dev/mapper/vg_Data4-lv_Data4
>>                      3906272768 11917222 3894355546    1% /export/sdc/brick4
>> --------------------------------------------------------
>>
>> --------------------------------------------------------
>> [root@mseas-data2 ~]# gluster volume info
>> --------------------------------------------------------
>> Volume Name: data-volume
>> Type: Distribute
>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>> Status: Started
>> Number of Bricks: 4
>> Transport-type: tcp
>> Bricks:
>> Brick1: mseas-data2:/mnt/brick1
>> Brick2: mseas-data2:/mnt/brick2
>> Brick3: mseas-data3:/export/sda/brick3
>> Brick4: mseas-data3:/export/sdc/brick4
>> Options Reconfigured:
>> cluster.min-free-disk: 1%
>> nfs.export-volumes: off
>> nfs.disable: on
>> performance.readdir-ahead: on
>> diagnostics.brick-sys-log-level: WARNING
>> nfs.exports-auth-enable: on
>> server.allow-insecure: on
>> auth.allow: *
>> disperse.eager-lock: off
>> performance.open-behind: off
>> performance.md-cache-timeout: 60
>> network.inode-lru-limit: 50000
>> diagnostics.client-log-level: ERROR
>>
>> --------------------------------------------------------
>> [root@mseas-data2 ~]# gluster volume status data-volume detail
>> --------------------------------------------------------
>> Status of volume: data-volume
>> ------------------------------------------------------------------------------
>> Brick                : Brick mseas-data2:/mnt/brick1
>> TCP Port             : 49154
>> RDMA Port            : 0
>> Online               : Y
>> Pid                  : 4601
>> File System          : xfs
>> Device               : /dev/sda
>> Mount Options        : rw
>> Inode Size           : 256
>> Disk Space Free      : 5.4TB
>> Total Disk Space     : 163.7TB
>> Inode Count          : 7031960320
>> Free Inodes          : 7003252864
>> ------------------------------------------------------------------------------
>> Brick                : Brick mseas-data2:/mnt/brick2
>> TCP Port             : 49155
>> RDMA Port            : 0
>> Online               : Y
>> Pid                  : 7949
>> File System          : xfs
>> Device               : /dev/sdb
>> Mount Options        : rw
>> Inode Size           : 256
>> Disk Space Free      : 3.4TB
>> Total Disk Space     : 163.7TB
>> Inode Count          : 7031960320
>> Free Inodes          : 7000746530
>> ------------------------------------------------------------------------------
>> Brick                : Brick mseas-data3:/export/sda/brick3
>> TCP Port             : 49153
>> RDMA Port            : 0
>> Online               : Y
>> Pid                  : 4650
>> File System          : xfs
>> Device               : /dev/sda
>> Mount Options        : rw
>> Inode Size           : 512
>> Disk Space Free      : 3.9TB
>> Total Disk Space     : 91.0TB
>> Inode Count          : 1953182464
>> Free Inodes          : 1943143292
>> ------------------------------------------------------------------------------
>> Brick                : Brick mseas-data3:/export/sdc/brick4
>> TCP Port             : 49154
>> RDMA Port            : 0
>> Online               : Y
>> Pid                  : 23772
>> File System          : xfs
>> Device               : /dev/mapper/vg_Data4-lv_Data4
>> Mount Options        : rw
>> Inode Size           : 256
>> Disk Space Free      : 2.6TB
>> Total Disk Space     : 90.9TB
>> Inode Count          : 3906272768
>> Free Inodes          : 3894355546
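A quick cross-check of the figures above: compare each brick's free space and inode usage from `gluster volume status data-volume detail` against the `cluster.min-free-disk: 1%` setting from `gluster volume info`. A minimal Python sketch; the numbers are copied by hand from the output above, and the `Brick` class and `headroom` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# cluster.min-free-disk as reported by `gluster volume info` for data-volume.
MIN_FREE_DISK_PCT = 1.0


@dataclass
class Brick:
    name: str
    free_tb: float      # "Disk Space Free"
    total_tb: float     # "Total Disk Space"
    inode_count: int    # "Inode Count"
    free_inodes: int    # "Free Inodes"

    @property
    def free_pct(self) -> float:
        return 100.0 * self.free_tb / self.total_tb

    @property
    def inodes_used_pct(self) -> float:
        return 100.0 * (self.inode_count - self.free_inodes) / self.inode_count


# Figures copied from `gluster volume status data-volume detail`.
BRICKS = [
    Brick("mseas-data2:/mnt/brick1", 5.4, 163.7, 7031960320, 7003252864),
    Brick("mseas-data2:/mnt/brick2", 3.4, 163.7, 7031960320, 7000746530),
    Brick("mseas-data3:/export/sda/brick3", 3.9, 91.0, 1953182464, 1943143292),
    Brick("mseas-data3:/export/sdc/brick4", 2.6, 90.9, 3906272768, 3894355546),
]


def headroom(bricks: List[Brick],
             min_free_pct: float = MIN_FREE_DISK_PCT) -> List[Tuple[str, float, float, bool]]:
    """Return (name, free %, inodes used %, at or above min-free-disk?) per brick."""
    return [
        (b.name, round(b.free_pct, 2), round(b.inodes_used_pct, 2),
         b.free_pct >= min_free_pct)
        for b in bricks
    ]


if __name__ == "__main__":
    for name, free_pct, inode_pct, ok in headroom(BRICKS):
        print(f"{name}: {free_pct}% free, {inode_pct}% inodes used, "
              f"above min-free-disk: {ok}")
```

On these figures every brick has between roughly 2% and 4.3% free, above the 1% threshold, and well under 1% of its inodes in use, so neither ordinary space exhaustion nor inode-count exhaustion explains the ENOSPC from brick1.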
>>
>> -- 
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley                          Email:  phaley at mit.edu
>> Center for Ocean Engineering       Phone:  (617) 253-6824
>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA  02139-4301
>>
> Hey Pat,
>
> The logs are not providing much information, but the following seems strange:
> 'Failed uuid to hostname conversion'
>
> Have you checked DNS resolution (both short name and FQDN)?
> Also, check that NTP/chrony is in sync on all systems, and check 'gluster peer status' on all nodes.
>
> Is it possible that the client is not reaching all bricks?
>
>
> P.S.: Consider increasing the log level, as the current level (diagnostics.client-log-level: ERROR) is not sufficient.
>
> Best Regards,
> Strahil Nikolov
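Strahil's first check, that short names and FQDNs resolve consistently on every node (which he raises alongside the 'Failed uuid to hostname conversion' warning), can be scripted. A minimal sketch using only Python's standard socket module. The FQDN is derived with socket.getfqdn because the thread does not give the domain, socket.gethostbyname returns IPv4 addresses only, and `check_host` is a hypothetical helper name.

```python
import socket
from typing import Dict, Optional


def resolve(name: str) -> Optional[str]:
    """Return the IPv4 address for name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return None


def check_host(short_name: str) -> Dict[str, object]:
    """Resolve a short host name and its FQDN and report whether they agree."""
    fqdn = socket.getfqdn(short_name)  # returns the name unchanged if no FQDN is found
    short_addr = resolve(short_name)
    fqdn_addr = resolve(fqdn)
    return {
        "short": short_name,
        "fqdn": fqdn,
        "short_addr": short_addr,
        "fqdn_addr": fqdn_addr,
        "consistent": short_addr is not None and short_addr == fqdn_addr,
    }
```

Run it on each server, for example `check_host("mseas-data2")` and `check_host("mseas-data3")`, and compare the results across nodes. Time sync and `gluster peer status` still need to be checked separately.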
