[Gluster-users] Stale file handle
Pat Haley
phaley at mit.edu
Fri Mar 13 16:10:15 UTC 2020
Hi All,
After performing Strahil's checks and poking around some more, we found
that the problem was the underlying filesystem reporting that it was full
when it wasn't. Following the information in the links below, we found
that mounting with 64-bit inodes fixed this problem.
https://serverfault.com/questions/357367/xfs-no-space-left-on-device-but-i-have-850gb-available
https://support.microfocus.com/kb/doc.php?id=7014318
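
For anyone hitting the same symptom, here is a minimal sketch of how one
might check for it. It assumes the "64-bit inodes" above refers to the XFS
inode64 mount option (the bricks in this thread are XFS); the exact command
used is not stated here. The helper names has_mount_option and
inode32_limit_tib are hypothetical. As I understand the XFS behaviour
described in the first link, with 32-bit inode numbers inodes can only be
placed in roughly the first 2^32 x inode-size bytes of the device, so mkdir
can return "No space left on device" once that region fills up, even though
df -h and df -i both show plenty free.

```shell
#!/bin/sh
# has_mount_option MOUNTPOINT OPTION [MOUNTS_FILE]
# Exit 0 if MOUNTPOINT appears in MOUNTS_FILE with OPTION among its options.
# MOUNTS_FILE defaults to /proc/mounts (fields: device mountpoint fstype options ...).
has_mount_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == opt) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "${3:-/proc/mounts}"
}

# inode32_limit_tib INODE_SIZE_BYTES
# Print the approximate size, in TiB, of the region that 32-bit inode
# numbers can address: 2^32 inodes times the inode size.
inode32_limit_tib() {
    echo $(( (1 << 32) * $1 / (1 << 40) ))
}

# Example usage (brick path from this thread):
#   has_mount_option /mnt/brick1 inode64 || echo "brick1 lacks inode64"
#   inode32_limit_tib 256    # prints 1
#   inode32_limit_tib 512    # prints 2
#
# Applying the option (as root). This is an assumption about the fix, not
# the exact command used here; depending on the kernel version a full
# umount/mount, with inode64 added to /etc/fstab, may be needed instead:
#   mount -o remount,inode64 /mnt/brick1
```

With 256-byte inodes the limit works out to about 1 TiB, which is tiny
compared with a 164T brick, so the low region can fill long before the
filesystem as a whole does.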
Thanks
Pat
On 3/12/20 4:24 PM, Strahil Nikolov wrote:
> On March 12, 2020 8:06:14 PM GMT+02:00, Pat Haley <phaley at mit.edu> wrote:
>> Hi
>>
>> Yesterday we seemed to clear an issue with erroneous "No space left on
>> device" messages
>> (https://lists.gluster.org/pipermail/gluster-users/2020-March/037848.html)
>>
>> I am now seeing "Stale file handle" messages coming from directories
>> I've just created.
>>
>> We are running gluster 3.7.11 in a distributed volume across 2 servers
>> (2 bricks each). For the "Stale file handle" for a newly created
>> directory, I've noticed that the directory does not appear in brick1
>> (it is in the other 3 bricks).
>>
>> In the cli.log on the server with brick1 I'm seeing messages like
>>
>> --------------------------------------------------------
>> [2020-03-12 17:21:36.596908] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11
>> [2020-03-12 17:21:36.604587] I [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not installed
>> [2020-03-12 17:21:36.605100] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>> [2020-03-12 17:21:36.605155] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
>> [2020-03-12 17:21:36.617433] I [input.c:36:cli_batch] 0-: Exiting with: 0
>> --------------------------------------------------------
>>
>> I'm not sure why I would be getting any geo-replication messages; we
>> aren't using replication. The cli.log on the other server is showing
>>
>> --------------------------------------------------------
>> [2020-03-12 17:27:08.172573] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11
>> [2020-03-12 17:27:08.302564] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>> [2020-03-12 17:27:08.302716] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now
>> [2020-03-12 17:27:08.304557] I [input.c:36:cli_batch] 0-: Exiting with: 0
>> --------------------------------------------------------
>>
>>
>> On the server with brick1, the etc-glusterfs-glusterd.vol.log is showing
>>
>> --------------------------------------------------------
>> [2020-03-12 17:21:25.925394] I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume
>> [2020-03-12 17:21:25.946240] W [MSGID: 106217] [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed uuid to hostname conversion
>> [2020-03-12 17:21:25.946282] W [MSGID: 106387] [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>> [2020-03-12 17:21:36.617090] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
>> [2020-03-12 17:21:15.577829] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
>> --------------------------------------------------------
>>
>> On the other server I'm seeing similar messages
>>
>> --------------------------------------------------------
>> [2020-03-12 17:26:57.024168] I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume
>> [2020-03-12 17:26:57.037269] W [MSGID: 106217] [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed uuid to hostname conversion
>> [2020-03-12 17:26:57.037299] W [MSGID: 106387] [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>> [2020-03-12 17:26:42.025200] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
>> [2020-03-12 17:27:08.304267] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
>> --------------------------------------------------------
>>
>> And I've just noticed that I'm again seeing "No space left on device"
>> in the logs of brick1 (although there is 3.5 TB free)
>>
>> --------------------------------------------------------
>> [2020-03-12 17:19:54.576597] E [MSGID: 113027] [posix.c:1427:posix_mkdir] 0-data-volume-posix: mkdir of /mnt/brick1/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001 failed [No space left on device]
>> [2020-03-12 17:19:54.576681] E [MSGID: 115056] [server-rpc-fops.c:512:server_mkdir_cbk] 0-data-volume-server: 5001698: MKDIR /projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001 (96e0b7e4-6b43-42ef-9896-86097b4208fe/ccfzR75deg_001) ==> (No space left on device) [No space left on device]
>> --------------------------------------------------------
>>
>> Any thoughts would be greatly appreciated. (Some additional
>> information below)
>>
>> Thanks
>>
>> Pat
>>
>> --------------------------------------------------------
>> server 1:
>> [root@mseas-data2 ~]# df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/sdb        164T  161T  3.5T  98% /mnt/brick2
>> /dev/sda        164T  159T  5.4T  97% /mnt/brick1
>>
>> [root@mseas-data2 ~]# df -i
>> Filesystem         Inodes    IUsed      IFree IUse% Mounted on
>> /dev/sdb       7031960320 31213790 7000746530    1% /mnt/brick2
>> /dev/sda       7031960320 28707456 7003252864    1% /mnt/brick1
>> --------------------------------------------------------
>>
>> --------------------------------------------------------
>> server 2:
>> [root@mseas-data3 ~]# df -h
>> Filesystem                     Size  Used Avail Use% Mounted on
>> /dev/sda                        91T   88T  3.9T  96% /export/sda/brick3
>> /dev/mapper/vg_Data4-lv_Data4   91T   89T  2.6T  98% /export/sdc/brick4
>>
>> [root@mseas-data3 glusterfs]# df -i
>> Filesystem                        Inodes    IUsed      IFree IUse% Mounted on
>> /dev/sda                      1953182464 10039172 1943143292    1% /export/sda/brick3
>> /dev/mapper/vg_Data4-lv_Data4 3906272768 11917222 3894355546    1% /export/sdc/brick4
>> --------------------------------------------------------
>>
>> --------------------------------------------------------
>> [root@mseas-data2 ~]# gluster volume info
>> --------------------------------------------------------
>> Volume Name: data-volume
>> Type: Distribute
>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>> Status: Started
>> Number of Bricks: 4
>> Transport-type: tcp
>> Bricks:
>> Brick1: mseas-data2:/mnt/brick1
>> Brick2: mseas-data2:/mnt/brick2
>> Brick3: mseas-data3:/export/sda/brick3
>> Brick4: mseas-data3:/export/sdc/brick4
>> Options Reconfigured:
>> cluster.min-free-disk: 1%
>> nfs.export-volumes: off
>> nfs.disable: on
>> performance.readdir-ahead: on
>> diagnostics.brick-sys-log-level: WARNING
>> nfs.exports-auth-enable: on
>> server.allow-insecure: on
>> auth.allow: *
>> disperse.eager-lock: off
>> performance.open-behind: off
>> performance.md-cache-timeout: 60
>> network.inode-lru-limit: 50000
>> diagnostics.client-log-level: ERROR
>>
>> --------------------------------------------------------
>> [root@mseas-data2 ~]# gluster volume status data-volume detail
>> --------------------------------------------------------
>> Status of volume: data-volume
>> ------------------------------------------------------------------------------
>> Brick : Brick mseas-data2:/mnt/brick1
>> TCP Port : 49154
>> RDMA Port : 0
>> Online : Y
>> Pid : 4601
>> File System : xfs
>> Device : /dev/sda
>> Mount Options : rw
>> Inode Size : 256
>> Disk Space Free : 5.4TB
>> Total Disk Space : 163.7TB
>> Inode Count : 7031960320
>> Free Inodes : 7003252864
>> ------------------------------------------------------------------------------
>> Brick : Brick mseas-data2:/mnt/brick2
>> TCP Port : 49155
>> RDMA Port : 0
>> Online : Y
>> Pid : 7949
>> File System : xfs
>> Device : /dev/sdb
>> Mount Options : rw
>> Inode Size : 256
>> Disk Space Free : 3.4TB
>> Total Disk Space : 163.7TB
>> Inode Count : 7031960320
>> Free Inodes : 7000746530
>> ------------------------------------------------------------------------------
>> Brick : Brick mseas-data3:/export/sda/brick3
>> TCP Port : 49153
>> RDMA Port : 0
>> Online : Y
>> Pid : 4650
>> File System : xfs
>> Device : /dev/sda
>> Mount Options : rw
>> Inode Size : 512
>> Disk Space Free : 3.9TB
>> Total Disk Space : 91.0TB
>> Inode Count : 1953182464
>> Free Inodes : 1943143292
>> ------------------------------------------------------------------------------
>> Brick : Brick mseas-data3:/export/sdc/brick4
>> TCP Port : 49154
>> RDMA Port : 0
>> Online : Y
>> Pid : 23772
>> File System : xfs
>> Device : /dev/mapper/vg_Data4-lv_Data4
>> Mount Options : rw
>> Inode Size : 256
>> Disk Space Free : 2.6TB
>> Total Disk Space : 90.9TB
>> Inode Count : 3906272768
>> Free Inodes : 3894355546
>>
>> --
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley Email: phaley at mit.edu
>> Center for Ocean Engineering Phone: (617) 253-6824
>> Dept. of Mechanical Engineering Fax: (617) 253-8125
>> MIT, Room 5-213 http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301
>>
> Hey Pat,
>
> The logs are not providing much information, but the following seems strange:
> 'Failed uuid to hostname conversion'
>
> Have you checked DNS resolution (both short name and FQDN)?
> Also, check that ntp/chrony is in sync on the systems, and check 'gluster peer status' on all nodes.
>
> Is it possible that the client is not reaching all bricks?
>
>
> P.S.: Consider increasing the log level, as the current level is not sufficient.
>
> Best Regards,
> Strahil Nikolov
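
The checks Strahil suggests above could be scripted roughly as follows.
This is only a sketch: resolves and disconnected_peers are hypothetical
helper names, and the parser assumes the usual Hostname:/Uuid:/State:
layout of 'gluster peer status' output. The hostnames and the volume name
data-volume are the ones from this thread.

```shell
#!/bin/sh
# resolves NAME...
# Succeed only if every NAME resolves through the system resolver (getent).
resolves() {
    for name in "$@"; do
        getent hosts "$name" >/dev/null || {
            echo "cannot resolve: $name" >&2
            return 1
        }
    done
}

# disconnected_peers
# Read 'gluster peer status' output on stdin and print the hostname of
# every peer whose State line does not end in "(Connected)".
disconnected_peers() {
    awk '
        /^Hostname:/ { host = $2 }
        /^State:/ && $0 !~ /\(Connected\)[ \t]*$/ { print host }
    '
}

# Example usage, run on each server:
#   resolves mseas-data2 mseas-data3        # add the FQDNs as well
#   gluster peer status | disconnected_peers
#   chronyc tracking                        # or ntpstat, depending on the time daemon
#
# Raising the log level (the volume info above shows client logging at ERROR):
#   gluster volume set data-volume diagnostics.brick-log-level DEBUG
#   gluster volume set data-volume diagnostics.client-log-level DEBUG
```

No output from disconnected_peers means every peer is reported as
connected. DEBUG logging is verbose, so it is worth setting the levels back
once the problem has been reproduced.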
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley Email: phaley at mit.edu
Center for Ocean Engineering Phone: (617) 253-6824
Dept. of Mechanical Engineering Fax: (617) 253-8125
MIT, Room 5-213 http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301