<div dir="auto">Thanks for the update! </div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 13 Mar, 2020, 9:40 PM Pat Haley, <<a href="mailto:phaley@mit.edu">phaley@mit.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Hi All,<br>
<br>
After performing Strahil's checks and poking around some more, we found <br>
that the problem was the underlying XFS filesystem reporting itself as full <br>
when it actually wasn't. Following the information in the links below, we <br>
found that mounting the bricks with 64-bit inodes (the XFS inode64 mount <br>
option) fixed the problem.<br>
<br>
<a href="https://serverfault.com/questions/357367/xfs-no-space-left-on-device-but-i-have-850gb-available" rel="noreferrer noreferrer" target="_blank">https://serverfault.com/questions/357367/xfs-no-space-left-on-device-but-i-have-850gb-available</a><br>
<br>
<a href="https://support.microfocus.com/kb/doc.php?id=7014318" rel="noreferrer noreferrer" target="_blank">https://support.microfocus.com/kb/doc.php?id=7014318</a><br>
<br>
Thanks<br>
<br>
Pat<br>
<br>
<br>
On 3/12/20 4:24 PM, Strahil Nikolov wrote:<br>
> On March 12, 2020 8:06:14 PM GMT+02:00, Pat Haley <<a href="mailto:phaley@mit.edu" target="_blank" rel="noreferrer">phaley@mit.edu</a>> wrote:<br>
>> Hi<br>
>><br>
>> Yesterday we seemed to clear an issue with erroneous "No space left on<br>
>> device" messages<br>
>> (<a href="https://lists.gluster.org/pipermail/gluster-users/2020-March/037848.html" rel="noreferrer noreferrer" target="_blank">https://lists.gluster.org/pipermail/gluster-users/2020-March/037848.html</a>)<br>
>><br>
>> I am now seeing "Stale file handle" messages coming from directories<br>
>> I've just created.<br>
>><br>
>> We are running gluster 3.7.11 in a distributed volume across 2 servers<br>
>> (2 bricks each). For the "Stale file handle" error on a newly created<br>
>> directory, I've noticed that the directory does not appear on brick1<br>
>> (it is on the other 3 bricks).<br>
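<br>
(A minimal sketch of how this can be confirmed directly on the bricks; the path <br>
projects/some/new/dir is a hypothetical placeholder for a directory that returns <br>
the stale handle, and the brick paths are the ones listed in the volume info below:)<br>
<br>
--------------------------------------------------------<br>
# on mseas-data2: does the directory exist on both local bricks?<br>
ls -ld /mnt/brick1/projects/some/new/dir /mnt/brick2/projects/some/new/dir<br>
<br>
# on mseas-data3: same check for the other two bricks<br>
ls -ld /export/sda/brick3/projects/some/new/dir /export/sdc/brick4/projects/some/new/dir<br>
<br>
# where the directory does exist, compare the gfid assigned on each brick<br>
getfattr -d -m . -e hex /mnt/brick2/projects/some/new/dir<br>
--------------------------------------------------------<br>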
>><br>
>> In the cli.log on the server with brick1 I'm seeing messages like<br>
>><br>
>> --------------------------------------------------------<br>
>> [2020-03-12 17:21:36.596908] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11<br>
>> [2020-03-12 17:21:36.604587] I [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not installed<br>
>> [2020-03-12 17:21:36.605100] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1<br>
>> [2020-03-12 17:21:36.605155] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now<br>
>> [2020-03-12 17:21:36.617433] I [input.c:36:cli_batch] 0-: Exiting with: 0<br>
>> --------------------------------------------------------<br>
>><br>
>> I'm not sure why I would be getting any geo-replication messages; we<br>
>> aren't using replication. The cli.log on the other server is showing<br>
>><br>
>> --------------------------------------------------------<br>
>> [2020-03-12 17:27:08.172573] I [cli.c:721:main] 0-cli: Started running gluster with version 3.7.11<br>
>> [2020-03-12 17:27:08.302564] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1<br>
>> [2020-03-12 17:27:08.302716] I [socket.c:2356:socket_event_handler] 0-transport: disconnecting now<br>
>> [2020-03-12 17:27:08.304557] I [input.c:36:cli_batch] 0-: Exiting with: 0<br>
>> --------------------------------------------------------<br>
>><br>
>><br>
>> On the server with brick1, the etc-glusterfs-glusterd.vol.log is<br>
>> showing<br>
>><br>
>> --------------------------------------------------------<br>
>> [2020-03-12 17:21:25.925394] I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume<br>
>> [2020-03-12 17:21:25.946240] W [MSGID: 106217] [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed uuid to hostname conversion<br>
>> [2020-03-12 17:21:25.946282] W [MSGID: 106387] [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed<br>
>> [2020-03-12 17:21:36.617090] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req<br>
>> [2020-03-12 17:21:15.577829] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req<br>
>> --------------------------------------------------------<br>
>><br>
>> On the other server I'm seeing similar messages<br>
>><br>
>> --------------------------------------------------------<br>
>> [2020-03-12 17:26:57.024168] I [MSGID: 106499] [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume<br>
>> [2020-03-12 17:26:57.037269] W [MSGID: 106217] [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed uuid to hostname conversion<br>
>> [2020-03-12 17:26:57.037299] W [MSGID: 106387] [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed<br>
>> [2020-03-12 17:26:42.025200] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req<br>
>> [2020-03-12 17:27:08.304267] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req<br>
>> --------------------------------------------------------<br>
>><br>
>> And I've just noticed that I'm again seeing "No space left on device"<br>
>> in<br>
>> the logs of brick1 (although there is 3.5 TB free)<br>
>><br>
>> --------------------------------------------------------<br>
>> [2020-03-12 17:19:54.576597] E [MSGID: 113027] [posix.c:1427:posix_mkdir] 0-data-volume-posix: mkdir of /mnt/brick1/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001 failed [No space left on device]<br>
>> [2020-03-12 17:19:54.576681] E [MSGID: 115056] [server-rpc-fops.c:512:server_mkdir_cbk] 0-data-volume-server: 5001698: MKDIR /projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001 (96e0b7e4-6b43-42ef-9896-86097b4208fe/ccfzR75deg_001) ==> (No space left on device) [No space left on device]<br>
>> --------------------------------------------------------<br>
>><br>
>> Any thoughts would be greatly appreciated. (Some additional<br>
>> information<br>
>> below)<br>
>><br>
>> Thanks<br>
>><br>
>> Pat<br>
>><br>
>> --------------------------------------------------------<br>
>> server 1:<br>
>> [root@mseas-data2 ~]# df -h<br>
>> Filesystem Size Used Avail Use% Mounted on<br>
>> /dev/sdb 164T 161T 3.5T 98% /mnt/brick2<br>
>> /dev/sda 164T 159T 5.4T 97% /mnt/brick1<br>
>><br>
>> [root@mseas-data2 ~]# df -i<br>
>> Filesystem Inodes IUsed IFree IUse% Mounted on<br>
>> /dev/sdb 7031960320 31213790 7000746530 1% /mnt/brick2<br>
>> /dev/sda 7031960320 28707456 7003252864 1% /mnt/brick1<br>
>> --------------------------------------------------------<br>
>><br>
>> --------------------------------------------------------<br>
>> server 2:<br>
>> [root@mseas-data3 ~]# df -h<br>
>> Filesystem Size Used Avail Use% Mounted on<br>
>> /dev/sda 91T 88T 3.9T 96% /export/sda/brick3<br>
>> /dev/mapper/vg_Data4-lv_Data4<br>
>> 91T 89T 2.6T 98% /export/sdc/brick4<br>
>><br>
>> [root@mseas-data3 glusterfs]# df -i<br>
>> Filesystem Inodes IUsed IFree IUse% Mounted on<br>
>> /dev/sda 1953182464 10039172 1943143292 1%<br>
>> /export/sda/brick3<br>
>> /dev/mapper/vg_Data4-lv_Data4<br>
>> 3906272768 11917222 3894355546 1%<br>
>> /export/sdc/brick4<br>
>> --------------------------------------------------------<br>
>><br>
>> --------------------------------------------------------<br>
>> [root@mseas-data2 ~]# gluster volume info<br>
>> --------------------------------------------------------<br>
>> Volume Name: data-volume<br>
>> Type: Distribute<br>
>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18<br>
>> Status: Started<br>
>> Number of Bricks: 4<br>
>> Transport-type: tcp<br>
>> Bricks:<br>
>> Brick1: mseas-data2:/mnt/brick1<br>
>> Brick2: mseas-data2:/mnt/brick2<br>
>> Brick3: mseas-data3:/export/sda/brick3<br>
>> Brick4: mseas-data3:/export/sdc/brick4<br>
>> Options Reconfigured:<br>
>> cluster.min-free-disk: 1%<br>
>> nfs.export-volumes: off<br>
>> nfs.disable: on<br>
>> performance.readdir-ahead: on<br>
>> diagnostics.brick-sys-log-level: WARNING<br>
>> nfs.exports-auth-enable: on<br>
>> server.allow-insecure: on<br>
>> auth.allow: *<br>
>> disperse.eager-lock: off<br>
>> performance.open-behind: off<br>
>> performance.md-cache-timeout: 60<br>
>> network.inode-lru-limit: 50000<br>
>> diagnostics.client-log-level: ERROR<br>
>><br>
>> --------------------------------------------------------<br>
>> [root@mseas-data2 ~]# gluster volume status data-volume detail<br>
>> --------------------------------------------------------<br>
>> Status of volume: data-volume<br>
>> ------------------------------------------------------------------------------<br>
>> Brick : Brick mseas-data2:/mnt/brick1<br>
>> TCP Port : 49154<br>
>> RDMA Port : 0<br>
>> Online : Y<br>
>> Pid : 4601<br>
>> File System : xfs<br>
>> Device : /dev/sda<br>
>> Mount Options : rw<br>
>> Inode Size : 256<br>
>> Disk Space Free : 5.4TB<br>
>> Total Disk Space : 163.7TB<br>
>> Inode Count : 7031960320<br>
>> Free Inodes : 7003252864<br>
>> ------------------------------------------------------------------------------<br>
>> Brick : Brick mseas-data2:/mnt/brick2<br>
>> TCP Port : 49155<br>
>> RDMA Port : 0<br>
>> Online : Y<br>
>> Pid : 7949<br>
>> File System : xfs<br>
>> Device : /dev/sdb<br>
>> Mount Options : rw<br>
>> Inode Size : 256<br>
>> Disk Space Free : 3.4TB<br>
>> Total Disk Space : 163.7TB<br>
>> Inode Count : 7031960320<br>
>> Free Inodes : 7000746530<br>
>> ------------------------------------------------------------------------------<br>
>> Brick : Brick mseas-data3:/export/sda/brick3<br>
>> TCP Port : 49153<br>
>> RDMA Port : 0<br>
>> Online : Y<br>
>> Pid : 4650<br>
>> File System : xfs<br>
>> Device : /dev/sda<br>
>> Mount Options : rw<br>
>> Inode Size : 512<br>
>> Disk Space Free : 3.9TB<br>
>> Total Disk Space : 91.0TB<br>
>> Inode Count : 1953182464<br>
>> Free Inodes : 1943143292<br>
>> ------------------------------------------------------------------------------<br>
>> Brick : Brick mseas-data3:/export/sdc/brick4<br>
>> TCP Port : 49154<br>
>> RDMA Port : 0<br>
>> Online : Y<br>
>> Pid : 23772<br>
>> File System : xfs<br>
>> Device : /dev/mapper/vg_Data4-lv_Data4<br>
>> Mount Options : rw<br>
>> Inode Size : 256<br>
>> Disk Space Free : 2.6TB<br>
>> Total Disk Space : 90.9TB<br>
>> Inode Count : 3906272768<br>
>> Free Inodes : 3894355546<br>
>><br>
>> -- <br>
>><br>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-<br>
>> Pat Haley Email: <a href="mailto:phaley@mit.edu" target="_blank" rel="noreferrer">phaley@mit.edu</a><br>
>> Center for Ocean Engineering Phone: (617) 253-6824<br>
>> Dept. of Mechanical Engineering Fax: (617) 253-8125<br>
>> MIT, Room 5-213 <a href="http://web.mit.edu/phaley/www/" rel="noreferrer noreferrer" target="_blank">http://web.mit.edu/phaley/www/</a><br>
>> 77 Massachusetts Avenue<br>
>> Cambridge, MA 02139-4301<br>
>><br>
> Hey Pat,<br>
><br>
> The logs are not providing much information, but the following seems strange:<br>
> 'Failed uuid to hostname conversion'<br>
><br>
> Have you checked DNS resolution (both short name and FQDN)?<br>
> Also, check that the systems' NTP/chrony is in sync, and check 'gluster peer status' on all nodes.<br>
><br>
> Is it possible that the client is not reaching all bricks?<br>
><br>
><br>
> P.S.: Consider increasing the log level, as the current level is not sufficient.<br>
><br>
> Best Regards,<br>
> Strahil Nikolov<br>
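<br>
(For completeness, a rough sketch of the checks Strahil suggests, assuming the <br>
two peers are mseas-data2 and mseas-data3 and the volume is data-volume as in <br>
the output above; the log-level values are only an example:)<br>
<br>
--------------------------------------------------------<br>
# name resolution, short name and FQDN, from every node<br>
getent hosts mseas-data2 mseas-data3<br>
<br>
# time sync (whichever of ntpd/chronyd is in use)<br>
ntpq -p        # or: chronyc tracking<br>
<br>
# peer and brick health as seen from each server<br>
gluster peer status<br>
gluster volume status data-volume<br>
<br>
# temporarily raise log verbosity while reproducing the problem<br>
gluster volume set data-volume diagnostics.client-log-level INFO<br>
gluster volume set data-volume diagnostics.brick-log-level INFO<br>
--------------------------------------------------------<br>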
<br>
-- <br>
<br>
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-<br>
Pat Haley Email: <a href="mailto:phaley@mit.edu" target="_blank" rel="noreferrer">phaley@mit.edu</a><br>
Center for Ocean Engineering Phone: (617) 253-6824<br>
Dept. of Mechanical Engineering Fax: (617) 253-8125<br>
MIT, Room 5-213 <a href="http://web.mit.edu/phaley/www/" rel="noreferrer noreferrer" target="_blank">http://web.mit.edu/phaley/www/</a><br>
77 Massachusetts Avenue<br>
Cambridge, MA 02139-4301<br>
<br>
________<br>
<br>
<br>
<br>
Community Meeting Calendar:<br>
<br>
Schedule -<br>
Every Tuesday at 14:30 IST / 09:00 UTC<br>
Bridge: <a href="https://bluejeans.com/441850968" rel="noreferrer noreferrer" target="_blank">https://bluejeans.com/441850968</a><br>
<br>
Gluster-users mailing list<br>
<a href="mailto:Gluster-users@gluster.org" target="_blank" rel="noreferrer">Gluster-users@gluster.org</a><br>
<a href="https://lists.gluster.org/mailman/listinfo/gluster-users" rel="noreferrer noreferrer" target="_blank">https://lists.gluster.org/mailman/listinfo/gluster-users</a><br>
</blockquote></div>