[Gluster-users] Erroneous "No space left on device." messages
Strahil Nikolov
hunter86_bg at yahoo.com
Wed Mar 11 15:19:24 UTC 2020
On March 11, 2020 4:27:58 PM GMT+02:00, Pat Haley <phaley at mit.edu> wrote:
>
>Hi,
>
>I was able to successfully reset cluster.min-free-disk. That only made
>the "No space left on device" problem intermittent instead of constant.
>I then looked at the brick log files again and noticed "No space ..."
>errors recorded for files that I knew nobody was accessing. gluster
>volume status was also reporting an ongoing rebalance (but not with the
>same ID as the one I started on Monday). I stopped the rebalance and I
>no longer seem to be getting the "No space left on device" messages.
>
>However, I now have a new curious issue. I have at least one file that I
>created after resetting cluster.min-free-disk but before shutting down
>the rebalance that does not show up on a simple "ls" command but does
>show up if I explicitly try to ls that file (example below; the file in
>question is PeManJob). This semi-missing file is located on brick1 (one
>of the 2 that were giving the "No space left on device" messages). How
>do I fix this new issue?
>
>Thanks
>
>Pat
>
>mseas(DSMccfzR75deg_001b)% ls
>at_pe_job pe_nrg.nc
>check_times_job pe_out.nc
>HoldJob pe_PBI.in
>oi_3hr.dat PePbiJob
>PE_Data_Comparison_glider_all_smalldom.m pe_PB.in
>PE_Data_Comparison_glider_sp011_smalldom.m pe_PB.log
>PE_Data_Comparison_glider_sp064_smalldom.m pe_PB_short.in
>PeManJob.log PlotJob
>
>mseas(DSMccfzR75deg_001b)% ls PeManJob
>PeManJob
>
>mseas(DSMccfzR75deg_001b)% ls PeManJob*
>PeManJob.log
>
>On 3/10/20 8:18 PM, Strahil Nikolov wrote:
>> On March 10, 2020 9:47:49 PM GMT+02:00, Pat Haley <phaley at mit.edu> wrote:
>>> Hi,
>>>
>>> If I understand this, to remove the "No space left on device" error I
>>> either have to clear up 10% space on each brick, or clean up a lesser
>>> amount and reset cluster.min-free-disk. Is this correct?
>>>
>>> I have found the following command for resetting cluster.min-free-disk:
>>>
>>> gluster volume set <volume> cluster.min-free-disk <value>
>>>
>>> Can this be done while the volume is live? Does the <value> need to be
>>> an integer?
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>> On 3/10/20 2:45 PM, Pat Haley wrote:
>>>> Hi,
>>>>
>>>> I get the following
>>>>
>>>> [root at mseas-data2 bricks]# gluster volume get data-volume all | grep cluster.min-free
>>>> cluster.min-free-disk 10%
>>>> cluster.min-free-inodes 5%
>>>>
>>>>
>>>> On 3/10/20 2:34 PM, Strahil Nikolov wrote:
>>>>> On March 10, 2020 8:14:41 PM GMT+02:00, Pat Haley <phaley at mit.edu>
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> After some more poking around in the logs (specifically the brick logs):
>>>>>> * brick1 & brick2 have both been recording "No space left on device"
>>>>>>   messages today (as recently as 15 minutes ago)
>>>>>> * brick3 last recorded a "No space left on device" message last night
>>>>>>   around 10:30pm
>>>>>> * brick4 has no such messages in its log file
>>>>>>
>>>>>> Note brick1 & brick2 are on one server, brick3 and brick4 are on the
>>>>>> second server.
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>> On 3/10/20 11:51 AM, Pat Haley wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We have developed a problem with Gluster reporting "No space left on
>>>>>>> device." even though "df" of both the gluster filesystem and the
>>>>>>> underlying bricks show space available (details below). Our inode
>>>>>>> usage is between 1-3%. We are running gluster 3.7.11 in a distributed
>>>>>>> volume across 2 servers (2 bricks each). We have followed the thread
>>>>>>> https://lists.gluster.org/pipermail/gluster-users/2020-March/037821.html
>>>>>>> but haven't found a solution yet.
>>>>>>>
>>>>>>> Last night we ran a rebalance which appeared successful (and have
>>>>>>> since cleared up some more space which seems to have mainly been on
>>>>>>> one brick). There were intermittent erroneous "No space..." messages
>>>>>>> last night, but they have become much more frequent today.
>>>>>>>
>>>>>>> Any help would be greatly appreciated.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> ---------------------------
>>>>>>> [root at mseas-data2 ~]# df -h
>>>>>>> ---------------------------
>>>>>>> Filesystem Size Used Avail Use% Mounted on
>>>>>>> /dev/sdb 164T 164T 324G 100% /mnt/brick2
>>>>>>> /dev/sda 164T 164T 323G 100% /mnt/brick1
>>>>>>> ---------------------------
>>>>>>> [root at mseas-data2 ~]# df -i
>>>>>>> ---------------------------
>>>>>>> Filesystem Inodes IUsed IFree IUse% Mounted on
>>>>>>> /dev/sdb 1375470800 31207165 1344263635 3% /mnt/brick2
>>>>>>> /dev/sda 1384781520 28706614 1356074906 3% /mnt/brick1
>>>>>>>
>>>>>>> ---------------------------
>>>>>>> [root at mseas-data3 ~]# df -h
>>>>>>> ---------------------------
>>>>>>> /dev/sda 91T 91T 323G 100% /export/sda/brick3
>>>>>>> /dev/mapper/vg_Data4-lv_Data4 91T 88T 3.4T 97% /export/sdc/brick4
>>>>>>> ---------------------------
>>>>>>> [root at mseas-data3 ~]# df -i
>>>>>>> ---------------------------
>>>>>>> /dev/sda 679323496 9822199 669501297 2% /export/sda/brick3
>>>>>>> /dev/mapper/vg_Data4-lv_Data4 3906272768 11467484 3894805284 1% /export/sdc/brick4
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------
>>>>>>> [root at mseas-data2 ~]# gluster --version
>>>>>>> ---------------------------------------
>>>>>>> glusterfs 3.7.11 built on Apr 27 2016 14:09:22
>>>>>>> Repository revision: git://git.gluster.com/glusterfs.git
>>>>>>> Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
>>>>>>> GlusterFS comes with ABSOLUTELY NO WARRANTY.
>>>>>>> You may redistribute copies of GlusterFS under the terms of the GNU
>>>>>>> General Public License.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----------------------------------------
>>>>>>> [root at mseas-data2 ~]# gluster volume info
>>>>>>> -----------------------------------------
>>>>>>> Volume Name: data-volume
>>>>>>> Type: Distribute
>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>> Status: Started
>>>>>>> Number of Bricks: 4
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>> Brick3: mseas-data3:/export/sda/brick3
>>>>>>> Brick4: mseas-data3:/export/sdc/brick4
>>>>>>> Options Reconfigured:
>>>>>>> nfs.export-volumes: off
>>>>>>> nfs.disable: on
>>>>>>> performance.readdir-ahead: on
>>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>>> nfs.exports-auth-enable: on
>>>>>>> server.allow-insecure: on
>>>>>>> auth.allow: *
>>>>>>> disperse.eager-lock: off
>>>>>>> performance.open-behind: off
>>>>>>> performance.md-cache-timeout: 60
>>>>>>> network.inode-lru-limit: 50000
>>>>>>> diagnostics.client-log-level: ERROR
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------
>>>>>>> [root at mseas-data2 ~]# gluster volume status data-volume detail
>>>>>>> --------------------------------------------------------------
>>>>>>> Status of volume: data-volume
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Brick : Brick mseas-data2:/mnt/brick1
>>>>>>> TCP Port : 49154
>>>>>>> RDMA Port : 0
>>>>>>> Online : Y
>>>>>>> Pid : 4601
>>>>>>> File System : xfs
>>>>>>> Device : /dev/sda
>>>>>>> Mount Options : rw
>>>>>>> Inode Size : 256
>>>>>>> Disk Space Free : 318.8GB
>>>>>>> Total Disk Space : 163.7TB
>>>>>>> Inode Count : 1365878288
>>>>>>> Free Inodes : 1337173596
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Brick : Brick mseas-data2:/mnt/brick2
>>>>>>> TCP Port : 49155
>>>>>>> RDMA Port : 0
>>>>>>> Online : Y
>>>>>>> Pid : 7949
>>>>>>> File System : xfs
>>>>>>> Device : /dev/sdb
>>>>>>> Mount Options : rw
>>>>>>> Inode Size : 256
>>>>>>> Disk Space Free : 319.8GB
>>>>>>> Total Disk Space : 163.7TB
>>>>>>> Inode Count : 1372421408
>>>>>>> Free Inodes : 1341219039
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Brick : Brick mseas-data3:/export/sda/brick3
>>>>>>> TCP Port : 49153
>>>>>>> RDMA Port : 0
>>>>>>> Online : Y
>>>>>>> Pid : 4650
>>>>>>> File System : xfs
>>>>>>> Device : /dev/sda
>>>>>>> Mount Options : rw
>>>>>>> Inode Size : 512
>>>>>>> Disk Space Free : 325.3GB
>>>>>>> Total Disk Space : 91.0TB
>>>>>>> Inode Count : 692001992
>>>>>>> Free Inodes : 682188893
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Brick : Brick mseas-data3:/export/sdc/brick4
>>>>>>> TCP Port : 49154
>>>>>>> RDMA Port : 0
>>>>>>> Online : Y
>>>>>>> Pid : 23772
>>>>>>> File System : xfs
>>>>>>> Device : /dev/mapper/vg_Data4-lv_Data4
>>>>>>> Mount Options : rw
>>>>>>> Inode Size : 256
>>>>>>> Disk Space Free : 3.4TB
>>>>>>> Total Disk Space : 90.9TB
>>>>>>> Inode Count : 3906272768
>>>>>>> Free Inodes : 3894809903
>>>>>>>
>>>>> Hi Pat,
>>>>>
>>>>> What is the output of:
>>>>> gluster volume get data-volume all | grep cluster.min-free
>>>>>
>>>>> 1% of 164T is 1640G, but in your case you have only 324G, which is
>>>>> way lower.
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>> Hey Pat,
>>
>> Some users have reported they are using a value of 1% and it seems to
>> be working.
>>
>> Most probably you will be able to do it live, but I have never had to
>> change that. You can give it a try on a test cluster.
>>
>> Best Regards,
>> Strahil Nikolov
Hey Pat,
I'm glad you can write files now.
My guess is that the semi-missing file is related to the rebalance.
I have never hit this myself, but you could try moving the file to another directory and then back to its original location. This is pure speculation!
mv /mnt/somedir/somedir2/file /mnt/somedir
mv /mnt/somedir/file /mnt/somedir/somedir2
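The two mv commands above could be wrapped in a small helper, shown here only as a sketch. The function name roundtrip_move and the paths in the usage line are hypothetical, and nothing in this thread confirms that the round trip actually makes the file visible to "ls" again. The helper refuses to run if a file with the same name already exists in the temporary location, so nothing gets overwritten.

```shell
#!/bin/sh
# roundtrip_move FILE OTHER_DIR
# Moves FILE into OTHER_DIR and then back to its original directory.
# Hypothetical helper; run it against the client mount, not a brick path.
roundtrip_move() {
    file=$1
    other_dir=$2
    orig_dir=$(dirname -- "$file")
    name=$(basename -- "$file")

    # Do not clobber an existing file in the temporary location.
    if [ -e "$other_dir/$name" ]; then
        echo "refusing: $other_dir/$name already exists" >&2
        return 1
    fi

    # First hop: out of the original directory.
    mv -- "$file" "$other_dir/" || return 1
    # Second hop: back to where it came from.
    mv -- "$other_dir/$name" "$orig_dir/"
}

# Usage (placeholder paths, same shape as the commands above):
# roundtrip_move /mnt/somedir/somedir2/file /mnt/somedir
```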
However, you will still need to run a rebalance if your files are not spread evenly across the bricks.
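To judge how unevenly the bricks are filled, the free-space percentage of each brick can be compared with the cluster.min-free-disk value. The sketch below only illustrates that arithmetic. The function names are made up for this example, the figures are copied from the "df -h" output quoted earlier in the thread (taking 1T as 1024G), and the way Gluster itself evaluates the threshold may differ in detail.

```shell
#!/bin/sh
# Sketch: free space per brick as a percentage, checked against a
# cluster.min-free-disk style threshold. All sizes are in GB.
LC_ALL=C; export LC_ALL

# free_pct AVAIL_GB SIZE_GB -> prints the free percentage with 2 decimals
free_pct() {
    awk -v a="$1" -v s="$2" 'BEGIN { printf "%.2f\n", a / s * 100 }'
}

# below_min_free AVAIL_GB SIZE_GB MIN_PCT -> exit status 0 if free% < MIN_PCT
below_min_free() {
    awk -v a="$1" -v s="$2" -v m="$3" 'BEGIN { exit !((a / s * 100) < m) }'
}

# Brick name, available GB, total GB (values from the quoted df -h output).
while read -r brick avail size; do
    pct=$(free_pct "$avail" "$size")
    if below_min_free "$avail" "$size" 10; then
        state="below 10%"
    else
        state="at or above 10%"
    fi
    echo "$brick: $pct% free ($state)"
done <<EOF
brick1 323 167936
brick2 324 167936
brick3 323 93184
brick4 3481.6 93184
EOF
```

If the percentages differ widely between bricks, a rebalance can be started with "gluster volume rebalance data-volume start" and followed with "gluster volume rebalance data-volume status".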
Best Regards,
Strahil Nikolov