[Gluster-users] Erroneous "No space left on device." messages

Strahil Nikolov hunter86_bg at yahoo.com
Wed Mar 11 15:19:24 UTC 2020


On March 11, 2020 4:27:58 PM GMT+02:00, Pat Haley <phaley at mit.edu> wrote:
>
>Hi,
>
>I was able to successfully reset cluster.min-free-disk.  That only made
>the "No space left on device" problem intermittent instead of constant.
>I then looked at the brick log files again and noticed "No space ..."
>errors recorded for files that I knew nobody was accessing.  gluster
>volume status was also reporting an on-going rebalance (but not with
>the same ID as the one I started on Monday).  I stopped the rebalance
>and I no longer seem to be getting the "No space left on device"
>messages.
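>
>(For reference, this is roughly how I spotted and stopped that
>rebalance; data-volume is our volume name:)
>
>    # the task section of "gluster volume status" lists the rebalance ID
>    gluster volume status data-volume
>
>    # per-node progress of the rebalance that is currently running
>    gluster volume rebalance data-volume status
>
>    # stop the running rebalance
>    gluster volume rebalance data-volume stop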
>
>However, I now have a new curious issue.  I have at least one file that
>I created after resetting cluster.min-free-disk but before shutting
>down the rebalance which does not show up in a simple "ls" of its
>directory, but does show up if I explicitly try to ls that file
>(example below; the file in question is PeManJob).  This semi-missing
>file is located on brick1 (one of the 2 bricks that were giving the
>"No space left on device" messages).  How do I fix this new issue?
>
>Thanks
>
>Pat
>
>mseas(DSMccfzR75deg_001b)% ls
>at_pe_job                                   pe_nrg.nc
>check_times_job                             pe_out.nc
>HoldJob                                     pe_PBI.in
>oi_3hr.dat                                  PePbiJob
>PE_Data_Comparison_glider_all_smalldom.m    pe_PB.in
>PE_Data_Comparison_glider_sp011_smalldom.m  pe_PB.log
>PE_Data_Comparison_glider_sp064_smalldom.m pe_PB_short.in
>PeManJob.log                                PlotJob
>
>mseas(DSMccfzR75deg_001b)% ls PeManJob
>PeManJob
>
>mseas(DSMccfzR75deg_001b)% ls PeManJob*
>PeManJob.log
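>
>(For what it's worth, I determined which brick holds it by looking at
>the brick backends directly, roughly as below; <path-in-volume> stands
>for the directory's location inside the volume:)
>
>    # on mseas-data2
>    ls -l /mnt/brick1/<path-in-volume>/PeManJob /mnt/brick2/<path-in-volume>/PeManJob
>
>    # on mseas-data3
>    ls -l /export/sda/brick3/<path-in-volume>/PeManJob /export/sdc/brick4/<path-in-volume>/PeManJob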
>
>On 3/10/20 8:18 PM, Strahil Nikolov wrote:
>> On March 10, 2020 9:47:49 PM GMT+02:00, Pat Haley <phaley at mit.edu> wrote:
>>> Hi,
>>>
>>> If I understand this, to remove the "No space left on device" error I
>>> either have to clear up 10% space on each brick, or clean up a lesser
>>> amount and reset cluster.min-free.  Is this correct?
>>>
>>> I have found the following command for resetting cluster.min-free:
>>>
>>>     gluster volume set <volume> cluster.min-free-disk <value>
>>>
>>> Can this be done while the volume is live?  Does the <value> need to
>>> be an integer?
>>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>> On 3/10/20 2:45 PM, Pat Haley wrote:
>>>> Hi,
>>>>
>>>> I get the following
>>>>
>>>> [root at mseas-data2 bricks]# gluster volume get data-volume all | grep cluster.min-free
>>>> cluster.min-free-disk 10%
>>>> cluster.min-free-inodes 5%
>>>>
>>>>
>>>> On 3/10/20 2:34 PM, Strahil Nikolov wrote:
>>>>> On March 10, 2020 8:14:41 PM GMT+02:00, Pat Haley <phaley at mit.edu>
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> After some more poking around in the logs (specifically the brick
>>>>>> logs):
>>>>>>    * brick1 & brick2 have both been recording "No space left on
>>>>>>      device" messages today (as recently as 15 minutes ago)
>>>>>>    * brick3 last recorded a "No space left on device" message last
>>>>>>      night around 10:30pm
>>>>>>    * brick4 has no such messages in its log file
>>>>>>
>>>>>> Note brick1 & brick2 are on one server, brick3 and brick4 are on
>>>>>> the second server.
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>>
>>>>>> On 3/10/20 11:51 AM, Pat Haley wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We have developed a problem with Gluster reporting "No space left
>>>>>>> on device." even though "df" of both the gluster filesystem and
>>>>>>> the underlying bricks show space available (details below).  Our
>>>>>>> inode usage is between 1-3%.  We are running gluster 3.7.11 in a
>>>>>>> distributed volume across 2 servers (2 bricks each).  We have
>>>>>>> followed the thread
>>>>>>> https://lists.gluster.org/pipermail/gluster-users/2020-March/037821.html
>>>>>>> but haven't found a solution yet.
>>>>>>>
>>>>>>> Last night we ran a rebalance which appeared successful (and have
>>>>>>> since cleared up some more space, which seems to have mainly been
>>>>>>> on one brick).  There were intermittent erroneous "No space..."
>>>>>>> messages last night, but they have become much more frequent today.
>>>>>>>
>>>>>>> Any help would be greatly appreciated.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> ---------------------------
>>>>>>> [root at mseas-data2 ~]# df -h
>>>>>>> ---------------------------
>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>> /dev/sdb        164T  164T  324G 100% /mnt/brick2
>>>>>>> /dev/sda        164T  164T  323G 100% /mnt/brick1
>>>>>>> ---------------------------
>>>>>>> [root at mseas-data2 ~]# df -i
>>>>>>> ---------------------------
>>>>>>> Filesystem         Inodes    IUsed      IFree IUse% Mounted on
>>>>>>> /dev/sdb       1375470800 31207165 1344263635    3% /mnt/brick2
>>>>>>> /dev/sda       1384781520 28706614 1356074906    3% /mnt/brick1
>>>>>>>
>>>>>>> ---------------------------
>>>>>>> [root at mseas-data3 ~]# df -h
>>>>>>> ---------------------------
>>>>>>> /dev/sda               91T   91T  323G 100% /export/sda/brick3
>>>>>>> /dev/mapper/vg_Data4-lv_Data4
>>>>>>>                          91T   88T  3.4T  97% /export/sdc/brick4
>>>>>>> ---------------------------
>>>>>>> [root at mseas-data3 ~]# df -i
>>>>>>> ---------------------------
>>>>>>> /dev/sda              679323496  9822199  669501297    2%
>>>>>>> /export/sda/brick3
>>>>>>> /dev/mapper/vg_Data4-lv_Data4
>>>>>>>                        3906272768 11467484 3894805284    1%
>>>>>>> /export/sdc/brick4
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------
>>>>>>> [root at mseas-data2 ~]# gluster --version
>>>>>>> ---------------------------------------
>>>>>>> glusterfs 3.7.11 built on Apr 27 2016 14:09:22
>>>>>>> Repository revision: git://git.gluster.com/glusterfs.git
>>>>>>> Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
>>>>>>> GlusterFS comes with ABSOLUTELY NO WARRANTY.
>>>>>>> You may redistribute copies of GlusterFS under the terms of the
>>> GNU
>>>>>>> General Public License.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----------------------------------------
>>>>>>> [root at mseas-data2 ~]# gluster volume info
>>>>>>> -----------------------------------------
>>>>>>> Volume Name: data-volume
>>>>>>> Type: Distribute
>>>>>>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>>>>>>> Status: Started
>>>>>>> Number of Bricks: 4
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: mseas-data2:/mnt/brick1
>>>>>>> Brick2: mseas-data2:/mnt/brick2
>>>>>>> Brick3: mseas-data3:/export/sda/brick3
>>>>>>> Brick4: mseas-data3:/export/sdc/brick4
>>>>>>> Options Reconfigured:
>>>>>>> nfs.export-volumes: off
>>>>>>> nfs.disable: on
>>>>>>> performance.readdir-ahead: on
>>>>>>> diagnostics.brick-sys-log-level: WARNING
>>>>>>> nfs.exports-auth-enable: on
>>>>>>> server.allow-insecure: on
>>>>>>> auth.allow: *
>>>>>>> disperse.eager-lock: off
>>>>>>> performance.open-behind: off
>>>>>>> performance.md-cache-timeout: 60
>>>>>>> network.inode-lru-limit: 50000
>>>>>>> diagnostics.client-log-level: ERROR
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------
>>>>>>> [root at mseas-data2 ~]# gluster volume status data-volume detail
>>>>>>> --------------------------------------------------------------
>>>>>>> Status of volume: data-volume
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Brick                : Brick mseas-data2:/mnt/brick1
>>>>>>> TCP Port             : 49154
>>>>>>> RDMA Port            : 0
>>>>>>> Online               : Y
>>>>>>> Pid                  : 4601
>>>>>>> File System          : xfs
>>>>>>> Device               : /dev/sda
>>>>>>> Mount Options        : rw
>>>>>>> Inode Size           : 256
>>>>>>> Disk Space Free      : 318.8GB
>>>>>>> Total Disk Space     : 163.7TB
>>>>>>> Inode Count          : 1365878288
>>>>>>> Free Inodes          : 1337173596
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Brick                : Brick mseas-data2:/mnt/brick2
>>>>>>> TCP Port             : 49155
>>>>>>> RDMA Port            : 0
>>>>>>> Online               : Y
>>>>>>> Pid                  : 7949
>>>>>>> File System          : xfs
>>>>>>> Device               : /dev/sdb
>>>>>>> Mount Options        : rw
>>>>>>> Inode Size           : 256
>>>>>>> Disk Space Free      : 319.8GB
>>>>>>> Total Disk Space     : 163.7TB
>>>>>>> Inode Count          : 1372421408
>>>>>>> Free Inodes          : 1341219039
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Brick                : Brick mseas-data3:/export/sda/brick3
>>>>>>> TCP Port             : 49153
>>>>>>> RDMA Port            : 0
>>>>>>> Online               : Y
>>>>>>> Pid                  : 4650
>>>>>>> File System          : xfs
>>>>>>> Device               : /dev/sda
>>>>>>> Mount Options        : rw
>>>>>>> Inode Size           : 512
>>>>>>> Disk Space Free      : 325.3GB
>>>>>>> Total Disk Space     : 91.0TB
>>>>>>> Inode Count          : 692001992
>>>>>>> Free Inodes          : 682188893
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Brick                : Brick mseas-data3:/export/sdc/brick4
>>>>>>> TCP Port             : 49154
>>>>>>> RDMA Port            : 0
>>>>>>> Online               : Y
>>>>>>> Pid                  : 23772
>>>>>>> File System          : xfs
>>>>>>> Device               : /dev/mapper/vg_Data4-lv_Data4
>>>>>>> Mount Options        : rw
>>>>>>> Inode Size           : 256
>>>>>>> Disk Space Free      : 3.4TB
>>>>>>> Total Disk Space     : 90.9TB
>>>>>>> Inode Count          : 3906272768
>>>>>>> Free Inodes          : 3894809903
>>>>>>>
>>>>> Hi Pat,
>>>>>
>>>>> What is the output of:
>>>>> gluster volume get data-volume all | grep cluster.min-free
>>>>>
>>>>> 1% of 164 T is 1640G, but in your case you have only 324G, which is
>>>>> way lower.
>>>>>
>>>>> Best Regards,
>>>>> Strahil Nikolov
>> Hey Pat,
>>
>> Some users have reported they are using a value of 1% and it seems to
>> be working.
>>
>> Most probably you will be able to do it live, but I have never had to
>> change that.  You can give it a try on a test cluster first.
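>>
>> For example, something along these lines (data-volume being your
>> volume; as far as I know the option takes a percentage like "1%" or
>> an absolute size, so it does not have to be an integer):
>>
>>     # lower the reserved-space threshold on the live volume
>>     gluster volume set data-volume cluster.min-free-disk 1%
>>
>>     # verify the new value
>>     gluster volume get data-volume cluster.min-free-disk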
>>
>> Best Regards,
>> Strahil Nikolov

Hey Pat,

I'm glad you can write files now.

I guess the semi-missing file is related to the rebalance story.
Actually, I have never hit that myself, but you can try moving the file to another folder and back to its original location. This is pure speculation!

mv /mnt/somedir/somedir2/file /mnt/somedir
mv /mnt/somedir/file /mnt/somedir/somedir2
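
You could also take a look on the bricks themselves to see whether a
stale DHT link file is involved (just an idea, and look-only - don't
edit the brick contents by hand; <path-in-volume> is wherever that
directory lives inside the volume, and brick2 here is just one of the
bricks other than the one holding the data):

    # a stale pointer shows up as a zero-size file with the sticky bit
    # set (mode ---------T) and a trusted.glusterfs.dht.linkto xattr
    ls -l /mnt/brick2/<path-in-volume>/PeManJob
    getfattr -d -m . -e hex /mnt/brick2/<path-in-volume>/PeManJob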


Still, you will need to run a rebalance at some point if your files are not placed evenly across the bricks.
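
If/when you do that, it would be something like:

    # start a fresh rebalance once there is enough free space again
    gluster volume rebalance data-volume start

    # and watch its per-node progress
    gluster volume rebalance data-volume status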

Best Regards,
Strahil Nikolov

