[Bugs] [Bug 1277112] New: Data Tiering:File create and new writes to existing file fails when the hot tier is full instead of redirecting/flushing the data to cold tier

bugzilla at redhat.com bugzilla at redhat.com
Mon Nov 2 12:07:42 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1277112

            Bug ID: 1277112
           Summary: Data Tiering:File create and new writes to existing
                    file fails when the hot tier is full instead of
                    redirecting/flushing the data to cold tier
           Product: Red Hat Gluster Storage
         Component: glusterfs
     Sub Component: tiering
          Keywords: Reopened
          Severity: urgent
          Priority: medium
          Assignee: rhs-bugs at redhat.com
          Reporter: nchilaka at redhat.com
        QA Contact: nchilaka at redhat.com
                CC: bugs at gluster.org, dlambrig at redhat.com,
                    nbalacha at redhat.com, vagarwal at redhat.com
        Depends On: 1259312
            Blocks: 1260923



+++ This bug was initially created as a clone of Bug #1259312 +++

Description of problem:
======================
Given that the hot tier is usually a costly storage space, it is highly likely
that the hot tier has much less disk space than the cold tier.
In this problem, while I am doing writes to a file and the hot tier is full,
the new writes fail. Also, I am not able to create any more new files, even
though the cold tier is largely free.



Version-Release number of selected component (if applicable):
=========================================================
[root at nag-manual-node1 brick999]# gluster --version
glusterfs 3.7.3 built on Aug 27 2015 01:23:05
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General
Public License.
[root at nag-manual-node1 brick999]# rpm -qa|grep gluster
glusterfs-libs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-fuse-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-server-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-api-3.7.3-0.82.git6c4096f.el6.x86_64
glusterfs-cli-3.7.3-0.82.git6c4096f.el6.x86_64
python-gluster-3.7.3-0.82.git6c4096f.el6.noarch
glusterfs-client-xlators-3.7.3-0.82.git6c4096f.el6.x86_64
[root at nag-manual-node1 brick999]# 


How reproducible:
=====================
easily

Steps to Reproduce:
====================
1. Have a cold tier with huge space and a hot tier with only about 1GB of
space.
2. Turn on CTR and set the demote frequency to a large value, say 3600 (1 hour).
3. Fill the volume with about 990MB of data (which will go to the hot tier).
4. Create a file of about 100MB (e.g. using the dd command).
5. While the file is being written, it fails to accommodate any more writes
after ~10MB, as the hot tier is full.
6. Also try to create new files (to see whether new files go to the cold tier,
as the hot tier is filled).
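The steps above can be sketched as a CLI session (a rough sketch only: the
volume, brick, and mount names are assumptions, and the option names follow the
glusterfs 3.7 tiering CLI):

```shell
# Hypothetical names: volume "tiervol", mounted at /mnt/tiervol.
# Attach a small hot tier (~1GB) on top of a large cold volume.
gluster volume attach-tier tiervol replica 2 node1:/bricks/hot1 node2:/bricks/hot2

# Enable CTR and push the demote cycle far out (3600s = 1 hour).
gluster volume set tiervol features.ctr-enabled on
gluster volume set tiervol cluster.tier-demote-frequency 3600

# Fill most of the hot tier (~990MB), then try to write a 100MB file.
dd if=/dev/urandom of=/mnt/tiervol/fill bs=1M count=990
dd if=/dev/urandom of=/mnt/tiervol/newfile bs=1M count=100

# Check whether brand-new creates fall through to the cold tier.
touch /mnt/tiervol/newf1
```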


Actual results:
===============
Existing file writes and new file creates fail when the hot tier is full, even
though the cold tier is largely free.

Expected results:
=================
New writes/file creates should go to the cold tier when the hot tier is full,
or the relatively cold files in the hot tier should be flushed out to
accommodate new files, irrespective of the demote frequency.
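The expected behaviour amounts to a fall-through allocation policy: prefer the
hot tier, and only return ENOSPC when both tiers are full. A minimal sketch
(illustrative Python, not GlusterFS code; the tier names, sizes, and the
pick_tier helper are all hypothetical):

```python
def pick_tier(size, hot_free, cold_free, reserve=0):
    """Pick where a new file/write should land.

    Prefer the hot tier; if it cannot hold `size` (plus an optional
    reserve), fall through to the cold tier instead of failing.
    All names and sizes here are hypothetical, for illustration only.
    """
    if hot_free - reserve >= size:
        return "hot"
    if cold_free - reserve >= size:
        return "cold"
    # ENOSPC only when BOTH tiers are actually full
    raise OSError(28, "No space left on device")

# With ~35MB free on the hot tier, a 100MB file should go cold, not fail:
print(pick_tier(100 * 2**20, hot_free=35 * 2**20, cold_free=500 * 2**30))  # cold
```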

Additional info:


Work-Around:
==========
Wait for the next CTR promote/demote cycle to kick in




CLI Log:
==========
[root at nag-manual-nfsclient1 srt]# dd if=/dev/urandom of=junkrandom.120m bs=1024
count=120000
120000+0 records in
120000+0 records out
122880000 bytes (123 MB) copied, 30.1008 s, 4.1 MB/s
[root at nag-manual-nfsclient1 srt]# ll
total 1320000
-rw-r--r--. 1 root root 1228800000 Sep  2 22:10 junkrandom
-rw-r--r--. 1 root root  122880000 Sep  2 22:13 junkrandom.120m
[root at nag-manual-nfsclient1 srt]# du -sh *
1.2G    junkrandom
118M    junkrandom.120m
[root at nag-manual-nfsclient1 srt]# cp junkrandom.120m junkrandom.120m1
======== creating a ~3GB file when there is hardly 35MB free
[root at nag-manual-nfsclient1 srt]# dd if=/dev/urandom of=bricklimit bs=1024
count=3000000
dd: writing `bricklimit': No space left on device
160892+0 records in
160891+0 records out
164752384 bytes (165 MB) copied, 35.6738 s, 4.6 MB/s
[root at nag-manual-nfsclient1 srt]# dd if=/dev/urandom of=bricklimit.1 bs=1024
count=3000000
dd: opening `bricklimit.1': No space left on device
[root at nag-manual-nfsclient1 srt]# 
[root at nag-manual-nfsclient1 srt]# 
[root at nag-manual-nfsclient1 srt]# 
[root at nag-manual-nfsclient1 srt]# ls -l
total 1475652
-rw-r--r--. 1 root root   47190016 Sep  2 22:17 bricklimit
-rw-r--r--. 1 root root 1228800000 Sep  2 22:10 junkrandom
-rw-r--r--. 1 root root  122880000 Sep  2 22:13 junkrandom.120m
-rw-r--r--. 1 root root  122880000 Sep  2 22:14 junkrandom.120m1
[root at nag-manual-nfsclient1 srt]# touch newf1
touch: cannot touch `newf1': No space left on device
[root at nag-manual-nfsclient1 srt]# touch newf1
touch: cannot touch `newf1': No space left on device
[root at nag-manual-nfsclient1 srt]# touch newf1
touch: cannot touch `newf1': No space left on device
[root at nag-manual-nfsclient1 srt]# du -sh *
35M    bricklimit
1.2G    junkrandom
118M    junkrandom.120m
118M    junkrandom.120m1
[root at nag-manual-nfsclient1 srt]#

--- Additional comment from nchilaka on 2015-09-02 08:11:50 EDT ---

sosreports @ 
rhsqe-repo.lab.eng.blr.redhat.com:/home/repo/sosreports/bug.1259312/

--- Additional comment from nchilaka on 2015-09-02 08:14:12 EDT ---



--- Additional comment from nchilaka on 2015-10-13 05:40:15 EDT ---

This is failing on glusterfs-server-3.7.5-0.18

[root at rhel7-autofuseclient estonia]# for i in {1..100};do touch x.$i;done
touch: cannot touch ‘x.1’: No space left on device
touch: cannot touch ‘x.2’: No space left on device
touch: cannot touch ‘x.3’: No space left on device
touch: cannot touch ‘x.4’: No space left on device
touch: cannot touch ‘x.5’: No space left on device
touch: cannot touch ‘x.6’: No space left on device
touch: cannot touch ‘x.7’: No space left on device
touch: cannot touch ‘x.8’: No space left on device
touch: cannot touch ‘x.9’: No space left on device
touch: cannot touch ‘x.10’: No space left on device
touch: cannot touch ‘x.11’: No space left on device
touch: cannot touch ‘x.12’: No space left on device
touch: cannot touch ‘x.13’: No space left on device
touch: cannot touch ‘x.14’: No space left on device
touch: cannot touch ‘x.15’: No space left on device
touch: cannot touch ‘x.16’: No space left on device
touch: cannot touch ‘x.17’: No space left on device
touch: cannot touch ‘x.18’: No space left on device
touch: cannot touch ‘x.19’: No space left on device
touch: cannot touch ‘x.20’: No space left on device
touch: cannot touch ‘x.21’: No space left on device
touch: cannot touch ‘x.22’: No space left on device
touch: cannot touch ‘x.23’: No space left on device
touch: cannot touch ‘x.24’: No space left on device
touch: cannot touch ‘x.25’: No space left on device
touch: cannot touch ‘x.26’: No space left on device
touch: cannot touch ‘x.27’: No space left on device
touch: cannot touch ‘x.28’: No space left on device
touch: cannot touch ‘x.29’: No space left on device
touch: cannot touch ‘x.30’: No space left on device
touch: cannot touch ‘x.31’: No space left on device
touch: cannot touch ‘x.32’: No space left on device
touch: cannot touch ‘x.33’: No space left on device
touch: cannot touch ‘x.34’: No space left on device
touch: cannot touch ‘x.35’: No space left on device
touch: cannot touch ‘x.36’: No space left on device
touch: cannot touch ‘x.37’: No space left on device
touch: cannot touch ‘x.38’: No space left on device
touch: cannot touch ‘x.39’: No space left on device
touch: cannot touch ‘x.40’: No space left on device
touch: cannot touch ‘x.41’: No space left on device
touch: cannot touch ‘x.42’: No space left on device
touch: cannot touch ‘x.43’: No space left on device
touch: cannot touch ‘x.44’: No space left on device
touch: cannot touch ‘x.45’: No space left on device
touch: cannot touch ‘x.46’: No space left on device
touch: cannot touch ‘x.47’: No space left on device
touch: cannot touch ‘x.48’: No space left on device
touch: cannot touch ‘x.49’: No space left on device
touch: cannot touch ‘x.50’: No space left on device
touch: cannot touch ‘x.51’: No space left on device
touch: cannot touch ‘x.52’: No space left on device
touch: cannot touch ‘x.53’: No space left on device
touch: cannot touch ‘x.54’: No space left on device
touch: cannot touch ‘x.55’: No space left on device
touch: cannot touch ‘x.56’: No space left on device



Refer to bz#1271151.

--- Additional comment from Dan Lambright on 2015-10-26 13:42:11 EDT ---

We cannot do anything about the write updates, but my understanding is that DHT
has a mechanism to redirect new file creates to a different brick when the
hashed subvolume is full. We need to understand why this does not work with
tier.
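For context, the DHT mechanism referred to here can be sketched roughly as
follows (illustrative Python, not GlusterFS source; the subvolume names and the
min-free-disk threshold are assumptions):

```python
def choose_create_subvol(hashed, subvols, free_bytes, min_free=0.1):
    """Rough sketch of DHT-style create redirection.

    If the hashed subvolume is below its free-space threshold, create
    the file on another subvolume that has room instead of failing.
    Names and the 10% threshold are illustrative only.
    """
    def has_room(sv):
        free, total = free_bytes[sv]
        return free / total > min_free

    if has_room(hashed):
        return hashed
    for sv in subvols:                # fall back to any subvol with room
        if sv != hashed and has_room(sv):
            return sv                 # real DHT also leaves a linkto file
    raise OSError(28, "No space left on device")

# Hot subvol nearly full (10MB of 1GB free), cold subvol mostly empty:
free = {"hot-0": (10 * 2**20, 2**30), "cold-0": (400 * 2**30, 500 * 2**30)}
print(choose_create_subvol("hot-0", ["hot-0", "cold-0"], free))  # cold-0
```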

--- Additional comment from Nithya Balachandran on 2015-10-30 04:39:40 EDT ---

What was the name of the volume on which this failed?

--- Additional comment from Nithya Balachandran on 2015-10-30 05:15:33 EDT ---

I tried this on the latest master code and could not reproduce the issue where
new file creates were still going to the hashed subvol. 

We cannot do anything about the writes until the file is moved to the cold tier
- this is existing DHT behaviour.


I am moving this to WorksForMe. Please reopen if seen again.


--- Additional comment from nchilaka on 2015-11-02 07:05:02 EST ---

Hi,
I saw this not working on latest downstream.
I am cloning this to downstream for further tracking


===========================
[root at mia diskfull]# for i in {1..10};do dd if=/dev/urandom of=bigfi bs=1024
count=300000;done
dd: failed to open ‘bigfi’: No space left on device
dd: failed to open ‘bigfi’: No space left on device
dd: failed to open ‘bigfi’: No space left on device
dd: failed to open ‘bigfi’: No space left on device
dd: failed to open ‘bigfi’: No space left on device
dd: failed to open ‘bigfi’: No space left on device
dd: failed to open ‘bigfi’: No space left on device
dd: failed to open ‘bigfi’: No space left on device
dd: failed to open ‘bigfi’: No space left on device
dd: failed to open ‘bigfi’: No space left on device



fuse mount logs:
==================
d 9 times between [2015-11-02 04:40:08.609241] and [2015-11-02 04:40:08.723177]
The message "E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk]
0-diskfull-client-6: remote operation failed. Path: /bigfi [No space left on
device]" repeated 9 times between [2015-11-02 04:40:08.609351] and [2015-11-02
04:40:08.723197]
^C
[root at mia glusterfs]# tail -f mnt-diskfull.log 
[2015-11-02 04:40:08.635409] W [fuse-bridge.c:1978:fuse_create_cbk]
0-glusterfs-fuse: 30100: /bigfi => -1 (No space left on device)
[2015-11-02 04:40:08.647871] W [fuse-bridge.c:1978:fuse_create_cbk]
0-glusterfs-fuse: 30102: /bigfi => -1 (No space left on device)
[2015-11-02 04:40:08.660484] W [fuse-bridge.c:1978:fuse_create_cbk]
0-glusterfs-fuse: 30104: /bigfi => -1 (No space left on device)
[2015-11-02 04:40:08.673046] W [fuse-bridge.c:1978:fuse_create_cbk]
0-glusterfs-fuse: 30106: /bigfi => -1 (No space left on device)
[2015-11-02 04:40:08.685465] W [fuse-bridge.c:1978:fuse_create_cbk]
0-glusterfs-fuse: 30108: /bigfi => -1 (No space left on device)
[2015-11-02 04:40:08.698471] W [fuse-bridge.c:1978:fuse_create_cbk]
0-glusterfs-fuse: 30110: /bigfi => -1 (No space left on device)
[2015-11-02 04:40:08.710918] W [fuse-bridge.c:1978:fuse_create_cbk]
0-glusterfs-fuse: 30112: /bigfi => -1 (No space left on device)
[2015-11-02 04:40:08.723667] W [fuse-bridge.c:1978:fuse_create_cbk]
0-glusterfs-fuse: 30114: /bigfi => -1 (No space left on device)
The message "E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk]
0-diskfull-client-7: remote operation failed. Path: /bigfi [No space left on
device]" repeated 9 times between [2015-11-02 04:40:08.609241] and [2015-11-02
04:40:08.723177]
The message "E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk]
0-diskfull-client-6: remote operation failed. Path: /bigfi [No space left on
device]" repeated 9 times between [2015-11-02 04:40:08.609351] and [2015-11-02
04:40:08.723197]


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1259312
[Bug 1259312] Data Tiering:File create and new writes to existing file
fails when the hot tier is full instead of redirecting/flushing the data to
cold tier
https://bugzilla.redhat.com/show_bug.cgi?id=1260923
[Bug 1260923] Tracker for tiering in 3.1.2