[Gluster-devel] tiering: emergency demotions

Fri Aug 12 13:57:35 UTC 2016

On 08/10/2016 12:06 PM, Milind Changire wrote:
> Emergency demotions will be required whenever writes breach the
> hi-watermark. Emergency demotions are required to avoid ENOSPC in case
> of continuous writes that originate on the hot tier.
>
> There are two concerns in this area:
>
> 1. enforcing max-cycle-time during emergency demotions
>    max-cycle-time is the time the tiering daemon spends in promotions or
>    demotions
>    I tend to think that the tiering daemon should skip this check in
>    the emergency situation and continue demotions until usage drops
>    below the hi-watermark

Update:
To keep matters simple and manageable, it has been decided to *enforce*
max-cycle-time even during emergency demotions, so that the worker
threads yield and can attend to pending tier management tasks if the
need arises.
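
A minimal sketch of what enforcing max-cycle-time during an emergency
demotion cycle could look like, written in Python rather than the
daemon's C. The names run_demotion_cycle, demote_one and usage_pct are
hypothetical and are not taken from the tier daemon source. The only
point is that the loop ends either when usage falls below the
hi-watermark or when the cycle's time budget is spent, whichever comes
first.

```python
import time


def run_demotion_cycle(candidates, demote_one, usage_pct,
                       hi_watermark, max_cycle_time,
                       clock=time.monotonic):
    """Demote files until usage < hi_watermark or the time budget ends.

    All names are hypothetical. `candidates` is an iterable of paths,
    `demote_one(path)` migrates one file, `usage_pct()` reports current
    hot tier usage as a percentage. Returns (demoted_paths, reason).
    """
    deadline = clock() + max_cycle_time
    demoted = []
    for path in candidates:
        if usage_pct() < hi_watermark:
            return demoted, "below-watermark"
        if clock() >= deadline:
            # Yield the worker thread; the next cycle picks up from here.
            return demoted, "cycle-time-expired"
        demote_one(path)
        demoted.append(path)
    if usage_pct() < hi_watermark:
        return demoted, "below-watermark"
    return demoted, "no-candidates"
```

If the cycle ends with "cycle-time-expired", the breach is still in
effect and the next cycle would be expected to resume demotions.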

>
> 2. file demotion policy
>    I tend to think that the largest file with the most recent
>    *write* should be chosen for eviction when write-freq-threshold is
>    NON-ZERO.
>    Choosing a least written file is just going to delay file migration
>    of an active file which might consume hot tier disk space resulting
>    in ENOSPC, in the worst case.
>    In cases where write-freq-threshold is ZERO, the most recently
>    *written* file can be chosen for eviction.
>    In the case of choosing the largest file within the
>    write-freq-threshold, a stat() on the files would be required to
>    calculate the number of files that need to be demoted to take the
>    watermark below the hi-watermark. Finding the number of most recently
>    written files to demote could also help make demotions in parallel
>    rather than in the sequential manner currently in place.

Update:
The idea of choosing files by file size has been dropped.
Instead, on a hi-watermark breach, the most recently written file will
be chosen for eviction from the hot tier, iteratively, until usage
drops below the hi-watermark.
The idea of parallelizing multiple promotions/demotions has been
deferred.
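
The selection policy above can be sketched roughly as follows. This is
an illustration only: HotFile and pick_demotions are hypothetical
names, and the sketch projects usage from file sizes up front, whereas
the policy described here picks one file at a time and re-checks the
watermark after each eviction.

```python
from dataclasses import dataclass


@dataclass
class HotFile:
    path: str
    size: int          # bytes occupied on the hot tier
    last_write: float  # timestamp of the most recent write


def pick_demotions(files, used, capacity, hi_watermark_pct):
    """Pick files to demote, most recently written first.

    Stops as soon as the projected usage is below the hi-watermark.
    `used` and `capacity` are in bytes; all names are hypothetical.
    """
    limit = capacity * hi_watermark_pct / 100
    chosen = []
    for f in sorted(files, key=lambda f: f.last_write, reverse=True):
        if used < limit:
            break
        chosen.append(f)
        used -= f.size
    return chosen
```

Note that file size plays no part in the ordering; it is only used to
estimate when enough space has been freed.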

-----

Sustained writes that create large files in the hot tier and
cumulatively breach the hi-watermark do NOT seem to be a good
workload for tiering. The assumption is that, to make the
most of the hot tier, the hi-watermark would be set close to 100%.
In this case a sustained large file copy might easily breach the
hi-watermark and may even consume the entire hot tier space, resulting
in ENOSPC.

e.g. a sustained write:

# cp file1 /mnt/glustervol/dir
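
A back-of-the-envelope check of why such a copy is a problem when the
hi-watermark sits close to 100%. The function names and the figures in
the usage note are made up for illustration, and the sketch assumes no
demotion frees space while the copy is in flight.

```python
def usage_after_write(total, used, incoming):
    """Hot tier usage, as a percentage, after `incoming` more bytes."""
    return 100.0 * (used + incoming) / total


def classify_write(total, used, incoming, hi_watermark_pct):
    """Return 'ENOSPC', 'breach' or 'ok' for a write of `incoming` bytes.

    Hypothetical helper: `total` and `used` describe the hot tier in
    bytes, `hi_watermark_pct` is the hi-watermark as a percentage.
    """
    if used + incoming > total:
        return "ENOSPC"
    if usage_after_write(total, used, incoming) > hi_watermark_pct:
        return "breach"
    return "ok"
```

With a hypothetical 100-unit hot tier that is 80 units full and a
hi-watermark of 95, a 10-unit copy is fine, an 18-unit copy breaches
the hi-watermark, and a 25-unit copy runs out of space outright. The
closer the hi-watermark is to 100, the smaller the gap between a breach
and ENOSPC.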

Workloads that would seem to make the most of tiering are:
1. Many smaller files, which are created in small bursts of write
    activity and then closed
2. A few large files where updates are in-place and the file size
    does not grow beyond the hi-watermark, e.g. a database with a
    frequent in-line compaction/de-fragmentation policy enabled
3. Frequent reads of a few large files, mostly static in size, which
    cumulatively don't breach the hi-watermark. Frequent reads of
    a large number of smaller, mostly static, files would be a good
    tiering workload as well.

>
> Comments are requested.
>