[Bugs] [Bug 1379769] New: GlusterFS fails to build on old Linux distros with linux/oom.h missing

Tue Sep 27 15:38:54 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1379769

            Bug ID: 1379769
           Summary: GlusterFS fails to build on old Linux distros with
                    linux/oom.h missing
           Product: GlusterFS
           Version: mainline
         Component: core
          Severity: low
          Assignee: bugs at gluster.org
          Reporter: oleksandr at natalenko.name
                CC: bugs at gluster.org

Created attachment 1205274
  --> https://bugzilla.redhat.com/attachment.cgi?id=1205274&action=edit
Initial patch

Milind Changire has reported that GlusterFS fails to build under RHEL5 because
RHEL5 does not ship the linux/oom.h header.

This header is used purely to obtain OOM-related constants.

This issue also raises the question of how OOM should be managed under old
kernels. From man 5 proc we see this:

===
       /proc/[pid]/oom_adj (since Linux 2.6.11)
              This file can be used to adjust the score used to select which
              process should be killed in an out-of-memory (OOM) situation.
              The kernel uses this value for a bit-shift operation of the
              process's oom_score value: valid values are in the range -16 to
              +15, plus the special value -17, which disables OOM-killing
              altogether for this process.  A positive score increases the
              likelihood of this process being killed by the OOM-killer; a
              negative score decreases the likelihood.

              The default value for this file is 0; a new process inherits its
              parent's oom_adj setting.  A process must be privileged
              (CAP_SYS_RESOURCE) to update this file.

              Since Linux 2.6.36, use of this file is deprecated in favor of
              /proc/[pid]/oom_score_adj.

...

       /proc/[pid]/oom_score_adj (since Linux 2.6.36)
              This file can be used to adjust the badness heuristic used to
              select which process gets killed in out-of-memory conditions.

              The badness heuristic assigns a value to each candidate task
              ranging from 0 (never kill) to 1000 (always kill) to determine
              which process is targeted.  The units are roughly a proportion
              along that range of allowed memory the process may allocate
              from, based on an estimation of its current memory and swap
              use.  For example, if a task is using all allowed memory, its
              badness score will be 1000.  If it is using half of its allowed
              memory, its score will be 500.

              There is an additional factor included in the badness score:
              root processes are given 3% extra memory over other tasks.

              The amount of "allowed" memory depends on the context in which
              the OOM-killer was called.  If it is due to the memory assigned
              to the allocating task's cpuset being exhausted, the allowed
              memory represents the set of mems assigned to that cpuset (see
              cpuset(7)).  If it is due to a mempolicy's node(s) being
              exhausted, the allowed memory represents the set of mempolicy
              nodes.  If it is due to a memory limit (or swap limit) being
              reached, the allowed memory is that configured limit.  Finally,
              if it is due to the entire system being out of memory, the
              allowed memory represents all allocatable resources.

              The value of oom_score_adj is added to the badness score before
              it is used to determine which task to kill.  Acceptable values
              range from -1000 (OOM_SCORE_ADJ_MIN) to +1000
              (OOM_SCORE_ADJ_MAX).  This allows user space to control the
              preference for OOM-killing, ranging from always preferring a
              certain task or completely disabling it from OOM killing.  The
              lowest possible value, -1000, is equivalent to disabling
              OOM-killing entirely for that task, since it will always report
              a badness score of 0.

              Consequently, it is very simple for user space to define the
              amount of memory to consider for each task.  Setting a
              oom_score_adj value of +500, for example, is roughly equivalent
              to allowing the remainder of tasks sharing the same system,
              cpuset, mempolicy, or memory controller resources to use at
              least 50% more memory.  A value of -500, on the other hand,
              would be roughly equivalent to discounting 50% of the task's
              allowed memory from being considered as scoring against the
              task.

              For backward compatibility with previous kernels,
              /proc/[pid]/oom_adj can still be used to tune the badness
              score.  Its value is scaled linearly with oom_score_adj.

              Writing to /proc/[pid]/oom_score_adj or /proc/[pid]/oom_adj will
              change the other with its scaled value.
===
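
To illustrate the "scaled linearly" remark in the excerpt, here is a minimal C
sketch of one plausible mapping between the two scales. The function names and
the SKETCH_ constants are hypothetical. Only the endpoints (-17 maps to -1000,
+15 maps to +1000) follow from the text above; the integer rounding in between
is an assumption and may not match what a given kernel does.

```c
#include <assert.h>

/* Hypothetical names; the values are the ones quoted from proc(5) above. */
#define SKETCH_OOM_DISABLE        (-17)
#define SKETCH_OOM_ADJUST_MAX     15
#define SKETCH_OOM_SCORE_ADJ_MIN  (-1000)
#define SKETCH_OOM_SCORE_ADJ_MAX  1000

/* Map a legacy oom_adj value onto the oom_score_adj scale (assumed rounding). */
int sketch_oom_adj_to_score_adj(int oom_adj)
{
    int score;

    if (oom_adj <= SKETCH_OOM_DISABLE)
        return SKETCH_OOM_SCORE_ADJ_MIN;   /* -17 disables OOM-killing */
    if (oom_adj >= SKETCH_OOM_ADJUST_MAX)
        return SKETCH_OOM_SCORE_ADJ_MAX;   /* +15 is pinned to +1000 */

    /* Linear scaling by 1000/17, truncated toward zero. */
    score = (oom_adj * SKETCH_OOM_SCORE_ADJ_MAX) / -SKETCH_OOM_DISABLE;
    assert(score > SKETCH_OOM_SCORE_ADJ_MIN && score < SKETCH_OOM_SCORE_ADJ_MAX);
    return score;
}

/* Map an oom_score_adj value back onto the legacy oom_adj scale. */
int sketch_score_adj_to_oom_adj(int score_adj)
{
    int adj;

    if (score_adj <= SKETCH_OOM_SCORE_ADJ_MIN)
        return SKETCH_OOM_DISABLE;
    if (score_adj >= SKETCH_OOM_SCORE_ADJ_MAX)
        return SKETCH_OOM_ADJUST_MAX;

    /* Linear scaling by 17/1000, truncated toward zero. */
    adj = (score_adj * -SKETCH_OOM_DISABLE) / SKETCH_OOM_SCORE_ADJ_MAX;
    if (adj > SKETCH_OOM_ADJUST_MAX)
        adj = SKETCH_OOM_ADJUST_MAX;       /* e.g. 999 would otherwise give 16 */
    assert(adj >= SKETCH_OOM_DISABLE && adj <= SKETCH_OOM_ADJUST_MAX);
    return adj;
}
```

A conversion like sketch_score_adj_to_oom_adj() would only matter if a value
configured on the oom_score_adj scale had to be written to the old
/proc/[pid]/oom_adj file.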

In summary, for kernels older than 2.6.11 we must disable OOM-related code
completely, for kernels from 2.6.11 to 2.6.35 inclusive we must use the old
interface (/proc/[pid]/oom_adj), and starting from 2.6.36 we must use
/proc/[pid]/oom_score_adj.

It is not that simple, obviously. For example, RHEL6, while having a 2.6.32
kernel, provides the /proc/[pid]/oom_score_adj interface. So, I guess, we must
do this:

1) if there is no /proc/self/oom_adj and no /proc/self/oom_score_adj, consider
this kernel to be too old and disable OOM-related code (i.e. do not define
HAVE_LINUX_OOM_PROC);
2) if there is /proc/self/oom_adj, but no /proc/self/oom_score_adj, consider
this kernel to be old and use old OOM /proc interface (define
HAVE_LINUX_OOM_PROC and HAVE_LINUX_OOM_PROC_V1, for example);
3) if there is /proc/self/oom_score_adj, work as we do now (and define
HAVE_LINUX_OOM_PROC_V2 or so);
4) if there is linux/oom.h, use it for constants (define HAVE_LINUX_OOM_H),
otherwise define necessary constants manually.
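
The checks above are meant to set build-time macros, but the order of the
decision in steps 1 to 3 can be sketched in C as a runtime probe. This is only
an illustration under assumptions: the enum, the function name, and the
proc_dir parameter (which would be "/proc/self" in practice) are hypothetical
and not part of any patch.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result type mirroring "none", V1 and V2 from the list above. */
enum sketch_oom_iface {
    SKETCH_OOM_NONE = 0,  /* neither file exists: disable OOM-related code */
    SKETCH_OOM_V1   = 1,  /* only oom_adj exists: old interface */
    SKETCH_OOM_V2   = 2   /* oom_score_adj exists: current interface */
};

/* Probe proc_dir (e.g. "/proc/self") for the available OOM control file. */
enum sketch_oom_iface sketch_detect_oom_iface(const char *proc_dir)
{
    char path[4096];
    int n;

    assert(proc_dir != NULL);

    /* Prefer the newer file, since RHEL6-style kernels may offer both. */
    n = snprintf(path, sizeof(path), "%s/oom_score_adj", proc_dir);
    if (n > 0 && (size_t)n < sizeof(path) && access(path, F_OK) == 0)
        return SKETCH_OOM_V2;

    n = snprintf(path, sizeof(path), "%s/oom_adj", proc_dir);
    if (n > 0 && (size_t)n < sizeof(path) && access(path, F_OK) == 0)
        return SKETCH_OOM_V1;

    return SKETCH_OOM_NONE;
}
```

Probing for the files rather than comparing kernel versions is what handles
the RHEL6 case, where a 2.6.32 kernel already provides oom_score_adj.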

Not defining HAVE_LINUX_OOM_PROC will compile out the OOM-related code
completely. The HAVE_LINUX_OOM_PROC_V1/HAVE_LINUX_OOM_PROC_V2 option will
select which /proc file the code writes to, as well as which constants it
uses. If we have HAVE_LINUX_OOM_PROC (V1 or V2) but do not have
HAVE_LINUX_OOM_H, we might end up doing this:

===
#define OOM_DISABLE             (-17)
#define OOM_ADJUST_MIN          (-16)
#define OOM_ADJUST_MAX          15
#define OOM_SCORE_ADJ_MIN       (-1000)
#define OOM_SCORE_ADJ_MAX       1000
===
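
Putting the macros together, a rough sketch of how the conditional code might
look is given below. It is not the actual patch: the SKETCH_ macros and
sketch_write_oom_value() are hypothetical names, and the HAVE_* macros are
assumed to be set by the build system as proposed above.

```c
#include <assert.h>
#include <stdio.h>

#ifdef HAVE_LINUX_OOM_H
#include <linux/oom.h>
#endif

/* Fallback constants for systems without linux/oom.h (e.g. RHEL5). */
#ifndef OOM_DISABLE
#define OOM_DISABLE        (-17)
#endif
#ifndef OOM_ADJUST_MAX
#define OOM_ADJUST_MAX     15
#endif
#ifndef OOM_SCORE_ADJ_MIN
#define OOM_SCORE_ADJ_MIN  (-1000)
#endif
#ifndef OOM_SCORE_ADJ_MAX
#define OOM_SCORE_ADJ_MAX  1000
#endif

/* Pick the /proc file and valid range according to the detected interface. */
#if defined(HAVE_LINUX_OOM_PROC_V2)
#define SKETCH_OOM_FILE "/proc/self/oom_score_adj"
#define SKETCH_OOM_MIN  OOM_SCORE_ADJ_MIN
#define SKETCH_OOM_MAX  OOM_SCORE_ADJ_MAX
#elif defined(HAVE_LINUX_OOM_PROC_V1)
#define SKETCH_OOM_FILE "/proc/self/oom_adj"
#define SKETCH_OOM_MIN  OOM_DISABLE
#define SKETCH_OOM_MAX  OOM_ADJUST_MAX
#endif

/* Write value to path if it lies in [min, max]; 0 on success, -1 on error. */
int sketch_write_oom_value(const char *path, int value, int min, int max)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    if (value < min || value > max)
        return -1;

    f = fopen(path, "w");
    if (f == NULL)
        return -1;   /* e.g. file missing or CAP_SYS_RESOURCE lacking */

    ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

#ifdef SKETCH_OOM_FILE
/* Only compiled when an OOM /proc interface was detected at build time. */
int sketch_set_own_oom_value(int value)
{
    return sketch_write_oom_value(SKETCH_OOM_FILE, value,
                                  SKETCH_OOM_MIN, SKETCH_OOM_MAX);
}
#endif
```

When neither V1 nor V2 is defined, sketch_set_own_oom_value() is simply not
compiled, which corresponds to throwing the OOM-related code away.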

With these changes we'll cover all the possibilities one may face while
compiling GlusterFS against a relatively old kernel.

Attaching Milind's initial patch as a proof of concept; I will take care of
implementing everything written above if there are no objections.
