[Bugs] [Bug 1411901] New: DHT doesn't evenly balance files on FreeBSD with ZFS

bugzilla at redhat.com
Tue Jan 10 17:20:08 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1411901

            Bug ID: 1411901
           Summary: DHT doesn't evenly balance files on FreeBSD with ZFS
           Product: GlusterFS
           Version: 3.7.18
         Component: distribute
          Keywords: Triaged
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: xhernandez at datalab.es
                CC: bperkins at redhat.com, bugs at gluster.org,
                    jdarcy at redhat.com, kjohnson at gnulnx.net
        Depends On: 1356076
            Blocks: 1411898, 1411899



+++ This bug was initially created as a clone of Bug #1356076 +++

Description of problem:

On a pure distributed volume with two bricks, one on a FreeBSD node using ZFS
and the other on a Linux node, DHT places ten times more data on the FreeBSD
node (30 TB vs. 3 TB on the Linux node).

Version-Release number of selected component (if applicable): mainline


How reproducible:

Not sure

Steps to Reproduce:
1. Create a distributed volume with two bricks: one on FreeBSD/ZFS and
the other on CentOS
2. Start copying files
3.

Actual results:

Almost all files are placed on the FreeBSD node.

Expected results:

Roughly 50% of the files should be placed on each node.

Additional info:

A "gluster volume status detail" command reports the FreeBSD filesystem as
much bigger than it really is (~256 times bigger). It also fails to detect the
filesystem type and some other information:

    File System          : N/A
    Device               : N/A
    Mount Options        : N/A
    Inode Size           : N/A
    Disk Space Free      : 2.6PB
    Total Disk Space     : 12.6PB

The real brick space is 45 TB.

A statvfs() call on FreeBSD returns this:

    f_frsize: 512
    f_bsize:  131072

From the statvfs() man page on FreeBSD:

    "The statvfs() and fstatvfs() functions fill the structure pointed to by
     buf with garbage.  This garbage will occasionally bear resemblance to
     file system statistics, but portable applications must not depend on
     this.  Applications must pass a pathname or file descriptor which refers
     to a file on the file system in which they are interested."

    "f_frsize   The size in bytes of the minimum unit of allocation on
                this file system.  (This corresponds to the f_bsize
                member of struct statfs.)"

    "f_bsize    The preferred length of I/O requests for files on this
                file system.  (Corresponds to the f_iosize member of
                struct statfs.)"

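Gluster probably uses f_bsize as the block size, but on FreeBSD that field is
the optimal I/O size, not the block size. The ratio f_bsize / f_frsize =
131072 / 512 = 256 matches the ~256x inflation of the reported space.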
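
To make the arithmetic concrete, here is a minimal sketch in C of the two ways a
caller might turn statvfs() results into a capacity in bytes. The function names
capacity_by_frsize and capacity_by_bsize are made up for this illustration and
are not taken from the Gluster sources; the sketch assumes f_blocks is counted
in f_frsize units, as POSIX specifies.

```c
#include <stdint.h>
#include <sys/statvfs.h>

/* Capacity in bytes when f_blocks is multiplied by f_frsize, the unit
 * POSIX defines for f_blocks. (Hypothetical helper name.) */
static uint64_t capacity_by_frsize(const struct statvfs *st)
{
    return (uint64_t)st->f_blocks * (uint64_t)st->f_frsize;
}

/* Capacity in bytes when f_bsize is used as the unit instead. This gives
 * the same result only on systems where f_bsize == f_frsize.
 * (Hypothetical helper name.) */
static uint64_t capacity_by_bsize(const struct statvfs *st)
{
    return (uint64_t)st->f_blocks * (uint64_t)st->f_bsize;
}
```

With the values reported above (f_frsize = 512, f_bsize = 131072), the second
function returns a value 256 times larger than the first for any block count.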

As a workaround, disabling 'weighted-rebalance' distributes files evenly
between bricks.

--- Additional comment from Jeff Darcy on 2016-07-13 17:54:25 CEST ---

You're probably right, Xavier.  Unfortunately, Linux and FreeBSD seem to have
some fundamental disagreements about what these fields mean, so we'll probably
have to add some platform-conditional code in some of the several places that
use them.  I also doubt that this is the last problem we'll find in
OS-heterogeneous clusters.  :(

--- Additional comment from Xavier Hernandez on 2016-07-14 08:17:25 CEST ---

I agree.

We already wrap system calls in many places (syscall.h). Maybe we should
enforce the use of these wrappers and put the OS-specific code in
syscall.c.

For this particular case we could solve the problem simply by setting f_bsize =
f_frsize on FreeBSD.
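
A rough sketch of what such a wrapper could look like, only to illustrate the
idea. The names example_statvfs and use_fragment_size_as_block_size are
hypothetical and this is not the actual Gluster code; the only assumption about
the platform is that FreeBSD compilers define the __FreeBSD__ macro.

```c
#include <sys/statvfs.h>

/* Overwrite f_bsize with the allocation unit so that callers which treat
 * f_bsize as the block size compute correct sizes. (Hypothetical helper.) */
static void use_fragment_size_as_block_size(struct statvfs *buf)
{
    buf->f_bsize = buf->f_frsize;
}

/* Hypothetical wrapper around statvfs(): same return value and errno as
 * statvfs(), but on FreeBSD the block size field is normalized. */
static int example_statvfs(const char *path, struct statvfs *buf)
{
    int ret = statvfs(path, buf);

#ifdef __FreeBSD__
    /* Only touch the buffer when the call succeeded. */
    if (ret == 0)
        use_fragment_size_as_block_size(buf);
#endif

    return ret;
}
```

Keeping the platform check inside one wrapper means the callers that multiply
by f_bsize would not need their own conditional code.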


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1356076
[Bug 1356076] DHT doesn't evenly balance files on FreeBSD with ZFS
https://bugzilla.redhat.com/show_bug.cgi?id=1411898
[Bug 1411898] DHT doesn't evenly balance files on FreeBSD with ZFS
https://bugzilla.redhat.com/show_bug.cgi?id=1411899
[Bug 1411899] DHT doesn't evenly balance files on FreeBSD with ZFS