[Bugs] [Bug 1402215] New: [RFE] enable sharding and strict-o-direct with virt profile - /var/lib/glusterd/groups/virt

bugzilla at redhat.com bugzilla at redhat.com
Wed Dec 7 05:03:07 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1402215

            Bug ID: 1402215
           Summary: [RFE] enable sharding and strict-o-direct with virt
                    profile - /var/lib/glusterd/groups/virt
           Product: GlusterFS
           Version: 3.7.18
         Component: glusterd
          Keywords: Triaged
          Severity: medium
          Assignee: bugs at gluster.org
          Reporter: kdhananj at redhat.com
                CC: amukherj at redhat.com, bugs at gluster.org,
                    kdhananj at redhat.com, pkarampu at redhat.com,
                    sasundar at redhat.com, vbellur at redhat.com
        Depends On: 1375431
            Blocks: 1375849, 1376464



+++ This bug was initially created as a clone of Bug #1375431 +++

Description of problem:
-----------------------
Sharding seems to be vital for the virt store use case, and it needs to be
turned on when the volume is optimized for virt store.

The following options need to be added to the virt profile -
/var/lib/glusterd/groups/virt

features.shard=on
cluster.data-self-heal-algorithm=full
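
For illustration, a minimal Python sketch of adding the two requested options to a group profile, assuming the group file is a plain list of key=value lines as shown above. The function names and the sample starting content are hypothetical and are not taken from the glusterd sources.

```python
def parse_group_profile(text):
    """Parse 'key=value' lines into a dict, skipping blank lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


def render_group_profile(options):
    """Render options back into 'key=value' lines, one per line."""
    return "".join(f"{key}={value}\n" for key, value in options.items())


# Hypothetical existing profile content, for illustration only.
existing = "performance.quick-read=off\nnetwork.remote-dio=enable\n"

profile = parse_group_profile(existing)
# The two options requested in this bug.
profile["features.shard"] = "on"
profile["cluster.data-self-heal-algorithm"] = "full"

print(render_group_profile(profile), end="")
```

Keeping the options in a dict means a key that is already present is overwritten rather than duplicated.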

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
mainline

How reproducible:
-----------------
Not applicable as this is an RFE

Steps to Reproduce:
-------------------
Not applicable as this is an RFE

Actual results:
---------------
Sharding is not enabled by default when the volume is optimized for virt store

Expected results:
-----------------
Sharding should be enabled by default when the gluster volume is optimized for
the virt store use case

--- Additional comment from SATHEESARAN on 2016-09-14 09:08:42 EDT ---

strict-o-direct also needs to be turned on, and remote-dio needs to be turned off.

In total there are 4 options:

features.shard=on
cluster.data-self-heal-algorithm=full
performance.strict-o-direct=on
network.remote-dio=disable
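
Until the profile carries these options, they would have to be set per volume. A minimal Python sketch that builds the corresponding `gluster volume set` command lines for the four options above; the volume name `vmstore` and the helper name are hypothetical, and the commands are only printed, not executed.

```python
import shlex

# The four options listed in the comment above.
PROPOSED_OPTIONS = {
    "features.shard": "on",
    "cluster.data-self-heal-algorithm": "full",
    "performance.strict-o-direct": "on",
    "network.remote-dio": "disable",
}


def volume_set_commands(volume, options):
    """Build one 'gluster volume set' command line per option."""
    return [
        shlex.join(["gluster", "volume", "set", volume, key, value])
        for key, value in options.items()
    ]


# 'vmstore' is a hypothetical volume name.
for command in volume_set_commands("vmstore", PROPOSED_OPTIONS):
    print(command)
```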

--- Additional comment from Worker Ant on 2016-12-01 09:34:21 EST ---

REVIEW: http://review.gluster.org/15995 (extras: Include shard and
full-data-heal in virt group) posted (#1) for review on master by Krutika
Dhananjay (kdhananj at redhat.com)

--- Additional comment from Krutika Dhananjay on 2016-12-01 11:54:30 EST ---

(In reply to SATHEESARAN from comment #1)
> strict-o-direct also needs to be turned on and remote-dio to be turned off
> 
> In total there are 4 options :
> 
> features.shard=on
> cluster.data-self-heal-algorithm=full
> performance.strict-o-direct=on
> network.remote-dio=disable

Just for the record, the options features.shard=on and
cluster.data-self-heal-algorithm=full will be added to group virt. The O_DIRECT
options will be skipped, since not all users may want to use the cache=none
qemu option, so it is best to configure them separately.
-Krutika

--- Additional comment from Pranith Kumar K on 2016-12-02 00:02:03 EST ---

(In reply to Krutika Dhananjay from comment #3)
> Just for the record, the option features.shard=on and
> cluster.data-self-heal-algorithm=full will be added to group virt. O-DIRECT
> options will be skipped since not all users might want to use cache=none
> qemu option, and so it is best to configure them separately.
> -Krutika

The odirect options honour the O_DIRECT flag passed to open. Does qemu always
open with O_DIRECT even when cache is not set to 'none'?

--- Additional comment from Krutika Dhananjay on 2016-12-02 00:13:00 EST ---

(In reply to Pranith Kumar K from comment #4)
> odirect options honour the o-direct flag for open. Does qemu always open
> with o-direct even when cache is not set as 'none'?

When cache=none is not used, I believe qemu won't pass the O_DIRECT flag.
Now that I remember, Vijay mentioned one more reason: a certain ping-timeout
issue which, if fixed, would mean we no longer need to rely on any of the
odirect options (even if cache=none is used by qemu).

-Krutika
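
A minimal sketch of the point above, assuming qemu's documented disk cache modes, in which `none` and `directsync` bypass the host page cache by opening the image with O_DIRECT while `writethrough`, `writeback` and `unsafe` do not. The table and helper name are illustrative assumptions and are not taken from this bug or from the qemu sources.

```python
# Assumed mapping of qemu cache modes to whether the image is opened with O_DIRECT.
CACHE_MODE_USES_O_DIRECT = {
    "none": True,
    "directsync": True,
    "writethrough": False,
    "writeback": False,
    "unsafe": False,
}


def odirect_options_take_effect(cache_mode):
    """Return True if the odirect volume options would matter for this cache mode."""
    return CACHE_MODE_USES_O_DIRECT[cache_mode]


for mode in ("none", "writeback"):
    print(mode, odirect_options_take_effect(mode))
```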

--- Additional comment from Pranith Kumar K on 2016-12-02 00:23:30 EST ---

(In reply to Krutika Dhananjay from comment #5)
> When cache=none is not used, I believe qemu won't be passing O_DIRECT flag.
> Now that I remember, there was one more reason Vijay mentioned about a
> certain ping-timeout issue, which if fixed, we won't need to rely on any of
> the odirect option (even if cache=none is used by qemu).
> 
> -Krutika

Okay, so what is the plan for deployments? Are we going to advise users to
apply the virt profile and then explicitly set remote-dio to off in every
deployment, considering that cache=none is used by default?

--- Additional comment from Krutika Dhananjay on 2016-12-02 00:51:05 EST ---

(In reply to Pranith Kumar K from comment #6)
> Okay, so what is the plan for the deployments? Are we going to suggest users
> to apply virt-profile and explicitly set remote-dio to off in every
> deployment, considering that cache=none is used by default?

Is cache=none really used by default? Isn't that a very specific case, confined
to oVirt users alone?

--- Additional comment from Pranith Kumar K on 2016-12-02 01:07:06 EST ---

(In reply to Krutika Dhananjay from comment #7)
> cache=none is used by default? Isn't that a very specific case and confined
> to ovirt users alone?

It seems that quite a few sources recommend cache=none for different reasons,
including Proxmox, which is fairly popular on gluster-users:
https://pve.proxmox.com/wiki/Performance_Tweaks#Disk_Cache
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Virtualization_Tuning_and_Optimization_Guide/index.html#sect-Virtualization_Tuning_Optimization_Guide-BlockIO-Caching

So I think it is better to put it in the profile than to ask users to change
it in every deployment. It also seems to be the safer option, because it
eliminates human error: users may forget to turn this option off, which may
lead to VM pauses.

--- Additional comment from Krutika Dhananjay on 2016-12-02 01:44:43 EST ---

(In reply to Pranith Kumar K from comment #8)
> It seems like quite a few of them are recommending cache as none for
> different reasons, including proxmox which is a bit popular in gluster-users:
> https://pve.proxmox.com/wiki/Performance_Tweaks#Disk_Cache
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Virtualization_Tuning_and_Optimization_Guide/index.html#sect-Virtualization_Tuning_Optimization_Guide-BlockIO-Caching
> 
> So I am thinking it is better to put it in the profile than to suggest users
> to change this for all deployments. It seems to be safer option as well
> because it eliminates human errors where users may forget to turn this
> option off which may lead to VM pauses.

I'm not entirely convinced. What if not all users have the kind of heavy
workload that was used in testing, which led to the VM pause and required the
o-direct options to be enabled? Why should all users suffer the performance
penalty associated with having the odirect options set?

--- Additional comment from Pranith Kumar K on 2016-12-02 01:50:39 EST ---

(In reply to Krutika Dhananjay from comment #9)
> I'm not entirely convinced. What if not all users have the kind of heavy
> workload that was used in testing which led to VM pause and required
> o-direct options to be enabled? Why should all users suffer performance
> penalty associated with having odirect options set?

Based on the data we have so far, disabling remote-dio and enabling
strict-o-direct is safer. Did I get that right? People who want better
performance for their workload can choose to enable o-direct, but they will
know at the time of enabling it that this is the choice they made. The
default option, however, should be the safest one.
If we choose remote-dio=enable as the default, people who are not informed
enough will learn about the problem only after the VM pauses; we do not want
that.
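
To make the "safest default" argument concrete, a minimal Python sketch that reports which of the two odirect-related options on a volume differ from the values argued for above. The helper name and the sample configuration are hypothetical. The comparison is a literal string match, so equivalent spellings of a boolean value (for example `off` versus `disable`) would need normalizing first.

```python
# Values argued for as the safe defaults in the comment above.
SAFE_DEFAULTS = {
    "performance.strict-o-direct": "on",
    "network.remote-dio": "disable",
}


def unsafe_settings(current):
    """Return {option: (current, safe)} for options that differ from the safe defaults."""
    return {
        key: (current.get(key), safe)
        for key, safe in SAFE_DEFAULTS.items()
        if current.get(key) != safe
    }


# Hypothetical volume configuration where remote-dio was left enabled.
current = {"performance.strict-o-direct": "on", "network.remote-dio": "enable"}
print(unsafe_settings(current))
```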

--- Additional comment from Worker Ant on 2016-12-02 02:44:46 EST ---

REVIEW: http://review.gluster.org/16005 (extras: Add odirect options, shard and
full data heal to group virt) posted (#1) for review on master by Krutika
Dhananjay (kdhananj at redhat.com)

--- Additional comment from Pranith Kumar K on 2016-12-02 02:53:48 EST ---

Thanks Krutika for submitting this version of the patch as well.

Vijay,
     Based on our discussion I am of the opinion that enabling odirect options
in the profile is better. Could you let us know if you see any issues with
this?

Pranith

--- Additional comment from Worker Ant on 2016-12-04 23:44:16 EST ---

COMMIT: http://review.gluster.org/15995 committed in master by Pranith Kumar
Karampuri (pkarampu at redhat.com) 
------
commit 45f914ec9c7b15ba8e962b8fae3593f06912c1f0
Author: Krutika Dhananjay <kdhananj at redhat.com>
Date:   Thu Dec 1 17:28:40 2016 +0530

    extras: Include shard and full-data-heal in virt group

    Change-Id: Iea66cb017bd1ab62da9cd65895fa65fc6896108b
    BUG: 1375431
    Signed-off-by: Krutika Dhananjay <kdhananj at redhat.com>
    Reviewed-on: http://review.gluster.org/15995
    Smoke: Gluster Build System <jenkins at build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins at build.gluster.org>
    Reviewed-by: Atin Mukherjee <amukherj at redhat.com>
    CentOS-regression: Gluster Build System <jenkins at build.gluster.org>
    Reviewed-by: Vijay Bellur <vbellur at redhat.com>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1375431
[Bug 1375431] [RFE] enable sharding and strict-o-direct with virt profile -
/var/lib/glusterd/groups/virt
https://bugzilla.redhat.com/show_bug.cgi?id=1375849
[Bug 1375849] [RFE] enable sharding with virt profile -
/var/lib/glusterd/groups/virt
https://bugzilla.redhat.com/show_bug.cgi?id=1376464
[Bug 1376464] [RFE] enable sharding and strict-o-direct with virt profile -
/var/lib/glusterd/groups/virt