[Gluster-users] Quota issue

Geoffrey Letessier geoffrey.letessier at cnrs.fr
Thu Jun 11 12:13:30 UTC 2015


Hi Vijay,

Could you take some time to look at this? The only thing I found about my issue is in the Red Hat Bugzilla (https://bugzilla.redhat.com/show_bug.cgi?id=917901). But my storage and computing clusters are still in production, and I wonder whether I should warn my community about a needed production break, or whether I can apply a fix during production (i.e. without upgrading the GlusterFS version on my storage cluster).

Thanks in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr

> On 10 June 2015 at 06:12, Vijaikumar M <vmallika at redhat.com> wrote:
> 
> Hi Geoffrey,
> 
> grep for 'ERROR' in the log files; those lines alone will be sufficient.
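> 
> For example, something like the following on each server would produce a much smaller file to send (a sketch; brick1.log etc. stand for the files produced by the quota-verify runs):
> 
> grep 'ERROR' brick*.log > quota_errors.log
> gzip quota_errors.log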
> 
> Thanks,
> Vijay
> 
> 
> On Wednesday 10 June 2015 04:38 AM, Geoffrey Letessier wrote:
>> Hello Vijay,
>> 
>> Quota-verify has now been running for more than 10 hours, and the output files (4 files, because there are 4 bricks per replica) are huge: around 800MB per file on the first server and 5GB per file on the second one. Do you still want them? How can I send them to you?
>> 
>> Good night (from France)
>> Geoffrey
>> ------------------------------------------------------
>> Geoffrey Letessier
>> Responsable informatique & ingénieur système
>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>> Institut de Biologie Physico-Chimique
>> 13, rue Pierre et Marie Curie - 75005 Paris
>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>> On 9 June 2015 at 12:46, Vijaikumar M <vmallika at redhat.com> wrote:
>> 
>>> Hi Geoffrey,
>>> 
>>> The file content deletion is caused by the vi editor's behaviour of truncating the file when writing the updated content.
>>> 
>>> Regarding the quota size/usage problem, can you please execute the attached script on each brick and provide us with the generated output? This will help us analyse why 'quota list' is showing the wrong size.
>>> The script crawls the directory given as its argument.
>>> It collects the quota "contri" and "size" extended attributes, and also the "block size" from a stat call.
>>> 
>>> Usage:
>>> 
>>> ./quota-verify -b <brick_path> | tee brick_name.log
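>>> 
>>> For example, on the servers from this thread it could be run per brick like this (a sketch; adjust the paths to your layout):
>>> 
>>> ./quota-verify -b /export/brick_home/brick1 | tee brick1.log
>>> ./quota-verify -b /export/brick_home/brick2 | tee brick2.log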
>>> 
>>> 
>>> Thanks,
>>> Vijay
>>> 
>>> 
>>> 
>>> On Tuesday 09 June 2015 03:45 PM, Vijaikumar M wrote:
>>>> 
>>>> 
>>>> On Tuesday 09 June 2015 03:40 PM, Geoffrey Letessier wrote:
>>>>> Hi Vijay,
>>>>> 
>>>>> Thanks for having replied.
>>>>> 
>>>>> Unfortunately, I checked every brick in my storage pool and didn't find any backup file... too bad!
>>>> 
>>>> Please check for the backup file on the client machine where the file was edited, in the home directory of the user whose login was used to edit the file.
>>>> 
>>>> Thanks,
>>>> Vijay
>>>> 
>>>> 
>>>>> 
>>>>> Thank you again!
>>>>> Good luck and see you,
>>>>> Geoffrey
>>>>> ------------------------------------------------------
>>>>> Geoffrey Letessier
>>>>> Responsable informatique & ingénieur système
>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>> Institut de Biologie Physico-Chimique
>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>> On 9 June 2015 at 10:05, Vijaikumar M <vmallika at redhat.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tuesday 09 June 2015 01:08 PM, Geoffrey Letessier wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Yes of course:
>>>>>>> [root at lucifer ~]# pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/amyloid_team
>>>>>>> cl-storage1: 1608522280 /export/brick_home/brick1/amyloid_team
>>>>>>> cl-storage3: 1619630616 /export/brick_home/brick1/amyloid_team
>>>>>>> cl-storage1: 1614057836 /export/brick_home/brick2/amyloid_team
>>>>>>> cl-storage3: 1602653808 /export/brick_home/brick2/amyloid_team
>>>>>>> 
>>>>>>> The sum is 6444864540 (du reports 1K blocks by default), i.e. around 6.4-6.5TB, while the quota list displays 7.7TB.
>>>>>>> So the discrepancy is roughly 1.2-1.3TB, in other words around 16%, which is far too large, no?
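>>>>>>> 
>>>>>>> (For reference, the sum can be computed directly from the pdsh output; a small sketch:)
>>>>>>> 
>>>>>>> pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/amyloid_team | awk '{sum += $2} END {print sum " KB"}'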
>>>>>>> 
>>>>>>> In addition, since the quota was exceeded, I have noticed a lot of files like the following:
>>>>>>> [root at lucifer ~]# pdsh -w cl-storage[1,3] "cd /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/; ls -ail remd_100.sh 2> /dev/null" 2>/dev/null
>>>>>>> cl-storage3: 133325688 ---------T 2 tarus amyloid_team 0 16 févr. 10:20 remd_100.sh
>>>>>>> Note the 'T' at the end of the permissions and the file size of 0B.
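>>>>>>> 
>>>>>>> (A zero-byte file with mode ---------T on a distribute volume is typically a DHT link file; assuming getfattr is installed on the storage server, one way to check is to dump its extended attributes and look for a trusted.glusterfs.dht.linkto entry:)
>>>>>>> 
>>>>>>> getfattr -d -m . -e hex /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_100.sh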
>>>>>>> 
>>>>>>> Also, yesterday some files appeared duplicated, but they are not anymore...
>>>>>>> 
>>>>>>> The worst part is that all these files were previously OK. In other words, exceeding the quota caused file or content deletions or corruption... What can I do to prevent this situation in the future? Because I guess I cannot do anything to roll the situation back now, right?
>>>>>>> 
>>>>>> 
>>>>>> Hi Geoffrey,
>>>>>> 
>>>>>> I tried re-creating the problem.
>>>>>> 
>>>>>> Here is the behaviour of vi editor.
>>>>>> When a file is saved in the vi editor, vi creates a backup file under the home directory and opens the original file with the O_TRUNC flag, hence the file gets truncated.
>>>>>> 
>>>>>> 
>>>>>> Here is the strace of vi editor when it gets 'EDQUOT' error:
>>>>>> 
>>>>>> open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 3
>>>>>> write(3, "line one\nline two\n", 18)    = 18
>>>>>> fsync(3)                                = 0
>>>>>> close(3)                                = -1 EDQUOT (Disk quota exceeded)
>>>>>> chmod("hello", 0100644)                 = 0
>>>>>> open("/root/hello~", O_RDONLY)          = 3
>>>>>> open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 7
>>>>>> read(3, "line one\n", 256)              = 9
>>>>>> write(7, "line one\n", 9)               = 9
>>>>>> read(3, "", 256)                        = 0
>>>>>> close(7)                                = -1 EDQUOT (Disk quota exceeded)
>>>>>> close(3)                                = 0
>>>>>> 
>>>>>> 
>>>>>> To recover the truncated file, please check whether there is a backup file 'remd_115.sh~' under '~/' or in the same directory as the original file. If it exists, you can copy it back.
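>>>>>> 
>>>>>> Something like this could locate it (a sketch; extend the search paths as needed):
>>>>>> 
>>>>>> find ~ -name 'remd_115.sh~' 2>/dev/null
>>>>>> # if found, copy the backup over the truncated file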
>>>>>> 
>>>>>> Thanks,
>>>>>> Vijay
>>>>>> 
>>>>>> 
>>>>>>> Geoffrey
>>>>>>> ------------------------------------------------------
>>>>>>> Geoffrey Letessier
>>>>>>> Responsable informatique & ingénieur système
>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>> On 9 June 2015 at 09:01, Vijaikumar M <vmallika at redhat.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Monday 08 June 2015 07:11 PM, Geoffrey Letessier wrote:
>>>>>>>>> In addition, I notice a very big difference between the sum of 'du' over the bricks and the 'quota list' display, as you can see below:
>>>>>>>>> [root at lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/amyloid_team
>>>>>>>>> cl-storage1: 1,6T    /export/brick_home/brick1/amyloid_team
>>>>>>>>> cl-storage3: 1,6T    /export/brick_home/brick1/amyloid_team
>>>>>>>>> cl-storage1: 1,6T    /export/brick_home/brick2/amyloid_team
>>>>>>>>> cl-storage3: 1,6T    /export/brick_home/brick2/amyloid_team
>>>>>>>>> [root at lucifer ~]# gluster volume quota vol_home list /amyloid_team
>>>>>>>>>                   Path                   Hard-limit Soft-limit   Used  Available
>>>>>>>>> --------------------------------------------------------------------------------
>>>>>>>>> /amyloid_team                              9.0TB       90%       7.8TB   1.2TB
>>>>>>>>> 
>>>>>>>>> As you can see, the sum over all bricks gives roughly 6.4TB while 'quota list' reports around 7.8TB; that is a difference of 1.4TB which I am not able to explain... Do you have any idea?
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> There were a few issues with quota size accounting; we have fixed some of them in 3.7.
>>>>>>>> 'du -sh' rounds off the values; can you please provide the output of 'du' without the -h option?
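>>>>>>>> 
>>>>>>>> (If useful, the size accounted by quota for a directory can also be read directly on a brick; a sketch, assuming getfattr is available and the usual trusted.glusterfs.quota xattr namespace, which is what the attached script reads:)
>>>>>>>> 
>>>>>>>> getfattr -d -m 'trusted.glusterfs.quota' -e hex /export/brick_home/brick1/amyloid_team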
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Geoffrey
>>>>>>>>> ------------------------------------------------------
>>>>>>>>> Geoffrey Letessier
>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>>>> On 8 June 2015 at 14:30, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hello,
>>>>>>>>>> 
>>>>>>>>>> Concerning GlusterFS version 3.5.3: this morning I ran into a strange issue writing a file when the quota is exceeded.
>>>>>>>>>> 
>>>>>>>>>> A person in my lab, whose quota was exceeded (but she didn't know it), tried to modify a file; because of the exceeded quota she was unable to, and decided to exit vi. Now her file is empty/blank, as you can see below:
>>>>>>>> We suspect 'vi' might have created a tmp file before writing to the file. We are working on re-creating this problem and will update you.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>> pdsh at lucifer: cl-storage3: ssh exited with exit code 2
>>>>>>>>>> cl-storage1: ---------T 2 tarus amyloid_team 0 19 févr. 12:34 /export/brick_home/brick1/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
>>>>>>>>>> cl-storage1: -rwxrw-r-- 2 tarus amyloid_team 0  8 juin  12:38 /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
>>>>>>>>>> 
>>>>>>>>>> In addition, I don't understand why, my volume being distributed over replicas (cl-storage[1,3] is replicated only onto cl-storage[2,4]), I have two "identical" files (same complete path) on two different bricks (as you can see above).
>>>>>>>>>> 
>>>>>>>>>> Thanks in advance for your help and clarification.
>>>>>>>>>> Geoffrey
>>>>>>>>>> ------------------------------------------------------
>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>>>>> On 2 June 2015 at 23:45, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Ben,
>>>>>>>>>>> 
>>>>>>>>>>> I just checked my messages log files, both on client and server, and I don't find any hung tasks like the ones you noticed on yours...
>>>>>>>>>>> 
>>>>>>>>>>> As you can see below, I don't see the performance issue with a simple dd, but I think my issue concerns sets of small files (tens of thousands, or even more)...
>>>>>>>>>>> 
>>>>>>>>>>> [root at nisus test]# ddt -t 10g /mnt/test/
>>>>>>>>>>> Writing to /mnt/test/ddt.8362 ... syncing ... done.
>>>>>>>>>>> sleeping 10 seconds ... done.
>>>>>>>>>>> Reading from /mnt/test/ddt.8362 ... done.
>>>>>>>>>>> 10240MiB    KiB/s  CPU%
>>>>>>>>>>> Write      114770     4
>>>>>>>>>>> Read        40675     4
>>>>>>>>>>> 
>>>>>>>>>>> For info: /mnt/test is the 'single (v2)' GlusterFS volume.
>>>>>>>>>>> 
>>>>>>>>>>> [root at nisus test]# ddt -t 10g /mnt/fhgfs/
>>>>>>>>>>> Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
>>>>>>>>>>> sleeping 10 seconds ... done.
>>>>>>>>>>> Reading from /mnt/fhgfs/ddt.8380 ... done.
>>>>>>>>>>> 10240MiB    KiB/s  CPU%
>>>>>>>>>>> Write      102591     1
>>>>>>>>>>> Read        98079     2
>>>>>>>>>>> 
>>>>>>>>>>> Do you have an idea how to tune/optimize the performance settings and/or the TCP settings (MTU, etc.)?
>>>>>>>>>>> 
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> |             |  UNTAR  |   DU   |  FIND   |   TAR   |   RM   |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | single      |  ~3m45s |   ~43s |    ~47s |  ~3m10s | ~3m15s |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | replicated  |  ~5m10s |   ~59s |   ~1m6s |  ~1m19s | ~1m49s |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | distributed |  ~4m18s |   ~41s |    ~57s |  ~2m24s | ~1m38s |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | dist-repl   |  ~8m18s |  ~1m4s |  ~1m11s |  ~1m24s | ~2m40s |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | native FS   |    ~11s |    ~4s |     ~2s |    ~56s |   ~10s |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | BeeGFS      |  ~3m43s |   ~15s |     ~3s |  ~1m33s |   ~46s |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> | single (v2) |   ~3m6s |   ~14s |    ~32s |   ~1m2s |   ~44s |
>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>> For info:
>>>>>>>>>>>  - BeeGFS is a distributed FS (4 bricks, 2 bricks per server, 2 servers)
>>>>>>>>>>>  - single (v2) is a simple gluster volume with default settings
>>>>>>>>>>> 
>>>>>>>>>>> I also note that I get the same tar/untar performance issue with FhGFS/BeeGFS, but the rest (du, find, rm) looks OK.
>>>>>>>>>>> 
>>>>>>>>>>> Thank you very much for your reply and help.
>>>>>>>>>>> Geoffrey
>>>>>>>>>>> -----------------------------------------------
>>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>> 
>>>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>>>>>>>>>>> On 2 June 2015 at 21:53, Ben Turner <bturner at redhat.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I am seeing problems on 3.7 as well.  Can you check /var/log/messages on both the clients and servers for hung tasks like:
>>>>>>>>>>>> 
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: iozone        D 0000000000000001     0 21999      1 0x00000080
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: ffff880611321cc8 0000000000000082 ffff880611321c18 ffffffffa027236e
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: ffff880611321c48 ffffffffa0272c10 ffff88052bd1e040 ffff880611321c78
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 ffff88062080c7a0 ffff880625addaf8 ffff880611321fd8
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: Call Trace:
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ? rpc_execute+0x50/0xa0 [sunrpc]
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? sync_page+0x0/0x50
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] io_schedule+0x73/0xc0
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] sync_page+0x3d/0x50
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>] __wait_on_bit+0x5f/0x90
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff81124543>] wait_on_page_bit+0x73/0x80
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ? wake_bit_function+0x0/0x50
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ? pagevec_lookup_tag+0x25/0x40
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8112496b>] wait_on_page_writeback_range+0xfb/0x190
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff81124b38>] filemap_write_and_wait_range+0x78/0x90
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>] vfs_fsync_range+0x7e/0x100
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] sys_fsync+0x10/0x20
>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>>>>>>>>>>>> 
>>>>>>>>>>>> Do you see a perf problem with just a simple dd, or do you need a more complex workload to hit the issue?  I think I saw an issue with metadata performance that I am trying to run down; let me know if you can see the problem with simple dd reads/writes, or if we need to do some sort of dir/metadata access as well.
>>>>>>>>>>>> 
>>>>>>>>>>>> -b
>>>>>>>>>>>> 
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr>
>>>>>>>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>>>>>>> Cc: gluster-users at gluster.org
>>>>>>>>>>>>> Sent: Tuesday, June 2, 2015 8:09:04 AM
>>>>>>>>>>>>> Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm sorry, but I cannot give you any comparison, because it would be
>>>>>>>>>>>>> distorted by the fact that my production HPC cluster uses InfiniBand QDR
>>>>>>>>>>>>> networking and its volumes are quite different (bricks in RAID6 (12x2TB),
>>>>>>>>>>>>> 2 bricks per server and 4 servers in the pool).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Concerning your request, attached you can find all the expected results;
>>>>>>>>>>>>> I hope it helps you solve this serious performance issue (maybe I need to
>>>>>>>>>>>>> play with GlusterFS parameters?).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you very much in advance,
>>>>>>>>>>>>> Geoffrey
>>>>>>>>>>>>> ------------------------------------------------------
>>>>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2 June 2015 at 10:09, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Geoffrey,
>>>>>>>>>>>>> Since you are saying it happens on all types of volumes, let's do the
>>>>>>>>>>>>> following:
>>>>>>>>>>>>> 1) Create a dist-repl volume
>>>>>>>>>>>>> 2) Set the options etc. you need
>>>>>>>>>>>>> 3) Enable profiling using "gluster volume profile <volname> start"
>>>>>>>>>>>>> 4) Run the workload
>>>>>>>>>>>>> 5) Give the output of "gluster volume profile <volname> info"
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Repeat the steps above on both the new and the old version you are comparing.
>>>>>>>>>>>>> That should give us insight into what could be causing the slowness.
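>>>>>>>>>>>>> 
>>>>>>>>>>>>> In practice the sequence would look something like this ("testvol" is a placeholder volume name):
>>>>>>>>>>>>> 
>>>>>>>>>>>>> gluster volume profile testvol start
>>>>>>>>>>>>> # ... run the untar/du/find/tar/rm workload on a client mount ...
>>>>>>>>>>>>> gluster volume profile testvol info > profile-new.txt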
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>> On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I have a crash-test cluster where I have tested the new version of GlusterFS
>>>>>>>>>>>>> (v3.7) before upgrading my production HPC cluster.
>>>>>>>>>>>>> But... all my tests show very, very poor performance.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For my benchmarks, as you can see below, I perform some actions (untar, du,
>>>>>>>>>>>>> find, tar, rm) on the Linux kernel sources, dropping caches each time, on
>>>>>>>>>>>>> distributed, replicated, distributed-replicated and single (single-brick)
>>>>>>>>>>>>> volumes, and on the native FS of one brick.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> And here are the process times:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> |             |  UNTAR  |   DU   |  FIND   |   TAR   |   RM   |
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> | single      |  ~3m45s |   ~43s |    ~47s |  ~3m10s | ~3m15s |
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> | replicated  |  ~5m10s |   ~59s |   ~1m6s |  ~1m19s | ~1m49s |
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> | distributed |  ~4m18s |   ~41s |    ~57s |  ~2m24s | ~1m38s |
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> | dist-repl   |  ~8m18s |  ~1m4s |  ~1m11s |  ~1m24s | ~2m40s |
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> | native FS   |    ~11s |    ~4s |     ~2s |    ~56s |   ~10s |
>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I get the same results with default configurations and with custom
>>>>>>>>>>>>> configurations.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If I look at the ifstat output, I note that my I/O write throughput
>>>>>>>>>>>>> never exceeds 3MB/s...
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The native EXT4 FS seems to be faster than the XFS one (roughly 15-20%, but no more).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> My [test] storage cluster consists of 2 identical servers (dual-CPU
>>>>>>>>>>>>> Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gb Ethernet).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> My volume settings (see the creation sketch below):
>>>>>>>>>>>>> single: 1 server, 1 brick
>>>>>>>>>>>>> replicated: 2 servers, 1 brick each
>>>>>>>>>>>>> distributed: 2 servers, 2 bricks each
>>>>>>>>>>>>> dist-repl: 2 bricks on the same server, with replica 2
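>>>>>>>>>>>>> 
>>>>>>>>>>>>> (A sketch of the corresponding creation commands; the server names, brick paths and volume names are placeholders, and gluster may ask for 'force' when replica pairs land on the same server:)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> gluster volume create single srv1:/export/brick1/single
>>>>>>>>>>>>> gluster volume create repl replica 2 srv1:/export/brick1/repl srv2:/export/brick1/repl
>>>>>>>>>>>>> gluster volume create dist srv1:/export/brick1/dist srv1:/export/brick2/dist srv2:/export/brick1/dist srv2:/export/brick2/dist
>>>>>>>>>>>>> gluster volume create distrepl replica 2 srv1:/export/brick1/dr srv1:/export/brick2/dr srv2:/export/brick1/dr srv2:/export/brick2/dr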
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Everything seems to be OK in the gluster status command output.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Do you have an idea why I get such bad results?
>>>>>>>>>>>>> Thanks in advance.
>>>>>>>>>>>>> Geoffrey
>>>>>>>>>>>>> -----------------------------------------------
>>>>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Responsable informatique & ingénieur système
>>>>>>>>>>>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>>>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> <quota-verify.gz>
>> 
> 
