[Gluster-users] Quota issue

Vijaikumar M vmallika at redhat.com
Sat Jun 13 04:49:11 UTC 2015


Hi Geoffrey,

Sorry for the delayed response. We will look at the log files you 
provided in your previous email and update you with a workaround as soon 
as possible.

Thanks,
Vijay


On Thursday 11 June 2015 05:43 PM, Geoffrey Letessier wrote:
> Hi Vijay,
>
> Could you take some time to look at this? I found only one thing 
> related to my issues in the Red Hat Bugzilla 
> (https://bugzilla.redhat.com/show_bug.cgi?id=917901). But my storage & 
> computing clusters are still in production, and I wonder whether I 
> should warn my community about a needed production outage, or whether 
> I can apply a fix while staying in production (i.e. without updating 
> the GlusterFS version on my storage cluster).
>
> Thanks in advance,
> Geoffrey
> ------------------------------------------------------
> Geoffrey Letessier
> IT manager & systems engineer
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>
>> On 10 June 2015 at 06:12, Vijaikumar M <vmallika at redhat.com> wrote:
>>
>> Hi Geoffrey,
>>
>> Please grep for 'ERROR' in the log file; only those lines should be 
>> sufficient.
>>
>> Thanks,
>> Vijay
>>
>>
>> On Wednesday 10 June 2015 04:38 AM, Geoffrey Letessier wrote:
>>> Hello Vijay,
>>>
>>> Quota-verify has now been running for quite a few hours (more than 10) 
>>> and the output files (4 files, because there are 4 bricks per replica) 
>>> are very large: around 800MB per file on the first server and 5GB per 
>>> file on the second one. Do you still want them? How can I send them 
>>> to you?
>>>
>>> Nice night (in France)
>>> Geoffrey
>>> ------------------------------------------------------
>>> Geoffrey Letessier
>>> IT manager & systems engineer
>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>
>>> On 9 June 2015 at 12:46, Vijaikumar M <vmallika at redhat.com> wrote:
>>>
>>>> Hi Geoffrey,
>>>>
>>>> The file content deletion is caused by the 'vi' editor's behaviour of 
>>>> truncating the file when writing the updated content.
>>>>
>>>> Regarding the quota size/usage problem, can you please execute the 
>>>> attached script on each brick and provide us with the generated output? 
>>>> This will help us analyse why 'quota list' is showing the wrong size.
>>>> The script basically crawls the directory given as its argument.
>>>> It collects the quota "contri" and "size" extended attributes, as well 
>>>> as the "block size" from the stat call.
>>>>
>>>> Usage:
>>>>
>>>> ./quota-verify -b <brick_path> | tee brick_name.log
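>>>>
>>>> (For reference, conceptually the script does something along these 
>>>> lines; this is only a minimal sketch, not the attached script itself, 
>>>> and the exact extended-attribute names may differ between GlusterFS 
>>>> versions:)
>>>>
>>>> #!/bin/bash
>>>> # Walk a brick and print the quota xattrs plus block usage of each entry.
>>>> BRICK="$1"
>>>> find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -print | while read -r p; do
>>>>     stat -c '%n  blocks=%b  block_size=%B' "$p"
>>>>     getfattr --absolute-names -d -m 'trusted.glusterfs.quota' -e hex "$p" 2>/dev/null
>>>> done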
>>>>
>>>>
>>>> Thanks,
>>>> Vijay
>>>>
>>>>
>>>>
>>>> On Tuesday 09 June 2015 03:45 PM, Vijaikumar M wrote:
>>>>>
>>>>>
>>>>> On Tuesday 09 June 2015 03:40 PM, Geoffrey Letessier wrote:
>>>>>> Hi Vijay,
>>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> Unfortunately, I checked each brick in my storage pool and could not 
>>>>>> find any backup file... what a shame!
>>>>>
>>>>> Please check for the backup file on the client machine where the file 
>>>>> was edited, and in the home directory of the user (that is, the user 
>>>>> account that was used to edit the file).
>>>>>
>>>>> Thanks,
>>>>> Vijay
>>>>>
>>>>>
>>>>>>
>>>>>> Thank you again!
>>>>>> Good luck and see you,
>>>>>> Geoffrey
>>>>>> ------------------------------------------------------
>>>>>> Geoffrey Letessier
>>>>>> IT manager & systems engineer
>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>> Institut de Biologie Physico-Chimique
>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>
>>>>>>> On 9 June 2015 at 10:05, Vijaikumar M <vmallika at redhat.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tuesday 09 June 2015 01:08 PM, Geoffrey Letessier wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Yes of course:
>>>>>>>> [root at lucifer ~]# pdsh -w cl-storage[1,3] du -s 
>>>>>>>> /export/brick_home/brick*/amyloid_team
>>>>>>>> cl-storage1: 1608522280  /export/brick_home/brick1/amyloid_team
>>>>>>>> cl-storage3: 1619630616  /export/brick_home/brick1/amyloid_team
>>>>>>>> cl-storage1: 1614057836  /export/brick_home/brick2/amyloid_team
>>>>>>>> cl-storage3: 1602653808  /export/brick_home/brick2/amyloid_team
>>>>>>>>
>>>>>>>> The sum is 6444864540 KB (around 6.4-6.5TB), while the quota list 
>>>>>>>> displays 7.7TB.
>>>>>>>> So the discrepancy is roughly 1.2-1.3TB, in other words around 16%, 
>>>>>>>> which seems far too large, no?
>>>>>>>>
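>>>>>>>> (For reference, the figures above can be totalled with a quick 
>>>>>>>> one-liner along these lines, assuming the usual "host: size path" 
>>>>>>>> pdsh output layout:)
>>>>>>>>
>>>>>>>> pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/amyloid_team \
>>>>>>>>     | awk '{sum += $2} END {print sum " KB in total"}'
>>>>>>>>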
>>>>>>>> In addition, since the quota is exceeded, I notice a lot of files 
>>>>>>>> like the following:
>>>>>>>> [root at lucifer ~]# pdsh -w cl-storage[1,3] "cd 
>>>>>>>> /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/; 
>>>>>>>> ls -ail remd_100.sh 2> /dev/null" 2>/dev/null
>>>>>>>> cl-storage3: 133325688 ---------T 2 tarus amyloid_team 0 16 
>>>>>>>> févr. 10:20 remd_100.sh
>>>>>>>> note the 'T' at the end of the permissions and the 0-byte file size.
>>>>>>>>
>>>>>>>> And, yesterday, some files were duplicated but not anymore...
>>>>>>>>
>>>>>>>> The worst part is that, previously, all these files were OK. In 
>>>>>>>> other words, exceeding the quota caused file or content deletions 
>>>>>>>> or corruption… What can I do to prevent this situation in the 
>>>>>>>> future? Because I guess I cannot do anything to roll back the 
>>>>>>>> situation now, right?
>>>>>>>>
>>>>>>>
>>>>>>> Hi Geoffrey,
>>>>>>>
>>>>>>> I tried re-creating the problem.
>>>>>>>
>>>>>>> Here is the behaviour of the vi editor:
>>>>>>> when a file is saved, vi creates a backup file under the home 
>>>>>>> directory and reopens the original file with the 'O_TRUNC' flag, 
>>>>>>> hence the file gets truncated.
>>>>>>>
>>>>>>>
>>>>>>> Here is the strace of vi editor when it gets 'EDQUOT' error:
>>>>>>>
>>>>>>> open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 3
>>>>>>> write(3, "line one\nline two\n", 18)    = 18
>>>>>>> fsync(3) = 0
>>>>>>> close(3) = -1 EDQUOT (Disk quota exceeded)
>>>>>>> chmod("hello", 0100644)                 = 0
>>>>>>> open("/root/hello~", O_RDONLY)          = 3
>>>>>>> open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 7
>>>>>>> read(3, "line one\n", 256)              = 9
>>>>>>> write(7, "line one\n", 9)               = 9
>>>>>>> read(3, "", 256)                        = 0
>>>>>>> close(7) = -1 EDQUOT (Disk quota exceeded)
>>>>>>> close(3) = 0
>>>>>>>
>>>>>>>
>>>>>>> To recover the truncated file, please check whether there is a 
>>>>>>> backup file 'remd_115.sh~' under '~/' or in the same directory 
>>>>>>> where the file exists. If it exists, you can copy it back.
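>>>>>>>
>>>>>>> (As a side note, and assuming vim rather than plain vi is in use: 
>>>>>>> making vim keep its backup copy permanently would leave a 'file~' 
>>>>>>> to fall back on even when the write fails, for example with 
>>>>>>> something like this in the user's ~/.vimrc:)
>>>>>>>
>>>>>>> " Keep the backup file instead of deleting it after a successful
>>>>>>> " write ('writebackup' alone removes it once the write succeeds).
>>>>>>> set backup
>>>>>>> " Optional: collect backups in one place (the directory must exist).
>>>>>>> set backupdir=~/.vim/backup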
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vijay
>>>>>>>
>>>>>>>
>>>>>>>> Geoffrey
>>>>>>>> ------------------------------------------------------
>>>>>>>> Geoffrey Letessier
>>>>>>>> IT manager & systems engineer
>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>>
>>>>>>>>> On 9 June 2015 at 09:01, Vijaikumar M <vmallika at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Monday 08 June 2015 07:11 PM, Geoffrey Letessier wrote:
>>>>>>>>>> In addition, I notice a very big difference between the sum 
>>>>>>>>>> of 'du' on each brick and the « quota list » display, as you can 
>>>>>>>>>> read below:
>>>>>>>>>> [root at lucifer ~]# pdsh -w cl-storage[1,3] du -sh 
>>>>>>>>>> /export/brick_home/brick*/amyloid_team
>>>>>>>>>> cl-storage1: 1,6T  /export/brick_home/brick1/amyloid_team
>>>>>>>>>> cl-storage3: 1,6T  /export/brick_home/brick1/amyloid_team
>>>>>>>>>> cl-storage1: 1,6T  /export/brick_home/brick2/amyloid_team
>>>>>>>>>> cl-storage3: 1,6T  /export/brick_home/brick2/amyloid_team
>>>>>>>>>> [root at lucifer ~]# gluster volume quota vol_home list 
>>>>>>>>>> /amyloid_team
>>>>>>>>>> Path            Hard-limit  Soft-limit  Used    Available
>>>>>>>>>> --------------------------------------------------------------------------------
>>>>>>>>>> /amyloid_team   9.0TB       90%         7.8TB   1.2TB
>>>>>>>>>>
>>>>>>>>>> As you can notice, the sum over all bricks gives me roughly 
>>>>>>>>>> 6.4TB while « quota list » reports around 7.8TB; so there is a 
>>>>>>>>>> difference of 1.4TB that I am not able to explain… Do you have 
>>>>>>>>>> any idea?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There were a few issues with quota accounting of sizes; we have 
>>>>>>>>> fixed some of these issues in 3.7.
>>>>>>>>> 'du -sh' will round off the values; can you please provide the 
>>>>>>>>> output of 'du' without the -h option?
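>>>>>>>>>
>>>>>>>>> For example, something like this (hosts and paths to be adjusted 
>>>>>>>>> as needed) would give exact per-brick byte counts:
>>>>>>>>>
>>>>>>>>> pdsh -w cl-storage[1,3] du -s --block-size=1 /export/brick_home/brick*/amyloid_team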
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Geoffrey
>>>>>>>>>> ------------------------------------------------------
>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>> IT manager & systems engineer
>>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>>>>
>>>>>>>>>>> On 8 June 2015 at 14:30, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> Concerning GlusterFS version 3.5.3, this morning I ran into a 
>>>>>>>>>>> strange issue when writing a file while the quota is exceeded.
>>>>>>>>>>>
>>>>>>>>>>> A person in my lab, whose quota is exceeded (but she did not 
>>>>>>>>>>> know it), tried to modify a file but, because of the exceeded 
>>>>>>>>>>> quota, she was unable to save it and decided to exit vi. 
>>>>>>>>>>> Now her file is empty/blank, as you can read below:
>>>>>>>>> We suspect 'vi' might have created a tmp file before writing to 
>>>>>>>>> the file. We are working on re-creating this problem and will 
>>>>>>>>> update you about it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> pdsh at lucifer: cl-storage3: ssh exited with exit code 2
>>>>>>>>>>> cl-storage1: ---------T 2 tarus amyloid_team 0 19 févr. 
>>>>>>>>>>> 12:34 
>>>>>>>>>>> /export/brick_home/brick1/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
>>>>>>>>>>> cl-storage1: -rwxrw-r-- 2 tarus amyloid_team 0  8 juin 12:38 
>>>>>>>>>>> /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
>>>>>>>>>>>
>>>>>>>>>>> In addition, I do not understand why, my volume being a 
>>>>>>>>>>> distributed volume on top of replicas (cl-storage[1,3] is 
>>>>>>>>>>> replicated only onto cl-storage[2,4]), I have 2 « identical » 
>>>>>>>>>>> files (same complete path) on 2 different bricks (as you can 
>>>>>>>>>>> read above).
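>>>>>>>>>>>
>>>>>>>>>>> If it helps the analysis, the extended attributes of those two 
>>>>>>>>>>> entries can be dumped directly on the bricks with something 
>>>>>>>>>>> like (run against the full on-brick path of each file):
>>>>>>>>>>>
>>>>>>>>>>> getfattr -d -m . -e hex <file_path_on_brick>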
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance for your help and clarification.
>>>>>>>>>>> Geoffrey
>>>>>>>>>>> ------------------------------------------------------
>>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>> IT manager & systems engineer
>>>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>>>>>
>>>>>>>>>>>> On 2 June 2015 at 23:45, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Ben,
>>>>>>>>>>>>
>>>>>>>>>>>> I just checked my messages log files, both on the client and 
>>>>>>>>>>>> the server, and I do not find any hung task like the ones you 
>>>>>>>>>>>> noticed on yours.
>>>>>>>>>>>>
>>>>>>>>>>>> As you can read below, I do not see the performance issue with 
>>>>>>>>>>>> a simple dd, but I think my issue concerns sets of small files 
>>>>>>>>>>>> (tens of thousands, or even more)…
>>>>>>>>>>>>
>>>>>>>>>>>> [root at nisus test]# ddt -t 10g /mnt/test/
>>>>>>>>>>>> Writing to /mnt/test/ddt.8362 ... syncing ... done.
>>>>>>>>>>>> sleeping 10 seconds ... done.
>>>>>>>>>>>> Reading from /mnt/test/ddt.8362 ... done.
>>>>>>>>>>>> 10240MiB   KiB/s  CPU%
>>>>>>>>>>>> Write     114770   4
>>>>>>>>>>>> Read       40675   4
>>>>>>>>>>>>
>>>>>>>>>>>> for info: /mnt/test corresponds to the 'single (v2)' GlusterFS volume
>>>>>>>>>>>>
>>>>>>>>>>>> [root at nisus test]# ddt -t 10g /mnt/fhgfs/
>>>>>>>>>>>> Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
>>>>>>>>>>>> sleeping 10 seconds ... done.
>>>>>>>>>>>> Reading from /mnt/fhgfs/ddt.8380 ... done.
>>>>>>>>>>>> 10240MiB   KiB/s  CPU%
>>>>>>>>>>>> Write     102591   1
>>>>>>>>>>>> Read       98079   2
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have an idea of how to tune/optimize the performance 
>>>>>>>>>>>> settings and/or the TCP settings (MTU, etc.)?
>>>>>>>>>>>>
>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>> |             | UNTAR  | DU    | FIND   | TAR    | RM     |
>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>> | single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>> | replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>> | distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>> | dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>> | native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>> | BeeGFS      | ~3m43s | ~15s  | ~3s    | ~1m33s | ~46s   |
>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>> | single (v2) | ~3m6s  | ~14s  | ~32s   | ~1m2s  | ~44s   |
>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>> for info:
>>>>>>>>>>>> - BeeGFS is a distributed FS (4 bricks: 2 bricks per server, 
>>>>>>>>>>>> 2 servers)
>>>>>>>>>>>> - single (v2): a simple Gluster volume with default settings
>>>>>>>>>>>>
>>>>>>>>>>>> I also note that I get the same tar/untar performance issue 
>>>>>>>>>>>> with FhGFS/BeeGFS, but the rest (DU, FIND, RM) looks OK.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you very much for your reply and help.
>>>>>>>>>>>> Geoffrey
>>>>>>>>>>>> -----------------------------------------------
>>>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>>>
>>>>>>>>>>>> IT manager & systems engineer
>>>>>>>>>>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>>>>>>>>>>>>
>>>>>>>>>>>> On 2 June 2015 at 21:53, Ben Turner <bturner at redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am seeing problems on 3.7 as well.  Can you check 
>>>>>>>>>>>>> /var/log/messages on both the clients and servers for hung 
>>>>>>>>>>>>> tasks like:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: "echo 0 > 
>>>>>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this 
>>>>>>>>>>>>> message.
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: iozone        D 
>>>>>>>>>>>>> 0000000000000001     0 21999      1 0x00000080
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: ffff880611321cc8 
>>>>>>>>>>>>> 0000000000000082 ffff880611321c18 ffffffffa027236e
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: ffff880611321c48 
>>>>>>>>>>>>> ffffffffa0272c10 ffff88052bd1e040 ffff880611321c78
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 
>>>>>>>>>>>>> ffff88062080c7a0 ffff880625addaf8 ffff880611321fd8
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: Call Trace:
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ? 
>>>>>>>>>>>>> rpc_make_runnable+0x7e/0x80 [sunrpc]
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ? 
>>>>>>>>>>>>> rpc_execute+0x50/0xa0 [sunrpc]
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ? 
>>>>>>>>>>>>> ktime_get_ts+0xb1/0xf0
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? 
>>>>>>>>>>>>> sync_page+0x0/0x50
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] 
>>>>>>>>>>>>> io_schedule+0x73/0xc0
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] 
>>>>>>>>>>>>> sync_page+0x3d/0x50
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>] 
>>>>>>>>>>>>> __wait_on_bit+0x5f/0x90
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff81124543>] 
>>>>>>>>>>>>> wait_on_page_bit+0x73/0x80
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ? 
>>>>>>>>>>>>> wake_bit_function+0x0/0x50
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ? 
>>>>>>>>>>>>> pagevec_lookup_tag+0x25/0x40
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8112496b>] 
>>>>>>>>>>>>> wait_on_page_writeback_range+0xfb/0x190
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff81124b38>] 
>>>>>>>>>>>>> filemap_write_and_wait_range+0x78/0x90
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>] 
>>>>>>>>>>>>> vfs_fsync_range+0x7e/0x100
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] 
>>>>>>>>>>>>> vfs_fsync+0x1d/0x20
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] 
>>>>>>>>>>>>> do_fsync+0x3e/0x60
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] 
>>>>>>>>>>>>> sys_fsync+0x10/0x20
>>>>>>>>>>>>> Jun  2 15:23:14 gqac006 kernel: [<ffffffff8100b072>] 
>>>>>>>>>>>>> system_call_fastpath+0x16/0x1b
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you see a perf problem with just a simple DD or do you 
>>>>>>>>>>>>> need a more complex workload to hit the issue?  I think I 
>>>>>>>>>>>>> saw an issue with metadata performance that I am trying to 
>>>>>>>>>>>>> run down, let me know if you can see the problem with 
>>>>>>>>>>>>> simple DD reads / writes or if we need to do some sort of 
>>>>>>>>>>>>> dir / metadata access as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -b
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr 
>>>>>>>>>>>>>> <mailto:geoffrey.letessier at cnrs.fr>>
>>>>>>>>>>>>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com 
>>>>>>>>>>>>>> <mailto:pkarampu at redhat.com>>
>>>>>>>>>>>>>> Cc:gluster-users at gluster.org 
>>>>>>>>>>>>>> <mailto:gluster-users at gluster.org>
>>>>>>>>>>>>>> Sent: Tuesday, June 2, 2015 8:09:04 AM
>>>>>>>>>>>>>> Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor 
>>>>>>>>>>>>>> performances
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm sorry, but I cannot give you any comparison, because it 
>>>>>>>>>>>>>> would be distorted by the fact that in my production HPC 
>>>>>>>>>>>>>> cluster the network technology is InfiniBand QDR and my 
>>>>>>>>>>>>>> volumes are quite different (bricks in RAID6 (12x2TB), 2 
>>>>>>>>>>>>>> bricks per server and 4 servers in my pool).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Concerning your request, you will find all the expected 
>>>>>>>>>>>>>> results in the attachments; I hope they help you solve this 
>>>>>>>>>>>>>> serious performance issue (maybe I need to play with some 
>>>>>>>>>>>>>> GlusterFS parameters?).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you very much in advance,
>>>>>>>>>>>>>> Geoffrey
>>>>>>>>>>>>>> ------------------------------------------------------
>>>>>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>>>>> IT manager & systems engineer
>>>>>>>>>>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2 June 2015 at 10:09, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> hi Geoffrey,
>>>>>>>>>>>>>> Since you are saying it happens on all types of volumes, let's 
>>>>>>>>>>>>>> do the following:
>>>>>>>>>>>>>> 1) Create a dist-repl volume
>>>>>>>>>>>>>> 2) Set the options etc. that you need.
>>>>>>>>>>>>>> 3) Enable gluster volume profiling using "gluster volume 
>>>>>>>>>>>>>> profile <volname> start"
>>>>>>>>>>>>>> 4) Run the workload
>>>>>>>>>>>>>> 5) Give the output of "gluster volume profile <volname> info"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Repeat the steps above on both the new and the old version you 
>>>>>>>>>>>>>> are comparing. That should give us insight into what could be 
>>>>>>>>>>>>>> causing the slowness.
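>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Concretely, the sequence would look something like this (the 
>>>>>>>>>>>>>> volume name is only an example):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> gluster volume profile testvol start
>>>>>>>>>>>>>> # ... run the untar/du/find/tar/rm workload on the mount ...
>>>>>>>>>>>>>> gluster volume profile testvol info > profile-after-workload.txt
>>>>>>>>>>>>>> gluster volume profile testvol stop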
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>>> On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a crash-test cluster where I have tested the new 
>>>>>>>>>>>>>> version of GlusterFS (v3.7) before upgrading my production 
>>>>>>>>>>>>>> HPC cluster.
>>>>>>>>>>>>>> But… all my tests show very, very poor performance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For my benchmarks, as you can read below, I run a few actions 
>>>>>>>>>>>>>> (untar, du, find, tar, rm) on the Linux kernel sources, 
>>>>>>>>>>>>>> dropping the caches each time, on distributed, replicated, 
>>>>>>>>>>>>>> distributed-replicated and single (single-brick) volumes, and 
>>>>>>>>>>>>>> on the native FS of one brick.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar xJf 
>>>>>>>>>>>>>> ~/linux-4.1-rc5.tar.xz;
>>>>>>>>>>>>>> sync; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; du -sh 
>>>>>>>>>>>>>> linux-4.1-rc5/; echo 3 >
>>>>>>>>>>>>>> /proc/sys/vm/drop_caches)
>>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; find 
>>>>>>>>>>>>>> linux-4.1-rc5/|wc -l; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar czf 
>>>>>>>>>>>>>> linux-4.1-rc5.tgz
>>>>>>>>>>>>>> linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>>> # time (echo 3 > /proc/sys/vm/drop_caches; rm -rf 
>>>>>>>>>>>>>> linux-4.1-rc5.tgz
>>>>>>>>>>>>>> linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And here are the process times:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>>> |             | UNTAR  | DU    | FIND   | TAR    | RM     |
>>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>>> | single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
>>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>>> | replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
>>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>>> | distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
>>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>>> | dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
>>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>>> | native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
>>>>>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I get the same results with default configurations as with 
>>>>>>>>>>>>>> custom configurations.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If I look at the output of the ifstat command, I note that my 
>>>>>>>>>>>>>> IO write processes never exceed 3MB/s...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The native EXT4 FS seems to be faster (roughly 15-20%, but no 
>>>>>>>>>>>>>> more) than the XFS one.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My [test] storage cluster consists of 2 identical servers 
>>>>>>>>>>>>>> (dual-CPU Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) 
>>>>>>>>>>>>>> and Gb Ethernet).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My volume settings:
>>>>>>>>>>>>>> single: 1 server, 1 brick
>>>>>>>>>>>>>> replicated: 2 servers, 1 brick each
>>>>>>>>>>>>>> distributed: 2 servers, 2 bricks each
>>>>>>>>>>>>>> dist-repl: 2 bricks on the same server and replica 2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Everything seems to be OK in the gluster status command output.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you have an idea why I get such bad results?
>>>>>>>>>>>>>> Thanks in advance.
>>>>>>>>>>>>>> Geoffrey
>>>>>>>>>>>>>> -----------------------------------------------
>>>>>>>>>>>>>> Geoffrey Letessier
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> IT manager & systems engineer
>>>>>>>>>>>>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>>>>>>>>>>>>> Institut de Biologie Physico-Chimique
>>>>>>>>>>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>>>>>>>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> <quota-verify.gz>
>>>
>>
>
