[Gluster-users] glusterfs under high load failing?

Roman romeo.r at gmail.com
Wed Oct 15 21:35:48 UTC 2014


But the VMs are running from a different volume, so you are not able to see it...
Today my colleague started a full format of that huge virtual disk file and
I've received this again:

      WARNINGs: Read IO Wait time is 1.22 (outside range [0:1]).

I'll send the logs on Friday.

2014-10-15 18:24 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:

>
> On 10/14/2014 01:20 AM, Roman wrote:
>
> OK, done.
> This time there were no disconnects, at least all of the VMs are working, but
> I got some mails from a VM about slow IO again.
>
>  WARNINGs: Read IO Wait time is 1.45 (outside range [0:1]).
>
> This warning says 'Read IO wait', and yet not a single READ operation
> came to gluster. Wondering why that is :-/. Any clue? There is at
> least one write which took 3 seconds according to the stats. At least one
> synchronization operation (FINODELK) took 23 seconds. Could you give the logs
> of this run, for the mount, glustershd, and the bricks?
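>
> (On a default install these should be under /var/log/glusterfs/, roughly:
>
>   /var/log/glusterfs/<mount-point>.log    # client/fuse mount log
>   /var/log/glusterfs/glustershd.log       # self-heal daemon
>   /var/log/glusterfs/bricks/*.log         # one log per brick
>
> the exact file names depend on the mount point and brick paths.)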
>
> Pranith
>
>
>  here is the output
>
>  root at stor1:~# gluster volume profile HA-WIN-TT-1T info
> Brick: stor1:/exports/NFS-WIN/1T
> --------------------------------
> Cumulative Stats:
>    Block Size:             131072b+              262144b+
>  No. of Reads:                    0                     0
> No. of Writes:              7372798                     1
>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>  ---------   -----------   -----------   -----------   ------------        ----
>       0.00       0.00 us       0.00 us       0.00 us             25     RELEASE
>       0.00       0.00 us       0.00 us       0.00 us             16  RELEASEDIR
>       0.00      64.00 us      52.00 us      76.00 us              2     ENTRYLK
>       0.00      73.50 us      51.00 us      96.00 us              2       FLUSH
>       0.00      68.43 us      30.00 us     135.00 us              7      STATFS
>       0.00      54.31 us      44.00 us     109.00 us             16     OPENDIR
>       0.00      50.75 us      16.00 us      74.00 us             24       FSTAT
>       0.00      47.77 us      19.00 us     119.00 us             26    GETXATTR
>       0.00      59.21 us      21.00 us      89.00 us             24        OPEN
>       0.00      59.39 us      22.00 us     296.00 us             28     READDIR
>       0.00    4972.00 us    4972.00 us    4972.00 us              1      CREATE
>       0.00      97.42 us      19.00 us     184.00 us             62      LOOKUP
>       0.00      89.49 us      20.00 us     656.00 us            324    FXATTROP
>       3.91 1255944.81 us     127.00 us 23397532.00 us            189       FSYNC
>       7.40 3406275.50 us      17.00 us 23398013.00 us            132     INODELK
>      34.96   94598.02 us       8.00 us 23398705.00 us          22445    FINODELK
>      53.73     442.66 us      79.00 us 3116494.00 us        7372799       WRITE
>
>      Duration: 7813 seconds
>    Data Read: 0 bytes
> Data Written: 966367641600 bytes
>
>  Interval 0 Stats:
>    Block Size:             131072b+              262144b+
>  No. of Reads:                    0                     0
> No. of Writes:              7372798                     1
>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>  ---------   -----------   -----------   -----------   ------------        ----
>       0.00       0.00 us       0.00 us       0.00 us             25     RELEASE
>       0.00       0.00 us       0.00 us       0.00 us             16  RELEASEDIR
>       0.00      64.00 us      52.00 us      76.00 us              2     ENTRYLK
>       0.00      73.50 us      51.00 us      96.00 us              2       FLUSH
>       0.00      68.43 us      30.00 us     135.00 us              7      STATFS
>       0.00      54.31 us      44.00 us     109.00 us             16     OPENDIR
>       0.00      50.75 us      16.00 us      74.00 us             24       FSTAT
>       0.00      47.77 us      19.00 us     119.00 us             26    GETXATTR
>       0.00      59.21 us      21.00 us      89.00 us             24        OPEN
>       0.00      59.39 us      22.00 us     296.00 us             28     READDIR
>       0.00    4972.00 us    4972.00 us    4972.00 us              1      CREATE
>       0.00      97.42 us      19.00 us     184.00 us             62      LOOKUP
>       0.00      89.49 us      20.00 us     656.00 us            324    FXATTROP
>       3.91 1255944.81 us     127.00 us 23397532.00 us            189       FSYNC
>       7.40 3406275.50 us      17.00 us 23398013.00 us            132     INODELK
>      34.96   94598.02 us       8.00 us 23398705.00 us          22445    FINODELK
>      53.73     442.66 us      79.00 us 3116494.00 us        7372799       WRITE
>
>      Duration: 7813 seconds
>    Data Read: 0 bytes
> Data Written: 966367641600 bytes
>
>  Brick: stor2:/exports/NFS-WIN/1T
> --------------------------------
> Cumulative Stats:
>    Block Size:             131072b+              262144b+
>  No. of Reads:                    0                     0
> No. of Writes:              7372798                     1
>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>  ---------   -----------   -----------   -----------   ------------        ----
>       0.00       0.00 us       0.00 us       0.00 us             25     RELEASE
>       0.00       0.00 us       0.00 us       0.00 us             16  RELEASEDIR
>       0.00      61.50 us      46.00 us      77.00 us              2     ENTRYLK
>       0.00      82.00 us      67.00 us      97.00 us              2       FLUSH
>       0.00     265.00 us     265.00 us     265.00 us              1      CREATE
>       0.00      57.43 us      30.00 us      85.00 us              7      STATFS
>       0.00      61.12 us      37.00 us     107.00 us             16     OPENDIR
>       0.00      44.04 us      12.00 us      86.00 us             24       FSTAT
>       0.00      41.42 us      24.00 us      96.00 us             26    GETXATTR
>       0.00      45.93 us      24.00 us     133.00 us             28     READDIR
>       0.00      57.17 us      25.00 us     147.00 us             24        OPEN
>       0.00     145.28 us      31.00 us     288.00 us             32    READDIRP
>       0.00      39.50 us      10.00 us     152.00 us            132     INODELK
>       0.00     330.97 us      20.00 us   14280.00 us             62      LOOKUP
>       0.00      79.06 us      19.00 us     851.00 us            430    FXATTROP
>       0.02      29.32 us       7.00 us   28154.00 us          22568    FINODELK
>       7.80 1313096.68 us     125.00 us 23281862.00 us            189       FSYNC
>      92.18     397.92 us      76.00 us 1838343.00 us        7372799       WRITE
>
>      Duration: 7811 seconds
>    Data Read: 0 bytes
> Data Written: 966367641600 bytes
>
>  Interval 0 Stats:
>    Block Size:             131072b+              262144b+
>  No. of Reads:                    0                     0
> No. of Writes:              7372798                     1
>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>  ---------   -----------   -----------   -----------   ------------        ----
>       0.00       0.00 us       0.00 us       0.00 us             25     RELEASE
>       0.00       0.00 us       0.00 us       0.00 us             16  RELEASEDIR
>       0.00      61.50 us      46.00 us      77.00 us              2     ENTRYLK
>       0.00      82.00 us      67.00 us      97.00 us              2       FLUSH
>       0.00     265.00 us     265.00 us     265.00 us              1      CREATE
>       0.00      57.43 us      30.00 us      85.00 us              7      STATFS
>       0.00      61.12 us      37.00 us     107.00 us             16     OPENDIR
>       0.00      44.04 us      12.00 us      86.00 us             24       FSTAT
>       0.00      41.42 us      24.00 us      96.00 us             26    GETXATTR
>       0.00      45.93 us      24.00 us     133.00 us             28     READDIR
>       0.00      57.17 us      25.00 us     147.00 us             24        OPEN
>       0.00     145.28 us      31.00 us     288.00 us             32    READDIRP
>       0.00      39.50 us      10.00 us     152.00 us            132     INODELK
>       0.00     330.97 us      20.00 us   14280.00 us             62      LOOKUP
>       0.00      79.06 us      19.00 us     851.00 us            430    FXATTROP
>       0.02      29.32 us       7.00 us   28154.00 us          22568    FINODELK
>       7.80 1313096.68 us     125.00 us 23281862.00 us            189       FSYNC
>      92.18     397.92 us      76.00 us 1838343.00 us        7372799       WRITE
>
>      Duration: 7811 seconds
>    Data Read: 0 bytes
> Data Written: 966367641600 bytes
>
>  Does this make anything clearer?
>
> 2014-10-13 20:40 GMT+03:00 Roman <romeo.r at gmail.com>:
>
>> I think I may know what the issue was. There was an iscsitarget service
>> running that was exporting this generated block device, so maybe my
>> colleague's Windows server picked it up and mounted it :) I'll see if it
>> happens again.
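>>
>> (If it helps checking: iscsitarget/IET should list active initiator sessions
>> under /proc/net/iet/, so something like
>>
>>   cat /proc/net/iet/session
>>
>> on the storage node would show whether the Windows box had really logged in.
>> That's from memory, so the exact proc path may differ by version.)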
>>
>> 2014-10-13 20:27 GMT+03:00 Roman <romeo.r at gmail.com>:
>>
>>> So may I restart the volume and start the test, or do you need something
>>> else from this issue?
>>>
>>> 2014-10-13 19:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>
>>> :
>>>
>>>>
>>>> On 10/13/2014 10:03 PM, Roman wrote:
>>>>
>>>> Hmm,
>>>> seems like another strange issue? I've seen this before. I had to restart the
>>>> volume to get my empty space back.
>>>>  root at glstor-cli:/srv/nfs/HA-WIN-TT-1T# ls -l
>>>> total 943718400
>>>> -rw-r--r-- 1 root root 966367641600 Oct 13 16:55 disk
>>>> root at glstor-cli:/srv/nfs/HA-WIN-TT-1T# rm disk
>>>> root at glstor-cli:/srv/nfs/HA-WIN-TT-1T# df -h
>>>> Filesystem                                              Size  Used Avail Use% Mounted on
>>>> rootfs                                                  282G  1.1G  266G   1% /
>>>> udev                                                     10M     0   10M   0% /dev
>>>> tmpfs                                                   1.4G  228K  1.4G   1% /run
>>>> /dev/disk/by-uuid/c62ee3c0-c0e5-44af-b0cd-7cb3fbcc0fba  282G  1.1G  266G   1% /
>>>> tmpfs                                                   5.0M     0  5.0M   0% /run/lock
>>>> tmpfs                                                   5.2G     0  5.2G   0% /run/shm
>>>> stor1:HA-WIN-TT-1T                                     1008G  901G   57G  95% /srv/nfs/HA-WIN-TT-1T
>>>>
>>>>  The file is gone, but 901G is still shown as used.
>>>> Both servers show the same.
>>>> Do I really have to restart the volume to fix that?
>>>>
>>>>  IMO this can happen if there is an fd leak; open-fd is the only
>>>> variable that changes with a volume restart. How do you reproduce the bug?
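>>>>
>>>> (One way to confirm that, assuming the brick processes are still holding the
>>>> deleted file open, would be to look for unlinked-but-open files on the
>>>> servers, e.g.:
>>>>
>>>>   lsof -nP +L1 | grep glusterfsd
>>>>
>>>> and/or check the open fd counts with 'gluster volume status HA-WIN-TT-1T fd'.
>>>> If the 900GB file shows up there as deleted, that would explain why the space
>>>> only comes back after a volume restart.)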
>>>>
>>>> Pranith
>>>>
>>>>
>>>> 2014-10-13 19:30 GMT+03:00 Roman <romeo.r at gmail.com>:
>>>>
>>>>> Sure.
>>>>> I'll let it to run for this night .
>>>>>
>>>>> 2014-10-13 19:19 GMT+03:00 Pranith Kumar Karampuri <
>>>>> pkarampu at redhat.com>:
>>>>>
>>>>>>  Hi Roman,
>>>>>>      Do you think we can run this test again? This time, could you
>>>>>> enable profiling with 'gluster volume profile <volname> start' and do the same test?
>>>>>> Then provide the output of 'gluster volume profile <volname> info' and the logs after the test.
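>>>>>>
>>>>>> Roughly (with your volume name filled in):
>>>>>>
>>>>>>   gluster volume profile HA-WIN-TT-1T start
>>>>>>   # ... re-run the dd test ...
>>>>>>   gluster volume profile HA-WIN-TT-1T info
>>>>>>
>>>>>> ('gluster volume profile HA-WIN-TT-1T stop' turns it back off afterwards.)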
>>>>>>
>>>>>> Pranith
>>>>>>
>>>>>> On 10/13/2014 09:45 PM, Roman wrote:
>>>>>>
>>>>>> Sure !
>>>>>>
>>>>>>  root at stor1:~# gluster volume info
>>>>>>
>>>>>>  Volume Name: HA-2TB-TT-Proxmox-cluster
>>>>>> Type: Replicate
>>>>>> Volume ID: 66e38bde-c5fa-4ce2-be6e-6b2adeaa16c2
>>>>>> Status: Started
>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB
>>>>>> Brick2: stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB
>>>>>> Options Reconfigured:
>>>>>> nfs.disable: 0
>>>>>> network.ping-timeout: 10
>>>>>>
>>>>>>  Volume Name: HA-WIN-TT-1T
>>>>>> Type: Replicate
>>>>>> Volume ID: 2937ac01-4cba-44a8-8ff8-0161b67f8ee4
>>>>>> Status: Started
>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: stor1:/exports/NFS-WIN/1T
>>>>>> Brick2: stor2:/exports/NFS-WIN/1T
>>>>>> Options Reconfigured:
>>>>>> nfs.disable: 1
>>>>>> network.ping-timeout: 10
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-10-13 19:09 GMT+03:00 Pranith Kumar Karampuri <
>>>>>> pkarampu at redhat.com>:
>>>>>>
>>>>>>>  Could you give your 'gluster volume info' output?
>>>>>>>
>>>>>>> Pranith
>>>>>>>
>>>>>>> On 10/13/2014 09:36 PM, Roman wrote:
>>>>>>>
>>>>>>>  Hi,
>>>>>>>
>>>>>>>  I've got this kind of setup (servers run replica)
>>>>>>>
>>>>>>>
>>>>>>>  @ 10G backend
>>>>>>> gluster storage1
>>>>>>> gluster storage2
>>>>>>> gluster client1
>>>>>>>
>>>>>>>  @1g backend
>>>>>>> other gluster clients
>>>>>>>
>>>>>>>  Servers got HW RAID5 with SAS disks.
>>>>>>>
>>>>>>>  So today I decided to create a 900GB file for an iscsi target that
>>>>>>> would live on a separate glusterfs volume, using dd (just a dummy file
>>>>>>> filled with zeros, bs=1G count=900; the command is sketched just below).
>>>>>>> First of all, the process took quite a long time; the writing speed was
>>>>>>> 130 MB/sec (the client port was 2 Gbps, the servers' ports were
>>>>>>> running @ 1 Gbps).
>>>>>>> Then it reported something like "endpoint is not connected" and all
>>>>>>> of my VMs on the other volume started to give me IO errors.
>>>>>>> Server load was around 4.6 (12 cores in total).
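>>>>>>>
>>>>>>>  The dd run itself was roughly this (path and file name taken from later in
>>>>>>> this thread, flags as I remember them):
>>>>>>>
>>>>>>>   dd if=/dev/zero of=/srv/nfs/HA-WIN-TT-1T/disk bs=1G count=900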
>>>>>>>
>>>>>>>  Maybe it was due to the timeout of 2 secs, so I've made it a bit
>>>>>>> higher, 10 sec.
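>>>>>>>
>>>>>>>  (That is, something like:
>>>>>>>
>>>>>>>   gluster volume set HA-2TB-TT-Proxmox-cluster network.ping-timeout 10
>>>>>>>   gluster volume set HA-WIN-TT-1T network.ping-timeout 10
>>>>>>>
>>>>>>>  which matches the network.ping-timeout: 10 shown in the volume info above.)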
>>>>>>>
>>>>>>>  Also, during the dd image creation, the VMs very often reported to me
>>>>>>> that their disks were slow, like:
>>>>>>>
>>>>>>> WARNINGs: Read IO Wait time is -0.02 (outside range [0:1]).
>>>>>>>
>>>>>>> Is 130 MB/sec the maximum bandwidth for all of the volumes in
>>>>>>> total? Is that why we would need 10G backends?
>>>>>>>
>>>>>>> The HW RAID's local speed is 300 MB/sec, so it should not be an issue. Any
>>>>>>> ideas or maybe any advice?
>>>>>>>
>>>>>>>
>>>>>>>  Maybe someone has an optimized sysctl.conf for a 10G backend?
>>>>>>>
>>>>>>> Mine is pretty simple, the kind of thing that can be found by googling.
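>>>>>>>
>>>>>>>  For reference, I mean the usual googled 10GbE tuning along these lines
>>>>>>> (example values only, nothing gluster-specific):
>>>>>>>
>>>>>>>   # /etc/sysctl.conf
>>>>>>>   net.core.rmem_max = 67108864
>>>>>>>   net.core.wmem_max = 67108864
>>>>>>>   net.ipv4.tcp_rmem = 4096 87380 33554432
>>>>>>>   net.ipv4.tcp_wmem = 4096 65536 33554432
>>>>>>>   net.core.netdev_max_backlog = 30000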
>>>>>>>
>>>>>>>
>>>>>>>  Just to mention: those VMs were connected using a separate 1 Gbps
>>>>>>> interface, which means they should not be affected by the client with the 10G
>>>>>>> backend.
>>>>>>>
>>>>>>>
>>>>>>>  The logs are pretty useless, they just say this during the outage:
>>>>>>>
>>>>>>>
>>>>>>>  [2014-10-13 12:09:18.392910] W
>>>>>>> [client-handshake.c:276:client_ping_cbk]
>>>>>>> 0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have expired
>>>>>>>
>>>>>>> [2014-10-13 12:10:08.389708] C
>>>>>>> [client-handshake.c:127:rpc_client_ping_timer_expired]
>>>>>>> 0-HA-2TB-TT-Proxmox-cluster-client-0: server 10.250.0.1:49159 has
>>>>>>> not responded in the last 2 seconds, disconnecting.
>>>>>>>
>>>>>>> [2014-10-13 12:10:08.390312] W
>>>>>>> [client-handshake.c:276:client_ping_cbk]
>>>>>>> 0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have expired
>>>>>>>  So I decided to set the timeout a bit higher.
>>>>>>>
>>>>>>>  So it seems to me that under high load GlusterFS is not usable?
>>>>>>> 130 MB/s is not that much to be getting timeouts or making the
>>>>>>> system so slow that the VMs feel bad.
>>>>>>>
>>>>>>>  Of course, after the disconnection the healing process was started,
>>>>>>> but as the VMs had lost connection to both servers it was pretty useless; they
>>>>>>> could not run anymore. And BTW, when you load the server with such a huge job
>>>>>>> (dd of 900GB), the healing process goes soooooo slow :)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  --
>>>>>>> Best regards,
>>>>>>> Roman.
>>>>>>>
>>>>>>>
>>>>>>>  _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Best regards,
>>>>>> Roman.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>   --
>>>>> Best regards,
>>>>> Roman.
>>>>>
>>>>
>>>>
>>>>
>>>>  --
>>>> Best regards,
>>>> Roman.
>>>>
>>>>
>>>>
>>>
>>>
>>>   --
>>> Best regards,
>>> Roman.
>>>
>>
>>
>>
>>   --
>> Best regards,
>> Roman.
>>
>
>
>
>  --
> Best regards,
> Roman.
>
>
>


-- 
Best regards,
Roman.