[Gluster-users] glusterfs under high load failing?
Pranith Kumar Karampuri
pkarampu at redhat.com
Wed Oct 15 15:24:37 UTC 2014
On 10/14/2014 01:20 AM, Roman wrote:
> OK, done.
> This time there were no disconnects, at least all of the VMs are working,
> but I got some mails from a VM about IO writes again.
>
> WARNINGs: Read IO Wait time is 1.45 (outside range [0:1]).
This warning says 'Read IO wait', yet not a single READ
operation came to gluster. Wondering why that is :-/. Any clue?
There is at least one write which took 3 seconds according to the stats,
and at least one synchronization operation (FINODELK) took 23 seconds. Could
you give the logs of this run, for the mount, glustershd, and the bricks?
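In case it helps, a minimal sketch of gathering those, assuming the default
/var/log/glusterfs locations (the exact file names follow your mount point and
brick path, so adjust as needed):
# on the client: the fuse mount log is named after the mount point
cp /var/log/glusterfs/srv-nfs-HA-WIN-TT-1T.log /tmp/
# on each storage server: self-heal daemon log plus the brick log
cp /var/log/glusterfs/glustershd.log /var/log/glusterfs/bricks/exports-NFS-WIN-1T.log /tmp/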
Pranith
>
> here is the output
>
> root at stor1:~# gluster volume profile HA-WIN-TT-1T info
> Brick: stor1:/exports/NFS-WIN/1T
> --------------------------------
> Cumulative Stats:
> Block Size: 131072b+ 262144b+
> No. of Reads: 0 0
> No. of Writes: 7372798 1
> %-latency     Avg-latency   Min-Latency     Max-Latency   No. of calls   Fop
> ---------     -----------   -----------     -----------   ------------   ----
>      0.00         0.00 us       0.00 us         0.00 us             25   RELEASE
>      0.00         0.00 us       0.00 us         0.00 us             16   RELEASEDIR
>      0.00        64.00 us      52.00 us        76.00 us              2   ENTRYLK
>      0.00        73.50 us      51.00 us        96.00 us              2   FLUSH
>      0.00        68.43 us      30.00 us       135.00 us              7   STATFS
>      0.00        54.31 us      44.00 us       109.00 us             16   OPENDIR
>      0.00        50.75 us      16.00 us        74.00 us             24   FSTAT
>      0.00        47.77 us      19.00 us       119.00 us             26   GETXATTR
>      0.00        59.21 us      21.00 us        89.00 us             24   OPEN
>      0.00        59.39 us      22.00 us       296.00 us             28   READDIR
>      0.00      4972.00 us    4972.00 us      4972.00 us              1   CREATE
>      0.00        97.42 us      19.00 us       184.00 us             62   LOOKUP
>      0.00        89.49 us      20.00 us       656.00 us            324   FXATTROP
>      3.91   1255944.81 us     127.00 us   23397532.00 us            189   FSYNC
>      7.40   3406275.50 us      17.00 us   23398013.00 us            132   INODELK
>     34.96     94598.02 us       8.00 us   23398705.00 us          22445   FINODELK
>     53.73       442.66 us      79.00 us    3116494.00 us        7372799   WRITE
>
> Duration: 7813 seconds
> Data Read: 0 bytes
> Data Written: 966367641600 bytes
>
> Interval 0 Stats:
> Block Size: 131072b+ 262144b+
> No. of Reads: 0 0
> No. of Writes: 7372798 1
> %-latency     Avg-latency   Min-Latency     Max-Latency   No. of calls   Fop
> ---------     -----------   -----------     -----------   ------------   ----
>      0.00         0.00 us       0.00 us         0.00 us             25   RELEASE
>      0.00         0.00 us       0.00 us         0.00 us             16   RELEASEDIR
>      0.00        64.00 us      52.00 us        76.00 us              2   ENTRYLK
>      0.00        73.50 us      51.00 us        96.00 us              2   FLUSH
>      0.00        68.43 us      30.00 us       135.00 us              7   STATFS
>      0.00        54.31 us      44.00 us       109.00 us             16   OPENDIR
>      0.00        50.75 us      16.00 us        74.00 us             24   FSTAT
>      0.00        47.77 us      19.00 us       119.00 us             26   GETXATTR
>      0.00        59.21 us      21.00 us        89.00 us             24   OPEN
>      0.00        59.39 us      22.00 us       296.00 us             28   READDIR
>      0.00      4972.00 us    4972.00 us      4972.00 us              1   CREATE
>      0.00        97.42 us      19.00 us       184.00 us             62   LOOKUP
>      0.00        89.49 us      20.00 us       656.00 us            324   FXATTROP
>      3.91   1255944.81 us     127.00 us   23397532.00 us            189   FSYNC
>      7.40   3406275.50 us      17.00 us   23398013.00 us            132   INODELK
>     34.96     94598.02 us       8.00 us   23398705.00 us          22445   FINODELK
>     53.73       442.66 us      79.00 us    3116494.00 us        7372799   WRITE
>
> Duration: 7813 seconds
> Data Read: 0 bytes
> Data Written: 966367641600 bytes
>
> Brick: stor2:/exports/NFS-WIN/1T
> --------------------------------
> Cumulative Stats:
> Block Size: 131072b+ 262144b+
> No. of Reads: 0 0
> No. of Writes: 7372798 1
> %-latency     Avg-latency   Min-Latency     Max-Latency   No. of calls   Fop
> ---------     -----------   -----------     -----------   ------------   ----
>      0.00         0.00 us       0.00 us         0.00 us             25   RELEASE
>      0.00         0.00 us       0.00 us         0.00 us             16   RELEASEDIR
>      0.00        61.50 us      46.00 us        77.00 us              2   ENTRYLK
>      0.00        82.00 us      67.00 us        97.00 us              2   FLUSH
>      0.00       265.00 us     265.00 us       265.00 us              1   CREATE
>      0.00        57.43 us      30.00 us        85.00 us              7   STATFS
>      0.00        61.12 us      37.00 us       107.00 us             16   OPENDIR
>      0.00        44.04 us      12.00 us        86.00 us             24   FSTAT
>      0.00        41.42 us      24.00 us        96.00 us             26   GETXATTR
>      0.00        45.93 us      24.00 us       133.00 us             28   READDIR
>      0.00        57.17 us      25.00 us       147.00 us             24   OPEN
>      0.00       145.28 us      31.00 us       288.00 us             32   READDIRP
>      0.00        39.50 us      10.00 us       152.00 us            132   INODELK
>      0.00       330.97 us      20.00 us     14280.00 us             62   LOOKUP
>      0.00        79.06 us      19.00 us       851.00 us            430   FXATTROP
>      0.02        29.32 us       7.00 us     28154.00 us          22568   FINODELK
>      7.80   1313096.68 us     125.00 us   23281862.00 us            189   FSYNC
>     92.18       397.92 us      76.00 us    1838343.00 us        7372799   WRITE
>
> Duration: 7811 seconds
> Data Read: 0 bytes
> Data Written: 966367641600 bytes
>
> Interval 0 Stats:
> Block Size: 131072b+ 262144b+
> No. of Reads: 0 0
> No. of Writes: 7372798 1
> %-latency     Avg-latency   Min-Latency     Max-Latency   No. of calls   Fop
> ---------     -----------   -----------     -----------   ------------   ----
>      0.00         0.00 us       0.00 us         0.00 us             25   RELEASE
>      0.00         0.00 us       0.00 us         0.00 us             16   RELEASEDIR
>      0.00        61.50 us      46.00 us        77.00 us              2   ENTRYLK
>      0.00        82.00 us      67.00 us        97.00 us              2   FLUSH
>      0.00       265.00 us     265.00 us       265.00 us              1   CREATE
>      0.00        57.43 us      30.00 us        85.00 us              7   STATFS
>      0.00        61.12 us      37.00 us       107.00 us             16   OPENDIR
>      0.00        44.04 us      12.00 us        86.00 us             24   FSTAT
>      0.00        41.42 us      24.00 us        96.00 us             26   GETXATTR
>      0.00        45.93 us      24.00 us       133.00 us             28   READDIR
>      0.00        57.17 us      25.00 us       147.00 us             24   OPEN
>      0.00       145.28 us      31.00 us       288.00 us             32   READDIRP
>      0.00        39.50 us      10.00 us       152.00 us            132   INODELK
>      0.00       330.97 us      20.00 us     14280.00 us             62   LOOKUP
>      0.00        79.06 us      19.00 us       851.00 us            430   FXATTROP
>      0.02        29.32 us       7.00 us     28154.00 us          22568   FINODELK
>      7.80   1313096.68 us     125.00 us   23281862.00 us            189   FSYNC
>     92.18       397.92 us      76.00 us    1838343.00 us        7372799   WRITE
>
> Duration: 7811 seconds
> Data Read: 0 bytes
> Data Written: 966367641600 bytes
>
> Does this make anything clearer?
>
> 2014-10-13 20:40 GMT+03:00 Roman <romeo.r at gmail.com>:
>
> I think I may know what the issue was. There was an iscsitarget
> service running that was exporting this generated block device, so
> maybe my colleague's Windows server picked it up and mounted it :) I'll
> see if it happens again.
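> (A quick way to double-check that theory, assuming the stock Debian
> iscsitarget/IET package: see whether the target is still exported and
> whether any initiator is logged in:
> cat /proc/net/iet/volume     # exported targets/LUNs
> cat /proc/net/iet/session    # connected initiators, if any)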
>
> 2014-10-13 20:27 GMT+03:00 Roman <romeo.r at gmail.com>:
>
> So may I restart the volume and start the test, or do you need
> something else from this issue?
>
> 2014-10-13 19:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>
>
> On 10/13/2014 10:03 PM, Roman wrote:
>> Hmm,
>> seems like another strange issue? I've seen this before; I had
>> to restart the volume to get my free space back.
>> root at glstor-cli:/srv/nfs/HA-WIN-TT-1T# ls -l
>> total 943718400
>> -rw-r--r-- 1 root root 966367641600 Oct 13 16:55 disk
>> root at glstor-cli:/srv/nfs/HA-WIN-TT-1T# rm disk
>> root at glstor-cli:/srv/nfs/HA-WIN-TT-1T# df -h
>> Filesystem Size Used Avail Use% Mounted on
>> rootfs 282G 1.1G 266G 1% /
>> udev 10M 0 10M 0% /dev
>> tmpfs 1.4G 228K 1.4G 1% /run
>> /dev/disk/by-uuid/c62ee3c0-c0e5-44af-b0cd-7cb3fbcc0fba
>> 282G 1.1G 266G 1% /
>> tmpfs 5.0M 0 5.0M 0% /run/lock
>> tmpfs 5.2G 0 5.2G 0% /run/shm
>> stor1:HA-WIN-TT-1T 1008G 901G 57G 95%
>> /srv/nfs/HA-WIN-TT-1T
>>
>> No file, but the used size is still 901G.
>> Both servers show the same.
>> Do I really have to restart the volume to fix that?
> IMO this can happen if there is an fd leak; the open-fd count is the
> only variable that changes with a volume restart. How do
> you re-create the bug?
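> A rough way to check for that on the brick servers before restarting
> (just a sketch, assuming default paths):
> gluster volume statedump HA-WIN-TT-1T    # fd/inode tables land under /var/run/gluster
> lsof -nP +L1 | grep NFS-WIN              # files that are unlinked but still held open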
>
> Pranith
>
>>
>> 2014-10-13 19:30 GMT+03:00 Roman <romeo.r at gmail.com>:
>>
>> Sure.
>> I'll let it run for the night.
>>
>> 2014-10-13 19:19 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>
>> Hi Roman,
>> Do you think we can run this test again?
>> This time, could you enable profiling with 'gluster volume
>> profile <volname> start', do the same test,
>> and then provide the output of 'gluster volume profile
>> <volname> info' and the logs after the test?
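>> For this volume that would be roughly:
>> gluster volume profile HA-WIN-TT-1T start
>> # ... repeat the dd test ...
>> gluster volume profile HA-WIN-TT-1T info
>> gluster volume profile HA-WIN-TT-1T stop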
>>
>> Pranith
>>
>> On 10/13/2014 09:45 PM, Roman wrote:
>>> Sure !
>>>
>>> root at stor1:~# gluster volume info
>>>
>>> Volume Name: HA-2TB-TT-Proxmox-cluster
>>> Type: Replicate
>>> Volume ID: 66e38bde-c5fa-4ce2-be6e-6b2adeaa16c2
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB
>>> Brick2: stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB
>>> Options Reconfigured:
>>> nfs.disable: 0
>>> network.ping-timeout: 10
>>>
>>> Volume Name: HA-WIN-TT-1T
>>> Type: Replicate
>>> Volume ID: 2937ac01-4cba-44a8-8ff8-0161b67f8ee4
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: stor1:/exports/NFS-WIN/1T
>>> Brick2: stor2:/exports/NFS-WIN/1T
>>> Options Reconfigured:
>>> nfs.disable: 1
>>> network.ping-timeout: 10
>>>
>>>
>>>
>>> 2014-10-13 19:09 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>
>>> Could you give your 'gluster volume info'
>>> output?
>>>
>>> Pranith
>>>
>>> On 10/13/2014 09:36 PM, Roman wrote:
>>>> Hi,
>>>>
>>>> I've got this kind of setup (servers run
>>>> replica)
>>>>
>>>>
>>>> @ 10G backend
>>>> gluster storage1
>>>> gluster storage2
>>>> gluster client1
>>>>
>>>> @1g backend
>>>> other gluster clients
>>>>
>>>> The servers have HW RAID5 with SAS disks.
>>>>
>>>> So today I decided to create a 900GB
>>>> file for an iSCSI target, located on a
>>>> separate glusterfs volume, using dd (just
>>>> a dummy file filled with zeros, bs=1G
>>>> count=900).
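>>>> (something like: dd if=/dev/zero of=/srv/nfs/HA-WIN-TT-1T/disk bs=1G count=900)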
>>>> First of all, the process took
>>>> quite a lot of time; the writing speed was
>>>> 130 MB/sec (the client port was 2 Gbps, the servers'
>>>> ports were running @ 1 Gbps).
>>>> Then it reported something like "endpoint
>>>> is not connected" and all of my VMs on the
>>>> other volume started to give me IO errors.
>>>> Server load was around 4.6 (12 cores total).
>>>>
>>>> Maybe it was due to the timeout of 2 secs, so
>>>> I've made it a bit higher, 10 sec.
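>>>> (i.e. roughly: gluster volume set <volname> network.ping-timeout 10, on both volumes)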
>>>>
>>>> Also, during the dd image creation, the VMs
>>>> very often reported to me that their disks were
>>>> slow, like:
>>>>
>>>> WARNINGs: Read IO Wait time is -0.02
>>>> (outside range [0:1]).
>>>>
>>>> Is 130 MB/sec the maximum bandwidth for
>>>> all of the volumes in total? Is that why we
>>>> would need 10G backends?
>>>>
>>>> The HW RAID local speed is 300 MB/sec, so it
>>>> should not be an issue. Any ideas or maybe
>>>> any advice?
>>>>
>>>>
>>>> Maybe someone has an optimized sysctl.conf for
>>>> a 10G backend?
>>>>
>>>> Mine is pretty simple, the kind that can be found
>>>> by googling.
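>>>> (for reference, I mean the usual generic 10GbE tuning along these lines, nothing special:
>>>> net.core.rmem_max = 16777216
>>>> net.core.wmem_max = 16777216
>>>> net.ipv4.tcp_rmem = 4096 87380 16777216
>>>> net.ipv4.tcp_wmem = 4096 65536 16777216
>>>> net.core.netdev_max_backlog = 30000)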
>>>>
>>>>
>>>> Just to mention: those VMs were connected
>>>> using a separate 1 Gbps interface, which
>>>> means they should not be affected by the
>>>> client with the 10G backend.
>>>>
>>>>
>>>> The logs are pretty useless; they just say
>>>> this during the outage:
>>>>
>>>>
>>>> [2014-10-13 12:09:18.392910] W
>>>> [client-handshake.c:276:client_ping_cbk]
>>>> 0-HA-2TB-TT-Proxmox-cluster-client-0: timer
>>>> must have expired
>>>>
>>>> [2014-10-13 12:10:08.389708] C
>>>> [client-handshake.c:127:rpc_client_ping_timer_expired]
>>>> 0-HA-2TB-TT-Proxmox-cluster-client-0:
>>>> server 10.250.0.1:49159 has not responded
>>>> in the last 2 seconds, disconnecting.
>>>>
>>>> [2014-10-13 12:10:08.390312] W
>>>> [client-handshake.c:276:client_ping_cbk]
>>>> 0-HA-2TB-TT-Proxmox-cluster-client-0: timer
>>>> must have expired
>>>>
>>>> So I decided to set the timeout a bit higher.
>>>>
>>>> So it seems to me that under high load
>>>> GlusterFS is not usable? 130 MB/s is not
>>>> that much to be getting timeouts or
>>>> making the system so slow that the VMs
>>>> suffer.
>>>>
>>>> Of course, after the disconnection, the healing
>>>> process started, but as the VMs lost
>>>> connection to both servers, it was
>>>> pretty useless; they could not run anymore.
>>>> And BTW, when you load the server with such a
>>>> huge job (a dd of 900GB), the healing process
>>>> goes soooooo slow :)
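>>>> (heal progress can be watched with: gluster volume heal <volname> info)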
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Roman.
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Roman.
>>
>>
>>
>>
>> --
>> Best regards,
>> Roman.
>>
>>
>>
>>
>> --
>> Best regards,
>> Roman.
>
>
>
>
> --
> Best regards,
> Roman.
>
>
>
>
> --
> Best regards,
> Roman.
>
>
>
>
> --
> Best regards,
> Roman.