[Gluster-devel] Fw: Re[2]: missing files
Pranith Kumar Karampuri
pkarampu at redhat.com
Wed Feb 11 13:21:18 UTC 2015
On 02/11/2015 06:49 PM, Pranith Kumar Karampuri wrote:
>
> On 02/11/2015 08:36 AM, Shyam wrote:
>> Did some analysis with David today on this; here is a gist for the list:
>>
>> 1) Volumes classified as slow (i.e. with a lot of pre-existing data)
>> and fast (new volumes carved from the same backend file system that
>> the slow bricks are on, with little or no data)
>>
>> 2) We ran an strace of tar and also collected io-stats outputs from
>> these volumes; both show that create and mkdir are slower on the slow
>> volume than on the fast one (see the profile sketch after this list).
>> This seems to be the overall reason for the slowness.
> Did you happen to do an strace of the brick when this happened? If not,
> David, can we get that information as well?
It would be nice to compare the syscalls made by the bricks of the
two volumes to see if there are any extra syscalls adding to the
delay.
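A minimal sketch of what I mean (the brick PIDs are listed by 'gluster
volume status', and the output file names here are just placeholders):

gluster volume status homegfs_bkp    # note the PID of each brick process
strace -c -f -p <slow-brick-pid> -o /tmp/slow_brick.summary
strace -c -f -p <fast-brick-pid> -o /tmp/fast_brick.summary

Leave each strace attached while the tar runs on that volume; 'strace -c'
prints a per-syscall count/time summary when it detaches, which should make
any extra or unusually slow calls on the slow brick stand out.
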
Pranith
>
> Pranith
>>
>> 3) The tarball extraction is to a new directory on the gluster mount,
>> so all lookups etc. happen within this new name space on the volume
>>
>> 4) Checked memory footprints of the slow bricks and fast bricks, etc.;
>> nothing untoward was noticed there
>>
>> 5) Restarted the slow volume, just as a test case to do things from
>> scratch; no improvement in performance.
>>
>> Currently attempting to reproduce this on a local system to see if
>> the same behavior is seen so that it becomes easier to debug etc.
>>
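>> For 2), here is roughly how equivalent per-fop numbers can be collected
>> with volume profile around just the tar run (volume names from David's
>> setup), in case anyone wants to reproduce:
>>
>> gluster volume profile homegfs_bkp start
>> (run the tar extraction)
>> gluster volume profile homegfs_bkp info > /tmp/homegfs_bkp.profile
>> gluster volume profile homegfs_bkp stop
>>
>> and the same for the fast volume, so the CREATE and MKDIR latencies can
>> be compared side by side.
>>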
>> Others on the list can chime in as they see fit.
>>
>> Thanks,
>> Shyam
>>
>> On 02/10/2015 09:58 AM, David F. Robinson wrote:
>>> Forwarding to devel list as recommended by Justin...
>>>
>>> David
>>>
>>>
>>> ------ Forwarded Message ------
>>> From: "David F. Robinson" <david.robinson at corvidtec.com>
>>> To: "Justin Clift" <justin at gluster.org>
>>> Sent: 2/10/2015 9:49:09 AM
>>> Subject: Re[2]: [Gluster-devel] missing files
>>>
>>> Bad news... I don't think it is the old linkto files. Bad because if
>>> that was the issue, cleaning up all of the bad linkto files would have
>>> fixed the issue. It seems like the system just gets slower as you add data.
>>>
>>> First, I set up a new clean volume (test2brick) on the same system as
>>> the old one (homegfs_bkp). See 'gluster v info' below. I ran my simple tar
>>> extraction test on the new volume and it took 58-seconds to complete
>>> (which, BTW, is 10-seconds faster than my old non-gluster system, so
>>> kudos). The time on homegfs_bkp is 19-minutes.
>>>
>>> Next, I copied 10-terabytes of data over to test2brick and re-ran the
>>> test which then took 7-minutes. I created a test3brick and ran the test
>>> and it took 53-seconds.
>>>
>>> To confirm all of this, I deleted all of the data from test2brick and
>>> re-ran the test. It took 51-seconds!!!
>>>
>>> BTW. I also checked the .glusterfs for stale linkto files (find . -type
>>> f -size 0 -perm 1000 -exec ls -al {} \;). There are many, many
>>> thousands
>>> of these types of files on the old volume and none on the new one, so I
>>> don't think this is related to the performance issue.
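>>>
>>> (If it helps, a suspected linkto file can be double-checked by reading its
>>> dht linkto xattr directly on the brick, e.g.
>>> getfattr -n trusted.glusterfs.dht.linkto -e text /data/brick01bkp/homegfs_bkp/<path>
>>> where the path is just a placeholder.)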
>>>
>>> Let me know how I should proceed. Send this to devel list? Pranith?
>>> others? Thanks...
>>>
>>> [root at gfs01bkp .glusterfs]# gluster volume info homegfs_bkp
>>> Volume Name: homegfs_bkp
>>> Type: Distribute
>>> Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
>>> Status: Started
>>> Number of Bricks: 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
>>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp
>>>
>>> [root at gfs01bkp .glusterfs]# gluster volume info test2brick
>>> Volume Name: test2brick
>>> Type: Distribute
>>> Volume ID: 123259b2-3c61-4277-a7e8-27c7ec15e550
>>> Status: Started
>>> Number of Bricks: 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test2brick
>>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test2brick
>>>
>>> [root at gfs01bkp glusterfs]# gluster volume info test3brick
>>> Volume Name: test3brick
>>> Type: Distribute
>>> Volume ID: 9b1613fc-f7e5-4325-8f94-e3611a5c3701
>>> Status: Started
>>> Number of Bricks: 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test3brick
>>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test3brick
>>>
>>>
>>> From homegfs_bkp:
>>> # find . -type f -size 0 -perm 1000 -exec ls -al {} \;
>>> ---------T 2 gmathur pme_ics 0 Jan 9 16:59
>>> ./00/16/00169a69-1a7a-44c9-b2d8-991671ee87c4
>>> ---------T 3 jcowan users 0 Jan 9 17:51
>>> ./00/16/0016a0a0-fd22-4fb5-b6fb-5d7f9024ab74
>>> ---------T 2 morourke sbir 0 Jan 9 18:17
>>> ./00/16/0016b36f-32fc-4f2c-accd-e36be2f6c602
>>> ---------T 2 carpentr irl 0 Jan 9 18:52
>>> ./00/16/00163faf-741c-4e40-8081-784786b3cc71
>>> ---------T 3 601 raven 0 Jan 9 22:49
>>> ./00/16/00163385-a332-4050-8104-1b1af6cd8249
>>> ---------T 3 bangell sbir 0 Jan 9 22:56
>>> ./00/16/00167803-0244-46de-8246-d9c382dd3083
>>> ---------T 2 morourke sbir 0 Jan 9 23:17
>>> ./00/16/00167bc5-fc56-42ee-9e3f-1e238f3828f4
>>> ---------T 3 morourke sbir 0 Jan 9 23:34
>>> ./00/16/0016a71e-89cf-4a86-9575-49c7e9d216c6
>>> ---------T 2 gmathur users 0 Jan 9 23:47
>>> ./00/16/00168aa2-d069-4a77-8790-e36431324ca5
>>> ---------T 2 bangell users 0 Jan 22 09:24
>>> ./00/16/0016e720-a190-4e43-962f-aa3e4216e5f5
>>> ---------T 2 root root 0 Jan 22 09:26
>>> ./00/16/00169e95-64b7-455c-82dc-d9940ee7fe43
>>> ---------T 2 dfrobins users 0 Jan 22 09:27
>>> ./00/16/00161b04-1612-4fba-99a4-2a2b54062fdb
>>> ---------T 2 mdick users 0 Jan 22 09:27
>>> ./00/16/0016ba60-310a-4bee-968a-36eb290e8c9e
>>> ---------T 2 dfrobins users 0 Jan 22 09:43
>>> ./00/16/00160315-1533-4290-8c1a-72e2fbb1962a
>>> From test2brick:
>>> find . -type f -size 0 -perm 1000 -exec ls -al {} \;
>>>
>>>
>>>
>>>
>>>
>>> ------ Original Message ------
>>> From: "Justin Clift" <justin at gluster.org>
>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>>> Sent: 2/9/2015 11:33:54 PM
>>> Subject: Re: [Gluster-devel] missing files
>>>
>>>> Interesting. (I'm 1/2 asleep atm and really need sleep soon, so
>>>> take this
>>>> with a grain of salt... ;>)
>>>>
>>>> As a curiosity question, does the homegfs_bkp volume have a bunch of
>>>> outdated metadata still in it? e.g. left-over extended attributes or
>>>> something?
>>>>
>>>> Remembering a question you asked earlier er... today/yesterday about old
>>>> extended attribute entries and if they hang around forever. I don't know the
>>>> answer to that, but if the old volume still has 1000's (or more) of entries
>>>> hanging around, perhaps there's some lookup problem that's killing lookup
>>>> times for file operations.
>>>>
>>>> On a side note, I can probably set up my test lab stuff here again
>>>> tomorrow and try this stuff out myself to see if I can replicate the
>>>> problem (if that could potentially be useful?).
>>>>
>>>> + Justin
>>>>
>>>>
>>>>
>>>> On 9 Feb 2015, at 22:56, David F. Robinson
>>>> <david.robinson at corvidtec.com> wrote:
>>>>> Justin,
>>>>>
>>>>> Hoping you can help point this to the right people once again. Maybe
>>>>> all of these issues are related.
>>>>>
>>>>> You can look at the email traffic below, but the summary is that I
>>>>> was working with Ben to figure out why my GFS system was 20x slower
>>>>> than my old storage system. During my tracing of this issue, I
>>>>> determined that if I create a new volume on my storage system, this
>>>>> slowness goes away. So, either it is faster because it doesn't have
>>>>> any data on this new volume (I hope this isn't the case), or the older
>>>>> partitions somehow became corrupted during the upgrades or have some
>>>>> deprecated parameters set that slow them down.
>>>>>
>>>>> Very strange and hoping you can once again help... Thanks in
>>>>> advance...
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> ------ Forwarded Message ------
>>>>> From: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>> To: "Benjamin Turner" <bennyturns at gmail.com>
>>>>> Sent: 2/9/2015 5:52:00 PM
>>>>> Subject: Re[5]: [Gluster-devel] missing files
>>>>>
>>>>> Ben,
>>>>>
>>>>> I cleared the logs and rebooted the machine. Same issue. homegfs_bkp
>>>>> takes 19-minutes and test2brick (the new volume) takes 1-minute.
>>>>>
>>>>> Is it possible that some old parameters are still set for
>>>>> homegfs_bkp that are no longer in use? I tried a gluster volume reset
>>>>> for homegfs_bkp, but it didn't have any effect.
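>>>>>
>>>>> I guess one way to double-check for leftovers (assuming the standard
>>>>> glusterd paths) would be something like:
>>>>> gluster volume info homegfs_bkp | grep -A 20 'Options Reconfigured'
>>>>> diff /var/lib/glusterd/vols/homegfs_bkp/info /var/lib/glusterd/vols/test2brick/info
>>>>> but let me know if there is a better place to look.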
>>>>>
>>>>> I have attached the full logs.
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> ------ Original Message ------
>>>>> From: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>> To: "Benjamin Turner" <bennyturns at gmail.com>
>>>>> Sent: 2/9/2015 5:39:18 PM
>>>>> Subject: Re[4]: [Gluster-devel] missing files
>>>>>
>>>>>> Ben,
>>>>>>
>>>>>> I have traced this out to a point where I can rule out many issues.
>>>>>> I was hoping you could help me from here.
>>>>>> I went with the "tar -xPf boost.tar" as my test case, which on my
>>>>>> old storage system took about 1-minute to extract. On my backup
>>>>>> system and my primary storage (both gluster), it takes roughly
>>>>>> 19-minutes.
>>>>>>
>>>>>> First step was to create a new storage system (striped RAID, two
>>>>>> sets of 3-drives). All was good here with a gluster extraction time
>>>>>> of 1-minute. I then went to my backup system and created another
>>>>>> partition using only one of the two bricks on that system. Still
>>>>>> 1-minute. I went to a two brick setup and it stayed at 1-minute.
>>>>>>
>>>>>> At this point, I have recreated using the same parameters on a
>>>>>> test2brick volume that should be identical to my homegfs_bkp volume.
>>>>>> Everything is the same including how I mounted the volume. The only
>>>>>> difference is that homegfs_bkp has 30-TB of data and
>>>>>> test2brick is blank. I didn't think that performance would be
>>>>>> affected by putting data on the volume.
>>>>>>
>>>>>> Can you help? Do you have any suggestions? Do you think upgrading
>>>>>> gluster from 3.5 to 3.6.1 to 3.6.2 somehow messed up homegfs_bkp?
>>>>>> My layout is shown below. These should give identical speeds.
>>>>>>
>>>>>> [root at gfs01bkp test2brick]# gluster volume info homegfs_bkp
>>>>>> Volume Name: homegfs_bkp
>>>>>> Type: Distribute
>>>>>> Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
>>>>>> Status: Started
>>>>>> Number of Bricks: 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
>>>>>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp
>>>>>> [root at gfs01bkp test2brick]# gluster volume info test2brick
>>>>>>
>>>>>> Volume Name: test2brick
>>>>>> Type: Distribute
>>>>>> Volume ID: 123259b2-3c61-4277-a7e8-27c7ec15e550
>>>>>> Status: Started
>>>>>> Number of Bricks: 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test2brick
>>>>>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test2brick
>>>>>>
>>>>>>
>>>>>> [root at gfs01bkp brick02bkp]# mount | grep test2brick
>>>>>> gfsib01bkp.corvidtec.com:/test2brick.tcp on /test2brick type
>>>>>> fuse.glusterfs (rw,allow_other,max_read=131072)
>>>>>> [root at gfs01bkp brick02bkp]# mount | grep homegfs_bkp
>>>>>> gfsib01bkp.corvidtec.com:/homegfs_bkp.tcp on /backup/homegfs type
>>>>>> fuse.glusterfs (rw,allow_other,max_read=131072)
>>>>>>
>>>>>> [root at gfs01bkp brick02bkp]# df -h
>>>>>> Filesystem Size Used Avail Use% Mounted on
>>>>>> /dev/mapper/vg00-lv_root 20G 1.7G 18G 9% /
>>>>>> tmpfs 16G 0 16G 0% /dev/shm
>>>>>> /dev/md126p1 1008M 110M 848M 12% /boot
>>>>>> /dev/mapper/vg00-lv_opt 5.0G 220M 4.5G 5% /opt
>>>>>> /dev/mapper/vg00-lv_tmp 5.0G 139M 4.6G 3% /tmp
>>>>>> /dev/mapper/vg00-lv_usr 20G 2.7G 17G 15% /usr
>>>>>> /dev/mapper/vg00-lv_var 40G 4.4G 34G 12% /var
>>>>>> /dev/mapper/vg01-lvol1 88T 22T 67T 25% /data/brick01bkp
>>>>>> /dev/mapper/vg02-lvol1 88T 22T 67T 25% /data/brick02bkp
>>>>>> gfsib01bkp.corvidtec.com:/homegfs_bkp.tcp 175T 43T 133T 25%
>>>>>> /backup/homegfs
>>>>>> gfsib01bkp.corvidtec.com:/test2brick.tcp 175T 43T 133T 25%
>>>>>> /test2brick
>>>>>>
>>>>>>
>>>>>> ------ Original Message ------
>>>>>> From: "Benjamin Turner" <bennyturns at gmail.com>
>>>>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>>> Sent: 2/6/2015 12:52:58 PM
>>>>>> Subject: Re: Re[2]: [Gluster-devel] missing files
>>>>>>
>>>>>>> Hi David. Let's start with the basics and go from there. IIRC you
>>>>>>> are using LVM with thick provisioning; let's verify the following:
>>>>>>>
>>>>>>> 1. You have everything properly aligned for your RAID stripe size,
>>>>>>> etc. I have attached the script we package with RHS that I am in
>>>>>>> the process of updating. I want to double-check you created the PV
>>>>>>> / VG / LV with the proper variables. Have a look at the create_pv,
>>>>>>> create_vg, and create_lv(old) functions (there is a rough sketch of
>>>>>>> the whole sequence after the variables in 3 below). You will need to
>>>>>>> know the stripe size of your RAID and the number of stripe elements
>>>>>>> (data disks, not hotspares). Also make sure you run mkfs.xfs with:
>>>>>>>
>>>>>>> echo "mkfs -t xfs -f -K -i size=$inode_size -d
>>>>>>> sw=$stripe_elements,su=$stripesize -n size=$fs_block_size
>>>>>>> /dev/$vgname/$lvname"
>>>>>>>
>>>>>>> We use 512-byte inodes because some workloads use more than the default
>>>>>>> inode size and you don't want xattrs spilling outside the inode.
>>>>>>>
>>>>>>> 2. Are you running RHEL or CentOS? If so, I would recommend
>>>>>>> tuned_profile=rhs-high-throughput. If you don't have that tuned
>>>>>>> profile, I'll get you everything it sets.
>>>>>>>
>>>>>>> 3. For small files we recommend the following:
>>>>>>>
>>>>>>> # RAID related variables.
>>>>>>> # stripesize - RAID controller stripe unit size
>>>>>>> # stripe_elements - the number of data disks
>>>>>>> # The --dataalignment option is used while creating the physical volume
>>>>>>> # to align I/O at the LVM layer
>>>>>>> # dataalign -
>>>>>>> # RAID6 is recommended when the workload has predominantly larger
>>>>>>> # files, i.e. not in kilobytes.
>>>>>>> # For RAID6 with 12 disks and 128K stripe element size.
>>>>>>> stripesize=128k
>>>>>>> stripe_elements=10
>>>>>>> dataalign=1280k
>>>>>>>
>>>>>>> # RAID10 is recommended when the workload has predominantly
>>>>>>> # smaller files, i.e. in kilobytes.
>>>>>>> # For RAID10 with 12 disks and 256K stripe element size, uncomment the
>>>>>>> # lines below.
>>>>>>> # stripesize=256k
>>>>>>> # stripe_elements=6
>>>>>>> # dataalign=1536k
>>>>>>>
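>>>>>>> Putting 1) and 3) together, a rough sketch of the create sequence for
>>>>>>> the RAID6 example above (device/VG/LV names are made up, and this is
>>>>>>> from memory rather than the exact create_pv/create_vg/create_lv(old)
>>>>>>> functions in the script, so treat the -n size value as a placeholder
>>>>>>> for $fs_block_size):
>>>>>>>
>>>>>>> pvcreate --dataalignment 1280k /dev/sdb
>>>>>>> vgcreate vg_brick01 /dev/sdb
>>>>>>> lvcreate -l 100%FREE -n lv_brick01 vg_brick01
>>>>>>> mkfs.xfs -f -K -i size=512 -d sw=10,su=128k -n size=8192 /dev/vg_brick01/lv_brick01
>>>>>>>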
>>>>>>> 4. Jumbo frames everywhere! Check out the effect of jumbo frames,
>>>>>>> make sure they are set up properly on your switch, and add MTU=9000
>>>>>>> to your ifcfg files (unless you have it already); a quick verification
>>>>>>> sketch follows the links:
>>>>>>>
>>>>>>>
>>>>>>> https://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
>>>>>>>
>>>>>>> (see the jumbo frames section here, the whole thing is a good read)
>>>>>>>
>>>>>>> https://rhsummit.files.wordpress.com/2014/04/bengland_h_1100_rhs_performance.pdf
>>>>>>>
>>>>>>> (this is updated for 2014)
>>>>>>>
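>>>>>>> A quick end-to-end check for the jumbo frame setup (interface name is
>>>>>>> just an example):
>>>>>>>
>>>>>>> ip link show em1 | grep mtu
>>>>>>> ping -M do -s 8972 <other-gluster-node>   # 8972 + 28 bytes of headers = 9000
>>>>>>>
>>>>>>> If the ping fails with something like "message too long", a hop in the
>>>>>>> path is still at MTU 1500.
>>>>>>>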
>>>>>>> 5. There is a smallfile enhancement that just landed in master
>>>>>>> that is showing me a 60% improvement in writes. This is called
>>>>>>> multi threaded epoll and it is looking VERY promising WRT smallfile
>>>>>>> performance. Here is a summary:
>>>>>>>
>>>>>>> Hi all. I see a lot of discussion on $subject and I wanted to take
>>>>>>> a minute to talk about it and what we can do to test / observe the
>>>>>>> effects of it. Let's start with a bit of background:
>>>>>>>
>>>>>>> **Background**
>>>>>>>
>>>>>>> -Currently epoll is single threaded on both clients and servers.
>>>>>>> *This leads to a "hot thread" which consumes 100% of a CPU core.
>>>>>>> *This can be observed by running BenE's smallfile benchmark to
>>>>>>> create files, running top (on both clients and servers), and
>>>>>>> pressing H to show threads.
>>>>>>> *You will be able to see a single glusterfsd thread eating 100%
>>>>>>> of the CPU:
>>>>>>>
>>>>>>> 2871 root 20 0 746m 24m 3004 S 100.0 0.1 14:35.89 glusterfsd
>>>>>>> 4522 root 20 0 747m 24m 3004 S 5.3 0.1 0:02.25 glusterfsd
>>>>>>> 4507 root 20 0 747m 24m 3004 S 5.0 0.1 0:05.91 glusterfsd
>>>>>>> 21200 root 20 0 747m 24m 3004 S 4.6 0.1 0:21.16 glusterfsd
>>>>>>>
>>>>>>> -Single threaded epoll is a bottleneck for high IOP / low metadata
>>>>>>> workloads (think smallfile). With single threaded epoll we are CPU
>>>>>>> bound by the single thread pegging out a CPU.
>>>>>>>
>>>>>>> So the proposed solution to this problem is to make epoll multi
>>>>>>> threaded on both servers and clients. Here is a link to the
>>>>>>> upstream proposal:
>>>>>>>
>>>>>>>
>>>>>>> http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf#multi-thread-epoll
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Status: [ http://review.gluster.org/#/c/3842/ based on Anand
>>>>>>> Avati's patch ]
>>>>>>>
>>>>>>> Why: remove single-thread-per-brick barrier to higher CPU
>>>>>>> utilization by servers
>>>>>>>
>>>>>>> Use case: multi-client and multi-thread applications
>>>>>>>
>>>>>>> Improvement: measured 40% with 2 epoll threads and 100% with 4
>>>>>>> epoll threads for small file creates to an SSD
>>>>>>>
>>>>>>> Disadvantage: conflicts with support for SSL sockets, may require
>>>>>>> significant code change to support both.
>>>>>>>
>>>>>>> Note: this enhancement also helps high-IOPS applications such as
>>>>>>> databases and virtualization which are not metadata-intensive. This
>>>>>>> has been measured already using a Fusion I/O SSD performing random
>>>>>>> reads and writes -- it was necessary to define multiple bricks per
>>>>>>> SSD device to get Gluster to the same order of magnitude IOPS as a
>>>>>>> local filesystem. But this workaround is problematic for users,
>>>>>>> because storage space is not properly measured when there are
>>>>>>> multiple bricks on the same filesystem.
>>>>>>>
>>>>>>> Multi threaded epoll is part of a larger page that talks about
>>>>>>> smallfile performance enhancements, proposed and happening.
>>>>>>>
>>>>>>> Goal: if successful, throughput bottleneck should be either the
>>>>>>> network or the brick filesystem!
>>>>>>> What it doesn't do: multi-thread-epoll does not solve the
>>>>>>> excessive-round-trip protocol problems that Gluster has.
>>>>>>> What it should do: allow Gluster to exploit the mostly untapped
>>>>>>> CPU resources on the Gluster servers and clients.
>>>>>>> How it does it: allow multiple threads to read protocol messages
>>>>>>> and process them at the same time.
>>>>>>> How to observe: multi-thread-epoll should be configurable (how to
>>>>>>> configure? gluster command?); with thread count 1 it should be the same
>>>>>>> as RHS 3.0, and with thread count 2-4 it should show significantly more
>>>>>>> CPU utilization (threads visible with "top -H"), resulting in
>>>>>>> higher throughput.
>>>>>>>
>>>>>>> **How to observe**
>>>>>>>
>>>>>>> Here are the commands needed to set up an environment to test in on
>>>>>>> RHS 3.0.3:
>>>>>>> rpm -e glusterfs-api glusterfs glusterfs-libs glusterfs-fuse
>>>>>>> glusterfs-geo-replication glusterfs-rdma glusterfs-server
>>>>>>> glusterfs-cli gluster-nagios-common samba-glusterfs vdsm-gluster
>>>>>>> --nodeps
>>>>>>> rhn_register
>>>>>>> yum groupinstall "Development tools"
>>>>>>> git clone https://github.com/gluster/glusterfs.git
>>>>>>> git branch test
>>>>>>> git checkout test
>>>>>>> git fetch http://review.gluster.org/glusterfs
>>>>>>> refs/changes/42/3842/17 && git cherry-pick FETCH_HEAD
>>>>>>> git fetch http://review.gluster.org/glusterfs
>>>>>>> refs/changes/88/9488/2 && git cherry-pick FETCH_HEAD
>>>>>>> yum install openssl openssl-devel
>>>>>>> wget
>>>>>>> ftp://fr2.rpmfind.net/linux/epel/6/x86_64/cmockery2-1.3.8-2.el6.x86_64.rpm
>>>>>>>
>>>>>>>
>>>>>>> wget
>>>>>>> ftp://fr2.rpmfind.net/linux/epel/6/x86_64/cmockery2-devel-1.3.8-2.el6.x86_64.rpm
>>>>>>>
>>>>>>>
>>>>>>> yum install cmockery2-1.3.8-2.el6.x86_64.rpm
>>>>>>> cmockery2-devel-1.3.8-2.el6.x86_64.rpm libxml2-devel
>>>>>>> ./autogen.sh
>>>>>>> ./configure
>>>>>>> make
>>>>>>> make install
>>>>>>>
>>>>>>> Verify you are using the upstream with:
>>>>>>>
>>>>>>> # gluster --version
>>>>>>>
>>>>>>> To enable multithreaded epoll, run the following commands:
>>>>>>>
>>>>>>> From the patch:
>>>>>>> { .key = "client.event-threads", 839
>>>>>>> .voltype = "protocol/client", 840
>>>>>>> .op_version = GD_OP_VERSION_3_7_0, 841
>>>>>>> },
>>>>>>> { .key = "server.event-threads", 946
>>>>>>> .voltype = "protocol/server", 947
>>>>>>> .op_version = GD_OP_VERSION_3_7_0, 948
>>>>>>> },
>>>>>>>
>>>>>>> # gluster v set <volname> server.event-threads 4
>>>>>>> # gluster v set <volname> client.event-threads 4
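>>>>>>>
>>>>>>> To confirm they took effect, the options should show up under "Options
>>>>>>> Reconfigured", e.g.:
>>>>>>>
>>>>>>> gluster volume info <volname> | grep event-threads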
>>>>>>>
>>>>>>> Also grab smallfile:
>>>>>>>
>>>>>>> https://github.com/bengland2/smallfile
>>>>>>>
>>>>>>> After git cloning smallfile, run:
>>>>>>>
>>>>>>> python /small-files/smallfile/smallfile_cli.py --operation create
>>>>>>> --threads 8 --file-size 64 --files 10000 --top /gluster-mount
>>>>>>> --pause 1000 --host-set "client1 client2"
>>>>>>>
>>>>>>> Again we will be looking at top + show threads (press H). With 4
>>>>>>> threads on both clients and servers you should see something
>>>>>>> similar to (this isn't exact, I copied and pasted):
>>>>>>>
>>>>>>> 2871 root 20 0 746m 24m 3004 S 35.0 0.1 14:35.89 glusterfsd
>>>>>>> 2872 root 20 0 746m 24m 3004 S 51.0 0.1 14:35.89 glusterfsd
>>>>>>> 2873 root 20 0 746m 24m 3004 S 43.0 0.1 14:35.89 glusterfsd
>>>>>>> 2874 root 20 0 746m 24m 3004 S 65.0 0.1 14:35.89 glusterfsd
>>>>>>> 4522 root 20 0 747m 24m 3004 S 5.3 0.1 0:02.25 glusterfsd
>>>>>>> 4507 root 20 0 747m 24m 3004 S 5.0 0.1 0:05.91 glusterfsd
>>>>>>> 21200 root 20 0 747m 24m 3004 S 4.6 0.1 0:21.16 glusterfsd
>>>>>>>
>>>>>>> If you have a test env I would be interested to see how multi
>>>>>>> threaded epoll performs, but I am 100% sure it's not ready for
>>>>>>> production yet. RH will be supporting it with our 3.0.4 (the next
>>>>>>> one) release unless we find show-stopping bugs. My testing looks
>>>>>>> very promising though.
>>>>>>>
>>>>>>> Smallfile performance enhancements are one of the key focuses for
>>>>>>> our 3.1 release this summer; we are working very hard to improve
>>>>>>> this, as it is the use case for the majority of people.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Feb 6, 2015 at 11:59 AM, David F. Robinson
>>>>>>> <david.robinson at corvidtec.com> wrote:
>>>>>>> Ben,
>>>>>>>
>>>>>>> I was hoping you might be able to help with two performance
>>>>>>> questions. I was doing some testing of my rsync where I am backing
>>>>>>> up my primary gluster system (distributed + replicated) to my
>>>>>>> backup gluster system (distributed). I tried three tests where I
>>>>>>> rsynced from one of my primary systems (gfsib02b) to my backup
>>>>>>> machine. The test directory contains roughly 5500 files, most of
>>>>>>> which are small. The script I ran, which repeats each section's test
>>>>>>> 3x to check variability in timing, is shown below.
>>>>>>>
>>>>>>> 1) Writing to the local disk is drastically faster than writing to
>>>>>>> gluster. So, my writes to the backup gluster system are what is
>>>>>>> slowing me down, which makes sense.
>>>>>>> 2) When I write to the backup gluster system (/backup/homegfs),
>>>>>>> the timing goes from 35-seconds to 1min40seconds. The question here
>>>>>>> is whether you could recommend any settings for this volume that
>>>>>>> would improve performance for small file writes? I have included
>>>>>>> the output of 'gluster volume info' below.
>>>>>>> 3) When I did the same tests on the Source_bkp volume, it was
>>>>>>> almost 3x as slow as the homegfs_bkp volume. However, these are
>>>>>>> just different volumes on the same storage system. The volume
>>>>>>> parameters are identical (see below). The performance of these two
>>>>>>> should be identical. Any idea why they wouldn't be? And any
>>>>>>> suggestions for how to fix this? The only thing that I see
>>>>>>> different between the two is the order of the "Options
>>>>>>> reconfigured" section. I assume order of options doesn't matter.
>>>>>>>
>>>>>>> Backup to local hard disk (no gluster writes)
>>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>> gfsib02b:/homegfs/test /temp1
>>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>> gfsib02b:/homegfs/test /temp2
>>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>> gfsib02b:/homegfs/test /temp3
>>>>>>>
>>>>>>> real 0m35.579s
>>>>>>> user 0m31.290s
>>>>>>> sys 0m12.282s
>>>>>>>
>>>>>>> real 0m38.035s
>>>>>>> user 0m31.622s
>>>>>>> sys 0m10.907s
>>>>>>> real 0m38.313s
>>>>>>> user 0m31.458s
>>>>>>> sys 0m10.891s
>>>>>>> Backup to gluster backup system on volume homegfs_bkp
>>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>> gfsib02b:/homegfs/test /backup/homegfs/temp1
>>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>> gfsib02b:/homegfs/test /backup/homegfs/temp2
>>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>> gfsib02b:/homegfs/test /backup/homegfs/temp3
>>>>>>>
>>>>>>> real 1m42.026s
>>>>>>> user 0m32.604s
>>>>>>> sys 0m9.967s
>>>>>>>
>>>>>>> real 1m45.480s
>>>>>>> user 0m32.577s
>>>>>>> sys 0m11.994s
>>>>>>>
>>>>>>> real 1m40.436s
>>>>>>> user 0m32.521s
>>>>>>> sys 0m11.240s
>>>>>>>
>>>>>>> Backup to gluster backup system on volume Source_bkp
>>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>> gfsib02b:/homegfs/test /backup/Source/temp1
>>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>> gfsib02b:/homegfs/test /backup/Source/temp2
>>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>> gfsib02b:/homegfs/test /backup/Source/temp3
>>>>>>>
>>>>>>> real 3m30.491s
>>>>>>> user 0m32.676s
>>>>>>> sys 0m10.776s
>>>>>>>
>>>>>>> real 3m26.076s
>>>>>>> user 0m32.588s
>>>>>>> sys 0m11.048s
>>>>>>> real 3m7.460s
>>>>>>> user 0m32.763s
>>>>>>> sys 0m11.687s
>>>>>>>
>>>>>>>
>>>>>>> Volume Name: Source_bkp
>>>>>>> Type: Distribute
>>>>>>> Volume ID: 1d4c210d-a731-4d39-a0c5-ea0546592c1d
>>>>>>> Status: Started
>>>>>>> Number of Bricks: 2
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/Source_bkp
>>>>>>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/Source_bkp
>>>>>>> Options Reconfigured:
>>>>>>> performance.cache-size: 128MB
>>>>>>> performance.io-thread-count: 32
>>>>>>> server.allow-insecure: on
>>>>>>> network.ping-timeout: 10
>>>>>>> storage.owner-gid: 100
>>>>>>> performance.write-behind-window-size: 128MB
>>>>>>> server.manage-gids: on
>>>>>>> changelog.rollover-time: 15
>>>>>>> changelog.fsync-interval: 3
>>>>>>>
>>>>>>> Volume Name: homegfs_bkp
>>>>>>> Type: Distribute
>>>>>>> Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
>>>>>>> Status: Started
>>>>>>> Number of Bricks: 2
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
>>>>>>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp
>>>>>>> Options Reconfigured:
>>>>>>> storage.owner-gid: 100
>>>>>>> performance.io-thread-count: 32
>>>>>>> server.allow-insecure: on
>>>>>>> network.ping-timeout: 10
>>>>>>> performance.cache-size: 128MB
>>>>>>> performance.write-behind-window-size: 128MB
>>>>>>> server.manage-gids: on
>>>>>>> changelog.rollover-time: 15
>>>>>>> changelog.fsync-interval: 3
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ------ Original Message ------
>>>>>>> From: "Benjamin Turner" <bennyturns at gmail.com>
>>>>>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>;
>>>>>>> "gluster-users at gluster.org" <gluster-users at gluster.org>
>>>>>>> Sent: 2/3/2015 7:12:34 PM
>>>>>>> Subject: Re: [Gluster-devel] missing files
>>>>>>>
>>>>>>>> It sounds to me like the files were only copied to one replica,
>>>>>>>> weren't there for the initial ls which triggered a
>>>>>>>> self-heal, and were there for the last ls because they were
>>>>>>>> healed. Is there any chance that one of the replicas was down
>>>>>>>> during the rsync? It could be that you lost a brick during the copy or
>>>>>>>> something like that. To confirm, I would look for disconnects in
>>>>>>>> the brick logs as well as checking glustershd.log to verify that the
>>>>>>>> missing files were actually healed.
>>>>>>>>
>>>>>>>> -b
>>>>>>>>
>>>>>>>> On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
>>>>>>>> <david.robinson at corvidtec.com> wrote:
>>>>>>>> I rsync'd 20-TB over to my gluster system and noticed that I had
>>>>>>>> some directories missing even though the rsync completed normally.
>>>>>>>> The rsync logs showed that the missing files were transferred.
>>>>>>>>
>>>>>>>> I went to the bricks and did an 'ls -al
>>>>>>>> /data/brick*/homegfs/dir/*' and the files were on the bricks. After I
>>>>>>>> did this 'ls', the files then showed up on the FUSE mounts.
>>>>>>>>
>>>>>>>> 1) Why are the files hidden on the fuse mount?
>>>>>>>> 2) Why does the ls make them show up on the FUSE mount?
>>>>>>>> 3) How can I prevent this from happening again?
>>>>>>>>
>>>>>>>> Note, I also mounted the gluster volume using NFS and saw the
>>>>>>>> same behavior. The files/directories were not shown until I did
>>>>>>>> the "ls" on the bricks.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ===============================
>>>>>>>> David F. Robinson, Ph.D.
>>>>>>>> President - Corvid Technologies
>>>>>>>> 704.799.6944 x101 [office]
>>>>>>>> 704.252.1310 [cell]
>>>>>>>> 704.799.7974 [fax]
>>>>>>>> David.Robinson at corvidtec.com
>>>>>>>> http://www.corvidtechnologies.com/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>> <glusterfs.tgz>
>>>>
>>>> --
>>>> GlusterFS - http://www.gluster.org
>>>>
>>>> An open source, distributed file system scaling to several
>>>> petabytes, and handling thousands of clients.
>>>>
>>>> My personal twitter: twitter.com/realjustinclift
>>>>
>>>
>>>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel