[Gluster-devel] Fw: Re[2]: missing files
Pranith Kumar Karampuri
pkarampu at redhat.com
Wed Feb 11 13:19:54 UTC 2015
On 02/11/2015 08:36 AM, Shyam wrote:
> Did some analysis with David today; here is a gist for the list:
>
> 1) Volumes classified as slow (i.e., with a lot of pre-existing data)
> and fast (new volumes carved from the same backend file system that the
> slow bricks are on, with little or no data)
>
> 2) We ran an strace of tar and also collected io-stats output from
> these volumes; both show that create and mkdir are slower on the slow
> volume than on the fast volume. This seems to be the overall reason for
> the slowness.
Did you happen to do an strace of the brick when this happened? If not,
David, can we get that information as well?
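Something along these lines would capture it while the tar test runs (the
volume name, pid, and output path below are only illustrative):

# find the brick process for the slow volume
pgrep -f 'glusterfsd.*homegfs_bkp'
# attach for the duration of the tar extraction
strace -f -T -tt -o /tmp/homegfs_bkp-brick.strace -p <brick-pid>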
Pranith
>
> 3) The tarball extraction is to a new directory on the gluster mount,
> so all lookups etc. happen within this new namespace on the volume
>
> 4) Checked memory footprints of the slow and fast bricks, etc.;
> nothing untoward was noticed there
>
> 5) Restarted the slow volume, just as a test case to start from
> scratch; no improvement in performance.
>
> Currently attempting to reproduce this on a local system to see if the
> same behavior shows up there, which would make it easier to debug.
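> For anyone wanting to repeat the io-stats numbers from point 2, they can
> be gathered roughly like this (volume name is illustrative):
>
> gluster volume profile slowvol start
> # (run the tar extraction on the mount)
> gluster volume profile slowvol info
> gluster volume profile slowvol stop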
>
> Others on the list can chime in as they see fit.
>
> Thanks,
> Shyam
>
> On 02/10/2015 09:58 AM, David F. Robinson wrote:
>> Forwarding to devel list as recommended by Justin...
>>
>> David
>>
>>
>> ------ Forwarded Message ------
>> From: "David F. Robinson" <david.robinson at corvidtec.com>
>> To: "Justin Clift" <justin at gluster.org>
>> Sent: 2/10/2015 9:49:09 AM
>> Subject: Re[2]: [Gluster-devel] missing files
>>
>> Bad news... I don't think it is the old linkto files. Bad because if
>> that were the issue, cleaning up all of the bad linkto files would have
>> fixed it. It seems like the system just gets slower as you add data.
>>
>> First, I set up a new clean volume (test2brick) on the same system as the
>> old one (homegfs_bkp). See 'gluster v info' below. I ran my simple tar
>> extraction test on the new volume and it took 58-seconds to complete
>> (which, BTW, is 10-seconds faster than my old non-gluster system, so
>> kudos). The time on homegfs_bkp is 19-minutes.
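>> (For reference, the test itself is roughly: time tar -xPf boost.tar, run
>> into a fresh directory on the fuse mount of each volume.)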
>>
>> Next, I copied 10-terabytes of data over to test2brick and re-ran the
>> test which then took 7-minutes. I created a test3brick and ran the test
>> and it took 53-seconds.
>>
>> To confirm all of this, I deleted all of the data from test2brick and
>> re-ran the test. It took 51-seconds!!!
>>
>> BTW, I also checked .glusterfs for stale linkto files (find . -type
>> f -size 0 -perm 1000 -exec ls -al {} \;). There are many, many thousands
>> of these types of files on the old volume and none on the new one, so I
>> don't think this is related to the performance issue.
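>> (If useful, any of those can be double-checked as dht linkto files from
>> their xattrs; the path below is illustrative:
>> getfattr -d -m . -e hex /data/brick01bkp/homegfs_bkp/.glusterfs/00/16/<gfid>
>> and look for trusted.glusterfs.dht.linkto in the output.)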
>>
>> Let me know how I should proceed. Send this to devel list? Pranith?
>> others? Thanks...
>>
>> [root at gfs01bkp .glusterfs]# gluster volume info homegfs_bkp
>> Volume Name: homegfs_bkp
>> Type: Distribute
>> Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp
>>
>> [root at gfs01bkp .glusterfs]# gluster volume info test2brick
>> Volume Name: test2brick
>> Type: Distribute
>> Volume ID: 123259b2-3c61-4277-a7e8-27c7ec15e550
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test2brick
>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test2brick
>>
>> [root at gfs01bkp glusterfs]# gluster volume info test3brick
>> Volume Name: test3brick
>> Type: Distribute
>> Volume ID: 9b1613fc-f7e5-4325-8f94-e3611a5c3701
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test3brick
>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test3brick
>>
>>
>> From homegfs_bkp:
>> # find . -type f -size 0 -perm 1000 -exec ls -al {} \;
>> ---------T 2 gmathur pme_ics 0 Jan 9 16:59
>> ./00/16/00169a69-1a7a-44c9-b2d8-991671ee87c4
>> ---------T 3 jcowan users 0 Jan 9 17:51
>> ./00/16/0016a0a0-fd22-4fb5-b6fb-5d7f9024ab74
>> ---------T 2 morourke sbir 0 Jan 9 18:17
>> ./00/16/0016b36f-32fc-4f2c-accd-e36be2f6c602
>> ---------T 2 carpentr irl 0 Jan 9 18:52
>> ./00/16/00163faf-741c-4e40-8081-784786b3cc71
>> ---------T 3 601 raven 0 Jan 9 22:49
>> ./00/16/00163385-a332-4050-8104-1b1af6cd8249
>> ---------T 3 bangell sbir 0 Jan 9 22:56
>> ./00/16/00167803-0244-46de-8246-d9c382dd3083
>> ---------T 2 morourke sbir 0 Jan 9 23:17
>> ./00/16/00167bc5-fc56-42ee-9e3f-1e238f3828f4
>> ---------T 3 morourke sbir 0 Jan 9 23:34
>> ./00/16/0016a71e-89cf-4a86-9575-49c7e9d216c6
>> ---------T 2 gmathur users 0 Jan 9 23:47
>> ./00/16/00168aa2-d069-4a77-8790-e36431324ca5
>> ---------T 2 bangell users 0 Jan 22 09:24
>> ./00/16/0016e720-a190-4e43-962f-aa3e4216e5f5
>> ---------T 2 root root 0 Jan 22 09:26
>> ./00/16/00169e95-64b7-455c-82dc-d9940ee7fe43
>> ---------T 2 dfrobins users 0 Jan 22 09:27
>> ./00/16/00161b04-1612-4fba-99a4-2a2b54062fdb
>> ---------T 2 mdick users 0 Jan 22 09:27
>> ./00/16/0016ba60-310a-4bee-968a-36eb290e8c9e
>> ---------T 2 dfrobins users 0 Jan 22 09:43
>> ./00/16/00160315-1533-4290-8c1a-72e2fbb1962a
>> From test2brick:
>> find . -type f -size 0 -perm 1000 -exec ls -al {} \;
>>
>>
>>
>>
>>
>> ------ Original Message ------
>> From: "Justin Clift" <justin at gluster.org>
>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>> Sent: 2/9/2015 11:33:54 PM
>> Subject: Re: [Gluster-devel] missing files
>>
>>> Interesting. (I'm 1/2 asleep atm and really need sleep soon, so take
>>> this
>>> with a grain of salt... ;>)
>>>
>>> As a curiosity question, does the homegfs_bkp volume have a bunch of
>>> outdated metadata still in it? e.g. leftover extended attributes or
>>> something
>>>
>>> Remembering a question you asked earlier er... today/yesterday about
>>> old extended attribute entries and whether they hang around forever.
>>> I don't know the answer to that, but if the old volume still has
>>> thousands (or more) of entries around, perhaps there's some lookup
>>> problem that's killing lookup times for file operations.
>>>
>>> On a side note, I can probably set up my test lab stuff here again
>>> tomorrow and try this out myself to see if I can replicate the
>>> problem (if that could potentially be useful?).
>>>
>>> + Justin
>>>
>>>
>>>
>>> On 9 Feb 2015, at 22:56, David F. Robinson
>>> <david.robinson at corvidtec.com> wrote:
>>>> Justin,
>>>>
>>>> Hoping you can help point this to the right people once again. Maybe
>>>> all of these issues are related.
>>>>
>>>> You can look at the email traffic below, but the summary is that I
>>>> was working with Ben to figure out why my GFS system was 20x slower
>>>> than my old storage system. During my tracing of this issue, I
>>>> determined that if I create a new volume on my storage system, this
>>>> slowness goes away. So, either it is faster because it doesn't have
>>>> any data on this new volume (I hope this isn't the case), or the older
>>>> partitions somehow became corrupted during the upgrades or have some
>>>> deprecated parameters set that slow them down.
>>>>
>>>> Very strange and hoping you can once again help... Thanks in
>>>> advance...
>>>>
>>>> David
>>>>
>>>>
>>>> ------ Forwarded Message ------
>>>> From: "David F. Robinson" <david.robinson at corvidtec.com>
>>>> To: "Benjamin Turner" <bennyturns at gmail.com>
>>>> Sent: 2/9/2015 5:52:00 PM
>>>> Subject: Re[5]: [Gluster-devel] missing files
>>>>
>>>> Ben,
>>>>
>>>> I cleared the logs and rebooted the machine. Same issue. homegfs_bkp
>>>> takes 19-minutes and test2brick (the new volume) takes 1-minute.
>>>>
>>>> Is it possible that some old parameters are still set for
>>>> homegfs_bkp that are no longer in use? I tried a gluster volume reset
>>>> for homegfs_bkp, but it didn't have any effect.
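>>>> For what it's worth, one way I can compare what the two volumes actually
>>>> have set is to diff the glusterd metadata (the path is the default
>>>> location and may differ):
>>>>
>>>> diff /var/lib/glusterd/vols/homegfs_bkp/info \
>>>>      /var/lib/glusterd/vols/test2brick/info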
>>>>
>>>> I have attached the full logs.
>>>>
>>>> David
>>>>
>>>>
>>>> ------ Original Message ------
>>>> From: "David F. Robinson" <david.robinson at corvidtec.com>
>>>> To: "Benjamin Turner" <bennyturns at gmail.com>
>>>> Sent: 2/9/2015 5:39:18 PM
>>>> Subject: Re[4]: [Gluster-devel] missing files
>>>>
>>>>> Ben,
>>>>>
>>>>> I have traced this out to a point where I can rule out many issues.
>>>>> I was hoping you could help me from here.
>>>>> I went with "tar -xPf boost.tar" as my test case, which on my
>>>>> old storage system took about 1-minute to extract. On my backup
>>>>> system and my primary storage (both gluster), it takes roughly
>>>>> 19-minutes.
>>>>>
>>>>> First step was to create a new storage system (striped RAID, two
>>>>> sets of 3-drives). All was good here with a gluster extraction time
>>>>> of 1-minute. I then went to my backup system and created another
>>>>> partition using only one of the two bricks on that system. Still
>>>>> 1-minute. I went to a two brick setup and it stayed at 1-minute.
>>>>>
>>>>> At this point, I have recreated using the same parameters on a
>>>>> test2brick volume that should be identical to my homegfs_bkp volume.
>>>>> Everything is the same, including how I mounted the volume. The only
>>>>> difference is that homegfs_bkp has 30-TB of data and the
>>>>> test2brick is blank. I didn't think that performance would be
>>>>> affected by putting data on the volume.
>>>>>
>>>>> Can you help? Do you have any suggestions? Do you think upgrading
>>>>> gluster from 3.5 to 3.6.1 to 3.6.2 somehow messed up homegfs_bkp?
>>>>> My layout is shown below. These should give identical speeds.
>>>>>
>>>>> [root at gfs01bkp test2brick]# gluster volume info homegfs_bkp
>>>>> Volume Name: homegfs_bkp
>>>>> Type: Distribute
>>>>> Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
>>>>> Status: Started
>>>>> Number of Bricks: 2
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
>>>>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp
>>>>> [root at gfs01bkp test2brick]# gluster volume info test2brick
>>>>>
>>>>> Volume Name: test2brick
>>>>> Type: Distribute
>>>>> Volume ID: 123259b2-3c61-4277-a7e8-27c7ec15e550
>>>>> Status: Started
>>>>> Number of Bricks: 2
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test2brick
>>>>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test2brick
>>>>>
>>>>>
>>>>> [root at gfs01bkp brick02bkp]# mount | grep test2brick
>>>>> gfsib01bkp.corvidtec.com:/test2brick.tcp on /test2brick type
>>>>> fuse.glusterfs (rw,allow_other,max_read=131072)
>>>>> [root at gfs01bkp brick02bkp]# mount | grep homegfs_bkp
>>>>> gfsib01bkp.corvidtec.com:/homegfs_bkp.tcp on /backup/homegfs type
>>>>> fuse.glusterfs (rw,allow_other,max_read=131072)
>>>>>
>>>>> [root at gfs01bkp brick02bkp]# df -h
>>>>> Filesystem Size Used Avail Use% Mounted on
>>>>> /dev/mapper/vg00-lv_root 20G 1.7G 18G 9% /
>>>>> tmpfs 16G 0 16G 0% /dev/shm
>>>>> /dev/md126p1 1008M 110M 848M 12% /boot
>>>>> /dev/mapper/vg00-lv_opt 5.0G 220M 4.5G 5% /opt
>>>>> /dev/mapper/vg00-lv_tmp 5.0G 139M 4.6G 3% /tmp
>>>>> /dev/mapper/vg00-lv_usr 20G 2.7G 17G 15% /usr
>>>>> /dev/mapper/vg00-lv_var 40G 4.4G 34G 12% /var
>>>>> /dev/mapper/vg01-lvol1 88T 22T 67T 25% /data/brick01bkp
>>>>> /dev/mapper/vg02-lvol1 88T 22T 67T 25% /data/brick02bkp
>>>>> gfsib01bkp.corvidtec.com:/homegfs_bkp.tcp 175T 43T 133T 25%
>>>>> /backup/homegfs
>>>>> gfsib01bkp.corvidtec.com:/test2brick.tcp 175T 43T 133T 25%
>>>>> /test2brick
>>>>>
>>>>>
>>>>> ------ Original Message ------
>>>>> From: "Benjamin Turner" <bennyturns at gmail.com>
>>>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>> Sent: 2/6/2015 12:52:58 PM
>>>>> Subject: Re: Re[2]: [Gluster-devel] missing files
>>>>>
>>>>>> Hi David. Let's start with the basics and go from there. IIRC you
>>>>>> are using LVM with thick provisioning; let's verify the following:
>>>>>>
>>>>>> 1. You have everything properly aligned for your RAID stripe size,
>>>>>> etc. I have attached the script we package with RHS that I am in
>>>>>> the process of updating. I want to double-check you created the PV
>>>>>> / VG / LV with the proper variables. Have a look at the create_pv,
>>>>>> create_vg, and create_lv(old) functions. You will need to know the
>>>>>> stripe size of your RAID and the number of stripe elements (data
>>>>>> disks, not hot spares). Also make sure you mkfs.xfs with:
>>>>>>
>>>>>> echo "mkfs -t xfs -f -K -i size=$inode_size -d
>>>>>> sw=$stripe_elements,su=$stripesize -n size=$fs_block_size
>>>>>> /dev/$vgname/$lvname"
>>>>>>
>>>>>> We use 512-byte inodes because some workloads use more than the default
>>>>>> inode size and you don't want xattrs spilling outside the inode.
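>>>>>> Roughly, the create_pv / create_vg / create_lv(old) steps boil down to
>>>>>> something like this (values are for a 12-disk RAID6 with a 128K stripe
>>>>>> element size, i.e. 10 data disks; the device, names, and -n size value
>>>>>> are just examples):
>>>>>>
>>>>>> pvcreate --dataalignment 1280k /dev/sdb
>>>>>> vgcreate vg_bricks /dev/sdb
>>>>>> lvcreate -l 100%FREE -n lv_brick1 vg_bricks
>>>>>> mkfs -t xfs -f -K -i size=512 -d su=128k,sw=10 -n size=8192 \
>>>>>>     /dev/vg_bricks/lv_brick1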
>>>>>>
>>>>>> 2. Are you running RHEL or CentOS? If so, I would recommend
>>>>>> tuned_profile=rhs-high-throughput. If you don't have that tuned
>>>>>> profile, I'll get you everything it sets.
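>>>>>> (That is just:
>>>>>> tuned-adm profile rhs-high-throughput
>>>>>> once the RHS tuned profiles are installed; on stock CentOS,
>>>>>> throughput-performance is a reasonable stand-in.)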
>>>>>>
>>>>>> 3. For small files we recommend the following:
>>>>>>
>>>>>> # RAID related variables.
>>>>>> # stripesize - RAID controller stripe unit size
>>>>>> # stripe_elements - the number of data disks
>>>>>> # The --dataalignment option is used while creating the physical
>>>>>> # volume to align I/O at the LVM layer
>>>>>> # dataalign -
>>>>>> # RAID6 is recommended when the workload has predominantly larger
>>>>>> # files, i.e. not in kilobytes.
>>>>>> # For RAID6 with 12 disks and 128K stripe element size.
>>>>>> stripesize=128k
>>>>>> stripe_elements=10
>>>>>> dataalign=1280k
>>>>>>
>>>>>> # RAID10 is recommended when the workload has predominantly
>>>>>> # smaller files, i.e. in kilobytes.
>>>>>> # For RAID10 with 12 disks and 256K stripe element size, uncomment the
>>>>>> # lines below.
>>>>>> # stripesize=256k
>>>>>> # stripe_elements=6
>>>>>> # dataalign=1536k
>>>>>>
>>>>>> 4. Jumbo frames everywhere! Check out the effect of jumbo frames,
>>>>>> make sure they are set up properly on your switch, and add
>>>>>> MTU=9000 to your ifcfg files (unless you have it already):
>>>>>>
>>>>>>
>>>>>> https://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
>>>>>>
>>>>>> (see the jumbo frames section here, the whole thing is a good read)
>>>>>>
>>>>>> https://rhsummit.files.wordpress.com/2014/04/bengland_h_1100_rhs_performance.pdf
>>>>>>
>>>>>> (this is updated for 2014)
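>>>>>> As an example (the interface name is illustrative, and the switch ports
>>>>>> must allow 9000-byte frames as well):
>>>>>>
>>>>>> # in /etc/sysconfig/network-scripts/ifcfg-eth2
>>>>>> MTU=9000
>>>>>> # after restarting the interface, verify end to end with a
>>>>>> # non-fragmenting ping (8972 bytes + 28 bytes of headers = 9000)
>>>>>> ping -M do -s 8972 <other-gluster-node>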
>>>>>>
>>>>>> 5. There is a smallfile enhancement that just landed in master
>>>>>> that is showing me a 60% improvement in writes. This is called
>>>>>> multi-threaded epoll and it is looking VERY promising WRT smallfile
>>>>>> performance. Here is a summary:
>>>>>>
>>>>>> Hi all. I see a lot of discussion on $subject and I wanted to take
>>>>>> a minute to talk about it and what we can do to test / observe the
>>>>>> effects of it. Let's start with a bit of background:
>>>>>>
>>>>>> **Background**
>>>>>>
>>>>>> -Currently epoll is single-threaded on both clients and servers.
>>>>>> *This leads to a "hot thread" which consumes 100% of a CPU core.
>>>>>> *This can be observed by running BenE's smallfile benchmark to
>>>>>> create files, running top (on both clients and servers), and
>>>>>> pressing H to show threads.
>>>>>> *You will be able to see a single glusterfs thread eating 100%
>>>>>> of the CPU:
>>>>>>
>>>>>> 2871 root 20 0 746m 24m 3004 S 100.0 0.1 14:35.89 glusterfsd
>>>>>> 4522 root 20 0 747m 24m 3004 S 5.3 0.1 0:02.25 glusterfsd
>>>>>> 4507 root 20 0 747m 24m 3004 S 5.0 0.1 0:05.91 glusterfsd
>>>>>> 21200 root 20 0 747m 24m 3004 S 4.6 0.1 0:21.16 glusterfsd
>>>>>>
>>>>>> -Single-threaded epoll is a bottleneck for high-IOPS / low-metadata
>>>>>> workloads (think smallfile). With single-threaded epoll we are CPU
>>>>>> bound by the single thread pegging out a CPU.
>>>>>>
>>>>>> So the proposed solution to this problem is to make epoll
>>>>>> multi-threaded on both servers and clients. Here is a link to the
>>>>>> upstream proposal:
>>>>>>
>>>>>>
>>>>>> http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf#multi-thread-epoll
>>>>>>
>>>>>>
>>>>>>
>>>>>> Status: [ http://review.gluster.org/#/c/3842/ based on Anand
>>>>>> Avati's patch ]
>>>>>>
>>>>>> Why: remove single-thread-per-brick barrier to higher CPU
>>>>>> utilization by servers
>>>>>>
>>>>>> Use case: multi-client and multi-thread applications
>>>>>>
>>>>>> Improvement: measured 40% with 2 epoll threads and 100% with 4
>>>>>> epoll threads for small file creates to an SSD
>>>>>>
>>>>>> Disadvantage: conflicts with support for SSL sockets, may require
>>>>>> significant code change to support both.
>>>>>>
>>>>>> Note: this enhancement also helps high-IOPS applications such as
>>>>>> databases and virtualization which are not metadata-intensive. This
>>>>>> has been measured already using a Fusion I/O SSD performing random
>>>>>> reads and writes -- it was necessary to define multiple bricks per
>>>>>> SSD device to get Gluster to the same order of magnitude IOPS as a
>>>>>> local filesystem. But this workaround is problematic for users,
>>>>>> because storage space is not properly measured when there are
>>>>>> multiple bricks on the same filesystem.
>>>>>>
>>>>>> Multi-threaded epoll is part of a larger page that talks about
>>>>>> smallfile performance enhancements, proposed and in progress.
>>>>>>
>>>>>> Goal: if successful, throughput bottleneck should be either the
>>>>>> network or the brick filesystem!
>>>>>> What it doesn't do: multi-thread-epoll does not solve the
>>>>>> excessive-round-trip protocol problems that Gluster has.
>>>>>> What it should do: allow Gluster to exploit the mostly untapped
>>>>>> CPU resources on the Gluster servers and clients.
>>>>>> How it does it: allow multiple threads to read protocol messages
>>>>>> and process them at the same time.
>>>>>> How to observe: multi-thread-epoll should be configurable (how to
>>>>>> configure? gluster command?); with thread count 1 it should be the same
>>>>>> as RHS 3.0, with thread count 2-4 it should show significantly more
>>>>>> CPU utilization (threads visible with "top -H"), resulting in
>>>>>> higher throughput.
>>>>>>
>>>>>> **How to observe**
>>>>>>
>>>>>> Here are the commands needed to set up an environment to test in on
>>>>>> RHS 3.0.3:
>>>>>> rpm -e glusterfs-api glusterfs glusterfs-libs glusterfs-fuse
>>>>>> glusterfs-geo-replication glusterfs-rdma glusterfs-server
>>>>>> glusterfs-cli gluster-nagios-common samba-glusterfs vdsm-gluster
>>>>>> --nodeps
>>>>>> rhn_register
>>>>>> yum groupinstall "Development tools"
>>>>>> git clone https://github.com/gluster/glusterfs.git
>>>>>> git branch test
>>>>>> git checkout test
>>>>>> git fetch http://review.gluster.org/glusterfs
>>>>>> refs/changes/42/3842/17 && git cherry-pick FETCH_HEAD
>>>>>> git fetch http://review.gluster.org/glusterfs
>>>>>> refs/changes/88/9488/2 && git cherry-pick FETCH_HEAD
>>>>>> yum install openssl openssl-devel
>>>>>> wget
>>>>>> ftp://fr2.rpmfind.net/linux/epel/6/x86_64/cmockery2-1.3.8-2.el6.x86_64.rpm
>>>>>>
>>>>>>
>>>>>> wget
>>>>>> ftp://fr2.rpmfind.net/linux/epel/6/x86_64/cmockery2-devel-1.3.8-2.el6.x86_64.rpm
>>>>>>
>>>>>>
>>>>>> yum install cmockery2-1.3.8-2.el6.x86_64.rpm
>>>>>> cmockery2-devel-1.3.8-2.el6.x86_64.rpm libxml2-devel
>>>>>> ./autogen.sh
>>>>>> ./configure
>>>>>> make
>>>>>> make install
>>>>>>
>>>>>> Verify you are using the upstream with:
>>>>>>
>>>>>> # gluster --version
>>>>>>
>>>>>> To enable multi-threaded epoll, run the following commands:
>>>>>>
>>>>>> From the patch:
>>>>>> { .key = "client.event-threads",
>>>>>>   .voltype = "protocol/client",
>>>>>>   .op_version = GD_OP_VERSION_3_7_0,
>>>>>> },
>>>>>> { .key = "server.event-threads",
>>>>>>   .voltype = "protocol/server",
>>>>>>   .op_version = GD_OP_VERSION_3_7_0,
>>>>>> },
>>>>>>
>>>>>> # gluster v set <volname> server.event-threads 4
>>>>>> # gluster v set <volname> client.event-threads 4
>>>>>>
>>>>>> Also grab smallfile:
>>>>>>
>>>>>> https://github.com/bengland2/smallfile
>>>>>>
>>>>>> After git cloning smallfile, run:
>>>>>>
>>>>>> python /small-files/smallfile/smallfile_cli.py --operation create
>>>>>> --threads 8 --file-size 64 --files 10000 --top /gluster-mount
>>>>>> --pause 1000 --host-set "client1 client2"
>>>>>>
>>>>>> Again we will be looking at top + show threads (press H). With 4
>>>>>> threads on both clients and servers you should see something
>>>>>> similar to this (it isn't exact; I copied and pasted):
>>>>>>
>>>>>> 2871 root 20 0 746m 24m 3004 S 35.0 0.1 14:35.89 glusterfsd
>>>>>> 2872 root 20 0 746m 24m 3004 S 51.0 0.1 14:35.89 glusterfsd
>>>>>> 2873 root 20 0 746m 24m 3004 S 43.0 0.1 14:35.89 glusterfsd
>>>>>> 2874 root 20 0 746m 24m 3004 S 65.0 0.1 14:35.89 glusterfsd
>>>>>> 4522 root 20 0 747m 24m 3004 S 5.3 0.1 0:02.25 glusterfsd
>>>>>> 4507 root 20 0 747m 24m 3004 S 5.0 0.1 0:05.91 glusterfsd
>>>>>> 21200 root 20 0 747m 24m 3004 S 4.6 0.1 0:21.16 glusterfsd
>>>>>>
>>>>>> If you have a test env I would be interested to see how
>>>>>> multi-threaded epoll performs, but I am 100% sure it's not ready for
>>>>>> production yet. RH will be supporting it with our 3.0.4 (the next
>>>>>> one) release unless we find show-stopping bugs. My testing looks
>>>>>> very promising though.
>>>>>>
>>>>>> Smallfile performance enhancements are one of the key focuses for
>>>>>> our 3.1 release this summer; we are working very hard to improve
>>>>>> this, as it is the use case for the majority of people.
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 6, 2015 at 11:59 AM, David F. Robinson
>>>>>> <david.robinson at corvidtec.com> wrote:
>>>>>> Ben,
>>>>>>
>>>>>> I was hoping you might be able to help with two performance
>>>>>> questions. I was doing some testing of my rsync where I am backing
>>>>>> up my primary gluster system (distributed + replicated) to my
>>>>>> backup gluster system (distributed). I tried three tests where I
>>>>>> rsynced from one of my primary systems (gfsib02b) to my backup
>>>>>> machine. The test directory contains roughly 5500 files, most of
>>>>>> which are small. The script I ran is shown below; it repeats the
>>>>>> tests 3x for each section to check variability in timing.
>>>>>>
>>>>>> 1) Writing to the local disk is drastically faster than writing to
>>>>>> gluster. So, my writes to the backup gluster system are what is
>>>>>> slowing me down, which makes sense.
>>>>>> 2) When I write to the backup gluster system (/backup/homegfs),
>>>>>> the timing goes from 35 seconds to 1 minute 40 seconds. The question here
>>>>>> is whether you could recommend any settings for this volume that
>>>>>> would improve performance for small file writes? I have included
>>>>>> the output of 'gluster volume info' below.
>>>>>> 3) When I did the same tests on the Source_bkp volume, it was
>>>>>> almost 3x as slow as the homegfs_bkp volume. However, these are
>>>>>> just different volumes on the same storage system. The volume
>>>>>> parameters are identical (see below). The performance of these two
>>>>>> should be identical. Any idea why they wouldn't be? And any
>>>>>> suggestions for how to fix this? The only thing that I see
>>>>>> different between the two is the order of the "Options
>>>>>> reconfigured" section. I assume order of options doesn't matter.
>>>>>>
>>>>>> Backup to local hard disk (no gluster writes)
>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /temp1
>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /temp2
>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /temp3
>>>>>>
>>>>>> real 0m35.579s
>>>>>> user 0m31.290s
>>>>>> sys 0m12.282s
>>>>>>
>>>>>> real 0m38.035s
>>>>>> user 0m31.622s
>>>>>> sys 0m10.907s
>>>>>> real 0m38.313s
>>>>>> user 0m31.458s
>>>>>> sys 0m10.891s
>>>>>> Backup to gluster backup system on volume homegfs_bkp
>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /backup/homegfs/temp1
>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /backup/homegfs/temp2
>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /backup/homegfs/temp3
>>>>>>
>>>>>> real 1m42.026s
>>>>>> user 0m32.604s
>>>>>> sys 0m9.967s
>>>>>>
>>>>>> real 1m45.480s
>>>>>> user 0m32.577s
>>>>>> sys 0m11.994s
>>>>>>
>>>>>> real 1m40.436s
>>>>>> user 0m32.521s
>>>>>> sys 0m11.240s
>>>>>>
>>>>>> Backup to gluster backup system on volume Source_bkp
>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /backup/Source/temp1
>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /backup/Source/temp2
>>>>>> time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /backup/Source/temp3
>>>>>>
>>>>>> real 3m30.491s
>>>>>> user 0m32.676s
>>>>>> sys 0m10.776s
>>>>>>
>>>>>> real 3m26.076s
>>>>>> user 0m32.588s
>>>>>> sys 0m11.048s
>>>>>> real 3m7.460s
>>>>>> user 0m32.763s
>>>>>> sys 0m11.687s
>>>>>>
>>>>>>
>>>>>> Volume Name: Source_bkp
>>>>>> Type: Distribute
>>>>>> Volume ID: 1d4c210d-a731-4d39-a0c5-ea0546592c1d
>>>>>> Status: Started
>>>>>> Number of Bricks: 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/Source_bkp
>>>>>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/Source_bkp
>>>>>> Options Reconfigured:
>>>>>> performance.cache-size: 128MB
>>>>>> performance.io-thread-count: 32
>>>>>> server.allow-insecure: on
>>>>>> network.ping-timeout: 10
>>>>>> storage.owner-gid: 100
>>>>>> performance.write-behind-window-size: 128MB
>>>>>> server.manage-gids: on
>>>>>> changelog.rollover-time: 15
>>>>>> changelog.fsync-interval: 3
>>>>>>
>>>>>> Volume Name: homegfs_bkp
>>>>>> Type: Distribute
>>>>>> Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
>>>>>> Status: Started
>>>>>> Number of Bricks: 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
>>>>>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp
>>>>>> Options Reconfigured:
>>>>>> storage.owner-gid: 100
>>>>>> performance.io-thread-count: 32
>>>>>> server.allow-insecure: on
>>>>>> network.ping-timeout: 10
>>>>>> performance.cache-size: 128MB
>>>>>> performance.write-behind-window-size: 128MB
>>>>>> server.manage-gids: on
>>>>>> changelog.rollover-time: 15
>>>>>> changelog.fsync-interval: 3
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------ Original Message ------
>>>>>> From: "Benjamin Turner" <bennyturns at gmail.com>
>>>>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>>> Cc: "Gluster Devel" <gluster-devel at gluster.org>;
>>>>>> "gluster-users at gluster.org" <gluster-users at gluster.org>
>>>>>> Sent: 2/3/2015 7:12:34 PM
>>>>>> Subject: Re: [Gluster-devel] missing files
>>>>>>
>>>>>>> It sounds to me like the files were only copied to one replica,
>>>>>>> weren't there for the initial ls which triggered a
>>>>>>> self-heal, and were there for the last ls because they were
>>>>>>> healed. Is there any chance that one of the replicas was down
>>>>>>> during the rsync? It could be that you lost a brick during the copy or
>>>>>>> something like that. To confirm, I would look for disconnects in
>>>>>>> the brick logs as well as check glustershd.log to verify that the
>>>>>>> missing files were actually healed.
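>>>>>>> Roughly, on each server (log locations are the defaults and may
>>>>>>> differ on your systems):
>>>>>>>
>>>>>>> grep -i disconnect /var/log/glusterfs/bricks/*.log
>>>>>>> grep -i heal /var/log/glusterfs/glustershd.log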
>>>>>>>
>>>>>>> -b
>>>>>>>
>>>>>>> On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
>>>>>>> <david.robinson at corvidtec.com> wrote:
>>>>>>> I rsync'd 20-TB over to my gluster system and noticed that I had
>>>>>>> some directories missing even though the rsync completed normally.
>>>>>>> The rsync logs showed that the missing files were transferred.
>>>>>>>
>>>>>>> I went to the bricks and did an 'ls -al
>>>>>>> /data/brick*/homegfs/dir/*'; the files were on the bricks. After I
>>>>>>> did this 'ls', the files then showed up on the FUSE mounts.
>>>>>>>
>>>>>>> 1) Why are the files hidden on the fuse mount?
>>>>>>> 2) Why does the ls make them show up on the FUSE mount?
>>>>>>> 3) How can I prevent this from happening again?
>>>>>>>
>>>>>>> Note, I also mounted the gluster volume using NFS and saw the
>>>>>>> same behavior. The files/directories were not shown until I did
>>>>>>> the "ls" on the bricks.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ===============================
>>>>>>> David F. Robinson, Ph.D.
>>>>>>> President - Corvid Technologies
>>>>>>> 704.799.6944 x101 [office]
>>>>>>> 704.252.1310 [cell]
>>>>>>> 704.799.7974 [fax]
>>>>>>> David.Robinson at corvidtec.com
>>>>>>> http://www.corvidtechnologies.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-devel mailing list
>>>>>>> Gluster-devel at gluster.org
>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>>>>
>>>>>>>
>>>>>>
>>>> <glusterfs.tgz>
>>>
>>> --
>>> GlusterFS - http://www.gluster.org
>>>
>>> An open source, distributed file system scaling to several
>>> petabytes, and handling thousands of clients.
>>>
>>> My personal twitter: twitter.com/realjustinclift
>>>
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel