[Gluster-devel] Fw: Re[2]: missing files

Pranith Kumar Karampuri pkarampu at redhat.com
Wed Feb 11 13:19:54 UTC 2015


On 02/11/2015 08:36 AM, Shyam wrote:
> Did some analysis with David today on this; here is a gist for the list:
>
> 1) Volumes classified as slow (i.e., with a lot of pre-existing data)
> and fast (new volumes carved from the same backend file system that
> the slow bricks are on, with little or no data)
>
> 2) We ran an strace of tar and also collected io-stats outputs from
> these volumes; both show that create and mkdir are slower on the slow
> volume than on the fast one. This seems to be the overall reason for
> the slowness.
Did you happen to do an strace of the brick when this happened? If not, 
David, can we get that information as well?
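
A sketch of what would help (the brick path is taken from the volume info
further down in this thread; the strace flags are just a typical choice,
adjust as needed):

    BRICK_PID=$(pgrep -f 'glusterfsd.*brick01bkp/homegfs_bkp')
    strace -f -tt -T -o /tmp/brick01bkp.strace -p "$BRICK_PID" &
    # ... re-run the tar extraction on the client ...
    kill %1    # stop the trace once the run completes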

Pranith
>
> 3) The tarball extraction is to a new directory on the gluster mount, 
> so all lookups etc. happen within this new name space on the volume
>
> 4) Checked memory footprints of the slow and fast bricks; nothing
> untoward was noticed there
>
> 5) Restarted the slow volume, just as a test case to do things from
> scratch; no improvement in performance.
>
> Currently attempting to reproduce this on a local system to see if the
> same behavior is seen, so that it becomes easier to debug.
>
> Others on the list can chime in as they see fit.
>
> Thanks,
> Shyam
>
> On 02/10/2015 09:58 AM, David F. Robinson wrote:
>> Forwarding to devel list as recommended by Justin...
>>
>> David
>>
>>
>> ------ Forwarded Message ------
>> From: "David F. Robinson" <david.robinson at corvidtec.com>
>> To: "Justin Clift" <justin at gluster.org>
>> Sent: 2/10/2015 9:49:09 AM
>> Subject: Re[2]: [Gluster-devel] missing files
>>
>> Bad news... I don't think it is the old linkto files. Bad, because if
>> that were the issue, cleaning up all of the bad linkto files would have
>> fixed it. It seems like the system just gets slower as you add data.
>>
>> First, I set up a new, clean volume (test2brick) on the same system as the
>> old one (homegfs_bkp). See 'gluster v info' below. I ran my simple tar
>> extraction test on the new volume and it took 58-seconds to complete
>> (which, BTW, is 10-seconds faster than my old non-gluster system, so
>> kudos). The time on homegfs_bkp is 19-minutes.
>>
>> Next, I copied 10-terabytes of data over to test2brick and re-ran the
>> test which then took 7-minutes. I created a test3brick and ran the test
>> and it took 53-seconds.
>>
>> To confirm all of this, I deleted all of the data from test2brick and
>> re-ran the test. It took 51-seconds!!!
>>
>> BTW, I also checked .glusterfs for stale linkto files (find . -type
>> f -size 0 -perm 1000 -exec ls -al {} \;). There are many, many thousands
>> of these files on the old volume and none on the new one, so I
>> don't think this is related to the performance issue.
>>
>> Let me know how I should proceed. Send this to devel list? Pranith?
>> others? Thanks...
>>
>> [root at gfs01bkp .glusterfs]# gluster volume info homegfs_bkp
>> Volume Name: homegfs_bkp
>> Type: Distribute
>> Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp
>>
>> [root at gfs01bkp .glusterfs]# gluster volume info test2brick
>> Volume Name: test2brick
>> Type: Distribute
>> Volume ID: 123259b2-3c61-4277-a7e8-27c7ec15e550
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test2brick
>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test2brick
>>
>> [root at gfs01bkp glusterfs]# gluster volume info test3brick
>> Volume Name: test3brick
>> Type: Distribute
>> Volume ID: 9b1613fc-f7e5-4325-8f94-e3611a5c3701
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test3brick
>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test3brick
>>
>>
>>  From homegfs_bkp:
>> # find . -type f -size 0 -perm 1000 -exec ls -al {} \;
>> ---------T 2 gmathur pme_ics 0 Jan 9 16:59
>> ./00/16/00169a69-1a7a-44c9-b2d8-991671ee87c4
>> ---------T 3 jcowan users 0 Jan 9 17:51
>> ./00/16/0016a0a0-fd22-4fb5-b6fb-5d7f9024ab74
>> ---------T 2 morourke sbir 0 Jan 9 18:17
>> ./00/16/0016b36f-32fc-4f2c-accd-e36be2f6c602
>> ---------T 2 carpentr irl 0 Jan 9 18:52
>> ./00/16/00163faf-741c-4e40-8081-784786b3cc71
>> ---------T 3 601 raven 0 Jan 9 22:49
>> ./00/16/00163385-a332-4050-8104-1b1af6cd8249
>> ---------T 3 bangell sbir 0 Jan 9 22:56
>> ./00/16/00167803-0244-46de-8246-d9c382dd3083
>> ---------T 2 morourke sbir 0 Jan 9 23:17
>> ./00/16/00167bc5-fc56-42ee-9e3f-1e238f3828f4
>> ---------T 3 morourke sbir 0 Jan 9 23:34
>> ./00/16/0016a71e-89cf-4a86-9575-49c7e9d216c6
>> ---------T 2 gmathur users 0 Jan 9 23:47
>> ./00/16/00168aa2-d069-4a77-8790-e36431324ca5
>> ---------T 2 bangell users 0 Jan 22 09:24
>> ./00/16/0016e720-a190-4e43-962f-aa3e4216e5f5
>> ---------T 2 root root 0 Jan 22 09:26
>> ./00/16/00169e95-64b7-455c-82dc-d9940ee7fe43
>> ---------T 2 dfrobins users 0 Jan 22 09:27
>> ./00/16/00161b04-1612-4fba-99a4-2a2b54062fdb
>> ---------T 2 mdick users 0 Jan 22 09:27
>> ./00/16/0016ba60-310a-4bee-968a-36eb290e8c9e
>> ---------T 2 dfrobins users 0 Jan 22 09:43
>> ./00/16/00160315-1533-4290-8c1a-72e2fbb1962a
>>  From test2brick:
>> find . -type f -size 0 -perm 1000 -exec ls -al {} \;
>>
>>
>>
>>
>>
>> ------ Original Message ------
>> From: "Justin Clift" <justin at gluster.org>
>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>> Sent: 2/9/2015 11:33:54 PM
>> Subject: Re: [Gluster-devel] missing files
>>
>>> Interesting. (I'm 1/2 asleep atm and really need sleep soon, so take 
>>> this
>>> with a grain of salt... ;>)
>>>
>>> As a curiosity question, does the homegfs_bkp volume have a bunch of
>>> outdated metadata still in it? e.g. left-over extended attributes or
>>> something?
>>>
>>> Remembering a question you asked earlier, er... today/yesterday, about
>>> old extended attribute entries and whether they hang around forever. I
>>> don't know the answer to that, but if the old volume still has 1000's
>>> (or more) of entries hanging around, perhaps there's some lookup problem
>>> that's killing lookup times for file operations.
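>>>
>>> (If you want to eyeball that, something like the following against a
>>> directory on one of the bricks should dump whatever trusted.* xattrs are
>>> hanging around -- the path is just an example:
>>>
>>>   getfattr -d -m . -e hex /data/brick01bkp/homegfs_bkp/some/dir
>>>
>>> though as above, I don't know whether stale entries would actually
>>> behave that way.)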
>>>
>>> On a side note, I can probably set up my test lab stuff here again
>>> tomorrow and try this stuff out myself to see if I can replicate the
>>> problem (if that could potentially be useful?).
>>>
>>> + Justin
>>>
>>>
>>>
>>> On 9 Feb 2015, at 22:56, David F. Robinson
>>> <david.robinson at corvidtec.com> wrote:
>>>>  Justin,
>>>>
>>>>  Hoping you can help point this to the right people once again. Maybe
>>>> all of these issues are related.
>>>>
>>>>  You can look at the email traffic below, but the summary is that I
>>>> was working with Ben to figure out why my GFS system was 20x slower
>>>> than my old storage system. During my tracing of this issue, I
>>>> determined that if I create a new volume on my storage system, this
>>>> slowness goes away. So, either it is faster because it doesn't have
>>>> any data on this new volume (I hope this isn't the case), or the older
>>>> partitions somehow became corrupted during the upgrades or have some
>>>> deprecated parameters set that slow them down.
>>>>
>>>>  Very strange and hoping you can once again help... Thanks in 
>>>> advance...
>>>>
>>>>  David
>>>>
>>>>
>>>>  ------ Forwarded Message ------
>>>>  From: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>  To: "Benjamin Turner" <bennyturns at gmail.com>
>>>>  Sent: 2/9/2015 5:52:00 PM
>>>>  Subject: Re[5]: [Gluster-devel] missing files
>>>>
>>>>  Ben,
>>>>
>>>>  I cleared the logs and rebooted the machine. Same issue. homegfs_bkp
>>>> takes 19-minutes and test2brick (the new volume) takes 1-minute.
>>>>
>>>>  Is it possible that some old parameters are still set for
>>>> homegfs_bkp that are no longer in use? I tried a gluster volume reset
>>>> for homegfs_bkp, but it didn't have any effect.
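>>>>  (For the record, the reset was just the stock command; listing the
>>>> volume info afterwards is one way to confirm that nothing reconfigured
>>>> survived -- note some options may need the extra 'force' keyword:
>>>>
>>>>  gluster volume reset homegfs_bkp
>>>>  gluster volume info homegfs_bkp | grep -A20 'Options Reconfigured'
>>>>  )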
>>>>
>>>>  I have attached the full logs.
>>>>
>>>>  David
>>>>
>>>>
>>>>  ------ Original Message ------
>>>>  From: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>  To: "Benjamin Turner" <bennyturns at gmail.com>
>>>>  Sent: 2/9/2015 5:39:18 PM
>>>>  Subject: Re[4]: [Gluster-devel] missing files
>>>>
>>>>>  Ben,
>>>>>
>>>>>  I have traced this out to a point where I can rule out many issues.
>>>>> I was hoping you could help me from here.
>>>>>  I went with the "tar -xPf boost.tar" as my test case, which on my
>>>>> old storage system took about 1-minute to extract. On my backup
>>>>> system and my primary storage (both gluster), it takes roughly
>>>>> 19-minutes.
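>>>>>
>>>>>  (The test itself is nothing fancy; roughly, with the tarball location
>>>>> and the scratch directory name as placeholders:
>>>>>
>>>>>  cd /test2brick && mkdir tartest && cd tartest
>>>>>  time tar -xPf /path/to/boost.tar
>>>>>
>>>>>  and the same thing repeated under /backup/homegfs.)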
>>>>>
>>>>>  First step was to create a new storage system (striped RAID, two
>>>>> sets of 3-drives). All was good here with a gluster extraction time
>>>>> of 1-minute. I then went to my backup system and created another
>>>>> partition using only one of the two bricks on that system. Still
>>>>> 1-minute. I went to a two-brick setup and it stayed at 1-minute.
>>>>>
>>>>>  At this point, I have recreated using the same parameters on a
>>>>> test2brick volume that should be identical to my homegfs_bkp volume.
>>>>> Everything is the same, including how I mounted the volume. The only
>>>>> difference is that homegfs_bkp has 30-TB of data and the
>>>>> test2brick is blank. I didn't think that performance would be
>>>>> affected by putting data on the volume.
>>>>>
>>>>>  Can you help? Do you have any suggestions? Do you think upgrading
>>>>> gluster from 3.5 to 3.6.1 to 3.6.2 somehow messed up homegfs_bkp?
>>>>> My layout is shown below. These should give identical speeds.
>>>>>
>>>>>  [root at gfs01bkp test2brick]# gluster volume info homegfs_bkp
>>>>>  Volume Name: homegfs_bkp
>>>>>  Type: Distribute
>>>>>  Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
>>>>>  Status: Started
>>>>>  Number of Bricks: 2
>>>>>  Transport-type: tcp
>>>>>  Bricks:
>>>>>  Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
>>>>>  Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp
>>>>>  [root at gfs01bkp test2brick]# gluster volume info test2brick
>>>>>
>>>>>  Volume Name: test2brick
>>>>>  Type: Distribute
>>>>>  Volume ID: 123259b2-3c61-4277-a7e8-27c7ec15e550
>>>>>  Status: Started
>>>>>  Number of Bricks: 2
>>>>>  Transport-type: tcp
>>>>>  Bricks:
>>>>>  Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test2brick
>>>>>  Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test2brick
>>>>>
>>>>>
>>>>>  [root at gfs01bkp brick02bkp]# mount | grep test2brick
>>>>>  gfsib01bkp.corvidtec.com:/test2brick.tcp on /test2brick type
>>>>> fuse.glusterfs (rw,allow_other,max_read=131072)
>>>>>  [root at gfs01bkp brick02bkp]# mount | grep homegfs_bkp
>>>>>  gfsib01bkp.corvidtec.com:/homegfs_bkp.tcp on /backup/homegfs type
>>>>> fuse.glusterfs (rw,allow_other,max_read=131072)
>>>>>
>>>>>  [root at gfs01bkp brick02bkp]# df -h
>>>>>  Filesystem Size Used Avail Use% Mounted on
>>>>>  /dev/mapper/vg00-lv_root 20G 1.7G 18G 9% /
>>>>>  tmpfs 16G 0 16G 0% /dev/shm
>>>>>  /dev/md126p1 1008M 110M 848M 12% /boot
>>>>>  /dev/mapper/vg00-lv_opt 5.0G 220M 4.5G 5% /opt
>>>>>  /dev/mapper/vg00-lv_tmp 5.0G 139M 4.6G 3% /tmp
>>>>>  /dev/mapper/vg00-lv_usr 20G 2.7G 17G 15% /usr
>>>>>  /dev/mapper/vg00-lv_var 40G 4.4G 34G 12% /var
>>>>>  /dev/mapper/vg01-lvol1 88T 22T 67T 25% /data/brick01bkp
>>>>>  /dev/mapper/vg02-lvol1 88T 22T 67T 25% /data/brick02bkp
>>>>>  gfsib01bkp.corvidtec.com:/homegfs_bkp.tcp 175T 43T 133T 25%
>>>>> /backup/homegfs
>>>>>  gfsib01bkp.corvidtec.com:/test2brick.tcp 175T 43T 133T 25% 
>>>>> /test2brick
>>>>>
>>>>>
>>>>>  ------ Original Message ------
>>>>>  From: "Benjamin Turner" <bennyturns at gmail.com>
>>>>>  To: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>>  Sent: 2/6/2015 12:52:58 PM
>>>>>  Subject: Re: Re[2]: [Gluster-devel] missing files
>>>>>
>>>>>>  Hi David. Let's start with the basics and go from there. IIRC you
>>>>>> are using LVM with thick provisioning; let's verify the following:
>>>>>>
>>>>>>  1. You have everything properly aligned for your RAID stripe size,
>>>>>> etc. I have attached the script we package with RHS that I am in
>>>>>> the process of updating. I want to double-check that you created the PV
>>>>>> / VG / LV with the proper variables. Have a look at the create_pv,
>>>>>> create_vg, and create_lv(old) functions; a rough end-to-end sketch also
>>>>>> follows the variable listing in item 3 below. You will need to know the
>>>>>> stripe size of your RAID and the number of stripe elements (data
>>>>>> disks, not hot spares). Also make sure you mkfs.xfs with:
>>>>>>
>>>>>>  echo "mkfs -t xfs -f -K -i size=$inode_size -d
>>>>>> sw=$stripe_elements,su=$stripesize -n size=$fs_block_size
>>>>>> /dev/$vgname/$lvname"
>>>>>>
>>>>>>  We use 512-byte inodes because some workloads use more than the
>>>>>> default inode size and you don't want xattrs bleeding out of the inode.
>>>>>>
>>>>>>  2. Are you running RHEL or CentOS? If so, I would recommend
>>>>>> tuned_profile=rhs-high-throughput. If you don't have that tuned
>>>>>> profile I'll get you everything it sets.
>>>>>>
>>>>>>  3. For small files we recommend the following:
>>>>>>
>>>>>>  # RAID related variables.
>>>>>>  # stripesize - RAID controller stripe unit size
>>>>>>  # stripe_elements - the number of data disks
>>>>>>  # The --dataalignment option is used while creating the physical volume
>>>>>>  # to align I/O at the LVM layer
>>>>>>  # dataalign -
>>>>>>  # RAID6 is recommended when the workload has predominantly larger
>>>>>>  # files ie not in kilobytes.
>>>>>>  # For RAID6 with 12 disks and 128K stripe element size.
>>>>>>  stripesize=128k
>>>>>>  stripe_elements=10
>>>>>>  dataalign=1280k
>>>>>>
>>>>>>  # RAID10 is recommended when the workload has predominantly
>>>>>> smaller files
>>>>>>  # i.e in kilobytes.
>>>>>>  # For RAID10 with 12 disks and 256K stripe element size, uncomment
>>>>>> the
>>>>>>  # lines below.
>>>>>>  # stripesize=256k
>>>>>>  # stripe_elements=6
>>>>>>  # dataalign=1536k
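>>>>>>
>>>>>>  As a rough end-to-end sketch of how those values get applied (the
>>>>>> device, VG/LV names, and the directory block size below are placeholders
>>>>>> -- the attached script is the authoritative version), the RAID6 case
>>>>>> above would look something like:
>>>>>>
>>>>>>  pvcreate --dataalignment 1280k /dev/sdb
>>>>>>  vgcreate vg_brick01 /dev/sdb
>>>>>>  lvcreate -l 100%FREE -n lv_brick01 vg_brick01
>>>>>>  mkfs -t xfs -f -K -i size=512 -d su=128k,sw=10 -n size=8192 /dev/vg_brick01/lv_brick01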
>>>>>>
>>>>>>  4. Jumbo frames everywhere! Check out the effect of jumbo frames,
>>>>>> make sure they are set up properly on your switch, and add MTU=9000
>>>>>> to your ifcfg files (unless you have it already); an example entry and
>>>>>> a quick check follow the links below:
>>>>>>
>>>>>>
>>>>>> https://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf 
>>>>>>
>>>>>> (see the jumbo frames section here, the whole thing is a good read)
>>>>>>
>>>>>> https://rhsummit.files.wordpress.com/2014/04/bengland_h_1100_rhs_performance.pdf 
>>>>>>
>>>>>> (this is updated for 2014)
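>>>>>>
>>>>>>  As a concrete example (the interface name is whatever yours is), the
>>>>>> ifcfg entry is just one line, and a non-fragmenting ping is an easy
>>>>>> end-to-end check:
>>>>>>
>>>>>>  # /etc/sysconfig/network-scripts/ifcfg-<iface>
>>>>>>  MTU=9000
>>>>>>
>>>>>>  ping -M do -s 8972 <server>    # 8972 = 9000 minus 28 bytes of IP/ICMP headers
>>>>>>
>>>>>>  If the switch ports aren't set for jumbo frames, the large ping will fail.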
>>>>>>
>>>>>>  5. There is a smallfile enhancement that just landed in master
>>>>>> that is showing me a 60% improvement in writes. It is called
>>>>>> multi-threaded epoll and it is looking VERY promising WRT smallfile
>>>>>> performance. Here is a summary:
>>>>>>
>>>>>>  Hi all. I see a lot of discussion on $subject and I wanted to take
>>>>>> a minute to talk about it and what we can do to test / observe the
>>>>>> effects of it. Let's start with a bit of background:
>>>>>>
>>>>>>  **Background**
>>>>>>
>>>>>>  -Currently epoll is single-threaded on both clients and servers.
>>>>>>    *This leads to a "hot thread" which consumes 100% of a CPU core.
>>>>>>    *This can be observed by running BenE's smallfile benchmark to
>>>>>> create files, running top (on both clients and servers), and
>>>>>> pressing H to show threads.
>>>>>>    *You will be able to see a single glusterfs thread eating 100%
>>>>>> of the CPU:
>>>>>>
>>>>>>   2871 root 20 0 746m 24m 3004 S 100.0 0.1 14:35.89 glusterfsd
>>>>>>   4522 root 20 0 747m 24m 3004 S 5.3 0.1 0:02.25 glusterfsd
>>>>>>   4507 root 20 0 747m 24m 3004 S 5.0 0.1 0:05.91 glusterfsd
>>>>>>  21200 root 20 0 747m 24m 3004 S 4.6 0.1 0:21.16 glusterfsd
>>>>>>
>>>>>>  -Single-threaded epoll is a bottleneck for high-IOP / low-metadata
>>>>>> workloads (think smallfile). With single-threaded epoll we are CPU
>>>>>> bound by the single thread pegging out a CPU.
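>>>>>>
>>>>>>  (A quick non-interactive way to spot the hot thread, if you'd rather
>>>>>> not watch top live -- the %CPU column position assumes top's default
>>>>>> field layout:
>>>>>>
>>>>>>  top -b -n 1 -H -p "$(pgrep -d, glusterfsd)" | sort -k9 -nr | head
>>>>>>  )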
>>>>>>
>>>>>>  So the proposed solution to this problem is to make epoll multi
>>>>>> threaded on both servers and clients. Here is a link to the
>>>>>> upstream proposal:
>>>>>>
>>>>>>
>>>>>> http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf#multi-thread-epoll 
>>>>>>
>>>>>>
>>>>>>
>>>>>>  Status: [ http://review.gluster.org/#/c/3842/ based on Anand
>>>>>> Avati's patch ]
>>>>>>
>>>>>>  Why: remove single-thread-per-brick barrier to higher CPU
>>>>>> utilization by servers
>>>>>>
>>>>>>  Use case: multi-client and multi-thread applications
>>>>>>
>>>>>>  Improvement: measured 40% with 2 epoll threads and 100% with 4
>>>>>> epoll threads for small file creates to an SSD
>>>>>>
>>>>>>  Disadvantage: conflicts with support for SSL sockets, may require
>>>>>> significant code change to support both.
>>>>>>
>>>>>>  Note: this enhancement also helps high-IOPS applications such as
>>>>>> databases and virtualization which are not metadata-intensive. This
>>>>>> has been measured already using a Fusion I/O SSD performing random
>>>>>> reads and writes -- it was necessary to define multiple bricks per
>>>>>> SSD device to get Gluster to the same order of magnitude IOPS as a
>>>>>> local filesystem. But this workaround is problematic for users,
>>>>>> because storage space is not properly measured when there are
>>>>>> multiple bricks on the same filesystem.
>>>>>>
>>>>>>  Multi threaded epoll is part of a larger page that talks about
>>>>>> smallfile performance enhancements, proposed and happening.
>>>>>>
>>>>>>  Goal: if successful, throughput bottleneck should be either the
>>>>>> network or the brick filesystem!
>>>>>>  What it doesn't do: multi-thread-epoll does not solve the
>>>>>> excessive-round-trip protocol problems that Gluster has.
>>>>>>  What it should do: allow Gluster to exploit the mostly untapped
>>>>>> CPU resources on the Gluster servers and clients.
>>>>>>  How it does it: allow multiple threads to read protocol messages
>>>>>> and process them at the same time.
>>>>>>  How to observe: multi-thread-epoll should be configurable (how to
>>>>>> configure? gluster command?); with thread count 1 it should be the same
>>>>>> as RHS 3.0, and with thread count 2-4 it should show significantly more
>>>>>> CPU utilization (threads visible with "top -H"), resulting in
>>>>>> higher throughput.
>>>>>>
>>>>>>  **How to observe**
>>>>>>
>>>>>>  Here are the commands needed to set up an environment to test in on
>>>>>> RHS 3.0.3:
>>>>>>  rpm -e glusterfs-api glusterfs glusterfs-libs glusterfs-fuse
>>>>>> glusterfs-geo-replication glusterfs-rdma glusterfs-server
>>>>>> glusterfs-cli gluster-nagios-common samba-glusterfs vdsm-gluster
>>>>>> --nodeps
>>>>>>  rhn_register
>>>>>>  yum groupinstall "Development tools"
>>>>>>  git clone https://github.com/gluster/glusterfs.git
>>>>>>  git branch test
>>>>>>  git checkout test
>>>>>>  git fetch http://review.gluster.org/glusterfs
>>>>>> refs/changes/42/3842/17 && git cherry-pick FETCH_HEAD
>>>>>>  git fetch http://review.gluster.org/glusterfs
>>>>>> refs/changes/88/9488/2 && git cherry-pick FETCH_HEAD
>>>>>>  yum install openssl openssl-devel
>>>>>>  wget
>>>>>> ftp://fr2.rpmfind.net/linux/epel/6/x86_64/cmockery2-1.3.8-2.el6.x86_64.rpm 
>>>>>>
>>>>>>
>>>>>>  wget
>>>>>> ftp://fr2.rpmfind.net/linux/epel/6/x86_64/cmockery2-devel-1.3.8-2.el6.x86_64.rpm 
>>>>>>
>>>>>>
>>>>>>  yum install cmockery2-1.3.8-2.el6.x86_64.rpm
>>>>>> cmockery2-devel-1.3.8-2.el6.x86_64.rpm libxml2-devel
>>>>>>  ./autogen.sh
>>>>>>  ./configure
>>>>>>  make
>>>>>>  make install
>>>>>>
>>>>>>  Verify you are using the upstream with:
>>>>>>
>>>>>>  # gluster --version
>>>>>>
>>>>>>  To enable multi-threaded epoll, run the following commands:
>>>>>>
>>>>>>  From the patch:
>>>>>>          { .key = "client.event-threads",
>>>>>>            .voltype = "protocol/client",
>>>>>>            .op_version = GD_OP_VERSION_3_7_0,
>>>>>>          },
>>>>>>          { .key = "server.event-threads",
>>>>>>            .voltype = "protocol/server",
>>>>>>            .op_version = GD_OP_VERSION_3_7_0,
>>>>>>          },
>>>>>>
>>>>>>  # gluster v set <volname> server.event-threads 4
>>>>>>  # gluster v set <volname> client.event-threads 4
>>>>>>
>>>>>>  Also grab smallfile:
>>>>>>
>>>>>>  https://github.com/bengland2/smallfile
>>>>>>
>>>>>>  After git cloning smallfile, run:
>>>>>>
>>>>>>  python /small-files/smallfile/smallfile_cli.py --operation create
>>>>>> --threads 8 --file-size 64 --files 10000 --top /gluster-mount
>>>>>> --pause 1000 --host-set "client1 client2"
>>>>>>
>>>>>>  Again we will be looking at top + show threads (press H). With 4
>>>>>> threads on both clients and servers you should see something
>>>>>> similar to this (it isn't exact, I copied and pasted):
>>>>>>
>>>>>>   2871 root 20 0 746m 24m 3004 S 35.0 0.1 14:35.89 glusterfsd
>>>>>>   2872 root 20 0 746m 24m 3004 S 51.0 0.1 14:35.89 glusterfsd
>>>>>>   2873 root 20 0 746m 24m 3004 S 43.0 0.1 14:35.89 glusterfsd
>>>>>>   2874 root 20 0 746m 24m 3004 S 65.0 0.1 14:35.89 glusterfsd
>>>>>>   4522 root 20 0 747m 24m 3004 S 5.3 0.1 0:02.25 glusterfsd
>>>>>>   4507 root 20 0 747m 24m 3004 S 5.0 0.1 0:05.91 glusterfsd
>>>>>>  21200 root 20 0 747m 24m 3004 S 4.6 0.1 0:21.16 glusterfsd
>>>>>>
>>>>>>  If you have a test env I would be interested to see how
>>>>>> multi-threaded epoll performs, but I am 100% sure it's not ready for
>>>>>> production yet. RH will be supporting it with our 3.0.4 (the next
>>>>>> one) release unless we find show-stopping bugs. My testing looks
>>>>>> very promising though.
>>>>>>
>>>>>>  Smallfile performance enhancements are one of the key focuses for
>>>>>> our 3.1 release this summer; we are working very hard to improve
>>>>>> this, as it is the use case for the majority of people.
>>>>>>
>>>>>>
>>>>>>  On Fri, Feb 6, 2015 at 11:59 AM, David F. Robinson
>>>>>> <david.robinson at corvidtec.com> wrote:
>>>>>>  Ben,
>>>>>>
>>>>>>  I was hoping you might be able to help with two performance
>>>>>> questions. I was doing some testing of my rsync where I am backing
>>>>>> up my primary gluster system (distributed + replicated) to my
>>>>>> backup gluster system (distributed). I tried three tests where I
>>>>>> rsynced from one of my primary systems (gfsib02b) to my backup
>>>>>> machine. The test directory contains roughly 5500 files, most of
>>>>>> which are small. The script I ran is shown below which repeats the
>>>>>> tests 3x for each section to check variability in timing.
>>>>>>
>>>>>>  1) Writing to the local disk is drastically faster than writing to
>>>>>> gluster. So, my writes to the backup gluster system are what is
>>>>>> slowing me down, which makes sense.
>>>>>>  2) When I write to the backup gluster system (/backup/homegfs),
>>>>>> the timing goes from 35-seconds to 1-minute-40-seconds. The question here
>>>>>> is whether you could recommend any settings for this volume that
>>>>>> would improve performance for small-file writes? I have included
>>>>>> the output of 'gluster volume info' below.
>>>>>>  3) When I did the same tests on the Source_bkp volume, it is
>>>>>> almost 3x as slow as the homegfs_bkp volume. However, these are
>>>>>> just different volumes on the same storage system. The volume
>>>>>> parameters are identical (see below). The performance of these two
>>>>>> should be identical. Any idea why they wouldn't be? And any
>>>>>> suggestions for how to fix this? The only thing that I see
>>>>>> different between the two is the order of the "Options
>>>>>> reconfigured" section. I assume order of options doesn't matter.
>>>>>>
>>>>>>  Backup to local hard disk (no gluster writes)
>>>>>>   time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /temp1
>>>>>>   time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /temp2
>>>>>>   time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /temp3
>>>>>>
>>>>>>          real 0m35.579s
>>>>>>          user 0m31.290s
>>>>>>          sys 0m12.282s
>>>>>>
>>>>>>          real 0m38.035s
>>>>>>          user 0m31.622s
>>>>>>          sys 0m10.907s
>>>>>>          real 0m38.313s
>>>>>>          user 0m31.458s
>>>>>>          sys 0m10.891s
>>>>>>  Backup to gluster backup system on volume homegfs_bkp
>>>>>>   time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /backup/homegfs/temp1
>>>>>>   time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /backup/homegfs/temp2
>>>>>>   time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /backup/homegfs/temp3
>>>>>>
>>>>>>          real 1m42.026s
>>>>>>          user 0m32.604s
>>>>>>          sys 0m9.967s
>>>>>>
>>>>>>          real 1m45.480s
>>>>>>          user 0m32.577s
>>>>>>          sys 0m11.994s
>>>>>>
>>>>>>          real 1m40.436s
>>>>>>          user 0m32.521s
>>>>>>          sys 0m11.240s
>>>>>>
>>>>>>  Backup to gluster backup system on volume Source_bkp
>>>>>>   time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /backup/Source/temp1
>>>>>>   time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /backup/Source/temp2
>>>>>>   time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>> --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>> gfsib02b:/homegfs/test /backup/Source/temp3
>>>>>>
>>>>>>          real 3m30.491s
>>>>>>          user 0m32.676s
>>>>>>          sys 0m10.776s
>>>>>>
>>>>>>          real 3m26.076s
>>>>>>          user 0m32.588s
>>>>>>          sys 0m11.048s
>>>>>>          real 3m7.460s
>>>>>>          user 0m32.763s
>>>>>>          sys 0m11.687s
>>>>>>
>>>>>>
>>>>>>  Volume Name: Source_bkp
>>>>>>  Type: Distribute
>>>>>>  Volume ID: 1d4c210d-a731-4d39-a0c5-ea0546592c1d
>>>>>>  Status: Started
>>>>>>  Number of Bricks: 2
>>>>>>  Transport-type: tcp
>>>>>>  Bricks:
>>>>>>  Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/Source_bkp
>>>>>>  Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/Source_bkp
>>>>>>  Options Reconfigured:
>>>>>>  performance.cache-size: 128MB
>>>>>>  performance.io-thread-count: 32
>>>>>>  server.allow-insecure: on
>>>>>>  network.ping-timeout: 10
>>>>>>  storage.owner-gid: 100
>>>>>>  performance.write-behind-window-size: 128MB
>>>>>>  server.manage-gids: on
>>>>>>  changelog.rollover-time: 15
>>>>>>  changelog.fsync-interval: 3
>>>>>>
>>>>>>  Volume Name: homegfs_bkp
>>>>>>  Type: Distribute
>>>>>>  Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
>>>>>>  Status: Started
>>>>>>  Number of Bricks: 2
>>>>>>  Transport-type: tcp
>>>>>>  Bricks:
>>>>>>  Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
>>>>>>  Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp
>>>>>>  Options Reconfigured:
>>>>>>  storage.owner-gid: 100
>>>>>>  performance.io-thread-count: 32
>>>>>>  server.allow-insecure: on
>>>>>>  network.ping-timeout: 10
>>>>>>  performance.cache-size: 128MB
>>>>>>  performance.write-behind-window-size: 128MB
>>>>>>  server.manage-gids: on
>>>>>>  changelog.rollover-time: 15
>>>>>>  changelog.fsync-interval: 3
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>  ------ Original Message ------
>>>>>>  From: "Benjamin Turner" <bennyturns at gmail.com>
>>>>>>  To: "David F. Robinson" <david.robinson at corvidtec.com>
>>>>>>  Cc: "Gluster Devel" <gluster-devel at gluster.org>;
>>>>>> "gluster-users at gluster.org" <gluster-users at gluster.org>
>>>>>>  Sent: 2/3/2015 7:12:34 PM
>>>>>>  Subject: Re: [Gluster-devel] missing files
>>>>>>
>>>>>>>  It sounds to me like the files were only copied to one replica,
>>>>>>> weren't there for the initial ls, which triggered a self-heal, and
>>>>>>> were there for the last ls because they had been healed. Is there any
>>>>>>> chance that one of the replicas was down during the rsync? It could be
>>>>>>> that you lost a brick during the copy or something like that. To
>>>>>>> confirm, I would look for disconnects in the brick logs as well as
>>>>>>> check glustershd.log to verify that the missing files were actually
>>>>>>> healed.
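>>>>>>>
>>>>>>>  (Roughly, something like this on each server -- the log paths assume
>>>>>>> the default /var/log/glusterfs layout:
>>>>>>>
>>>>>>>  grep -i disconnect /var/log/glusterfs/bricks/*.log
>>>>>>>  grep -i heal /var/log/glusterfs/glustershd.log | grep -i <file-or-gfid>
>>>>>>>
>>>>>>>  would surface brick disconnects and any self-heal activity on the
>>>>>>> files in question.)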
>>>>>>>
>>>>>>>  -b
>>>>>>>
>>>>>>>  On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
>>>>>>> <david.robinson at corvidtec.com> wrote:
>>>>>>>  I rsync'd 20-TB over to my gluster system and noticed that I had
>>>>>>> some directories missing even though the rsync completed normally.
>>>>>>>  The rsync logs showed that the missing files were transferred.
>>>>>>>
>>>>>>>  I went to the bricks and did an 'ls -al
>>>>>>> /data/brick*/homegfs/dir/*' and the files were on the bricks. After I
>>>>>>> did this 'ls', the files then showed up on the FUSE mounts.
>>>>>>>
>>>>>>>  1) Why are the files hidden on the fuse mount?
>>>>>>>  2) Why does the ls make them show up on the FUSE mount?
>>>>>>>  3) How can I prevent this from happening again?
>>>>>>>
>>>>>>>  Note, I also mounted the gluster volume using NFS and saw the
>>>>>>> same behavior. The files/directories were not shown until I did
>>>>>>> the "ls" on the bricks.
>>>>>>>
>>>>>>>  David
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  ===============================
>>>>>>>  David F. Robinson, Ph.D.
>>>>>>>  President - Corvid Technologies
>>>>>>>  704.799.6944 x101 [office]
>>>>>>>  704.252.1310 [cell]
>>>>>>>  704.799.7974 [fax]
>>>>>>>  David.Robinson at corvidtec.com
>>>>>>>  http://www.corvidtechnologies.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  _______________________________________________
>>>>>>>  Gluster-devel mailing list
>>>>>>>  Gluster-devel at gluster.org
>>>>>>>  http://www.gluster.org/mailman/listinfo/gluster-devel
>>>>>>>
>>>>>>>
>>>>>>
>>>>  <glusterfs.tgz>
>>>
>>> -- 
>>> GlusterFS - http://www.gluster.org
>>>
>>> An open source, distributed file system scaling to several
>>> petabytes, and handling thousands of clients.
>>>
>>> My personal twitter: twitter.com/realjustinclift
>>>
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel


