[Gluster-devel] Fwd: Re[2]: missing files

David F. Robinson david.robinson at corvidtec.com
Wed Feb 11 14:23:00 UTC 2015



David  (Sent from mobile)

===============================
David F. Robinson, Ph.D. 
President - Corvid Technologies
704.799.6944 x101 [office]
704.252.1310      [cell]
704.799.7974      [fax]
David.Robinson at corvidtec.com
http://www.corvidtechnologies.com

Begin forwarded message:

> From: "David F. Robinson" <david.robinson at corvidtec.com>
> Date: February 10, 2015 at 4:44:25 PM EST
> To: Shyam <srangana at redhat.com>
> Subject: Re[2]: [Gluster-devel] missing files
> Reply-To: "David F. Robinson" <david.robinson at corvidtec.com>
> updated test files after restarting the homegfs_bkp volume...
> 
> David
> 
> 
> ------ Original Message ------
> From: "Shyam" <srangana at redhat.com>
> To: "David F. Robinson" <david.robinson at corvidtec.com>
> Sent: 2/10/2015 3:25:13 PM
> Subject: Re: [Gluster-devel] missing files
> 
>> Ouch! I think one file is missing, io-stats-post-slow.txt. Could you check and send that across?
>> 
>> As of now from the strace data, the slow volume is slow in open and mkdir, which is causing the overall slowness in the tar operation.
>> 
>> Data is:
>> syscall   count   avg. increase (secs)   total contribution (secs)
>> open       4061   0.0368440096           149.6235229856
>> mkdir       384   0.0741610521            28.4778440064
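>> (For reference, strace can also aggregate a per-syscall summary directly;
>> a minimal sketch, with the tar file name as a placeholder:
>>
>>   strace -c -f -o /tmp/tar-syscall-summary.txt tar -xPf boost_1_57_0.tar
>>
>> The numbers above come from the -tt traces instead, comparing the average
>> time per call on the slow volume against the fast one.)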
>> 
>> Now to check what io-stats reflects in terms of numbers to glean some internal gluster specifics.
>> 
>> Shyam
>> 
>>> On 02/10/2015 02:18 PM, David F. Robinson wrote:
>>> files attached...
>>> 
>>> David
>>> 
>>> 
>>> ------ Original Message ------
>>> From: "Shyam" <srangana at redhat.com>
>>> To: "David F. Robinson" <david.robinson at corvidtec.com>
>>> Sent: 2/10/2015 1:23:58 PM
>>> Subject: Re: [Gluster-devel] missing files
>>> 
>>>> Hi David,
>>>> 
>>>> I am attempting to get a system with 20TB of space to see if I can
>>>> recreate the issue in house. I have the machines; I need to configure
>>>> them and get the data generation going.
>>>> 
>>>> In the meantime, would it be possible to get the following output from
>>>> both the fast and slow volumes (the job need not complete on the slow
>>>> volume)? Each run should include the previous output, i.e., run strace
>>>> first, then strace with the gluster io-stats options enabled, etc.
>>>> 
>>>> 1) strace -tt tar -xPf <boost tar file>
>>>> 
>>>> 2) Enable io-stats options and get the following information:
>>>> 
>>>> 2.1) Enable io-stats options:
>>>>  - gluster volume set <volname> diagnostics.latency-measurement on
>>>>  - gluster volume set <volname> diagnostics.count-fop-hits on
>>>> 
>>>> 2.2) Get an initial dump before starting the tar operation
>>>>  - setfattr -n trusted.io-stats-dump -v /tmp/io-stats-pre.txt
>>>> <gluster mount point>
>>>> 
>>>> 2.3) run with strace as in (1)
>>>> 
>>>> 2.4) Get a final dump after the tar operation
>>>>  - As in 2.2, replace the file name with /tmp/io-stats-post.txt
>>>> 
>>>> 2.5) Turn off io-stats
>>>>  - gluster volume set <volname> diagnostics.latency-measurement off
>>>>  - gluster volume set <volname> diagnostics.count-fop-hits off
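>>>> 
>>>> (Putting 2.1 through 2.5 together, a minimal sketch of the whole sequence;
>>>> the volume name, mount point, and tar file name below are placeholders,
>>>> and the -o flag is only there to capture the trace to a file:
>>>> 
>>>>   gluster volume set test2brick diagnostics.latency-measurement on
>>>>   gluster volume set test2brick diagnostics.count-fop-hits on
>>>>   setfattr -n trusted.io-stats-dump -v /tmp/io-stats-pre.txt /test2brick
>>>>   # run the tar from inside the gluster mount
>>>>   strace -tt -o /tmp/tar-strace.txt tar -xPf boost_1_57_0.tar
>>>>   setfattr -n trusted.io-stats-dump -v /tmp/io-stats-post.txt /test2brick
>>>>   gluster volume set test2brick diagnostics.latency-measurement off
>>>>   gluster volume set test2brick diagnostics.count-fop-hits off
>>>> )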
>>>> 
>>>> I am fine with a call, it's just that I do not have any questions as of
>>>> now, or the need to poke around the volume, etc. I am hoping that I can
>>>> reproduce this in house and see the results for myself (and in the
>>>> meantime gather the above information).
>>>> 
>>>> Thanks for coming on the IRC channel for any quick conversation on
>>>> the same.
>>>> 
>>>> Shyam
>>>> 
>>>> 
>>>>> On 02/10/2015 12:29 PM, David F. Robinson wrote:
>>>>> 
>>>>>> I have some questions from the logs, just to confirm I am seeing the
>>>>>> right things here (below). Some other observations as follows,
>>>>>> 
>>>>>> a) The stale linkto files should not cause the slowness observed;
>>>>>> from the code standpoint they should not be the reason (although it is
>>>>>> too early to rule out other possibilities at present).
>>>>>> 
>>>>>> b) From the current logs I do not have any clue on the problem, other
>>>>>> than time stamps. As a result, the approach forward may be:
>>>>>>  - Reproduce this in house and check (probably easiest, as I would have
>>>>>> total control of the systems)
>>>>>>  - I may request some additional logs from you (still thinking about
>>>>>> what is best at present)
>>>>>> 
>>>>>> Onto the information from the logs:
>>>>>> 
>>>>>> 1) backup-homegfs.log is from the mount /backup/homegfs that is always
>>>>>> showing the slow performance, correct? This is the primary production
>>>>>> volume, correct?
>>>>>> 
>>>>>> - The logs are neat; they capture just the extract portions of activity,
>>>>>> making them easier to look at. Thanks.
>>>>> Yes, /backup/homegfs is the mount point for the homegfs_bkp volume and
>>>>> it was showing the performance problems. I created a new volume for
>>>>> testing (same machine, same bricks, etc...) called test2brick. The
>>>>> performance issue was gone. My first guess was that somehow the
>>>>> homegfs_bkp volume was corrupted and my second guess was that the
>>>>> performance improvement on test2brick was due to it having no data. To
>>>>> test, I did a 'cp -ar /backup/homegfs/xyz /test2brick' for roughly 10TB
>>>>> of files. I then re-ran my test on test2brick and it showed the
>>>>> performance issue. I then did a 'rm -rf /test2brick/xyz' and reran the
>>>>> tests and the performance recovered.
>>>>> 
>>>>>> 
>>>>>> 2) Did the tar job complete on backup-homegfs? I ask this because the
>>>>>> following are the start and end logs of the extract, and they are
>>>>>> inconsistent with the two other logs:
>>>>>> 
>>>>>> [2015-02-09 22:44:57.872633] I [MSGID: 109036]
>>>>>> [dht-common.c:6222:dht_log_new_layout_for_dir_selfheal]
>>>>>> 0-homegfs_bkp-dht: Setting layout of /boost_1_57_0 with [Subvol_name:
>>>>>> homegfs_bkp-client-0, Err: -1 , Start: 2105878482 , Stop: 4294967295
>>>>>> ], [Subvol_name: homegfs_bkp-client-1, Err: -1 , Start: 0 , Stop:
>>>>>> 2105878481 ],
>>>>>> 
>>>>>> [2015-02-09 22:48:38.898166] I [MSGID: 109036]
>>>>>> [dht-common.c:6222:dht_log_new_layout_for_dir_selfheal]
>>>>>> 0-homegfs_bkp-dht: Setting layout of
>>>>>> /boost_1_57_0/boost/math/special_functions/detail with [Subvol_name:
>>>>>> homegfs_bkp-client-0, Err: -1 , Start: 0 , Stop: 2105878481 ],
>>>>>> [Subvol_name: homegfs_bkp-client-1, Err: -1 , Start: 2105878482 ,
>>>>>> Stop: 4294967295 ],
>>>>>> 
>>>>>> Note the last log ends with the name of the directory as,
>>>>>> "/boost_1_57_0/boost/math/special_functions/detail", whereas in the
>>>>>> test2 and test3 bricks these end at
>>>>>> "/boost_1_57_0/tools/regression/xsl_reports/xsl/v2/html"
>>>>> I ran it many times on both volumes to make sure that it wasn't an
>>>>> anomaly. The tar extraction on the homegfs_bkp volume takes 19-minutes.
>>>>>  I have let it complete several times, but I also have killed it after
>>>>> 5-10 minutes because after 5-minutes, I know there is still a
>>>>> performance problem. So, the log you were looking at was probably one
>>>>> of the runs where I killed the extraction.
>>>>> 
>>>>> 
>>>>>> 
>>>>>> 3) Various tests on test3brick details:
>>>>>> 
>>>>>> [2015-02-10 14:22:52.403546] I [MSGID: 109036] - Start
>>>>>> [2015-02-10 14:23:51.082752] I [MSGID: 109036] - Stop
>>>>>> 
>>>>>> Time taken: 00:59
>>>>>> 
>>>>>> [2015-02-10 14:24:20.590523] I [MSGID: 109036] - Start
>>>>>> [2015-02-10 14:25:14.113263] I [MSGID: 109036] - End
>>>>>> 
>>>>>> Time taken: 00:54
>>>>>> 
>>>>>> This did not have any other data on the bricks, correct?
>>>>> I haven't done much testing on brick3 yet. I was going under the
>>>>> assumption that the gluster developers were going to want me to test
>>>>> some options and I didn't want to have to wait for 10TB to copy over
>>>>> prior to each test. So, I created a test2brick that will contain
>>>>> roughly 10TB of data and I created a test3brick that is empty. Copying
>>>>> over the 10TB of data takes several hours.
>>>>> 
>>>>> 
>>>>>> 
>>>>>> 4) Various tests on test2brick:
>>>>>> 
>>>>>> (i)
>>>>>> [2015-02-09 22:42:43.215401] I [MSGID: 109036] - Start
>>>>>> [2015-02-09 22:44:16.270895] I [MSGID: 109036] - End
>>>>>> 
>>>>>> Time taken on empty volume: 1:33 (with no data on volume, correct?)
>>>>> This sounds about right, although with 0-data the time comes out around
>>>>> 50-58 seconds.
>>>>> 
>>>>>> 
>>>>>> (ii)
>>>>>> [2015-02-10 02:24:32.497366] I [MSGID: 109036] - Start
>>>>>> [2015-02-10 02:26:35.599863] I [MSGID: 109036] - Stop
>>>>>> 
>>>>>> Time taken with some data in the volume: 2:03
>>>>>> 
>>>>>> Added data, /users/carpentr
>>>>> Correct. I started testing as the cp was ongoing to see what the effect
>>>>> of the cp was on performance. It was slower (as expected), but nothing
>>>>> like the 19-minutes I was getting on homegfs_bkp.
>>>>> 
>>>>>> 
>>>>>> (iii)
>>>>>> 
>>>>>> [2015-02-10 13:55:32.496258] I [MSGID: 109036] - Start
>>>>>> [2015-02-10 13:58:49.158800] I [MSGID: 109036] - Stop
>>>>>> 
>>>>>> Time taken with some _more_ data in the volume: 3:17
>>>>>> 
>>>>>> Added data, /users/former_employees
>>>>> Correct. As the amount of data on test2brick increases, the time
>>>>> increases.
>>>>> 
>>>>>> 
>>>>>> (iv)
>>>>>> [2015-02-10 14:15:48.617364] I [MSGID: 109036] - Start
>>>>>> [2015-02-10 14:22:48.178021] I [MSGID: 109036] - Stop
>>>>>> 
>>>>>> Time taken with no change between (iii) and (iv) in terms of volume
>>>>>> data: 7:00
>>>>>> 
>>>>>> Are the above observations right?
>>>>> The time when the full copy had finished was approximately 7-minutes. I
>>>>> only copied over approximately 10-TB to test2brick. Homegfs_bkp
>>>>> contains roughly 40-TB. I am assuming that if I copied over the full
>>>>> 40TB, the test2brick would reproduce the 19-minute runtime of
>>>>> homegfs_bkp.
>>>>> 
>>>>> I am working on getting on IRC now. I am also available at 704.252.1310
>>>>> if you want to discuss.
>>>>> 
>>>>> David
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Shyam
>>>>>> 
>>>>>>> On 02/10/2015 10:51 AM, David F. Robinson wrote:
>>>>>>> I am not on IRC, but I could get on it. I can be available for a call
>>>>>>> as well. You can call my cell 704.252.1310 anytime. Or, if you want
>>>>>>> multiple people on the call, I have a telecon number set up that I
>>>>>>> could send out.
>>>>>>> 
>>>>>>> David (Sent from mobile)
>>>>>>> 
>>>>>>> ===============================
>>>>>>> David F. Robinson, Ph.D.
>>>>>>> President - Corvid Technologies
>>>>>>> 704.799.6944 x101 [office]
>>>>>>> 704.252.1310 [cell]
>>>>>>> 704.799.7974 [fax]
>>>>>>> David.Robinson at corvidtec.com
>>>>>>> http://www.corvidtechnologies.com
>>>>>>> 
>>>>>>>> On Feb 10, 2015, at 10:12 AM, Shyam <srangana at redhat.com> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Are you on IRC? (if so nick and channels please (#gluster-devel
>>>>>>>> would be nice))
>>>>>>>> 
>>>>>>>> Can we get into a call or talk somehow? Maybe faster for me to catch
>>>>>>>> up on certain things and also to look at what the problem could be.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Shyam
>>>>>>>> 
>>>>>>>>> On 02/10/2015 12:11 AM, David Robinson wrote:
>>>>>>>>> When I created a new volume on the same storage system, my
>>>>>>>>> performance problem went away completely.
>>>>>>>>> The only difference is that the new volume doesn't have all of the
>>>>>>>>> broken links in .glusterfs and the new volume was created with
>>>>>>>>> 3.6.2.
>>>>>>>>> The old volume was created using an older version of gluster prior
>>>>>>>>> to my
>>>>>>>>> upgrades to 3.6.
>>>>>>>>> 
>>>>>>>>> I even tried a gluster volume reset to make sure all of the
>>>>>>>>> performance
>>>>>>>>> options were the same. The new volume (test2brick) is 20x faster for
>>>>>>>>> small file operations.
>>>>>>>>> 
>>>>>>>>> Email train below.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ===============================
>>>>>>>>> David F. Robinson, Ph.D.
>>>>>>>>> President - Corvid Technologies
>>>>>>>>> 704.799.6944 x101 [office]
>>>>>>>>> 704.252.1310 [cell]
>>>>>>>>> 704.799.7974 [fax]
>>>>>>>>> David.Robinson at corvidtec.com
>>>>>>>>> http://www.corvidtechnologies.com
>>>>>>>>> 
>>>>>>>>> Begin forwarded message:
>>>>>>>>> 
>>>>>>>>>> *From:* "David F. Robinson" <david.robinson at corvidtec.com>
>>>>>>>>>> *Date:* February 9, 2015 at 5:52:00 PM EST
>>>>>>>>>> *To:* "Benjamin Turner" <bennyturns at gmail.com>
>>>>>>>>>> *Subject:* *Re[5]: [Gluster-devel] missing files*
>>>>>>>>>> *Reply-To:* "David F. Robinson" <david.robinson at corvidtec.com>
>>>>>>>>>> 
>>>>>>>>>> Ben,
>>>>>>>>>> I cleared the logs and rebooted the machine. Same issue.
>>>>>>>>>> homegfs_bkp
>>>>>>>>>> takes 19-minutes and test2brick (the new volume) takes 1-minute.
>>>>>>>>>> Is it possible that some old parameters are still set for
>>>>>>>>>> homegfs_bkp
>>>>>>>>>> that are no longer in use? I tried a gluster volume reset for
>>>>>>>>>> homegfs_bkp, but it didn't have any effect.
>>>>>>>>>> I have attached the full logs.
>>>>>>>>>> David
>>>>>>>>>> ------ Original Message ------
>>>>>>>>>> From: "David F. Robinson" <david.robinson at corvidtec.com
>>>>>>>>>> <mailto:david.robinson at corvidtec.com>>
>>>>>>>>>> To: "Benjamin Turner" <bennyturns at gmail.com
>>>>>>>>>> <mailto:bennyturns at gmail.com>>
>>>>>>>>>> Sent: 2/9/2015 5:39:18 PM
>>>>>>>>>> Subject: Re[4]: [Gluster-devel] missing files
>>>>>>>>>>> Ben,
>>>>>>>>>>> I have traced this out to a point where I can rule out many
>>>>>>>>>>> issues.
>>>>>>>>>>> I was hoping you could help me from here.
>>>>>>>>>>> I went with the "tar -xPf boost.tar" as my test case, which on my
>>>>>>>>>>> old
>>>>>>>>>>> storage system took about 1-minute to extract. On my backup system
>>>>>>>>>>> and my primary storage (both gluster), it takes roughly
>>>>>>>>>>> 19-minutes.
>>>>>>>>>>> First step was to create a new storage system (striped RAID, two
>>>>>>>>>>> sets
>>>>>>>>>>> of 3-drives). All was good here with a gluster extraction time of
>>>>>>>>>>> 1-minute. I then went to my backup system and created another
>>>>>>>>>>> partition using only one of the two bricks on that system. Still
>>>>>>>>>>> 1-minute. I went to a two brick setup and it stayed at 1-minute.
>>>>>>>>>>> At this point, I have recreated, using the same parameters, a
>>>>>>>>>>> test2brick volume that should be identical to my homegfs_bkp
>>>>>>>>>>> volume. Everything is the same, including how I mounted the volume.
>>>>>>>>>>> The only difference is that homegfs_bkp has 30-TB of data and
>>>>>>>>>>> test2brick is blank. I didn't think that performance would be
>>>>>>>>>>> affected by putting data on the volume.
>>>>>>>>>>> Can you help? Do you have any suggestions? Do you think upgrading
>>>>>>>>>>> gluster from 3.5 to 3.6.1 to 3.6.2 somehow messed up homegfs_bkp?
>>>>>>>>>>> My layout is shown below. These should give identical speeds.
>>>>>>>>>>> [root at gfs01bkp test2brick]# gluster volume info homegfs_bkp
>>>>>>>>>>> Volume Name: homegfs_bkp
>>>>>>>>>>> Type: Distribute
>>>>>>>>>>> Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
>>>>>>>>>>> Status: Started
>>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>> Bricks:
>>>>>>>>>>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
>>>>>>>>>>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp
>>>>>>>>>>> [root at gfs01bkp test2brick]# gluster volume info test2brick
>>>>>>>>>>> 
>>>>>>>>>>> Volume Name: test2brick
>>>>>>>>>>> Type: Distribute
>>>>>>>>>>> Volume ID: 123259b2-3c61-4277-a7e8-27c7ec15e550
>>>>>>>>>>> Status: Started
>>>>>>>>>>> Number of Bricks: 2
>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>> Bricks:
>>>>>>>>>>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test2brick
>>>>>>>>>>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test2brick
>>>>>>>>>>> [root at gfs01bkp brick02bkp]# mount | grep test2brick
>>>>>>>>>>> gfsib01bkp.corvidtec.com:/test2brick.tcp on /test2brick type fuse.glusterfs (rw,allow_other,max_read=131072)
>>>>>>>>>>> [root at gfs01bkp brick02bkp]# mount | grep homegfs_bkp
>>>>>>>>>>> gfsib01bkp.corvidtec.com:/homegfs_bkp.tcp on /backup/homegfs type fuse.glusterfs (rw,allow_other,max_read=131072)
>>>>>>>>>>> [root at gfs01bkp brick02bkp]# df -h
>>>>>>>>>>> Filesystem                                  Size  Used Avail Use% Mounted on
>>>>>>>>>>> /dev/mapper/vg00-lv_root                     20G  1.7G   18G   9% /
>>>>>>>>>>> tmpfs                                        16G     0   16G   0% /dev/shm
>>>>>>>>>>> /dev/md126p1                               1008M  110M  848M  12% /boot
>>>>>>>>>>> /dev/mapper/vg00-lv_opt                     5.0G  220M  4.5G   5% /opt
>>>>>>>>>>> /dev/mapper/vg00-lv_tmp                     5.0G  139M  4.6G   3% /tmp
>>>>>>>>>>> /dev/mapper/vg00-lv_usr                      20G  2.7G   17G  15% /usr
>>>>>>>>>>> /dev/mapper/vg00-lv_var                      40G  4.4G   34G  12% /var
>>>>>>>>>>> /dev/mapper/vg01-lvol1                       88T   22T   67T  25% /data/brick01bkp
>>>>>>>>>>> /dev/mapper/vg02-lvol1                       88T   22T   67T  25% /data/brick02bkp
>>>>>>>>>>> gfsib01bkp.corvidtec.com:/homegfs_bkp.tcp   175T   43T  133T  25% /backup/homegfs
>>>>>>>>>>> gfsib01bkp.corvidtec.com:/test2brick.tcp    175T   43T  133T  25% /test2brick
>>>>>>>>>>> ------ Original Message ------
>>>>>>>>>>> From: "Benjamin Turner" <bennyturns at gmail.com
>>>>>>>>>>> <mailto:bennyturns at gmail.com>>
>>>>>>>>>>> To: "David F. Robinson" <david.robinson at corvidtec.com
>>>>>>>>>>> <mailto:david.robinson at corvidtec.com>>
>>>>>>>>>>> Sent: 2/6/2015 12:52:58 PM
>>>>>>>>>>> Subject: Re: Re[2]: [Gluster-devel] missing files
>>>>>>>>>>>> Hi David. Let's start with the basics and go from there. IIRC you
>>>>>>>>>>>> are using LVM with thick provisioning; let's verify the following:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. You have everything properly aligned for your RAID stripe size,
>>>>>>>>>>>> etc. I have attached the script we package with RHS that I am in
>>>>>>>>>>>> the process of updating. I want to double-check that you created the PV
>>>>>>>>>>>> / VG / LV with the proper variables. Have a look at the create_pv,
>>>>>>>>>>>> create_vg, and create_lv(old) functions. You will need to know the
>>>>>>>>>>>> stripe size of your RAID and the number of stripe elements (data
>>>>>>>>>>>> disks, not hot spares). Also make sure you mkfs.xfs with:
>>>>>>>>>>>> 
>>>>>>>>>>>> echo "mkfs -t xfs -f -K -i size=$inode_size -d
>>>>>>>>>>>> sw=$stripe_elements,su=$stripesize -n size=$fs_block_size
>>>>>>>>>>>> /dev/$vgname/$lvname"
>>>>>>>>>>>> 
>>>>>>>>>>>> We use 512-byte inodes because some workloads use more than the
>>>>>>>>>>>> default inode size and you don't want xattrs spilling out of the
>>>>>>>>>>>> inode.
>>>>>>>>>>>> 
>>>>>>>>>>>> 2. Are you running RHEL or CentOS? If so, I would
>>>>>>>>>>>> recommend tuned_profile=rhs-high-throughput. If you don't have that
>>>>>>>>>>>> tuned profile, I'll get you everything it sets.
>>>>>>>>>>>> 
>>>>>>>>>>>> 3. For small files we recommend the following:
>>>>>>>>>>>> 
>>>>>>>>>>>> # RAID related variables.
>>>>>>>>>>>> # stripesize - RAID controller stripe unit size
>>>>>>>>>>>> # stripe_elements - the number of data disks
>>>>>>>>>>>> # The --dataalignment option is used while creating the physical
>>>>>>>>>>>> # volume to align I/O at the LVM layer
>>>>>>>>>>>> # dataalign -
>>>>>>>>>>>> # RAID6 is recommended when the workload has predominantly larger
>>>>>>>>>>>> # files ie not in kilobytes.
>>>>>>>>>>>> # For RAID6 with 12 disks and 128K stripe element size.
>>>>>>>>>>>> stripesize=128k
>>>>>>>>>>>> stripe_elements=10
>>>>>>>>>>>> dataalign=1280k
>>>>>>>>>>>> 
>>>>>>>>>>>> # RAID10 is recommended when the workload has predominantly
>>>>>>>>>>>> smaller
>>>>>>>>>>>> files
>>>>>>>>>>>> # i.e in kilobytes.
>>>>>>>>>>>> # For RAID10 with 12 disks and 256K stripe element size,
>>>>>>>>>>>> uncomment the
>>>>>>>>>>>> # lines below.
>>>>>>>>>>>> # stripesize=256k
>>>>>>>>>>>> # stripe_elements=6
>>>>>>>>>>>> # dataalign=1536k
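>>>>>>>>>>>>
>>>>>>>>>>>> (To make that concrete, a rough sketch of how the RAID6 values above
>>>>>>>>>>>> would feed into the PV/VG/LV and mkfs steps from item 1; the device,
>>>>>>>>>>>> VG/LV names, and the -n size value are only examples, not your layout:
>>>>>>>>>>>>
>>>>>>>>>>>>   # full-stripe alignment: 128k stripe unit * 10 data disks = 1280k
>>>>>>>>>>>>   pvcreate --dataalignment 1280k /dev/sdb
>>>>>>>>>>>>   vgcreate vg_brick01 /dev/sdb
>>>>>>>>>>>>   lvcreate -n lv_brick01 -l 100%FREE vg_brick01
>>>>>>>>>>>>   # 512-byte inodes; su/sw tell XFS about the RAID geometry
>>>>>>>>>>>>   mkfs.xfs -f -K -i size=512 -d su=128k,sw=10 -n size=8192 \
>>>>>>>>>>>>       /dev/vg_brick01/lv_brick01
>>>>>>>>>>>> )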
>>>>>>>>>>>> 
>>>>>>>>>>>> 4. Jumbo frames everywhere! Check out the effect of jumbo frames,
>>>>>>>>>>>> make sure they are set up properly on your switch, and add
>>>>>>>>>>>> MTU=9000 to your ifcfg files (unless you have it already):
>>>>>>>>>>>> https://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> (see the jumbo frames section here, the whole thing is a good
>>>>>>>>>>>> read)
>>>>>>>>>>>> https://rhsummit.files.wordpress.com/2014/04/bengland_h_1100_rhs_performance.pdf
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> (this is updated for 2014)
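>>>>>>>>>>>>
>>>>>>>>>>>> (The ifcfg change itself is a one-liner per interface; a sketch with
>>>>>>>>>>>> an example interface name, plus a quick end-to-end check that the
>>>>>>>>>>>> switch really passes 9000-byte frames:
>>>>>>>>>>>>
>>>>>>>>>>>>   # /etc/sysconfig/network-scripts/ifcfg-em1
>>>>>>>>>>>>   MTU=9000
>>>>>>>>>>>>
>>>>>>>>>>>>   # 8972 = 9000 minus IP/ICMP headers; -M do forbids fragmentation
>>>>>>>>>>>>   ping -M do -s 8972 <other gluster node>
>>>>>>>>>>>> )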
>>>>>>>>>>>> 
>>>>>>>>>>>> 5. There is a smallfile enhancement that just landed in master that
>>>>>>>>>>>> is showing me a 60% improvement in writes. This is called
>>>>>>>>>>>> multi-threaded epoll and it is looking VERY promising WRT smallfile
>>>>>>>>>>>> performance. Here is a summary:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi all. I see a lot of discussion on $subject and I wanted to take a
>>>>>>>>>>>> minute to talk about it and what we can do to test / observe the
>>>>>>>>>>>> effects of it. Let's start with a bit of background:
>>>>>>>>>>>> 
>>>>>>>>>>>> **Background**
>>>>>>>>>>>> 
>>>>>>>>>>>> - Currently epoll is single threaded on both clients and servers.
>>>>>>>>>>>>   This leads to a "hot thread" which consumes 100% of a CPU core.
>>>>>>>>>>>> - This can be observed by running Ben E's smallfile benchmark to
>>>>>>>>>>>>   create files, running top (on both clients and servers), and
>>>>>>>>>>>>   pressing H to show threads.
>>>>>>>>>>>> - You will be able to see a single glusterfs thread eating 100% of
>>>>>>>>>>>>   the CPU:
>>>>>>>>>>>> 
>>>>>>>>>>>>  2871 root 20 0 746m 24m 3004 S 100.0 0.1 14:35.89 glusterfsd
>>>>>>>>>>>>  4522 root 20 0 747m 24m 3004 S   5.3 0.1  0:02.25 glusterfsd
>>>>>>>>>>>>  4507 root 20 0 747m 24m 3004 S   5.0 0.1  0:05.91 glusterfsd
>>>>>>>>>>>> 21200 root 20 0 747m 24m 3004 S   4.6 0.1  0:21.16 glusterfsd
>>>>>>>>>>>> 
>>>>>>>>>>>> - Single threaded epoll is a bottleneck for high-IOP / low-metadata
>>>>>>>>>>>> workloads (think smallfile). With single threaded epoll we are CPU
>>>>>>>>>>>> bound by the single thread pegging out a CPU.
>>>>>>>>>>>> 
>>>>>>>>>>>> So the proposed solution to this problem is to make epoll multi
>>>>>>>>>>>> threaded on both servers and clients. Here is a link to the
>>>>>>>>>>>> upstream proposal:
>>>>>>>>>>>> 
>>>>>>>>>>>> http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf#multi-thread-epoll
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Status: [http://review.gluster.org/#/c/3842/ based on Anand
>>>>>>>>>>>> Avati's patch]
>>>>>>>>>>>> 
>>>>>>>>>>>> Why: remove single-thread-per-brick barrier to higher CPU
>>>>>>>>>>>> utilization by servers
>>>>>>>>>>>> 
>>>>>>>>>>>> Use case: multi-client and multi-thread applications
>>>>>>>>>>>> 
>>>>>>>>>>>> Improvement: measured 40% with 2 epoll threads and 100% with 4
>>>>>>>>>>>> epoll
>>>>>>>>>>>> threads for small file creates to an SSD
>>>>>>>>>>>> 
>>>>>>>>>>>> Disadvantage: conflicts with support for SSL sockets, may require
>>>>>>>>>>>> significant code change to support both.
>>>>>>>>>>>> 
>>>>>>>>>>>> Note: this enhancement also helps high-IOPS applications such as
>>>>>>>>>>>> databases and virtualization which are not metadata-intensive.
>>>>>>>>>>>> This
>>>>>>>>>>>> has been measured already using a Fusion I/O SSD performing
>>>>>>>>>>>> random
>>>>>>>>>>>> reads and writes -- it was necessary to define multiple bricks
>>>>>>>>>>>> per
>>>>>>>>>>>> SSD device to get Gluster to the same order of magnitude IOPS
>>>>>>>>>>>> as a
>>>>>>>>>>>> local filesystem. But this workaround is problematic for users,
>>>>>>>>>>>> because storage space is not properly measured when there are
>>>>>>>>>>>> multiple bricks on the same filesystem.
>>>>>>>>>>>> 
>>>>>>>>>>>> Multi threaded epoll is part of a larger page that talks about
>>>>>>>>>>>> smallfile performance enhancements, proposed and happening.
>>>>>>>>>>>> 
>>>>>>>>>>>> Goal: if successful, throughput bottleneck should be either the
>>>>>>>>>>>> network or the brick filesystem!
>>>>>>>>>>>> What it doesn't do: multi-thread-epoll does not solve the
>>>>>>>>>>>> excessive-round-trip protocol problems that Gluster has.
>>>>>>>>>>>> What it should do: allow Gluster to exploit the mostly
>>>>>>>>>>>> untapped CPU
>>>>>>>>>>>> resources on the Gluster servers and clients.
>>>>>>>>>>>> How it does it: allow multiple threads to read protocol messages
>>>>>>>>>>>> and
>>>>>>>>>>>> process them at the same time.
>>>>>>>>>>>> How to observe: multi-thread-epoll should be configurable (how to
>>>>>>>>>>>> configure? gluster command?). With thread count 1 it should be the
>>>>>>>>>>>> same as RHS 3.0; with thread count 2-4 it should show significantly
>>>>>>>>>>>> more CPU utilization (threads visible with "top -H"), resulting in
>>>>>>>>>>>> higher throughput.
>>>>>>>>>>>> 
>>>>>>>>>>>> **How to observe**
>>>>>>>>>>>> 
>>>>>>>>>>>> Here are the commands needed to setup an environment to test
>>>>>>>>>>>> in on
>>>>>>>>>>>> RHS 3.0.3:
>>>>>>>>>>>> rpm -e glusterfs-api glusterfs glusterfs-libs glusterfs-fuse
>>>>>>>>>>>> glusterfs-geo-replication glusterfs-rdma glusterfs-server
>>>>>>>>>>>> glusterfs-cli gluster-nagios-common samba-glusterfs vdsm-gluster
>>>>>>>>>>>> --nodeps
>>>>>>>>>>>> rhn_register
>>>>>>>>>>>> yum groupinstall "Development tools"
>>>>>>>>>>>> git clone https://github.com/gluster/glusterfs.git
>>>>>>>>>>>> git branch test
>>>>>>>>>>>> git checkout test
>>>>>>>>>>>> git fetch http://review.gluster.org/glusterfs refs/changes/42/3842/17 && git cherry-pick FETCH_HEAD
>>>>>>>>>>>> git fetch http://review.gluster.org/glusterfs refs/changes/88/9488/2 && git cherry-pick FETCH_HEAD
>>>>>>>>>>>> yum install openssl openssl-devel
>>>>>>>>>>>> wget ftp://fr2.rpmfind.net/linux/epel/6/x86_64/cmockery2-1.3.8-2.el6.x86_64.rpm
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> wget ftp://fr2.rpmfind.net/linux/epel/6/x86_64/cmockery2-devel-1.3.8-2.el6.x86_64.rpm
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> yum install cmockery2-1.3.8-2.el6.x86_64.rpm
>>>>>>>>>>>> cmockery2-devel-1.3.8-2.el6.x86_64.rpm libxml2-devel
>>>>>>>>>>>> ./autogen.sh
>>>>>>>>>>>> ./configure
>>>>>>>>>>>> make
>>>>>>>>>>>> make install
>>>>>>>>>>>> 
>>>>>>>>>>>> Verify you are using the upstream with:
>>>>>>>>>>>> 
>>>>>>>>>>>> # gluster --version
>>>>>>>>>>>> 
>>>>>>>>>>>> To enable multi-threaded epoll, run the following commands:
>>>>>>>>>>>> 
>>>>>>>>>>>> From the patch:
>>>>>>>>>>>> { .key = "client.event-threads", 839
>>>>>>>>>>>> .voltype = "protocol/client", 840
>>>>>>>>>>>> .op_version = GD_OP_VERSION_3_7_0, 841
>>>>>>>>>>>> },
>>>>>>>>>>>> { .key = "server.event-threads", 946
>>>>>>>>>>>> .voltype = "protocol/server", 947
>>>>>>>>>>>> .op_version = GD_OP_VERSION_3_7_0, 948
>>>>>>>>>>>> },
>>>>>>>>>>>> 
>>>>>>>>>>>> # gluster v set <volname> server.event-threads 4
>>>>>>>>>>>> # gluster v set <volname> client.event-threads 4
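>>>>>>>>>>>>
>>>>>>>>>>>> (Once set, both options show up under "Options Reconfigured"; a quick
>>>>>>>>>>>> way to confirm, with the volume name as a placeholder:
>>>>>>>>>>>>
>>>>>>>>>>>>   gluster volume info test2brick | grep event-threads
>>>>>>>>>>>> )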
>>>>>>>>>>>> 
>>>>>>>>>>>> Also grab smallfile:
>>>>>>>>>>>> 
>>>>>>>>>>>> https://github.com/bengland2/smallfile
>>>>>>>>>>>> 
>>>>>>>>>>>> After git cloning smallfile, run:
>>>>>>>>>>>> 
>>>>>>>>>>>> python /small-files/smallfile/smallfile_cli.py --operation create
>>>>>>>>>>>> --threads 8 --file-size 64 --files 10000 --top /gluster-mount
>>>>>>>>>>>> --pause 1000 --host-set "client1 client2"
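>>>>>>>>>>>>
>>>>>>>>>>>> (For a quick single-client sanity run, the same command works without
>>>>>>>>>>>> --host-set; --top must point at the gluster mount, so the path here is
>>>>>>>>>>>> only an example:
>>>>>>>>>>>>
>>>>>>>>>>>>   python smallfile_cli.py --operation create --threads 8 \
>>>>>>>>>>>>     --file-size 64 --files 10000 --top /test2brick
>>>>>>>>>>>> )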
>>>>>>>>>>>> 
>>>>>>>>>>>> Again we will be looking at top + show threads (press H). With 4
>>>>>>>>>>>> threads on both clients and servers you should see something similar
>>>>>>>>>>>> to this (it isn't exact, I copied and pasted):
>>>>>>>>>>>> 
>>>>>>>>>>>>  2871 root 20 0 746m 24m 3004 S 35.0 0.1 14:35.89 glusterfsd
>>>>>>>>>>>>  2872 root 20 0 746m 24m 3004 S 51.0 0.1 14:35.89 glusterfsd
>>>>>>>>>>>>  2873 root 20 0 746m 24m 3004 S 43.0 0.1 14:35.89 glusterfsd
>>>>>>>>>>>>  2874 root 20 0 746m 24m 3004 S 65.0 0.1 14:35.89 glusterfsd
>>>>>>>>>>>>  4522 root 20 0 747m 24m 3004 S  5.3 0.1  0:02.25 glusterfsd
>>>>>>>>>>>>  4507 root 20 0 747m 24m 3004 S  5.0 0.1  0:05.91 glusterfsd
>>>>>>>>>>>> 21200 root 20 0 747m 24m 3004 S  4.6 0.1  0:21.16 glusterfsd
>>>>>>>>>>>> 
>>>>>>>>>>>> If you have a test env I would be interested to see how
>>>>>>>>>>>> multi-threaded epoll performs, but I am 100% sure it's not ready for
>>>>>>>>>>>> production yet. RH will be supporting it with our 3.0.4 (the next
>>>>>>>>>>>> one) release unless we find show-stopping bugs. My testing looks
>>>>>>>>>>>> very promising though.
>>>>>>>>>>>> 
>>>>>>>>>>>> Smallfile performance enhancements are one of the key focuses for
>>>>>>>>>>>> our 3.1 release this summer; we are working very hard to improve
>>>>>>>>>>>> this, as it is the use case for the majority of people.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Feb 6, 2015 at 11:59 AM, David F. Robinson
>>>>>>>>>>>> <david.robinson at corvidtec.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>    Ben,
>>>>>>>>>>>>    I was hoping you might be able to help with two performance
>>>>>>>>>>>>    questions. I was doing some testing of my rsync where I am
>>>>>>>>>>>>    backing up my primary gluster system (distributed + replicated)
>>>>>>>>>>>>    to my backup gluster system (distributed). I tried three tests
>>>>>>>>>>>>    where I rsynced from one of my primary systems (gfsib02b) to my
>>>>>>>>>>>>    backup machine. The test directory contains roughly 5500 files,
>>>>>>>>>>>>    most of which are small. The script I ran is shown below, which
>>>>>>>>>>>>    repeats the tests 3x for each section to check variability in
>>>>>>>>>>>>    timing.
>>>>>>>>>>>>    1) Writing to the local disk is drastically faster than
>>>>>>>>>>>> writing
>>>>>>>>>>>>    to gluster. So, my writes to the backup gluster system are
>>>>>>>>>>>> what
>>>>>>>>>>>>    is slowing me down, which makes sense.
>>>>>>>>>>>>    2) When I write to the backup gluster system (/backup/homegfs),
>>>>>>>>>>>>    the timing goes from 35 seconds to 1 min 40 seconds. The question
>>>>>>>>>>>>    here is whether you could recommend any settings for this volume
>>>>>>>>>>>>    that would improve performance for small file writes? I have
>>>>>>>>>>>>    included the output of 'gluster volume info' below.
>>>>>>>>>>>>    3) When I did the same tests on the Source_bkp volume, it was
>>>>>>>>>>>>    almost 3x as slow as the homegfs_bkp volume. However, these are
>>>>>>>>>>>>    just different volumes on the same storage system. The volume
>>>>>>>>>>>>    parameters are identical (see below). The performance of these
>>>>>>>>>>>>    two should be identical. Any idea why they wouldn't be? And
>>>>>>>>>>>>    any suggestions for how to fix this? The only thing that I see
>>>>>>>>>>>>    different between the two is the order of the "Options
>>>>>>>>>>>>    reconfigured" section. I assume the order of options doesn't
>>>>>>>>>>>>    matter.
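>>>>>>>>>>>>    (Since the only visible difference is option ordering, one quick,
>>>>>>>>>>>>    order-insensitive way to compare the two is a sketch like this;
>>>>>>>>>>>>    the Volume Name/ID lines will of course still differ:
>>>>>>>>>>>>
>>>>>>>>>>>>      diff <(gluster volume info Source_bkp | sort) \
>>>>>>>>>>>>           <(gluster volume info homegfs_bkp | sort)
>>>>>>>>>>>>    )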
>>>>>>>>>>>>    *Backup to local hard disk (no gluster writes)*
>>>>>>>>>>>>    time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>>>>>>>    --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>>>>>>>    gfsib02b:/homegfs/test /temp1
>>>>>>>>>>>>    time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>>>>>>>    --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>>>>>>>    gfsib02b:/homegfs/test /temp2
>>>>>>>>>>>>    time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>>>>>>>    --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>>>>>>>    gfsib02b:/homegfs/test /temp3
>>>>>>>>>>>>
>>>>>>>>>>>>    real 0m35.579s
>>>>>>>>>>>>    user 0m31.290s
>>>>>>>>>>>>    sys  0m12.282s
>>>>>>>>>>>>
>>>>>>>>>>>>    real 0m38.035s
>>>>>>>>>>>>    user 0m31.622s
>>>>>>>>>>>>    sys  0m10.907s
>>>>>>>>>>>>
>>>>>>>>>>>>    real 0m38.313s
>>>>>>>>>>>>    user 0m31.458s
>>>>>>>>>>>>    sys  0m10.891s
>>>>>>>>>>>>    *Backup to gluster backup system on volume homegfs_bkp*
>>>>>>>>>>>>    time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>>>>>>>    --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>>>>>>>    gfsib02b:/homegfs/test /backup/homegfs/temp1
>>>>>>>>>>>>    time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>>>>>>>    --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>>>>>>>    gfsib02b:/homegfs/test /backup/homegfs/temp2
>>>>>>>>>>>>    time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>>>>>>>    --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>>>>>>>    gfsib02b:/homegfs/test /backup/homegfs/temp3
>>>>>>>>>>>>
>>>>>>>>>>>>    real 1m42.026s
>>>>>>>>>>>>    user 0m32.604s
>>>>>>>>>>>>    sys  0m9.967s
>>>>>>>>>>>>
>>>>>>>>>>>>    real 1m45.480s
>>>>>>>>>>>>    user 0m32.577s
>>>>>>>>>>>>    sys  0m11.994s
>>>>>>>>>>>>
>>>>>>>>>>>>    real 1m40.436s
>>>>>>>>>>>>    user 0m32.521s
>>>>>>>>>>>>    sys  0m11.240s
>>>>>>>>>>>>    *Backup to gluster backup system on volume Source_bkp*
>>>>>>>>>>>>    time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>>>>>>>    --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>>>>>>>    gfsib02b:/homegfs/test /backup/Source/temp1
>>>>>>>>>>>>    time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>>>>>>>    --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>>>>>>>    gfsib02b:/homegfs/test /backup/Source/temp2
>>>>>>>>>>>>    time /usr/local/bin/rsync -av --numeric-ids --delete
>>>>>>>>>>>>    --block-size=131072 -e "ssh -T -c arcfour -o Compression=no -x"
>>>>>>>>>>>>    gfsib02b:/homegfs/test /backup/Source/temp3
>>>>>>>>>>>>
>>>>>>>>>>>>    real 3m30.491s
>>>>>>>>>>>>    user 0m32.676s
>>>>>>>>>>>>    sys  0m10.776s
>>>>>>>>>>>>
>>>>>>>>>>>>    real 3m26.076s
>>>>>>>>>>>>    user 0m32.588s
>>>>>>>>>>>>    sys  0m11.048s
>>>>>>>>>>>>
>>>>>>>>>>>>    real 3m7.460s
>>>>>>>>>>>>    user 0m32.763s
>>>>>>>>>>>>    sys  0m11.687s
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>    Volume Name: *Source_bkp*
>>>>>>>>>>>>    Type: Distribute
>>>>>>>>>>>>    Volume ID: 1d4c210d-a731-4d39-a0c5-ea0546592c1d
>>>>>>>>>>>>    Status: Started
>>>>>>>>>>>>    Number of Bricks: 2
>>>>>>>>>>>>    Transport-type: tcp
>>>>>>>>>>>>    Bricks:
>>>>>>>>>>>>    Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/Source_bkp
>>>>>>>>>>>>    Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/Source_bkp
>>>>>>>>>>>>    Options Reconfigured:
>>>>>>>>>>>>    performance.cache-size: 128MB
>>>>>>>>>>>>    performance.io-thread-count: 32
>>>>>>>>>>>>    server.allow-insecure: on
>>>>>>>>>>>>    network.ping-timeout: 10
>>>>>>>>>>>>    storage.owner-gid: 100
>>>>>>>>>>>>    performance.write-behind-window-size: 128MB
>>>>>>>>>>>>    server.manage-gids: on
>>>>>>>>>>>>    changelog.rollover-time: 15
>>>>>>>>>>>>    changelog.fsync-interval: 3
>>>>>>>>>>>> 
>>>>>>>>>>>>    Volume Name: *homegfs_bkp*
>>>>>>>>>>>>    Type: Distribute
>>>>>>>>>>>>    Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
>>>>>>>>>>>>    Status: Started
>>>>>>>>>>>>    Number of Bricks: 2
>>>>>>>>>>>>    Transport-type: tcp
>>>>>>>>>>>>    Bricks:
>>>>>>>>>>>>    Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
>>>>>>>>>>>>    Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp
>>>>>>>>>>>>    Options Reconfigured:
>>>>>>>>>>>>    storage.owner-gid: 100
>>>>>>>>>>>>    performance.io-thread-count: 32
>>>>>>>>>>>>    server.allow-insecure: on
>>>>>>>>>>>>    network.ping-timeout: 10
>>>>>>>>>>>>    performance.cache-size: 128MB
>>>>>>>>>>>>    performance.write-behind-window-size: 128MB
>>>>>>>>>>>>    server.manage-gids: on
>>>>>>>>>>>>    changelog.rollover-time: 15
>>>>>>>>>>>>    changelog.fsync-interval: 3
>>>>>>>>>>>>    ------ Original Message ------
>>>>>>>>>>>>    From: "Benjamin Turner" <bennyturns at gmail.com
>>>>>>>>>>>>    <mailto:bennyturns at gmail.com>>
>>>>>>>>>>>>    To: "David F. Robinson" <david.robinson at corvidtec.com
>>>>>>>>>>>>    <mailto:david.robinson at corvidtec.com>>
>>>>>>>>>>>>    Cc: "Gluster Devel" <gluster-devel at gluster.org
>>>>>>>>>>>>    <mailto:gluster-devel at gluster.org>>;
>>>>>>>>>>>> "gluster-users at gluster.org
>>>>>>>>>>>>    <mailto:gluster-users at gluster.org>"
>>>>>>>>>>>> <gluster-users at gluster.org
>>>>>>>>>>>>    <mailto:gluster-users at gluster.org>>
>>>>>>>>>>>>    Sent: 2/3/2015 7:12:34 PM
>>>>>>>>>>>>    Subject: Re: [Gluster-devel] missing files
>>>>>>>>>>>>>    It sounds to me like the files were only copied to one replica,
>>>>>>>>>>>>>    weren't there for the initial ls (which triggered a self-heal),
>>>>>>>>>>>>>    and were there for the last ls because they had been healed. Is
>>>>>>>>>>>>>    there any chance that one of the replicas was down during the
>>>>>>>>>>>>>    rsync? It could be that you lost a brick during the copy or
>>>>>>>>>>>>>    something like that. To confirm, I would look for disconnects in
>>>>>>>>>>>>>    the brick logs as well as check glustershd.log to verify the
>>>>>>>>>>>>>    missing files were actually healed.
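>>>>>>>>>>>>>
>>>>>>>>>>>>>    (Roughly what I mean, assuming the default log locations on the
>>>>>>>>>>>>>    servers; adjust the paths if yours differ:
>>>>>>>>>>>>>
>>>>>>>>>>>>>      # brick logs: look for disconnects during the rsync window
>>>>>>>>>>>>>      grep -i disconnect /var/log/glusterfs/bricks/*.log
>>>>>>>>>>>>>      # self-heal daemon log: check which files were actually healed
>>>>>>>>>>>>>      grep -i heal /var/log/glusterfs/glustershd.log
>>>>>>>>>>>>>    )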
>>>>>>>>>>>>> 
>>>>>>>>>>>>>    -b
>>>>>>>>>>>>> 
>>>>>>>>>>>>>    On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson
>>>>>>>>>>>>>    <david.robinson at corvidtec.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>        I rsync'd 20-TB over to my gluster system and noticed
>>>>>>>>>>>>> that
>>>>>>>>>>>>>        I had some directories missing even though the rsync
>>>>>>>>>>>>>        completed normally.
>>>>>>>>>>>>>        The rsync logs showed that the missing files were
>>>>>>>>>>>>> transferred.
>>>>>>>>>>>>>        I went to the bricks and did an 'ls -al
>>>>>>>>>>>>>        /data/brick*/homegfs/dir/*', and the files were on the
>>>>>>>>>>>>>        bricks.
>>>>>>>>>>>>>        After I did this 'ls', the files then showed up on the
>>>>>>>>>>>>> FUSE
>>>>>>>>>>>>>        mounts.
>>>>>>>>>>>>>        1) Why are the files hidden on the fuse mount?
>>>>>>>>>>>>>        2) Why does the ls make them show up on the FUSE mount?
>>>>>>>>>>>>>        3) How can I prevent this from happening again?
>>>>>>>>>>>>>        Note, I also mounted the gluster volume using NFS and
>>>>>>>>>>>>> saw
>>>>>>>>>>>>>        the same behavior. The files/directories were not shown
>>>>>>>>>>>>>        until I did the "ls" on the bricks.
>>>>>>>>>>>>>        David
>>>>>>>>>>>>>        ===============================
>>>>>>>>>>>>>        David F. Robinson, Ph.D.
>>>>>>>>>>>>>        President - Corvid Technologies
>>>>>>>>>>>>>        704.799.6944 x101 [office]
>>>>>>>>>>>>>        704.252.1310 [cell]
>>>>>>>>>>>>>        704.799.7974 [fax]
>>>>>>>>>>>>>        David.Robinson at corvidtec.com
>>>>>>>>>>>>>        http://www.corvidtechnologies.com/
>>>>>>>>>>>>> 
>>>>>>>>>>>>>        _______________________________________________
>>>>>>>>>>>>>        Gluster-devel mailing list
>>>>>>>>>>>>>        Gluster-devel at gluster.org
>>>>>>>>>>>>>        http://www.gluster.org/mailman/listinfo/gluster-devel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: after_volume_restart.tgz
Type: application/x-compressed
Size: 5864233 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-devel/attachments/20150211/42e4fb0a/attachment-0001.bin>

