[Gluster-users] [ovirt-users] Re: Gluster problems, cluster performance issues
Ravishankar N
ravishankar at redhat.com
Wed May 30 04:43:55 UTC 2018
@Jim Kusznir
For the heal issue, can you provide the getfattr output of one of the 8
files in question from all 3 bricks?
Example: `getfattr -d -m . -e hex
/gluster/brick3/data-hdd/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9`
Also provide the stat output of the same file from all 3 bricks.
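
For reference, a minimal way to gather both on each of the three brick
hosts (same file as in the example above; run as root on each brick):

F=/gluster/brick3/data-hdd/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
getfattr -d -m . -e hex "$F"
stat "$F"
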
Thanks,
Ravi
On 05/30/2018 09:47 AM, Krutika Dhananjay wrote:
> Adding Ravi to look into the heal issue.
>
> As for the fsync hang and subsequent IO errors, it seems a lot like
> https://bugzilla.redhat.com/show_bug.cgi?id=1497156 and Paolo Bonzini
> from qemu had pointed out that this would be fixed by the following
> commit:
>
> commit e72c9a2a67a6400c8ef3d01d4c461dbbbfa0e1f0
> Author: Paolo Bonzini <pbonzini at redhat.com <mailto:pbonzini at redhat.com>>
> Date: Wed Jun 21 16:35:46 2017 +0200
>
> scsi: virtio_scsi: let host do exception handling
>
> virtio_scsi tries to do exception handling after the default 30 seconds
> timeout expires. However, it's better to let the host control the
> timeout, otherwise with a heavy I/O load it is likely that an abort will
> also timeout. This leads to fatal errors like filesystems going
> offline.
>
> Disable the 'sd' timeout and allow the host to do exception handling,
> following the precedent of the storvsc driver.
>
> Hannes has a proposal to introduce timeouts in virtio, but this provides
> an immediate solution for stable kernels too.
>
> [mkp: fixed typo]
>
> Reported-by: Douglas Miller <dougmill at linux.vnet.ibm.com <mailto:dougmill at linux.vnet.ibm.com>>
> Cc: "James E.J. Bottomley" <jejb at linux.vnet.ibm.com <mailto:jejb at linux.vnet.ibm.com>>
> Cc: "Martin K. Petersen" <martin.petersen at oracle.com <mailto:martin.petersen at oracle.com>>
> Cc: Hannes Reinecke <hare at suse.de <mailto:hare at suse.de>>
> Cc:linux-scsi at vger.kernel.org <mailto:linux-scsi at vger.kernel.org>
> Cc:stable at vger.kernel.org <mailto:stable at vger.kernel.org>
> Signed-off-by: Paolo Bonzini <pbonzini at redhat.com <mailto:pbonzini at redhat.com>>
> Signed-off-by: Martin K. Petersen <martin.petersen at oracle.com <mailto:martin.petersen at oracle.com>>
>
> Adding Paolo/Kevin to comment.
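>
> A quick sanity check from inside an affected guest, to confirm its
> disks actually go through virtio-scsi (the only path this commit
> changes), is something like:
>
> # lsmod | grep virtio_scsi          # virtio-scsi driver loaded?
> # lspci | grep -i 'virtio.*scsi'    # virtio-scsi controller present?
>
> If the guest disks are plain virtio-blk instead, this particular
> commit would not apply to them.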
>
> As for the poor gluster performance, could you disable
> cluster.eager-lock and see if that makes any difference:
>
> # gluster volume set <VOL> cluster.eager-lock off
>
> Do also capture the volume profile again if you still see performance
> issues after disabling eager-lock.
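>
> For example, roughly (assuming the affected volume is data-hdd, as in
> the rest of this thread):
>
> # gluster volume set data-hdd cluster.eager-lock off
> # gluster volume profile data-hdd start
> # ... run the usual VM workload for several minutes ...
> # gluster volume profile data-hdd info > /tmp/data-hdd-profile.txt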
>
> -Krutika
>
>
> On Wed, May 30, 2018 at 6:55 AM, Jim Kusznir <jim at palousetech.com
> <mailto:jim at palousetech.com>> wrote:
>
> I also finally found the following in my system log on one server:
>
> [10679.524491] INFO: task glusterclogro:14933 blocked for more
> than 120 seconds.
> [10679.525826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10679.527144] glusterclogro D ffff97209832bf40 0 14933
> 1 0x00000080
> [10679.527150] Call Trace:
> [10679.527161] [<ffffffffb9913f79>] schedule+0x29/0x70
> [10679.527218] [<ffffffffc060e388>]
> _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.527225] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [10679.527254] [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.527260] [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
> [10679.527268] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [10679.527271] [<ffffffffb944f3d0>] SyS_fsync+0x10/0x20
> [10679.527275] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [10679.527279] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [10679.527283] INFO: task glusterposixfsy:14941 blocked for more
> than 120 seconds.
> [10679.528608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10679.529956] glusterposixfsy D ffff972495f84f10 0 14941
> 1 0x00000080
> [10679.529961] Call Trace:
> [10679.529966] [<ffffffffb9913f79>] schedule+0x29/0x70
> [10679.530003] [<ffffffffc060e388>]
> _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.530008] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [10679.530038] [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.530042] [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
> [10679.530046] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [10679.530050] [<ffffffffb944f3f3>] SyS_fdatasync+0x13/0x20
> [10679.530054] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [10679.530058] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [10679.530062] INFO: task glusteriotwr13:15486 blocked for more
> than 120 seconds.
> [10679.531805] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10679.533732] glusteriotwr13 D ffff9720a83f0000 0 15486
> 1 0x00000080
> [10679.533738] Call Trace:
> [10679.533747] [<ffffffffb9913f79>] schedule+0x29/0x70
> [10679.533799] [<ffffffffc060e388>]
> _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.533806] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [10679.533846] [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.533852] [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
> [10679.533858] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [10679.533863] [<ffffffffb944f3f3>] SyS_fdatasync+0x13/0x20
> [10679.533868] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [10679.533873] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [10919.512757] INFO: task glusterclogro:14933 blocked for more
> than 120 seconds.
> [10919.514714] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10919.516663] glusterclogro D ffff97209832bf40 0 14933
> 1 0x00000080
> [10919.516677] Call Trace:
> [10919.516690] [<ffffffffb9913f79>] schedule+0x29/0x70
> [10919.516696] [<ffffffffb99118e9>] schedule_timeout+0x239/0x2c0
> [10919.516703] [<ffffffffb951cc04>] ? blk_finish_plug+0x14/0x40
> [10919.516768] [<ffffffffc05e9224>] ?
> _xfs_buf_ioapply+0x334/0x460 [xfs]
> [10919.516774] [<ffffffffb991432d>] wait_for_completion+0xfd/0x140
> [10919.516782] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [10919.516821] [<ffffffffc05eb0a3>] ? _xfs_buf_read+0x23/0x40 [xfs]
> [10919.516859] [<ffffffffc05eafa9>]
> xfs_buf_submit_wait+0xf9/0x1d0 [xfs]
> [10919.516902] [<ffffffffc061b279>] ?
> xfs_trans_read_buf_map+0x199/0x400 [xfs]
> [10919.516940] [<ffffffffc05eb0a3>] _xfs_buf_read+0x23/0x40 [xfs]
> [10919.516977] [<ffffffffc05eb1b9>] xfs_buf_read_map+0xf9/0x160 [xfs]
> [10919.517022] [<ffffffffc061b279>]
> xfs_trans_read_buf_map+0x199/0x400 [xfs]
> [10919.517057] [<ffffffffc05c8d04>] xfs_da_read_buf+0xd4/0x100 [xfs]
> [10919.517091] [<ffffffffc05c8d53>] xfs_da3_node_read+0x23/0xd0 [xfs]
> [10919.517126] [<ffffffffc05c9fee>]
> xfs_da3_node_lookup_int+0x6e/0x2f0 [xfs]
> [10919.517160] [<ffffffffc05d5a1d>]
> xfs_dir2_node_lookup+0x4d/0x170 [xfs]
> [10919.517194] [<ffffffffc05ccf5d>] xfs_dir_lookup+0x1bd/0x1e0 [xfs]
> [10919.517233] [<ffffffffc05fd8d9>] xfs_lookup+0x69/0x140 [xfs]
> [10919.517271] [<ffffffffc05fa018>] xfs_vn_lookup+0x78/0xc0 [xfs]
> [10919.517278] [<ffffffffb9425cf3>] lookup_real+0x23/0x60
> [10919.517283] [<ffffffffb9426702>] __lookup_hash+0x42/0x60
> [10919.517288] [<ffffffffb942d519>] SYSC_renameat2+0x3a9/0x5a0
> [10919.517296] [<ffffffffb94d3753>] ?
> selinux_file_free_security+0x23/0x30
> [10919.517304] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [10919.517309] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [10919.517313] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [10919.517318] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [10919.517323] [<ffffffffb942e58e>] SyS_renameat2+0xe/0x10
> [10919.517328] [<ffffffffb942e5ce>] SyS_rename+0x1e/0x20
> [10919.517333] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [10919.517339] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [11159.496095] INFO: task glusteriotwr9:15482 blocked for more
> than 120 seconds.
> [11159.497546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [11159.498978] glusteriotwr9 D ffff971fa0fa1fa0 0 15482
> 1 0x00000080
> [11159.498984] Call Trace:
> [11159.498995] [<ffffffffb9911f00>] ? bit_wait+0x50/0x50
> [11159.498999] [<ffffffffb9913f79>] schedule+0x29/0x70
> [11159.499003] [<ffffffffb99118e9>] schedule_timeout+0x239/0x2c0
> [11159.499056] [<ffffffffc05dd9b7>] ?
> xfs_iext_bno_to_ext+0xa7/0x1a0 [xfs]
> [11159.499082] [<ffffffffc05dd43e>] ?
> xfs_iext_bno_to_irec+0x8e/0xd0 [xfs]
> [11159.499090] [<ffffffffb92f7a12>] ? ktime_get_ts64+0x52/0xf0
> [11159.499093] [<ffffffffb9911f00>] ? bit_wait+0x50/0x50
> [11159.499097] [<ffffffffb991348d>] io_schedule_timeout+0xad/0x130
> [11159.499101] [<ffffffffb9913528>] io_schedule+0x18/0x20
> [11159.499104] [<ffffffffb9911f11>] bit_wait_io+0x11/0x50
> [11159.499107] [<ffffffffb9911ac1>] __wait_on_bit_lock+0x61/0xc0
> [11159.499113] [<ffffffffb9393634>] __lock_page+0x74/0x90
> [11159.499118] [<ffffffffb92bc210>] ? wake_bit_function+0x40/0x40
> [11159.499121] [<ffffffffb9394154>] __find_lock_page+0x54/0x70
> [11159.499125] [<ffffffffb9394e85>]
> grab_cache_page_write_begin+0x55/0xc0
> [11159.499130] [<ffffffffb9484b76>] iomap_write_begin+0x66/0x100
> [11159.499135] [<ffffffffb9484edf>] iomap_write_actor+0xcf/0x1d0
> [11159.499140] [<ffffffffb9484e10>] ? iomap_write_end+0x80/0x80
> [11159.499144] [<ffffffffb94854e7>] iomap_apply+0xb7/0x150
> [11159.499149] [<ffffffffb9485621>]
> iomap_file_buffered_write+0xa1/0xe0
> [11159.499153] [<ffffffffb9484e10>] ? iomap_write_end+0x80/0x80
> [11159.499182] [<ffffffffc05f025d>]
> xfs_file_buffered_aio_write+0x12d/0x2c0 [xfs]
> [11159.499213] [<ffffffffc05f057d>]
> xfs_file_aio_write+0x18d/0x1b0 [xfs]
> [11159.499217] [<ffffffffb941a533>] do_sync_write+0x93/0xe0
> [11159.499222] [<ffffffffb941b010>] vfs_write+0xc0/0x1f0
> [11159.499225] [<ffffffffb941c002>] SyS_pwrite64+0x92/0xc0
> [11159.499230] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [11159.499234] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [11159.499238] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [11279.488720] INFO: task xfsaild/dm-10:1134 blocked for more than
> 120 seconds.
> [11279.490197] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [11279.491665] xfsaild/dm-10 D ffff9720a8660fd0 0 1134
> 2 0x00000000
> [11279.491671] Call Trace:
> [11279.491682] [<ffffffffb92a3a2e>] ? try_to_del_timer_sync+0x5e/0x90
> [11279.491688] [<ffffffffb9913f79>] schedule+0x29/0x70
> [11279.491744] [<ffffffffc060de36>] _xfs_log_force+0x1c6/0x2c0 [xfs]
> [11279.491750] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [11279.491783] [<ffffffffc0619fec>] ? xfsaild+0x16c/0x6f0 [xfs]
> [11279.491817] [<ffffffffc060df5c>] xfs_log_force+0x2c/0x70 [xfs]
> [11279.491849] [<ffffffffc0619e80>] ?
> xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> [11279.491880] [<ffffffffc0619fec>] xfsaild+0x16c/0x6f0 [xfs]
> [11279.491913] [<ffffffffc0619e80>] ?
> xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> [11279.491919] [<ffffffffb92bb161>] kthread+0xd1/0xe0
> [11279.491926] [<ffffffffb92bb090>] ? insert_kthread_work+0x40/0x40
> [11279.491932] [<ffffffffb9920677>]
> ret_from_fork_nospec_begin+0x21/0x21
> [11279.491936] [<ffffffffb92bb090>] ? insert_kthread_work+0x40/0x40
> [11279.491976] INFO: task glusterclogfsyn:14934 blocked for more
> than 120 seconds.
> [11279.493466] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [11279.494952] glusterclogfsyn D ffff97209832af70 0 14934
> 1 0x00000080
> [11279.494957] Call Trace:
> [11279.494979] [<ffffffffc0309839>] ?
> __split_and_process_bio+0x2e9/0x520 [dm_mod]
> [11279.494983] [<ffffffffb9913f79>] schedule+0x29/0x70
> [11279.494987] [<ffffffffb99118e9>] schedule_timeout+0x239/0x2c0
> [11279.494997] [<ffffffffc0309d98>] ? dm_make_request+0x128/0x1a0
> [dm_mod]
> [11279.495001] [<ffffffffb991348d>] io_schedule_timeout+0xad/0x130
> [11279.495005] [<ffffffffb99145ad>] wait_for_completion_io+0xfd/0x140
> [11279.495010] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [11279.495016] [<ffffffffb951e574>] blkdev_issue_flush+0xb4/0x110
> [11279.495049] [<ffffffffc06064b9>]
> xfs_blkdev_issue_flush+0x19/0x20 [xfs]
> [11279.495079] [<ffffffffc05eec40>] xfs_file_fsync+0x1b0/0x1e0 [xfs]
> [11279.495086] [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
> [11279.495090] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [11279.495094] [<ffffffffb944f3d0>] SyS_fsync+0x10/0x20
> [11279.495098] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [11279.495102] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [11279.495105] INFO: task glusterposixfsy:14941 blocked for more
> than 120 seconds.
> [11279.496606] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [11279.498114] glusterposixfsy D ffff972495f84f10 0 14941
> 1 0x00000080
> [11279.498118] Call Trace:
> [11279.498134] [<ffffffffc0309839>] ?
> __split_and_process_bio+0x2e9/0x520 [dm_mod]
> [11279.498138] [<ffffffffb9913f79>] schedule+0x29/0x70
> [11279.498142] [<ffffffffb99118e9>] schedule_timeout+0x239/0x2c0
> [11279.498152] [<ffffffffc0309d98>] ? dm_make_request+0x128/0x1a0
> [dm_mod]
> [11279.498156] [<ffffffffb991348d>] io_schedule_timeout+0xad/0x130
> [11279.498160] [<ffffffffb99145ad>] wait_for_completion_io+0xfd/0x140
> [11279.498165] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [11279.498169] [<ffffffffb951e574>] blkdev_issue_flush+0xb4/0x110
> [11279.498202] [<ffffffffc06064b9>]
> xfs_blkdev_issue_flush+0x19/0x20 [xfs]
> [11279.498231] [<ffffffffc05eec40>] xfs_file_fsync+0x1b0/0x1e0 [xfs]
> [11279.498238] [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
> [11279.498242] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [11279.498246] [<ffffffffb944f3f3>] SyS_fdatasync+0x13/0x20
> [11279.498250] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [11279.498254] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [11279.498257] INFO: task glusteriotwr1:14950 blocked for more
> than 120 seconds.
> [11279.499789] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [11279.501343] glusteriotwr1 D ffff97208b6daf70 0 14950
> 1 0x00000080
> [11279.501348] Call Trace:
> [11279.501353] [<ffffffffb9913f79>] schedule+0x29/0x70
> [11279.501390] [<ffffffffc060e388>]
> _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [11279.501396] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [11279.501428] [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
> [11279.501432] [<ffffffffb944ef3f>] generic_write_sync+0x4f/0x70
> [11279.501461] [<ffffffffc05f0545>]
> xfs_file_aio_write+0x155/0x1b0 [xfs]
> [11279.501466] [<ffffffffb941a533>] do_sync_write+0x93/0xe0
> [11279.501471] [<ffffffffb941b010>] vfs_write+0xc0/0x1f0
> [11279.501475] [<ffffffffb941c002>] SyS_pwrite64+0x92/0xc0
> [11279.501479] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [11279.501483] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [11279.501489] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [11279.501493] INFO: task glusteriotwr4:14953 blocked for more
> than 120 seconds.
> [11279.503047] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [11279.504630] glusteriotwr4 D ffff972499f2bf40 0 14953
> 1 0x00000080
> [11279.504635] Call Trace:
> [11279.504640] [<ffffffffb9913f79>] schedule+0x29/0x70
> [11279.504676] [<ffffffffc060e388>]
> _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [11279.504681] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [11279.504710] [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
> [11279.504714] [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
> [11279.504718] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [11279.504722] [<ffffffffb944f3d0>] SyS_fsync+0x10/0x20
> [11279.504725] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [11279.504730] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [12127.466494] perf: interrupt took too long (8263 > 8150),
> lowering kernel.perf_event_max_sample_rate to 24000
>
> --------------------
>     I think this is the cause of the massive ovirt performance issues
>     irrespective of gluster volume. At the time this happened, I was
>     also ssh'ed into the host, and was running some rpm query
>     commands. I had just run rpm -qa | grep glusterfs (to verify which
>     version was actually installed), and that command took almost 2
>     minutes to return! Normally it takes less than 2 seconds. That
>     is all pure local SSD IO, too....
>
>     I'm no expert, but it's my understanding that any time software
>     causes these kinds of issues, it's a serious bug in the software,
>     even if it's mishandled exceptions. Is this correct?
>
> --Jim
>
> On Tue, May 29, 2018 at 3:01 PM, Jim Kusznir <jim at palousetech.com
> <mailto:jim at palousetech.com>> wrote:
>
> I think this is the profile information for one of the volumes
> that lives on the SSDs and is fully operational with no
> down/problem disks:
>
> [root at ovirt2 yum.repos.d]# gluster volume profile data info
> Brick: ovirt2.nwfiber.com:/gluster/brick2/data
> ----------------------------------------------
> Cumulative Stats:
> Block Size: 256b+ 512b+
> 1024b+
> No. of Reads: 983 2696
> 1059
> No. of Writes: 0 1113
> 302
> Block Size: 2048b+ 4096b+
> 8192b+
> No. of Reads: 852 88608
> 53526
> No. of Writes: 522 812340
> 76257
> Block Size: 16384b+ 32768b+
> 65536b+
> No. of Reads: 54351 241901
> 15024
> No. of Writes: 21636 8656
> 8976
> Block Size: 131072b+
> No. of Reads: 524156
> No. of Writes: 296071
> %-latency Avg-latency Min-Latency Max-Latency No. of
> calls Fop
> --------- ----------- ----------- -----------
> ------------ ----
> 0.00 0.00 us 0.00 us 0.00 us
> 4189 RELEASE
> 0.00 0.00 us 0.00 us 0.00 us
> 1257 RELEASEDIR
> 0.00 46.19 us 12.00 us 187.00 us
> 69 FLUSH
> 0.00 147.00 us 78.00 us 367.00 us
> 86 REMOVEXATTR
> 0.00 223.46 us 24.00 us 1166.00 us
> 149 READDIR
> 0.00 565.34 us 76.00 us 3639.00 us
> 88 FTRUNCATE
> 0.00 263.28 us 20.00 us 28385.00 us
> 228 LK
> 0.00 98.84 us 2.00 us 880.00 us
> 1198 OPENDIR
> 0.00 91.59 us 26.00 us 10371.00 us
> 3853 STATFS
> 0.00 494.14 us 17.00 us 193439.00 us
> 1171 GETXATTR
> 0.00 299.42 us 35.00 us 9799.00 us
> 2044 READDIRP
> 0.00 1965.31 us 110.00 us 382258.00 us
> 321 XATTROP
> 0.01 113.40 us 24.00 us 61061.00 us
> 8134 STAT
> 0.01 755.38 us 57.00 us 607603.00 us
> 3196 DISCARD
> 0.05 2690.09 us 58.00 us 2704761.00 us
> 3206 OPEN
> 0.10 119978.25 us 97.00 us 9406684.00 us
> 154 SETATTR
> 0.18 101.73 us 28.00 us 700477.00 us
> 313379 FSTAT
> 0.23 1059.84 us 25.00 us 2716124.00 us
> 38255 LOOKUP
> 0.47 1024.11 us 54.00 us 6197164.00 us
> 81455 FXATTROP
> 1.72 2984.00 us 15.00 us 37098954.00 us
> 103020 FINODELK
> 5.92 44315.32 us 51.00 us 24731536.00 us
> 23957 FSYNC
> 13.27 2399.78 us 25.00 us 22089540.00 us
> 991005 READ
> 37.00 5980.43 us 52.00 us 22099889.00 us
> 1108976 WRITE
> 41.04 5452.75 us 13.00 us 22102452.00 us
> 1349053 INODELK
> Duration: 10026 seconds
> Data Read: 80046027759 bytes
> Data Written: 44496632320 bytes
> Interval 1 Stats:
> Block Size: 256b+ 512b+
> 1024b+
> No. of Reads: 983 2696
> 1059
> No. of Writes: 0 838
> 185
> Block Size: 2048b+ 4096b+
> 8192b+
> No. of Reads: 852 85856
> 51575
> No. of Writes: 382 705802
> 57812
> Block Size: 16384b+ 32768b+
> 65536b+
> No. of Reads: 52673 232093
> 14984
> No. of Writes: 13499 4908
> 4242
> Block Size: 131072b+
> No. of Reads: 460040
> No. of Writes: 6411
> %-latency Avg-latency Min-Latency Max-Latency No. of
> calls Fop
> --------- ----------- ----------- -----------
> ------------ ----
> 0.00 0.00 us 0.00 us 0.00 us
> 2093 RELEASE
> 0.00 0.00 us 0.00 us 0.00 us
> 1093 RELEASEDIR
> 0.00 53.38 us 26.00 us 111.00 us
> 16 FLUSH
> 0.00 145.14 us 78.00 us 367.00 us
> 71 REMOVEXATTR
> 0.00 190.96 us 114.00 us 298.00 us
> 71 SETATTR
> 0.00 213.38 us 24.00 us 1145.00 us
> 90 READDIR
> 0.00 263.28 us 20.00 us 28385.00 us
> 228 LK
> 0.00 101.76 us 2.00 us 880.00 us
> 1093 OPENDIR
> 0.01 93.60 us 27.00 us 10371.00 us
> 3090 STATFS
> 0.02 537.47 us 17.00 us 193439.00 us
> 1038 GETXATTR
> 0.03 297.44 us 35.00 us 9799.00 us
> 1990 READDIRP
> 0.03 2357.28 us 110.00 us 382258.00 us
> 253 XATTROP
> 0.04 385.93 us 58.00 us 47593.00 us
> 2091 OPEN
> 0.04 114.86 us 24.00 us 61061.00 us
> 7715 STAT
> 0.06 444.59 us 57.00 us 333240.00 us
> 3053 DISCARD
> 0.42 316.24 us 25.00 us 290728.00 us
> 29823 LOOKUP
> 0.73 257.92 us 54.00 us 344812.00 us
> 63296 FXATTROP
> 1.37 98.30 us 28.00 us 67621.00 us
> 313172 FSTAT
> 1.58 2124.69 us 51.00 us 849200.00 us
> 16717 FSYNC
> 5.73 162.46 us 52.00 us 748492.00 us
> 794079 WRITE
> 7.19 2065.17 us 16.00 us 37098954.00 us
> 78381 FINODELK
> 36.44 886.32 us 25.00 us 2216436.00 us
> 925421 READ
> 46.30 1178.04 us 13.00 us 1700704.00 us
> 884635 INODELK
> Duration: 7485 seconds
> Data Read: 71250527215 bytes
> Data Written: 5119903744 bytes
> Brick: ovirt3.nwfiber.com:/gluster/brick2/data
> ----------------------------------------------
> Cumulative Stats:
> Block Size: 1b+
> No. of Reads: 0
> No. of Writes: 3264419
> %-latency Avg-latency Min-Latency Max-Latency No. of
> calls Fop
> --------- ----------- ----------- -----------
> ------------ ----
> 0.00 0.00 us 0.00 us 0.00 us
> 90 FORGET
> 0.00 0.00 us 0.00 us 0.00 us
> 9462 RELEASE
> 0.00 0.00 us 0.00 us 0.00 us
> 4254 RELEASEDIR
> 0.00 50.52 us 13.00 us 190.00 us
> 71 FLUSH
> 0.00 186.97 us 87.00 us 713.00 us
> 86 REMOVEXATTR
> 0.00 79.32 us 33.00 us 189.00 us
> 228 LK
> 0.00 220.98 us 129.00 us 513.00 us
> 86 SETATTR
> 0.01 259.30 us 26.00 us 2632.00 us
> 137 READDIR
> 0.02 322.76 us 145.00 us 2125.00 us
> 321 XATTROP
> 0.03 109.55 us 2.00 us 1258.00 us
> 1193 OPENDIR
> 0.05 70.21 us 21.00 us 431.00 us
> 3196 DISCARD
> 0.05 169.26 us 21.00 us 2315.00 us
> 1545 GETXATTR
> 0.12 176.85 us 63.00 us 2844.00 us
> 3206 OPEN
> 0.61 303.49 us 90.00 us 3085.00 us
> 9633 FSTAT
> 2.44 305.66 us 28.00 us 3716.00 us
> 38230 LOOKUP
> 4.52 266.22 us 55.00 us 53424.00 us
> 81455 FXATTROP
> 6.96 1397.99 us 51.00 us 64822.00 us
> 23889 FSYNC
> 16.48 84.74 us 25.00 us 6917.00 us
> 932592 WRITE
> 30.16 106.90 us 13.00 us 3920189.00 us
> 1353046 INODELK
> 38.55 1794.52 us 14.00 us 16210553.00 us
> 103039 FINODELK
> Duration: 66562 seconds
> Data Read: 0 bytes
> Data Written: 3264419 bytes
> Interval 1 Stats:
> Block Size: 1b+
> No. of Reads: 0
> No. of Writes: 794080
> %-latency Avg-latency Min-Latency Max-Latency No. of
> calls Fop
> --------- ----------- ----------- -----------
> ------------ ----
> 0.00 0.00 us 0.00 us 0.00 us
> 2093 RELEASE
> 0.00 0.00 us 0.00 us 0.00 us
> 1093 RELEASEDIR
> 0.00 70.31 us 26.00 us 125.00 us
> 16 FLUSH
> 0.00 193.10 us 103.00 us 713.00 us
> 71 REMOVEXATTR
> 0.01 227.32 us 133.00 us 513.00 us
> 71 SETATTR
> 0.01 79.32 us 33.00 us 189.00 us
> 228 LK
> 0.01 259.83 us 35.00 us 1138.00 us
> 89 READDIR
> 0.03 318.26 us 145.00 us 2047.00 us
> 253 XATTROP
> 0.04 112.67 us 3.00 us 1258.00 us
> 1093 OPENDIR
> 0.06 167.98 us 23.00 us 1951.00 us
> 1014 GETXATTR
> 0.08 70.97 us 22.00 us 431.00 us
> 3053 DISCARD
> 0.13 183.78 us 66.00 us 2844.00 us
> 2091 OPEN
> 1.01 303.82 us 90.00 us 3085.00 us
> 9610 FSTAT
> 3.27 316.59 us 30.00 us 3716.00 us
> 29820 LOOKUP
> 5.83 265.79 us 59.00 us 53424.00 us
> 63296 FXATTROP
> 7.95 1373.89 us 51.00 us 64822.00 us
> 16717 FSYNC
> 23.17 851.99 us 14.00 us 16210553.00 us
> 78555 FINODELK
> 24.04 87.44 us 27.00 us 6917.00 us
> 794081 WRITE
> 34.36 111.91 us 14.00 us 984871.00 us
> 886790 INODELK
> Duration: 7485 seconds
> Data Read: 0 bytes
> Data Written: 794080 bytes
>
>
> -----------------------
>         Here is the data from the volume that is backed by the SSHDs
>         and has one failed disk:
> [root at ovirt2 yum.repos.d]# gluster volume profile data-hdd info
> Brick: 172.172.1.12:/gluster/brick3/data-hdd
> --------------------------------------------
> Cumulative Stats:
> Block Size: 256b+ 512b+
> 1024b+
> No. of Reads: 1702 86
> 16
> No. of Writes: 0 767
> 71
> Block Size: 2048b+ 4096b+
> 8192b+
> No. of Reads: 19 51841
> 2049
> No. of Writes: 76 60668
> 35727
> Block Size: 16384b+ 32768b+
> 65536b+
> No. of Reads: 1744 639
> 1088
> No. of Writes: 8524 2410
> 1285
> Block Size: 131072b+
> No. of Reads: 771999
> No. of Writes: 29584
> %-latency Avg-latency Min-Latency Max-Latency No. of
> calls Fop
> --------- ----------- ----------- -----------
> ------------ ----
> 0.00 0.00 us 0.00 us 0.00 us
> 2902 RELEASE
> 0.00 0.00 us 0.00 us 0.00 us
> 1517 RELEASEDIR
> 0.00 197.00 us 197.00 us 197.00 us
> 1 FTRUNCATE
> 0.00 70.24 us 16.00 us 758.00 us
> 51 FLUSH
> 0.00 143.93 us 82.00 us 305.00 us
> 57 REMOVEXATTR
> 0.00 178.63 us 105.00 us 712.00 us
> 60 SETATTR
> 0.00 67.30 us 19.00 us 572.00 us
> 555 LK
> 0.00 322.80 us 23.00 us 4673.00 us
> 138 READDIR
> 0.00 336.56 us 106.00 us 11994.00 us
> 237 XATTROP
> 0.00 84.70 us 28.00 us 1071.00 us
> 3469 STATFS
> 0.01 387.75 us 2.00 us 146017.00 us
> 1467 OPENDIR
> 0.01 148.59 us 21.00 us 64374.00 us
> 4454 STAT
> 0.02 783.02 us 16.00 us 93502.00 us
> 1902 GETXATTR
> 0.03 1516.10 us 17.00 us 210690.00 us
> 1364 ENTRYLK
> 0.03 2555.47 us 300.00 us 674454.00 us
> 1064 READDIRP
> 0.07 85.74 us 19.00 us 68340.00 us
> 62849 FSTAT
> 0.07 1978.12 us 59.00 us 202596.00 us
> 2729 OPEN
> 0.22 708.57 us 15.00 us 394799.00 us
> 25447 LOOKUP
> 5.94 2331.74 us 15.00 us 1099530.00 us
> 207534 FINODELK
> 7.31 8311.75 us 58.00 us 1800216.00 us
> 71668 FXATTROP
> 12.49 7735.19 us 51.00 us 3595513.00 us
> 131642 WRITE
> 17.70 957.08 us 16.00 us 13700466.00 us
> 1508160 INODELK
> 24.55 2546.43 us 26.00 us 5077347.00 us
> 786060 READ
> 31.56 49699.15 us 47.00 us 3746331.00 us
> 51777 FSYNC
> Duration: 10101 seconds
> Data Read: 101562897361 bytes
> Data Written: 4834450432 bytes
> Interval 0 Stats:
> Block Size: 256b+ 512b+
> 1024b+
> No. of Reads: 1702 86
> 16
> No. of Writes: 0 767
> 71
> Block Size: 2048b+ 4096b+
> 8192b+
> No. of Reads: 19 51841
> 2049
> No. of Writes: 76 60668
> 35727
> Block Size: 16384b+ 32768b+
> 65536b+
> No. of Reads: 1744 639
> 1088
> No. of Writes: 8524 2410
> 1285
> Block Size: 131072b+
> No. of Reads: 771999
> No. of Writes: 29584
> %-latency Avg-latency Min-Latency Max-Latency No. of
> calls Fop
> --------- ----------- ----------- -----------
> ------------ ----
> 0.00 0.00 us 0.00 us 0.00 us
> 2902 RELEASE
> 0.00 0.00 us 0.00 us 0.00 us
> 1517 RELEASEDIR
> 0.00 197.00 us 197.00 us 197.00 us
> 1 FTRUNCATE
> 0.00 70.24 us 16.00 us 758.00 us
> 51 FLUSH
> 0.00 143.93 us 82.00 us 305.00 us
> 57 REMOVEXATTR
> 0.00 178.63 us 105.00 us 712.00 us
> 60 SETATTR
> 0.00 67.30 us 19.00 us 572.00 us
> 555 LK
> 0.00 322.80 us 23.00 us 4673.00 us
> 138 READDIR
> 0.00 336.56 us 106.00 us 11994.00 us
> 237 XATTROP
> 0.00 84.70 us 28.00 us 1071.00 us
> 3469 STATFS
> 0.01 387.75 us 2.00 us 146017.00 us
> 1467 OPENDIR
> 0.01 148.59 us 21.00 us 64374.00 us
> 4454 STAT
> 0.02 783.02 us 16.00 us 93502.00 us
> 1902 GETXATTR
> 0.03 1516.10 us 17.00 us 210690.00 us
> 1364 ENTRYLK
> 0.03 2555.47 us 300.00 us 674454.00 us
> 1064 READDIRP
> 0.07 85.73 us 19.00 us 68340.00 us
> 62849 FSTAT
> 0.07 1978.12 us 59.00 us 202596.00 us
> 2729 OPEN
> 0.22 708.57 us 15.00 us 394799.00 us
> 25447 LOOKUP
> 5.94 2334.57 us 15.00 us 1099530.00 us
> 207534 FINODELK
> 7.31 8311.49 us 58.00 us 1800216.00 us
> 71668 FXATTROP
> 12.49 7735.32 us 51.00 us 3595513.00 us
> 131642 WRITE
> 17.71 957.08 us 16.00 us 13700466.00 us
> 1508160 INODELK
> 24.56 2546.42 us 26.00 us 5077347.00 us
> 786060 READ
> 31.54 49651.63 us 47.00 us 3746331.00 us
> 51777 FSYNC
> Duration: 10101 seconds
> Data Read: 101562897361 bytes
> Data Written: 4834450432 bytes
>
>
> On Tue, May 29, 2018 at 2:55 PM, Jim Kusznir
> <jim at palousetech.com <mailto:jim at palousetech.com>> wrote:
>
> Thank you for your response.
>
>             I have 4 gluster volumes. 3 are replica 2 + arbiter; the
>             replica bricks are on ovirt1 and ovirt2, with the arbiter
>             on ovirt3. The 4th volume is replica 3, with a brick on
>             all three ovirt machines.
>
> The first 3 volumes are on an SSD disk; the 4th is on a
> Seagate SSHD (same in all three machines). On ovirt3, the
> SSHD has reported hard IO failures, and that brick is
> offline. However, the other two replicas are fully
> operational (although they still show contents in the heal
> info command that won't go away, but that may be the case
> until I replace the failed disk).
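>
>             For when I do eventually replace that disk, a rough sketch
>             of the brick swap I have in mind (the new brick path below
>             is a hypothetical placeholder, not my actual layout):
>
>             # gluster volume replace-brick data-hdd \
>                 172.172.1.13:/gluster/brick3/data-hdd \
>                 172.172.1.13:/gluster/brick3-new/data-hdd \
>                 commit force          # brick3-new path is a placeholder
>             # gluster volume heal data-hdd full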
>
> What is bothering me is that ALL 4 gluster volumes are
> showing horrible performance issues. At this point, as
> the bad disk has been completely offlined, I would expect
> gluster to perform at normal speed, but that is definitely
> not the case.
>
>             I've also noticed that the performance hits seem to come
>             in waves: things seem to work acceptably (but slow) for a
>             while, then suddenly it's as if all disk IO on all volumes
>             (including non-gluster local OS disk volumes for the
>             hosts) pauses for about 30 seconds, then IO resumes again.
>             During those times, I start getting VM not responding and
>             host not responding notices, as well as the applications
>             having major issues.
>
>             I've shut down most of my VMs and am down to just my
>             essential core VMs (shed about 75% of my VMs). I am
>             still experiencing the same issues.
>
>             Am I correct in believing that once the failed disk was
>             brought offline, performance should return to normal?
>
> On Tue, May 29, 2018 at 1:27 PM, Alex K
> <rightkicktech at gmail.com <mailto:rightkicktech at gmail.com>>
> wrote:
>
>                 I would check disk status and the accessibility of
>                 the mount points where your gluster volumes reside,
>                 e.g.:
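>
>                 (A rough sketch; the brick path and device name are
>                 assumptions, substitute your own.)
>
>                 # gluster volume status data-hdd detail
>                 # df -h /gluster/brick3/data-hdd
>                 # dmesg | grep -iE 'i/o error|blk_update'
>                 # smartctl -a /dev/sdb | grep -i health   # device assumed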
>
> On Tue, May 29, 2018, 22:28 Jim Kusznir
> <jim at palousetech.com <mailto:jim at palousetech.com>> wrote:
>
> On one ovirt server, I'm now seeing these messages:
> [56474.239725] blk_update_request: 63 callbacks
> suppressed
> [56474.239732] blk_update_request: I/O error, dev
> dm-2, sector 0
> [56474.240602] blk_update_request: I/O error, dev
> dm-2, sector 3905945472
> [56474.241346] blk_update_request: I/O error, dev
> dm-2, sector 3905945584
> [56474.242236] blk_update_request: I/O error, dev
> dm-2, sector 2048
> [56474.243072] blk_update_request: I/O error, dev
> dm-2, sector 3905943424
> [56474.243997] blk_update_request: I/O error, dev
> dm-2, sector 3905943536
> [56474.247347] blk_update_request: I/O error, dev
> dm-2, sector 0
> [56474.248315] blk_update_request: I/O error, dev
> dm-2, sector 3905945472
> [56474.249231] blk_update_request: I/O error, dev
> dm-2, sector 3905945584
> [56474.250221] blk_update_request: I/O error, dev
> dm-2, sector 2048
>
>
>
>
> On Tue, May 29, 2018 at 11:59 AM, Jim Kusznir
> <jim at palousetech.com <mailto:jim at palousetech.com>>
> wrote:
>
> I see in messages on ovirt3 (my 3rd machine,
> the one upgraded to 4.2):
>
> May 29 11:54:41 ovirt3 ovs-vsctl:
> ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock:
> database connection failed (No such file or
> directory)
> May 29 11:54:51 ovirt3 ovs-vsctl:
> ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock:
> database connection failed (No such file or
> directory)
> May 29 11:55:01 ovirt3 ovs-vsctl:
> ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock:
> database connection failed (No such file or
> directory)
> (appears a lot).
>
>                         I also found, in the ssh session on that
>                         machine, some sysv warnings about the backing
>                         disk for one of the gluster volumes (straight
>                         replica 3). The glusterfs process for that
>                         disk on that machine went offline. It's my
>                         understanding that it should continue to work
>                         with the other two machines while I attempt
>                         to replace that disk, right? Attempted writes
>                         (touching an empty file) can take 15 seconds;
>                         repeating it later is much faster.
>
>                         Gluster generates a bunch of different log
>                         files; I don't know which ones you want, or
>                         from which machine(s).
>
> How do I do "volume profiling"?
>
> Thanks!
>
> On Tue, May 29, 2018 at 11:53 AM, Sahina Bose
> <sabose at redhat.com <mailto:sabose at redhat.com>>
> wrote:
>
>                             Do you see errors reported in the mount
>                             logs for the volume? If so, could you
>                             attach the logs?
>                             Any issues with your underlying disks?
>                             Can you also attach the output of volume
>                             profiling?
>
> On Wed, May 30, 2018 at 12:13 AM, Jim
> Kusznir <jim at palousetech.com
> <mailto:jim at palousetech.com>> wrote:
>
>                                 Ok, things have gotten MUCH worse
>                                 this morning. I'm getting random
>                                 errors from VMs; right now, about a
>                                 third of my VMs have been paused due
>                                 to storage issues, and most of the
>                                 remaining VMs are not performing well.
>
> At this point, I am in full EMERGENCY
> mode, as my production services are
> now impacted, and I'm getting calls
> coming in with problems...
>
>                                 I'd greatly appreciate help...VMs are
>                                 running VERY slowly (when they run),
>                                 and they are steadily getting worse.
>                                 I don't know why. I was seeing CPU
>                                 peaks (to 100%) on several VMs, in
>                                 perfect sync, for a few minutes at a
>                                 time (while the VM became unresponsive
>                                 and any Linux VMs I was logged into
>                                 were giving me the CPU-stuck messages
>                                 from my original post). Is all this
>                                 storage related?
>
> I also have two different gluster
> volumes for VM storage, and only one
> had the issues, but now VMs in both
> are being affected at the same time
> and same way.
>
> --Jim
>
> On Mon, May 28, 2018 at 10:50 PM,
> Sahina Bose <sabose at redhat.com
> <mailto:sabose at redhat.com>> wrote:
>
> [Adding gluster-users to look at
> the heal issue]
>
> On Tue, May 29, 2018 at 9:17 AM,
> Jim Kusznir <jim at palousetech.com
> <mailto:jim at palousetech.com>> wrote:
>
> Hello:
>
> I've been having some cluster
> and gluster performance issues
> lately. I also found that my
> cluster was out of date, and
> was trying to apply updates
> (hoping to fix some of these),
> and discovered the ovirt 4.1
> repos were taken completely
> offline. So, I was forced to
> begin an upgrade to 4.2.
>                                     According to the docs I
>                                     found/read, I needed only to add
>                                     the new repo, do a yum update,
>                                     reboot, and be good on my hosts
>                                     (I did the yum update, plus
>                                     engine-setup on my hosted
>                                     engine). Things seemed to work
>                                     relatively well, except for a
>                                     gluster sync issue that showed up.
>
> My cluster is a 3 node
> hyperconverged cluster. I
> upgraded the hosted engine
> first, then engine 3. When
> engine 3 came back up, for
> some reason one of my gluster
> volumes would not sync. Here's
> sample output:
>
> [root at ovirt3 ~]# gluster
> volume heal data-hdd info
> Brick
> 172.172.1.11:/gluster/brick3/data-hdd
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/647be733-f153-4cdc-85bd-ba72544c2631/b453a300-0602-4be1-8310-8bd5abe00971
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/6da854d1-b6be-446b-9bf0-90a0dbbea830/3c93bd1f-b7fa-4aa2-b445-6904e31839ba
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/7f647567-d18c-44f1-a58e-9b8865833acb/f9364470-9770-4bb1-a6b9-a54861849625
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/f3c8e7aa-6ef2-42a7-93d4-e0a4df6dd2fa/2eb0b1ad-2606-44ef-9cd3-ae59610a504b
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/b1ea3f62-0f05-4ded-8c82-9c91c90e0b61/d5d6bf5a-499f-431d-9013-5453db93ed32
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/8c8b5147-e9d6-4810-b45b-185e3ed65727/16f08231-93b0-489d-a2fd-687b6bf88eaa
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/12924435-b9c2-4aab-ba19-1c1bc31310ef/07b3db69-440e-491e-854c-bbfa18a7cff2
>
> Status: Connected
> Number of entries: 8
>
> Brick
> 172.172.1.12:/gluster/brick3/data-hdd
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/647be733-f153-4cdc-85bd-ba72544c2631/b453a300-0602-4be1-8310-8bd5abe00971
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/b1ea3f62-0f05-4ded-8c82-9c91c90e0b61/d5d6bf5a-499f-431d-9013-5453db93ed32
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/6da854d1-b6be-446b-9bf0-90a0dbbea830/3c93bd1f-b7fa-4aa2-b445-6904e31839ba
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/7f647567-d18c-44f1-a58e-9b8865833acb/f9364470-9770-4bb1-a6b9-a54861849625
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/8c8b5147-e9d6-4810-b45b-185e3ed65727/16f08231-93b0-489d-a2fd-687b6bf88eaa
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/12924435-b9c2-4aab-ba19-1c1bc31310ef/07b3db69-440e-491e-854c-bbfa18a7cff2
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/f3c8e7aa-6ef2-42a7-93d4-e0a4df6dd2fa/2eb0b1ad-2606-44ef-9cd3-ae59610a504b
>
> Status: Connected
> Number of entries: 8
>
> Brick
> 172.172.1.13:/gluster/brick3/data-hdd
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/b1ea3f62-0f05-4ded-8c82-9c91c90e0b61/d5d6bf5a-499f-431d-9013-5453db93ed32
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/8c8b5147-e9d6-4810-b45b-185e3ed65727/16f08231-93b0-489d-a2fd-687b6bf88eaa
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/12924435-b9c2-4aab-ba19-1c1bc31310ef/07b3db69-440e-491e-854c-bbfa18a7cff2
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/f3c8e7aa-6ef2-42a7-93d4-e0a4df6dd2fa/2eb0b1ad-2606-44ef-9cd3-ae59610a504b
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/647be733-f153-4cdc-85bd-ba72544c2631/b453a300-0602-4be1-8310-8bd5abe00971
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/6da854d1-b6be-446b-9bf0-90a0dbbea830/3c93bd1f-b7fa-4aa2-b445-6904e31839ba
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/7f647567-d18c-44f1-a58e-9b8865833acb/f9364470-9770-4bb1-a6b9-a54861849625
>
> Status: Connected
> Number of entries: 8
>
> ---------
>                                     It's been in this state for a
>                                     couple of days now, and bandwidth
>                                     monitoring shows no appreciable
>                                     data moving. I've repeatedly
>                                     tried commanding a full heal from
>                                     all three nodes in the cluster.
>                                     It's always the same files that
>                                     need healing.
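>
>                                     For reference, the heal commands
>                                     I've been running look roughly
>                                     like this (volume name as above):
>
>                                     # gluster volume heal data-hdd full
>                                     # gluster volume heal data-hdd info
>                                     # gluster volume heal data-hdd statistics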
>
>                                     When running gluster volume
>                                     heal data-hdd statistics, I
>                                     sometimes see different
>                                     information, but always some
>                                     number of "heal failed"
>                                     entries. It shows 0 for split
>                                     brain.
>
> I'm not quite sure what to
> do. I suspect it may be due
> to nodes 1 and 2 still being
> on the older ovirt/gluster
> release, but I'm afraid to
> upgrade and reboot them until
> I have a good gluster sync
> (don't need to create a split
> brain issue). How do I
> proceed with this?
>
>                                     Second issue: I've been
>                                     experiencing VERY POOR
>                                     performance on most of my VMs,
>                                     to the tune that logging into a
>                                     Windows 10 VM via remote desktop
>                                     can take 5 minutes, and launching
>                                     QuickBooks inside said VM can
>                                     easily take 10 minutes. On some
>                                     Linux VMs, I get random messages
>                                     like this:
> Message from syslogd at unifi at
> May 28 20:39:23 ...
> kernel:[6171996.308904] NMI
> watchdog: BUG: soft lockup -
> CPU#0 stuck for 22s!
> [mongod:14766]
>
> (the process and PID are often
> different)
>
>                                     I'm not quite sure what to do
>                                     about this either. My initial
>                                     thought was to upgrade everything
>                                     to current and see if it's still
>                                     there, but I cannot move forward
>                                     with that until my gluster is
>                                     healed...
>
> Thanks!
> --Jim
>