[Gluster-users] [ovirt-users] Re: Gluster problems, cluster performance issues
Ravishankar N
ravishankar at redhat.com
Wed May 30 04:43:55 UTC 2018
@Jim Kusznir
For the heal issue, can you provide the getfattr output of one of the 8
files in question from all 3 bricks?
Example: `getfattr -d -m . -e hex
/gluster/brick3/data-hdd/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9`
Also provide the stat output of the same file from all 3 bricks.
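
For reference, a minimal way to gather both on each of the three brick
hosts (same file as in the example above; run as root on each brick):

F=/gluster/brick3/data-hdd/cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
getfattr -d -m . -e hex "$F"
stat "$F"
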
Thanks,
Ravi
On 05/30/2018 09:47 AM, Krutika Dhananjay wrote:
> Adding Ravi to look into the heal issue.
>
> As for the fsync hang and subsequent IO errors, it seems a lot like
> https://bugzilla.redhat.com/show_bug.cgi?id=1497156 and Paolo Bonzini
> from qemu had pointed out that this would be fixed by the following
> commit:
>
> commit e72c9a2a67a6400c8ef3d01d4c461dbbbfa0e1f0
> Author: Paolo Bonzini <pbonzini at redhat.com <mailto:pbonzini at redhat.com>>
> Date: Wed Jun 21 16:35:46 2017 +0200
>
> scsi: virtio_scsi: let host do exception handling
>
> virtio_scsi tries to do exception handling after the default 30 seconds
> timeout expires. However, it's better to let the host control the
> timeout, otherwise with a heavy I/O load it is likely that an abort will
> also timeout. This leads to fatal errors like filesystems going
> offline.
>
> Disable the 'sd' timeout and allow the host to do exception handling,
> following the precedent of the storvsc driver.
>
> Hannes has a proposal to introduce timeouts in virtio, but this provides
> an immediate solution for stable kernels too.
>
> [mkp: fixed typo]
>
> Reported-by: Douglas Miller <dougmill at linux.vnet.ibm.com <mailto:dougmill at linux.vnet.ibm.com>>
> Cc: "James E.J. Bottomley" <jejb at linux.vnet.ibm.com <mailto:jejb at linux.vnet.ibm.com>>
> Cc: "Martin K. Petersen" <martin.petersen at oracle.com <mailto:martin.petersen at oracle.com>>
> Cc: Hannes Reinecke <hare at suse.de <mailto:hare at suse.de>>
> Cc:linux-scsi at vger.kernel.org <mailto:linux-scsi at vger.kernel.org>
> Cc:stable at vger.kernel.org <mailto:stable at vger.kernel.org>
> Signed-off-by: Paolo Bonzini <pbonzini at redhat.com <mailto:pbonzini at redhat.com>>
> Signed-off-by: Martin K. Petersen <martin.petersen at oracle.com <mailto:martin.petersen at oracle.com>>
>
> Adding Paolo/Kevin to comment.
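>
> A quick sanity check from inside an affected guest, to confirm its
> disks actually go through virtio-scsi (the only path this commit
> changes), is something like:
>
> # lsmod | grep virtio_scsi          # virtio-scsi driver loaded?
> # lspci | grep -i 'virtio.*scsi'    # virtio-scsi controller present?
>
> If the guest disks are plain virtio-blk instead, this particular
> commit would not apply to them.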
>
> As for the poor gluster performance, could you disable
> cluster.eager-lock and see if that makes any difference:
>
> # gluster volume set <VOL> cluster.eager-lock off
>
> Do also capture the volume profile again if you still see performance
> issues after disabling eager-lock.
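>
> For example, roughly (assuming the affected volume is data-hdd, as in
> the rest of this thread):
>
> # gluster volume set data-hdd cluster.eager-lock off
> # gluster volume profile data-hdd start
> # ... run the usual VM workload for several minutes ...
> # gluster volume profile data-hdd info > /tmp/data-hdd-profile.txt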
>
> -Krutika
>
>
> On Wed, May 30, 2018 at 6:55 AM, Jim Kusznir <jim at palousetech.com
> <mailto:jim at palousetech.com>> wrote:
>
> I also finally found the following in my system log on one server:
>
> [10679.524491] INFO: task glusterclogro:14933 blocked for more
> than 120 seconds.
> [10679.525826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10679.527144] glusterclogro D ffff97209832bf40 0 14933
> 1 0x00000080
> [10679.527150] Call Trace:
> [10679.527161] [<ffffffffb9913f79>] schedule+0x29/0x70
> [10679.527218] [<ffffffffc060e388>]
> _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.527225] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [10679.527254] [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.527260] [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
> [10679.527268] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [10679.527271] [<ffffffffb944f3d0>] SyS_fsync+0x10/0x20
> [10679.527275] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [10679.527279] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [10679.527283] INFO: task glusterposixfsy:14941 blocked for more
> than 120 seconds.
> [10679.528608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10679.529956] glusterposixfsy D ffff972495f84f10 0 14941
> 1 0x00000080
> [10679.529961] Call Trace:
> [10679.529966] [<ffffffffb9913f79>] schedule+0x29/0x70
> [10679.530003] [<ffffffffc060e388>]
> _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.530008] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [10679.530038] [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.530042] [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
> [10679.530046] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [10679.530050] [<ffffffffb944f3f3>] SyS_fdatasync+0x13/0x20
> [10679.530054] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [10679.530058] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [10679.530062] INFO: task glusteriotwr13:15486 blocked for more
> than 120 seconds.
> [10679.531805] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10679.533732] glusteriotwr13 D ffff9720a83f0000 0 15486
> 1 0x00000080
> [10679.533738] Call Trace:
> [10679.533747] [<ffffffffb9913f79>] schedule+0x29/0x70
> [10679.533799] [<ffffffffc060e388>]
> _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.533806] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [10679.533846] [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.533852] [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
> [10679.533858] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [10679.533863] [<ffffffffb944f3f3>] SyS_fdatasync+0x13/0x20
> [10679.533868] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [10679.533873] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [10919.512757] INFO: task glusterclogro:14933 blocked for more
> than 120 seconds.
> [10919.514714] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10919.516663] glusterclogro D ffff97209832bf40 0 14933
> 1 0x00000080
> [10919.516677] Call Trace:
> [10919.516690] [<ffffffffb9913f79>] schedule+0x29/0x70
> [10919.516696] [<ffffffffb99118e9>] schedule_timeout+0x239/0x2c0
> [10919.516703] [<ffffffffb951cc04>] ? blk_finish_plug+0x14/0x40
> [10919.516768] [<ffffffffc05e9224>] ?
> _xfs_buf_ioapply+0x334/0x460 [xfs]
> [10919.516774] [<ffffffffb991432d>] wait_for_completion+0xfd/0x140
> [10919.516782] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [10919.516821] [<ffffffffc05eb0a3>] ? _xfs_buf_read+0x23/0x40 [xfs]
> [10919.516859] [<ffffffffc05eafa9>]
> xfs_buf_submit_wait+0xf9/0x1d0 [xfs]
> [10919.516902] [<ffffffffc061b279>] ?
> xfs_trans_read_buf_map+0x199/0x400 [xfs]
> [10919.516940] [<ffffffffc05eb0a3>] _xfs_buf_read+0x23/0x40 [xfs]
> [10919.516977] [<ffffffffc05eb1b9>] xfs_buf_read_map+0xf9/0x160 [xfs]
> [10919.517022] [<ffffffffc061b279>]
> xfs_trans_read_buf_map+0x199/0x400 [xfs]
> [10919.517057] [<ffffffffc05c8d04>] xfs_da_read_buf+0xd4/0x100 [xfs]
> [10919.517091] [<ffffffffc05c8d53>] xfs_da3_node_read+0x23/0xd0 [xfs]
> [10919.517126] [<ffffffffc05c9fee>]
> xfs_da3_node_lookup_int+0x6e/0x2f0 [xfs]
> [10919.517160] [<ffffffffc05d5a1d>]
> xfs_dir2_node_lookup+0x4d/0x170 [xfs]
> [10919.517194] [<ffffffffc05ccf5d>] xfs_dir_lookup+0x1bd/0x1e0 [xfs]
> [10919.517233] [<ffffffffc05fd8d9>] xfs_lookup+0x69/0x140 [xfs]
> [10919.517271] [<ffffffffc05fa018>] xfs_vn_lookup+0x78/0xc0 [xfs]
> [10919.517278] [<ffffffffb9425cf3>] lookup_real+0x23/0x60
> [10919.517283] [<ffffffffb9426702>] __lookup_hash+0x42/0x60
> [10919.517288] [<ffffffffb942d519>] SYSC_renameat2+0x3a9/0x5a0
> [10919.517296] [<ffffffffb94d3753>] ?
> selinux_file_free_security+0x23/0x30
> [10919.517304] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [10919.517309] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [10919.517313] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [10919.517318] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [10919.517323] [<ffffffffb942e58e>] SyS_renameat2+0xe/0x10
> [10919.517328] [<ffffffffb942e5ce>] SyS_rename+0x1e/0x20
> [10919.517333] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [10919.517339] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [11159.496095] INFO: task glusteriotwr9:15482 blocked for more
> than 120 seconds.
> [11159.497546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [11159.498978] glusteriotwr9 D ffff971fa0fa1fa0 0 15482
> 1 0x00000080
> [11159.498984] Call Trace:
> [11159.498995] [<ffffffffb9911f00>] ? bit_wait+0x50/0x50
> [11159.498999] [<ffffffffb9913f79>] schedule+0x29/0x70
> [11159.499003] [<ffffffffb99118e9>] schedule_timeout+0x239/0x2c0
> [11159.499056] [<ffffffffc05dd9b7>] ?
> xfs_iext_bno_to_ext+0xa7/0x1a0 [xfs]
> [11159.499082] [<ffffffffc05dd43e>] ?
> xfs_iext_bno_to_irec+0x8e/0xd0 [xfs]
> [11159.499090] [<ffffffffb92f7a12>] ? ktime_get_ts64+0x52/0xf0
> [11159.499093] [<ffffffffb9911f00>] ? bit_wait+0x50/0x50
> [11159.499097] [<ffffffffb991348d>] io_schedule_timeout+0xad/0x130
> [11159.499101] [<ffffffffb9913528>] io_schedule+0x18/0x20
> [11159.499104] [<ffffffffb9911f11>] bit_wait_io+0x11/0x50
> [11159.499107] [<ffffffffb9911ac1>] __wait_on_bit_lock+0x61/0xc0
> [11159.499113] [<ffffffffb9393634>] __lock_page+0x74/0x90
> [11159.499118] [<ffffffffb92bc210>] ? wake_bit_function+0x40/0x40
> [11159.499121] [<ffffffffb9394154>] __find_lock_page+0x54/0x70
> [11159.499125] [<ffffffffb9394e85>]
> grab_cache_page_write_begin+0x55/0xc0
> [11159.499130] [<ffffffffb9484b76>] iomap_write_begin+0x66/0x100
> [11159.499135] [<ffffffffb9484edf>] iomap_write_actor+0xcf/0x1d0
> [11159.499140] [<ffffffffb9484e10>] ? iomap_write_end+0x80/0x80
> [11159.499144] [<ffffffffb94854e7>] iomap_apply+0xb7/0x150
> [11159.499149] [<ffffffffb9485621>]
> iomap_file_buffered_write+0xa1/0xe0
> [11159.499153] [<ffffffffb9484e10>] ? iomap_write_end+0x80/0x80
> [11159.499182] [<ffffffffc05f025d>]
> xfs_file_buffered_aio_write+0x12d/0x2c0 [xfs]
> [11159.499213] [<ffffffffc05f057d>]
> xfs_file_aio_write+0x18d/0x1b0 [xfs]
> [11159.499217] [<ffffffffb941a533>] do_sync_write+0x93/0xe0
> [11159.499222] [<ffffffffb941b010>] vfs_write+0xc0/0x1f0
> [11159.499225] [<ffffffffb941c002>] SyS_pwrite64+0x92/0xc0
> [11159.499230] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [11159.499234] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [11159.499238] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [11279.488720] INFO: task xfsaild/dm-10:1134 blocked for more than
> 120 seconds.
> [11279.490197] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [11279.491665] xfsaild/dm-10 D ffff9720a8660fd0 0 1134
> 2 0x00000000
> [11279.491671] Call Trace:
> [11279.491682] [<ffffffffb92a3a2e>] ? try_to_del_timer_sync+0x5e/0x90
> [11279.491688] [<ffffffffb9913f79>] schedule+0x29/0x70
> [11279.491744] [<ffffffffc060de36>] _xfs_log_force+0x1c6/0x2c0 [xfs]
> [11279.491750] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [11279.491783] [<ffffffffc0619fec>] ? xfsaild+0x16c/0x6f0 [xfs]
> [11279.491817] [<ffffffffc060df5c>] xfs_log_force+0x2c/0x70 [xfs]
> [11279.491849] [<ffffffffc0619e80>] ?
> xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> [11279.491880] [<ffffffffc0619fec>] xfsaild+0x16c/0x6f0 [xfs]
> [11279.491913] [<ffffffffc0619e80>] ?
> xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> [11279.491919] [<ffffffffb92bb161>] kthread+0xd1/0xe0
> [11279.491926] [<ffffffffb92bb090>] ? insert_kthread_work+0x40/0x40
> [11279.491932] [<ffffffffb9920677>]
> ret_from_fork_nospec_begin+0x21/0x21
> [11279.491936] [<ffffffffb92bb090>] ? insert_kthread_work+0x40/0x40
> [11279.491976] INFO: task glusterclogfsyn:14934 blocked for more
> than 120 seconds.
> [11279.493466] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [11279.494952] glusterclogfsyn D ffff97209832af70 0 14934
> 1 0x00000080
> [11279.494957] Call Trace:
> [11279.494979] [<ffffffffc0309839>] ?
> __split_and_process_bio+0x2e9/0x520 [dm_mod]
> [11279.494983] [<ffffffffb9913f79>] schedule+0x29/0x70
> [11279.494987] [<ffffffffb99118e9>] schedule_timeout+0x239/0x2c0
> [11279.494997] [<ffffffffc0309d98>] ? dm_make_request+0x128/0x1a0
> [dm_mod]
> [11279.495001] [<ffffffffb991348d>] io_schedule_timeout+0xad/0x130
> [11279.495005] [<ffffffffb99145ad>] wait_for_completion_io+0xfd/0x140
> [11279.495010] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [11279.495016] [<ffffffffb951e574>] blkdev_issue_flush+0xb4/0x110
> [11279.495049] [<ffffffffc06064b9>]
> xfs_blkdev_issue_flush+0x19/0x20 [xfs]
> [11279.495079] [<ffffffffc05eec40>] xfs_file_fsync+0x1b0/0x1e0 [xfs]
> [11279.495086] [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
> [11279.495090] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [11279.495094] [<ffffffffb944f3d0>] SyS_fsync+0x10/0x20
> [11279.495098] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [11279.495102] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [11279.495105] INFO: task glusterposixfsy:14941 blocked for more
> than 120 seconds.
> [11279.496606] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [11279.498114] glusterposixfsy D ffff972495f84f10 0 14941
> 1 0x00000080
> [11279.498118] Call Trace:
> [11279.498134] [<ffffffffc0309839>] ?
> __split_and_process_bio+0x2e9/0x520 [dm_mod]
> [11279.498138] [<ffffffffb9913f79>] schedule+0x29/0x70
> [11279.498142] [<ffffffffb99118e9>] schedule_timeout+0x239/0x2c0
> [11279.498152] [<ffffffffc0309d98>] ? dm_make_request+0x128/0x1a0
> [dm_mod]
> [11279.498156] [<ffffffffb991348d>] io_schedule_timeout+0xad/0x130
> [11279.498160] [<ffffffffb99145ad>] wait_for_completion_io+0xfd/0x140
> [11279.498165] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [11279.498169] [<ffffffffb951e574>] blkdev_issue_flush+0xb4/0x110
> [11279.498202] [<ffffffffc06064b9>]
> xfs_blkdev_issue_flush+0x19/0x20 [xfs]
> [11279.498231] [<ffffffffc05eec40>] xfs_file_fsync+0x1b0/0x1e0 [xfs]
> [11279.498238] [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
> [11279.498242] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [11279.498246] [<ffffffffb944f3f3>] SyS_fdatasync+0x13/0x20
> [11279.498250] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [11279.498254] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [11279.498257] INFO: task glusteriotwr1:14950 blocked for more
> than 120 seconds.
> [11279.499789] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [11279.501343] glusteriotwr1 D ffff97208b6daf70 0 14950
> 1 0x00000080
> [11279.501348] Call Trace:
> [11279.501353] [<ffffffffb9913f79>] schedule+0x29/0x70
> [11279.501390] [<ffffffffc060e388>]
> _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [11279.501396] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [11279.501428] [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
> [11279.501432] [<ffffffffb944ef3f>] generic_write_sync+0x4f/0x70
> [11279.501461] [<ffffffffc05f0545>]
> xfs_file_aio_write+0x155/0x1b0 [xfs]
> [11279.501466] [<ffffffffb941a533>] do_sync_write+0x93/0xe0
> [11279.501471] [<ffffffffb941b010>] vfs_write+0xc0/0x1f0
> [11279.501475] [<ffffffffb941c002>] SyS_pwrite64+0x92/0xc0
> [11279.501479] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [11279.501483] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [11279.501489] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [11279.501493] INFO: task glusteriotwr4:14953 blocked for more
> than 120 seconds.
> [11279.503047] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [11279.504630] glusteriotwr4 D ffff972499f2bf40 0 14953
> 1 0x00000080
> [11279.504635] Call Trace:
> [11279.504640] [<ffffffffb9913f79>] schedule+0x29/0x70
> [11279.504676] [<ffffffffc060e388>]
> _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [11279.504681] [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
> [11279.504710] [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
> [11279.504714] [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
> [11279.504718] [<ffffffffb992076f>] ?
> system_call_after_swapgs+0xbc/0x160
> [11279.504722] [<ffffffffb944f3d0>] SyS_fsync+0x10/0x20
> [11279.504725] [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
> [11279.504730] [<ffffffffb992077b>] ?
> system_call_after_swapgs+0xc8/0x160
> [12127.466494] perf: interrupt took too long (8263 > 8150),
> lowering kernel.perf_event_max_sample_rate to 24000
>
> --------------------
>     I think this is the cause of the massive ovirt performance issues
>     irrespective of gluster volume. At the time this happened, I was
>     also ssh'ed into the host, and was running some rpm query
>     commands. I had just run rpm -qa | grep glusterfs (to verify which
>     version was actually installed), and that command took almost 2
>     minutes to return! Normally it takes less than 2 seconds. That
>     is all pure local SSD IO, too....
>
>     I'm no expert, but it's my understanding that any time software
>     causes these kinds of issues, it's a serious bug in the software,
>     even if it's mishandled exceptions. Is this correct?
>
> --Jim
>
> On Tue, May 29, 2018 at 3:01 PM, Jim Kusznir <jim at palousetech.com
> <mailto:jim at palousetech.com>> wrote:
>
> I think this is the profile information for one of the volumes
> that lives on the SSDs and is fully operational with no
> down/problem disks:
>
> [root at ovirt2 yum.repos.d]# gluster volume profile data info
> Brick: ovirt2.nwfiber.com:/gluster/brick2/data
> ----------------------------------------------
> Cumulative Stats:
> Block Size: 256b+ 512b+
> 1024b+
> No. of Reads: 983 2696
> 1059
> No. of Writes: 0 1113
> 302
> Block Size: 2048b+ 4096b+
> 8192b+
> No. of Reads: 852 88608
> 53526
> No. of Writes: 522 812340
> 76257
> Block Size: 16384b+ 32768b+
> 65536b+
> No. of Reads: 54351 241901
> 15024
> No. of Writes: 21636 8656
> 8976
> Block Size: 131072b+
> No. of Reads: 524156
> No. of Writes: 296071
> %-latency Avg-latency Min-Latency Max-Latency No. of
> calls Fop
> --------- ----------- ----------- -----------
> ------------ ----
> 0.00 0.00 us 0.00 us 0.00 us
> 4189 RELEASE
> 0.00 0.00 us 0.00 us 0.00 us
> 1257 RELEASEDIR
> 0.00 46.19 us 12.00 us 187.00 us
> 69 FLUSH
> 0.00 147.00 us 78.00 us 367.00 us
> 86 REMOVEXATTR
> 0.00 223.46 us 24.00 us 1166.00 us
> 149 READDIR
> 0.00 565.34 us 76.00 us 3639.00 us
> 88 FTRUNCATE
> 0.00 263.28 us 20.00 us 28385.00 us
> 228 LK
> 0.00 98.84 us 2.00 us 880.00 us
> 1198 OPENDIR
> 0.00 91.59 us 26.00 us 10371.00 us
> 3853 STATFS
> 0.00 494.14 us 17.00 us 193439.00 us
> 1171 GETXATTR
> 0.00 299.42 us 35.00 us 9799.00 us
> 2044 READDIRP
> 0.00 1965.31 us 110.00 us 382258.00 us
> 321 XATTROP
> 0.01 113.40 us 24.00 us 61061.00 us
> 8134 STAT
> 0.01 755.38 us 57.00 us 607603.00 us
> 3196 DISCARD
> 0.05 2690.09 us 58.00 us 2704761.00 us
> 3206 OPEN
> 0.10 119978.25 us 97.00 us 9406684.00 us
> 154 SETATTR
> 0.18 101.73 us 28.00 us 700477.00 us
> 313379 FSTAT
> 0.23 1059.84 us 25.00 us 2716124.00 us
> 38255 LOOKUP
> 0.47 1024.11 us 54.00 us 6197164.00 us
> 81455 FXATTROP
> 1.72 2984.00 us 15.00 us 37098954.00 us
> 103020 FINODELK
> 5.92 44315.32 us 51.00 us 24731536.00 us
> 23957 FSYNC
> 13.27 2399.78 us 25.00 us 22089540.00 us
> 991005 READ
> 37.00 5980.43 us 52.00 us 22099889.00 us
> 1108976 WRITE
> 41.04 5452.75 us 13.00 us 22102452.00 us
> 1349053 INODELK
> Duration: 10026 seconds
> Data Read: 80046027759 bytes
> Data Written: 44496632320 bytes
> Interval 1 Stats:
> Block Size: 256b+ 512b+
> 1024b+
> No. of Reads: 983 2696
> 1059
> No. of Writes: 0 838
> 185
> Block Size: 2048b+ 4096b+
> 8192b+
> No. of Reads: 852 85856
> 51575
> No. of Writes: 382 705802
> 57812
> Block Size: 16384b+ 32768b+
> 65536b+
> No. of Reads: 52673 232093
> 14984
> No. of Writes: 13499 4908
> 4242
> Block Size: 131072b+
> No. of Reads: 460040
> No. of Writes: 6411
> %-latency Avg-latency Min-Latency Max-Latency No. of
> calls Fop
> --------- ----------- ----------- -----------
> ------------ ----
> 0.00 0.00 us 0.00 us 0.00 us
> 2093 RELEASE
> 0.00 0.00 us 0.00 us 0.00 us
> 1093 RELEASEDIR
> 0.00 53.38 us 26.00 us 111.00 us
> 16 FLUSH
> 0.00 145.14 us 78.00 us 367.00 us
> 71 REMOVEXATTR
> 0.00 190.96 us 114.00 us 298.00 us
> 71 SETATTR
> 0.00 213.38 us 24.00 us 1145.00 us
> 90 READDIR
> 0.00 263.28 us 20.00 us 28385.00 us
> 228 LK
> 0.00 101.76 us 2.00 us 880.00 us
> 1093 OPENDIR
> 0.01 93.60 us 27.00 us 10371.00 us
> 3090 STATFS
> 0.02 537.47 us 17.00 us 193439.00 us
> 1038 GETXATTR
> 0.03 297.44 us 35.00 us 9799.00 us
> 1990 READDIRP
> 0.03 2357.28 us 110.00 us 382258.00 us
> 253 XATTROP
> 0.04 385.93 us 58.00 us 47593.00 us
> 2091 OPEN
> 0.04 114.86 us 24.00 us 61061.00 us
> 7715 STAT
> 0.06 444.59 us 57.00 us 333240.00 us
> 3053 DISCARD
> 0.42 316.24 us 25.00 us 290728.00 us
> 29823 LOOKUP
> 0.73 257.92 us 54.00 us 344812.00 us
> 63296 FXATTROP
> 1.37 98.30 us 28.00 us 67621.00 us
> 313172 FSTAT
> 1.58 2124.69 us 51.00 us 849200.00 us
> 16717 FSYNC
> 5.73 162.46 us 52.00 us 748492.00 us
> 794079 WRITE
> 7.19 2065.17 us 16.00 us 37098954.00 us
> 78381 FINODELK
> 36.44 886.32 us 25.00 us 2216436.00 us
> 925421 READ
> 46.30 1178.04 us 13.00 us 1700704.00 us
> 884635 INODELK
> Duration: 7485 seconds
> Data Read: 71250527215 bytes
> Data Written: 5119903744 bytes
> Brick: ovirt3.nwfiber.com:/gluster/brick2/data
> ----------------------------------------------
> Cumulative Stats:
> Block Size: 1b+
> No. of Reads: 0
> No. of Writes: 3264419
> %-latency Avg-latency Min-Latency Max-Latency No. of
> calls Fop
> --------- ----------- ----------- -----------
> ------------ ----
> 0.00 0.00 us 0.00 us 0.00 us
> 90 FORGET
> 0.00 0.00 us 0.00 us 0.00 us
> 9462 RELEASE
> 0.00 0.00 us 0.00 us 0.00 us
> 4254 RELEASEDIR
> 0.00 50.52 us 13.00 us 190.00 us
> 71 FLUSH
> 0.00 186.97 us 87.00 us 713.00 us
> 86 REMOVEXATTR
> 0.00 79.32 us 33.00 us 189.00 us
> 228 LK
> 0.00 220.98 us 129.00 us 513.00 us
> 86 SETATTR
> 0.01 259.30 us 26.00 us 2632.00 us
> 137 READDIR
> 0.02 322.76 us 145.00 us 2125.00 us
> 321 XATTROP
> 0.03 109.55 us 2.00 us 1258.00 us
> 1193 OPENDIR
> 0.05 70.21 us 21.00 us 431.00 us
> 3196 DISCARD
> 0.05 169.26 us 21.00 us 2315.00 us
> 1545 GETXATTR
> 0.12 176.85 us 63.00 us 2844.00 us
> 3206 OPEN
> 0.61 303.49 us 90.00 us 3085.00 us
> 9633 FSTAT
> 2.44 305.66 us 28.00 us 3716.00 us
> 38230 LOOKUP
> 4.52 266.22 us 55.00 us 53424.00 us
> 81455 FXATTROP
> 6.96 1397.99 us 51.00 us 64822.00 us
> 23889 FSYNC
> 16.48 84.74 us 25.00 us 6917.00 us
> 932592 WRITE
> 30.16 106.90 us 13.00 us 3920189.00 us
> 1353046 INODELK
> 38.55 1794.52 us 14.00 us 16210553.00 us
> 103039 FINODELK
> Duration: 66562 seconds
> Data Read: 0 bytes
> Data Written: 3264419 bytes
> Interval 1 Stats:
> Block Size: 1b+
> No. of Reads: 0
> No. of Writes: 794080
> %-latency Avg-latency Min-Latency Max-Latency No. of
> calls Fop
> --------- ----------- ----------- -----------
> ------------ ----
> 0.00 0.00 us 0.00 us 0.00 us
> 2093 RELEASE
> 0.00 0.00 us 0.00 us 0.00 us
> 1093 RELEASEDIR
> 0.00 70.31 us 26.00 us 125.00 us
> 16 FLUSH
> 0.00 193.10 us 103.00 us 713.00 us
> 71 REMOVEXATTR
> 0.01 227.32 us 133.00 us 513.00 us
> 71 SETATTR
> 0.01 79.32 us 33.00 us 189.00 us
> 228 LK
> 0.01 259.83 us 35.00 us 1138.00 us
> 89 READDIR
> 0.03 318.26 us 145.00 us 2047.00 us
> 253 XATTROP
> 0.04 112.67 us 3.00 us 1258.00 us
> 1093 OPENDIR
> 0.06 167.98 us 23.00 us 1951.00 us
> 1014 GETXATTR
> 0.08 70.97 us 22.00 us 431.00 us
> 3053 DISCARD
> 0.13 183.78 us 66.00 us 2844.00 us
> 2091 OPEN
> 1.01 303.82 us 90.00 us 3085.00 us
> 9610 FSTAT
> 3.27 316.59 us 30.00 us 3716.00 us
> 29820 LOOKUP
> 5.83 265.79 us 59.00 us 53424.00 us
> 63296 FXATTROP
> 7.95 1373.89 us 51.00 us 64822.00 us
> 16717 FSYNC
> 23.17 851.99 us 14.00 us 16210553.00 us
> 78555 FINODELK
> 24.04 87.44 us 27.00 us 6917.00 us
> 794081 WRITE
> 34.36 111.91 us 14.00 us 984871.00 us
> 886790 INODELK
> Duration: 7485 seconds
> Data Read: 0 bytes
> Data Written: 794080 bytes
>
>
> -----------------------
>         Here is the data from the volume that is backed by the SSHDs
>         and has one failed disk:
> [root at ovirt2 yum.repos.d]# gluster volume profile data-hdd info
> Brick: 172.172.1.12:/gluster/brick3/data-hdd
> --------------------------------------------
> Cumulative Stats:
> Block Size: 256b+ 512b+
> 1024b+
> No. of Reads: 1702 86
> 16
> No. of Writes: 0 767
> 71
> Block Size: 2048b+ 4096b+
> 8192b+
> No. of Reads: 19 51841
> 2049
> No. of Writes: 76 60668
> 35727
> Block Size: 16384b+ 32768b+
> 65536b+
> No. of Reads: 1744 639
> 1088
> No. of Writes: 8524 2410
> 1285
> Block Size: 131072b+
> No. of Reads: 771999
> No. of Writes: 29584
> %-latency Avg-latency Min-Latency Max-Latency No. of
> calls Fop
> --------- ----------- ----------- -----------
> ------------ ----
> 0.00 0.00 us 0.00 us 0.00 us
> 2902 RELEASE
> 0.00 0.00 us 0.00 us 0.00 us
> 1517 RELEASEDIR
> 0.00 197.00 us 197.00 us 197.00 us
> 1 FTRUNCATE
> 0.00 70.24 us 16.00 us 758.00 us
> 51 FLUSH
> 0.00 143.93 us 82.00 us 305.00 us
> 57 REMOVEXATTR
> 0.00 178.63 us 105.00 us 712.00 us
> 60 SETATTR
> 0.00 67.30 us 19.00 us 572.00 us
> 555 LK
> 0.00 322.80 us 23.00 us 4673.00 us
> 138 READDIR
> 0.00 336.56 us 106.00 us 11994.00 us
> 237 XATTROP
> 0.00 84.70 us 28.00 us 1071.00 us
> 3469 STATFS
> 0.01 387.75 us 2.00 us 146017.00 us
> 1467 OPENDIR
> 0.01 148.59 us 21.00 us 64374.00 us
> 4454 STAT
> 0.02 783.02 us 16.00 us 93502.00 us
> 1902 GETXATTR
> 0.03 1516.10 us 17.00 us 210690.00 us
> 1364 ENTRYLK
> 0.03 2555.47 us 300.00 us 674454.00 us
> 1064 READDIRP
> 0.07 85.74 us 19.00 us 68340.00 us
> 62849 FSTAT
> 0.07 1978.12 us 59.00 us 202596.00 us
> 2729 OPEN
> 0.22 708.57 us 15.00 us 394799.00 us
> 25447 LOOKUP
> 5.94 2331.74 us 15.00 us 1099530.00 us
> 207534 FINODELK
> 7.31 8311.75 us 58.00 us 1800216.00 us
> 71668 FXATTROP
> 12.49 7735.19 us 51.00 us 3595513.00 us
> 131642 WRITE
> 17.70 957.08 us 16.00 us 13700466.00 us
> 1508160 INODELK
> 24.55 2546.43 us 26.00 us 5077347.00 us
> 786060 READ
> 31.56 49699.15 us 47.00 us 3746331.00 us
> 51777 FSYNC
> Duration: 10101 seconds
> Data Read: 101562897361 bytes
> Data Written: 4834450432 bytes
> Interval 0 Stats:
> Block Size: 256b+ 512b+
> 1024b+
> No. of Reads: 1702 86
> 16
> No. of Writes: 0 767
> 71
> Block Size: 2048b+ 4096b+
> 8192b+
> No. of Reads: 19 51841
> 2049
> No. of Writes: 76 60668
> 35727
> Block Size: 16384b+ 32768b+
> 65536b+
> No. of Reads: 1744 639
> 1088
> No. of Writes: 8524 2410
> 1285
> Block Size: 131072b+
> No. of Reads: 771999
> No. of Writes: 29584
> %-latency Avg-latency Min-Latency Max-Latency No. of
> calls Fop
> --------- ----------- ----------- -----------
> ------------ ----
> 0.00 0.00 us 0.00 us 0.00 us
> 2902 RELEASE
> 0.00 0.00 us 0.00 us 0.00 us
> 1517 RELEASEDIR
> 0.00 197.00 us 197.00 us 197.00 us
> 1 FTRUNCATE
> 0.00 70.24 us 16.00 us 758.00 us
> 51 FLUSH
> 0.00 143.93 us 82.00 us 305.00 us
> 57 REMOVEXATTR
> 0.00 178.63 us 105.00 us 712.00 us
> 60 SETATTR
> 0.00 67.30 us 19.00 us 572.00 us
> 555 LK
> 0.00 322.80 us 23.00 us 4673.00 us
> 138 READDIR
> 0.00 336.56 us 106.00 us 11994.00 us
> 237 XATTROP
> 0.00 84.70 us 28.00 us 1071.00 us
> 3469 STATFS
> 0.01 387.75 us 2.00 us 146017.00 us
> 1467 OPENDIR
> 0.01 148.59 us 21.00 us 64374.00 us
> 4454 STAT
> 0.02 783.02 us 16.00 us 93502.00 us
> 1902 GETXATTR
> 0.03 1516.10 us 17.00 us 210690.00 us
> 1364 ENTRYLK
> 0.03 2555.47 us 300.00 us 674454.00 us
> 1064 READDIRP
> 0.07 85.73 us 19.00 us 68340.00 us
> 62849 FSTAT
> 0.07 1978.12 us 59.00 us 202596.00 us
> 2729 OPEN
> 0.22 708.57 us 15.00 us 394799.00 us
> 25447 LOOKUP
> 5.94 2334.57 us 15.00 us 1099530.00 us
> 207534 FINODELK
> 7.31 8311.49 us 58.00 us 1800216.00 us
> 71668 FXATTROP
> 12.49 7735.32 us 51.00 us 3595513.00 us
> 131642 WRITE
> 17.71 957.08 us 16.00 us 13700466.00 us
> 1508160 INODELK
> 24.56 2546.42 us 26.00 us 5077347.00 us
> 786060 READ
> 31.54 49651.63 us 47.00 us 3746331.00 us
> 51777 FSYNC
> Duration: 10101 seconds
> Data Read: 101562897361 bytes
> Data Written: 4834450432 bytes
>
>
> On Tue, May 29, 2018 at 2:55 PM, Jim Kusznir
> <jim at palousetech.com <mailto:jim at palousetech.com>> wrote:
>
> Thank you for your response.
>
>             I have 4 gluster volumes. 3 are replica 2 + arbiter; the
>             replica bricks are on ovirt1 and ovirt2, with the arbiter
>             on ovirt3. The 4th volume is replica 3, with a brick on
>             all three ovirt machines.
>
> The first 3 volumes are on an SSD disk; the 4th is on a
> Seagate SSHD (same in all three machines). On ovirt3, the
> SSHD has reported hard IO failures, and that brick is
> offline. However, the other two replicas are fully
> operational (although they still show contents in the heal
> info command that won't go away, but that may be the case
> until I replace the failed disk).
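>
>             For when I do eventually replace that disk, a rough sketch
>             of the brick swap I have in mind (the new brick path below
>             is a hypothetical placeholder, not my actual layout):
>
>             # gluster volume replace-brick data-hdd \
>                 172.172.1.13:/gluster/brick3/data-hdd \
>                 172.172.1.13:/gluster/brick3-new/data-hdd \
>                 commit force          # brick3-new path is a placeholder
>             # gluster volume heal data-hdd full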
>
> What is bothering me is that ALL 4 gluster volumes are
> showing horrible performance issues. At this point, as
> the bad disk has been completely offlined, I would expect
> gluster to perform at normal speed, but that is definitely
> not the case.
>
>             I've also noticed that the performance hits seem to come
>             in waves: things seem to work acceptably (but slow) for a
>             while, then suddenly it's as if all disk IO on all volumes
>             (including non-gluster local OS disk volumes for the
>             hosts) pauses for about 30 seconds, then IO resumes again.
>             During those times, I start getting VM not responding and
>             host not responding notices, as well as the applications
>             having major issues.
>
>             I've shut down most of my VMs and am down to just my
>             essential core VMs (shed about 75% of my VMs). I am
>             still experiencing the same issues.
>
>             Am I correct in believing that once the failed disk was
>             brought offline, performance should return to normal?
>
> On Tue, May 29, 2018 at 1:27 PM, Alex K
> <rightkicktech at gmail.com <mailto:rightkicktech at gmail.com>>
> wrote:
>
>                 I would check disk status and the accessibility of
>                 the mount points where your gluster volumes reside,
>                 e.g.:
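>
>                 (A rough sketch; the brick path and device name are
>                 assumptions, substitute your own.)
>
>                 # gluster volume status data-hdd detail
>                 # df -h /gluster/brick3/data-hdd
>                 # dmesg | grep -iE 'i/o error|blk_update'
>                 # smartctl -a /dev/sdb | grep -i health   # device assumed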
>
> On Tue, May 29, 2018, 22:28 Jim Kusznir
> <jim at palousetech.com <mailto:jim at palousetech.com>> wrote:
>
> On one ovirt server, I'm now seeing these messages:
> [56474.239725] blk_update_request: 63 callbacks
> suppressed
> [56474.239732] blk_update_request: I/O error, dev
> dm-2, sector 0
> [56474.240602] blk_update_request: I/O error, dev
> dm-2, sector 3905945472
> [56474.241346] blk_update_request: I/O error, dev
> dm-2, sector 3905945584
> [56474.242236] blk_update_request: I/O error, dev
> dm-2, sector 2048
> [56474.243072] blk_update_request: I/O error, dev
> dm-2, sector 3905943424
> [56474.243997] blk_update_request: I/O error, dev
> dm-2, sector 3905943536
> [56474.247347] blk_update_request: I/O error, dev
> dm-2, sector 0
> [56474.248315] blk_update_request: I/O error, dev
> dm-2, sector 3905945472
> [56474.249231] blk_update_request: I/O error, dev
> dm-2, sector 3905945584
> [56474.250221] blk_update_request: I/O error, dev
> dm-2, sector 2048
>
>
>
>
> On Tue, May 29, 2018 at 11:59 AM, Jim Kusznir
> <jim at palousetech.com <mailto:jim at palousetech.com>>
> wrote:
>
> I see in messages on ovirt3 (my 3rd machine,
> the one upgraded to 4.2):
>
> May 29 11:54:41 ovirt3 ovs-vsctl:
> ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock:
> database connection failed (No such file or
> directory)
> May 29 11:54:51 ovirt3 ovs-vsctl:
> ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock:
> database connection failed (No such file or
> directory)
> May 29 11:55:01 ovirt3 ovs-vsctl:
> ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock:
> database connection failed (No such file or
> directory)
> (appears a lot).
>
>                         I also found, in the ssh session on that
>                         machine, some sysv warnings about the backing
>                         disk for one of the gluster volumes (straight
>                         replica 3). The glusterfs process for that
>                         disk on that machine went offline. It's my
>                         understanding that it should continue to work
>                         with the other two machines while I attempt
>                         to replace that disk, right? Attempted writes
>                         (touching an empty file) can take 15 seconds;
>                         repeating it later is much faster.
>
>                         Gluster generates a bunch of different log
>                         files; I don't know which ones you want, or
>                         from which machine(s).
>
> How do I do "volume profiling"?
>
> Thanks!
>
> On Tue, May 29, 2018 at 11:53 AM, Sahina Bose
> <sabose at redhat.com <mailto:sabose at redhat.com>>
> wrote:
>
>                             Do you see errors reported in the mount
>                             logs for the volume? If so, could you
>                             attach the logs?
>                             Any issues with your underlying disks?
>                             Can you also attach the output of volume
>                             profiling?
>
> On Wed, May 30, 2018 at 12:13 AM, Jim
> Kusznir <jim at palousetech.com
> <mailto:jim at palousetech.com>> wrote:
>
>                                 Ok, things have gotten MUCH worse
>                                 this morning. I'm getting random
>                                 errors from VMs; right now, about a
>                                 third of my VMs have been paused due
>                                 to storage issues, and most of the
>                                 remaining VMs are not performing well.
>
> At this point, I am in full EMERGENCY
> mode, as my production services are
> now impacted, and I'm getting calls
> coming in with problems...
>
>                                 I'd greatly appreciate help...VMs are
>                                 running VERY slowly (when they run),
>                                 and they are steadily getting worse.
>                                 I don't know why. I was seeing CPU
>                                 peaks (to 100%) on several VMs, in
>                                 perfect sync, for a few minutes at a
>                                 time (while the VM became unresponsive
>                                 and any Linux VMs I was logged into
>                                 were giving me the CPU-stuck messages
>                                 from my original post). Is all this
>                                 storage related?
>
> I also have two different gluster
> volumes for VM storage, and only one
> had the issues, but now VMs in both
> are being affected at the same time
> and same way.
>
> --Jim
>
> On Mon, May 28, 2018 at 10:50 PM,
> Sahina Bose <sabose at redhat.com
> <mailto:sabose at redhat.com>> wrote:
>
> [Adding gluster-users to look at
> the heal issue]
>
> On Tue, May 29, 2018 at 9:17 AM,
> Jim Kusznir <jim at palousetech.com
> <mailto:jim at palousetech.com>> wrote:
>
> Hello:
>
> I've been having some cluster
> and gluster performance issues
> lately. I also found that my
> cluster was out of date, and
> was trying to apply updates
> (hoping to fix some of these),
> and discovered the ovirt 4.1
> repos were taken completely
> offline. So, I was forced to
> begin an upgrade to 4.2.
>                                     According to the docs I
>                                     found/read, I needed only to add
>                                     the new repo, do a yum update,
>                                     reboot, and be good on my hosts
>                                     (I did the yum update, plus
>                                     engine-setup on my hosted
>                                     engine). Things seemed to work
>                                     relatively well, except for a
>                                     gluster sync issue that showed up.
>
> My cluster is a 3 node
> hyperconverged cluster. I
> upgraded the hosted engine
> first, then engine 3. When
> engine 3 came back up, for
> some reason one of my gluster
> volumes would not sync. Here's
> sample output:
>
> [root at ovirt3 ~]# gluster
> volume heal data-hdd info
> Brick
> 172.172.1.11:/gluster/brick3/data-hdd
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/647be733-f153-4cdc-85bd-ba72544c2631/b453a300-0602-4be1-8310-8bd5abe00971
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/6da854d1-b6be-446b-9bf0-90a0dbbea830/3c93bd1f-b7fa-4aa2-b445-6904e31839ba
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/7f647567-d18c-44f1-a58e-9b8865833acb/f9364470-9770-4bb1-a6b9-a54861849625
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/f3c8e7aa-6ef2-42a7-93d4-e0a4df6dd2fa/2eb0b1ad-2606-44ef-9cd3-ae59610a504b
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/b1ea3f62-0f05-4ded-8c82-9c91c90e0b61/d5d6bf5a-499f-431d-9013-5453db93ed32
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/8c8b5147-e9d6-4810-b45b-185e3ed65727/16f08231-93b0-489d-a2fd-687b6bf88eaa
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/12924435-b9c2-4aab-ba19-1c1bc31310ef/07b3db69-440e-491e-854c-bbfa18a7cff2
>
> Status: Connected
> Number of entries: 8
>
> Brick
> 172.172.1.12:/gluster/brick3/data-hdd
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/647be733-f153-4cdc-85bd-ba72544c2631/b453a300-0602-4be1-8310-8bd5abe00971
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/b1ea3f62-0f05-4ded-8c82-9c91c90e0b61/d5d6bf5a-499f-431d-9013-5453db93ed32
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/6da854d1-b6be-446b-9bf0-90a0dbbea830/3c93bd1f-b7fa-4aa2-b445-6904e31839ba
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/7f647567-d18c-44f1-a58e-9b8865833acb/f9364470-9770-4bb1-a6b9-a54861849625
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/8c8b5147-e9d6-4810-b45b-185e3ed65727/16f08231-93b0-489d-a2fd-687b6bf88eaa
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/12924435-b9c2-4aab-ba19-1c1bc31310ef/07b3db69-440e-491e-854c-bbfa18a7cff2
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/f3c8e7aa-6ef2-42a7-93d4-e0a4df6dd2fa/2eb0b1ad-2606-44ef-9cd3-ae59610a504b
>
> Status: Connected
> Number of entries: 8
>
> Brick
> 172.172.1.13:/gluster/brick3/data-hdd
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/b1ea3f62-0f05-4ded-8c82-9c91c90e0b61/d5d6bf5a-499f-431d-9013-5453db93ed32
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/8c8b5147-e9d6-4810-b45b-185e3ed65727/16f08231-93b0-489d-a2fd-687b6bf88eaa
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/12924435-b9c2-4aab-ba19-1c1bc31310ef/07b3db69-440e-491e-854c-bbfa18a7cff2
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/f3c8e7aa-6ef2-42a7-93d4-e0a4df6dd2fa/2eb0b1ad-2606-44ef-9cd3-ae59610a504b
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/647be733-f153-4cdc-85bd-ba72544c2631/b453a300-0602-4be1-8310-8bd5abe00971
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/6da854d1-b6be-446b-9bf0-90a0dbbea830/3c93bd1f-b7fa-4aa2-b445-6904e31839ba
>
> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/7f647567-d18c-44f1-a58e-9b8865833acb/f9364470-9770-4bb1-a6b9-a54861849625
>
> Status: Connected
> Number of entries: 8
>
> ---------
>                                     It's been in this state for a
>                                     couple of days now, and bandwidth
>                                     monitoring shows no appreciable
>                                     data moving. I've repeatedly
>                                     tried commanding a full heal from
>                                     all three nodes in the cluster.
>                                     It's always the same files that
>                                     need healing.
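>
>                                     For reference, the heal commands
>                                     I've been running look roughly
>                                     like this (volume name as above):
>
>                                     # gluster volume heal data-hdd full
>                                     # gluster volume heal data-hdd info
>                                     # gluster volume heal data-hdd statistics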
>
>                                     When running gluster volume
>                                     heal data-hdd statistics, I
>                                     sometimes see different
>                                     information, but always some
>                                     number of "heal failed"
>                                     entries. It shows 0 for split
>                                     brain.
>
> I'm not quite sure what to
> do. I suspect it may be due
> to nodes 1 and 2 still being
> on the older ovirt/gluster
> release, but I'm afraid to
> upgrade and reboot them until
> I have a good gluster sync
> (don't need to create a split
> brain issue). How do I
> proceed with this?
>
>                                     Second issue: I've been
>                                     experiencing VERY POOR
>                                     performance on most of my VMs,
>                                     to the tune that logging into a
>                                     Windows 10 VM via remote desktop
>                                     can take 5 minutes, and launching
>                                     QuickBooks inside said VM can
>                                     easily take 10 minutes. On some
>                                     Linux VMs, I get random messages
>                                     like this:
> Message from syslogd at unifi at
> May 28 20:39:23 ...
> kernel:[6171996.308904] NMI
> watchdog: BUG: soft lockup -
> CPU#0 stuck for 22s!
> [mongod:14766]
>
> (the process and PID are often
> different)
>
>                                     I'm not quite sure what to do
>                                     about this either. My initial
>                                     thought was to upgrade everything
>                                     to current and see if it's still
>                                     there, but I cannot move forward
>                                     with that until my gluster is
>                                     healed...
>
> Thanks!
> --Jim
>