[Gluster-users] [ovirt-users] Re: Gluster problems, cluster performance issues

Jim Kusznir jim at palousetech.com
Sat Jun 2 16:51:23 UTC 2018


I did eventually get ovirt1 to come up.  I lost track of the specific
combination of things, but I know the last command was "reinstall host"
from the gui.  I had tried that several times prior and it failed, but this
time it worked.  As I was running on almost no sleep, I don't remember the
final things that allowed the reinstall to work.  It did take my gluster
volumes down again in the process, though.

Once I got ovirt1 back up and operational and got my gluster volumes online
again, the gui recognized my storage domains and brought the data
center up, and then I could start doing things in the gui again (like
starting VMs).

I will perform more in-depth analysis and troubleshooting of ovirt1's
low-level storage performance today.  According to everything I've checked
so far, though, I don't see anything wrong with it.

On the gluster side, I am using LVM and LV_Thin for my gluster backing
volumes...Is that expected to be a problem?
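
(In case it matters for the question above -- this is a generic sketch, not
something specific to my boxes: a thin pool that fills its data or metadata
space will stall all writes to its thin LVs, so checking pool utilization is
a cheap way to rule that out:

  # show thin pools and how full their data/metadata areas are
  lvs -a -o lv_name,vg_name,pool_lv,data_percent,metadata_percent

Either percentage near 100 on the pool backing a brick would explain
brick-side hangs.)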

--Jim

On Thu, May 31, 2018 at 11:25 PM, Sahina Bose <sabose at redhat.com> wrote:

>
>
> On Thu, May 31, 2018 at 3:16 AM, Jim Kusznir <jim at palousetech.com> wrote:
>
>> I've been back at it, and still am unable to get more than one of my
>> physical nodes to come online in ovirt, nor am I able to get more than
>> two of the gluster volumes (storage domains) to show online within ovirt.
>>
>> In Storage -> Volumes, they all show offline (many with one brick down,
>> which is correct: I have one server off).
>> However, in Storage -> Domains, they all show down (although somehow, I
>> got machines from the data domain to start), and there's no obvious way
>> I've seen to tell ovirt to bring them online.  It claims none of the servers
>> can see that volume, but I've quadruple-checked that the volumes are
>> mounted on the engines and are fully functional there.  I have some more
>> VMs I need to get back up and running.  How do I fix this?
>>
>> ovirt1, for unknown reasons, will not work.  Attempts to bring it online
>> fail, and I haven't figured out what log file to look in yet for more
>> details.
>>
>
>
> As Krutika mentioned before, the storage performance issues seem to stem
> from the slow brick on ovirt1 - ovirt1.nwfiber.com:/gluster/brick1/engine
>
> Have you checked this?
>
> When you say ovirt1 will not work - do you mean the host cannot be
> activated in oVirt engine? Can you look at/share the engine.log (on the HE
> VM) to understand why it won't activate? Also, if the storage domain is not
> accessible by any of the servers, check/share the vdsm.log to understand
> the error.
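>
> (Assuming the default paths, those logs are at
> /var/log/ovirt-engine/engine.log on the HE VM and /var/log/vdsm/vdsm.log
> on each host; something like
>
>     tail -f /var/log/ovirt-engine/engine.log   # on the HE VM
>     tail -f /var/log/vdsm/vdsm.log             # on the host being activated
>
> while retrying the activation should capture the failure.)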
>
>
>
>> On Wed, May 30, 2018 at 9:36 AM, Jim Kusznir <jim at palousetech.com> wrote:
>>
>>> Hi all again:
>>>
>>> I'm now subscribed to gluster-users as well, so I should get any replies
>>> from that side too.
>>>
>>> At this point, I am seeing acceptable (although slower than I expect)
>>> performance much of the time, with periodic massive spikes in latency
>>> (occasionally so bad as to cause ovirt to detect a bad engine health
>>> status).  Often, if I check the logs right then, I'll see those call traces
>>> in xfs_log_worker or other gluster processes, as well as hung task timeout
>>> messages.
>>>
>>> As to the profile suggesting ovirt1 had poorer performance than ovirt2,
>>> I don't have an explanation.  The output of gluster volume info engine is
>>> identical on both hosts.  The computers and drives are identical (Dell R610
>>> with PERC 6/i controller configured to pass through the drive).  ovirt1 and
>>> ovirt2's partition scheme/map do vary somewhat, but I figured that wouldn't
>>> be a big deal and gluster just uses the least common denominator.  Is that
>>> accurate?
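>>>
>>> (One way to compare the raw disks directly -- rough sketch only, with a
>>> throwaway test file on each brick:
>>>
>>>     dd if=/dev/zero of=/gluster/brick1/engine/ddtest bs=4k count=10000 oflag=direct
>>>     rm /gluster/brick1/engine/ddtest
>>>
>>> oflag=direct bypasses the page cache, so running the same command on
>>> ovirt1 and ovirt2 should show whether ovirt1's disk really is slower.)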
>>>
>>> In case it was missed, here are the dmesg errors being thrown by gluster
>>> (as far as I can tell).  They definitely started after I upgraded from
>>> gluster 3.8 to 3.12:
>>>
>>> [ 6604.536156] perf: interrupt took too long (9593 > 9565), lowering
>>> kernel.perf_event_max_sample_rate to 20000
>>> [ 8280.188734] INFO: task xfsaild/dm-10:1061 blocked for more than 120
>>> seconds.
>>> [ 8280.188787] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [ 8280.188837] xfsaild/dm-10   D ffff93203a2eeeb0     0  1061      2
>>> 0x00000000
>>> [ 8280.188843] Call Trace:
>>> [ 8280.188857]  [<ffffffff960a3a2e>] ? try_to_del_timer_sync+0x5e/0x90
>>> [ 8280.188864]  [<ffffffff96713f79>] schedule+0x29/0x70
>>> [ 8280.188932]  [<ffffffffc049fe36>] _xfs_log_force+0x1c6/0x2c0 [xfs]
>>> [ 8280.188939]  [<ffffffff960cf1b0>] ? wake_up_state+0x20/0x20
>>> [ 8280.188972]  [<ffffffffc04abfec>] ? xfsaild+0x16c/0x6f0 [xfs]
>>> [ 8280.189003]  [<ffffffffc049ff5c>] xfs_log_force+0x2c/0x70 [xfs]
>>> [ 8280.189035]  [<ffffffffc04abe80>] ? xfs_trans_ail_cursor_first+0x90/0x90
>>> [xfs]
>>> [ 8280.189067]  [<ffffffffc04abfec>] xfsaild+0x16c/0x6f0 [xfs]
>>> [ 8280.189100]  [<ffffffffc04abe80>] ? xfs_trans_ail_cursor_first+0x90/0x90
>>> [xfs]
>>> [ 8280.189105]  [<ffffffff960bb161>] kthread+0xd1/0xe0
>>> [ 8280.189109]  [<ffffffff960bb090>] ? insert_kthread_work+0x40/0x40
>>> [ 8280.189117]  [<ffffffff96720677>] ret_from_fork_nospec_begin+0x21/0x21
>>> [ 8280.189121]  [<ffffffff960bb090>] ? insert_kthread_work+0x40/0x40
>>> [ 8280.189161] INFO: task glusterclogfsyn:6421 blocked for more than 120
>>> seconds.
>>> [ 8280.189207] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [ 8280.189256] glusterclogfsyn D ffff93203a2edee0     0  6421      1
>>> 0x00000080
>>> [ 8280.189260] Call Trace:
>>> [ 8280.189265]  [<ffffffff96713f79>] schedule+0x29/0x70
>>> [ 8280.189300]  [<ffffffffc04a0388>] _xfs_log_force_lsn+0x2e8/0x340 [xfs]
>>> [ 8280.189305]  [<ffffffff960cf1b0>] ? wake_up_state+0x20/0x20
>>> [ 8280.189333]  [<ffffffffc0480b97>] xfs_file_fsync+0x107/0x1e0 [xfs]
>>> [ 8280.189340]  [<ffffffff9624f0e7>] do_fsync+0x67/0xb0
>>> [ 8280.189345]  [<ffffffff9672076f>] ? system_call_after_swapgs+0xbc/0x160
>>> [ 8280.189348]  [<ffffffff9624f3d0>] SyS_fsync+0x10/0x20
>>> [ 8280.189352]  [<ffffffff9672082f>] system_call_fastpath+0x1c/0x21
>>> [ 8280.189356]  [<ffffffff9672077b>] ? system_call_after_swapgs+0xc8/0x160
>>> [ 8280.189360] INFO: task glusteriotwr2:766 blocked for more than 120
>>> seconds.
>>> [ 8280.189404] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [ 8280.189453] glusteriotwr2   D ffff931cfa510000     0   766      1
>>> 0x00000080
>>> [ 8280.189457] Call Trace:
>>> [ 8280.189476]  [<ffffffffc01a3839>] ? __split_and_process_bio+0x2e9/0x520
>>> [dm_mod]
>>> [ 8280.189480]  [<ffffffff96713f79>] schedule+0x29/0x70
>>> [ 8280.189484]  [<ffffffff967118e9>] schedule_timeout+0x239/0x2c0
>>> [ 8280.189494]  [<ffffffffc01a3d98>] ? dm_make_request+0x128/0x1a0
>>> [dm_mod]
>>> [ 8280.189498]  [<ffffffff9671348d>] io_schedule_timeout+0xad/0x130
>>> [ 8280.189502]  [<ffffffff967145ad>] wait_for_completion_io+0xfd/0x140
>>> [ 8280.189507]  [<ffffffff960cf1b0>] ? wake_up_state+0x20/0x20
>>> [ 8280.189513]  [<ffffffff9631e574>] blkdev_issue_flush+0xb4/0x110
>>> [ 8280.189546]  [<ffffffffc04984b9>] xfs_blkdev_issue_flush+0x19/0x20
>>> [xfs]
>>> [ 8280.189588]  [<ffffffffc0480c40>] xfs_file_fsync+0x1b0/0x1e0 [xfs]
>>> [ 8280.189593]  [<ffffffff9624f0e7>] do_fsync+0x67/0xb0
>>> [ 8280.189597]  [<ffffffff9672076f>] ? system_call_after_swapgs+0xbc/0x160
>>> [ 8280.189600]  [<ffffffff9624f3d0>] SyS_fsync+0x10/0x20
>>> [ 8280.189604]  [<ffffffff9672082f>] system_call_fastpath+0x1c/0x21
>>> [ 8280.189608]  [<ffffffff9672077b>] ? system_call_after_swapgs+0xc8/0x160
>>> [ 8400.186024] INFO: task xfsaild/dm-10:1061 blocked for more than 120
>>> seconds.
>>> [ 8400.186077] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [ 8400.186127] xfsaild/dm-10   D ffff93203a2eeeb0     0  1061      2
>>> 0x00000000
>>> [ 8400.186133] Call Trace:
>>> [ 8400.186146]  [<ffffffff960a3a2e>] ? try_to_del_timer_sync+0x5e/0x90
>>> [ 8400.186153]  [<ffffffff96713f79>] schedule+0x29/0x70
>>> [ 8400.186221]  [<ffffffffc049fe36>] _xfs_log_force+0x1c6/0x2c0 [xfs]
>>> [ 8400.186227]  [<ffffffff960cf1b0>] ? wake_up_state+0x20/0x20
>>> [ 8400.186260]  [<ffffffffc04abfec>] ? xfsaild+0x16c/0x6f0 [xfs]
>>> [ 8400.186292]  [<ffffffffc049ff5c>] xfs_log_force+0x2c/0x70 [xfs]
>>> [ 8400.186324]  [<ffffffffc04abe80>] ? xfs_trans_ail_cursor_first+0x90/0x90
>>> [xfs]
>>> [ 8400.186356]  [<ffffffffc04abfec>] xfsaild+0x16c/0x6f0 [xfs]
>>> [ 8400.186388]  [<ffffffffc04abe80>] ? xfs_trans_ail_cursor_first+0x90/0x90
>>> [xfs]
>>> [ 8400.186394]  [<ffffffff960bb161>] kthread+0xd1/0xe0
>>> [ 8400.186398]  [<ffffffff960bb090>] ? insert_kthread_work+0x40/0x40
>>> [ 8400.186405]  [<ffffffff96720677>] ret_from_fork_nospec_begin+0x21/0x21
>>> [ 8400.186409]  [<ffffffff960bb090>] ? insert_kthread_work+0x40/0x40
>>> [ 8400.186450] INFO: task glusterclogfsyn:6421 blocked for more than 120
>>> seconds.
>>> [ 8400.186496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [ 8400.186545] glusterclogfsyn D ffff93203a2edee0     0  6421      1
>>> 0x00000080
>>> [ 8400.186549] Call Trace:
>>> [ 8400.186554]  [<ffffffff96713f79>] schedule+0x29/0x70
>>> [ 8400.186589]  [<ffffffffc04a0388>] _xfs_log_force_lsn+0x2e8/0x340 [xfs]
>>> [ 8400.186593]  [<ffffffff960cf1b0>] ? wake_up_state+0x20/0x20
>>> [ 8400.186622]  [<ffffffffc0480b97>] xfs_file_fsync+0x107/0x1e0 [xfs]
>>> [ 8400.186629]  [<ffffffff9624f0e7>] do_fsync+0x67/0xb0
>>> [ 8400.186634]  [<ffffffff9672076f>] ? system_call_after_swapgs+0xbc/0x160
>>> [ 8400.186637]  [<ffffffff9624f3d0>] SyS_fsync+0x10/0x20
>>> [ 8400.186641]  [<ffffffff9672082f>] system_call_fastpath+0x1c/0x21
>>> [ 8400.186645]  [<ffffffff9672077b>] ? system_call_after_swapgs+0xc8/0x160
>>> [ 8400.186649] INFO: task glusteriotwr2:766 blocked for more than 120
>>> seconds.
>>> [ 8400.186693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [ 8400.186741] glusteriotwr2   D ffff931cfa510000     0   766      1
>>> 0x00000080
>>> [ 8400.186746] Call Trace:
>>> [ 8400.186764]  [<ffffffffc01a3839>] ? __split_and_process_bio+0x2e9/0x520
>>> [dm_mod]
>>> [ 8400.186768]  [<ffffffff96713f79>] schedule+0x29/0x70
>>> [ 8400.186772]  [<ffffffff967118e9>] schedule_timeout+0x239/0x2c0
>>> [ 8400.186782]  [<ffffffffc01a3d98>] ? dm_make_request+0x128/0x1a0
>>> [dm_mod]
>>> [ 8400.186786]  [<ffffffff9671348d>] io_schedule_timeout+0xad/0x130
>>> [ 8400.186790]  [<ffffffff967145ad>] wait_for_completion_io+0xfd/0x140
>>> [ 8400.186795]  [<ffffffff960cf1b0>] ? wake_up_state+0x20/0x20
>>> [ 8400.186800]  [<ffffffff9631e574>] blkdev_issue_flush+0xb4/0x110
>>> [ 8400.186833]  [<ffffffffc04984b9>] xfs_blkdev_issue_flush+0x19/0x20
>>> [xfs]
>>> [ 8400.186862]  [<ffffffffc0480c40>] xfs_file_fsync+0x1b0/0x1e0 [xfs]
>>> [ 8400.186879]  [<ffffffff9624f0e7>] do_fsync+0x67/0xb0
>>> [ 8400.186884]  [<ffffffff9672076f>] ? system_call_after_swapgs+0xbc/0x160
>>> [ 8400.186887]  [<ffffffff9624f3d0>] SyS_fsync+0x10/0x20
>>> [ 8400.186891]  [<ffffffff9672082f>] system_call_fastpath+0x1c/0x21
>>> [ 8400.186895]  [<ffffffff9672077b>] ? system_call_after_swapgs+0xc8/0x160
>>> [ 8400.186912] INFO: task kworker/3:4:3191 blocked for more than 120
>>> seconds.
>>> [ 8400.186957] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [ 8400.187006] kworker/3:4     D ffff931faf598fd0     0  3191      2
>>> 0x00000080
>>> [ 8400.187041] Workqueue: xfs-sync/dm-10 xfs_log_worker [xfs]
>>> [ 8400.187043] Call Trace:
>>> [ 8400.187050]  [<ffffffff9614e45f>] ? delayacct_end+0x8f/0xb0
>>> [ 8400.187055]  [<ffffffff960dbbc8>] ? update_group_power+0x28/0x280
>>> [ 8400.187059]  [<ffffffff96713f79>] schedule+0x29/0x70
>>> [ 8400.187062]  [<ffffffff967118e9>] schedule_timeout+0x239/0x2c0
>>> [ 8400.187066]  [<ffffffff9671432d>] wait_for_completion+0xfd/0x140
>>> [ 8400.187071]  [<ffffffff960cf1b0>] ? wake_up_state+0x20/0x20
>>> [ 8400.187075]  [<ffffffff960b418d>] flush_work+0xfd/0x190
>>> [ 8400.187079]  [<ffffffff960b06e0>] ? move_linked_works+0x90/0x90
>>> [ 8400.187110]  [<ffffffffc04a1d8a>] xlog_cil_force_lsn+0x8a/0x210 [xfs]
>>> [ 8400.187142]  [<ffffffffc049fcf5>] _xfs_log_force+0x85/0x2c0 [xfs]
>>> [ 8400.187174]  [<ffffffffc049ffd6>] ? xfs_log_worker+0x36/0x100 [xfs]
>>> [ 8400.187206]  [<ffffffffc049ff5c>] xfs_log_force+0x2c/0x70 [xfs]
>>> [ 8400.187237]  [<ffffffffc049ffd6>] xfs_log_worker+0x36/0x100 [xfs]
>>> [ 8400.187241]  [<ffffffff960b312f>] process_one_work+0x17f/0x440
>>> [ 8400.187245]  [<ffffffff960b3df6>] worker_thread+0x126/0x3c0
>>> [ 8400.187249]  [<ffffffff960b3cd0>] ? manage_workers.isra.24+0x2a0/0x2a0
>>> [ 8400.187253]  [<ffffffff960bb161>] kthread+0xd1/0xe0
>>> [ 8400.187257]  [<ffffffff960bb090>] ? insert_kthread_work+0x40/0x40
>>> [ 8400.187261]  [<ffffffff96720677>] ret_from_fork_nospec_begin+0x21/0x21
>>> [ 8400.187265]  [<ffffffff960bb090>] ? insert_kthread_work+0x40/0x40
>>> [13493.643624] perf: interrupt took too long (12026 > 11991), lowering
>>> kernel.perf_event_max_sample_rate to 16000
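>>>
>>> (For what it's worth, those "blocked for more than 120 seconds" warnings
>>> are the kernel's hung-task watchdog; 120 is just the default value of
>>> kernel.hung_task_timeout_secs and can be inspected or raised while
>>> debugging:
>>>
>>>     sysctl kernel.hung_task_timeout_secs
>>>     sysctl -w kernel.hung_task_timeout_secs=240
>>>
>>> though that only quiets the warnings; the underlying I/O stalls remain.)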
>>>
>>>
>>> On Wed, May 30, 2018 at 2:44 AM, Krutika Dhananjay <kdhananj at redhat.com>
>>> wrote:
>>>
>>>> The profile seems to suggest very high latencies on the brick at
>>>> ovirt1.nwfiber.com:/gluster/brick1/engine
>>>> ovirt2.* shows decent numbers. Is everything OK with the brick on
>>>> ovirt1?
>>>> Are the bricks of engine volume on both these servers identical in
>>>> terms of their config?
>>>>
>>>> -Krutika
>>>>
>>>>
>>>> On Wed, May 30, 2018 at 3:07 PM, Jim Kusznir <jim at palousetech.com>
>>>> wrote:
>>>>
>>>>> Hi:
>>>>>
>>>>> Thank you.  I was finally able to get my cluster minimally functional
>>>>> again (it's 2am local here; I have to be up by 6am).  I set
>>>>> cluster.eager-lock off, but I'm still seeing performance issues.  Here's
>>>>> the engine volume:
>>>>>
>>>>> Brick: ovirt1.nwfiber.com:/gluster/brick1/engine
>>>>> ------------------------------------------------
>>>>> Cumulative Stats:
>>>>>    Block Size:                256b+                 512b+                4096b+
>>>>>  No. of Reads:                  702                    24                   772
>>>>> No. of Writes:                    0                   939                  4802
>>>>>
>>>>>    Block Size:               8192b+               16384b+               32768b+
>>>>>  No. of Reads:                    0                    51                    49
>>>>> No. of Writes:                 4211                   858                   291
>>>>>
>>>>>    Block Size:              65536b+              131072b+
>>>>>  No. of Reads:                   49                  5232
>>>>> No. of Writes:                  333                   588
>>>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>>>       0.00       0.00 us       0.00 us       0.00 us              3      FORGET
>>>>>       0.00       0.00 us       0.00 us       0.00 us           1227     RELEASE
>>>>>       0.00       0.00 us       0.00 us       0.00 us           1035  RELEASEDIR
>>>>>       0.00     827.00 us     619.00 us    1199.00 us             10    READDIRP
>>>>>       0.00      98.38 us      30.00 us     535.00 us            144     ENTRYLK
>>>>>       0.00     180.11 us     121.00 us     257.00 us             94 REMOVEXATTR
>>>>>       0.00     208.03 us      23.00 us    1980.00 us            182    GETXATTR
>>>>>       0.00    1212.71 us      35.00 us   29459.00 us             38     READDIR
>>>>>       0.01     260.39 us     172.00 us     552.00 us            376     XATTROP
>>>>>       0.01     305.10 us      45.00 us  144983.00 us            727      STATFS
>>>>>       0.03    1267.23 us      22.00 us  418399.00 us            451       FLUSH
>>>>>       0.04    1164.14 us      86.00 us  437638.00 us            600        OPEN
>>>>>       0.07    2253.75 us       3.00 us  645037.00 us            598     OPENDIR
>>>>>       0.08   17360.97 us     149.00 us  962156.00 us             94     SETATTR
>>>>>       0.12    2733.36 us      50.00 us 1683945.00 us            877       FSTAT
>>>>>       0.21    3041.55 us      29.00 us 1021732.00 us           1368     INODELK
>>>>>       0.24     389.93 us     107.00 us  550203.00 us          11840    FXATTROP
>>>>>       0.34   14820.49 us      38.00 us 1935527.00 us            444        STAT
>>>>>       0.88    3765.77 us      42.00 us 1341978.00 us           4581      LOOKUP
>>>>>      15.06   19624.04 us      26.00 us 6412511.00 us          14971    FINODELK
>>>>>      21.63   59817.26 us      45.00 us 4008832.00 us           7054       FSYNC
>>>>>      23.19  263941.14 us      86.00 us 4130892.00 us           1714        READ
>>>>>      38.10  125578.64 us     207.00 us 7682338.00 us           5919       WRITE
>>>>>
>>>>>     Duration: 6957 seconds
>>>>>    Data Read: 694832857 bytes
>>>>> Data Written: 192751104 bytes
>>>>>
>>>>> Interval 0 Stats:
>>>>>    Block Size:                256b+                 512b+                4096b+
>>>>>  No. of Reads:                  702                    24                   772
>>>>> No. of Writes:                    0                   939                  4802
>>>>>
>>>>>    Block Size:               8192b+               16384b+               32768b+
>>>>>  No. of Reads:                    0                    51                    49
>>>>> No. of Writes:                 4211                   858                   291
>>>>>
>>>>>    Block Size:              65536b+              131072b+
>>>>>  No. of Reads:                   49                  5232
>>>>> No. of Writes:                  333                   588
>>>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>>>       0.00       0.00 us       0.00 us       0.00 us              3      FORGET
>>>>>       0.00       0.00 us       0.00 us       0.00 us           1227     RELEASE
>>>>>       0.00       0.00 us       0.00 us       0.00 us           1035  RELEASEDIR
>>>>>       0.00     827.00 us     619.00 us    1199.00 us             10    READDIRP
>>>>>       0.00      98.38 us      30.00 us     535.00 us            144     ENTRYLK
>>>>>       0.00     180.11 us     121.00 us     257.00 us             94 REMOVEXATTR
>>>>>       0.00     208.03 us      23.00 us    1980.00 us            182    GETXATTR
>>>>>       0.00    1212.71 us      35.00 us   29459.00 us             38     READDIR
>>>>>       0.01     260.39 us     172.00 us     552.00 us            376     XATTROP
>>>>>       0.01     305.10 us      45.00 us  144983.00 us            727      STATFS
>>>>>       0.03    1267.23 us      22.00 us  418399.00 us            451       FLUSH
>>>>>       0.04    1164.14 us      86.00 us  437638.00 us            600        OPEN
>>>>>       0.07    2253.75 us       3.00 us  645037.00 us            598     OPENDIR
>>>>>       0.08   17360.97 us     149.00 us  962156.00 us             94     SETATTR
>>>>>       0.12    2733.36 us      50.00 us 1683945.00 us            877       FSTAT
>>>>>       0.21    3041.55 us      29.00 us 1021732.00 us           1368     INODELK
>>>>>       0.24     389.93 us     107.00 us  550203.00 us          11840    FXATTROP
>>>>>       0.34   14820.49 us      38.00 us 1935527.00 us            444        STAT
>>>>>       0.91    3859.75 us      42.00 us 1341978.00 us           4581      LOOKUP
>>>>>      15.05   19622.80 us      26.00 us 6412511.00 us          14971    FINODELK
>>>>>      21.62   59809.04 us      45.00 us 4008832.00 us           7054       FSYNC
>>>>>      23.18  263941.14 us      86.00 us 4130892.00 us           1714        READ
>>>>>      38.09  125562.74 us     207.00 us 7682338.00 us           5919       WRITE
>>>>>
>>>>>     Duration: 6957 seconds
>>>>>    Data Read: 694832857 bytes
>>>>> Data Written: 192751104 bytes
>>>>>
>>>>> Brick: ovirt2.nwfiber.com:/gluster/brick1/engine
>>>>> ------------------------------------------------
>>>>> Cumulative Stats:
>>>>>    Block Size:                256b+                 512b+                1024b+
>>>>>  No. of Reads:                  704                   167                   216
>>>>> No. of Writes:                    0                   939                     0
>>>>>
>>>>>    Block Size:               2048b+                4096b+                8192b+
>>>>>  No. of Reads:                  424                 34918                  9866
>>>>> No. of Writes:                    0                  4802                  4211
>>>>>
>>>>>    Block Size:              16384b+               32768b+               65536b+
>>>>>  No. of Reads:                12695                  3290                  3880
>>>>> No. of Writes:                  858                   291                   333
>>>>>
>>>>>    Block Size:             131072b+
>>>>>  No. of Reads:                13718
>>>>> No. of Writes:                  596
>>>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>>>       0.00       0.00 us       0.00 us       0.00 us              3      FORGET
>>>>>       0.00       0.00 us       0.00 us       0.00 us           1228     RELEASE
>>>>>       0.00       0.00 us       0.00 us       0.00 us           1041  RELEASEDIR
>>>>>       0.06      74.26 us      21.00 us     915.00 us            144     ENTRYLK
>>>>>       0.07     141.97 us      87.00 us     268.00 us             94 REMOVEXATTR
>>>>>       0.08     370.68 us      22.00 us    1311.00 us             38     READDIR
>>>>>       0.09     184.88 us     108.00 us     316.00 us             94     SETATTR
>>>>>       0.14      56.41 us      21.00 us     540.00 us            451       FLUSH
>>>>>       0.14     139.42 us      18.00 us    3251.00 us            184    GETXATTR
>>>>>       0.31      96.06 us       2.00 us    2293.00 us            598     OPENDIR
>>>>>       0.35      88.76 us      31.00 us     716.00 us            727      STATFS
>>>>>       0.43     131.96 us      59.00 us     784.00 us            600        OPEN
>>>>>       0.44     214.77 us     110.00 us     851.00 us            376     XATTROP
>>>>>       0.44      81.02 us      21.00 us    3029.00 us            998        STAT
>>>>>       0.51      68.57 us      17.00 us    3039.00 us           1368     INODELK
>>>>>       1.26     429.06 us     274.00 us     988.00 us            539    READDIRP
>>>>>       1.89     109.48 us      38.00 us    2492.00 us           3175       FSTAT
>>>>>       5.09      62.48 us      19.00 us    4578.00 us          14967    FINODELK
>>>>>       5.72     229.43 us      32.00 us     979.00 us           4581      LOOKUP
>>>>>      11.48     356.40 us     144.00 us   16222.00 us           5917       WRITE
>>>>>      12.40     192.55 us      87.00 us    4372.00 us          11834    FXATTROP
>>>>>      12.48     325.11 us      33.00 us   10702.00 us           7051       FSYNC
>>>>>      46.61    2063.11 us     131.00 us  193833.00 us           4151        READ
>>>>>
>>>>>     Duration: 6958 seconds
>>>>>    Data Read: 2782908225 bytes
>>>>> Data Written: 193799680 bytes
>>>>>
>>>>> Interval 0 Stats:
>>>>>    Block Size:                256b+                 512b+                1024b+
>>>>>  No. of Reads:                  704                   167                   216
>>>>> No. of Writes:                    0                   939                     0
>>>>>
>>>>>    Block Size:               2048b+                4096b+                8192b+
>>>>>  No. of Reads:                  424                 34918                  9866
>>>>> No. of Writes:                    0                  4802                  4211
>>>>>
>>>>>    Block Size:              16384b+               32768b+               65536b+
>>>>>  No. of Reads:                12695                  3290                  3880
>>>>> No. of Writes:                  858                   291                   333
>>>>>
>>>>>    Block Size:             131072b+
>>>>>  No. of Reads:                13718
>>>>> No. of Writes:                  596
>>>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>>>       0.00       0.00 us       0.00 us       0.00 us              3      FORGET
>>>>>       0.00       0.00 us       0.00 us       0.00 us           1228     RELEASE
>>>>>       0.00       0.00 us       0.00 us       0.00 us           1041  RELEASEDIR
>>>>>       0.06      74.26 us      21.00 us     915.00 us            144     ENTRYLK
>>>>>       0.07     141.97 us      87.00 us     268.00 us             94 REMOVEXATTR
>>>>>       0.08     370.68 us      22.00 us    1311.00 us             38     READDIR
>>>>>       0.09     184.88 us     108.00 us     316.00 us             94     SETATTR
>>>>>       0.14      56.41 us      21.00 us     540.00 us            451       FLUSH
>>>>>       0.14     139.42 us      18.00 us    3251.00 us            184    GETXATTR
>>>>>       0.31      96.06 us       2.00 us    2293.00 us            598     OPENDIR
>>>>>       0.35      88.76 us      31.00 us     716.00 us            727      STATFS
>>>>>       0.43     131.96 us      59.00 us     784.00 us            600        OPEN
>>>>>       0.44     214.77 us     110.00 us     851.00 us            376     XATTROP
>>>>>       0.44      81.02 us      21.00 us    3029.00 us            998        STAT
>>>>>       0.51      68.57 us      17.00 us    3039.00 us           1368     INODELK
>>>>>       1.26     429.06 us     274.00 us     988.00 us            539    READDIRP
>>>>>       1.89     109.51 us      38.00 us    2492.00 us           3175       FSTAT
>>>>>       5.09      62.48 us      19.00 us    4578.00 us          14967    FINODELK
>>>>>       5.72     229.43 us      32.00 us     979.00 us           4581      LOOKUP
>>>>>      11.48     356.40 us     144.00 us   16222.00 us           5917       WRITE
>>>>>      12.40     192.55 us      87.00 us    4372.00 us          11834    FXATTROP
>>>>>      12.48     325.11 us      33.00 us   10702.00 us           7051       FSYNC
>>>>>      46.62    2063.91 us     131.00 us  193833.00 us           4151        READ
>>>>>
>>>>>     Duration: 6958 seconds
>>>>>    Data Read: 2782908225 bytes
>>>>> Data Written: 193799680 bytes
>>>>>
>>>>>
>>>>>
>>>>> gluster> volume info engine
>>>>>
>>>>> Volume Name: engine
>>>>> Type: Replicate
>>>>> Volume ID: 87ad86b9-d88b-457e-ba21-5d3173c612de
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: ovirt1.nwfiber.com:/gluster/brick1/engine
>>>>> Brick2: ovirt2.nwfiber.com:/gluster/brick1/engine
>>>>> Brick3: ovirt3.nwfiber.com:/gluster/brick1/engine (arbiter)
>>>>> Options Reconfigured:
>>>>> diagnostics.count-fop-hits: on
>>>>> diagnostics.latency-measurement: on
>>>>> performance.strict-o-direct: on
>>>>> nfs.disable: on
>>>>> user.cifs: off
>>>>> network.ping-timeout: 30
>>>>> cluster.shd-max-threads: 6
>>>>> cluster.shd-wait-qlength: 10000
>>>>> cluster.locking-scheme: granular
>>>>> cluster.data-self-heal-algorithm: full
>>>>> performance.low-prio-threads: 32
>>>>> features.shard-block-size: 512MB
>>>>> features.shard: on
>>>>> storage.owner-gid: 36
>>>>> storage.owner-uid: 36
>>>>> cluster.server-quorum-type: server
>>>>> cluster.quorum-type: auto
>>>>> network.remote-dio: off
>>>>> cluster.eager-lock: off
>>>>> performance.stat-prefetch: off
>>>>> performance.io-cache: off
>>>>> performance.read-ahead: off
>>>>> performance.quick-read: off
>>>>> performance.readdir-ahead: on
>>>>> geo-replication.indexing: on
>>>>> geo-replication.ignore-pid-check: on
>>>>> changelog.changelog: on
>>>>>
>>>>>
>>>>> On Tue, May 29, 2018 at 9:17 PM, Krutika Dhananjay <
>>>>> kdhananj at redhat.com> wrote:
>>>>>
>>>>>> Adding Ravi to look into the heal issue.
>>>>>>
>>>>>> As for the fsync hang and subsequent IO errors, it seems a lot like
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1497156 and Paolo
>>>>>> Bonzini from qemu had pointed out that this would be fixed by the following
>>>>>> commit:
>>>>>>
>>>>>>   commit e72c9a2a67a6400c8ef3d01d4c461dbbbfa0e1f0
>>>>>>     Author: Paolo Bonzini <pbonzini at redhat.com>
>>>>>>     Date:   Wed Jun 21 16:35:46 2017 +0200
>>>>>>
>>>>>>     scsi: virtio_scsi: let host do exception handling
>>>>>>
>>>>>>     virtio_scsi tries to do exception handling after the default 30 seconds
>>>>>>     timeout expires.  However, it's better to let the host control the
>>>>>>     timeout, otherwise with a heavy I/O load it is likely that an abort will
>>>>>>     also timeout.  This leads to fatal errors like filesystems going
>>>>>>     offline.
>>>>>>
>>>>>>     Disable the 'sd' timeout and allow the host to do exception handling,
>>>>>>     following the precedent of the storvsc driver.
>>>>>>
>>>>>>     Hannes has a proposal to introduce timeouts in virtio, but this provides
>>>>>>     an immediate solution for stable kernels too.
>>>>>>
>>>>>>     [mkp: fixed typo]
>>>>>>
>>>>>>     Reported-by: Douglas Miller <dougmill at linux.vnet.ibm.com>
>>>>>>     Cc: "James E.J. Bottomley" <jejb at linux.vnet.ibm.com>
>>>>>>     Cc: "Martin K. Petersen" <martin.petersen at oracle.com>
>>>>>>     Cc: Hannes Reinecke <hare at suse.de>
>>>>>>     Cc: linux-scsi at vger.kernel.org
>>>>>>     Cc: stable at vger.kernel.org
>>>>>>     Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
>>>>>>     Signed-off-by: Martin K. Petersen <martin.petersen at oracle.com>
>>>>>>
>>>>>>
>>>>>> Adding Paolo/Kevin to comment.
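>>>>>>
>>>>>> (If it helps to check whether a guest is subject to the 30-second SCSI
>>>>>> timeout the commit refers to -- the device name here is just an
>>>>>> example -- the per-device value is visible in sysfs inside the VM:
>>>>>>
>>>>>>     cat /sys/block/sda/device/timeout
>>>>>>
>>>>>> The commit above disables that timeout for virtio_scsi disks so the
>>>>>> host handles exceptions instead.)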
>>>>>>
>>>>>> As for the poor gluster performance, could you disable
>>>>>> cluster.eager-lock and see if that makes any difference:
>>>>>>
>>>>>> # gluster volume set <VOL> cluster.eager-lock off
>>>>>>
>>>>>> Do also capture the volume profile again if you still see performance
>>>>>> issues after disabling eager-lock.
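>>>>>>
>>>>>> (Generic commands for that, substituting your volume name:
>>>>>>
>>>>>>     gluster volume profile <VOL> start
>>>>>>     # ... reproduce the slow period ...
>>>>>>     gluster volume profile <VOL> info
>>>>>>
>>>>>> so the interval stats cover just the bad window.)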
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>>
>>>>>> On Wed, May 30, 2018 at 6:55 AM, Jim Kusznir <jim at palousetech.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I also finally found the following in my system log on one server:
>>>>>>>
>>>>>>> [10679.524491] INFO: task glusterclogro:14933 blocked for more than
>>>>>>> 120 seconds.
>>>>>>> [10679.525826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>>>> disables this message.
>>>>>>> [10679.527144] glusterclogro   D ffff97209832bf40     0 14933      1
>>>>>>> 0x00000080
>>>>>>> [10679.527150] Call Trace:
>>>>>>> [10679.527161]  [<ffffffffb9913f79>] schedule+0x29/0x70
>>>>>>> [10679.527218]  [<ffffffffc060e388>] _xfs_log_force_lsn+0x2e8/0x340
>>>>>>> [xfs]
>>>>>>> [10679.527225]  [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
>>>>>>> [10679.527254]  [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
>>>>>>> [10679.527260]  [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
>>>>>>> [10679.527268]  [<ffffffffb992076f>] ? system_call_after_swapgs+0xbc/0x160
>>>>>>> [10679.527271]  [<ffffffffb944f3d0>] SyS_fsync+0x10/0x20
>>>>>>> [10679.527275]  [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
>>>>>>> [10679.527279]  [<ffffffffb992077b>] ? system_call_after_swapgs+0xc8/0x160
>>>>>>> [10679.527283] INFO: task glusterposixfsy:14941 blocked for more
>>>>>>> than 120 seconds.
>>>>>>> [10679.528608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>>>> disables this message.
>>>>>>> [10679.529956] glusterposixfsy D ffff972495f84f10     0 14941      1
>>>>>>> 0x00000080
>>>>>>> [10679.529961] Call Trace:
>>>>>>> [10679.529966]  [<ffffffffb9913f79>] schedule+0x29/0x70
>>>>>>> [10679.530003]  [<ffffffffc060e388>] _xfs_log_force_lsn+0x2e8/0x340
>>>>>>> [xfs]
>>>>>>> [10679.530008]  [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
>>>>>>> [10679.530038]  [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
>>>>>>> [10679.530042]  [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
>>>>>>> [10679.530046]  [<ffffffffb992076f>] ? system_call_after_swapgs+0xbc/0x160
>>>>>>> [10679.530050]  [<ffffffffb944f3f3>] SyS_fdatasync+0x13/0x20
>>>>>>> [10679.530054]  [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
>>>>>>> [10679.530058]  [<ffffffffb992077b>] ? system_call_after_swapgs+0xc8/0x160
>>>>>>> [10679.530062] INFO: task glusteriotwr13:15486 blocked for more than
>>>>>>> 120 seconds.
>>>>>>> [10679.531805] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>>>> disables this message.
>>>>>>> [10679.533732] glusteriotwr13  D ffff9720a83f0000     0 15486      1
>>>>>>> 0x00000080
>>>>>>> [10679.533738] Call Trace:
>>>>>>> [10679.533747]  [<ffffffffb9913f79>] schedule+0x29/0x70
>>>>>>> [10679.533799]  [<ffffffffc060e388>] _xfs_log_force_lsn+0x2e8/0x340
>>>>>>> [xfs]
>>>>>>> [10679.533806]  [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
>>>>>>> [10679.533846]  [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
>>>>>>> [10679.533852]  [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
>>>>>>> [10679.533858]  [<ffffffffb992076f>] ? system_call_after_swapgs+0xbc/0x160
>>>>>>> [10679.533863]  [<ffffffffb944f3f3>] SyS_fdatasync+0x13/0x20
>>>>>>> [10679.533868]  [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
>>>>>>> [10679.533873]  [<ffffffffb992077b>] ? system_call_after_swapgs+0xc8/0x160
>>>>>>> [10919.512757] INFO: task glusterclogro:14933 blocked for more than
>>>>>>> 120 seconds.
>>>>>>> [10919.514714] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>>>> disables this message.
>>>>>>> [10919.516663] glusterclogro   D ffff97209832bf40     0 14933      1
>>>>>>> 0x00000080
>>>>>>> [10919.516677] Call Trace:
>>>>>>> [10919.516690]  [<ffffffffb9913f79>] schedule+0x29/0x70
>>>>>>> [10919.516696]  [<ffffffffb99118e9>] schedule_timeout+0x239/0x2c0
>>>>>>> [10919.516703]  [<ffffffffb951cc04>] ? blk_finish_plug+0x14/0x40
>>>>>>> [10919.516768]  [<ffffffffc05e9224>] ? _xfs_buf_ioapply+0x334/0x460
>>>>>>> [xfs]
>>>>>>> [10919.516774]  [<ffffffffb991432d>] wait_for_completion+0xfd/0x140
>>>>>>> [10919.516782]  [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
>>>>>>> [10919.516821]  [<ffffffffc05eb0a3>] ? _xfs_buf_read+0x23/0x40 [xfs]
>>>>>>> [10919.516859]  [<ffffffffc05eafa9>] xfs_buf_submit_wait+0xf9/0x1d0
>>>>>>> [xfs]
>>>>>>> [10919.516902]  [<ffffffffc061b279>] ? xfs_trans_read_buf_map+0x199/0x400
>>>>>>> [xfs]
>>>>>>> [10919.516940]  [<ffffffffc05eb0a3>] _xfs_buf_read+0x23/0x40 [xfs]
>>>>>>> [10919.516977]  [<ffffffffc05eb1b9>] xfs_buf_read_map+0xf9/0x160
>>>>>>> [xfs]
>>>>>>> [10919.517022]  [<ffffffffc061b279>] xfs_trans_read_buf_map+0x199/0x400
>>>>>>> [xfs]
>>>>>>> [10919.517057]  [<ffffffffc05c8d04>] xfs_da_read_buf+0xd4/0x100 [xfs]
>>>>>>> [10919.517091]  [<ffffffffc05c8d53>] xfs_da3_node_read+0x23/0xd0
>>>>>>> [xfs]
>>>>>>> [10919.517126]  [<ffffffffc05c9fee>] xfs_da3_node_lookup_int+0x6e/0x2f0
>>>>>>> [xfs]
>>>>>>> [10919.517160]  [<ffffffffc05d5a1d>] xfs_dir2_node_lookup+0x4d/0x170
>>>>>>> [xfs]
>>>>>>> [10919.517194]  [<ffffffffc05ccf5d>] xfs_dir_lookup+0x1bd/0x1e0 [xfs]
>>>>>>> [10919.517233]  [<ffffffffc05fd8d9>] xfs_lookup+0x69/0x140 [xfs]
>>>>>>> [10919.517271]  [<ffffffffc05fa018>] xfs_vn_lookup+0x78/0xc0 [xfs]
>>>>>>> [10919.517278]  [<ffffffffb9425cf3>] lookup_real+0x23/0x60
>>>>>>> [10919.517283]  [<ffffffffb9426702>] __lookup_hash+0x42/0x60
>>>>>>> [10919.517288]  [<ffffffffb942d519>] SYSC_renameat2+0x3a9/0x5a0
>>>>>>> [10919.517296]  [<ffffffffb94d3753>] ? selinux_file_free_security+0x23/0x30
>>>>>>> [10919.517304]  [<ffffffffb992077b>] ? system_call_after_swapgs+0xc8/0x160
>>>>>>> [10919.517309]  [<ffffffffb992076f>] ? system_call_after_swapgs+0xbc/0x160
>>>>>>> [10919.517313]  [<ffffffffb992077b>] ? system_call_after_swapgs+0xc8/0x160
>>>>>>> [10919.517318]  [<ffffffffb992076f>] ? system_call_after_swapgs+0xbc/0x160
>>>>>>> [10919.517323]  [<ffffffffb942e58e>] SyS_renameat2+0xe/0x10
>>>>>>> [10919.517328]  [<ffffffffb942e5ce>] SyS_rename+0x1e/0x20
>>>>>>> [10919.517333]  [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
>>>>>>> [10919.517339]  [<ffffffffb992077b>] ? system_call_after_swapgs+0xc8/0x160
>>>>>>> [11159.496095] INFO: task glusteriotwr9:15482 blocked for more than
>>>>>>> 120 seconds.
>>>>>>> [11159.497546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>>>> disables this message.
>>>>>>> [11159.498978] glusteriotwr9   D ffff971fa0fa1fa0     0 15482      1
>>>>>>> 0x00000080
>>>>>>> [11159.498984] Call Trace:
>>>>>>> [11159.498995]  [<ffffffffb9911f00>] ? bit_wait+0x50/0x50
>>>>>>> [11159.498999]  [<ffffffffb9913f79>] schedule+0x29/0x70
>>>>>>> [11159.499003]  [<ffffffffb99118e9>] schedule_timeout+0x239/0x2c0
>>>>>>> [11159.499056]  [<ffffffffc05dd9b7>] ? xfs_iext_bno_to_ext+0xa7/0x1a0 [xfs]
>>>>>>> [11159.499082]  [<ffffffffc05dd43e>] ? xfs_iext_bno_to_irec+0x8e/0xd0 [xfs]
>>>>>>> [11159.499090]  [<ffffffffb92f7a12>] ? ktime_get_ts64+0x52/0xf0
>>>>>>> [11159.499093]  [<ffffffffb9911f00>] ? bit_wait+0x50/0x50
>>>>>>> [11159.499097]  [<ffffffffb991348d>] io_schedule_timeout+0xad/0x130
>>>>>>> [11159.499101]  [<ffffffffb9913528>] io_schedule+0x18/0x20
>>>>>>> [11159.499104]  [<ffffffffb9911f11>] bit_wait_io+0x11/0x50
>>>>>>> [11159.499107]  [<ffffffffb9911ac1>] __wait_on_bit_lock+0x61/0xc0
>>>>>>> [11159.499113]  [<ffffffffb9393634>] __lock_page+0x74/0x90
>>>>>>> [11159.499118]  [<ffffffffb92bc210>] ? wake_bit_function+0x40/0x40
>>>>>>> [11159.499121]  [<ffffffffb9394154>] __find_lock_page+0x54/0x70
>>>>>>> [11159.499125]  [<ffffffffb9394e85>] grab_cache_page_write_begin+0x55/0xc0
>>>>>>> [11159.499130]  [<ffffffffb9484b76>] iomap_write_begin+0x66/0x100
>>>>>>> [11159.499135]  [<ffffffffb9484edf>] iomap_write_actor+0xcf/0x1d0
>>>>>>> [11159.499140]  [<ffffffffb9484e10>] ? iomap_write_end+0x80/0x80
>>>>>>> [11159.499144]  [<ffffffffb94854e7>] iomap_apply+0xb7/0x150
>>>>>>> [11159.499149]  [<ffffffffb9485621>] iomap_file_buffered_write+0xa1/0xe0
>>>>>>> [11159.499153]  [<ffffffffb9484e10>] ? iomap_write_end+0x80/0x80
>>>>>>> [11159.499182]  [<ffffffffc05f025d>] xfs_file_buffered_aio_write+0x12d/0x2c0
>>>>>>> [xfs]
>>>>>>> [11159.499213]  [<ffffffffc05f057d>] xfs_file_aio_write+0x18d/0x1b0
>>>>>>> [xfs]
>>>>>>> [11159.499217]  [<ffffffffb941a533>] do_sync_write+0x93/0xe0
>>>>>>> [11159.499222]  [<ffffffffb941b010>] vfs_write+0xc0/0x1f0
>>>>>>> [11159.499225]  [<ffffffffb941c002>] SyS_pwrite64+0x92/0xc0
>>>>>>> [11159.499230]  [<ffffffffb992076f>] ? system_call_after_swapgs+0xbc/0x160
>>>>>>> [11159.499234]  [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
>>>>>>> [11159.499238]  [<ffffffffb992077b>] ? system_call_after_swapgs+0xc8/0x160
>>>>>>> [11279.488720] INFO: task xfsaild/dm-10:1134 blocked for more than
>>>>>>> 120 seconds.
>>>>>>> [11279.490197] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>>>> disables this message.
>>>>>>> [11279.491665] xfsaild/dm-10   D ffff9720a8660fd0     0  1134      2
>>>>>>> 0x00000000
>>>>>>> [11279.491671] Call Trace:
>>>>>>> [11279.491682]  [<ffffffffb92a3a2e>] ? try_to_del_timer_sync+0x5e/0x90
>>>>>>> [11279.491688]  [<ffffffffb9913f79>] schedule+0x29/0x70
>>>>>>> [11279.491744]  [<ffffffffc060de36>] _xfs_log_force+0x1c6/0x2c0 [xfs]
>>>>>>> [11279.491750]  [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
>>>>>>> [11279.491783]  [<ffffffffc0619fec>] ? xfsaild+0x16c/0x6f0 [xfs]
>>>>>>> [11279.491817]  [<ffffffffc060df5c>] xfs_log_force+0x2c/0x70 [xfs]
>>>>>>> [11279.491849]  [<ffffffffc0619e80>] ? xfs_trans_ail_cursor_first+0x90/0x90
>>>>>>> [xfs]
>>>>>>> [11279.491880]  [<ffffffffc0619fec>] xfsaild+0x16c/0x6f0 [xfs]
>>>>>>> [11279.491913]  [<ffffffffc0619e80>] ? xfs_trans_ail_cursor_first+0x90/0x90
>>>>>>> [xfs]
>>>>>>> [11279.491919]  [<ffffffffb92bb161>] kthread+0xd1/0xe0
>>>>>>> [11279.491926]  [<ffffffffb92bb090>] ? insert_kthread_work+0x40/0x40
>>>>>>> [11279.491932]  [<ffffffffb9920677>] ret_from_fork_nospec_begin+0x21/0x21
>>>>>>> [11279.491936]  [<ffffffffb92bb090>] ? insert_kthread_work+0x40/0x40
>>>>>>> [11279.491976] INFO: task glusterclogfsyn:14934 blocked for more
>>>>>>> than 120 seconds.
>>>>>>> [11279.493466] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>>>> disables this message.
>>>>>>> [11279.494952] glusterclogfsyn D ffff97209832af70     0 14934      1
>>>>>>> 0x00000080
>>>>>>> [11279.494957] Call Trace:
>>>>>>> [11279.494979]  [<ffffffffc0309839>] ? __split_and_process_bio+0x2e9/0x520
>>>>>>> [dm_mod]
>>>>>>> [11279.494983]  [<ffffffffb9913f79>] schedule+0x29/0x70
>>>>>>> [11279.494987]  [<ffffffffb99118e9>] schedule_timeout+0x239/0x2c0
>>>>>>> [11279.494997]  [<ffffffffc0309d98>] ? dm_make_request+0x128/0x1a0
>>>>>>> [dm_mod]
>>>>>>> [11279.495001]  [<ffffffffb991348d>] io_schedule_timeout+0xad/0x130
>>>>>>> [11279.495005]  [<ffffffffb99145ad>] wait_for_completion_io+0xfd/0x140
>>>>>>> [11279.495010]  [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
>>>>>>> [11279.495016]  [<ffffffffb951e574>] blkdev_issue_flush+0xb4/0x110
>>>>>>> [11279.495049]  [<ffffffffc06064b9>] xfs_blkdev_issue_flush+0x19/0x20
>>>>>>> [xfs]
>>>>>>> [11279.495079]  [<ffffffffc05eec40>] xfs_file_fsync+0x1b0/0x1e0 [xfs]
>>>>>>> [11279.495086]  [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
>>>>>>> [11279.495090]  [<ffffffffb992076f>] ? system_call_after_swapgs+0xbc/0x160
>>>>>>> [11279.495094]  [<ffffffffb944f3d0>] SyS_fsync+0x10/0x20
>>>>>>> [11279.495098]  [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
>>>>>>> [11279.495102]  [<ffffffffb992077b>] ? system_call_after_swapgs+0xc8/0x160
>>>>>>> [11279.495105] INFO: task glusterposixfsy:14941 blocked for more
>>>>>>> than 120 seconds.
>>>>>>> [11279.496606] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>>>> disables this message.
>>>>>>> [11279.498114] glusterposixfsy D ffff972495f84f10     0 14941      1
>>>>>>> 0x00000080
>>>>>>> [11279.498118] Call Trace:
>>>>>>> [11279.498134]  [<ffffffffc0309839>] ? __split_and_process_bio+0x2e9/0x520
>>>>>>> [dm_mod]
>>>>>>> [11279.498138]  [<ffffffffb9913f79>] schedule+0x29/0x70
>>>>>>> [11279.498142]  [<ffffffffb99118e9>] schedule_timeout+0x239/0x2c0
>>>>>>> [11279.498152]  [<ffffffffc0309d98>] ? dm_make_request+0x128/0x1a0
>>>>>>> [dm_mod]
>>>>>>> [11279.498156]  [<ffffffffb991348d>] io_schedule_timeout+0xad/0x130
>>>>>>> [11279.498160]  [<ffffffffb99145ad>] wait_for_completion_io+0xfd/0x140
>>>>>>> [11279.498165]  [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
>>>>>>> [11279.498169]  [<ffffffffb951e574>] blkdev_issue_flush+0xb4/0x110
>>>>>>> [11279.498202]  [<ffffffffc06064b9>] xfs_blkdev_issue_flush+0x19/0x20
>>>>>>> [xfs]
>>>>>>> [11279.498231]  [<ffffffffc05eec40>] xfs_file_fsync+0x1b0/0x1e0 [xfs]
>>>>>>> [11279.498238]  [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
>>>>>>> [11279.498242]  [<ffffffffb992076f>] ? system_call_after_swapgs+0xbc/0x160
>>>>>>> [11279.498246]  [<ffffffffb944f3f3>] SyS_fdatasync+0x13/0x20
>>>>>>> [11279.498250]  [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
>>>>>>> [11279.498254]  [<ffffffffb992077b>] ? system_call_after_swapgs+0xc8/0x160
>>>>>>> [11279.498257] INFO: task glusteriotwr1:14950 blocked for more than
>>>>>>> 120 seconds.
>>>>>>> [11279.499789] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>>>> disables this message.
>>>>>>> [11279.501343] glusteriotwr1   D ffff97208b6daf70     0 14950      1
>>>>>>> 0x00000080
>>>>>>> [11279.501348] Call Trace:
>>>>>>> [11279.501353]  [<ffffffffb9913f79>] schedule+0x29/0x70
>>>>>>> [11279.501390]  [<ffffffffc060e388>] _xfs_log_force_lsn+0x2e8/0x340
>>>>>>> [xfs]
>>>>>>> [11279.501396]  [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
>>>>>>> [11279.501428]  [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
>>>>>>> [11279.501432]  [<ffffffffb944ef3f>] generic_write_sync+0x4f/0x70
>>>>>>> [11279.501461]  [<ffffffffc05f0545>] xfs_file_aio_write+0x155/0x1b0
>>>>>>> [xfs]
>>>>>>> [11279.501466]  [<ffffffffb941a533>] do_sync_write+0x93/0xe0
>>>>>>> [11279.501471]  [<ffffffffb941b010>] vfs_write+0xc0/0x1f0
>>>>>>> [11279.501475]  [<ffffffffb941c002>] SyS_pwrite64+0x92/0xc0
>>>>>>> [11279.501479]  [<ffffffffb992076f>] ? system_call_after_swapgs+0xbc/0x160
>>>>>>> [11279.501483]  [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
>>>>>>> [11279.501489]  [<ffffffffb992077b>] ? system_call_after_swapgs+0xc8/0x160
>>>>>>> [11279.501493] INFO: task glusteriotwr4:14953 blocked for more than
>>>>>>> 120 seconds.
>>>>>>> [11279.503047] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>>>> disables this message.
>>>>>>> [11279.504630] glusteriotwr4   D ffff972499f2bf40     0 14953      1
>>>>>>> 0x00000080
>>>>>>> [11279.504635] Call Trace:
>>>>>>> [11279.504640]  [<ffffffffb9913f79>] schedule+0x29/0x70
>>>>>>> [11279.504676]  [<ffffffffc060e388>] _xfs_log_force_lsn+0x2e8/0x340
>>>>>>> [xfs]
>>>>>>> [11279.504681]  [<ffffffffb92cf1b0>] ? wake_up_state+0x20/0x20
>>>>>>> [11279.504710]  [<ffffffffc05eeb97>] xfs_file_fsync+0x107/0x1e0 [xfs]
>>>>>>> [11279.504714]  [<ffffffffb944f0e7>] do_fsync+0x67/0xb0
>>>>>>> [11279.504718]  [<ffffffffb992076f>] ? system_call_after_swapgs+0xbc/0x160
>>>>>>> [11279.504722]  [<ffffffffb944f3d0>] SyS_fsync+0x10/0x20
>>>>>>> [11279.504725]  [<ffffffffb992082f>] system_call_fastpath+0x1c/0x21
>>>>>>> [11279.504730]  [<ffffffffb992077b>] ? system_call_after_swapgs+0xc8/0x160
>>>>>>> [12127.466494] perf: interrupt took too long (8263 > 8150), lowering
>>>>>>> kernel.perf_event_max_sample_rate to 24000
>>>>>>>
>>>>>>> --------------------
>>>>>>> I think this is the cause of the massive ovirt performance issues,
>>>>>>> irrespective of gluster volume.  At the time this happened, I was also
>>>>>>> ssh'ed into the host running some rpm query commands.  I had just
>>>>>>> run rpm -qa |grep glusterfs (to verify what version was actually
>>>>>>> installed), and that command took almost 2 minutes to return!  Normally it
>>>>>>> takes less than 2 seconds.  That is all pure local SSD IO, too....
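>>>>>>>
>>>>>>> (A crude way to catch these stalls in the act, using only standard
>>>>>>> tools -- iostat is in the sysstat package:
>>>>>>>
>>>>>>>     time rpm -qa | grep glusterfs   # ~2s when healthy
>>>>>>>     iostat -x 1                     # watch await/%util on the SSD during a stall
>>>>>>>
>>>>>>> If await spikes on the SSD itself while gluster hangs, the problem is
>>>>>>> below gluster; if the device stays idle, it's above it.)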
>>>>>>>
>>>>>>> I'm no expert, but it's my understanding that any time software
>>>>>>> causes these kinds of issues, it's a serious bug in that software, even if
>>>>>>> it's mis-handled exceptions.  Is this correct?
>>>>>>>
>>>>>>> --Jim
>>>>>>>
>>>>>>> On Tue, May 29, 2018 at 3:01 PM, Jim Kusznir <jim at palousetech.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I think this is the profile information for one of the volumes that
>>>>>>>> lives on the SSDs and is fully operational with no down/problem disks:
>>>>>>>>
>>>>>>>> [root at ovirt2 yum.repos.d]# gluster volume profile data info
>>>>>>>> Brick: ovirt2.nwfiber.com:/gluster/brick2/data
>>>>>>>> ----------------------------------------------
>>>>>>>> Cumulative Stats:
>>>>>>>>    Block Size:                256b+                 512b+                1024b+
>>>>>>>>  No. of Reads:                  983                  2696                  1059
>>>>>>>> No. of Writes:                    0                  1113                   302
>>>>>>>>
>>>>>>>>    Block Size:               2048b+                4096b+                8192b+
>>>>>>>>  No. of Reads:                  852                 88608                 53526
>>>>>>>> No. of Writes:                  522                812340                 76257
>>>>>>>>
>>>>>>>>    Block Size:              16384b+               32768b+               65536b+
>>>>>>>>  No. of Reads:                54351                241901                 15024
>>>>>>>> No. of Writes:                21636                  8656                  8976
>>>>>>>>
>>>>>>>>    Block Size:             131072b+
>>>>>>>>  No. of Reads:               524156
>>>>>>>> No. of Writes:               296071
>>>>>>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>>>>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>>>>>>       0.00       0.00 us       0.00 us       0.00 us           4189     RELEASE
>>>>>>>>       0.00       0.00 us       0.00 us       0.00 us           1257  RELEASEDIR
>>>>>>>>       0.00      46.19 us      12.00 us     187.00 us             69       FLUSH
>>>>>>>>       0.00     147.00 us      78.00 us     367.00 us             86 REMOVEXATTR
>>>>>>>>       0.00     223.46 us      24.00 us    1166.00 us            149     READDIR
>>>>>>>>       0.00     565.34 us      76.00 us    3639.00 us             88   FTRUNCATE
>>>>>>>>       0.00     263.28 us      20.00 us   28385.00 us            228          LK
>>>>>>>>       0.00      98.84 us       2.00 us     880.00 us           1198     OPENDIR
>>>>>>>>       0.00      91.59 us      26.00 us   10371.00 us           3853      STATFS
>>>>>>>>       0.00     494.14 us      17.00 us  193439.00 us           1171    GETXATTR
>>>>>>>>       0.00     299.42 us      35.00 us    9799.00 us           2044    READDIRP
>>>>>>>>       0.00    1965.31 us     110.00 us  382258.00 us            321     XATTROP
>>>>>>>>       0.01     113.40 us      24.00 us   61061.00 us           8134        STAT
>>>>>>>>       0.01     755.38 us      57.00 us  607603.00 us           3196     DISCARD
>>>>>>>>       0.05    2690.09 us      58.00 us 2704761.00 us           3206        OPEN
>>>>>>>>       0.10  119978.25 us      97.00 us 9406684.00 us            154     SETATTR
>>>>>>>>       0.18     101.73 us      28.00 us  700477.00 us         313379       FSTAT
>>>>>>>>       0.23    1059.84 us      25.00 us 2716124.00 us          38255      LOOKUP
>>>>>>>>       0.47    1024.11 us      54.00 us 6197164.00 us          81455    FXATTROP
>>>>>>>>       1.72    2984.00 us      15.00 us 37098954.00 us         103020    FINODELK
>>>>>>>>       5.92   44315.32 us      51.00 us 24731536.00 us          23957       FSYNC
>>>>>>>>      13.27    2399.78 us      25.00 us 22089540.00 us         991005        READ
>>>>>>>>      37.00    5980.43 us      52.00 us 22099889.00 us        1108976       WRITE
>>>>>>>>      41.04    5452.75 us      13.00 us 22102452.00 us        1349053     INODELK
>>>>>>>>
>>>>>>>>     Duration: 10026 seconds
>>>>>>>>    Data Read: 80046027759 bytes
>>>>>>>> Data Written: 44496632320 bytes
>>>>>>>>
>>>>>>>> Interval 1 Stats:
>>>>>>>>    Block Size:                256b+                 512b+                1024b+
>>>>>>>>  No. of Reads:                  983                  2696                  1059
>>>>>>>> No. of Writes:                    0                   838                   185
>>>>>>>>
>>>>>>>>    Block Size:               2048b+                4096b+                8192b+
>>>>>>>>  No. of Reads:                  852                 85856                 51575
>>>>>>>> No. of Writes:                  382                705802                 57812
>>>>>>>>
>>>>>>>>    Block Size:              16384b+               32768b+               65536b+
>>>>>>>>  No. of Reads:                52673                232093                 14984
>>>>>>>> No. of Writes:                13499                  4908                  4242
>>>>>>>>
>>>>>>>>    Block Size:             131072b+
>>>>>>>>  No. of Reads:               460040
>>>>>>>> No. of Writes:                 6411
>>>>>>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>>>>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>>>>>>       0.00       0.00 us       0.00 us       0.00 us           2093     RELEASE
>>>>>>>>       0.00       0.00 us       0.00 us       0.00 us           1093  RELEASEDIR
>>>>>>>>       0.00      53.38 us      26.00 us     111.00 us             16       FLUSH
>>>>>>>>       0.00     145.14 us      78.00 us     367.00 us             71 REMOVEXATTR
>>>>>>>>       0.00     190.96 us     114.00 us     298.00 us             71     SETATTR
>>>>>>>>       0.00     213.38 us      24.00 us    1145.00 us             90     READDIR
>>>>>>>>       0.00     263.28 us      20.00 us   28385.00 us            228          LK
>>>>>>>>       0.00     101.76 us       2.00 us     880.00 us           1093     OPENDIR
>>>>>>>>       0.01      93.60 us      27.00 us   10371.00 us           3090      STATFS
>>>>>>>>       0.02     537.47 us      17.00 us  193439.00 us           1038    GETXATTR
>>>>>>>>       0.03     297.44 us      35.00 us    9799.00 us           1990    READDIRP
>>>>>>>>       0.03    2357.28 us     110.00 us  382258.00 us            253     XATTROP
>>>>>>>>       0.04     385.93 us      58.00 us   47593.00 us           2091        OPEN
>>>>>>>>       0.04     114.86 us      24.00 us   61061.00 us           7715        STAT
>>>>>>>>       0.06     444.59 us      57.00 us  333240.00 us           3053     DISCARD
>>>>>>>>       0.42     316.24 us      25.00 us  290728.00 us          29823      LOOKUP
>>>>>>>>       0.73     257.92 us      54.00 us  344812.00 us          63296    FXATTROP
>>>>>>>>       1.37      98.30 us      28.00 us   67621.00 us         313172       FSTAT
>>>>>>>>       1.58    2124.69 us      51.00 us  849200.00 us          16717       FSYNC
>>>>>>>>       5.73     162.46 us      52.00 us  748492.00 us         794079       WRITE
>>>>>>>>       7.19    2065.17 us      16.00 us 37098954.00 us          78381    FINODELK
>>>>>>>>      36.44     886.32 us      25.00 us 2216436.00 us         925421        READ
>>>>>>>>      46.30    1178.04 us      13.00 us 1700704.00 us         884635     INODELK
>>>>>>>>
>>>>>>>>     Duration: 7485 seconds
>>>>>>>>    Data Read: 71250527215 bytes
>>>>>>>> Data Written: 5119903744 bytes
>>>>>>>>
>>>>>>>> Brick: ovirt3.nwfiber.com:/gluster/brick2/data
>>>>>>>> ----------------------------------------------
>>>>>>>> Cumulative Stats:
>>>>>>>>    Block Size:                  1b+
>>>>>>>>  No. of Reads:                    0
>>>>>>>> No. of Writes:              3264419
>>>>>>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>>>>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>>>>>>       0.00       0.00 us       0.00 us       0.00 us             90      FORGET
>>>>>>>>       0.00       0.00 us       0.00 us       0.00 us           9462     RELEASE
>>>>>>>>       0.00       0.00 us       0.00 us       0.00 us           4254  RELEASEDIR
>>>>>>>>       0.00      50.52 us      13.00 us     190.00 us             71       FLUSH
>>>>>>>>       0.00     186.97 us      87.00 us     713.00 us             86 REMOVEXATTR
>>>>>>>>       0.00      79.32 us      33.00 us     189.00 us            228          LK
>>>>>>>>       0.00     220.98 us     129.00 us     513.00 us             86     SETATTR
>>>>>>>>       0.01     259.30 us      26.00 us    2632.00 us            137     READDIR
>>>>>>>>       0.02     322.76 us     145.00 us    2125.00 us            321     XATTROP
>>>>>>>>       0.03     109.55 us       2.00 us    1258.00 us           1193     OPENDIR
>>>>>>>>       0.05      70.21 us      21.00 us     431.00 us           3196     DISCARD
>>>>>>>>       0.05     169.26 us      21.00 us    2315.00 us           1545    GETXATTR
>>>>>>>>       0.12     176.85 us      63.00 us    2844.00 us           3206        OPEN
>>>>>>>>       0.61     303.49 us      90.00 us    3085.00 us           9633       FSTAT
>>>>>>>>       2.44     305.66 us      28.00 us    3716.00 us          38230      LOOKUP
>>>>>>>>       4.52     266.22 us      55.00 us   53424.00 us          81455    FXATTROP
>>>>>>>>       6.96    1397.99 us      51.00 us   64822.00 us          23889       FSYNC
>>>>>>>>      16.48      84.74 us      25.00 us    6917.00 us         932592       WRITE
>>>>>>>>      30.16     106.90 us      13.00 us 3920189.00 us        1353046     INODELK
>>>>>>>>      38.55    1794.52 us      14.00 us 16210553.00 us        103039    FINODELK
>>>>>>>>
>>>>>>>>     Duration: 66562 seconds
>>>>>>>>    Data Read: 0 bytes
>>>>>>>> Data Written: 3264419 bytes
>>>>>>>>
>>>>>>>> Interval 1 Stats:
>>>>>>>>    Block Size:                  1b+
>>>>>>>>  No. of Reads:                    0
>>>>>>>> No. of Writes:               794080
>>>>>>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>>>>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>>>>>>       0.00       0.00 us       0.00 us       0.00 us           2093     RELEASE
>>>>>>>>       0.00       0.00 us       0.00 us       0.00 us           1093  RELEASEDIR
>>>>>>>>       0.00      70.31 us      26.00 us     125.00 us             16       FLUSH
>>>>>>>>       0.00     193.10 us     103.00 us     713.00 us             71 REMOVEXATTR
>>>>>>>>       0.01     227.32 us     133.00 us     513.00 us             71     SETATTR
>>>>>>>>       0.01      79.32 us      33.00 us     189.00 us            228          LK
>>>>>>>>       0.01     259.83 us      35.00 us    1138.00 us             89     READDIR
>>>>>>>>       0.03     318.26 us     145.00 us    2047.00 us            253     XATTROP
>>>>>>>>       0.04     112.67 us       3.00 us    1258.00 us           1093     OPENDIR
>>>>>>>>       0.06     167.98 us      23.00 us    1951.00 us           1014    GETXATTR
>>>>>>>>       0.08      70.97 us      22.00 us     431.00 us           3053     DISCARD
>>>>>>>>       0.13     183.78 us      66.00 us    2844.00 us           2091        OPEN
>>>>>>>>       1.01     303.82 us      90.00 us    3085.00 us           9610       FSTAT
>>>>>>>>       3.27     316.59 us      30.00 us    3716.00 us          29820      LOOKUP
>>>>>>>>       5.83     265.79 us      59.00 us   53424.00 us          63296    FXATTROP
>>>>>>>>       7.95    1373.89 us      51.00 us   64822.00 us          16717       FSYNC
>>>>>>>>      23.17     851.99 us      14.00 us 16210553.00 us         78555    FINODELK
>>>>>>>>      24.04      87.44 us      27.00 us    6917.00 us         794081       WRITE
>>>>>>>>      34.36     111.91 us      14.00 us  984871.00 us         886790     INODELK
>>>>>>>>
>>>>>>>>     Duration: 7485 seconds
>>>>>>>>    Data Read: 0 bytes
>>>>>>>> Data Written: 794080 bytes
>>>>>>>>
>>>>>>>>
>>>>>>>> -----------------------
>>>>>>>> Here is the data from the volume that is backed by the SSHDs and
>>>>>>>> has one failed disk:
>>>>>>>> [root at ovirt2 yum.repos.d]# gluster volume profile data-hdd info
>>>>>>>> Brick: 172.172.1.12:/gluster/brick3/data-hdd
>>>>>>>> --------------------------------------------
>>>>>>>> Cumulative Stats:
>>>>>>>>    Block Size:                256b+                 512b+                1024b+
>>>>>>>>  No. of Reads:                 1702                    86                    16
>>>>>>>> No. of Writes:                    0                   767                    71
>>>>>>>>
>>>>>>>>    Block Size:               2048b+                4096b+                8192b+
>>>>>>>>  No. of Reads:                   19                 51841                  2049
>>>>>>>> No. of Writes:                   76                 60668                 35727
>>>>>>>>
>>>>>>>>    Block Size:              16384b+               32768b+               65536b+
>>>>>>>>  No. of Reads:                 1744                   639                  1088
>>>>>>>> No. of Writes:                 8524                  2410                  1285
>>>>>>>>
>>>>>>>>    Block Size:             131072b+
>>>>>>>>  No. of Reads:               771999
>>>>>>>> No. of Writes:                29584
>>>>>>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>>>>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>>>>>>       0.00       0.00 us       0.00 us       0.00 us           2902     RELEASE
>>>>>>>>       0.00       0.00 us       0.00 us       0.00 us           1517  RELEASEDIR
>>>>>>>>       0.00     197.00 us     197.00 us     197.00 us              1   FTRUNCATE
>>>>>>>>       0.00      70.24 us      16.00 us     758.00 us             51       FLUSH
>>>>>>>>       0.00     143.93 us      82.00 us     305.00 us             57 REMOVEXATTR
>>>>>>>>       0.00     178.63 us     105.00 us     712.00 us             60     SETATTR
>>>>>>>>       0.00      67.30 us      19.00 us     572.00 us            555          LK
>>>>>>>>       0.00     322.80 us      23.00 us    4673.00 us            138     READDIR
>>>>>>>>       0.00     336.56 us     106.00 us   11994.00 us            237     XATTROP
>>>>>>>>       0.00      84.70 us      28.00 us    1071.00 us           3469      STATFS
>>>>>>>>       0.01     387.75 us       2.00 us  146017.00 us           1467     OPENDIR
>>>>>>>>       0.01     148.59 us      21.00 us   64374.00 us           4454        STAT
>>>>>>>>       0.02     783.02 us      16.00 us   93502.00 us           1902    GETXATTR
>>>>>>>>       0.03    1516.10 us      17.00 us  210690.00 us           1364     ENTRYLK
>>>>>>>>       0.03    2555.47 us     300.00 us  674454.00 us           1064    READDIRP
>>>>>>>>       0.07      85.74 us      19.00 us   68340.00 us          62849       FSTAT
>>>>>>>>       0.07    1978.12 us      59.00 us  202596.00 us           2729        OPEN
>>>>>>>>       0.22     708.57 us      15.00 us  394799.00 us          25447      LOOKUP
>>>>>>>>       5.94    2331.74 us      15.00 us 1099530.00 us         207534    FINODELK
>>>>>>>>       7.31    8311.75 us      58.00 us 1800216.00 us          71668    FXATTROP
>>>>>>>>      12.49    7735.19 us      51.00 us 3595513.00 us         131642       WRITE
>>>>>>>>      17.70     957.08 us      16.00 us 13700466.00 us       1508160     INODELK
>>>>>>>>      24.55    2546.43 us      26.00 us 5077347.00 us         786060        READ
>>>>>>>>      31.56   49699.15 us      47.00 us 3746331.00 us          51777       FSYNC
>>>>>>>>
>>>>>>>>     Duration: 10101 seconds
>>>>>>>>    Data Read: 101562897361 bytes
>>>>>>>> Data Written: 4834450432 bytes
>>>>>>>>
>>>>>>>> Interval 0 Stats:
>>>>>>>>    Block Size:                256b+                 512b+                1024b+
>>>>>>>>  No. of Reads:                 1702                    86                    16
>>>>>>>> No. of Writes:                    0                   767                    71
>>>>>>>>
>>>>>>>>    Block Size:               2048b+                4096b+                8192b+
>>>>>>>>  No. of Reads:                   19                 51841                  2049
>>>>>>>> No. of Writes:                   76                 60668                 35727
>>>>>>>>
>>>>>>>>    Block Size:              16384b+               32768b+               65536b+
>>>>>>>>  No. of Reads:                 1744                   639                  1088
>>>>>>>> No. of Writes:                 8524                  2410                  1285
>>>>>>>>
>>>>>>>>    Block Size:             131072b+
>>>>>>>>  No. of Reads:               771999
>>>>>>>> No. of Writes:                29584
>>>>>>>>  %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
>>>>>>>>  ---------   -----------   -----------   -----------   ------------        ----
>>>>>>>>       0.00       0.00 us       0.00 us       0.00 us           2902     RELEASE
>>>>>>>>       0.00       0.00 us       0.00 us       0.00 us           1517  RELEASEDIR
>>>>>>>>       0.00     197.00 us     197.00 us     197.00 us              1   FTRUNCATE
>>>>>>>>       0.00      70.24 us      16.00 us     758.00 us             51       FLUSH
>>>>>>>>       0.00     143.93 us      82.00 us     305.00 us             57 REMOVEXATTR
>>>>>>>>       0.00     178.63 us     105.00 us     712.00 us             60     SETATTR
>>>>>>>>       0.00      67.30 us      19.00 us     572.00 us            555          LK
>>>>>>>>       0.00     322.80 us      23.00 us    4673.00 us            138     READDIR
>>>>>>>>       0.00     336.56 us     106.00 us   11994.00 us            237     XATTROP
>>>>>>>>       0.00      84.70 us      28.00 us    1071.00 us           3469      STATFS
>>>>>>>>       0.01     387.75 us       2.00 us  146017.00 us           1467     OPENDIR
>>>>>>>>       0.01     148.59 us      21.00 us   64374.00 us           4454        STAT
>>>>>>>>       0.02     783.02 us      16.00 us   93502.00 us           1902    GETXATTR
>>>>>>>>       0.03    1516.10 us      17.00 us  210690.00 us           1364     ENTRYLK
>>>>>>>>       0.03    2555.47 us     300.00 us  674454.00 us           1064    READDIRP
>>>>>>>>       0.07      85.73 us      19.00 us   68340.00 us          62849       FSTAT
>>>>>>>>       0.07    1978.12 us      59.00 us  202596.00 us           2729        OPEN
>>>>>>>>       0.22     708.57 us      15.00 us  394799.00 us          25447      LOOKUP
>>>>>>>>       5.94    2334.57 us      15.00 us 1099530.00 us         207534    FINODELK
>>>>>>>>       7.31    8311.49 us      58.00 us 1800216.00 us          71668    FXATTROP
>>>>>>>>      12.49    7735.32 us      51.00 us 3595513.00 us         131642       WRITE
>>>>>>>>      17.71     957.08 us      16.00 us 13700466.00 us       1508160     INODELK
>>>>>>>>      24.56    2546.42 us      26.00 us 5077347.00 us         786060        READ
>>>>>>>>      31.54   49651.63 us      47.00 us 3746331.00 us          51777       FSYNC
>>>>>>>>
>>>>>>>>     Duration: 10101 seconds
>>>>>>>>    Data Read: 101562897361 bytes
>>>>>>>> Data Written: 4834450432 bytes
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, May 29, 2018 at 2:55 PM, Jim Kusznir <jim at palousetech.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thank you for your response.
>>>>>>>>>
>>>>>>>>> I have 4 gluster volumes.  3 are replica 2 + arbiter; the replica
>>>>>>>>> bricks are on ovirt1 and ovirt2, with the arbiter on ovirt3.  The 4th
>>>>>>>>> volume is replica 3, with a brick on all three ovirt machines.
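>>>>>>>>>
>>>>>>>>> For reference, volumes of that shape would have been created with
>>>>>>>>> commands along these lines (brick paths are illustrative, not
>>>>>>>>> necessarily my exact layout):
>>>>>>>>>
>>>>>>>>> # replica 2 + arbiter; gluster counts the arbiter in the replica count
>>>>>>>>> gluster volume create engine replica 3 arbiter 1 \
>>>>>>>>>     ovirt1:/gluster/brick1/engine \
>>>>>>>>>     ovirt2:/gluster/brick1/engine \
>>>>>>>>>     ovirt3:/gluster/brick1/engine
>>>>>>>>> # plain replica 3, one brick per host
>>>>>>>>> gluster volume create data-hdd replica 3 \
>>>>>>>>>     ovirt1:/gluster/brick3/data-hdd \
>>>>>>>>>     ovirt2:/gluster/brick3/data-hdd \
>>>>>>>>>     ovirt3:/gluster/brick3/data-hdd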
>>>>>>>>>
>>>>>>>>> The first 3 volumes are on an SSD disk; the 4th is on a Seagate
>>>>>>>>> SSHD (same in all three machines).  On ovirt3, the SSHD has reported hard
>>>>>>>>> IO failures, and that brick is offline.  However, the other two replicas
>>>>>>>>> are fully operational (although they still show contents in the heal info
>>>>>>>>> command that won't go away, but that may be the case until I replace the
>>>>>>>>> failed disk).
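>>>>>>>>>
>>>>>>>>> My plan for the failed disk, assuming the standard gluster
>>>>>>>>> brick-replacement procedure applies here (the new brick path below is
>>>>>>>>> hypothetical), is roughly:
>>>>>>>>>
>>>>>>>>> # after physically swapping the disk and mounting a fresh filesystem:
>>>>>>>>> gluster volume replace-brick data-hdd \
>>>>>>>>>     172.172.1.13:/gluster/brick3/data-hdd \
>>>>>>>>>     172.172.1.13:/gluster/brick3b/data-hdd \
>>>>>>>>>     commit force
>>>>>>>>> # then let self-heal repopulate the new brick:
>>>>>>>>> gluster volume heal data-hdd full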
>>>>>>>>>
>>>>>>>>> What is bothering me is that ALL 4 gluster volumes are showing
>>>>>>>>> horrible performance issues.  At this point, as the bad disk has been
>>>>>>>>> completely offlined, I would expect gluster to perform at normal speed, but
>>>>>>>>> that is definitely not the case.
>>>>>>>>>
>>>>>>>>> I've also noticed that the performance hits seem to come in waves:
>>>>>>>>> things seem to work acceptably (but slowly) for a while; then, suddenly,
>>>>>>>>> it's as if all disk IO on all volumes (including non-gluster local OS disk
>>>>>>>>> volumes for the hosts) pauses for about 30 seconds, and then IO resumes again.
>>>>>>>>> During those times, I start getting VM not responding and host not
>>>>>>>>> responding notices as well as the applications having major issues.
>>>>>>>>>
>>>>>>>>> I've shut down most of my VMs and am down to just my essential
>>>>>>>>> core VMs (I shed about 75% of my VMs).  I am still experiencing the same
>>>>>>>>> issues.
>>>>>>>>>
>>>>>>>>> Am I correct in believing that once the failed disk was brought
>>>>>>>>> offline, performance should return to normal?
>>>>>>>>>
>>>>>>>>> On Tue, May 29, 2018 at 1:27 PM, Alex K <rightkicktech at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I would check disks status and accessibility of mount points
>>>>>>>>>> where your gluster volumes reside.
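>>>>>>>>>>
>>>>>>>>>> For example, something along these lines on each host (the device
>>>>>>>>>> name is a placeholder):
>>>>>>>>>>
>>>>>>>>>> smartctl -a /dev/sdX          # SMART health of the suspect disk
>>>>>>>>>> df -h | grep gluster          # are the brick filesystems still mounted?
>>>>>>>>>> gluster volume status data-hdd detail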
>>>>>>>>>>
>>>>>>>>>> On Tue, May 29, 2018, 22:28 Jim Kusznir <jim at palousetech.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On one ovirt server, I'm now seeing these messages:
>>>>>>>>>>> [56474.239725] blk_update_request: 63 callbacks suppressed
>>>>>>>>>>> [56474.239732] blk_update_request: I/O error, dev dm-2, sector 0
>>>>>>>>>>> [56474.240602] blk_update_request: I/O error, dev dm-2, sector 3905945472
>>>>>>>>>>> [56474.241346] blk_update_request: I/O error, dev dm-2, sector 3905945584
>>>>>>>>>>> [56474.242236] blk_update_request: I/O error, dev dm-2, sector 2048
>>>>>>>>>>> [56474.243072] blk_update_request: I/O error, dev dm-2, sector 3905943424
>>>>>>>>>>> [56474.243997] blk_update_request: I/O error, dev dm-2, sector 3905943536
>>>>>>>>>>> [56474.247347] blk_update_request: I/O error, dev dm-2, sector 0
>>>>>>>>>>> [56474.248315] blk_update_request: I/O error, dev dm-2, sector 3905945472
>>>>>>>>>>> [56474.249231] blk_update_request: I/O error, dev dm-2, sector 3905945584
>>>>>>>>>>> [56474.250221] blk_update_request: I/O error, dev dm-2, sector 2048
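>>>>>>>>>>>
>>>>>>>>>>> To map dm-2 back to an actual LV/disk, I believe something like
>>>>>>>>>>> this works (output will obviously vary per host):
>>>>>>>>>>>
>>>>>>>>>>> ls -l /dev/mapper/ | grep dm-2    # which device-mapper name is dm-2
>>>>>>>>>>> lsblk -o NAME,KNAME,TYPE,SIZE,MOUNTPOINT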
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 29, 2018 at 11:59 AM, Jim Kusznir <
>>>>>>>>>>> jim at palousetech.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I see in messages on ovirt3 (my 3rd machine, the one upgraded
>>>>>>>>>>>> to 4.2):
>>>>>>>>>>>>
>>>>>>>>>>>> May 29 11:54:41 ovirt3 ovs-vsctl: ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
>>>>>>>>>>>> May 29 11:54:51 ovirt3 ovs-vsctl: ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
>>>>>>>>>>>> May 29 11:55:01 ovirt3 ovs-vsctl: ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
>>>>>>>>>>>> (appears a lot).
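>>>>>>>>>>>>
>>>>>>>>>>>> That looks like ovsdb-server simply isn't running.  Assuming the
>>>>>>>>>>>> service name used by the CentOS openvswitch package, checks like
>>>>>>>>>>>> these should apply:
>>>>>>>>>>>>
>>>>>>>>>>>> ls -l /var/run/openvswitch/db.sock   # does the socket exist at all?
>>>>>>>>>>>> systemctl status openvswitch
>>>>>>>>>>>> systemctl restart openvswitch        # if it is dead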
>>>>>>>>>>>>
>>>>>>>>>>>> I also found, in the ssh session on that host, some syslog warnings
>>>>>>>>>>>> about the backing disk for one of the gluster volumes (straight replica
>>>>>>>>>>>> 3).  The glusterfs process for that disk on that machine went offline.  It's
>>>>>>>>>>>> my understanding that it should continue to work with the other two
>>>>>>>>>>>> machines while I attempt to replace that disk, right?  Attempted writes
>>>>>>>>>>>> (touching an empty file) can take 15 seconds; repeating the same write later
>>>>>>>>>>>> is much faster.
>>>>>>>>>>>>
>>>>>>>>>>>> Gluster generates a bunch of different log files; I don't know
>>>>>>>>>>>> which ones you want, or from which machine(s).
>>>>>>>>>>>>
>>>>>>>>>>>> How do I do "volume profiling"?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, May 29, 2018 at 11:53 AM, Sahina Bose <
>>>>>>>>>>>> sabose at redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Do you see errors reported in the mount logs for the volume?
>>>>>>>>>>>>> If so, could you attach the logs?
>>>>>>>>>>>>> Any issues with your underlying disks?  Can you also attach the
>>>>>>>>>>>>> output of volume profiling?
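>>>>>>>>>>>>>
>>>>>>>>>>>>> Roughly (substituting your volume name):
>>>>>>>>>>>>>
>>>>>>>>>>>>> gluster volume profile data-hdd start
>>>>>>>>>>>>> # ...let it run while the problem reproduces...
>>>>>>>>>>>>> gluster volume profile data-hdd info
>>>>>>>>>>>>> gluster volume profile data-hdd stop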
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 30, 2018 at 12:13 AM, Jim Kusznir <
>>>>>>>>>>>>> jim at palousetech.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, things have gotten MUCH worse this morning.  I'm getting
>>>>>>>>>>>>>> random errors from VMs; right now, about a third of my VMs have been paused
>>>>>>>>>>>>>> due to storage issues, and most of the remaining VMs are not performing
>>>>>>>>>>>>>> well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At this point, I am in full EMERGENCY mode, as my production
>>>>>>>>>>>>>> services are now impacted, and I'm getting calls coming in with problems...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd greatly appreciate help...VMs are running VERY slowly
>>>>>>>>>>>>>> (when they run), and they are steadily getting worse.  I don't know why.  I
>>>>>>>>>>>>>> was seeing CPU peaks (to 100%) on several VMs, in perfect sync, for a few
>>>>>>>>>>>>>> minutes at a time (while the VM became unresponsive and any Linux VMs I was
>>>>>>>>>>>>>> logged into were giving me the CPU-stuck messages in my
>>>>>>>>>>>>>> original post).  Is all this storage related?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I also have two different gluster volumes for VM storage, and
>>>>>>>>>>>>>> only one had the issues, but now VMs in both are being affected at the same
>>>>>>>>>>>>>> time and in the same way.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --Jim
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, May 28, 2018 at 10:50 PM, Sahina Bose <
>>>>>>>>>>>>>> sabose at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [Adding gluster-users to look at the heal issue]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, May 29, 2018 at 9:17 AM, Jim Kusznir <
>>>>>>>>>>>>>>> jim at palousetech.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've been having some cluster and gluster performance
>>>>>>>>>>>>>>>> issues lately.  I also found that my cluster was out of date, and was
>>>>>>>>>>>>>>>> trying to apply updates (hoping to fix some of these), and discovered the
>>>>>>>>>>>>>>>> ovirt 4.1 repos were taken completely offline.  So, I was forced to begin
>>>>>>>>>>>>>>>> an upgrade to 4.2.  According to docs I found/read, I needed only to add the
>>>>>>>>>>>>>>>> new repo, do a yum update, reboot, and be good on my hosts (I did the yum
>>>>>>>>>>>>>>>> update, then ran engine-setup on my hosted engine).  Things seemed to work
>>>>>>>>>>>>>>>> relatively well, except for a gluster sync issue that showed up.
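>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Roughly, the steps I followed on each host (repo URL from
>>>>>>>>>>>>>>>> memory, so double-check it against the docs):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yum install https://resources.ovirt.org/pub/yum-repo/ovirt-release42.rpm
>>>>>>>>>>>>>>>> yum update
>>>>>>>>>>>>>>>> reboot
>>>>>>>>>>>>>>>> # and on the hosted engine VM afterwards:
>>>>>>>>>>>>>>>> engine-setup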
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My cluster is a 3 node hyperconverged cluster.  I upgraded
>>>>>>>>>>>>>>>> the hosted engine first, then host 3 (ovirt3).  When host 3 came back up,
>>>>>>>>>>>>>>>> for some reason one of my gluster volumes would not sync.  Here's sample
>>>>>>>>>>>>>>>> output:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [root at ovirt3 ~]# gluster volume heal data-hdd info
>>>>>>>>>>>>>>>> Brick 172.172.1.11:/gluster/brick3/data-hdd
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/647be733-f153-4cdc-85bd-ba72544c2631/b453a300-0602-4be1-8310-8bd5abe00971
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/6da854d1-b6be-446b-9bf0-90a0dbbea830/3c93bd1f-b7fa-4aa2-b445-6904e31839ba
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/7f647567-d18c-44f1-a58e-9b8865833acb/f9364470-9770-4bb1-a6b9-a54861849625
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/f3c8e7aa-6ef2-42a7-93d4-e0a4df6dd2fa/2eb0b1ad-2606-44ef-9cd3-ae59610a504b
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/b1ea3f62-0f05-4ded-8c82-9c91c90e0b61/d5d6bf5a-499f-431d-9013-5453db93ed32
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/8c8b5147-e9d6-4810-b45b-185e3ed65727/16f08231-93b0-489d-a2fd-687b6bf88eaa
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/12924435-b9c2-4aab-ba19-1c1bc31310ef/07b3db69-440e-491e-854c-bbfa18a7cff2
>>>>>>>>>>>>>>>> Status: Connected
>>>>>>>>>>>>>>>> Number of entries: 8
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Brick 172.172.1.12:/gluster/brick3/data-hdd
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/647be733-f153-4cdc-85bd-ba72544c2631/b453a300-0602-4be1-8310-8bd5abe00971
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/b1ea3f62-0f05-4ded-8c82-9c91c90e0b61/d5d6bf5a-499f-431d-9013-5453db93ed32
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/6da854d1-b6be-446b-9bf0-90a0dbbea830/3c93bd1f-b7fa-4aa2-b445-6904e31839ba
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/7f647567-d18c-44f1-a58e-9b8865833acb/f9364470-9770-4bb1-a6b9-a54861849625
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/8c8b5147-e9d6-4810-b45b-185e3ed65727/16f08231-93b0-489d-a2fd-687b6bf88eaa
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/12924435-b9c2-4aab-ba19-1c1bc31310ef/07b3db69-440e-491e-854c-bbfa18a7cff2
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/f3c8e7aa-6ef2-42a7-93d4-e0a4df6dd2fa/2eb0b1ad-2606-44ef-9cd3-ae59610a504b
>>>>>>>>>>>>>>>> Status: Connected
>>>>>>>>>>>>>>>> Number of entries: 8
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Brick 172.172.1.13:/gluster/brick3/data-hdd
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/b1ea3f62-0f05-4ded-8c82-9c91c90e0b61/d5d6bf5a-499f-431d-9013-5453db93ed32
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/8c8b5147-e9d6-4810-b45b-185e3ed65727/16f08231-93b0-489d-a2fd-687b6bf88eaa
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/12924435-b9c2-4aab-ba19-1c1bc31310ef/07b3db69-440e-491e-854c-bbfa18a7cff2
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/f3c8e7aa-6ef2-42a7-93d4-e0a4df6dd2fa/2eb0b1ad-2606-44ef-9cd3-ae59610a504b
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/647be733-f153-4cdc-85bd-ba72544c2631/b453a300-0602-4be1-8310-8bd5abe00971
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/48d7ecb8-7ac5-4725-bca5-b3519681cf2f/0d6080b0-7018-4fa3-bb82-1dd9ef07d9b9
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/6da854d1-b6be-446b-9bf0-90a0dbbea830/3c93bd1f-b7fa-4aa2-b445-6904e31839ba
>>>>>>>>>>>>>>>> /cc65f671-3377-494a-a7d4-1d9f7c3ae46c/images/7f647567-d18c-44f1-a58e-9b8865833acb/f9364470-9770-4bb1-a6b9-a54861849625
>>>>>>>>>>>>>>>> Status: Connected
>>>>>>>>>>>>>>>> Number of entries: 8
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>>>> It's been in this state for a couple days now, and bandwidth
>>>>>>>>>>>>>>>> monitoring shows no appreciable data moving.  I've tried repeatedly
>>>>>>>>>>>>>>>> commanding a full heal from all three nodes in the cluster.  It's always the
>>>>>>>>>>>>>>>> same files that need healing.
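>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For reference, the heal commands I've been running are along
>>>>>>>>>>>>>>>> these lines:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> gluster volume heal data-hdd full
>>>>>>>>>>>>>>>> gluster volume heal data-hdd info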
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> When running gluster volume heal data-hdd statistics, I
>>>>>>>>>>>>>>>> sometimes see different information, but always some number of "heal failed"
>>>>>>>>>>>>>>>> entries.  It shows 0 for split brain.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm not quite sure what to do.  I suspect it may be due to
>>>>>>>>>>>>>>>> nodes 1 and 2 still being on the older ovirt/gluster release, but I'm
>>>>>>>>>>>>>>>> afraid to upgrade and reboot them until I have a good gluster sync (don't
>>>>>>>>>>>>>>>> need to create a split brain issue).  How do I proceed with this?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Second issue: I've been experiencing VERY POOR performance
>>>>>>>>>>>>>>>> on most of my VMs.  To the point that logging into a Windows 10 VM via
>>>>>>>>>>>>>>>> remote desktop can take 5 minutes, and launching QuickBooks inside said VM
>>>>>>>>>>>>>>>> can easily take 10 minutes.  On some Linux VMs, I get random messages like
>>>>>>>>>>>>>>>> this:
>>>>>>>>>>>>>>>> Message from syslogd at unifi at May 28 20:39:23 ...
>>>>>>>>>>>>>>>>  kernel:[6171996.308904] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [mongod:14766]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (the process and PID are often different)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm not quite sure what to do about this either.  My
>>>>>>>>>>>>>>>> initial thought was to upgrade everything to current and see if the problem
>>>>>>>>>>>>>>>> is still there, but I cannot move forward with that until my gluster is
>>>>>>>>>>>>>>>> healed...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>> --Jim
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>