[Gluster-users] Problem with VM images when one node goes online (self-healing) on a 2 node replication gluster for VMware datastore

Peter Linder peter.linder at fiberdirekt.se
Tue Oct 11 09:05:36 UTC 2011


With 3.2.4, no operations are allowed on a file while it is being 
self-healed, so your VMs will stall and time out if the self-heal 
doesn't finish quickly enough. Gluster 3.3 will fix this, but I don't 
know when it will be released. There are betas to try out, though :). 
Perhaps somebody else can say how stable 3.3-beta2 is compared to 3.2.4?
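
One thing you can do on 3.2.x is trigger the self-heal yourself from a 
client mount as soon as the node comes back, rather than waiting for 
VMware's I/O to hit the stale files. The 3.2 admin guide's way of 
forcing a full heal is a recursive stat over the mount (the mount path 
below is just an example):

    # stat every file on the gluster mount; each stat makes replicate
    # check the file and heal it if the copies differ
    find /mnt/glvol1 -noleaf -print0 | xargs --null stat >/dev/null

That doesn't stop a big VM image from blocking while it heals, but it 
lets you pick when the heal runs (e.g. in a maintenance window).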

On 10/11/2011 10:57 AM, keith wrote:
> Hi all
>
> I am testing gluster-3.2.4 on a 2-node storage setup with 
> replication as our VMware datastore.
>
> The setup runs replication across the 2 nodes with ucarp, and the 
> volume is mounted on VMware over NFS as a datastore.
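>
> (For reference, the datastore is added on the ESX hosts with 
> esxcfg-nas against the ucarp virtual IP; the address and label below 
> are examples, not the real ones:)
>
>> # add an NFS datastore that points at the ucarp VIP, so the mount
>> # follows whichever gluster node currently holds the address
>> esxcfg-nas -a -o 192.168.0.100 -s /GLVOL1 GLVOL1-datastore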
>
>> Volume Name: GLVOL1
>> Type: Replicate
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: t4-01.store:/EXPORT/GLVOL1
>> Brick2: t4-03.store:/EXPORT/GLVOL1
>> Options Reconfigured:
>> performance.cache-size: 4096MB
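>
> (For completeness, the volume was created with the standard 
> replica-2 commands, roughly:)
>
>> # create the 2-brick replicated volume and raise the cache size to
>> # match the configuration shown above
>> gluster volume create GLVOL1 replica 2 transport tcp \
>>     t4-01.store:/EXPORT/GLVOL1 t4-03.store:/EXPORT/GLVOL1
>> gluster volume start GLVOL1
>> gluster volume set GLVOL1 performance.cache-size 4096MB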
>
> High-availability testing goes smoothly, with no problems or data 
> corruption: when either node is down, all VM guests run normally.
>
> The problem arises when I bring the failed node back up and it 
> starts self-healing.  All my VM guests get kernel error messages and 
> finally end up with "EXT3-fs error: ext3_journal_start_sb: detected 
> aborted journal", remounting their (root) filesystems read-only.
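>
> (What seems to happen is that the guest kernel aborts outstanding 
> I/O after its SCSI command timeout, 30 seconds by default, and ext3 
> then aborts the journal.  The timeout can be checked and raised per 
> disk inside a guest, e.g. for sda:)
>
>> cat /sys/block/sda/device/timeout
>> echo 180 > /sys/block/sda/device/timeout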
>
> Below are some of the VM guest kernel errors generated when I bring 
> up the failed gluster node for self-healing:
>
>> Oct 11 15:57:58 testvm3 kernel: pvscsi: task abort on host 1, 
>> ffff8100221c90c0
>> Oct 11 15:57:58 testvm3 kernel: pvscsi: task abort on host 1, 
>> ffff8100221c9240
>> Oct 11 15:57:58 testvm3 kernel: pvscsi: task abort on host 1, 
>> ffff8100221c93c0
>> Oct 11 15:58:34 testvm3 kernel: INFO: task kjournald:2081 blocked for 
>> more than 120 seconds.
>> Oct 11 15:58:34 testvm3 kernel: "echo 0 > 
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Oct 11 15:58:34 testvm3 kernel: kjournald     D ffff810001736420     
>> 0  2081     14          2494  2060 (L-TLB)
>> Oct 11 15:58:34 testvm3 kernel: ffff81003c087cf0 0000000000000046 
>> ffff810030ef2288 ffff81003f5d6048
>> Oct 11 15:58:34 testvm3 kernel: 00000000037685c8 000000000000000a 
>> ffff810037c53820 ffffffff80314b60
>> Oct 11 15:58:34 testvm3 kernel: 00001883cb68d47d 0000000000002c4e 
>> ffff810037c53a08 000000003f5128b8
>> Oct 11 15:58:34 testvm3 kernel: Call Trace:
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8006ec8f>] 
>> do_gettimeofday+0x40/0x90
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] 
>> sync_buffer+0x0/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800637ce>] 
>> io_schedule+0x3f/0x67
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8001560e>] 
>> sync_buffer+0x3b/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800639fa>] 
>> __wait_on_bit+0x40/0x6e
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] 
>> sync_buffer+0x0/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80063a94>] 
>> out_of_line_wait_on_bit+0x6c/0x78
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2e2b>] 
>> wake_bit_function+0x0/0x23
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88033a41>] 
>> :jbd:journal_commit_transaction+0x553/0x10aa
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8003d85b>] 
>> lock_timer_base+0x1b/0x3c
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8004ad98>] 
>> try_to_del_timer_sync+0x7f/0x88
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88037662>] 
>> :jbd:kjournald+0xc1/0x213
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2dfd>] 
>> autoremove_wake_function+0x0/0x2e
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2be5>] 
>> keventd_create_kthread+0x0/0xc4
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff880375a1>] 
>> :jbd:kjournald+0x0/0x213
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2be5>] 
>> keventd_create_kthread+0x0/0xc4
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80032722>] kthread+0xfe/0x132
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2be5>] 
>> keventd_create_kthread+0x0/0xc4
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80032624>] kthread+0x0/0x132
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
>> Oct 11 15:58:34 testvm3 kernel:
>> Oct 11 15:58:34 testvm3 kernel: INFO: task crond:3418 blocked for 
>> more than 120 seconds.
>> Oct 11 15:58:34 testvm3 kernel: "echo 0 > 
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Oct 11 15:58:34 testvm3 kernel: crond         D ffff810001736420     
>> 0  3418      1          3436  3405 (NOTLB)
>> Oct 11 15:58:34 testvm3 kernel: ffff810036c55ca8 0000000000000086 
>> 0000000000000000 ffffffff80019e3e
>> Oct 11 15:58:34 testvm3 kernel: 0000000000065bf2 0000000000000007 
>> ffff81003ce4b080 ffffffff80314b60
>> Oct 11 15:58:34 testvm3 kernel: 000018899ae16270 0000000000023110 
>> ffff81003ce4b268 000000008804ec00
>> Oct 11 15:58:34 testvm3 kernel: Call Trace:
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80019e3e>] __getblk+0x25/0x22c
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8006ec8f>] 
>> do_gettimeofday+0x40/0x90
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] 
>> sync_buffer+0x0/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800637ce>] 
>> io_schedule+0x3f/0x67
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8001560e>] 
>> sync_buffer+0x3b/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80063912>] 
>> __wait_on_bit_lock+0x36/0x66
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] 
>> sync_buffer+0x0/0x3f
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800639ae>] 
>> out_of_line_wait_on_bit_lock+0x6c/0x78
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2e2b>] 
>> wake_bit_function+0x0/0x23
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8803181e>] 
>> :jbd:do_get_write_access+0x54/0x522
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80019e3e>] __getblk+0x25/0x22c
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88031d0e>] 
>> :jbd:journal_get_write_access+0x22/0x33
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8804dd37>] 
>> :ext3:ext3_reserve_inode_write+0x38/0x90
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8804ddb0>] 
>> :ext3:ext3_mark_inode_dirty+0x21/0x3c
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88050d35>] 
>> :ext3:ext3_dirty_inode+0x63/0x7b
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80013d98>] 
>> __mark_inode_dirty+0x29/0x16e
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80025a49>] filldir+0x0/0xb7
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8003516b>] 
>> vfs_readdir+0x8c/0xa9
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800389db>] 
>> sys_getdents+0x75/0xbd
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005d229>] tracesys+0x71/0xe0
>> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>> Oct 11 15:58:34 testvm3 kernel:
>> Oct 11 15:58:34 testvm3 kernel: INFO: task httpd:3452 blocked for 
>> more than 120 seconds.
>> Oct 11 15:58:34 testvm3 kernel: "echo 0 > 
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Oct 11 15:58:34 testvm3 kernel: httpd         D ffff810001736420     
>> 0  3452   3405          3453       (NOTLB)
>> Oct 11 15:58:34 testvm3 kernel: ffff810035ea9dc8 0000000000000086 
>> 0000000000000000 ffffffff80009a1c
>> Oct 11 15:58:34 testvm3 kernel: ffff810035ea9e28 0000000000000009 
>> ffff810037e52080 ffffffff80314b60
>> Oct 11 15:58:34 testvm3 kernel: 000018839f75405c 000000000003363d 
>> ffff810037e52268 000000003f5e7150
>
> Please note that although I am using ucarp for IP failover, and by 
> default ucarp always has a preferred master, I have added code to 
> make sure that the ucarp master always becomes the slave when it 
> goes down and comes up again.  This ensures that VMware will not 
> connect back to the failed node when it comes back up.
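>
> (Roughly, the node that must not take the VIP back runs ucarp 
> without --preempt and with a high advskew; the interface, password 
> and addresses below are examples:)
>
>> # stay backup unless the other node's master advertisements stop;
>> # a high -k (advskew) makes this node the less-preferred master
>> ucarp -i eth0 -v 10 -p secret -a 192.168.0.100 -s 192.168.0.2 \
>>       -k 200 --upscript=/etc/vip-up.sh --downscript=/etc/vip-down.sh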
>
> However, this does not prevent the problem I describe above.
>
> A lot of logs are generated during the self-healing process, and 
> they don't make any sense to me.  They come to over 900k, so I 
> zipped them up and am attaching them.  Hopefully the mailing list 
> allows attachments.
>
> Are there any best practices for setting up/running gluster with 
> replication as a VMware datastore, so that VM guests keep running 
> smoothly even when one node goes into self-healing?
>
> Any advice is appreciated.
>
> Keith
>
>
>
>
