[Gluster-users] glusterfsd Not tainted error, server crashed

likun kun.li at ucarinc.com
Tue Dec 27 10:32:39 UTC 2016


Hi, everyone

 

I’m running a GlusterFS 3.8.5 container with Kubernetes in a CoreOS 1185.5.0 environment.

 

I have a 20-server GlusterFS cluster with 7 bricks per server, forming a 132-brick distributed, 3-way replicated volume.

Here is the volume info:

 

Type: Distributed-Replicate

Volume ID: cc9f0101-0bc7-4a40-a813-a7e540593a2b

Status: Started

Snapshot Count: 0

Number of Bricks: 44 x 3 = 132

Transport-type: tcp

Bricks:

Brick1: 10.32.3.9:/mnt/brick1/vol

Brick2: 10.32.3.19:/mnt/brick1/vol

Brick3: 10.32.3.29:/mnt/brick1/vol 

….

Brick132: 10.32.3.40:/mnt/brick7/vol

Options Reconfigured:

nfs.disable: on

performance.readdir-ahead: on

transport.address-family: inet

features.quota: on

features.inode-quota: on

features.quota-deem-statfs: on

cluster.quorum-type: auto

features.quota-timeout: 10

features.bitrot: on

features.scrub: Active

performance.cache-size: 4GB

diagnostics.latency-measurement: on

diagnostics.count-fop-hits: on 
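
For reference, the reconfigured options above are applied with "gluster volume set"; quota and bitrot have their own enable commands. A minimal sketch (the volume name vol0 is a placeholder):

  gluster volume set vol0 cluster.quorum-type auto
  gluster volume set vol0 performance.cache-size 4GB
  gluster volume quota vol0 enable
  gluster volume bitrot vol0 enable
  gluster volume get vol0 all    # verify the effective settings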

 

At the time I was upgrading Kubernetes from 1.4.6 to 1.5.1: upgrade a server, reboot it, and of course the gluster pod restarted, one server at a time.
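
A sketch of one iteration of that cycle (node and volume names are placeholders; the heal check between nodes is the safeguard I would add in hindsight):

  kubectl drain node-03 --ignore-daemonsets   # evict pods before the reboot
  # ... upgrade the k8s components, reboot the node ...
  kubectl uncordon node-03
  gluster volume heal vol0 info               # proceed only when pending entries reach 0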

 

After the 3rd server was upgraded, its status was abnormal: some bricks were online and some were not. While I was handling this, another server rebooted, so the replica group that includes these two servers entered read-only status.
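
That is consistent with cluster.quorum-type: auto above: each 3-brick replica set needs a majority (2 of 3) of its bricks up, so with bricks down on two servers of the same set, the set falls below client quorum and becomes read-only. Offline bricks show as N in the Online column (volume name is a placeholder):

  gluster volume status vol0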

 

Then I checked that server; it had crashed with the following log:

 

Dec 27 14:22:29 ac07.pek.prod.com kernel: general protection fault: 0000 [#1] SMP

Dec 27 14:22:29 ac07.pek.prod.com kernel: Modules linked in: xt_physdev fuse nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc xfs ipt_REJECT nf_reject_ipv4 xt_statistic xt_nat xt_recent xt_mark ip6t_rpfilter xt_comment ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw nf_conntrack_netlink ip6table_filter ip6_tables xt_set ip_set nfnetlink ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc overlay bonding dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32c_generic coretemp nls_ascii nls_cp437 sb_edac edac_core vfat fat x86_pkg_temp_thermal kvm_intel ipmi_ssif i2c_core ipmi_devintf kvm mei_me ipmi_si irqbypass ses enclosure evdev dcdbas scsi_transport_sas

Dec 27 14:22:29 ac07.pek.prod.com kernel:  mei ipmi_msghandler button sch_fq_codel ip_tables ext4 crc16 jbd2 mbcache mlx4_en sd_mod crc32c_intel jitterentropy_rng drbg ahci aesni_intel libahci aes_x86_64 ehci_pci glue_helper tg3 lrw ehci_hcd gf128mul ablk_helper hwmon megaraid_sas cryptd libata ptp mlx4_core usbcore scsi_mod pps_core usb_common libphy dm_mirror dm_region_hash dm_log dm_mod autofs4

Dec 27 14:22:29 ac07.pek.prod.com kernel: CPU: 15 PID: 40090 Comm: glusterfsd Not tainted 4.7.3-coreos-r3 #1

Dec 27 14:22:29 ac07.pek.prod.com kernel: Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.7 06/16/2016

Dec 27 14:22:29 ac07.pek.prod.com kernel: task: ffff885fa2f11d40 ti: ffff885f66988000 task.ti: ffff885f66988000

Dec 27 14:22:29 ac07.pek.prod.com kernel: RIP: 0010:[<ffffffffa11e517f>]  [<ffffffffa11e517f>] do_iter_readv_writev+0xdf/0x110

Dec 27 14:22:29 ac07.pek.prod.com kernel: RSP: 0018:ffff885f6698bd58  EFLAGS: 00010246

Dec 27 14:22:29 ac07.pek.prod.com kernel: RAX: 0000000000000000 RBX: ffff885f6698bec8 RCX: ffffffffa1479020

Dec 27 14:22:29 ac07.pek.prod.com kernel: RDX: 73ff884e61ba1598 RSI: ffff885f6698bdb8 RDI: ffff885f7d0f3900

Dec 27 14:22:29 ac07.pek.prod.com kernel: RBP: ffff885f6698bd90 R08: 0000000000000000 R09: 0000000000000802

Dec 27 14:22:29 ac07.pek.prod.com kernel: R10: 0000000000000000 R11: ffff885f6698c000 R12: 0000000000000000

Dec 27 14:22:29 ac07.pek.prod.com kernel: R13: ffff885f7d0f3900 R14: ffff885f6698bec8 R15: 0000000000000000

Dec 27 14:22:29 ac07.pek.prod.com kernel: FS:  00007f00b5562700(0000) GS:ffff885fbe7c0000(0000) knlGS:0000000000000000

Dec 27 14:22:29 ac07.pek.prod.com kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Dec 27 14:22:29 ac07.pek.prod.com kernel: CR2: 000000c821378000 CR3: 0000005eb41c3000 CR4: 00000000003406e0

Dec 27 14:22:29 ac07.pek.prod.com kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

Dec 27 14:22:29 ac07.pek.prod.com kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Dec 27 14:22:29 ac07.pek.prod.com kernel: Stack:

Dec 27 14:22:29 ac07.pek.prod.com kernel:  0000000000000000 ffff885f7d0f3900 ffff885f6698bec8 ffff885f6698bd90

Dec 27 14:22:29 ac07.pek.prod.com kernel:  ffffffffa11e58be 000000004c834e96 0000000000000000 ffff885f6698bea0

Dec 27 14:22:29 ac07.pek.prod.com kernel:  ffffffffa11e5da2 0000000000000000 ffff885f6698bde8 0000000000000000

Dec 27 14:22:29 ac07.pek.prod.com kernel: Call Trace:

Dec 27 14:22:29 ac07.pek.prod.com kernel:  [<ffffffffa11e58be>] ? rw_verify_area+0x4e/0xb0

Dec 27 14:22:29 ac07.pek.prod.com kernel:  [<ffffffffa11e5da2>] do_readv_writev+0x1a2/0x240

Dec 27 14:22:29 ac07.pek.prod.com kernel:  [<ffffffffa122e749>] ? ep_poll+0x139/0x340

Dec 27 14:22:29 ac07.pek.prod.com kernel:  [<ffffffffa1115771>] ? __audit_syscall_entry+0xb1/0x100

Dec 27 14:22:29 ac07.pek.prod.com kernel:  [<ffffffffa11e5e79>] vfs_readv+0x39/0x50

Dec 27 14:22:29 ac07.pek.prod.com kernel:  [<ffffffffa11e5ef1>] do_readv+0x61/0xf0

Dec 27 14:22:29 ac07.pek.prod.com kernel:  [<ffffffffa11e7270>] SyS_readv+0x10/0x20

Dec 27 14:22:29 ac07.pek.prod.com kernel:  [<ffffffffa1003c6d>] do_syscall_64+0x5d/0x150

Dec 27 14:22:29 ac07.pek.prod.com kernel:  [<ffffffffa159c7a1>] entry_SYSCALL64_slow_path+0x25/0x25

Dec 27 14:22:29 ac07.pek.prod.com kernel: Code: 54 48 8b 55 d0 48 89 13 48 8b 5d f0 65 48 33 1c 25 28 00 00 00 75 40 48 83 c4 30 5b 5d c3 83 4d e8 30 eb c8 48 8b 97 f8 00 00 00 <48> 8b 12 4c 8b 52 28 41 f6 42 50 10 0f 85 65 ff ff ff f6 42 0c

Dec 27 14:22:29 ac07.pek.prod.com kernel: RIP  [<ffffffffa11e517f>] do_iter_readv_writev+0xdf/0x110
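
For anyone trying to decode this: the faulting offset can be mapped to a source line with gdb against a vmlinux carrying debug symbols for this exact kernel build (the vmlinux path below is an assumption):

  gdb -batch -ex 'info line *(do_iter_readv_writev+0xdf)' /usr/lib/debug/lib/modules/4.7.3-coreos-r3/vmlinux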

 

I don’t have any hint about the cause of the crash; any help is appreciated.

 

likun
