[Gluster-users] Glusterd seems to be ignoring that the underlying filesystem went missing
Ivan Rossi
rouge2507 at gmail.com
Mon Sep 26 09:44:58 UTC 2016
for completeness:
https://bugzilla.redhat.com/show_bug.cgi?id=1378978
2016-09-23 18:06 GMT+02:00 Luca Gervasi <luca.gervasi at gmail.com>:
> Hi guys,
> I've run into a strange problem with the following timeline (it matches the
> "Log fragment 1" excerpt below):
> 19:56:50: The disk is detached from my system. This disk actually holds the
> brick of volume V.
> 19:56:50: LVM sees the disk as unreachable and starts its maintenance
> procedures.
> 19:56:50: LVM unmounts my thin-provisioned volumes.
> 19:57:02: The health check on specific bricks fails, moving those bricks to
> a down state.
> 19:57:32: The XFS filesystem is unmounted.
>
> At this point, the brick filesystem is no longer mounted. What remains
> underneath is empty (even the brick directory is missing). I assumed Gluster
> would stop itself under these conditions, but it does not: it slowly fills my
> entire root partition, recreating its full directory tree there.
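The failure described here is that nothing notices the brick path now resolves to a plain directory on the root filesystem. A minimal sketch of such a guard, assuming the layout from the volume info in this message (brick in a brick1 subdirectory of a dedicated mount point such as /bricks/vol-homes). This is not Gluster's code, and the helper name brick_is_on_own_mount is hypothetical.

```python
import os


def brick_is_on_own_mount(brick_path: str, mount_point: str) -> bool:
    """Hypothetical guard: True only if brick_path exists on the filesystem
    mounted at mount_point and that filesystem is not the root filesystem."""
    if not os.path.ismount(mount_point):
        # Unmounted: mount_point is now just a directory on the parent filesystem.
        return False
    if not os.path.isdir(brick_path):
        # The brick subdirectory disappeared together with the filesystem.
        return False
    brick_dev = os.stat(brick_path).st_dev
    # Same device as the mount point, and that device is not the one holding "/".
    return brick_dev == os.stat(mount_point).st_dev and brick_dev != os.stat("/").st_dev
```

In the state described above, brick_is_on_own_mount("/bricks/vol-homes/brick1", "/bricks/vol-homes") would return False. The ismount test alone catches the unmounted case; the st_dev comparison additionally rejects a brick path that resolves onto the root filesystem.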
>
> The only warning sign I get is the disk's inode usage climbing to 100%.
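Since inode exhaustion is the only visible symptom, it can at least be watched for. A minimal monitoring sketch using Python's os.statvfs; the function names and the 90% default threshold are illustrative assumptions, not anything Gluster or CentOS provides.

```python
import os


def inode_usage_percent(path: str) -> float:
    """Percentage of inodes in use on the filesystem containing path."""
    st = os.statvfs(path)
    if st.f_files == 0:
        # Some filesystems report no fixed inode count; treat as unused.
        return 0.0
    used = st.f_files - st.f_ffree
    return 100.0 * used / st.f_files


def inode_below_threshold(path: str = "/", threshold: float = 90.0) -> bool:
    """True while inode usage on path stays below threshold (assumed 90%)."""
    return inode_usage_percent(path) < threshold
```

Run periodically against "/", this would flag the root partition filling up with a recreated brick tree well before it reaches 100%.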
>
> I've read the release notes for every version after mine (3.7.14, 3.7.15)
> without finding a relevant fix, so at this point I'm fairly sure this is an
> undocumented bug.
> All servers are configured identically.
>
> Could you please help me understand how to stop Gluster from continuing to
> write to an unmounted filesystem? Thanks.
>
> I'm running a 3-node replica on 3 Azure VMs. This is the configuration:
>
> MD (yes, I use md to aggregate 4 disks into a single 4 TB volume):
> /dev/md128:
> Version : 1.2
> Creation Time : Mon Aug 29 18:10:45 2016
> Raid Level : raid0
> Array Size : 4290248704 (4091.50 GiB 4393.21 GB)
> Raid Devices : 4
> Total Devices : 4
> Persistence : Superblock is persistent
>
> Update Time : Mon Aug 29 18:10:45 2016
> State : clean
> Active Devices : 4
> Working Devices : 4
> Failed Devices : 0
> Spare Devices : 0
>
> Chunk Size : 512K
>
> Name : 128
> UUID : d5c51214:43e48da9:49086616:c1371514
> Events : 0
>
>     Number   Major   Minor   RaidDevice State
>        0       8       80        0      active sync   /dev/sdf
>        1       8       96        1      active sync   /dev/sdg
>        2       8      112        2      active sync   /dev/sdh
>        3       8      128        3      active sync   /dev/sdi
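The two figures in parentheses on the Array Size line follow from the raw number when it is read as 1 KiB blocks, which is how mdadm reports it. A quick arithmetic check, also showing the even split RAID0 implies across the four members (kib_to_units is an illustrative helper):

```python
from typing import Tuple


def kib_to_units(kib: int) -> Tuple[float, float]:
    """Convert a size in KiB (mdadm's 'Array Size') to (GiB, GB)."""
    size_bytes = kib * 1024
    return size_bytes / 2**30, size_bytes / 10**9


array_kib = 4290248704  # 'Array Size' from the mdadm --detail output above
gib, gb = kib_to_units(array_kib)
print(f"{gib:.2f} GiB, {gb:.2f} GB")  # 4091.50 GiB, 4393.21 GB
# RAID0 stripes evenly, so each of the 4 members contributes a quarter.
print(f"per member: {array_kib / 4 / 2**20:.3f} GiB")  # per member: 1022.875 GiB
```

So the "4 TB" volume is about 4.0 TiB, or 4.39 TB in decimal units.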
>
> PV, VG, LV status
> PV         VG      Fmt  Attr PSize PFree DevSize PV UUID
> /dev/md127 VGdata  lvm2 a--  2.00t 2.00t   2.00t Kxb6C0-FLIH-4rB1-DKyf-IQuR-bbPE-jm2mu0
> /dev/md128 gluster lvm2 a--  4.00t 1.07t   4.00t lDazuw-zBPf-Duis-ZDg1-3zfg-53Ba-2ZF34m
>
> VG      Attr     Ext #PV #LV #SN VSize VFree VG UUID                                VProfile
> VGdata  wz--n- 4.00m   1   0   0 2.00t 2.00t XI2V2X-hdxU-0Jrn-TN7f-GSEk-7aNs-GCdTtn
> gluster wz--n- 4.00m   1   6   0 4.00t 1.07t ztxX4f-vTgN-IKop-XePU-OwqW-T9k6-A6uDk0
>
> LV                  VG      #Seg Attr         LSize Maj Min KMaj KMin Pool     Origin Data% Meta% Move Cpy%Sync Log Convert LV UUID                                LProfile
> apps-data           gluster    1 Vwi-aotz--  50.00g  -1  -1  253   12 thinpool         0.08                                 znUMbm-ax1N-R7aj-dxLc-gtif-WOvk-9QC8tq
> feed                gluster    1 Vwi-aotz-- 100.00g  -1  -1  253   14 thinpool         0.08                                 hZ4Isk-dELG-lgFs-2hJ6-aYid-8VKg-3jJko9
> homes               gluster    1 Vwi-aotz--   1.46t  -1  -1  253   11 thinpool        58.58                                 salIPF-XvsA-kMnm-etjf-Uaqy-2vA9-9WHPkH
> search-data         gluster    1 Vwi-aotz-- 100.00g  -1  -1  253   13 thinpool        16.41                                 Z5hoa3-yI8D-dk5Q-2jWH-N5R2-ge09-RSjPpQ
> thinpool            gluster    1 twi-aotz--   2.93t  -1  -1  253    9                 29.85 60.00                           oHTbgW-tiPh-yDfj-dNOm-vqsF-fBNH-o1izx2
> video-asset-manager gluster    1 Vwi-aotz-- 100.00g  -1  -1  253   15 thinpool         0.07                                 4dOXga-96Wa-u3mh-HMmE-iX1I-o7ov-dtJ8lZ
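The lvs figures can be cross-checked to see whether the thin pool was overcommitted or close to full. A small sketch, assuming LVM's lowercase size suffixes are binary (g = GiB, t = TiB) and using the LSize and Data% values copied from the table above; the helper name to_gib is illustrative.

```python
from typing import Dict, Tuple

# LVM's lowercase size suffixes are binary: g = GiB, t = TiB (1 TiB = 1024 GiB).
UNIT_GIB = {"g": 1.0, "t": 1024.0}


def to_gib(size: str) -> float:
    """Convert an lvs size string such as '50.00g' or '1.46t' to GiB."""
    return float(size[:-1]) * UNIT_GIB[size[-1]]


# (LSize, Data%) of each thin volume, copied from the lvs output above.
THIN_VOLUMES: Dict[str, Tuple[str, float]] = {
    "apps-data": ("50.00g", 0.08),
    "feed": ("100.00g", 0.08),
    "homes": ("1.46t", 58.58),
    "search-data": ("100.00g", 16.41),
    "video-asset-manager": ("100.00g", 0.07),
}
POOL_GIB = to_gib("2.93t")  # LSize of the thinpool LV

virtual_gib = sum(to_gib(size) for size, _ in THIN_VOLUMES.values())
used_gib = sum(to_gib(size) * pct / 100 for size, pct in THIN_VOLUMES.values())

print(f"provisioned: {virtual_gib:.1f} GiB = {100 * virtual_gib / POOL_GIB:.1f}% of the pool")
print(f"written:     {used_gib:.1f} GiB = {100 * used_gib / POOL_GIB:.1f}% of the pool")
```

This gives roughly 61.5% provisioned and about 29.7% written, close to the pool's own 29.85% Data% (the difference comes from the rounded sizes in the lvs output). The pool data space was therefore neither overcommitted nor full; the separate 60.00% Meta% figure is not derived here.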
>
> Gluster volume configuration (all volumes use exactly the same configuration,
> so listing them all would be redundant):
> Volume Name: vol-homes
> Type: Replicate
> Volume ID: 0c8fa62e-dd7e-429c-a19a-479404b5e9c6
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: glu01.prd.azr:/bricks/vol-homes/brick1
> Brick2: glu02.prd.azr:/bricks/vol-homes/brick1
> Brick3: glu03.prd.azr:/bricks/vol-homes/brick1
> Options Reconfigured:
> performance.readdir-ahead: on
> cluster.server-quorum-type: server
> nfs.disable: disable
> cluster.lookup-unhashed: auto
> performance.nfs.quick-read: on
> performance.nfs.read-ahead: on
> performance.cache-size: 4096MB
> cluster.self-heal-daemon: enable
> diagnostics.brick-log-level: ERROR
> diagnostics.client-log-level: ERROR
> nfs.rpc-auth-unix: off
> nfs.acl: off
> performance.nfs.io-cache: on
> performance.client-io-threads: on
> performance.nfs.stat-prefetch: on
> performance.nfs.io-threads: on
> diagnostics.latency-measurement: on
> diagnostics.count-fop-hits: on
> performance.md-cache-timeout: 1
> performance.cache-refresh-timeout: 1
> performance.io-thread-count: 16
> performance.high-prio-threads: 16
> performance.normal-prio-threads: 16
> performance.low-prio-threads: 16
> performance.least-prio-threads: 1
> cluster.server-quorum-ratio: 60
>
> fstab:
> /dev/gluster/homes /bricks/vol-homes xfs defaults,noatime,nobarrier,nofail 0 2
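The nofail option in this entry matters at boot: if the device is missing, the system boots anyway and /bricks/vol-homes stays an ordinary empty directory on the root filesystem, the same state described above after LVM's unmount. A small sketch that splits an fstab(5) line into its six fields and flags nofail; FstabEntry and parse_fstab_line are illustrative names, and octal escapes in paths are not handled.

```python
from typing import List, NamedTuple


class FstabEntry(NamedTuple):
    spec: str           # block device or other source, e.g. /dev/gluster/homes
    mount_point: str    # where the filesystem is mounted
    vfstype: str        # filesystem type
    options: List[str]  # comma-separated mount options, split into a list
    freq: int           # dump frequency (fifth field)
    passno: int         # fsck pass number (sixth field)


def parse_fstab_line(line: str) -> FstabEntry:
    """Parse one non-comment fstab(5) line.

    The fifth and sixth fields default to 0 when absent, as fstab(5) specifies.
    Octal escapes such as \\040 in paths are not decoded in this sketch.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"this sketch expects at least 4 fields: {line!r}")
    freq = int(fields[4]) if len(fields) > 4 else 0
    passno = int(fields[5]) if len(fields) > 5 else 0
    return FstabEntry(fields[0], fields[1], fields[2], fields[3].split(","), freq, passno)


entry = parse_fstab_line(
    "/dev/gluster/homes /bricks/vol-homes xfs defaults,noatime,nobarrier,nofail 0 2"
)
if "nofail" in entry.options:
    # nofail: boot continues if the device is missing, leaving the mount
    # point as an ordinary directory on the parent filesystem.
    print(f"{entry.mount_point}: nofail is set")
```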
>
> Software:
> CentOS Linux release 7.1.1503 (Core)
> glusterfs-api-3.7.13-1.el7.x86_64
> glusterfs-libs-3.7.13-1.el7.x86_64
> glusterfs-3.7.13-1.el7.x86_64
> glusterfs-fuse-3.7.13-1.el7.x86_64
> glusterfs-server-3.7.13-1.el7.x86_64
> glusterfs-client-xlators-3.7.13-1.el7.x86_64
> glusterfs-cli-3.7.13-1.el7.x86_64
>
>
> Log fragment 1:
> Sep 22 19:56:50 glu03 lvm[868]: WARNING: Device for PV lDazuw-zBPf-Duis-ZDg1-3zfg-53Ba-2ZF34m not found or rejected by a filter.
> Sep 22 19:56:50 glu03 lvm[868]: Cannot change VG gluster while PVs are missing.
> Sep 22 19:56:50 glu03 lvm[868]: Consider vgreduce --removemissing.
> Sep 22 19:56:50 glu03 lvm[868]: Failed to extend thin metadata gluster-thinpool-tpool.
> Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-homes.
> Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-search-data.
> Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-apps-data.
> Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-video-asset-manager.
> Sep 22 19:57:02 glu03 bricks-vol-video-asset-manager-brick1[45162]: [2016-09-22 17:57:02.713428] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-video-asset-manager-posix: health-check failed, going down
> Sep 22 19:57:05 glu03 bricks-vol-apps-data-brick1[44536]: [2016-09-22 17:57:05.186146] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-apps-data-posix: health-check failed, going down
> Sep 22 19:57:18 glu03 bricks-vol-search-data-brick1[40928]: [2016-09-22 17:57:18.674279] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-search-data-posix: health-check failed, going down
> Sep 22 19:57:32 glu03 bricks-vol-video-asset-manager-brick1[45162]: [2016-09-22 17:57:32.714461] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-video-asset-manager-posix: still alive! -> SIGTERM
> Sep 22 19:57:32 glu03 kernel: XFS (dm-15): Unmounting Filesystem
> Sep 22 19:57:35 glu03 bricks-vol-apps-data-brick1[44536]: [2016-09-22 17:57:35.186352] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-apps-data-posix: still alive! -> SIGTERM
> Sep 22 19:57:35 glu03 kernel: XFS (dm-12): Unmounting Filesystem
> Sep 22 19:57:48 glu03 bricks-vol-search-data-brick1[40928]: [2016-09-22 17:57:48.674444] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-search-data-posix: still alive! -> SIGTERM
> Sep 22 19:57:48 glu03 kernel: XFS (dm-13): Unmounting Filesystem
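The Gluster entries in this fragment can be paired per volume to measure how long each brick took to go from "health-check failed, going down" to "still alive! -> SIGTERM". A small parsing sketch over the bracketed entries copied verbatim from the log; the two-hour offset from the syslog prefix suggests these timestamps are UTC, but only differences are used here. down_to_sigterm is an illustrative name.

```python
import re
from datetime import datetime
from typing import Dict, List

# Gluster's own bracketed entries from "Log fragment 1", copied verbatim.
ENTRIES: List[str] = [
    "[2016-09-22 17:57:02.713428] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-video-asset-manager-posix: health-check failed, going down",
    "[2016-09-22 17:57:05.186146] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-apps-data-posix: health-check failed, going down",
    "[2016-09-22 17:57:18.674279] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-search-data-posix: health-check failed, going down",
    "[2016-09-22 17:57:32.714461] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-video-asset-manager-posix: still alive! -> SIGTERM",
    "[2016-09-22 17:57:35.186352] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-apps-data-posix: still alive! -> SIGTERM",
    "[2016-09-22 17:57:48.674444] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-search-data-posix: still alive! -> SIGTERM",
]

# Timestamp, then the "0-<volume>-posix:" translator name, then the message text.
PATTERN = re.compile(r"\[(\S+ \S+)\] M .*? 0-(\S+)-posix: (.*)$")


def down_to_sigterm(entries: List[str]) -> Dict[str, float]:
    """Seconds between 'health-check failed' and 'still alive! -> SIGTERM' per volume."""
    failed: Dict[str, datetime] = {}
    delays: Dict[str, float] = {}
    for line in entries:
        m = PATTERN.match(line)
        if m is None:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        volume, message = m.group(2), m.group(3)
        if message.startswith("health-check failed"):
            failed[volume] = ts
        elif message.startswith("still alive!") and volume in failed:
            delays[volume] = (ts - failed[volume]).total_seconds()
    return delays


for volume, seconds in down_to_sigterm(ENTRIES).items():
    print(f"{volume}: {seconds:.1f} s")
```

Each of the three bricks comes out at about 30.0 s, and each SIGTERM coincides with the matching XFS unmount line. Note that vol-homes, whose thin volume LVM also unmounted, has no health-check entry anywhere in this fragment.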
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users