[Gluster-devel] BitRot: detection of files changes

Gaurav Garg ggarg at redhat.com
Sun Oct 4 18:05:54 UTC 2015


Hi,

>> Question: why were the changes not detected, and how should the process of 
detecting broken files work (denied access/read-only mode etc.)?

Looking at your logs, bitd and the scrubber have both done their job perfectly.

If you look at the bitD log, you will find the entries below:

[2015-10-04 13:56:53.042552] I [MSGID: 118039] [bit-rot.c:1291:br_child_enaction] 0-ovirt-data-bit-rot-0: Connected to brick /export/vdb1/ovirt-data..
[2015-10-04 13:56:53.042642] I [MSGID: 118018] [bit-rot.c:1093:br_oneshot_signer] 0-ovirt-data-bit-rot-0: Crawling brick [/export/vdb1/ovirt-data], scanning for unsigned objects
[2015-10-04 13:56:53.059071] I [MSGID: 118025] [bit-rot.c:1102:br_oneshot_signer] 0-ovirt-data-bit-rot-0: Completed crawling brick [/export/vdb1/ovirt-data]

This means that the crawl of the brick completed and that any files present were signed. After that, the log shows one more entry, which indicates that the *glusterfs* process was killed:

[2015-10-04 15:26:48.959107] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7df5) [0x7f7103a67df5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f71050d1785] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f71050d1609] ) 0-: received signum (15), shutting down

This means that bitD did its job successfully before the glusterfs process went down.

Now let's look at your scrubber logs.

[2015-10-04 13:56:51.051743] I [MSGID: 118038] [bit-rot-scrub.c:778:br_fsscan_schedule] 0-ovirt-data-bit-rot-0: Scrubbing for /export/vdb1/ovirt-data scheduled to run at 2015-10-04 14:56:51
[2015-10-04 13:56:51.051774] I [MSGID: 118039] [bit-rot.c:1291:br_child_enaction] 0-ovirt-data-bit-rot-0: Connected to brick /export/vdb1/ovirt-data..
[2015-10-04 14:56:02.712660] I [MSGID: 118044] [bit-rot-scrub.c:571:br_fsscanner_log_time] 0-ovirt-data-bit-rot-0: Scrubbing "/export/vdb1/ovirt-data" started at 2015-10-04 14:56:02
[2015-10-04 14:56:40.779426] I [MSGID: 118045] [bit-rot-scrub.c:575:br_fsscanner_log_time] 0-ovirt-data-bit-rot-0: Scrubbing "/export/vdb1/ovirt-data" finished at 2015-10-04 14:56:40
[2015-10-04 14:56:40.779466] I [MSGID: 118038] [bit-rot-scrub.c:815:br_fsscan_activate] 0-ovirt-data-bit-rot-0: Scrubbing for /export/vdb1/ovirt-data rescheduled to run at 2015-10-04 15:56:40

From the logs above it is clear that the scrubber also finished its job before the glusterfs process went down, and that no bad file/object was detected, because the scrubber simply rescheduled itself for the next round of object verification. If there had been a bad file, a bad-file detection entry would appear in the scrubber log before the rescheduling entry.
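
As a rough illustration of that check, here is a minimal Python sketch that walks through scrubber log lines and reports whether a scrub run started and finished, and whether any line mentions detected corruption. The regular expression is only inferred from the log excerpts above, and the function name summarize_scrub_log and the case-insensitive match on "corruption detected" are assumptions of this sketch, not part of GlusterFS.

```python
import re

# Layout inferred from the excerpts above (an assumption, not a spec):
# [timestamp] LEVEL [MSGID: n] [file:line:function] prefix: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+(?P<rest>.*)$"
)

# The three scrubber lines quoted above, used as sample input.
SAMPLE = [
    '[2015-10-04 14:56:02.712660] I [MSGID: 118044] [bit-rot-scrub.c:571:br_fsscanner_log_time] 0-ovirt-data-bit-rot-0: Scrubbing "/export/vdb1/ovirt-data" started at 2015-10-04 14:56:02',
    '[2015-10-04 14:56:40.779426] I [MSGID: 118045] [bit-rot-scrub.c:575:br_fsscanner_log_time] 0-ovirt-data-bit-rot-0: Scrubbing "/export/vdb1/ovirt-data" finished at 2015-10-04 14:56:40',
    '[2015-10-04 14:56:40.779466] I [MSGID: 118038] [bit-rot-scrub.c:815:br_fsscan_activate] 0-ovirt-data-bit-rot-0: Scrubbing for /export/vdb1/ovirt-data rescheduled to run at 2015-10-04 15:56:40',
]


def summarize_scrub_log(lines):
    """Count scrub start/finish entries and collect corruption messages."""
    summary = {"started": 0, "finished": 0, "corrupt": []}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        rest = match.group("rest")
        if "Scrubbing" in rest and " started at " in rest:
            summary["started"] += 1
        elif "Scrubbing" in rest and " finished at " in rest:
            summary["finished"] += 1
        # Assumed marker text, taken from the wording used in this mail.
        if "corruption detected" in rest.lower():
            summary["corrupt"].append(rest)
    return summary


if __name__ == "__main__":
    print(summarize_scrub_log(SAMPLE))
```

For the lines quoted above, the summary contains one started run, one finished run and an empty corruption list, which matches the reading that the scrubber completed a pass without flagging anything.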

After the scrubber had done its job (verifying object integrity), the logs show that the *glusterfs* process went down:

[2015-10-04 15:26:48.959631] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7df5) [0x7fade12fbdf5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7fade2965785] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7fade2965609] ) 0-: received signum (15), shutting down


So your *glusterfs* process went down only after the checks had run, which means bitd and the scrubber had successfully detected whatever changes were there.


>> how should the process of detecting broken files work

When a file is corrupted, meaning its content has changed (through bit flipping or other bit corruption), the client does not know that the file has changed. Before the corruption, the signer had already calculated a checksum of the file's content and stored it in an extended attribute of the file. After the corruption, the scrubber calculates the checksum of the file again, and the result differs from the checksum the signer stored in the extended attribute. The scrubber compares the two checksums; if they do not match, it marks the file as bad and records that fact both in an extended attribute of the file and in the scrubber logs. So if the scrubber detects a bad file, you will get a *corruption detected* entry in the scrubber logs.
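
The sign-then-verify flow described above can be modelled with a small Python sketch. This is a toy model under stated assumptions, not the GlusterFS implementation: a plain dict stands in for the file's extended attributes, the attribute names SIGNATURE_ATTR and BAD_FILE_ATTR are hypothetical, and SHA-256 is chosen here only for illustration.

```python
import hashlib

# Hypothetical attribute names for this sketch only; they are not
# claimed to match the keys GlusterFS stores on disk.
SIGNATURE_ATTR = "signature"
BAD_FILE_ATTR = "bad-file"


class StoredObject:
    """File content plus a dict standing in for its extended attributes."""

    def __init__(self, content: bytes):
        self.content = content
        self.xattrs = {}


def checksum(data: bytes) -> str:
    """Checksum of the content (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def sign(obj: StoredObject) -> None:
    """Signer: store the checksum of the current content as an attribute."""
    obj.xattrs[SIGNATURE_ATTR] = checksum(obj.content)


def scrub(obj: StoredObject, log: list) -> bool:
    """Scrubber: recompute the checksum and compare it with the stored one.

    Returns True if the object looks healthy, False if it was marked bad.
    """
    stored = obj.xattrs.get(SIGNATURE_ATTR)
    if stored is None:
        return True  # unsigned object: nothing to compare against here
    if checksum(obj.content) != stored:
        obj.xattrs[BAD_FILE_ATTR] = True      # mark the file as bad
        log.append("corruption detected")      # and record it in the log
        return False
    return True


if __name__ == "__main__":
    log = []
    obj = StoredObject(b"hello world")
    sign(obj)                      # signer runs while the content is good
    print(scrub(obj, log))         # content unchanged since signing
    obj.content = b"hellp world"   # simulate corruption behind the signer's back
    print(scrub(obj, log))         # checksum no longer matches the signature
    print(log)
```

The point of the model is that the scrubber can only flag content that differs from what the signer recorded; an object that was never signed has no stored checksum to disagree with.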

Thanx,

~Gaurav


----- Original Message -----
From: "Попов О.В." <o.popov at livelace.ru>
To: gluster-devel at gluster.org
Sent: Sunday, October 4, 2015 9:55:52 PM
Subject: [Gluster-devel] BitRot: detection of files changes

Hello.

Version:

[root at glusterfs1 ~]# rpm -qa | grep gluster
glusterfs-client-xlators-3.7.4-2.el7.x86_64
glusterfs-cli-3.7.4-2.el7.x86_64
glusterfs-3.7.4-2.el7.x86_64
glusterfs-geo-replication-3.7.4-2.el7.x86_64
glusterfs-fuse-3.7.4-2.el7.x86_64
glusterfs-server-3.7.4-2.el7.x86_64
glusterfs-libs-3.7.4-2.el7.x86_64
glusterfs-api-3.7.4-2.el7.x86_64

Configuration of volume:

[root at glusterfs2 ~]# gluster volume info ovirt-data

Volume Name: ovirt-data
Type: Replicate
Volume ID: a1faaa53-c53e-4696-a078-0e1f4e2444a2
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.102.6:/export/vdb1/ovirt-data
Brick2: 10.0.102.7:/export/vdb1/ovirt-data
Brick3: 10.0.102.8:/export/vdb1/ovirt-data
Options Reconfigured:
server.allow-insecure: on
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
changelog.changelog: on
cluster.quorum-count: 1
cluster.quorum-type: fixed
network.ping-timeout: 1
features.bitrot: on
features.scrub: Active
features.scrub-freq: hourly
features.scrub-throttle: aggressive

On glusterfs1:

mount -t glusterfs 10.0.102.6:/ovirt-data /mnt
dd if=/dev/urandom of=/mnt/file bs=1M count=100
md5sum /export/vdb1/ovirt-data/file
0b093e9f1524cde2d64ce0a105237190  /export/vdb1/ovirt-data/file

On glusterfs3:

dd if=/dev/urandom of=/export/vdb1/ovirt-data/file bs=1M count=100
md5sum /export/vdb1/ovirt-data/file
af0fee68511176b7fcac6cbbd3137431  /export/vdb1/ovirt-data/file

Logs of scrub and bitd attached.

Question: why were the changes not detected, and how should the process of 
detecting broken files work (denied access/read-only mode etc.)?

-- 
С уважением, Попов О.В. / Best regards, Popov V Oleg

tel: +7 981 783-90-13
skype: livelace


_______________________________________________
Gluster-devel mailing list
Gluster-devel at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

