[Bugs] [Bug 1762438] New: DHT- gluster rebalance status shows wrong data size after rebalance is completed successfully
bugzilla at redhat.com
Wed Oct 16 18:04:53 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1762438
Bug ID: 1762438
Summary: DHT- gluster rebalance status shows wrong data size
after rebalance is completed successfully
Product: GlusterFS
Version: mainline
Hardware: x86_64
OS: Linux
Status: NEW
Component: glusterd
Severity: high
Assignee: bugs at gluster.org
Reporter: srakonde at redhat.com
CC: amukherj at redhat.com, bmekala at redhat.com,
bshetty at redhat.com, bugs at gluster.org,
nchilaka at redhat.com, rhs-bugs at redhat.com,
saraut at redhat.com, spalai at redhat.com,
storage-qa-internal at redhat.com, vbellur at redhat.com
Depends On: 1761486
Target Milestone: ---
Classification: Community
+++ This bug was initially created as a clone of Bug #1761486 +++
Description of problem:
=======================
While rebalancing a volume from 1 x (4+2) to 2 x (4+2) and then to
3 x (4+2), the rebalance status shows an inconsistent (wrong) data size after
the rebalance is completed. The status shows correct data for about 4-6 hours
after completion, after which the values mismatch. This was seen in both the
2 x (4+2) and 3 x (4+2) configurations. Although the migrated size was in GB,
after the rebalance completes the status reports it in PB.
Version-Release number of selected component (if applicable):
============================================================
glusterfs-6.0-15.el7rhgs.x86_64
How reproducible:
================
2/2
Steps:
=====
1. Create a RHHI-V setup with a 3-node configuration.
2. Create a disperse volume of 1 x (4+2).
3. Start writing fio I/O from a gluster client and wait until it completes
(say 400GB).
4. Add bricks to make the volume 2 x (4+2), start rebalance, and wait until it
completes.
5. After 4-5 hours check the rebalance status; it shows an incorrect data size.
6. Repeat steps 3 and 4; the volume now becomes 3 x (4+2).
7. Start rebalance and wait until the rebalance is completed.
8. Check the rebalance status as mentioned in step 5.
Actual results:
==============
Rebalance status shows the wrong data size (in petabytes).
Expected results:
=================
Rebalance status should show the actual data size.
--- Additional comment from bipin on 2019-10-14 19:19:54 IST ---
[root@rhsqa-grafton7-nic2 glusterfs]# gluster volume info
Volume Name: backupvol
Type: Distributed-Disperse
Volume ID: a2926822-a9a4-4ca7-8894-cf71d9ae1057
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (4 + 2) = 18
Transport-type: tcp
Bricks:
Brick1: rhsqa-grafton7.lab.eng.blr.redhat.com:/gluster/brick1/b1
Brick2: rhsqa-grafton8.lab.eng.blr.redhat.com:/gluster/brick1/b1
Brick3: rhsqa-grafton9.lab.eng.blr.redhat.com:/gluster/brick1/b1
Brick4: rhsqa-grafton7.lab.eng.blr.redhat.com:/gluster/brick2/b2
Brick5: rhsqa-grafton8.lab.eng.blr.redhat.com:/gluster/brick2/b2
Brick6: rhsqa-grafton9.lab.eng.blr.redhat.com:/gluster/brick2/b2
Brick7: rhsqa-grafton7.lab.eng.blr.redhat.com:/gluster/brick3/b3
Brick8: rhsqa-grafton8.lab.eng.blr.redhat.com:/gluster/brick3/b3
Brick9: rhsqa-grafton9.lab.eng.blr.redhat.com:/gluster/brick3/b3
Brick10: rhsqa-grafton7.lab.eng.blr.redhat.com:/gluster/brick4/b4
Brick11: rhsqa-grafton8.lab.eng.blr.redhat.com:/gluster/brick4/b4
Brick12: rhsqa-grafton9.lab.eng.blr.redhat.com:/gluster/brick4/b4
Brick13: rhsqa-grafton7.lab.eng.blr.redhat.com:/gluster/brick5/b5
Brick14: rhsqa-grafton8.lab.eng.blr.redhat.com:/gluster/brick5/b5
Brick15: rhsqa-grafton9.lab.eng.blr.redhat.com:/gluster/brick5/b5
Brick16: rhsqa-grafton7.lab.eng.blr.redhat.com:/gluster/brick6/b6
Brick17: rhsqa-grafton8.lab.eng.blr.redhat.com:/gluster/brick6/b6
Brick18: rhsqa-grafton9.lab.eng.blr.redhat.com:/gluster/brick6/b6
Options Reconfigured:
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.export-dirs: off
nfs.export-volumes: off
performance.read-ahead: off
performance.io-thread-count: 24
performance.md-cache-timeout: 3
performance.cache-refresh-timeout: 3
performance.cache-size: 7GB
performance.parallel-readdir: on
client.event-threads: 5
server.event-threads: 5
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
network.inode-lru-limit: 65536
cluster.brick-multiplex: on
[root@rhsqa-grafton7-nic2 glusterfs]# df -Th
Filesystem                                               Type            Size   Used  Avail  Use%  Mounted on
devtmpfs                                                 devtmpfs        126G      0   126G    0%  /dev
tmpfs                                                    tmpfs           126G    16K   126G    1%  /dev/shm
tmpfs                                                    tmpfs           126G    11M   126G    1%  /run
tmpfs                                                    tmpfs           126G      0   126G    0%  /sys/fs/cgroup
/dev/mapper/rhel_rhsqa--grafton7-root                    xfs              50G   5.2G    45G   11%  /
/dev/mapper/GLUSTER_vg2-GLUSTER_lv2                      xfs             1.9T    81G   1.8T    5%  /gluster/brick2
/dev/mapper/GLUSTER_vg4-GLUSTER_lv4                      xfs             1.9T    38G   1.8T    3%  /gluster/brick4
/dev/mapper/GLUSTER_vg3-GLUSTER_lv3                      xfs             1.9T    38G   1.8T    3%  /gluster/brick3
/dev/mapper/GLUSTER_vg1-GLUSTER_lv1                      xfs             1.9T   115G   1.7T    7%  /gluster/brick1
/dev/mapper/GLUSTER_vg6-GLUSTER_lv6                      xfs             1.9T    41G   1.8T    3%  /gluster/brick6
/dev/mapper/GLUSTER_vg5-GLUSTER_lv5                      xfs             1.9T    41G   1.8T    3%  /gluster/brick5
/dev/mapper/gluster_vg_sdp2-gluster_lv_commvault_server  xfs             500G    64G   437G   13%  /gluster_bricks/commvault_server
/dev/mapper/rhel_rhsqa--grafton7-home                    xfs             876G    34M   876G    1%  /home
/dev/sda1                                                xfs            1014M   200M   815M   20%  /boot
/dev/mapper/gluster_vg_sdp1-gluster_lv_engine            xfs             100G   5.9G    95G    6%  /gluster_bricks/engine
rhsqa-grafton7.lab.eng.blr.redhat.com:/engine            fuse.glusterfs  100G   6.9G    94G    7%  /rhev/data-center/mnt/glusterSD/rhsqa-grafton7.lab.eng.blr.redhat.com:_engine
rhsqa-grafton7.lab.eng.blr.redhat.com:/commvault_server  fuse.glusterfs  500G    69G   432G   14%  /rhev/data-center/mnt/glusterSD/rhsqa-grafton7.lab.eng.blr.redhat.com:_commvault__server
tmpfs                                                    tmpfs            26G      0    26G    0%  /run/user/0
[root@rhsqa-grafton8-nic2 ~]# df -Th
Filesystem                                               Type            Size   Used  Avail  Use%  Mounted on
devtmpfs                                                 devtmpfs        126G      0   126G    0%  /dev
tmpfs                                                    tmpfs           126G    16K   126G    1%  /dev/shm
tmpfs                                                    tmpfs           126G    11M   126G    1%  /run
tmpfs                                                    tmpfs           126G      0   126G    0%  /sys/fs/cgroup
/dev/mapper/rhel_rhsqa--grafton8-root                    xfs              50G   3.1G    47G    7%  /
/dev/mapper/GLUSTER_vg5-GLUSTER_lv5                      xfs             1.9T    41G   1.8T    3%  /gluster/brick5
/dev/mapper/GLUSTER_vg2-GLUSTER_lv2                      xfs             1.9T    81G   1.8T    5%  /gluster/brick2
/dev/mapper/rhel_rhsqa--grafton8-home                    xfs             876G    34M   876G    1%  /home
/dev/mapper/GLUSTER_vg1-GLUSTER_lv1                      xfs             1.9T    81G   1.8T    5%  /gluster/brick1
/dev/mapper/GLUSTER_vg3-GLUSTER_lv3                      xfs             1.9T    38G   1.8T    3%  /gluster/brick3
/dev/mapper/GLUSTER_vg6-GLUSTER_lv6                      xfs             1.9T    41G   1.8T    3%  /gluster/brick6
/dev/mapper/GLUSTER_vg4-GLUSTER_lv4                      xfs             1.9T    38G   1.8T    3%  /gluster/brick4
/dev/sda1                                                xfs            1014M   200M   815M   20%  /boot
/dev/mapper/gluster_vg_sdp2-gluster_lv_commvault_server  xfs             500G    64G   437G   13%  /gluster_bricks/commvault_server
/dev/mapper/gluster_vg_sdp1-gluster_lv_engine            xfs             100G   5.9G    95G    6%  /gluster_bricks/engine
rhsqa-grafton7.lab.eng.blr.redhat.com:/engine            fuse.glusterfs  100G   6.9G    94G    7%  /rhev/data-center/mnt/glusterSD/rhsqa-grafton7.lab.eng.blr.redhat.com:_engine
rhsqa-grafton7.lab.eng.blr.redhat.com:/commvault_server  fuse.glusterfs  500G    69G   432G   14%  /rhev/data-center/mnt/glusterSD/rhsqa-grafton7.lab.eng.blr.redhat.com:_commvault__server
tmpfs                                                    tmpfs            26G      0    26G    0%  /run/user/0
[root@rhsqa-grafton9 glusterfs]# df -Th
Filesystem                                               Type            Size   Used  Avail  Use%  Mounted on
devtmpfs                                                 devtmpfs        126G      0   126G    0%  /dev
tmpfs                                                    tmpfs           126G    16K   126G    1%  /dev/shm
tmpfs                                                    tmpfs           126G    27M   126G    1%  /run
tmpfs                                                    tmpfs           126G      0   126G    0%  /sys/fs/cgroup
/dev/mapper/rhel_rhsqa--grafton9-root                    xfs              50G   3.1G    47G    7%  /
/dev/mapper/GLUSTER_vg6-GLUSTER_lv6                      xfs             1.9T    41G   1.8T    3%  /gluster/brick6
/dev/mapper/gluster_vg_sdp2-gluster_lv_commvault_server  xfs             500G    64G   437G   13%  /gluster_bricks/commvault_server
/dev/mapper/GLUSTER_vg1-GLUSTER_lv1                      xfs             1.9T    77G   1.8T    5%  /gluster/brick1
/dev/mapper/GLUSTER_vg4-GLUSTER_lv4                      xfs             1.9T    38G   1.8T    3%  /gluster/brick4
/dev/mapper/GLUSTER_vg5-GLUSTER_lv5                      xfs             1.9T    41G   1.8T    3%  /gluster/brick5
/dev/mapper/GLUSTER_vg2-GLUSTER_lv2                      xfs             1.9T    80G   1.8T    5%  /gluster/brick2
/dev/mapper/GLUSTER_vg3-GLUSTER_lv3                      xfs             1.9T    38G   1.8T    3%  /gluster/brick3
/dev/mapper/gluster_vg_sdp1-gluster_lv_engine            xfs             100G   5.9G    95G    6%  /gluster_bricks/engine
/dev/mapper/rhel_rhsqa--grafton9-home                    xfs             876G    34M   876G    1%  /home
/dev/sda1                                                xfs            1014M   200M   815M   20%  /boot
rhsqa-grafton7.lab.eng.blr.redhat.com:/engine            fuse.glusterfs  100G   6.9G    94G    7%  /rhev/data-center/mnt/glusterSD/rhsqa-grafton7.lab.eng.blr.redhat.com:_engine
rhsqa-grafton7.lab.eng.blr.redhat.com:/commvault_server  fuse.glusterfs  500G    69G   432G   14%  /rhev/data-center/mnt/glusterSD/rhsqa-grafton7.lab.eng.blr.redhat.com:_commvault__server
tmpfs                                                    tmpfs            26G      0    26G    0%  /run/user/0
--- Additional comment from bipin on 2019-10-14 19:21:45 IST ---
While the re-balance is in progress:
====================================
Node                                   Rebalanced-files  size     scanned  failures  skipped  status       run time in h:m:s
-------------------------------------  ----------------  -------  -------  --------  -------  -----------  -----------------
rhsqa-grafton9.lab.eng.blr.redhat.com  0                 0Bytes   36       0         0        in progress  0:12:12
rhsqa-grafton8.lab.eng.blr.redhat.com  0                 0Bytes   36       0         0        in progress  0:12:12
localhost                              4                 100.8GB  20       0         0        in progress  0:12:12
Estimated time left for rebalance to complete : 0:18:57

Node                                   Rebalanced-files  size     scanned  failures  skipped  status       run time in h:m:s
-------------------------------------  ----------------  -------  -------  --------  -------  -----------  -----------------
rhsqa-grafton9.lab.eng.blr.redhat.com  1                 28.7GB   36       0         0        in progress  0:13:13
rhsqa-grafton8.lab.eng.blr.redhat.com  0                 0Bytes   36       0         0        in progress  0:13:13
localhost                              4                 100.8GB  20       0         0        in progress  0:13:13
Estimated time left for rebalance to complete : 0:18:26

Node                                   Rebalanced-files  size     scanned  failures  skipped  status       run time in h:m:s
-------------------------------------  ----------------  -------  -------  --------  -------  -----------  -----------------
rhsqa-grafton9.lab.eng.blr.redhat.com  3                 86.6GB   36       0         0        in progress  0:17:30
rhsqa-grafton8.lab.eng.blr.redhat.com  2                 57.8GB   36       0         0        in progress  0:17:30
localhost                              4                 100.8GB  20       0         0        in progress  0:17:30
Estimated time left for rebalance to complete : 0:19:45
After re-balance is completed:
==============================
Node                                   Rebalanced-files  size     scanned  failures  skipped  status     run time in h:m:s
-------------------------------------  ----------------  -------  -------  --------  -------  ---------  -----------------
rhsqa-grafton9.lab.eng.blr.redhat.com  7                 200.9GB  36       0         0        completed  0:24:27
rhsqa-grafton8.lab.eng.blr.redhat.com  3                 127.9GB  36       0         0        completed  0:27:18
localhost                              6                 198.1GB  20       0         0        completed  0:26:32
After 4-6 hours it shows incorrect data size:
=============================================
[root@rhsqa-grafton7-nic2 tmp]# gluster volume rebalance backupvol status
Node                                   Rebalanced-files  size       scanned  failures  skipped  status     run time in h:m:s
-------------------------------------  ----------------  ---------  -------  --------  -------  ---------  -----------------
rhsqa-grafton9.lab.eng.blr.redhat.com  7                 200.9GB    36       0         0        completed  0:24:27
rhsqa-grafton8.lab.eng.blr.redhat.com  3                 16384.0PB  36       0         0        completed  0:27:18
localhost                              6                 16384.0PB  20       0         0        completed  0:26:32
Rebalance status:
=================
Node                                   Rebalanced-files  size     scanned  failures  skipped  status     run time in h:m:s
-------------------------------------  ----------------  -------  -------  --------  -------  ---------  -----------------
rhsqa-grafton9.lab.eng.blr.redhat.com  7                 200.9GB  36       0         0        completed  0:24:27
rhsqa-grafton8.lab.eng.blr.redhat.com  3                 127.9GB  36       0         0        completed  0:27:18
localhost                              6                 198.1GB  20       0         0        completed  0:26:32
The file sizes in the rebalance logs below match the (correct) rebalance status above.
Rebalance log for grafton7:
==========================
[2019-10-14 04:50:42.971557] I [MSGID: 109028]
[dht-rebalance.c:5059:gf_defrag_status_get] 0-backupvol-dht: Rebalance is
completed. Time taken is 1467.00 secs
[2019-10-14 04:50:42.971586] I [MSGID: 109028]
[dht-rebalance.c:5065:gf_defrag_status_get] 0-backupvol-dht: Files migrated: 7,
size: 215754080256, lookups: 36, failures: 0, skipped: 0
[2019-10-14 04:50:42.973059] W [glusterfsd.c:1560:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7ea5) [0x7f0e936d1ea5]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5592b03111f5]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x5592b031105b] ) 0-: received
signum (15), shutting down
Rebalance log for grafton8:
===========================
[2019-10-14 04:53:27.016118] I [dht-rebalance.c:4991:gf_defrag_status_get]
0-glusterfs: TIME: Estimated total time to complete (size)= 2755 seconds,
seconds left = 1123
[2019-10-14 04:53:27.016147] I [MSGID: 109028]
[dht-rebalance.c:5059:gf_defrag_status_get] 0-glusterfs: Rebalance is in
progress. Time taken is 1632.00 secs
[2019-10-14 04:53:27.016159] I [MSGID: 109028]
[dht-rebalance.c:5065:gf_defrag_status_get] 0-glusterfs: Files migrated: 2,
size: 62050402304, lookups: 36, failures: 0, skipped: 0
[2019-10-14 04:53:33.285796] I [MSGID: 109022]
[dht-rebalance.c:2233:dht_migrate_file] 0-backupvol-dht: completed migration of
/f.1.0 from subvolume backupvol-disperse-1 to backupvol-disperse-0
[2019-10-14 04:53:33.288968] I [MSGID: 109028]
[dht-rebalance.c:5059:gf_defrag_status_get] 0-backupvol-dht: Rebalance is
completed. Time taken is 1638.00 secs
[2019-10-14 04:53:33.288990] I [MSGID: 109028]
[dht-rebalance.c:5065:gf_defrag_status_get] 0-backupvol-dht: Files migrated: 3,
size: 137363980288, lookups: 36, failures: 0, skipped: 0
Rebalance log for grafton9:
==========================
[2019-10-14 04:50:42.968102] W [MSGID: 114031]
[client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-backupvol-client-3: remote
operation failed. Path: /test/f.3.0 (52789828-ad20-480d-b936-648b531a315a) [No
such file or directory]
[2019-10-14 04:50:42.971557] I [MSGID: 109028]
[dht-rebalance.c:5059:gf_defrag_status_get] 0-backupvol-dht: Rebalance is
completed. Time taken is 1467.00 secs
[2019-10-14 04:50:42.971586] I [MSGID: 109028]
[dht-rebalance.c:5065:gf_defrag_status_get] 0-backupvol-dht: Files migrated: 7,
size: 215754080256, lookups: 36, failures: 0, skipped: 0
--- Additional comment from bipin on 2019-10-14 19:24:42 IST ---
Client snippet:
==============
[root@dhcp43-65 backupvol_test]# df -Th
Filesystem                                             Type            Size   Used  Avail  Use%  Mounted on
/dev/mapper/rhel_dhcp43--65-root                       xfs              44G   1.5G    43G    4%  /
devtmpfs                                               devtmpfs        3.9G      0   3.9G    0%  /dev
tmpfs                                                  tmpfs           3.9G      0   3.9G    0%  /dev/shm
tmpfs                                                  tmpfs           3.9G   8.5M   3.9G    1%  /run
tmpfs                                                  tmpfs           3.9G      0   3.9G    0%  /sys/fs/cgroup
/dev/sda1                                              xfs            1014M   145M   870M   15%  /boot
tmpfs                                                  tmpfs           783M      0   783M    0%  /run/user/0
rhsqa-grafton7-nic2.lab.eng.blr.redhat.com:/backupvol  fuse.glusterfs   15T   764G    14T    6%  /mnt/backupvol_test
mount -t glusterfs rhsqa-grafton7-nic2.lab.eng.blr.redhat.com:/backupvol /mnt/backupvol_test/
[root@dhcp43-65 mnt]# rpm -qa | grep glusterfs
glusterfs-libs-6.0-16.el7.x86_64
glusterfs-6.0-16.el7.x86_64
glusterfs-client-xlators-6.0-16.el7.x86_64
glusterfs-fuse-6.0-16.el7.x86_64
The data was created using fio with fairly large files (just background on how
the data was generated); each file was about 6-8GB in size.
[root@rhsqa-grafton7-nic2 b2]# ls
f.0.0 f.1.0 f.2.0 f.3.0 f.4.0 f.6.0 test
[root@rhsqa-grafton7-nic2 b2]# tree
.
├── f.0.0
├── f.1.0
├── f.2.0
├── f.3.0
├── f.4.0
├── f.6.0
└── test
├── f.5.0
├── f.7.0
├── f.8.0
└── f.9.0
1 directory, 10 files
[root@rhsqa-grafton7-nic2 b2]# ls -lstah *
7.2G -rw-------. 2 root root 7.2G Oct 14 09:54 f.6.0
18G -rw-------. 2 root root 18G Oct 14 09:54 f.1.0
7.2G -rw-------. 2 root root 7.2G Oct 14 09:54 f.0.0
7.3G -rw-------. 2 root root 7.3G Oct 14 09:54 f.3.0
7.3G -rw-------. 2 root root 7.3G Oct 14 09:54 f.2.0
7.3G -rw-------. 2 root root 7.3G Oct 14 09:54 f.4.0
test:
total 27G
0 drwxr-xr-x. 2 root root 58 Oct 14 10:22 .
8.0K drwxr-xr-x. 4 root root 8.0K Oct 14 10:21 ..
7.0G -rw-------. 2 root root 7.0G Oct 10 10:44 f.7.0
7.0G -rw-------. 2 root root 7.0G Oct 10 10:44 f.9.0
6.2G -rw-------. 2 root root 6.2G Oct 10 10:44 f.5.0
6.9G -rw-------. 2 root root 6.9G Oct 10 10:44 f.8.0
--- Additional comment from bipin on 2019-10-14 19:39:22 IST ---
Logs @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1761486/
--- Additional comment from Susant Kumar Palai on 2019-10-15 15:35:37 IST ---
Reproducer:
- Create a single-brick volume.
- Create 10 files, each of size 1GB.
- Add 5 bricks.
- Trigger rebalance: gluster v rebalance VOLNAME start force
- Most likely the rebalance status will show a few files getting migrated.
- Kill and start the glusterd process.
- The size information will now show a random overflowing value.
[root@vm3 mnt]# gluster v rebalance test2 status
Node       Rebalanced-files  size       scanned  failures  skipped  status     run time in h:m:s
---------  ----------------  ---------  -------  --------  -------  ---------  -----------------
localhost  624               2.8GB      712      11        0        completed  0:00:56
volume rebalance: test2: success
[root@vm3 mnt]# pkill gluster
[root@vm3 mnt]# glusterd
[root@vm3 mnt]# gluster v rebalance test2 status
Node       Rebalanced-files  size       scanned  failures  skipped  status     run time in h:m:s
---------  ----------------  ---------  -------  --------  -------  ---------  -----------------
localhost  624               16384.0PB  712      11        0        completed  0:00:56
volume rebalance: test2: success
[root@vm3 mnt]#
I am not entirely familiar with the glusterd code, but I guess the size data is
not fetched from the glusterd database, and that is the reason it could be
printing uninitialized data. (I could be wrong.)
fprintf(fp, "Volume%d.rebalance.id: %s\n", count, id_str);
fprintf(fp, "Volume%d.rebalance.status: %s\n", count, rebal_status_str);
fprintf(fp, "Volume%d.rebalance.failures: %" PRIu64 "\n", count,
        volinfo->rebal.rebalance_failures);
fprintf(fp, "Volume%d.rebalance.skipped: %" PRIu64 "\n", count,
        volinfo->rebal.skipped_files);
fprintf(fp, "Volume%d.rebalance.lookedup: %" PRIu64 "\n", count,
        volinfo->rebal.lookedup_files);
fprintf(fp, "Volume%d.rebalance.files: %" PRIu64 "\n", count,
        volinfo->rebal.rebalance_files);
fprintf(fp, "Volume%d.rebalance.data: %s\n", count, rebal_data);
fprintf(fp, "Volume%d.time_left: %" PRIu64 "\n", count,
        volinfo->rebal.time_left);
Moving this to Glusterd Component for a quick resolution.
--- Additional comment from Sanju on 2019-10-15 16:21:52 IST ---
Bipin/Susant,
Is the rebalance status showing incorrect values after a glusterd restart, or
simply 4-6 hours after the rebalance is complete?
Thanks,
Sanju
--- Additional comment from bipin on 2019-10-15 16:57:01 IST ---
Sanju,
In my test I never restarted glusterd. The issue was seen some hours after the
rebalance had completed successfully.
--- Additional comment from Susant Kumar Palai on 2019-10-15 17:36:38 IST ---
I restarted glusterd and it immediately showed wrong size info as mentioned in
the previous comment.
--- Additional comment from Sayalee on 2019-10-16 11:55:50 IST ---
Performed the below steps on the 3.4.4 (3.12.2-47.5) build:
1) Create a 1x(4+2) volume.
2) Mount the volume using FUSE.
3) From the mount point, create 10 files of 10G each and wait for the I/O to complete.
4) Add bricks to the volume to make it 2x(4+2).
5) Trigger rebalance and wait for it to complete.
6) Check the rebalance status once it is complete.
7) Wait for 4-5 hours and check the rebalance status again.
Observations:
============
Tue Oct 15 17:56:20 IST 2019
Node         Rebalanced-files  size    scanned  failures  skipped  status       run time in h:m:s
-----------  ----------------  ------  -------  --------  -------  -----------  -----------------
10.70.35.95  0                 0Bytes  10       0         0        in progress  0:05:33
10.70.35.47  0                 0Bytes  10       0         0        in progress  0:05:33
localhost    0                 0Bytes  10       0         0        completed    0:00:00
The estimated time for rebalance to complete will be unavailable for the first 10 minutes.
volume rebalance: say1: success
Tue Oct 15 18:06:21 IST 2019
Node         Rebalanced-files  size    scanned  failures  skipped  status     run time in h:m:s
-----------  ----------------  ------  -------  --------  -------  ---------  -----------------
10.70.35.95  3                 30.0GB  10       0         0        completed  0:15:19
10.70.35.47  1                 10.0GB  10       0         0        completed  0:13:41
localhost    0                 0Bytes  10       0         0        completed  0:00:00
volume rebalance: say1: success
***Checked the status after some hours***
Tue Oct 15 23:46:27 IST 2019
Node         Rebalanced-files  size    scanned  failures  skipped  status     run time in h:m:s
-----------  ----------------  ------  -------  --------  -------  ---------  -----------------
10.70.35.95  3                 30.0GB  10       0         0        completed  0:15:19
10.70.35.47  1                 10.0GB  10       0         0        completed  0:13:41
localhost    0                 0Bytes  10       0         0        completed  0:00:00
volume rebalance: say1: success
Wed Oct 16 10:26:36 IST 2019
Node         Rebalanced-files  size    scanned  failures  skipped  status     run time in h:m:s
-----------  ----------------  ------  -------  --------  -------  ---------  -----------------
10.70.35.95  3                 30.0GB  10       0         0        completed  0:15:19
10.70.35.47  1                 10.0GB  10       0         0        completed  0:13:41
localhost    0                 0Bytes  10       0         0        completed  0:00:00
volume rebalance: say1: success
====> Did not hit the issue.
--- Additional comment from Sayalee on 2019-10-16 12:02:36 IST ---
Also tried out the reproducer steps as in Comment 6 on the 3.4.4 (3.12.2-47.5) build:
Observations:
============
[root@dhcp35-125 home]# gluster v rebal say2 status
Node         Rebalanced-files  size    scanned  failures  skipped  status     run time in h:m:s
-----------  ----------------  ------  -------  --------  -------  ---------  -----------------
10.70.35.95  7                 7.0GB   10       0         0        completed  0:01:04
10.70.35.47  0                 0Bytes  1        0         0        completed  0:00:00
volume rebalance: say2: success
[root@dhcp35-125 home]#
[root@dhcp35-125 home]# pkill gluster   <=== (NOTE: The glusterd process was killed and started on all the nodes that were part of the volume, but the observation from just one node is provided here)
[root@dhcp35-125 home]# glusterd
[root@dhcp35-125 home]# pidof glusterd
2001
[root@dhcp35-125 home]# gluster v rebal say2 status
Node         Rebalanced-files  size       scanned  failures  skipped  status     run time in h:m:s
-----------  ----------------  ---------  -------  --------  -------  ---------  -----------------
10.70.35.95  7                 16384.0PB  10       0         0        completed  0:01:04
10.70.35.47  0                 0Bytes     1        0         0        completed  0:00:00
volume rebalance: say2: success
====> The issue was hit with the steps in Comment 6
--- Additional comment from Sanju on 2019-10-16 15:28:22 IST ---
Looks like this is a day-1 issue.
From gdb:
glusterd_store_retrieve_node_state (volinfo=volinfo@entry=0x555555e67fc0) at glusterd-store.c:2980
2980 volinfo->rebal.rebalance_data = atoi(value);
4: value = 0x5555557dfea0 "3145728000"
3: key = 0x5555557e41d0 "size"
2: volinfo->rebal.rebalance_data = 0
1: volinfo->volname = "test1", '\000' <repeats 250 times>
(gdb) n
3048 GF_FREE(key);
4: value = 0x5555557dfea0 "3145728000"
3: key = 0x5555557e41d0 "size"
2: volinfo->rebal.rebalance_data = 18446744072560312320
1: volinfo->volname = "test1", '\000' <repeats 250 times>
(gdb) p atoi(value)
$20 = -1149239296
(gdb) p/u atoi(value)
$21 = 3145728000
The issue here is that atoi() returns a negative value because of overflow.
The gdb statements below prove it.
(gdb) set volinfo->rebal.rebalance_data=atoi("314572800")
(gdb) p volinfo->rebal.rebalance_data
$41 = 314572800
(gdb) set volinfo->rebal.rebalance_data=atoi("3145728000")
(gdb) p volinfo->rebal.rebalance_data
$42 = 18446744072560312320 <-- wrong value
(gdb)
Will send out the patch to fix this.
Thanks,
Sanju
--- Additional comment from Sanju on 2019-10-16 16:54:37 IST ---
Even after using a suffix for unsigned int, I see the same wrong value:
(gdb) set volinfo->rebal.rebalance_data = atoi("3145728000")
(gdb) p/u volinfo->rebal.rebalance_data
$57 = 18446744072560312320
(gdb)
RCA:
When we restart glusterd, glusterd fetches information from /var/lib/glusterd/
and restores it in its memory. In the process of restoring, glusterd fetches
contents of /var/lib/glusterd/vols/volname/node_state.info where rebalance
related information is stored. Once the control reaches
glusterd_store_retrieve_node_state() it fetches all the key-value pairs one by
one. size is one such key. In above gdb output, value of size is 3145728000 and
it is assigned to volinfo->rebal.rebalance_data after converting it into an
integer.
volinfo->rebal.rebalance_data = atoi(value);
After this assignment, volinfo->rebal.rebalance_data is
18446744072560312320. So volinfo holds an incorrect size in its memory, and
/var/lib/glusterd/vols/test1/node_state.info is also updated with this
wrong information. This value is shown as 16384.0PB in rebalance status output.
[root@dhcp35-84 ~]# cat /var/lib/glusterd/vols/test1/node_state.info
rebalance_status=1
status=3
rebalance_op=19
rebalance-id=47c58350-7fc1-4bbf-96aa-01ca7ec0c4e3
rebalanced-files=2
size=18446744072560312320
scanned=6
failures=0
skipped=1
run-time=318.000000
[root@dhcp35-84 ~]#
Thanks,
Sanju
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1761486
[Bug 1761486] DHT- gluster rebalance status shows wrong data size after
rebalance is completed successfully