[Bugs] [Bug 1632889] 'df' shows half as much space on volume after upgrade to RHGS 3.4

Tue Sep 25 18:47:27 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1632889

Sanju <srakonde at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Comment #0 is|1                           |0
            private|                            |

--- Comment #1 from Sanju <srakonde at redhat.com> ---
Comment from Atin Mukherjee:

More detailed RCA:

https://review.gluster.org/#/c/glusterfs/+/19484/2/xlators/mgmt/glusterd/src/glusterd-store.c
ensured that when a cluster is upgraded with volumes that did not have the
shared-brick-count feature, the upgrade is handled and this field is populated
while volumes and bricks are restored during the initialization phase of
glusterd. However, this patch assumed that brickinfo->uuid would already be
populated by the time control reached this flow, which was a wrong
assumption. The brick's uuid is first restored from the store (where it is
available by default for volumes created with glusterfs-3.12.2 onwards),
but because the volume was carried forward from glusterfs-3.8.4 or earlier
versions, the brickinfo file didn't have a uuid. As a result, statfs_fsid of
every brickinfo stayed at zero even after this patch, which broke the
gd_set_shared_brick_count () logic:

static void                                                                     
gd_set_shared_brick_count(glusterd_volinfo_t *volinfo)                          
{                                                                               
    glusterd_brickinfo_t *brickinfo = NULL;                                     
    glusterd_brickinfo_t *trav = NULL;                                          

    cds_list_for_each_entry(brickinfo, &volinfo->bricks, brick_list)            
    {                                                                           
        if (gf_uuid_compare(brickinfo->uuid, MY_UUID))                          
            continue;                                                           
        brickinfo->fs_share_count = 0;                                          
        cds_list_for_each_entry(trav, &volinfo->bricks, brick_list)             
        {                                                                       
            if (!gf_uuid_compare(trav->uuid, MY_UUID) &&                        
                (trav->statfs_fsid == brickinfo->statfs_fsid)) {
                /* <=== brickinfo->statfs_fsid is always zero here, so
                 * fs_share_count ends up equal to the number of bricks
                 * this node hosts for this volume. */
                brickinfo->fs_share_count++;                                    
            }                                                                   
        }                                                                       
    }                                                                           

    return;                                                                     
} 

If fs_share_count is set to n instead of 1, df reports x/n as the size,
where x is the actual disk space that df should report.
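
To make the arithmetic concrete, here is a minimal standalone sketch of the
same counting rule. It is not glusterd code: brick_t, is_local,
set_shared_brick_count() and reported_size() are hypothetical stand-ins, an
array replaces the cds list, and the division is only a model of the x/n
behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for glusterd_brickinfo_t. */
typedef struct {
    int is_local;         /* models !gf_uuid_compare(uuid, MY_UUID) */
    uint64_t statfs_fsid; /* 0 models the value that was never populated */
    int fs_share_count;
} brick_t;

/* Same counting rule as gd_set_shared_brick_count(), over a plain array. */
static void set_shared_brick_count(brick_t *bricks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!bricks[i].is_local)
            continue;
        bricks[i].fs_share_count = 0;
        for (size_t j = 0; j < n; j++) {
            /* Every local brick with the same fsid counts as a sharer,
             * including the brick itself. */
            if (bricks[j].is_local &&
                bricks[j].statfs_fsid == bricks[i].statfs_fsid)
                bricks[i].fs_share_count++;
        }
    }
}

/* Assumed model of the size division: x becomes x / fs_share_count. */
static uint64_t reported_size(uint64_t real_size, int fs_share_count)
{
    assert(fs_share_count > 0);
    return real_size / (uint64_t)fs_share_count;
}
```

With two local bricks whose statfs_fsid is 0, each brick gets
fs_share_count 2 and the modelled size is halved, which matches the bug
title. With two distinct non-zero fsids, each count is 1 and the size is
reported unchanged.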

While debugging this problem, we noticed that restarting glusterd and then
running a gluster volume set operation makes the issue go away. Restarting
glusterd ensures that brickinfo->statfs_fsid is populated correctly, because
the uuid for each brickinfo has now been persisted. Executing gluster volume
set then ensures that fs_share_count is correctly updated and that the
volfiles are regenerated and propagated back to the clients.
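
The reason both steps are needed can be sketched with a toy model. Everything
here is an assumption made for illustration: the uuid_in_store flag, the
real_fsid field and the two simulate_*() helpers do not exist in glusterd,
and every brick is treated as local to the node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one brick as seen by a single node. */
typedef struct {
    int uuid_in_store;    /* has the brick uuid been persisted yet? */
    uint64_t real_fsid;   /* fsid the brick path would actually report */
    uint64_t statfs_fsid; /* value glusterd holds in memory */
    int fs_share_count;
} brick_t;

/* Models one glusterd start: the fsid is filled in only when the uuid
 * was already in the store; the uuid is persisted afterwards. */
static void simulate_glusterd_start(brick_t *bricks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (bricks[i].uuid_in_store)
            bricks[i].statfs_fsid = bricks[i].real_fsid;
        bricks[i].uuid_in_store = 1;
    }
}

/* Models the recount that a volume set operation triggers. */
static void simulate_volume_set(brick_t *bricks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        bricks[i].fs_share_count = 0;
        for (size_t j = 0; j < n; j++) {
            if (bricks[j].statfs_fsid == bricks[i].statfs_fsid)
                bricks[i].fs_share_count++;
        }
    }
}
```

In this model the first start after the upgrade leaves statfs_fsid at zero,
so a recount yields n. A restart fixes statfs_fsid but leaves the stale
count in place, and only the following volume set brings fs_share_count
back to 1.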
