[Bugs] [Bug 1335721] New: glusterd can't startup while volumes configuration file corrupt

Fri May 13 05:45:29 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1335721

            Bug ID: 1335721
           Summary: glusterd can't startup while volumes configuration
                    file corrupt
           Product: GlusterFS
           Version: 3.6.9
         Component: glusterd
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: george.lian at nokia.com
             Group: nokia

Description of problem:

glusterd fails to start when a volume configuration file is corrupt. Such
corruption can be caused by software faults, hardware faults, or unclean
reboots.

Version-Release number of selected component (if applicable):

How reproducible:
Corruption of the configuration file is not easy to reproduce naturally, but it
can be simulated manually.

Steps to Reproduce:
1. Set up two hosts, hostA and hostB, running glusterd with a replicated volume
2. Stop all glusterfs processes on hostA
3. rm -rf $workdir/volume_example/info
4. Start the glusterd process on hostA

Actual results:
glusterd fails to start and logs the error:
"Unable to restore volume:volume_example"

Expected results:
glusterd should still start up, wait for hostB to come up normally, and then
fetch the configuration data from hostB.
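
The expected behaviour can be illustrated with a small standalone sketch. It is
not glusterd code: struct volume_entry, load_volume_info and
restore_volumes_tolerant are hypothetical names invented for this example, and
the actual fetch from hostB is left out. The idea is only that a volume whose
local configuration cannot be read is flagged for a later sync from a peer,
rather than aborting the whole startup.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical per-volume record; not the real glusterd volume structure. */
struct volume_entry {
        const char *name;
        int         info_ok;     /* 1 if the local info file parsed cleanly */
        int         needs_sync;  /* set when the config must come from a peer */
};

/* Hypothetical loader: stands in for reading the volume's info file. */
static int
load_volume_info (const struct volume_entry *vol)
{
        return vol->info_ok ? 0 : -1;
}

/* Restore every volume. A volume that fails to load is flagged for a
   later sync from a peer instead of aborting the whole startup.
   Returns the number of volumes that were flagged. */
int
restore_volumes_tolerant (struct volume_entry *vols, size_t count)
{
        size_t i = 0;
        int    flagged = 0;

        for (i = 0; i < count; i++) {
                vols[i].needs_sync = 0;
                if (load_volume_info (&vols[i]) != 0) {
                        fprintf (stderr, "Unable to restore volume: %s; "
                                 "waiting for a peer copy\n", vols[i].name);
                        vols[i].needs_sync = 1;
                        flagged++;
                }
        }

        return flagged;
}
```

A caller would start normally even when the return value is non-zero and
retry the flagged volumes once a peer such as hostB is reachable.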

Additional info:
In some testing, the code changes below (the two commented-out error checks in
glusterd_restore) appear to work.

int32_t
glusterd_restore ()  
{
        int32_t         ret = -1;
        xlator_t        *this = NULL;

        this = THIS;

        ret = glusterd_restore_op_version (this);
        if (ret) {
                gf_log (this->name, GF_LOG_ERROR,
                        "Failed to restore op_version");
                goto out;
        }

        ret = glusterd_store_retrieve_volumes (this, NULL);
        if (ret)
                goto out;

        ret = glusterd_store_retrieve_peers (this);
        if (ret)
                goto out;

        /* While retrieving snapshots, if the snapshot status
           is not GD_SNAP_STATUS_IN_USE, then the snapshot is
           cleaned up. To do that, the snap volume has to be
           stopped by stopping snapshot volume's bricks. And for
           that the snapshot bricks should be resolved. But without
           retrieving the peers, resolving bricks will fail. So
           do retrieving of snapshots after retrieving peers.
        */
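        ret = glusterd_store_retrieve_snaps (this);
        /* Check disabled for this test: a failure here no longer
           aborts the restore.
        if (ret)
                goto out;
        */

        ret = glusterd_resolve_all_bricks (this);
        /* Check disabled for this test: a failure here no longer
           aborts the restore.
        if (ret)
                goto out;
        */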

        ret = glusterd_snap_cleanup (this);
        if (ret) {
                gf_log (this->name, GF_LOG_ERROR, "Failed to perform "
                        "a cleanup of the snapshots");
                goto out;
        }

        ret = glusterd_recreate_all_snap_brick_mounts (this);
        if (ret) {
                gf_log (this->name, GF_LOG_ERROR, "Failed to recreate "
                        "all snap brick mounts");
                goto out;
        }
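
The change above in effect makes two of the restore steps non-fatal. The same
pattern can be sketched as a self-contained example. This is an illustration
under assumptions, not a proposed patch: struct restore_step,
run_restore_steps, step_ok and step_fail are hypothetical names, and unlike the
tested change this sketch logs the ignored failure instead of silently
overwriting ret.

```c
#include <stdio.h>
#include <stddef.h>

typedef int (*restore_step_fn) (void);

/* Hypothetical step table entry; glusterd itself has no such structure. */
struct restore_step {
        const char      *name;
        restore_step_fn  fn;
        int              fatal;  /* non-zero: a failure aborts the restore */
};

/* Stub steps used only to exercise the sketch. */
static int step_ok (void)   { return 0; }
static int step_fail (void) { return -1; }

/* Run the steps in order. A failing fatal step stops the sequence and
   its return code is passed up; a failing non-fatal step is logged and
   the sequence continues. Returns 0 if no fatal step failed. */
int
run_restore_steps (const struct restore_step *steps, size_t count)
{
        size_t i = 0;
        int    ret = 0;

        for (i = 0; i < count; i++) {
                ret = steps[i].fn ();
                if (ret == 0)
                        continue;

                fprintf (stderr, "restore step '%s' failed (%d)\n",
                         steps[i].name, ret);
                if (steps[i].fatal)
                        return ret;
        }

        return 0;
}
```

Marking a step as non-fatal keeps the daemon starting, but whatever that step
was supposed to restore stays missing until it is recovered some other way,
for example from a peer.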
