[Bugs] [Bug 1335721] glusterd can't startup while volumes configuration file corrupt

Fri May 13 05:54:11 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1335721

Atin Mukherjee <amukherj at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |CLOSED
                 CC|                            |amukherj at redhat.com
         Resolution|---                         |NOTABUG
        Last Closed|                            |2016-05-13 01:54:11

--- Comment #1 from Atin Mukherjee <amukherj at redhat.com> ---
(In reply to George from comment #0)
> Description of problem:
> 
> "glusterd" can't start up when the volume configuration files are corrupted,
> which might be caused by software faults, hardware faults or unclean reboots.
> 
> 
> Version-Release number of selected component (if applicable):
> 
> 
> How reproducible:
> Corruption of the configuration files is not easy to reproduce naturally, but
> it can be simulated manually.
> 
> Steps to Reproduce:
> 1. Set up two hosts, hostA and hostB, running glusterd with a replicated volume
> 2. Stop all glusterfs processes on hostA
> 3. rm -rf $workdir/volume_example/info
Manual alteration of the configuration files is strictly disallowed. You'd
need to come up with a proven case where the files can be corrupted. One
example could be a disk-full scenario, but at the same time it's recommended
that the disk usage of the partition where /var/lib/glusterd resides be
carefully monitored.

Given this reason, I don't think it's a bug, so I'm closing it. Please feel
free to reopen if you can come up with some other scenarios.

> 4. start the glusterd process
> 
> 
> Actual results:
> glusterd fails to start and logs the error:
> "Unable to restore volume:volume_example"
> 
> Expected results:
> glusterd should be able to start up, wait for hostB to start normally, and
> then get the configuration data from hostB
> 
> Additional info:
> In some testing with the code changes below, it seems to work.
> 
> 
> 
> int32_t
> glusterd_restore ()  
> {
>         int32_t         ret = -1;
>         xlator_t        *this = NULL;
> 
>         this = THIS;
> 
>         ret = glusterd_restore_op_version (this);
>         if (ret) {
>                 gf_log (this->name, GF_LOG_ERROR,
>                         "Failed to restore op_version");
>                 goto out;
>         }
> 
>         ret = glusterd_store_retrieve_volumes (this, NULL);
>         if (ret)
>                 goto out;
> 
>         ret = glusterd_store_retrieve_peers (this);
>         if (ret)
>                 goto out;
> 
>         /* While retrieving snapshots, if the snapshot status
>            is not GD_SNAP_STATUS_IN_USE, then the snapshot is
>            cleaned up. To do that, the snap volume has to be
>            stopped by stopping snapshot volume's bricks. And for
>            that the snapshot bricks should be resolved. But without
>            retrieving the peers, resolving bricks will fail. So
>            do retrieving of snapshots after retrieving peers.
>         */
>         ret = glusterd_store_retrieve_snaps (this);
> /*
>         if (ret)
>                 goto out;
> */
> 
>         ret = glusterd_resolve_all_bricks (this);
> /*
>         if (ret)
>                 goto out;
> */
> 
>         ret = glusterd_snap_cleanup (this);
>         if (ret) {
>                 gf_log (this->name, GF_LOG_ERROR, "Failed to perform "
>                         "a cleanup of the snapshots");
>                 goto out;
>         }
> 
>         ret = glusterd_recreate_all_snap_brick_mounts (this);
>         if (ret) {
>                 gf_log (this->name, GF_LOG_ERROR, "Failed to recreate "
>                         "all snap brick mounts");
>                 goto out;
>         }
