[Gluster-users] Fwd: Replica brick not working

Miloš Čučulović - MDPI cuculovic at mdpi.com
Mon Dec 19 09:22:01 UTC 2016


All is fixed now, thank you very much.

I had to completely purge all glusterfs files from the storage2 server.
The entire procedure is as follows (a consolidated command summary is given after the steps):

How to add a new replica brick to an existing volume

1. Purge all glusterfs packages on the new server.
2. Remove everything from /var/log/glusterfs and /var/lib/glusterd.
3. Install a fresh glusterfs-server package.
4. On the new server, start the glusterfs-server service.
5. On the old server, run: gluster peer probe storage2
6. On both servers, check: gluster peer status
7. If both report 'Peer in Cluster (Connected)', you are good to proceed.
8. On the old server, run: sudo gluster volume add-brick storage replica 2 storage2:/data/data-cluster
9. The command should return: volume add-brick: success
10. Healing starts automatically; check its status with: gluster volume heal storage info
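
For reference, here is the same procedure as one command sequence. This is only a sketch, assuming a Debian/Ubuntu-style apt installation, the hostnames storage (old server) and storage2 (new server), and the volume name storage with the brick path /data/data-cluster used above; adjust the package and service commands to your distribution.

   # On the new server (storage2): remove any previous glusterfs state
   sudo apt-get purge glusterfs-server glusterfs-client glusterfs-common
   sudo rm -rf /var/log/glusterfs /var/lib/glusterd

   # On the new server (storage2): reinstall and start the daemon
   sudo apt-get install glusterfs-server
   sudo service glusterfs-server start

   # On the old server (storage): probe the new peer and verify it joined
   sudo gluster peer probe storage2
   sudo gluster peer status      # expect 'Peer in Cluster (Connected)' on both nodes

   # On the old server (storage): add the new brick as the second replica
   sudo gluster volume add-brick storage replica 2 storage2:/data/data-cluster

   # Healing starts automatically; watch it until no entries are left
   sudo gluster volume heal storage info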

- Kindest regards,

Milos Cuculovic
IT Manager

---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculovic at mdpi.com
Skype: milos.cuculovic.mdpi

On 08.12.2016 18:58, Pranith Kumar Karampuri wrote:
>
>
> On Thu, Dec 8, 2016 at 11:25 PM, Pranith Kumar Karampuri
> <pkarampu at redhat.com> wrote:
>
>
>
>     On Thu, Dec 8, 2016 at 11:17 PM, Pranith Kumar Karampuri
>     <pkarampu at redhat.com> wrote:
>
>
>
>         On Thu, Dec 8, 2016 at 10:22 PM, Ravishankar N
>         <ravishankar at redhat.com> wrote:
>
>             On 12/08/2016 09:44 PM, Miloš Čučulović - MDPI wrote:
>
>                 I was able to fix the sync by rsync-ing all the
>                 directories; then the heal started. The next problem :)
>                 is that as soon as there are files on the new brick,
>                 the gluster mount also serves reads from it, but the
>                 new brick is not ready yet since the sync is not done,
>                 so clients see missing files. I temporarily removed the
>                 new brick, am now running a manual rsync, and will add
>                 the brick again; I hope this will work.
>
>                 What mechanism manages this? I guess there is something
>                 built in to make a replica brick available only once
>                 its data is completely synced.
>
>             This mechanism was introduced in 3.7.9 or 3.7.10
>             (http://review.gluster.org/#/c/13806/). Before that
>             version, you needed to manually set some xattrs on the
>             bricks so that healing could happen in parallel while the
>             client still served reads from the original brick. I
>             can't find the link to the doc which describes these steps
>             for setting the xattrs. :-(
>
>
>         https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
>
>
>     Oh this is addition of bricks?
>     Just do the following:
>     1) Bring the new brick down by killing it.
>     2) On the root of the mount directory (let's call it /mnt), do:
>
>     mkdir /mnt/<name-of-nonexistent-dir>
>     rmdir /mnt/<name-of-nonexistent-dir>
>     setfattr -n trusted.non-existent-key -v abc /mnt
>     setfattr -x trusted.non-existent-key  /mnt
>
>     3) Start the volume using: "gluster volume start <volname> force"
>
>     This will trigger the heal which will make sure everything is healed
>     and the application will only see the correct data.
>
>     Since you did an explicit rsync, there is no guarantee that things
>     will work as expected. We will be adding the steps above to the
>     documentation.
>
>
> Please note that you need to do these steps exactly. If you instead do
> the mkdir/rmdir/setfattr steps after bringing down the good brick, a
> reverse heal will happen and the data will be removed.
>
>
>
>
>
>             Calling it a day,
>             Ravi
>
>
>                 - Kindest regards,
>
>                 Milos Cuculovic
>                 IT Manager
>
>                 ---
>                 MDPI AG
>                 Postfach, CH-4020 Basel, Switzerland
>                 Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>                 Tel. +41 61 683 77 35
>                 Fax +41 61 302 89 18
>                 Email: cuculovic at mdpi.com
>                 Skype: milos.cuculovic.mdpi
>
>                 On 08.12.2016 16:17, Ravishankar N wrote:
>
>                     On 12/08/2016 06:53 PM, Atin Mukherjee wrote:
>
>
>
>                         On Thu, Dec 8, 2016 at 6:44 PM, Miloš Čučulović
>                         - MDPI <cuculovic at mdpi.com> wrote:
>
>                             Ah, damn! I found the issue. On the storage
>                         server, the storage2 IP address was wrong; I had
>                         swapped two digits in the /etc/hosts file, sorry
>                         for that :(
>
>                             I was able to add the brick now and started
>                         the heal, but there is still no data transfer
>                         visible.
>
>                     1. Are the files getting created on the new brick
>                     though?
>                     2. Can you provide the output of `getfattr -d -m .
>                     -e hex /data/data-cluster` on both bricks?
>                     3. Is it possible to attach gdb to the self-heal
>                     daemon on the original (old) brick and get a
>                     backtrace?
>                         gdb -p <pid of self-heal daemon on the original brick>
>                         thread apply all bt    --> share this output
>                         quit gdb
>
>
>                     -Ravi
>
>
>                         @Ravi/Pranith - can you help here?
>
>
>
>                             By running gluster volume status, I get:
>
>                             Status of volume: storage
>                             Gluster process                      TCP Port  RDMA Port  Online  Pid
>                             ------------------------------------------------------------------------------
>                             Brick storage2:/data/data-cluster    49152     0          Y       23101
>                             Brick storage:/data/data-cluster     49152     0          Y       30773
>                             Self-heal Daemon on localhost        N/A       N/A        Y       30050
>                             Self-heal Daemon on storage          N/A       N/A        Y       30792
>
>
>                             Any idea?
>
>                             On storage I have:
>                             Number of Peers: 1
>
>                             Hostname: 195.65.194.217
>                             Uuid: 7c988af2-9f76-4843-8e6f-d94866d57bb0
>                             State: Peer in Cluster (Connected)
>
>
>                             - Kindest regards,
>
>                             Milos Cuculovic
>                             IT Manager
>
>                             ---
>                             MDPI AG
>                             Postfach, CH-4020 Basel, Switzerland
>                             Office: St. Alban-Anlage 66, 4052 Basel,
>                         Switzerland
>                             Tel. +41 61 683 77 35
>                             Fax +41 61 302 89 18
>                             Email: cuculovic at mdpi.com
>                             Skype: milos.cuculovic.mdpi
>
>                             On 08.12.2016 13:55, Atin Mukherjee wrote:
>
>                                 Can you resend the attachment as zip? I
>                                 am unable to extract the content. We
>                                 shouldn't have a 0-byte info file. What
>                                 does gluster peer status output say?
>
>                                 On Thu, Dec 8, 2016 at 4:51 PM, Miloš
>                                 Čučulović - MDPI <cuculovic at mdpi.com>
>                                 wrote:
>
>                                     I hope you received my last email
>                         Atin, thank you!
>
>                                     - Kindest regards,
>
>                                     Milos Cuculovic
>                                     IT Manager
>
>                                     ---
>                                     MDPI AG
>                                     Postfach, CH-4020 Basel, Switzerland
>                                     Office: St. Alban-Anlage 66, 4052
>                         Basel, Switzerland
>                                     Tel. +41 61 683 77 35
>                                     Fax +41 61 302 89 18
>                                     Email: cuculovic at mdpi.com
>                                     Skype: milos.cuculovic.mdpi
>
>                                     On 08.12.2016 10:28, Atin Mukherjee
>                         wrote:
>
>
>                                         ---------- Forwarded message ----------
>                                         From: Atin Mukherjee <amukherj at redhat.com>
>                                         Date: Thu, Dec 8, 2016 at 11:56 AM
>                                         Subject: Re: [Gluster-users] Replica brick not working
>                                         To: Ravishankar N <ravishankar at redhat.com>
>                                         Cc: Miloš Čučulović - MDPI <cuculovic at mdpi.com>,
>                                             Pranith Kumar Karampuri <pkarampu at redhat.com>,
>                                             gluster-users <gluster-users at gluster.org>
>
>
>
>
>                                         On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N
>                                         <ravishankar at redhat.com> wrote:
>
>                                             On 12/08/2016 10:43 AM, Atin
>                         Mukherjee wrote:
>
>                                                 From the log snippet:
>
>                                                 [2016-12-07 09:15:35.677645] I [MSGID: 106482] [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management: Received add brick req
>                                                 [2016-12-07 09:15:35.677708] I [MSGID: 106062] [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management: replica-count is 2
>                                                 [2016-12-07 09:15:35.677735] E [MSGID: 106291] [glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:
>
>                                                 The last log entry indicates that we hit
>                                                 the code path in
>                                                 gd_addbr_validate_replica_count ():
>
>                                                     if (replica_count == volinfo->replica_count) {
>                                                             if (!(total_bricks % volinfo->dist_leaf_count)) {
>                                                                     ret = 1;
>                                                                     goto out;
>                                                             }
>                                                     }
>
>
>                                             It seems unlikely that this snippet was hit,
>                                             because we print the E [MSGID: 106291] in the
>                                             above message only if ret == -1.
>                                             gd_addbr_validate_replica_count() returns -1
>                                             and yet does not populate err_str only when
>                                             volinfo->type doesn't match any of the known
>                                             volume types, so perhaps volinfo->type is
>                                             corrupted?
>
>
>                                         You are right, I missed that ret
>                         is set to 1 here in
>                                 the above
>                                         snippet.
>
>                                         @Milos - Can you please provide us the volume
>                                         info file from /var/lib/glusterd/vols/<volname>/
>                                         from all three nodes so we can continue the
>                                         analysis?
>
>
>
>                                             -Ravi
>
>                                                 @Pranith, Ravi - Milos was trying to
>                                                 convert a dist (1 X 1) volume to a
>                                                 replicate (1 X 2) volume using add-brick
>                                                 and hit this issue where the add-brick
>                                                 failed. The cluster is running 3.7.6.
>                                                 Could you help with figuring out in which
>                                                 scenario this code path can be hit? One
>                                                 straightforward issue I see here is the
>                                                 missing err_str in this path.
>
>
>
>
>
>
>                                         --
>
>                                         ~ Atin (atinm)
>
>
>
>
>
>
>
>
>
>
>
> --
> Pranith


More information about the Gluster-users mailing list