[Gluster-users] Fwd: Replica brick not working

Miloš Čučulović - MDPI cuculovic at mdpi.com
Thu Dec 8 15:32:51 UTC 2016


1. No, at the moment the old server's (storage2) volume is mounted on some 
other servers, so all files are created there. If I check the new brick, 
there are no files.


2. On the storage2 server (old brick):
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x0226135726f346bcb3f8cb73365ed382

On the storage server (new brick):
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x0226135726f346bcb3f8cb73365ed382
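
(Both outputs above are from `getfattr -d -m . -e hex /data/data-cluster`, 
as requested below; note that the trusted.glusterfs.volume-id and 
trusted.glusterfs.dht values match on both bricks.)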


3.
Thread 8 (Thread 0x7fad832dd700 (LWP 30057)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at 
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007fad88834f3e in __afr_shd_healer_wait () from 
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/cluster/replicate.so
#2  0x00007fad88834fad in afr_shd_healer_wait () from 
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/cluster/replicate.so
#3  0x00007fad88835aa0 in afr_shd_index_healer () from 
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/cluster/replicate.so
#4  0x00007fad8df4270a in start_thread (arg=0x7fad832dd700) at 
pthread_create.c:333
#5  0x00007fad8dc7882d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 7 (Thread 0x7fad83ade700 (LWP 30056)):
#0  0x00007fad8dc78e23 in epoll_wait () at 
../sysdeps/unix/syscall-template.S:84
#1  0x00007fad8e808a58 in ?? () from 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007fad8df4270a in start_thread (arg=0x7fad83ade700) at 
pthread_create.c:333
#3  0x00007fad8dc7882d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 6 (Thread 0x7fad894a5700 (LWP 30055)):
#0  0x00007fad8dc78e23 in epoll_wait () at 
../sysdeps/unix/syscall-template.S:84
#1  0x00007fad8e808a58 in ?? () from 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007fad8df4270a in start_thread (arg=0x7fad894a5700) at 
pthread_create.c:333
#3  0x00007fad8dc7882d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7fad8a342700 (LWP 30054)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at 
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007fad8e7ecd98 in syncenv_task () from 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007fad8e7ed970 in syncenv_processor () from 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#3  0x00007fad8df4270a in start_thread (arg=0x7fad8a342700) at 
pthread_create.c:333
#4  0x00007fad8dc7882d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7fad8ab43700 (LWP 30053)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at 
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007fad8e7ecd98 in syncenv_task () from 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007fad8e7ed970 in syncenv_processor () from 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#3  0x00007fad8df4270a in start_thread (arg=0x7fad8ab43700) at 
pthread_create.c:333
#4  0x00007fad8dc7882d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7fad8b344700 (LWP 30052)):
#0  do_sigwait (sig=0x7fad8b343e3c, set=<optimized out>) at 
../sysdeps/unix/sysv/linux/sigwait.c:64
#1  __sigwait (set=<optimized out>, sig=0x7fad8b343e3c) at 
../sysdeps/unix/sysv/linux/sigwait.c:96
#2  0x00000000004080bf in glusterfs_sigwaiter ()
#3  0x00007fad8df4270a in start_thread (arg=0x7fad8b344700) at 
pthread_create.c:333
#4  0x00007fad8dc7882d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7fad8bb45700 (LWP 30051)):
#0  0x00007fad8df4bc6d in nanosleep () at 
../sysdeps/unix/syscall-template.S:84
#1  0x00007fad8e7ca744 in gf_timer_proc () from 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007fad8df4270a in start_thread (arg=0x7fad8bb45700) at 
pthread_create.c:333
#3  0x00007fad8dc7882d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7fad8ec66780 (LWP 30050)):
#0  0x00007fad8df439dd in pthread_join (threadid=140383309420288, 
thread_return=0x0) at pthread_join.c:90
#1  0x00007fad8e808eeb in ?? () from 
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x0000000000405501 in main ()
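
(Backtrace captured with `gdb -p <pid of self-heal daemon>` followed by 
`thread apply all bt`, as requested below. All threads appear to be 
waiting: the afr_shd_index_healer thread is parked in 
pthread_cond_timedwait, so the daemon does not seem to be actively 
crawling.)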


- Kindest regards,

Milos Cuculovic
IT Manager

---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculovic at mdpi.com
Skype: milos.cuculovic.mdpi

On 08.12.2016 16:17, Ravishankar N wrote:
> On 12/08/2016 06:53 PM, Atin Mukherjee wrote:
>>
>>
>> On Thu, Dec 8, 2016 at 6:44 PM, Miloš Čučulović - MDPI
>> <cuculovic at mdpi.com> wrote:
>>
>>     Ah, damn! I found the issue. On the storage server, the storage2
>>     IP address was wrong; I had swapped two digits in the /etc/hosts
>>     file, sorry for that :(
>>
>>     I was able to add the brick now and started the heal, but still no
>>     data transfer is visible.
>>
> 1. Are the files getting created on the new brick though?
> 2. Can you provide the output of `getfattr -d -m . -e hex /data/data-cluster`
> on both bricks?
> 3. Is it possible to attach gdb to the self-heal daemon on the original
> (old) brick and get a backtrace?
>     `gdb -p <pid of self-heal daemon on the original brick>`
>     `thread apply all bt`  --> share this output
>     quit gdb.
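>     (If easier, the same output can be captured non-interactively with,
>     e.g., `gdb -p <pid> -batch -ex 'thread apply all bt'`.)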
>
>
> -Ravi
>>
>> @Ravi/Pranith - can you help here?
>>
>>
>>
>>     By running gluster volume status, I get:
>>
>>     Status of volume: storage
>>     Gluster process                       TCP Port  RDMA Port  Online  Pid
>>     ------------------------------------------------------------------------------
>>     Brick storage2:/data/data-cluster     49152     0          Y       23101
>>     Brick storage:/data/data-cluster      49152     0          Y       30773
>>     Self-heal Daemon on localhost         N/A       N/A        Y       30050
>>     Self-heal Daemon on storage           N/A       N/A        Y       30792
>>
>>
>>     Any idea?
>>
>>     On storage I have:
>>     Number of Peers: 1
>>
>>     Hostname: 195.65.194.217
>>     Uuid: 7c988af2-9f76-4843-8e6f-d94866d57bb0
>>     State: Peer in Cluster (Connected)
>>
>>
>>     - Kindest regards,
>>
>>     Milos Cuculovic
>>     IT Manager
>>
>>     ---
>>     MDPI AG
>>     Postfach, CH-4020 Basel, Switzerland
>>     Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>     Tel. +41 61 683 77 35
>>     Fax +41 61 302 89 18
>>     Email: cuculovic at mdpi.com
>>     Skype: milos.cuculovic.mdpi
>>
>>     On 08.12.2016 13:55, Atin Mukherjee wrote:
>>
>>         Can you resend the attachment as a zip? I am unable to extract
>>         the content. We shouldn't have a 0-byte info file. What does the
>>         gluster peer status output say?
>>
>>         On Thu, Dec 8, 2016 at 4:51 PM, Miloš Čučulović - MDPI
>>         <cuculovic at mdpi.com> wrote:
>>
>>             I hope you received my last email Atin, thank you!
>>
>>             - Kindest regards,
>>
>>             Milos Cuculovic
>>             IT Manager
>>
>>             ---
>>             MDPI AG
>>             Postfach, CH-4020 Basel, Switzerland
>>             Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>             Tel. +41 61 683 77 35
>>             Fax +41 61 302 89 18
>>             Email: cuculovic at mdpi.com
>>             Skype: milos.cuculovic.mdpi
>>
>>             On 08.12.2016 10:28, Atin Mukherjee wrote:
>>
>>
>>                 ---------- Forwarded message ----------
>>                 From: Atin Mukherjee <amukherj at redhat.com>
>>                 Date: Thu, Dec 8, 2016 at 11:56 AM
>>                 Subject: Re: [Gluster-users] Replica brick not working
>>                 To: Ravishankar N <ravishankar at redhat.com>
>>                 Cc: Miloš Čučulović - MDPI <cuculovic at mdpi.com>,
>>                 Pranith Kumar Karampuri <pkarampu at redhat.com>,
>>                 gluster-users <gluster-users at gluster.org>
>>
>>
>>
>>
>>                 On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N
>>                 <ravishankar at redhat.com> wrote:
>>
>>                     On 12/08/2016 10:43 AM, Atin Mukherjee wrote:
>>
>>                         From the log snippet:
>>
>>                         [2016-12-07 09:15:35.677645] I [MSGID: 106482] [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management: Received add brick req
>>                         [2016-12-07 09:15:35.677708] I [MSGID: 106062] [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management: replica-count is 2
>>                         [2016-12-07 09:15:35.677735] E [MSGID: 106291] [glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:
>>
>>                         The last log entry indicates that we hit the
>>                         code path in gd_addbr_validate_replica_count ():
>>
>>                             if (replica_count == volinfo->replica_count) {
>>                                     if (!(total_bricks % volinfo->dist_leaf_count)) {
>>                                             ret = 1;
>>                                             goto out;
>>                                     }
>>                             }
>>
>>
>>                     It seems unlikely that this snippet was hit,
>>                     because we print the E [MSGID: 106291] in the above
>>                     message only if ret == -1.
>>                     gd_addbr_validate_replica_count() returns -1 and yet
>>                     does not populate err_str only when volinfo->type
>>                     doesn't match any of the known volume types, so
>>                     perhaps volinfo->type is corrupted?
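>>
>>                     To make that failure mode concrete, here is a minimal
>>                     standalone sketch (hypothetical names that only mirror
>>                     the glusterd logic, not the actual 3.7.6 source) showing
>>                     how an unexpected type value leaves ret == -1 with the
>>                     error string never written:
>>
>>                         #include <stdio.h>
>>
>>                         /* hypothetical stand-ins for the volume types */
>>                         enum { TYPE_NONE, TYPE_REPLICATE };
>>
>>                         struct volinfo {
>>                                 int type;
>>                                 int replica_count;
>>                                 int dist_leaf_count;
>>                         };
>>
>>                         static int validate_replica_count(struct volinfo *v,
>>                                         int replica_count, int total_bricks,
>>                                         char *err, size_t n)
>>                         {
>>                                 int ret = -1;
>>
>>                                 switch (v->type) {
>>                                 case TYPE_NONE:
>>                                 case TYPE_REPLICATE:
>>                                         if (replica_count == v->replica_count &&
>>                                             !(total_bricks % v->dist_leaf_count)) {
>>                                                 ret = 1; /* valid request */
>>                                                 break;
>>                                         }
>>                                         snprintf(err, n, "incorrect replica count");
>>                                         break;
>>                                 default:
>>                                         /* a corrupted v->type lands here: ret
>>                                            stays -1 and err is never populated,
>>                                            matching the empty error in the log */
>>                                         break;
>>                                 }
>>                                 return ret;
>>                         }
>>
>>                         int main(void)
>>                         {
>>                                 char err[128] = "";
>>                                 struct volinfo v = { 42, 2, 2 }; /* bad type */
>>
>>                                 printf("ret=%d err='%s'\n",
>>                                        validate_replica_count(&v, 2, 2, err,
>>                                                               sizeof(err)),
>>                                        err); /* prints: ret=-1 err='' */
>>                                 return 0;
>>                         }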
>>
>>
>>                 You are right, I missed that ret is set to 1 here in
>>                 the above snippet.
>>
>>                 @Milos - Can you please provide us with the volume info
>>                 file from /var/lib/glusterd/vols/<volname>/ from all
>>                 three nodes so we can continue the analysis?
>>
>>
>>
>>                     -Ravi
>>
>>                         @Pranith, Ravi - Milos was trying to convert a
>>                         distribute (1 x 1) volume to a replicate (1 x 2)
>>                         volume using add-brick and hit this issue where
>>                         add-brick failed. The cluster is running 3.7.6.
>>                         Could you help identify in what scenario this code
>>                         path can be hit? One straightforward issue I see
>>                         here is the missing err_str in this path.
>>
>> --
>>
>> ~ Atin (atinm)
>
>

