[Gluster-users] Fwd: Replica brick not working

Miloš Čučulović - MDPI cuculovic at mdpi.com
Thu Dec 8 15:40:46 UTC 2016


Additional info: there are warnings/errors in the new brick's log:

[2016-12-08 15:37:05.053615] E [MSGID: 115056] 
[server-rpc-fops.c:509:server_mkdir_cbk] 0-storage-server: 12636867: 
MKDIR /dms (00000000-0000-0000-0000-000000000001/dms) ==> (Permission 
denied) [Permission denied]
[2016-12-08 15:37:05.135607] I [MSGID: 115081] 
[server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 12636895: 
FSTAT -2 (e9481d78-9094-45a7-ac7e-e1feeb7055df) ==> (No such file or 
directory) [No such file or directory]
[2016-12-08 15:37:05.163610] I [MSGID: 115081] 
[server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523605: 
FSTAT -2 (2bb87992-5f24-44bd-ba7c-70c84510942b) ==> (No such file or 
directory) [No such file or directory]
[2016-12-08 15:37:05.163633] I [MSGID: 115081] 
[server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523604: 
FSTAT -2 (2bb87992-5f24-44bd-ba7c-70c84510942b) ==> (No such file or 
directory) [No such file or directory]
[2016-12-08 15:37:05.166590] I [MSGID: 115081] 
[server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523619: 
FSTAT -2 (616028b7-a2c2-40e3-998a-68329daf7b07) ==> (No such file or 
directory) [No such file or directory]
[2016-12-08 15:37:05.166659] I [MSGID: 115081] 
[server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523620: 
FSTAT -2 (616028b7-a2c2-40e3-998a-68329daf7b07) ==> (No such file or 
directory) [No such file or directory]
[2016-12-08 15:37:05.241276] I [MSGID: 115081] 
[server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3451382: 
FSTAT -2 (f00e597e-7ae4-4d3a-986e-bbeb6cc07339) ==> (No such file or 
directory) [No such file or directory]
[2016-12-08 15:37:05.268583] I [MSGID: 115081] 
[server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523823: 
FSTAT -2 (a8a343c1-512f-4ad1-a3db-de9fc8ed990c) ==> (No such file or 
directory) [No such file or directory]
[2016-12-08 15:37:05.268771] I [MSGID: 115081] 
[server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523824: 
FSTAT -2 (a8a343c1-512f-4ad1-a3db-de9fc8ed990c) ==> (No such file or 
directory) [No such file or directory]
[2016-12-08 15:37:05.302501] I [MSGID: 115081] 
[server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523868: 
FSTAT -2 (eb0c4500-f9ae-408a-85e6-6e67ec6466a9) ==> (No such file or 
directory) [No such file or directory]
[2016-12-08 15:37:05.302558] I [MSGID: 115081] 
[server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523869: 
FSTAT -2 (eb0c4500-f9ae-408a-85e6-6e67ec6466a9) ==> (No such file or 
directory) [No such file or directory]
[2016-12-08 15:37:05.365428] E [MSGID: 115056] 
[server-rpc-fops.c:509:server_mkdir_cbk] 0-storage-server: 12637038: 
MKDIR /files (00000000-0000-0000-0000-000000000001/files) ==> 
(Permission denied) [Permission denied]
[2016-12-08 15:37:05.414486] E [MSGID: 115056] 
[server-rpc-fops.c:509:server_mkdir_cbk] 0-storage-server: 3451430: 
MKDIR /files (00000000-0000-0000-0000-000000000001/files) ==> 
(Permission denied) [Permission denied]
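
(A likely first check for the MKDIR "Permission denied" errors above, a
minimal sketch assuming the brick root /data/data-cluster named later in
the thread, is to compare ownership and mode of that directory on both
servers:

    # Compare brick-root owner/group and permissions on each server;
    # MKDIR failing with EACCES on a brick often means they differ.
    stat --format '%U:%G %a %n' /data/data-cluster   # run on storage
    stat --format '%U:%G %a %n' /data/data-cluster   # run on storage2

If they differ, matching the new brick root to the old one with
chown/chmod would be the obvious fix.)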


- Kindest regards,

Milos Cuculovic
IT Manager

---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculovic at mdpi.com
Skype: milos.cuculovic.mdpi

On 08.12.2016 16:32, Miloš Čučulović - MDPI wrote:
> 1. No, at the moment the old server (storage2) volume is mounted on some
> other servers, so all files are created there. If I check the new brick,
> there are no files.
>
>
> 2. On storage2 server (old brick)
> getfattr: Removing leading '/' from absolute path names
> # file: data/data-cluster
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0x0226135726f346bcb3f8cb73365ed382
>
> On storage server (new brick)
> getfattr: Removing leading '/' from absolute path names
> # file: data/data-cluster
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0x0226135726f346bcb3f8cb73365ed382
>
>
> 3.
> Thread 8 (Thread 0x7fad832dd700 (LWP 30057)):
> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
> #1  0x00007fad88834f3e in __afr_shd_healer_wait () from
> /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/cluster/replicate.so
> #2  0x00007fad88834fad in afr_shd_healer_wait () from
> /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/cluster/replicate.so
> #3  0x00007fad88835aa0 in afr_shd_index_healer () from
> /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/cluster/replicate.so
> #4  0x00007fad8df4270a in start_thread (arg=0x7fad832dd700) at
> pthread_create.c:333
> #5  0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 7 (Thread 0x7fad83ade700 (LWP 30056)):
> #0  0x00007fad8dc78e23 in epoll_wait () at
> ../sysdeps/unix/syscall-template.S:84
> #1  0x00007fad8e808a58 in ?? () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #2  0x00007fad8df4270a in start_thread (arg=0x7fad83ade700) at
> pthread_create.c:333
> #3  0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 6 (Thread 0x7fad894a5700 (LWP 30055)):
> #0  0x00007fad8dc78e23 in epoll_wait () at
> ../sysdeps/unix/syscall-template.S:84
> #1  0x00007fad8e808a58 in ?? () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #2  0x00007fad8df4270a in start_thread (arg=0x7fad894a5700) at
> pthread_create.c:333
> #3  0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 5 (Thread 0x7fad8a342700 (LWP 30054)):
> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
> #1  0x00007fad8e7ecd98 in syncenv_task () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #2  0x00007fad8e7ed970 in syncenv_processor () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #3  0x00007fad8df4270a in start_thread (arg=0x7fad8a342700) at
> pthread_create.c:333
> #4  0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 4 (Thread 0x7fad8ab43700 (LWP 30053)):
> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
> #1  0x00007fad8e7ecd98 in syncenv_task () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #2  0x00007fad8e7ed970 in syncenv_processor () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #3  0x00007fad8df4270a in start_thread (arg=0x7fad8ab43700) at
> pthread_create.c:333
> #4  0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 3 (Thread 0x7fad8b344700 (LWP 30052)):
> #0  do_sigwait (sig=0x7fad8b343e3c, set=<optimized out>) at
> ../sysdeps/unix/sysv/linux/sigwait.c:64
> #1  __sigwait (set=<optimized out>, sig=0x7fad8b343e3c) at
> ../sysdeps/unix/sysv/linux/sigwait.c:96
> #2  0x00000000004080bf in glusterfs_sigwaiter ()
> #3  0x00007fad8df4270a in start_thread (arg=0x7fad8b344700) at
> pthread_create.c:333
> #4  0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 2 (Thread 0x7fad8bb45700 (LWP 30051)):
> #0  0x00007fad8df4bc6d in nanosleep () at
> ../sysdeps/unix/syscall-template.S:84
> #1  0x00007fad8e7ca744 in gf_timer_proc () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #2  0x00007fad8df4270a in start_thread (arg=0x7fad8bb45700) at
> pthread_create.c:333
> #3  0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 1 (Thread 0x7fad8ec66780 (LWP 30050)):
> #0  0x00007fad8df439dd in pthread_join (threadid=140383309420288,
> thread_return=0x0) at pthread_join.c:90
> #1  0x00007fad8e808eeb in ?? () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #2  0x0000000000405501 in main ()
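>
> (Reading the trace: both afr_shd_index_healer threads are parked in
> afr_shd_healer_wait, i.e. the self-heal daemon appears idle rather than
> actively crawling. A quick cross-check of whether it has anything queued,
> a sketch assuming the volume name "storage" from the status output:
>
>     gluster volume heal storage info
>
> which lists the entries each brick still considers pending heal.)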
>
>
> - Kindest regards,
>
> Milos Cuculovic
> IT Manager
>
> ---
> MDPI AG
> Postfach, CH-4020 Basel, Switzerland
> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
> Tel. +41 61 683 77 35
> Fax +41 61 302 89 18
> Email: cuculovic at mdpi.com
> Skype: milos.cuculovic.mdpi
>
> On 08.12.2016 16:17, Ravishankar N wrote:
>> On 12/08/2016 06:53 PM, Atin Mukherjee wrote:
>>>
>>>
>>> On Thu, Dec 8, 2016 at 6:44 PM, Miloš Čučulović - MDPI
>>> <cuculovic at mdpi.com> wrote:
>>>
>>>     Ah, damn! I found the issue. On the storage server, the storage2
>>>     IP address was wrong: I had inverted two digits in the /etc/hosts
>>>     file, sorry for that :(
>>>
>>>     I was able to add the brick now and started the heal, but still no
>>>     data transfer is visible.
>>>
>> 1. Are the files getting created on the new brick though?
>> 2. Can you provide the output of `getfattr -d -m . -e hex
>> /data/data-cluster` on both bricks?
>> 3. Is it possible to attach gdb to the self-heal daemon on the original
>> (old) brick and get a backtrace?
>>     `gdb -p <pid of self-heal daemon on the original brick>`
>>     `thread apply all bt`  --> share this output
>>     quit gdb.
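>>
>> (The same capture can be scripted instead of run interactively; a sketch,
>> assuming the daemon shows up as "glustershd" in the process list:
>>
>>     pid=$(pgrep -f glustershd)
>>     gdb -p "$pid" -batch -ex 'thread apply all bt' > shd-backtrace.txt
>>
>> -batch makes gdb exit after the -ex command, so nothing needs to be
>> typed at the gdb prompt.)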
>>
>>
>> -Ravi
>>>
>>> @Ravi/Pranith - can you help here?
>>>
>>>
>>>
>>>     Running gluster volume status, I get:
>>>
>>>     Status of volume: storage
>>>     Gluster process                       TCP Port  RDMA Port  Online  Pid
>>>     ------------------------------------------------------------------------------
>>>     Brick storage2:/data/data-cluster     49152     0          Y       23101
>>>     Brick storage:/data/data-cluster      49152     0          Y       30773
>>>     Self-heal Daemon on localhost         N/A       N/A        Y       30050
>>>     Self-heal Daemon on storage           N/A       N/A        Y       30792
>>>
>>>
>>>     Any idea?
>>>
>>>     On storage I have:
>>>     Number of Peers: 1
>>>
>>>     Hostname: 195.65.194.217
>>>     Uuid: 7c988af2-9f76-4843-8e6f-d94866d57bb0
>>>     State: Peer in Cluster (Connected)
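>>>
>>>     (Given the earlier /etc/hosts typo, a quick sanity check that both
>>>     hostnames now resolve to the intended addresses, a sketch:
>>>
>>>         getent hosts storage storage2
>>>
>>>     run on each server, should print the corrected IPs for both names.)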
>>>
>>>
>>>     - Kindest regards,
>>>
>>>     Milos Cuculovic
>>>     IT Manager
>>>
>>>     ---
>>>     MDPI AG
>>>     Postfach, CH-4020 Basel, Switzerland
>>>     Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>     Tel. +41 61 683 77 35
>>>     Fax +41 61 302 89 18
>>>     Email: cuculovic at mdpi.com
>>>     Skype: milos.cuculovic.mdpi
>>>
>>>     On 08.12.2016 13:55, Atin Mukherjee wrote:
>>>
>>>         Can you resend the attachment as a zip? I am unable to extract
>>>         the content. We shouldn't have a zero-byte info file. What does
>>>         gluster peer status output say?
>>>
>>>         On Thu, Dec 8, 2016 at 4:51 PM, Miloš Čučulović - MDPI
>>>         <cuculovic at mdpi.com> wrote:
>>>
>>>             I hope you received my last email Atin, thank you!
>>>
>>>             - Kindest regards,
>>>
>>>             Milos Cuculovic
>>>             IT Manager
>>>
>>>             ---
>>>             MDPI AG
>>>             Postfach, CH-4020 Basel, Switzerland
>>>             Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>>>             Tel. +41 61 683 77 35
>>>             Fax +41 61 302 89 18
>>>             Email: cuculovic at mdpi.com
>>>             Skype: milos.cuculovic.mdpi
>>>
>>>             On 08.12.2016 10:28, Atin Mukherjee wrote:
>>>
>>>
>>>                 ---------- Forwarded message ----------
>>>                 From: Atin Mukherjee <amukherj at redhat.com>
>>>                 Date: Thu, Dec 8, 2016 at 11:56 AM
>>>                 Subject: Re: [Gluster-users] Replica brick not working
>>>                 To: Ravishankar N <ravishankar at redhat.com>
>>>                 Cc: Miloš Čučulović - MDPI <cuculovic at mdpi.com>,
>>>                 Pranith Kumar Karampuri <pkarampu at redhat.com>,
>>>                 gluster-users <gluster-users at gluster.org>
>>>
>>>
>>>
>>>
>>>                 On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N
>>>                 <ravishankar at redhat.com> wrote:
>>>
>>>                     On 12/08/2016 10:43 AM, Atin Mukherjee wrote:
>>>
>>>                         From the log snippet:
>>>
>>>                         [2016-12-07 09:15:35.677645] I [MSGID: 106482]
>>>                         [glusterd-brick-ops.c:442:__glusterd_handle_add_brick]
>>>                         0-management: Received add brick req
>>>                         [2016-12-07 09:15:35.677708] I [MSGID: 106062]
>>>                         [glusterd-brick-ops.c:494:__glusterd_handle_add_brick]
>>>                         0-management: replica-count is 2
>>>                         [2016-12-07 09:15:35.677735] E [MSGID: 106291]
>>>                         [glusterd-brick-ops.c:614:__glusterd_handle_add_brick]
>>>                         0-management:
>>>
>>>                         The last log entry indicates that we hit the
>>>                         code path in gd_addbr_validate_replica_count():
>>>
>>>                         if (replica_count == volinfo->replica_count) {
>>>                                 if (!(total_bricks % volinfo->dist_leaf_count)) {
>>>                                         ret = 1;
>>>                                         goto out;
>>>                                 }
>>>                         }
>>>
>>>
>>>                     It seems unlikely that this snippet was hit, because
>>>                     we print the E [MSGID: 106291] message above only if
>>>                     ret == -1. gd_addbr_validate_replica_count() returns
>>>                     -1 without populating err_str only when volinfo->type
>>>                     doesn't match any of the known volume types, so
>>>                     perhaps volinfo->type is corrupted?
>>>
>>>
>>>                 You are right, I missed that ret is set to 1 here in the
>>>                 above snippet.
>>>
>>>                 @Milos - Can you please provide us the volume info file
>>>                 from /var/lib/glusterd/vols/<volname>/ from all three
>>>                 nodes so we can continue the analysis?
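>>>
>>>                 (The fields of interest in that file, a sketch assuming
>>>                 the standard glusterd store layout:
>>>
>>>                     grep -E '^(type|count|replica_count|sub_count)=' \
>>>                         /var/lib/glusterd/vols/<volname>/info
>>>
>>>                 type= is the value gd_addbr_validate_replica_count()
>>>                 switches on, so a corrupted type would show up here.)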
>>>
>>>
>>>
>>>                     -Ravi
>>>
>>>                         @Pranith, Ravi - Milos was trying to convert a
>>>                         dist (1 x 1) volume to a replicate (1 x 2) using
>>>                         add-brick and hit this issue where add-brick
>>>                         failed. The cluster is operating with 3.7.6.
>>>                         Could you help identify in what scenario this
>>>                         code path can be hit? One straightforward issue
>>>                         I see here is the missing err_str in this path.
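>>>
>>>                         (For reference, the conversion being attempted,
>>>                         a sketch assuming the volume and brick names
>>>                         from this thread:
>>>
>>>                             gluster volume add-brick storage replica 2 \
>>>                                 storage:/data/data-cluster
>>>
>>>                         i.e. raising the replica count from 1 to 2 while
>>>                         adding the new brick.)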
>>>
>>>
>>>
>>>
>>>
>>>
>>>                 --
>>>
>>>                 ~ Atin (atinm)
>>>
>>> --
>>>
>>> ~ Atin (atinm)
>>
>>

