[Gluster-users] Fwd: Replica brick not working
Miloš Čučulović - MDPI
cuculovic at mdpi.com
Thu Dec 8 15:40:46 UTC 2016
Additional info: there are warnings/errors in the log of the new brick:
[2016-12-08 15:37:05.053615] E [MSGID: 115056] [server-rpc-fops.c:509:server_mkdir_cbk] 0-storage-server: 12636867: MKDIR /dms (00000000-0000-0000-0000-000000000001/dms) ==> (Permission denied) [Permission denied]
[2016-12-08 15:37:05.135607] I [MSGID: 115081] [server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 12636895: FSTAT -2 (e9481d78-9094-45a7-ac7e-e1feeb7055df) ==> (No such file or directory) [No such file or directory]
[2016-12-08 15:37:05.163610] I [MSGID: 115081] [server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523605: FSTAT -2 (2bb87992-5f24-44bd-ba7c-70c84510942b) ==> (No such file or directory) [No such file or directory]
[2016-12-08 15:37:05.163633] I [MSGID: 115081] [server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523604: FSTAT -2 (2bb87992-5f24-44bd-ba7c-70c84510942b) ==> (No such file or directory) [No such file or directory]
[2016-12-08 15:37:05.166590] I [MSGID: 115081] [server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523619: FSTAT -2 (616028b7-a2c2-40e3-998a-68329daf7b07) ==> (No such file or directory) [No such file or directory]
[2016-12-08 15:37:05.166659] I [MSGID: 115081] [server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523620: FSTAT -2 (616028b7-a2c2-40e3-998a-68329daf7b07) ==> (No such file or directory) [No such file or directory]
[2016-12-08 15:37:05.241276] I [MSGID: 115081] [server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3451382: FSTAT -2 (f00e597e-7ae4-4d3a-986e-bbeb6cc07339) ==> (No such file or directory) [No such file or directory]
[2016-12-08 15:37:05.268583] I [MSGID: 115081] [server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523823: FSTAT -2 (a8a343c1-512f-4ad1-a3db-de9fc8ed990c) ==> (No such file or directory) [No such file or directory]
[2016-12-08 15:37:05.268771] I [MSGID: 115081] [server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523824: FSTAT -2 (a8a343c1-512f-4ad1-a3db-de9fc8ed990c) ==> (No such file or directory) [No such file or directory]
[2016-12-08 15:37:05.302501] I [MSGID: 115081] [server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523868: FSTAT -2 (eb0c4500-f9ae-408a-85e6-6e67ec6466a9) ==> (No such file or directory) [No such file or directory]
[2016-12-08 15:37:05.302558] I [MSGID: 115081] [server-rpc-fops.c:1280:server_fstat_cbk] 0-storage-server: 3523869: FSTAT -2 (eb0c4500-f9ae-408a-85e6-6e67ec6466a9) ==> (No such file or directory) [No such file or directory]
[2016-12-08 15:37:05.365428] E [MSGID: 115056] [server-rpc-fops.c:509:server_mkdir_cbk] 0-storage-server: 12637038: MKDIR /files (00000000-0000-0000-0000-000000000001/files) ==> (Permission denied) [Permission denied]
[2016-12-08 15:37:05.414486] E [MSGID: 115056] [server-rpc-fops.c:509:server_mkdir_cbk] 0-storage-server: 3451430: MKDIR /files (00000000-0000-0000-0000-000000000001/files) ==> (Permission denied) [Permission denied]
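In a log this noisy, the E (error) entries, here the MKDIR failures with Permission denied, are easy to lose among the I (informational) FSTAT "No such file or directory" entries. A minimal sketch for separating them, assuming every entry starts with the "[timestamp] <severity> [MSGID: <n>]" prefix seen above; parse_log_header() and count_errors() are hypothetical helpers, not part of GlusterFS.

```c
#include <stdio.h>

/* Extracts the severity letter and MSGID from a brick log entry such as
 * "[2016-12-08 15:37:05.053615] E [MSGID: 115056] ...".
 * Returns 1 on success, 0 if the line does not have that prefix. */
static int parse_log_header(const char *line, char *severity, int *msgid)
{
    /* %*[^]] skips the timestamp up to the closing bracket without storing it */
    return sscanf(line, "[%*[^]]] %c [MSGID: %d]", severity, msgid) == 2;
}

/* Counts the entries with severity 'E' among n log lines;
 * lines that do not match the prefix are ignored. */
static int count_errors(const char *const lines[], int n)
{
    int errors = 0;
    for (int i = 0; i < n; i++) {
        char sev;
        int msgid;
        if (parse_log_header(lines[i], &sev, &msgid) && sev == 'E')
            errors++;
    }
    return errors;
}
```

Fed the thirteen entries above, this would report three E entries, all with MSGID 115056 (MKDIR of /dms and /files under the root GFID).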
- Kindest regards,
Milos Cuculovic
IT Manager
---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculovic at mdpi.com
Skype: milos.cuculovic.mdpi
On 08.12.2016 16:32, Miloš Čučulović - MDPI wrote:
> 1. No, at the moment the old server's (storage2) volume is mounted on some other
> servers, so all files are created there. If I check the new brick, there
> are no files.
>
>
> 2. On the storage2 server (old brick):
> getfattr: Removing leading '/' from absolute path names
> # file: data/data-cluster
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0x0226135726f346bcb3f8cb73365ed382
>
> On the storage server (new brick):
> getfattr: Removing leading '/' from absolute path names
> # file: data/data-cluster
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0x0226135726f346bcb3f8cb73365ed382
>
>
> 3.
> Thread 8 (Thread 0x7fad832dd700 (LWP 30057)):
> #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
> #1 0x00007fad88834f3e in __afr_shd_healer_wait () from
> /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/cluster/replicate.so
> #2 0x00007fad88834fad in afr_shd_healer_wait () from
> /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/cluster/replicate.so
> #3 0x00007fad88835aa0 in afr_shd_index_healer () from
> /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/cluster/replicate.so
> #4 0x00007fad8df4270a in start_thread (arg=0x7fad832dd700) at
> pthread_create.c:333
> #5 0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 7 (Thread 0x7fad83ade700 (LWP 30056)):
> #0 0x00007fad8dc78e23 in epoll_wait () at
> ../sysdeps/unix/syscall-template.S:84
> #1 0x00007fad8e808a58 in ?? () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #2 0x00007fad8df4270a in start_thread (arg=0x7fad83ade700) at
> pthread_create.c:333
> #3 0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 6 (Thread 0x7fad894a5700 (LWP 30055)):
> #0 0x00007fad8dc78e23 in epoll_wait () at
> ../sysdeps/unix/syscall-template.S:84
> #1 0x00007fad8e808a58 in ?? () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #2 0x00007fad8df4270a in start_thread (arg=0x7fad894a5700) at
> pthread_create.c:333
> #3 0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 5 (Thread 0x7fad8a342700 (LWP 30054)):
> #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
> #1 0x00007fad8e7ecd98 in syncenv_task () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #2 0x00007fad8e7ed970 in syncenv_processor () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #3 0x00007fad8df4270a in start_thread (arg=0x7fad8a342700) at
> pthread_create.c:333
> #4 0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 4 (Thread 0x7fad8ab43700 (LWP 30053)):
> #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
> #1 0x00007fad8e7ecd98 in syncenv_task () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #2 0x00007fad8e7ed970 in syncenv_processor () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #3 0x00007fad8df4270a in start_thread (arg=0x7fad8ab43700) at
> pthread_create.c:333
> #4 0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 3 (Thread 0x7fad8b344700 (LWP 30052)):
> #0 do_sigwait (sig=0x7fad8b343e3c, set=<optimized out>) at
> ../sysdeps/unix/sysv/linux/sigwait.c:64
> #1 __sigwait (set=<optimized out>, sig=0x7fad8b343e3c) at
> ../sysdeps/unix/sysv/linux/sigwait.c:96
> #2 0x00000000004080bf in glusterfs_sigwaiter ()
> #3 0x00007fad8df4270a in start_thread (arg=0x7fad8b344700) at
> pthread_create.c:333
> #4 0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 2 (Thread 0x7fad8bb45700 (LWP 30051)):
> #0 0x00007fad8df4bc6d in nanosleep () at
> ../sysdeps/unix/syscall-template.S:84
> #1 0x00007fad8e7ca744 in gf_timer_proc () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #2 0x00007fad8df4270a in start_thread (arg=0x7fad8bb45700) at
> pthread_create.c:333
> #3 0x00007fad8dc7882d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 1 (Thread 0x7fad8ec66780 (LWP 30050)):
> #0 0x00007fad8df439dd in pthread_join (threadid=140383309420288,
> thread_return=0x0) at pthread_join.c:90
> #1 0x00007fad8e808eeb in ?? () from
> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
> #2 0x0000000000405501 in main ()
>
>
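The getfattr values in item 2 can be decoded by hand. The sketch below assumes the usual DHT on-disk layout for trusted.glusterfs.dht: four big-endian 32-bit words (a count or commit-hash word depending on the version, a layout type, and the start and end of the hash range); parse_dht_xattr(), covers_full_range() and same_volume_id() are hypothetical helpers. Under that assumption both bricks own the whole hash range 0x00000000 to 0xffffffff. The trusted.gfid ending in ...01 is the root directory's GFID, the same 00000000-0000-0000-0000-000000000001 parent that appears in the MKDIR log entries, and the identical trusted.glusterfs.volume-id values show both bricks belong to the same volume.

```c
#include <stdint.h>
#include <string.h>

/* Assumed decoding of trusted.glusterfs.dht: four big-endian 32-bit words. */
struct dht_range {
    uint32_t word0;  /* count or commit hash, depending on the version */
    uint32_t type;   /* layout type */
    uint32_t start;  /* first hash value owned by the brick */
    uint32_t stop;   /* last hash value owned by the brick */
};

/* Converts one hex digit; returns -1 for anything else. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses "0x" followed by 32 hex digits, as printed by `getfattr -e hex`.
 * Returns 0 on success, -1 on malformed input. */
static int parse_dht_xattr(const char *hex, struct dht_range *out)
{
    uint32_t w[4];

    if (strncmp(hex, "0x", 2) != 0 || strlen(hex) != 34)
        return -1;
    hex += 2;
    for (int i = 0; i < 4; i++) {
        uint32_t v = 0;
        for (int j = 0; j < 8; j++) {
            int d = hex_digit(hex[i * 8 + j]);
            if (d < 0)
                return -1;
            v = (v << 4) | (uint32_t)d;  /* most significant digit first */
        }
        w[i] = v;
    }
    out->word0 = w[0];
    out->type = w[1];
    out->start = w[2];
    out->stop = w[3];
    return 0;
}

/* True when the brick owns the entire 32-bit hash space. */
static int covers_full_range(const struct dht_range *r)
{
    return r->start == 0 && r->stop == UINT32_MAX;
}

/* Two bricks belong to the same volume when their volume-id values match. */
static int same_volume_id(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```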
>
> On 08.12.2016 16:17, Ravishankar N wrote:
>> On 12/08/2016 06:53 PM, Atin Mukherjee wrote:
>>>
>>>
>>> On Thu, Dec 8, 2016 at 6:44 PM, Miloš Čučulović - MDPI
>>> <cuculovic at mdpi.com> wrote:
>>>
>>> Ah, damn! I found the issue. On the storage server, the storage2
>>> IP address was wrong: I had swapped two digits in the /etc/hosts
>>> file, sorry for that :(
>>>
>>> I was now able to add the brick and I started the heal, but there is
>>> still no visible data transfer.
>>>
>> 1. Are the files getting created on the new brick, though?
>> 2. Can you provide the output of `getfattr -d -m . -e hex
>> /data/data-cluster` on both bricks?
>> 3. Is it possible to attach gdb to the self-heal daemon on the original
>> (old) brick and get a backtrace?
>> `gdb -p <pid of self-heal daemon on the original brick>`
>> thread apply all bt --> share this output
>> quit gdb.
>>
>>
>> -Ravi
>>>
>>> @Ravi/Pranith - can you help here?
>>>
>>>
>>>
>>> Running `gluster volume status`, I get:
>>>
>>> Status of volume: storage
>>> Gluster process                        TCP Port  RDMA Port  Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick storage2:/data/data-cluster      49152     0          Y       23101
>>> Brick storage:/data/data-cluster       49152     0          Y       30773
>>> Self-heal Daemon on localhost          N/A       N/A        Y       30050
>>> Self-heal Daemon on storage            N/A       N/A        Y       30792
>>>
>>>
>>> Any idea?
>>>
>>> On storage I have:
>>> Number of Peers: 1
>>>
>>> Hostname: 195.65.194.217
>>> Uuid: 7c988af2-9f76-4843-8e6f-d94866d57bb0
>>> State: Peer in Cluster (Connected)
>>>
>>>
>>>
>>> On 08.12.2016 13:55, Atin Mukherjee wrote:
>>>
>>> Can you resend the attachment as a zip? I am unable to extract the
>>> content. We shouldn't have 0 info file. What does the gluster peer
>>> status output say?
>>>
>>> On Thu, Dec 8, 2016 at 4:51 PM, Miloš Čučulović - MDPI
>>> <cuculovic at mdpi.com> wrote:
>>>
>>> I hope you received my last email, Atin. Thank you!
>>>
>>>
>>> On 08.12.2016 10:28, Atin Mukherjee wrote:
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Atin Mukherjee <amukherj at redhat.com>
>>> Date: Thu, Dec 8, 2016 at 11:56 AM
>>> Subject: Re: [Gluster-users] Replica brick not working
>>> To: Ravishankar N <ravishankar at redhat.com>
>>> Cc: Miloš Čučulović - MDPI <cuculovic at mdpi.com>,
>>> Pranith Kumar Karampuri <pkarampu at redhat.com>,
>>> gluster-users <gluster-users at gluster.org>
>>>
>>>
>>>
>>>
>>> On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N
>>> <ravishankar at redhat.com> wrote:
>>>
>>> On 12/08/2016 10:43 AM, Atin Mukherjee wrote:
>>>
>>> From the log snippet:
>>>
>>> [2016-12-07 09:15:35.677645] I [MSGID: 106482] [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management: Received add brick req
>>> [2016-12-07 09:15:35.677708] I [MSGID: 106062] [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management: replica-count is 2
>>> [2016-12-07 09:15:35.677735] E [MSGID: 106291] [glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:
>>>
>>> The last log entry indicates that we hit the code path in
>>> gd_addbr_validate_replica_count ()
>>>
>>>         if (replica_count == volinfo->replica_count) {
>>>                 if (!(total_bricks % volinfo->dist_leaf_count)) {
>>>                         ret = 1;
>>>                         goto out;
>>>                 }
>>>         }
>>>
>>>
>>> It seems unlikely that this snippet was hit, because we print the
>>> E [MSGID: 106291] in the above message only if ret == -1.
>>> gd_addbr_validate_replica_count() returns -1 without populating
>>> err_str only when volinfo->type doesn't match any of the known
>>> volume types, so perhaps volinfo->type is corrupted?
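The control-flow argument about when the E [MSGID: 106291] entry is emitted can be checked against a small C model. Everything in it is hypothetical: struct vol_model, validate_replica_count(), handle_add_brick(), the reduced list of volume types and the error strings stand in for glusterd's volinfo and gd_addbr_validate_replica_count(), whose real code covers more cases. It only illustrates the point made in the thread: the caller logs an error only when the validator returns -1, the quoted ret = 1 path logs nothing, and an unrecognised volume type yields -1 with an empty err_str, which would match the blank message after "0-management:".

```c
#include <stdio.h>

/* Hypothetical, simplified volume types; glusterd's real enum has more. */
enum vol_type { VOL_DISTRIBUTE, VOL_REPLICATE, VOL_DISTRIBUTED_REPLICATE };

/* Hypothetical stand-in for glusterd's volinfo. */
struct vol_model {
    int type;             /* plain int so a corrupted value can be modelled */
    int replica_count;
    int dist_leaf_count;
};

/* Returns  1 when the replica count is unchanged and the brick count fits,
 *          0 when the replica count is being raised,
 *         -1 on error; err_str is filled only for recognised types. */
static int validate_replica_count(const struct vol_model *v, int replica_count,
                                  int total_bricks, char *err_str, size_t len)
{
    switch (v->type) {
    case VOL_DISTRIBUTE:
    case VOL_REPLICATE:
    case VOL_DISTRIBUTED_REPLICATE:
        if (replica_count == v->replica_count) {
            if (!(total_bricks % v->dist_leaf_count))
                return 1;  /* the quoted path: ret = 1, nothing logged */
            snprintf(err_str, len, "Incorrect number of bricks");
            return -1;
        }
        if (replica_count < v->replica_count) {
            snprintf(err_str, len, "Replica count cannot be reduced");
            return -1;
        }
        return 0;
    default:
        return -1;  /* unknown type: fails without setting err_str */
    }
}

/* Mimics the caller: an E-level line is printed only when ret == -1,
 * carrying whatever err_str holds (possibly nothing). */
static int handle_add_brick(const struct vol_model *v, int replica_count,
                            int total_bricks, char *err_str, size_t len)
{
    err_str[0] = '\0';
    int ret = validate_replica_count(v, replica_count, total_bricks,
                                     err_str, len);
    if (ret == -1)
        fprintf(stderr, "E [MSGID: 106291] 0-management: %s\n", err_str);
    return ret;
}
```

In this model a healthy 1 X 1 volume asked for replica 2 with two bricks returns 0 and logs nothing; only a volume whose type field holds an unexpected value produces the empty error line.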
>>>
>>>
>>> You are right, I missed that ret is set to 1 in the above snippet.
>>>
>>> @Milos - Can you please provide us the volume info file from
>>> /var/lib/glusterd/vols/<volname>/ from all three nodes, so we can
>>> continue the analysis?
>>>
>>>
>>>
>>> -Ravi
>>>
>>> @Pranith, Ravi - Milos was trying to convert a distribute (1 X 1)
>>> volume to a replicate (1 X 2) volume using add-brick and hit this
>>> issue, where add-brick failed. The cluster is running 3.7.6.
>>> Could you help identify in which scenario this code path can be hit?
>>> One straightforward issue I see here is the missing err_str in this
>>> path.
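The "1 X 1" and "1 X 2" notation reads as distribute count × replica count, so a volume's brick total is the product of the two. A minimal sketch of that arithmetic, with hypothetical helper names: raising the replica count from 1 to 2 on a single distribute subvolume needs exactly one new brick, which is what an add-brick call of the form `gluster volume add-brick <volname> replica 2 <host>:<brick-path>` supplies.

```c
/* Total bricks in a volume laid out as dist_count x replica_count. */
static int total_bricks(int dist_count, int replica_count)
{
    return dist_count * replica_count;
}

/* Bricks that add-brick must supply to raise the replica count while
 * keeping the same number of distribute subvolumes.
 * Returns -1 if the new replica count is not larger than the old one. */
static int bricks_to_add(int dist_count, int old_replica, int new_replica)
{
    if (new_replica <= old_replica)
        return -1;
    return total_bricks(dist_count, new_replica) -
           total_bricks(dist_count, old_replica);
}
```

For a 2 X 2 volume going to replica 3 the same formula gives two new bricks, one per distribute subvolume.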
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> ~ Atin (atinm)
>>
>>