[Bugs] [Bug 1403109] New: Crash of glusterd when using long username with geo-replication

bugzilla at redhat.com bugzilla at redhat.com
Fri Dec 9 05:18:51 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1403109

            Bug ID: 1403109
           Summary: Crash of glusterd when using long username with
                    geo-replication
           Product: GlusterFS
           Version: 3.8
         Component: geo-replication
          Keywords: Triaged
          Severity: medium
          Priority: high
          Assignee: bugs at gluster.org
          Reporter: amukherj at redhat.com
                CC: bugs at gluster.org, bugzilla at ii.nl, sarumuga at redhat.com,
                    smohan at redhat.com
        Depends On: 1363613
            Blocks: 1368138, 1403108



+++ This bug was initially created as a clone of Bug #1363613 +++

Description of problem:

I have some existing data on the slave that I want to use for geo-rep, in the
hope that I don't have to transfer 400G of data over geo-rep (the data is
already present at the slave's location, just not in Gluster).

Following this manual:

https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/sect-Preparing_to_Deploy_Geo-replication.html#Setting_Up_the_Environment_for_a_Secure_Geo-replication_Slave

The crash happens at step 9.

This is (probably) expected:

root@gluster-3:/home/mrten# gluster volume geo-replication gl0 georeplication@gluster-4.glstr::glbackup create push-pem
gluster-4.glstr::glbackup is not empty. Please delete existing files in
gluster-4.glstr::glbackup and retry, or use force to continue without deleting
the existing files.
geo-replication command failed

So force it:

root@gluster-3:/home/mrten# gluster volume geo-replication gl0 georeplication@gluster-4.glstr::glbackup create push-pem force
Connection failed. Please check if gluster daemon is operational.
geo-replication command failed

At this stage, there is a crash log in
/var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash:
2016-08-03 08:00:49
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.14
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x92)[0x7f6db19d5a32]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f6db19facdd]
/lib/x86_64-linux-gnu/libc.so.6(+0x36cb0)[0x7f6db0dd3cb0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f6db0dd3c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f6db0dd7028]
/lib/x86_64-linux-gnu/libc.so.6(+0x732a4)[0x7f6db0e102a4]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7f6db0ea7bbc]
/lib/x86_64-linux-gnu/libc.so.6(+0x109a90)[0x7f6db0ea6a90]
/lib/x86_64-linux-gnu/libc.so.6(__stpncpy_chk+0x0)[0x7f6db0ea5ef0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(+0xc5d6b)[0x7f6dacf83d6b]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_foreach_match+0x77)[0x7f6db19cf8a7]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_foreach+0x18)[0x7f6db19cfa18]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(glusterd_op_stage_gsync_create+0x1cea)[0x7f6dacf92faa]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(glusterd_op_stage_validate+0xdb)[0x7f6dacf2184b]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(gd_stage_op_phase+0x16a)[0x7f6dacfb20ea]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x6de)[0x7f6dacfb3bbe]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x30)[0x7f6dacfb3ef0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(__glusterd_handle_gsync_set+0x628)[0x7f6dacf871b8]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/mgmt/glusterd.so(glusterd_big_locked_handler+0x30)[0x7f6dacf0c240]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(synctask_wrap+0x12)[0x7f6db1a232d2]
/lib/x86_64-linux-gnu/libc.so.6(+0x49800)[0x7f6db0de6800]

and glusterd is gone.

These are the log messages just before the crash, perhaps related:

[2016-08-03 08:00:49.674995] I [MSGID: 106316]
[glusterd-geo-rep.c:3096:glusterd_op_stage_gsync_create] 0-management:
georeplication@gluster-4.glstr::glbackup is not a valid slave volume. Error:
gluster-4.glstr::glbackup is not empty. Please delete existing files in
gluster-4.glstr::glbackup and retry, or use force to continue without deleting
the existing files.. Force creating geo-rep session.
[2016-08-03 08:00:49.675032] W [MSGID: 106029]
[glusterd-geo-rep.c:2522:glusterd_get_statefile_name] 0-management: Config file
(/var/lib/glusterd/geo-replication/gl0_gluster-4.glstr_glbackup/gsyncd.conf)
missing. Looking for template config file
(/var/lib/glusterd/geo-replication/gsyncd_template.conf) [No such file or
directory]
[2016-08-03 08:00:49.675048] I [MSGID: 106294]
[glusterd-geo-rep.c:2531:glusterd_get_statefile_name] 0-management: Using
default config
template(/var/lib/glusterd/geo-replication/gsyncd_template.conf).



Version-Release number of selected component (if applicable):
3.7.14, but I saw it in 3.7.13 as well.

How reproducible:
Every time

Additional info:

This is on Ubuntu 14.04, using the gluster PPA, kernel 3.13.0-92-generic.

--- Additional comment from Mrten on 2016-08-03 09:25:16 EDT ---

Also crashes on 3.8.1:

[2016-08-03 13:23:03.870624] I [MSGID: 106294]
[glusterd-geo-rep.c:2560:glusterd_get_statefile_name] 0-management: Using
default config
template(/var/lib/glusterd/geo-replication/gsyncd_template.conf).
[2016-08-03 13:23:03.870493] E [MSGID: 106316]
[glusterd-geo-rep.c:2744:glusterd_verify_slave] 0-management: Not a valid slave
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash:
2016-08-03 13:23:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.1
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x92)[0x7f520b958b02]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f520b96204d]
/lib/x86_64-linux-gnu/libc.so.6(+0x36cb0)[0x7f520ad52cb0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f520ad52c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f520ad56028]
/lib/x86_64-linux-gnu/libc.so.6(+0x732a4)[0x7f520ad8f2a4]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7f520ae26bbc]
/lib/x86_64-linux-gnu/libc.so.6(+0x109a90)[0x7f520ae25a90]
/lib/x86_64-linux-gnu/libc.so.6(__stpncpy_chk+0x0)[0x7f520ae24ef0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0x97d1b)[0x7f5206b60d1b]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_foreach_match+0x77)[0x7f520b952847]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_foreach+0x18)[0x7f520b9529b8]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0xa70f2)[0x7f5206b700f2]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0x3283b)[0x7f5206afb83b]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0xc6b3a)[0x7f5206b8fb3a]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0xc860e)[0x7f5206b9160e]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0xc8940)[0x7f5206b91940]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0x9b158)[0x7f5206b64158]
/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so(+0x1db00)[0x7f5206ae6b00]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(synctask_wrap+0x12)[0x7f520b98d9a2]
/lib/x86_64-linux-gnu/libc.so.6(+0x49800)[0x7f520ad65800]

--- Additional comment from Mrten on 2016-08-03 09:41:47 EDT ---

The abort (signal 6) seems deliberate.

This is from an strace of 3.8.1; I can't find this output anywhere in the logs:

[pid 15408] open("/dev/tty", O_RDWR|O_NOCTTY|O_NONBLOCK) = -1 ENXIO (No such
device or address)
[pid 15408] writev(2, [{"*** ", 4}, {"buffer overflow detected", 24}, {" ***: ", 6},
{"/usr/sbin/glusterd", 18}, {" terminated\n", 12}], 5) = 64
[pid 15408] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x7f6acead1000
[pid 15408] write(2, "======= Backtrace: =========\n", 29) = 29
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1},
{"+0x", 3}, {"7329f", 5}, {")", 1}, {"[0x", 3}, {"7f6acd80e29f", 12}, {"]\n",
2}], 8) = 58
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1},
{"__fortify_fail", 14}, {"+0x", 3}, {"5c", 2}, {")", 1}, {"[0x", 3},
{"7f6acd8a5bbc", 12}, {"]\n", 2}], 9) = 69
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1},
{"+0x", 3}, {"109a90", 6}, {")", 1}, {"[0x", 3}, {"7f6acd8a4a90", 12}, {"]\n",
2}], 8) = 59
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1},
{"__stpncpy_chk", 13}, {"+0x", 3}, {"0", 1}, {")", 1}, {"[0x", 3},
{"7f6acd8a3ef0", 12}, {"]\n", 2}], 9) = 67
[pid 15408] writev(2,
[{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65},
{"(", 1}, {"+0x", 3}, {"97d1b", 5}, {")", 1}, {"[0x", 3}, {"7f6ac95dfd1b", 12},
{"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/libglusterfs.so.0", 43},
{"(", 1}, {"dict_foreach_match", 18}, {"+0x", 3}, {"77", 2}, {")", 1}, {"[0x",
3}, {"7f6ace3d1847", 12}, {"]\n", 2}], 9) = 85
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/libglusterfs.so.0", 43},
{"(", 1}, {"dict_foreach", 12}, {"+0x", 3}, {"18", 2}, {")", 1}, {"[0x", 3},
{"7f6ace3d19b8", 12}, {"]\n", 2}], 9) = 79
[pid 15408] writev(2,
[{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65},
{"(", 1}, {"+0x", 3}, {"a70f2", 5}, {")", 1}, {"[0x", 3}, {"7f6ac95ef0f2", 12},
{"]\n", 2}], 8) = 92
[pid 15408] writev(2,
[{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65},
{"(", 1}, {"+0x", 3}, {"3283b", 5}, {")", 1}, {"[0x", 3}, {"7f6ac957a83b", 12},
{"]\n", 2}], 8) = 92
[pid 15408] writev(2,
[{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65},
{"(", 1}, {"+0x", 3}, {"c6b3a", 5}, {")", 1}, {"[0x", 3}, {"7f6ac960eb3a", 12},
{"]\n", 2}], 8) = 92
[pid 15408] writev(2,
[{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65},
{"(", 1}, {"+0x", 3}, {"c860e", 5}, {")", 1}, {"[0x", 3}, {"7f6ac961060e", 12},
{"]\n", 2}], 8) = 92
[pid 15408] writev(2,
[{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65},
{"(", 1}, {"+0x", 3}, {"c8940", 5}, {")", 1}, {"[0x", 3}, {"7f6ac9610940", 12},
{"]\n", 2}], 8) = 92
[pid 15408] writev(2,
[{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65},
{"(", 1}, {"+0x", 3}, {"9b158", 5}, {")", 1}, {"[0x", 3}, {"7f6ac95e3158", 12},
{"]\n", 2}], 8) = 92
[pid 15408] writev(2,
[{"/usr/lib/x86_64-linux-gnu/glusterfs/3.8.1/xlator/mgmt/glusterd.so", 65},
{"(", 1}, {"+0x", 3}, {"1db00", 5}, {")", 1}, {"[0x", 3}, {"7f6ac9565b00", 12},
{"]\n", 2}], 8) = 92
[pid 15408] writev(2, [{"/usr/lib/x86_64-linux-gnu/libglusterfs.so.0", 43},
{"(", 1}, {"synctask_wrap", 13}, {"+0x", 3}, {"12", 2}, {")", 1}, {"[0x", 3},
{"7f6ace40c9a2", 12}, {"]\n", 2}], 9) = 80
[pid 15408] writev(2, [{"/lib/x86_64-linux-gnu/libc.so.6", 31}, {"(", 1},
{"+0x", 3}, {"49800", 5}, {")", 1}, {"[0x", 3}, {"7f6acd7e4800", 12}, {"]\n",
2}], 8) = 58
[pid 15408] write(2, "======= Memory map: ========\n", 29) = 29

I've omitted the memory map, lots of output there.

--- Additional comment from Mrten on 2016-08-03 11:01:51 EDT ---

A stack trace from gdb for good measure:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7f29402bb700 (LWP 29241)]
0x00007f29431d9c37 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56    ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007f29431d9c37 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f29431dd028 in __GI_abort () at abort.c:89
#2  0x00007f29432162a4 in __libc_message (do_abort=do_abort@entry=2,
fmt=fmt@entry=0x7f2943322113 "*** %s ***: %s terminated\n") at
../sysdeps/posix/libc_fatal.c:175
#3  0x00007f29432adbbc in __GI___fortify_fail (msg=<optimized out>,
msg@entry=0x7f29433220aa "buffer overflow detected") at fortify_fail.c:38
#4  0x00007f29432aca90 in __GI___chk_fail () at chk_fail.c:28
#5  0x00007f29432abef0 in __strncpy_chk (s1=s1@entry=0x7f292c3febd0 "",
s2=<optimized out>, n=n@entry=14, s1len=s1len@entry=9) at strncpy_chk.c:30
#6  0x00007f293efe7d1b in strncpy (__len=14, __src=<optimized out>,
__dest=0x7f292c3febd0 "") at /usr/include/x86_64-linux-gnu/bits/string3.h:120
#7  get_slavehost_from_voluuid (dict=dict@entry=0x7f29415403c8, key=<optimized
out>, value=<optimized out>, data=data@entry=0x7f292c3fead0) at
glusterd-geo-rep.c:2917
#8  0x00007f2943dd9847 in dict_foreach_match (dict=0x7f29415403c8,
match=0x7f2943dd6d60 <dict_match_everything>, match_data=0x0,
action=0x7f293efe7bf0 <get_slavehost_from_voluuid>, action_data=0x7f292c3fead0)
at dict.c:1236
#9  0x00007f2943dd99b8 in dict_foreach (dict=<optimized out>,
fn=fn@entry=0x7f293efe7bf0 <get_slavehost_from_voluuid>,
data=data@entry=0x7f292c3fead0) at dict.c:1194
#10 0x00007f293eff70f2 in glusterd_get_slavehost_from_voluuid
(slave_host=<optimized out>, slave_vol=<optimized out>, slave1=0x7f292c3fead0,
volinfo=0x7f29450e26a0) at glusterd-geo-rep.c:2963
#11 glusterd_op_stage_gsync_create (dict=dict@entry=0x7f2941541494,
op_errstr=op_errstr@entry=0x7f292c406c00) at glusterd-geo-rep.c:3256
#12 0x00007f293ef8283b in glusterd_op_stage_validate
(op=op@entry=GD_OP_GSYNC_CREATE, dict=dict@entry=0x7f2941541494,
op_errstr=op_errstr@entry=0x7f292c406c00,
rsp_dict=rsp_dict@entry=0x7f29415415ec) at glusterd-op-sm.c:5646
#13 0x00007f293f016b3a in gd_stage_op_phase (op=<optimized out>,
op_ctx=op_ctx@entry=0x7f29415413e8, req_dict=0x7f2941541494,
op_errstr=op_errstr@entry=0x7f292c406c00,
txn_opinfo=txn_opinfo@entry=0x7f292c406c20) at glusterd-syncop.c:1272
#14 0x00007f293f01860e in gd_sync_task_begin
(op_ctx=op_ctx@entry=0x7f29415413e8, req=req@entry=0x7f29450d48cc) at
glusterd-syncop.c:1900
#15 0x00007f293f018940 in glusterd_op_begin_synctask
(req=req@entry=0x7f29450d48cc, op=op@entry=GD_OP_GSYNC_CREATE,
dict=0x7f29415413e8) at glusterd-syncop.c:1973
#16 0x00007f293efeb158 in __glusterd_handle_gsync_set
(req=req@entry=0x7f29450d48cc) at glusterd-geo-rep.c:347
#17 0x00007f293ef6db00 in glusterd_big_locked_handler (req=0x7f29450d48cc,
actor_fn=0x7f293efeab30 <__glusterd_handle_gsync_set>) at glusterd-handler.c:80
#18 0x00007f2943e149a2 in synctask_wrap (old_task=<optimized out>) at
syncop.c:375
#19 0x00007f29431ec800 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#20 0x0000000000000000 in ?? ()

I got this by installing the glusterfs-dbg package.

--- Additional comment from Mrten on 2016-08-03 11:30:38 EDT ---

I think I've found it: the slave_vol_config struct is defined as

struct slave_vol_config {
       char      old_slvhost[_POSIX_HOST_NAME_MAX+1];
       char      old_slvuser[_POSIX_LOGIN_NAME_MAX];
       unsigned  old_slvidx;
       char      slave_voluuid[GF_UUID_BUF_SIZE];
};

and _POSIX_LOGIN_NAME_MAX is ... 9.

My login name is 14 characters long, so: crash.

I'd suggest using LOGIN_NAME_MAX (256 on Linux) instead of
_POSIX_LOGIN_NAME_MAX.

Don't switch _POSIX_HOST_NAME_MAX to HOST_NAME_MAX, though: that's 255 vs 64 on
Linux, so the host name buffer would shrink.
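The size arithmetic behind the crash can be sketched in C. This mirrors the struct quoted above and models two conditions: the _FORTIFY_SOURCE check that fired in frames #5 and #6 of the gdb trace (a strncpy length of 14 against a 9-byte destination), and the stricter requirement for a NUL-terminated copy. The helper names and the SLVUSER_SIZE macro are hypothetical, not glusterd code, and GF_UUID_BUF_SIZE is given an assumed value of 37 only so the sketch compiles; glusterd's real constant may differ.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose _POSIX_*_MAX in <limits.h> */
#endif
#include <assert.h>
#include <limits.h>   /* _POSIX_LOGIN_NAME_MAX (9), _POSIX_HOST_NAME_MAX (255) */
#include <stddef.h>
#include <string.h>

/* Assumed value for illustration only (hypothetical). */
#ifndef GF_UUID_BUF_SIZE
#define GF_UUID_BUF_SIZE 37
#endif

/* Field layout as quoted in the comment above. */
struct slave_vol_config {
    char     old_slvhost[_POSIX_HOST_NAME_MAX + 1];
    char     old_slvuser[_POSIX_LOGIN_NAME_MAX];
    unsigned old_slvidx;
    char     slave_voluuid[GF_UUID_BUF_SIZE];
};

/* Size of the user-name field: 9 bytes, because _POSIX_LOGIN_NAME_MAX is 9. */
#define SLVUSER_SIZE sizeof(((struct slave_vol_config *)0)->old_slvuser)

/* Hypothetical: the fortified strncpy aborts when the copy length exceeds
   the known destination size (n=14 > s1len=9 in the trace). This models
   that condition for n = strlen(user). */
static int fortify_would_abort(const char *user)
{
    assert(user != NULL);
    return strlen(user) > SLVUSER_SIZE;
}

/* Hypothetical: a NUL-terminated copy also needs a byte for the
   terminator, so the name must be strictly shorter than the field. */
static int fits_as_c_string(const char *user)
{
    assert(user != NULL);
    return strlen(user) < SLVUSER_SIZE;
}
```

With these constants, `georeplication` (14 characters) trips the fortify condition, while a 9-character name would pass it yet still leave `old_slvuser` without a terminating NUL; only names of up to 8 characters fit as C strings.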

--- Additional comment from Saravanakumar on 2016-08-18 10:11:31 EDT ---

(In reply to Mrten from comment #4)
> I think I got it: slave_vol_config struct is a 
> 
> struct slave_vol_config {
>        char      old_slvhost[_POSIX_HOST_NAME_MAX+1];
>        char      old_slvuser[_POSIX_LOGIN_NAME_MAX];
>        unsigned  old_slvidx;
>        char      slave_voluuid[GF_UUID_BUF_SIZE];
> };
> 
> and _POSIX_LOGIN_NAME_MAX is ... 9.
> 
> my login name is 14 characters long, so, crash.
> 
> I'd suggest using LOGIN_NAME_MAX instead of _POSIX_LOGIN_NAME_MAX, which is
> 256 long.
> 
> Don't switch the _POSIX_HOST_NAME_MAX to HOST_NAME_MAX though, that's 255 vs
> 64.

Thanks for the detailed bug report and RCA.

Unfortunately, using LOGIN_NAME_MAX would not honour POSIX. (It would also be
inconsistent to size the host name with _POSIX_HOST_NAME_MAX but the login name
with LOGIN_NAME_MAX.)

I have posted a patch that checks whether the user name length is within
_POSIX_LOGIN_NAME_MAX, so glusterd should no longer crash.

This is under review - http://review.gluster.org/#/c/15199
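A length check of the kind described above can be sketched as follows. This is a minimal illustration, not the patch under review: `copy_slave_user` is a hypothetical helper, and it assumes the user name is the part of a `user@host::volume` slave string before the first `@`. Instead of letting a fortified strncpy abort the daemon, it reports an error the caller can turn into a failed `create` command.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose _POSIX_LOGIN_NAME_MAX */
#endif
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the user part of a "user@host::volume" slave
   string into dst (capacity dstlen). Returns 0 on success and -1 if the
   name would not fit with its terminator. A slave string without '@'
   yields an empty user name. */
static int copy_slave_user(char *dst, size_t dstlen, const char *slave)
{
    assert(dst != NULL && dstlen > 0 && slave != NULL);

    const char *at = strchr(slave, '@');
    if (at == NULL) {
        dst[0] = '\0';
        return 0;
    }

    size_t user_len = (size_t)(at - slave);

    /* Reject rather than overflow: keep one byte for the NUL. */
    if (user_len >= dstlen)
        return -1;

    memcpy(dst, slave, user_len);
    dst[user_len] = '\0';
    return 0;
}
```

Called with a `char buf[_POSIX_LOGIN_NAME_MAX]`, this rejects `georeplication@gluster-4.glstr::glbackup` and accepts names of up to 8 characters. The exact bound used by the patch may differ from this sketch.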

--- Additional comment from Niels de Vos on 2016-09-12 01:39:56 EDT ---

All 3.8.x bugs are now reported against version 3.8 (without .x). For more
information, see
http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1363613
[Bug 1363613] Crash of glusterd when using long username with
geo-replication
https://bugzilla.redhat.com/show_bug.cgi?id=1368138
[Bug 1368138] Crash of glusterd when using long username with
geo-replication
https://bugzilla.redhat.com/show_bug.cgi?id=1403108
[Bug 1403108] Crash of glusterd when using long username with
geo-replication