[Bugs] [Bug 1318132] New: Clients return ENOTCONN or EINVAL after restarting brick servers in quick succession
bugzilla at redhat.com
Wed Mar 16 07:23:25 UTC 2016
https://bugzilla.redhat.com/show_bug.cgi?id=1318132
Bug ID: 1318132
Summary: Clients return ENOTCONN or EINVAL after restarting
brick servers in quick succession
Product: Red Hat Gluster Storage
Version: 3.1
Component: glusterfs
Sub Component: core
Severity: high
Assignee: rhs-bugs at redhat.com
Reporter: congxueyang at gmail.com
QA Contact: annair at redhat.com
CC: bugs at gluster.org, congxueyang at gmail.com,
gluster-bugs at redhat.com, jdarcy at redhat.com,
jwm at horde.net, rwheeler at redhat.com
+++ This bug was initially created as a clone of Bug #902953 +++
(This comment was longer than 65,535 characters and has been moved to an
attachment by Red Hat Bugzilla).
--- Additional comment from Amar Tumballi on 2013-02-14 04:37:39 EST ---
Thanks for the report. One question, though: if a node (or a lot of nodes) is
going down and coming back up, isn't it natural for operations to fail, since
the filesystem is network-based?
--- Additional comment from John Morrissey on 2013-02-15 11:04:24 EST ---
Sure, I would expect the operations to fail *while* the Gluster servers are
being restarted, but after the servers are running, I would also expect Gluster
clients to gracefully reconnect.
As the logs above show, they clearly do not do so after several minutes, or (in
our experience) even after several hours.
--- Additional comment from John Morrissey on 2013-04-01 12:28:12 EDT ---
Looks like this isn't limited to native Gluster clients.
Some of our nodes mount a Gluster instance via NFS. We noticed that these
clients can successfully mount the volume, but any I/O to them returns EIO:
[jwm at elided:pts/13 ~> ls -l /path/to/gluster
ls: /path/to/gluster: Input/output error
The gluster<->nfs process on the gluster server:
root 27902 12.1 0.7 406064 179052 ? Ssl Jan22 11601:30
/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
/var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
/tmp/bf018af881a58acb0efa7cefadd6fb1d.socket
is spinning on a file descriptor that probably used to be connected to a
gluster brick, but is now open to /etc/services:
-bash-4.1$ sudo strace -p 27902
Process 27902 attached - interrupt to quit
epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=19, u64=107374182419}}}, 258,
4294967295) = 1
getsockopt(19, SOL_SOCKET, SO_ERROR, [182050606976860271], [4]) = 0
shutdown(19, 2 /* send and receive */) = -1 ENOTCONN (Transport endpoint is
not connected)
readv(19, [{"\0\0\0\0", 4}], 1) = 0
epoll_ctl(3, EPOLL_CTL_DEL, 19, NULL) = 0
close(19) = 0
epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=19, u64=107374182419}}}, 258,
4294967295) = 1
getsockopt(19, SOL_SOCKET, SO_ERROR, [190986337975795823], [4]) = 0
shutdown(19, 2 /* send and receive */) = -1 ENOTCONN (Transport endpoint is
not connected)
readv(19, [{"\0\0\0\0", 4}], 1) = 0
epoll_ctl(3, EPOLL_CTL_DEL, 19, NULL) = 0
close(19) = 0
epoll_wait(3, {{EPOLLIN|EPOLLERR|EPOLLHUP, {u32=19, u64=107374182419}}}, 258,
4294967295) = 1
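For context on the ENOTCONN in the trace above: on Linux, calling shutdown()
on a TCP socket that is not connected fails with that errno. The following is
a minimal Python sketch of that behaviour only. It is not GlusterFS code, and
the helper name shutdown_errno_unconnected is hypothetical.

```python
import errno
import socket


def shutdown_errno_unconnected():
    """Return the errno raised by shutdown() on a never-connected TCP socket.

    Hypothetical helper for illustration; returns 0 if shutdown() succeeds.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Equivalent to shutdown(fd, 2 /* send and receive */) in the trace.
        sock.shutdown(socket.SHUT_RDWR)
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()
    return 0


if __name__ == "__main__":
    code = shutdown_errno_unconnected()
    print(code, errno.errorcode.get(code, "OK"))
```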
-bash-4.1$ sudo lsof -p 27902
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
[...]
glusterfs 27902 root 19u REG 253,0 640999 3801126
/etc/services
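The lsof check above can also be approximated by reading the /proc filesystem
directly, which may help when lsof is not installed. This is a Linux-only
sketch, not part of the original report; the helper name fd_target is
hypothetical, and reading another user's process generally needs root.

```python
import os


def fd_target(pid, fd):
    """Return what file descriptor `fd` of process `pid` currently refers to.

    Linux-only: reads the symlink /proc/<pid>/fd/<fd>. Regular files show up
    as their path; sockets show up as a string of the form 'socket:[inode]'.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")


if __name__ == "__main__":
    # Example on the current process: descriptor 0 is standard input.
    print(fd_target(os.getpid(), 0))
```

For the process in this report, the equivalent call would be
fd_target(27902, 19), run as root on the Gluster server.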
--- Additional comment from Kaleb KEITHLEY on 2015-10-22 11:46:38 EDT ---
Because of the large number of bugs filed against it, the mainline version is
ambiguous and about to be removed as a choice.
If you believe this is still a bug, please change the status back to NEW and
choose the appropriate, applicable version for it.