[Bugs] [Bug 1807738] New: [GlusterFS-3.12.15] The brick went down after the FOP.

bugzilla at redhat.com bugzilla at redhat.com
Thu Feb 27 05:45:15 UTC 2020


https://bugzilla.redhat.com/show_bug.cgi?id=1807738

            Bug ID: 1807738
           Summary: [GlusterFS-3.12.15] The brick went down after the FOP.
           Product: GlusterFS
           Version: mainline
          Hardware: x86_64
                OS: Linux
            Status: NEW
         Component: core
          Assignee: bugs at gluster.org
          Reporter: knjeong at growthsoft.co.kr
                CC: bugs at gluster.org
  Target Milestone: ---
    Classification: Community



Description of problem:


Suddenly, a brick went down during the operation and collected an error, but
unfortunately no core dump was generated.


< GlusterFS(3.12.15) Volume Info >
Volume Name: nas
Type: Distributed-Replicate
Volume ID: 5a8cc386-85dc-4255-b287-352a499a28d5
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: NAS01:/gluster1/nas
Brick2: NAS02:/gluster1/nas
Brick3: NAS03:/gluster1/nas
Brick4: NAS01:/gluster2/nas ===> down brick(10.10.10.121:49154)
Brick5: NAS02:/gluster2/nas
Brick6: NAS03:/gluster2/nas
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.self-heal-window-size: 16
cluster.background-self-heal-count: 64
cluster.shd-wait-qlength: 32768
performance.cache-size: 4GB
performance.write-behind-window-size: 50MB
disperse.background-heals: 16
disperse.heal-wait-qlength: 2048
disperse.self-heal-window-size: 16
disperse.shd-wait-qlength: 16384
features.cache-invalidation: off
features.cache-invalidation-timeout: 60
performance.stat-prefetch: on
performance.cache-invalidation: off
performance.md-cache-timeout: 1
network.inode-lru-limit: 200000
performance.nl-cache-limit: 10MB
performance.nl-cache-positive-entry: FALSE
performance.nl-cache: off
cluster.use-compound-fops: on
performance.parallel-readdir: on
cluster.lookup-optimize: on
client.event-threads: 32
server.event-threads: 16
performance.low-prio-threads: 64
performance.normal-prio-threads: 64
performance.high-prio-threads: 64
performance.least-prio-threads: 64
performance.io-thread-count: 64
cluster.eager-lock: on


< Down Brick Log - 10.10.10.121:49154 >
[2020-02-25 18:39:47.375248] I [MSGID: 115008]
[server-resolve.c:541:server_resolve_fd] 0-: fd not found in context [Bad file
descriptor]
[2020-02-25 18:39:47.375658] E [MSGID: 115073]
[server-rpc-fops.c:1833:server_fxattrop_cbk] 0-nas-server: 17031439: FXATTROP 2
(a79e63ee-5610-41bb-bd59-72bf7164354d), client:
CLIENT01-35025-2020/01/08-11:41:03:97960-nas-client-3-5-3, error-xlator: - [Bad
file descriptor]
[2020-02-25 18:39:47.375701] E [rpcsvc.c:1364:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x103e10f, Program: GlusterFS
3.3, ProgVers: 330, Proc: 34) to rpc-transport (tcp.nas-server)
[2020-02-25 18:39:47.375750] E [server.c:195:server_submit_reply]
(-->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x1bfa6)
[0x7f2b6f552fa6]
-->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x1bba3)
[0x7f2b6f552ba3]
-->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x94a6)
[0x7f2b6f5404a6] ) 0-: Reply submission failed
[2020-02-25 18:39:47.376339] I [MSGID: 115008]
[server-resolve.c:541:server_resolve_fd] 0-: fd not found in context [Bad file
descriptor]
[2020-02-25 18:39:47.376365] E [MSGID: 115090]
[server-rpc-fops.c:2171:server_compound_cbk] 0-nas-server: 17031440: COMPOUND2
(a79e63ee-5610-41bb-bd59-72bf7164354d), client:
CLIENT01-35025-2020/01/08-11:41:03:97960-nas-client-3-5-3, error-xlator: - [Bad
file descriptor]
pending frames:
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(33)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(33)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(11)
frame : type(0) op(33)
frame : type(0) op(52)
frame : type(0) op(33)
frame : type(0) op(52)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2020-02-25 18:39:47
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.15
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f2b842db4e0]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f2b842e5414]
/lib64/libc.so.6(+0x36280)[0x7f2b8293b280]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x242eb)[0x7f2b6f55b2eb]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x2674d)[0x7f2b6f55d74d]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc8e9)[0x7f2b6f5438e9]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc98d)[0x7f2b6f54398d]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd3a5)[0x7f2b6f5443a5]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc9ce)[0x7f2b6f5439ce]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd23e)[0x7f2b6f54423e]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd356)[0x7f2b6f544356]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc9ae)[0x7f2b6f5439ae]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd464)[0x7f2b6f544464]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x27195)[0x7f2b6f55e195]
/lib64/libgfrpc.so.0(rpcsvc_request_handler+0x96)[0x7f2b8409d246]
/lib64/libpthread.so.0(+0x7dd5)[0x7f2b8313add5]
/lib64/libc.so.6(clone+0x6d)[0x7f2b82a02ead]
---------


< CLIENT01 gluster LOG >
[2020-02-25 14:18:28.617641] E [rpc-clnt.c:185:call_bail] 5-nas-client-3:
bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0xff906d sent =
2020-02-25 13:48:27.301196. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 14:18:28.627224] E [MSGID: 114031]
[client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation
failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 14:48:28.961143] E [rpc-clnt.c:185:call_bail] 5-nas-client-3:
bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x10001f7 sent =
2020-02-25 14:18:28.627356. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 14:48:28.961402] E [MSGID: 114031]
[client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation
failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 14:48:28.970718] W [fuse-bridge.c:779:fuse_truncate_cbk]
0-glusterfs-fuse: 235752053: TRUNCATE() /Image/655/146136.jpg => -1 (Transport
endpoint is not connected)
[2020-02-25 15:19:59.238515] E [rpc-clnt.c:185:call_bail] 5-nas-client-3:
bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1007af2 sent =
2020-02-25 14:49:55.891752. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 15:19:59.238736] E [MSGID: 114031]
[client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation
failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 15:49:59.570378] E [rpc-clnt.c:185:call_bail] 5-nas-client-3:
bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x100ef9f sent =
2020-02-25 15:19:59.238815. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 15:49:59.570438] E [MSGID: 114031]
[client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation
failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 15:49:59.615705] W [fuse-bridge.c:779:fuse_truncate_cbk]
0-glusterfs-fuse: 235897960: TRUNCATE() /Image/655/146136.jpg => -1 (Transport
endpoint is not connected)
[2020-02-25 16:29:29.884503] E [rpc-clnt.c:185:call_bail] 5-nas-client-3:
bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1017eda sent =
2020-02-25 15:59:22.576432. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 16:29:29.884547] E [MSGID: 114031]
[client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation
failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 16:59:30.151714] E [rpc-clnt.c:185:call_bail] 5-nas-client-3:
bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x101f311 sent =
2020-02-25 16:29:29.884632. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 16:59:30.151759] E [MSGID: 114031]
[client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation
failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 16:59:30.163008] W [fuse-bridge.c:779:fuse_truncate_cbk]
0-glusterfs-fuse: 236055074: TRUNCATE() /Image/655/146136.jpg => -1 (Transport
endpoint is not connected)
[2020-02-25 17:39:20.481714] E [rpc-clnt.c:185:call_bail] 5-nas-client-3:
bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1028daa sent =
2020-02-25 17:09:17.597659. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 17:39:20.481784] E [MSGID: 114031]
[client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation
failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 18:09:20.783690] E [rpc-clnt.c:185:call_bail] 5-nas-client-3:
bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x102fefb sent =
2020-02-25 17:39:20.481881. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 18:09:20.783739] E [MSGID: 114031]
[client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation
failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 18:09:20.864368] W [fuse-bridge.c:779:fuse_truncate_cbk]
0-glusterfs-fuse: 236216079: TRUNCATE() /Image/655/146136.jpg => -1 (Transport
endpoint is not connected)
[2020-02-25 18:39:47.372960] C [rpc-clnt.c:449:rpc_clnt_fill_request_info]
5-nas-client-3: cannot lookup the saved frame corresponding to xid (16748653)
[2020-02-25 18:39:47.373034] W [socket.c:1956:__socket_read_reply]
5-nas-client-3: notify for event MAP_XID failed for 10.10.10.121:49154
[2020-02-25 18:39:47.373060] I [MSGID: 114018]
[client.c:2285:client_rpc_notify] 5-nas-client-3: disconnected from
nas-client-3. Client process will keep trying to connect to glusterd until
brick's port is available
[2020-02-25 18:39:47.373332] E [rpc-clnt.c:350:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] )))))
5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(XATTROP(33))
called at 2020-02-25 18:19:23.681712 (xid=0x10396ec)
[2020-02-25 18:39:47.373366] E [MSGID: 114031]
[client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation
failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 18:39:47.373670] E [rpc-clnt.c:350:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] )))))
5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(FXATTROP(34))
called at 2020-02-25 18:39:47.372139 (xid=0x103e10f)
[2020-02-25 18:39:47.373704] W [MSGID: 114031]
[client-rpc-fops.c:1782:client3_3_fxattrop_cbk] 5-nas-client-3: remote
operation failed
[2020-02-25 18:39:47.373723] E [MSGID: 114031]
[client-rpc-fops.c:1557:client3_3_finodelk_cbk] 5-nas-client-3: remote
operation failed [Transport endpoint is not connected]
[2020-02-25 18:39:47.373927] E [MSGID: 114031]
[client-rpc-fops.c:1508:client3_3_inodelk_cbk] 5-nas-client-3: remote operation
failed [Transport endpoint is not connected]
[2020-02-25 18:39:47.373942] E [rpc-clnt.c:350:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (-->
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] )))))
5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(COMPOUND(49))
called at 2020-02-25 18:39:47.372211 (xid=0x103e110)
[2020-02-25 18:39:47.373981] W [MSGID: 114031]
[client-rpc-fops.c:3138:client3_3_compound_cbk] 5-nas-client-3: remote
operation failed [Transport endpoint is not connected]
[2020-02-25 18:39:47.374029] W [MSGID: 114061]
[client-common.c:438:client_pre_flush] 5-nas-client-3: 
(a79e63ee-5610-41bb-bd59-72bf7164354d) remote_fd is -1. EBADFD [File descriptor
in bad state]
[2020-02-25 18:39:47.374330] W [fuse-bridge.c:779:fuse_truncate_cbk]
0-glusterfs-fuse: 236374556: TRUNCATE() /Image/655/146136.jpg => -1 (Transport
endpoint is not connected)
[2020-02-25 18:39:47.380497] W [fuse-bridge.c:779:fuse_truncate_cbk]
0-glusterfs-fuse: 236419406: TRUNCATE() /Image/655/148418_org.jpg => -1
(Input/output error)
[2020-02-25 18:39:47.464163] W [fuse-bridge.c:779:fuse_truncate_cbk]
0-glusterfs-fuse: 236419417: TRUNCATE() /Image/655/152411_org.jpg => -1
(Input/output error)
[2020-02-25 18:39:47.466615] W [fuse-bridge.c:779:fuse_truncate_cbk]
0-glusterfs-fuse: 236419422: TRUNCATE() /Image/655/152411.jpg => -1
(Input/output error)
[2020-02-25 18:39:47.812181] W [fuse-bridge.c:779:fuse_truncate_cbk]
0-glusterfs-fuse: 236419427: TRUNCATE() /Image/655/151868_org.jpg => -1
(Input/output error)
[2020-02-25 18:39:52.409176] W [fuse-bridge.c:779:fuse_truncate_cbk]
0-glusterfs-fuse: 236419532: TRUNCATE() /Image/655/147289_org.jpg => -1
(Input/output error)
[2020-02-25 18:39:58.037141] I [rpc-clnt.c:1986:rpc_clnt_reconfig]
5-nas-client-3: changing port to 49154 (from 0)
[2020-02-25 18:39:58.040347] E [socket.c:2376:socket_connect_finish]
5-nas-client-3: connection to 10.10.10.121:49154 failed (Connection refused);
disconnecting socket



In addition, we recently asked about the issue of increased CPU load for this
gluster server.
(https://bugzilla.redhat.com/show_bug.cgi?id=1806244)

Do you know why the brick went down?

Please tell me if you need more information.



Version-Release number of selected component (if applicable): 3.12.15


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.


More information about the Bugs mailing list