[Gluster-users] setting gfid on .trashcan/... failed - total outage
Dietmar Putz
dietmar.putz at 3qsdn.com
Wed Jun 28 12:42:08 UTC 2017
Hello,
recently we twice had a partial gluster outage followed by a total
outage of all four nodes. Looking through the gluster mailing list I found
a very similar case in
http://lists.gluster.org/pipermail/gluster-users/2016-June/027124.html
but I'm not sure whether this issue has been fixed...
Although this outage happened on glusterfs 3.7.18, which has received no
more updates since ~3.7.20, I would kindly ask whether this issue is known
to be fixed in 3.8 or 3.10?
Unfortunately I did not find corresponding information in the release
notes...
best regards
Dietmar
The partial outage started as shown below; the very first entries
occurred in the brick logs:
gl-master-04, brick1-mvol1.log :
[2017-06-23 16:35:11.373471] E [MSGID: 113020]
[posix.c:2839:posix_create] 0-mvol1-posix: setting gfid on
/brick1/mvol1/.trashcan//2290/uploads/170221_Sendung_Lieberum_01_AT.mp4_2017-06-23_163511
failed
[2017-06-23 16:35:11.392540] E [posix.c:3188:_fill_writev_xdata]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.18/xlator/features/trash.so(trash_truncate_readv_cbk+0x1ab)
[0x7f4f8c2aaa0b]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.18/xlator/storage/posix.so(posix_writev+0x1ff)
[0x7f4f8caec62f]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.18/xlator/storage/posix.so(_fill_writev_xdata+0x1c6)
[0x7f4f8caec406] ) 0-mvol1-posix: fd: 0x7f4ef434225c inode:
0x7f4ef430bd6cgfid:00000000-0000-0000-0000-000000000000 [Invalid argument]
...
gl-master-04 : etc-glusterfs-glusterd.vol.log
[2017-06-23 16:35:18.872346] W [rpcsvc.c:270:rpcsvc_program_actor]
0-rpc-service: RPC program not available (req 1298437 330) for
10.0.1.203:65533
[2017-06-23 16:35:18.872421] E
[rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
to complete successfully
gl-master-04 : glustershd.log
[2017-06-23 16:35:42.536840] E [MSGID: 108006]
[afr-common.c:4323:afr_notify] 0-mvol1-replicate-1: All subvolumes are
down. Going offline until atleast one of them comes back up.
[2017-06-23 16:35:51.702413] E [socket.c:2292:socket_connect_finish]
0-mvol1-client-3: connection to 10.0.1.156:49152 failed (Connection refused)
gl-master-03, brick1-mvol1.log :
[2017-06-23 16:35:11.399769] E [MSGID: 113020]
[posix.c:2839:posix_create] 0-mvol1-posix: setting gfid on
/brick1/mvol1/.trashcan//2290/uploads/170221_Sendung_Lieberum_01_AT.mp4_2017-06-23_163511
failed
[2017-06-23 16:35:11.418559] E [posix.c:3188:_fill_writev_xdata]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.18/xlator/features/trash.so(trash_truncate_readv_cbk+0x1ab)
[0x7ff517087a0b]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.18/xlator/storage/posix.so(posix_writev+0x1ff)
[0x7ff5178c962f]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.18/xlator/storage/posix.so(_fill_writev_xdata+0x1c6)
[0x7ff5178c9406] ) 0-mvol1-posix: fd: 0x7ff4c814a43c inode:
0x7ff4c82e1b5cgfid:00000000-0000-0000-0000-000000000000 [Invalid argument]
...
gl-master-03 : etc-glusterfs-glusterd.vol.log
[2017-06-23 16:35:19.879140] W [rpcsvc.c:270:rpcsvc_program_actor]
0-rpc-service: RPC program not available (req 1298437 330) for
10.0.1.203:65530
[2017-06-23 16:35:19.879201] E
[rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
to complete successfully
[2017-06-23 16:35:19.879300] W [rpcsvc.c:270:rpcsvc_program_actor]
0-rpc-service: RPC program not available (req 1298437 330) for
10.0.1.203:65530
[2017-06-23 16:35:19.879314] E
[rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
to complete successfully
[2017-06-23 16:35:19.879845] W [rpcsvc.c:270:rpcsvc_program_actor]
0-rpc-service: RPC program not available (req 1298437 330) for
10.0.1.203:65530
[2017-06-23 16:35:19.879859] E
[rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
to complete successfully
[2017-06-23 16:35:42.538727] W [socket.c:596:__socket_rwv] 0-management:
readv on /var/run/gluster/5e23d9709b37ac7877720ac3986c48bc.socket failed
(No data available)
[2017-06-23 16:35:42.543486] I [MSGID: 106005]
[glusterd-handler.c:5037:__glusterd_brick_rpc_notify] 0-management:
Brick gl-master-03-int:/brick1/mvol1 has disconnected from glusterd.
gl-master-03 : glustershd.log
[2017-06-23 16:35:42.537752] E [MSGID: 108006]
[afr-common.c:4323:afr_notify] 0-mvol1-replicate-1: All subvolumes are
down. Going offline until atleast one of them comes back up.
[2017-06-23 16:35:52.011016] E [socket.c:2292:socket_connect_finish]
0-mvol1-client-3: connection to 10.0.1.156:49152 failed (Connection refused)
[2017-06-23 16:35:53.010620] E [socket.c:2292:socket_connect_finish]
0-mvol1-client-2: connection to 10.0.1.154:49152 failed (Connection refused)
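To line these entries up across nodes, the wrapped lines first have to be joined back into single records. The following is only a minimal Python sketch, assuming the record layout visible in the excerpts above (timestamp, level, optional MSGID, then file:line:function). The names `unwrap` and `parse` are hypothetical helpers, not part of any gluster tooling, and only actual log lines should be fed in (no host headings or "..." markers).

```python
import re

# Layout assumed from the excerpts above:
# [timestamp] LEVEL [MSGID: n] [file:line:function] message
RECORD_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\] ")
HEADER = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<message>.*)$"
)


def unwrap(lines):
    """Yield one string per log record, joining wrapped continuation lines."""
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if RECORD_START.match(line):
            # A new timestamp starts a new record; emit the previous one.
            if current is not None:
                yield current
            current = line
        elif current is not None:
            # Joined with a space; tokens that were broken in the middle
            # by the mail wrapping (long paths, gfids) keep a stray space.
            current += " " + line
    if current is not None:
        yield current


def parse(record):
    """Return the header fields of one record as a dict, or None."""
    match = HEADER.match(record)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = [
        "[2017-06-23 16:35:42.536840] E [MSGID: 108006]",
        "[afr-common.c:4323:afr_notify] 0-mvol1-replicate-1: All subvolumes are",
        "down. Going offline until atleast one of them comes back up.",
    ]
    for record in unwrap(sample):
        print(parse(record))
```

Sorting the parsed records from all four nodes by the `ts` field would then give one combined timeline.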
About 73 minutes later the remaining replicated pair was also affected by
the outage:
gl-master-02, brick1-mvol1.log :
[2017-06-23 17:48:30.093526] E [MSGID: 113018]
[posix.c:2766:posix_create] 0-mvol1-posix: pre-operation lstat on parent
/brick1/mvol1/.trashcan//2290/uploads failed [No such file or directory]
[2017-06-23 17:48:30.093591] E [MSGID: 113018]
[posix.c:1447:posix_mkdir] 0-mvol1-posix: pre-operation lstat on parent
/brick1/mvol1/.trashcan//2290 failed [No such file or directory]
[2017-06-23 17:48:30.093636] E [MSGID: 113027]
[posix.c:1538:posix_mkdir] 0-mvol1-posix: mkdir of /brick1/mvol1/ failed
[File exists]
[2017-06-23 17:48:30.093670] E [MSGID: 113027]
[posix.c:1538:posix_mkdir] 0-mvol1-posix: mkdir of
/brick1/mvol1/.trashcan failed [File exists]
[2017-06-23 17:48:30.093701] E [MSGID: 113027]
[posix.c:1538:posix_mkdir] 0-mvol1-posix: mkdir of
/brick1/mvol1/.trashcan/ failed [File exists]
[2017-06-23 17:48:30.113559] E [MSGID: 113001]
[posix.c:1562:posix_mkdir] 0-mvol1-posix: setting xattrs on
/brick1/mvol1/.trashcan//2290 failed [No such file or directory]
[2017-06-23 17:48:30.113630] E [MSGID: 113027]
[posix.c:1538:posix_mkdir] 0-mvol1-posix: mkdir of
/brick1/mvol1/.trashcan//2290 failed [File exists]
[2017-06-23 17:48:30.163155] E [MSGID: 113001]
[posix.c:1562:posix_mkdir] 0-mvol1-posix: setting xattrs on
/brick1/mvol1/.trashcan//2290/uploads failed [No such file or directory]
[2017-06-23 17:48:30.163282] E [MSGID: 113001]
[posix.c:2832:posix_create] 0-mvol1-posix: setting xattrs on
/brick1/mvol1/.trashcan//2290/uploads/170623_TVM_News.mp4_2017-06-23_174830
failed [No such file or directory]
[2017-06-23 17:48:30.165617] E [posix.c:3188:_fill_writev_xdata]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.18/xlator/features/trash.so(trash_truncate_readv_cbk+0x1ab)
[0x7f4ec77d9a0b]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.18/xlator/storage/posix.so(posix_writev+0x1ff)
[0x7f4ecc1c162f]
-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.18/xlator/storage/posix.so(_fill_writev_xdata+0x1c6)
[0x7f4ecc1c1406] ) 0-mvol1-posix: fd: 0x7f4e70429b6c inode:
0x7f4e7041f9acgfid:00000000-0000-0000-0000-000000000000 [Invalid argument]
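The gap between the two stages of the outage can be checked from the first brick-log timestamp on each replicated pair (16:35:11.373471 on gl-master-04 and 17:48:30.093526 on gl-master-02). A small Python sketch; `gap_between` is a hypothetical helper name, and the timestamp format is assumed from the log excerpts:

```python
from datetime import datetime, timedelta

# Timestamp format as it appears in the brick logs.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def gap_between(earlier, later):
    """Return the time difference between two log timestamps."""
    return datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)


if __name__ == "__main__":
    gap = gap_between("2017-06-23 16:35:11.373471",
                      "2017-06-23 17:48:30.093526")
    print(gap)                            # 1:13:18.720055
    print(gap // timedelta(minutes=1))    # 73
```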
The file mentioned in the brick log was still available in its original
directory, but not in the corresponding trashcan directory:
[ 14:29:29 ] - root@gl-master-01 /var/log/glusterfs $ls -lh /sdn/2290/uploads/170221_Sendung_Lieberum_01_AT*
-rw-r--r-- 1 2001 2001 386M Mar 31 13:00 /sdn/2290/uploads/170221_Sendung_Lieberum_01_AT.mp4
-rw-r--r-- 1 2001 2001 386M Jun 2 13:09 /sdn/2290/uploads/170221_Sendung_Lieberum_01_AT_AT.mp4
[ 15:08:53 ] - root@gl-master-01 /var/log/glusterfs $
[ 15:11:04 ] - root@gl-master-01 /var/log/glusterfs $ls -lh /sdn/.trashcan/2290/uploads/170221_Sendung_Lieberum_01_AT*
[ 15:11:10 ] - root@gl-master-01 /var/log/glusterfs $
Some further information: the OS is Ubuntu 16.04.2 LTS, volume info
below:
[ 11:31:53 ] - root@gl-master-03 ~ $gluster volume info mvol1
Volume Name: mvol1
Type: Distributed-Replicate
Volume ID: 2f5de6e4-66de-40a7-9f24-4762aad3ca96
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gl-master-01-int:/brick1/mvol1
Brick2: gl-master-02-int:/brick1/mvol1
Brick3: gl-master-03-int:/brick1/mvol1
Brick4: gl-master-04-int:/brick1/mvol1
Options Reconfigured:
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
nfs.disable: off
diagnostics.client-log-level: ERROR
changelog.changelog: on
performance.cache-refresh-timeout: 32
cluster.min-free-disk: 200GB
network.ping-timeout: 5
performance.io-thread-count: 64
performance.cache-size: 8GB
performance.readdir-ahead: on
features.trash: off
features.trash-max-filesize: 1GB
[ 11:31:56 ] - root@gl-master-03 ~ $
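To compare reconfigured options such as features.trash between nodes or over time, the listing above can be turned into plain Python structures. This is only a rough sketch that assumes the text layout shown here (a "Bricks:" heading, an "Options Reconfigured:" heading, and "key: value" lines). `parse_volume_info` is a hypothetical helper, not a gluster API.

```python
def parse_volume_info(text):
    """Split `gluster volume info` text into general fields, bricks, options.

    Assumes the layout shown in the listing above.
    """
    info, bricks, options = {}, [], {}
    section = info
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Bricks:":
            section = bricks
            continue
        if line == "Options Reconfigured:":
            section = options
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # shell prompts and blank lines have no "key: value"
        if section is bricks:
            bricks.append(value)
        else:
            section[key] = value
    return info, bricks, options


if __name__ == "__main__":
    sample = """Volume Name: mvol1
Type: Distributed-Replicate
Bricks:
Brick1: gl-master-01-int:/brick1/mvol1
Options Reconfigured:
features.trash: off
features.trash-max-filesize: 1GB
"""
    info, bricks, options = parse_volume_info(sample)
    print(info["Volume Name"], bricks, options["features.trash"])
```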
Host : gl-master-01
-rw-r----- 1 root root 232M Jun 23 17:49 /var/crash/_usr_sbin_glusterfsd.0.crash
-----------------------------------------------------
Host : gl-master-02
-rw-r----- 1 root root 226M Jun 23 17:49 /var/crash/_usr_sbin_glusterfsd.0.crash
-----------------------------------------------------
Host : gl-master-03
-rw-r----- 1 root root 254M Jun 23 16:35 /var/crash/_usr_sbin_glusterfsd.0.crash
-----------------------------------------------------
Host : gl-master-04
-rw-r----- 1 root root 239M Jun 23 16:35 /var/crash/_usr_sbin_glusterfsd.0.crash
-----------------------------------------------------
--
Dietmar Putz
3Q GmbH
Wetzlarer Str. 86
D-14482 Potsdam
Telefax: +49 (0)331 / 2797 866 - 1
Telefon: +49 (0)331 / 2797 866 - 8
Mobile: +49 171 / 90 160 39
Mail: dietmar.putz at 3qsdn.com