[Bugs] [Bug 1564071] directories are invisible on client side

Mon Apr 30 15:37:26 UTC 2018

https://bugzilla.redhat.com/show_bug.cgi?id=1564071

--- Comment #13 from g.amedick at uni-luebeck.de ---
We restarted the rebalance. It will take a while, though (estimated time: 50
hours). We'll report the outcome.

The bricks are actually virtual disks provided by a big storage system. The
storage reports no errors (including no loss of connectivity and no hard drive
failure).
We didn't touch the brick process at all (actually, we weren't even present; it
was late in the evening). It recovered on its own.
Port 49159 on gluster02 belongs to brick DATA208. The port was open when we
came to work the next day, and the brick was up and running. The glusterd log
showed nothing about having lost a brick, just the failed rebalance:

[2018-04-24 18:59:19.256333] I [MSGID: 106172] [glusterd-handshake.c:1014:__server_event_notify] 0-glusterd: received defrag status updated
[2018-04-24 18:59:19.263291] W [socket.c:593:__socket_rwv] 0-management: readv on /var/run/gluster/gluster-rebalance-0d210c70-e44f-46f1-862c-ef260514c9f1.sock failed (No data available)
[2018-04-24 18:59:19.266258] I [MSGID: 106007] [glusterd-rebalance.c:158:__glusterd_defrag_notify] 0-management: Rebalance process for volume $vol1 has disconnected.

That's the complete log of that day.

For some reason, DATA208 tried to connect to port 49057:
[2018-04-24 18:56:02.744587] W [socket.c:593:__socket_rwv] 0-tcp.$vol1-server: writev on $IP_gluster02:49057 failed (Broken pipe)

We are unsure why. There's nothing listening:

$ netstat -tulpen | grep 49057

$ netstat -tulpen | grep gluster
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      0          24130      4064/glusterfsd
tcp        0      0 0.0.0.0:49153           0.0.0.0:*               LISTEN      0          18881      4072/glusterfsd
tcp        0      0 0.0.0.0:49154           0.0.0.0:*               LISTEN      0          19775      4080/glusterfsd
tcp        0      0 0.0.0.0:49155           0.0.0.0:*               LISTEN      0          26969      4090/glusterfsd
tcp        0      0 0.0.0.0:49156           0.0.0.0:*               LISTEN      0          45238      4098/glusterfsd
tcp        0      0 0.0.0.0:49157           0.0.0.0:*               LISTEN      0          46649      4107/glusterfsd
tcp        0      0 0.0.0.0:49158           0.0.0.0:*               LISTEN      0          1440       4116/glusterfsd
tcp        0      0 0.0.0.0:49159           0.0.0.0:*               LISTEN      0          18417      4125/glusterfsd
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      0          15592      3873/glusterd
tcp        0      0 0.0.0.0:49160           0.0.0.0:*               LISTEN      0          19785      4134/glusterfsd
tcp        0      0 0.0.0.0:49161           0.0.0.0:*               LISTEN      0          36104      4143/glusterfsd
tcp        0      0 0.0.0.0:49162           0.0.0.0:*               LISTEN      0          72783      4152/glusterfsd
tcp        0      0 0.0.0.0:49163           0.0.0.0:*               LISTEN      0          38236      4161/glusterfsd
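
For anyone who wants to repeat the port check on other nodes, here is a
minimal sketch of how it could be scripted. The function name port_listening
is hypothetical, and the sketch assumes netstat-style TCP lines in which the
local address is the fourth field and the state is the sixth; the sample input
is two entries copied from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: reads netstat-style output on stdin and succeeds
# if a TCP socket in LISTEN state is bound to the given local port.
port_listening() {
    awk -v p="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # The port is whatever follows the last ":" in the local address.
            n = split($4, parts, ":")
            if (parts[n] == p) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Sample input: two entries from the netstat listing in this comment.
sample='tcp        0      0 0.0.0.0:49159           0.0.0.0:*               LISTEN      0          18417      4125/glusterfsd
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      0          15592      3873/glusterd'

for port in 49159 49057; do
    if printf '%s\n' "$sample" | port_listening "$port"; then
        echo "$port: listening"
    else
        echo "$port: not listening"
    fi
done
```

On a live node the same function could be fed from "netstat -tuln" instead of
the sample text. It only inspects listening TCP sockets, so it says nothing
about outgoing connections that use a port as their source.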

We don't know why the rebalance failed. It's the first time that something like
this has happened, and we don't understand the brick log.

We need to discuss uploading the pcap file with our supervisor, since it
contains our IP addresses. Is there a way to give it to you without making it
public?

There's something else that happened today:

A user reported that she wanted to create a symlink with an absolute path to
some file. There was no error message (in fact, the mount log reported
Success), but the symlink led nowhere. The volume is usually mounted at /data.
On all compute nodes with the /data mount, creating a symlink to this file
didn't work. A new mount I created at /mnt, however, could create the symlink.
The systemd mount unit is a literal copy except for the mount point. A server
with both mount points (/data and /mnt) could create the symlink on the /mnt
mount point but not on /data. Relative paths, however, work fine. It looks like
this:

$ ls -lah
lrwxrwxrwx  1 root    itsc_test_proj2  120 Apr 30 15:25 test1.gz -> /mnt/$PATH/$file.gz
lrwxrwxrwx  1 root    itsc_test_proj2  121 Apr 30 15:47 test2.gz ->
lrwxrwxrwx  1 root    itsc_test_proj2  120 Apr 30 15:48 test3.gz -> /mnt/$PATH/$file.gz
lrwxrwxrwx  1 root    itsc_test_proj2  118 Apr 30 16:05 test4.gz -> ../$PATH/$file.gz
lrwxrwxrwx  1 root    itsc_test_proj2  119 Apr 30 16:06 test5.gz ->
lrwxrwxrwx  1 root    itsc_test_proj2  121 Apr 30 16:08 test6.gz ->
lrwxrwxrwx  1 root    itsc_test_proj2  121 Apr 30 16:08 test7.gz ->
lrwxrwxrwx  1 root    itsc_test_proj2  120 Apr 30 15:48 test8.gz -> /mnt/$PATH/$file.gz

Creation of the symlinks:
test1.gz & test3.gz via "cd /mnt; ln -s /mnt/$PATH/$file.gz test$x.gz"
test2.gz, test5.gz & test6.gz via "cd /data; ln -s /data/$PATH/$file.gz test$x.gz"
test4.gz via "cd /data; ln -s ../$PATH/$file.gz test$x.gz"
test7.gz via "cd /mnt; ln -s /data/$PATH/$file.gz test$x.gz"
test8.gz via "cd /data; ln -s /mnt/$PATH/$file.gz test$x.gz"

This was reproducible.
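
To make the reproduction easier to repeat on other mounts, the check could be
scripted roughly as follows. This is only a sketch: check_symlink is a
hypothetical helper, not something we actually ran, and the demo at the end
uses a temporary directory instead of our real /data and /mnt paths. It
creates a link the same way as above (cd into the directory, then ln -s) and
compares the target reported by readlink with the target that was requested.

```shell
#!/bin/sh
# Hypothetical helper: create a symlink named $3 inside directory $1 that
# points to $2, then compare the stored target with the requested one.
# Returns 0 if they match, 1 on mismatch, 2 if the link could not be created.
check_symlink() {
    dir="$1"; target="$2"; name="$3"
    ( cd "$dir" && ln -s "$target" "$name" ) || return 2
    stored=$(readlink "$dir/$name")
    if [ "$stored" = "$target" ]; then
        echo "OK       $name -> $stored"
    else
        echo "MISMATCH $name -> '$stored' (wanted '$target')"
        return 1
    fi
}

# Demo on a scratch directory (an assumption; replace with the real mount
# points and file to test an actual volume).
work=$(mktemp -d)
touch "$work/file.gz"
check_symlink "$work" "$work/file.gz" abs.gz   # absolute target
check_symlink "$work" "./file.gz" rel.gz       # relative target
rm -rf "$work"
```

On a healthy file system both calls should report OK. On the affected /data
mount we would expect the absolute-path case to report a mismatch with an empty
stored target, matching the listing above.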

We know that the /mnt mount point is not completely fine either, since the
hidden files we used to create the logs were hidden there, too. Still, the
mounts behave differently. Symlinks with an absolute path pointing to /data
aren't created correctly. Following the strange symlinks with zcat produces an
error:

$ zcat test7.gz | head
gzip: test7.gz is a directory -- ignored

All other links, including the one with a relative path pointing to /data, can
be used as usual.
