[Gluster-users] files disappearing and re-appearing
Riccardo Murri
riccardo.murri at uzh.ch
Thu Dec 22 15:49:21 UTC 2016
Dear Mohammed Rafi,
thanks for getting back to me!
> If you have the problem still bugging you, or if you have any previous
> logs that you can share with me, that will help to analyze further.
I have collected the logs from the server and one client; it's a 21MB
archive. How can I provide it? (I'm not sure how complete the
collection is: unfortunately, some time has already passed, so client
nodes have been terminated and their logs have been lost. Also, the
issues were happening at the beginning of November, so some logs have
simply been rotated out of existence by now.)
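In case it matters, what the archive contains is essentially the default
GlusterFS log directory from each node, gathered with something along these
lines (just a sketch, assuming the stock log location)::

    # gather the default GlusterFS log directory from one node
    tar -czf gluster-logs-$(hostname).tar.gz /var/log/glusterfs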
My reply to your questions is inline below.
> On 11/17/2016 07:22 PM, Riccardo Murri wrote:
> > Hello,
> >
> > we are trying out GlusterFS as the working filesystem for a compute cluster;
> > the cluster is comprised of 57 compute nodes (55 cores each), acting as
> > GlusterFS clients, and 25 data server nodes (8 cores each), serving
> > 1 large GlusterFS brick each.
> >
> > We currently have noticed a couple of issues:
> >
> > 1) When compute jobs run, the `glusterfs` client process on the compute nodes
> > goes up to 100% CPU, and filesystem operations start to slow down a lot.
> > Since there are many CPUs available, is it possible to make it use, e.g.,
> > 4 CPUs instead of one to make it more responsive?
>
> Can you briefly describe your computing jobs and workloads, so we can see
> what operations are happening on the cluster?
We built a cluster with 47 compute nodes, each with 56 cores. The
compute nodes were acting as GlusterFS clients (FUSE) to 25 GlusterFS
servers, each with 8 cores and 32 GB of RAM. Each server was serving a
single 10TB brick (ext4 formatted), for a grand total of 250TB.
The compute nodes were running the "rockstar" [1] program, one job per
node, so about 45 jobs were running concurrently [2]. Each job was driven
by a shell script that performed a number of file-existence probes while
the main program was running, e.g. (in Perl)::
sleep 1 while (!(-e "auto-rockstar.cfg")); #wait for server to start
Users of the cluster reported that many jobs failed or stalled because
these existence tests were never succeeding, or files would disappear
after having been created.
[1]: https://bitbucket.org/gfcstanford/rockstar
[2]: although one job could span many processes
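In case it helps to reproduce or narrow down the issue, a probe along these
lines (hypothetical, not what the users actually ran, and the path is just a
placeholder) would record once per second whether the marker file is visible
from a given client mount::

    # hypothetical probe: log once per second whether the marker file is
    # visible from this client's FUSE mount (placeholder path)
    while true; do
        if [ -e /glusterfs/path/to/auto-rockstar.cfg ]; then
            state=present
        else
            state=missing
        fi
        echo "$(date -u +%FT%TZ) $(hostname) auto-rockstar.cfg: $state"
        sleep 1
    done

Running it on two clients at once would show whether the file "disappears" on
one mount while remaining visible on the other.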
> > 2) In addition (but possibly related to 1) we have an issue with files
> > disappearing and re-appearing: from a compute process we test for the existence
> > of a file and e.g. `test -e /glusterfs/file.txt` fails. Then we test from
> > a different process or shell and the file is there. As far as I can see,
> > the servers are basically idle, and none of the peers is disconnected.
> >
> > We are running GlusterFS 3.7.17 on Ubuntu 16.04, installed from the Launchpad PPA.
> > (Details below for the interested.)
> >
> > Can you give any hint about what's going on?
> Is there any rebalance happening? Tell me more about any ongoing
> operations (internal ones like rebalance, shd, etc., or client
> operations).
If any rebalance happened, it was triggered automatically by the system.
It might be relevant that at some point the free space dropped to 0 (too
much output from the jobs); this might have thrown off some internal
healing operation.
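For completeness, these are the commands I would run to check for such
activity (a sketch, with the volume name taken from the `volume info`
output below)::

    # was/is a rebalance running on the volume?
    sudo gluster volume rebalance glusterfs status
    # per-brick disk space and inode usage, to spot bricks that filled up
    sudo gluster volume status glusterfs detail
    # on each server: free space on the brick filesystem itself
    df -h /srv/glusterfs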
Basically, the sequence of operations was like this:
- create the cluster
- fill `/glusterfs` with input data: ~200TB copied with `rsync`, no problems
- start 1000 "rockstar" jobs; issues begin as jobs stall and never complete
- reboot all GlusterFS servers and unmount/remount the filesystem on the clients, attempting to cure the problem
- reduce the number of compute nodes to 10 (= 560 cores); the job failure rate decreases to an acceptable level
I could only get limited reports/data points from the users: they were
in a hurry to process the data because of a deadline and did not want
to sit down and debug the issue down to its roots.
I am still quite interested in sorting this problem out, as the same
issue might resurface if we need to build a large cluster again.
> Also, some insight into your volume configuration will help: volume
> info and volume status.
Here it is::
ubuntu at data001:~$ sudo gluster volume info
Volume Name: glusterfs
Type: Distribute
Volume ID: fdca65bd-313c-47fa-8a09-222f794951ed
Status: Started
Number of Bricks: 25
Transport-type: tcp
Bricks:
Brick1: data001:/srv/glusterfs
Brick2: data002:/srv/glusterfs
Brick3: data003:/srv/glusterfs
Brick4: data004:/srv/glusterfs
Brick5: data005:/srv/glusterfs
Brick6: data006:/srv/glusterfs
Brick7: data007:/srv/glusterfs
Brick8: data008:/srv/glusterfs
Brick9: data009:/srv/glusterfs
Brick10: data010:/srv/glusterfs
Brick11: data011:/srv/glusterfs
Brick12: data012:/srv/glusterfs
Brick13: data013:/srv/glusterfs
Brick14: data014:/srv/glusterfs
Brick15: data015:/srv/glusterfs
Brick16: data016:/srv/glusterfs
Brick17: data017:/srv/glusterfs
Brick18: data018:/srv/glusterfs
Brick19: data019:/srv/glusterfs
Brick20: data020:/srv/glusterfs
Brick21: data021:/srv/glusterfs
Brick22: data022:/srv/glusterfs
Brick23: data023:/srv/glusterfs
Brick24: data024:/srv/glusterfs
Brick25: data025:/srv/glusterfs
Options Reconfigured:
performance.readdir-ahead: on
ubuntu at data001:~$ sudo gluster volume status
Status of volume: glusterfs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick data001:/srv/glusterfs 49153 0 Y 1462
Brick data002:/srv/glusterfs 49153 0 Y 1459
Brick data003:/srv/glusterfs 49153 0 Y 1463
Brick data004:/srv/glusterfs 49153 0 Y 1460
Brick data005:/srv/glusterfs 49153 0 Y 1459
Brick data006:/srv/glusterfs 49153 0 Y 1748
Brick data007:/srv/glusterfs 49153 0 Y 1457
Brick data008:/srv/glusterfs 49153 0 Y 1498
Brick data009:/srv/glusterfs 49153 0 Y 1469
Brick data010:/srv/glusterfs 49153 0 Y 1489
Brick data011:/srv/glusterfs 49153 0 Y 1470
Brick data012:/srv/glusterfs 49153 0 Y 1458
Brick data013:/srv/glusterfs 49153 0 Y 1475
Brick data014:/srv/glusterfs 49153 0 Y 1464
Brick data015:/srv/glusterfs 49153 0 Y 1459
Brick data016:/srv/glusterfs 49153 0 Y 1465
Brick data017:/srv/glusterfs 49153 0 Y 1466
Brick data018:/srv/glusterfs 49153 0 Y 1467
Brick data019:/srv/glusterfs 49153 0 Y 1464
Brick data020:/srv/glusterfs 49153 0 Y 1460
Brick data021:/srv/glusterfs 49153 0 Y 1556
Brick data022:/srv/glusterfs 49153 0 Y 1458
Brick data023:/srv/glusterfs 49153 0 Y 1472
Brick data024:/srv/glusterfs 49153 0 Y 1767
Brick data025:/srv/glusterfs 49153 0 Y 1470
NFS Server on localhost 2049 0 Y 17383
NFS Server on data011 2049 0 Y 14638
NFS Server on data022 2049 0 Y 12485
NFS Server on data004 2049 0 Y 15197
NFS Server on data007 2049 0 Y 15006
NFS Server on data021 2049 0 Y 13631
NFS Server on data019 2049 0 Y 14421
NFS Server on data008 2049 0 Y 13506
NFS Server on data013 2049 0 Y 15965
NFS Server on data014 2049 0 Y 13231
NFS Server on data005 2049 0 Y 13370
NFS Server on data017 2049 0 Y 15316
NFS Server on data003 2049 0 Y 15359
NFS Server on data002 2049 0 Y 12681
NFS Server on data024 2049 0 Y 14263
NFS Server on data025 2049 0 Y 12560
NFS Server on data016 2049 0 Y 14761
NFS Server on data023 2049 0 Y 13165
NFS Server on data020 2049 0 Y 12769
NFS Server on data018 2049 0 Y 13789
NFS Server on data006 2049 0 Y 13429
NFS Server on data015 2049 0 Y 13423
NFS Server on data009 2049 0 Y 15343
NFS Server on data010 2049 0 Y 13189
NFS Server on data012 2049 0 Y 12690
Task Status of Volume glusterfs
------------------------------------------------------------------------------
There are no active volume tasks
We build ephemeral clusters of VMs on an OpenStack infrastructure, which
are destroyed once the batch of computations is done.
The GlusterFS server configuration is done by Ansible::
https://github.com/gc3-uzh-ch/elasticluster/blob/master/elasticluster/share/playbooks/roles/glusterfs-server/tasks/export.yml
This is the `/etc/glusterfs/glusterd.vol` generated as a result::
volume management
type mgmt/glusterd
option working-directory /var/lib/glusterd
option transport-type socket,rdma
option transport.socket.keepalive-time 10
option transport.socket.keepalive-interval 2
option transport.socket.read-fail-log off
option ping-timeout 0
option event-threads 1
# option base-port 49152
end-volume
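As far as I understand, the `event-threads 1` above only applies to the
management daemon, not to the client processes. Regarding question 1 (the
single `glusterfs` client process pegged at 100% CPU), my reading of the
documentation is that the per-volume thread settings are what would let the
client use more than one core; something like the following is what I would
try next (a sketch, the values are guesses and we have not tested this)::

    # raise the number of network event threads used by clients and bricks
    sudo gluster volume set glusterfs client.event-threads 4
    sudo gluster volume set glusterfs server.event-threads 4
    # let the FUSE client hand work off to a pool of I/O worker threads
    sudo gluster volume set glusterfs performance.client-io-threads on

Does that sound like the right knob, or is the CPU usage more likely a
symptom of the lookup load caused by the existence probes?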
The GlusterFS clients simply do `mount -t glusterfs data001:/srv/glusterfs /glusterfs`
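We pass no extra mount options. If it would help rule out client-side
caching of (negative) lookups as the cause of the disappearing files,
remounting with the FUSE cache timeouts turned down is something we could
try (a sketch, assuming these mount options are supported by
mount.glusterfs in 3.7.17)::

    # remount with FUSE metadata caching effectively disabled, to test
    # whether the failing existence checks are stale cached lookups
    sudo umount /glusterfs
    sudo mount -t glusterfs \
        -o entry-timeout=0,attribute-timeout=0,negative-timeout=0 \
        data001:/srv/glusterfs /glusterfs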
Thanks for your help!
Riccardo
--
Riccardo Murri
http://www.s3it.uzh.ch/about/team/#Riccardo.Murri
S3IT: Services and Support for Science IT
University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
Tel: +41 44 635 4208
Fax: +41 44 635 6888