[Gluster-users] ESXi cannot access striped gluster volume

Carlos Capriotti capriotti.carlos at gmail.com
Sat Mar 1 16:35:58 UTC 2014


ESXi cannot access striped gluster volume

Hello all.

Unfortunately this is going to be a long post, so I will not spend too many
words on compliments; Gluster is a great solution and I should be writing
odes about it, so, bravo to all of you.

A bit about me: I've been working with FreeBSD and Linux for over a decade
now. I use CentOS nowadays because of some features that are convenient for
my applications.

Now a bit more about my problem: after adding my striped gluster volume to
my ESXi host via NFS, I try to browse it with vSphere's datastore browser, and it...

a) cannot see any of the folders already there. The operation never times
out, and I get a line of dots telling me the system (ESXi) is trying to do
something. Other systems CAN access and see content on the same volume.

b) Neither gluster nor ESXi returns/logs ANY error when I (try to) create
folders there, but the browser does not show them either. The folder IS created.

c) When trying to create a virtual machine, it DOES return an error. The folder
and the first file related to the VM ARE created, but I get an error about an
"Invalid virtual machine configuration". I am under the impression ESXi
returns this when it tries to create the file for the virtual disk.

d) When trying to remove the volume, it DOES return an error, stating the
resource is busy. I am forced to reboot the ESXi host in order to
successfully remove the NFS datastore.

Now, a bit about my environment:

I am one of those cursed with an Isilon, but with NO service contract and
NO license. So, basically I have a big, fast and resilient NAS. Cool stuff,
but with great inconveniences as well. It goes without saying that I, as a
free-software guy, would love to build something that can retire the
Isilon, or at least move it to a secondary role.

Anyway, trying to add an alternative to it, I searched for days and decided
Gluster was the way to go. And I am not going back. I will make it work.

All of my VM servers (about 15) are spread across 3 bare-metal boxes, and -
please, don't blame me, I inherited this situation - there is no backup
solution whatsoever. Gluster will, in its final configuration, run on 4
boxes, providing HA and backup.

So, on ESXi, Isilon's NFS volume/share works like a charm; pure NFS
sharing on CentOS works like a charm; the gluster stripe - using two of my four
servers - does not like me.

The very same gluster NFS volume is mounted and works happily on a CentOS
client. Actually, on more than one.
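
For the record, the mount on those clients is something along these lines
(Gluster's built-in NFS server only speaks NFSv3 over TCP, and with NLM
switched off on the volume the client needs nolock; the mount point is just
an example):

mount -t nfs -o vers=3,proto=tcp,nolock 10.0.1.21:/glvol0 /mnt/glvol0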

I have been reading literally dozens of docs, guides and manuals - VMware,
Gluster and Red Hat - for more than a week, and in the meantime I've even created
ESXi VIRTUAL SERVERS *INSIDE* my ESXi physical servers, because I can no
longer afford to reboot a production server whenever I need to test yet
another change on gluster.


My software versions:

CentOS 6.5
Gluster 3.4.2
ESXi 5.1 (all patches applied)
ESXi 5.5

My hardware for the nodes: 2 x Dell PE2950 with RAID5; the brick is a single
volume of about 1.5 TB on each node.

One stand-alone PE2900 with a single volume on RAID5, about 2.4 TB,
which will eventually be added to the stripe. One PE2950 with an 800 GB brick
on RAID5, which will also be added eventually.

All of them have one NIC for regular networking and a bonded NIC made out
of 2 physical NICs for gluster/NFS.

The ESXi hosts are running on R710s, with lots of RAM and at least one NIC dedicated to
NFS. I have one test server running with all four NICs on the NFS network.

The NFS network runs at 9000 MTU, tuned for iSCSI (in the future).
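
For reference, the bonded interface on the gluster boxes looks roughly like
this on CentOS 6 (the bonding mode and addresses are examples, not
necessarily what I am running):

/etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="mode=802.3ad miimon=100"
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.0.1.21
NETMASK=255.255.255.0
MTU=9000

/etc/sysconfig/network-scripts/ifcfg-eth1   (one of these per slave NIC)
DEVICE=eth1
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes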

Now, trying to make it all work, these are the steps I took:

Regarding tweaks, I've lowered GLUSTER'S ping timeout (network.ping-timeout)
to 20 seconds, to stop the volume from being intermittently inaccessible. On ESXi
itself I've set the NFS maximum queue depth to 64.
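
In practice that boils down to something like this (the esxcli option path is
what I have on 5.1/5.5, so double-check it on your build):

gluster volume set glvol0 network.ping-timeout 20
esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 64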

I've chmoded gluster's share to 777, and you can find my gluster tweaks
for the volume below.

Both Gluster's NFS and regular NFS force the uid and gid to
nfsnobody:nfsnobody.
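
On the gluster side, the chmod and the squashing amount to roughly this
(65534 is nfsnobody on CentOS; the mount point in the chmod is just an example):

chmod 777 /mnt/glvol0
gluster volume set glvol0 server.root-squash on
gluster volume set glvol0 storage.owner-uid 65534
gluster volume set glvol0 storage.owner-gid 65534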

iptables has been disabled, along with SELinux.

Of course regular NFS is disabled.
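
On CentOS 6 those steps are something like the following (rpcbind itself
stays on, since Gluster's built-in NFS server still registers with it):

service iptables stop && chkconfig iptables off
setenforce 0          # plus SELINUX=disabled in /etc/selinux/config for the next boot
service nfs stop && chkconfig nfs off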

My gluster settings:


Volume Name: glvol0
Type: Stripe
Volume ID: f76af2ac-6a42-42ea-9887-941bf1600ced
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.0.1.21:/export/glusterroot
Brick2: 10.0.1.22:/export/glusterroot
Options Reconfigured:
nfs.ports-insecure: on
nfs.addr-namelookup: off
auth.reject: NONE
nfs.volume-access: read-write
nfs.nlm: off
network.ping-timeout: 20
server.root-squash: on
performance.nfs.write-behind: on
performance.nfs.read-ahead: on
performance.nfs.io-cache: on
performance.nfs.quick-read: on
performance.nfs.stat-prefetch: on
performance.nfs.io-threads: on
storage.owner-uid: 65534
storage.owner-gid: 65534
nfs.disable: off
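
For completeness, the volume itself was created more or less like this
(reproduced from memory, so treat it as a sketch):

gluster volume create glvol0 stripe 2 10.0.1.21:/export/glusterroot 10.0.1.22:/export/glusterroot
gluster volume start glvol0
gluster volume set glvol0 nfs.nlm off      # and so on, one "volume set" per option above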


My regular NFS settings, which work just the way I need:

/export/share *(rw,all_squash,anonuid=65534,anongid=65534,no_subtree_check)

Once I get this all to work, I intend to create a nice page with
instructions/info on gluster for ESXi. This field could use some more
documentation out there.

Now the question: is there anything I forgot, overlooked, or simply don't know about?

Could you help me? Except for a comment from someone saying "stripe does
not work with ESXi", nothing REALLY rings a bell. I've used all the
pertinent info I had, and I have run out of moves.

My only option now would be testing a volume that is not a stripe. I'll do
that for now, but I don't think it will work.
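
For reference, that non-stripe test would be a plain two-way replica, along
these lines (the brick paths are just examples):

gluster volume create glvol1 replica 2 10.0.1.21:/export/glusterroot2 10.0.1.22:/export/glusterroot2
gluster volume start glvol1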

BTW, can I be added to the list?

Cheers,

Carlos.