[Gluster-users] I'm new to Gluster, and have some questions
hsanson at gmail.com
Fri Oct 22 00:55:11 UTC 2010
I am just starting to play with Gluster, but I think I can give you some
answers from my experience.
On Thursday 21 October 2010 17:09:32 Rudi Ahlers wrote:
> Hi all,
> I'm considering setting up Gluster, and have a few questions if you don't
> mind:
> 1. Which option is better? I already have a few CentOS 5.5. server
> setup. Would it be better to just install GlusterFS, or to install
> Gluster Storage Platform from scratch? How / where can I see a full
> comparison between the 2? Are there any performance / management
> benefits in choosing the one of the other?
The Gluster Storage Platform requires GlusterFS. The platform is a complete OS
(Fedora Linux) + GlusterFS + web management in a single package that can be
installed via USB in a few minutes. It is supposed to simplify installation,
setup and management of GlusterFS clusters, but I could not get it to work.
I was unable to add new servers: every time I pressed the add new server button
I got an error saying "Could not retrive installer ip address". Since the
platform is relatively new, there is next to no documentation or issue reports
about it. Also, servers/volumes added via the command line were never reflected
in the web-based GUI.
So I installed Ubuntu 10.10 LTS and GlusterFS 3.1 from source, and handling
the servers/volumes etc. via the new command line is a breeze.
> 2. I need reliability and speed. From what I understand, I could setup
> 2 servers to work similar to software RAID1 (mirroring). Is it also
> correct to assume that I could use 4 servers in a RAID10 / 1+0 type
> setup? But then obviously serverA & serverB will be mirrored, and
> serverC & serverD together? What happens to the data? Does it get
> filled randomly between the 2 sets of servers, or does it get put onto
> serverA & B first, till it's full then move over to C & D?
I only have two servers for testing. What you set up are volumes, and each
volume can be configured depending on your needs. This is what I understand so
far:
Distributed volume: Aggregates the storage of several directories (bricks in
Gluster terms) across several computers. The benefit is that you can
grow/shrink the volume as you please. The downside is that it offers no
performance/reliability guarantees, as files are stored randomly among the
disks in the volume.
Replicated volume: Requires a minimum of 2 bricks on separate servers. All files
are replicated among the bricks. The number of replicas is configured at volume
creation. Has all the benefits of a Distributed volume plus failure resilience.
Stripe volume: Requires a minimum of 2 bricks on separate servers. All files are
split into stripes and these stripes are distributed among the bricks of the
volume. The number of stripes and their size are configured at volume creation.
Has all the benefits of a Replicated volume plus reliability, and can improve
read performance for large files, as the read is distributed among several
machines.
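For the 4-server case asked about above, a volume created with replica 2 over
four bricks should give the RAID10-like layout. The following is only a sketch:
the volume name testvol and the brick path /export/brick1 are made up, the host
names are the ones from the question, and run is just a dry-run helper that
prints each command unless DRY_RUN=0 is set. As far as I understand, consecutive
bricks on the command line form the mirror pairs, and files are spread across
the pairs by a hash of the file name rather than filling the first pair before
moving on to the second.

```shell
#!/bin/sh
# Sketch only: volume name and brick path are hypothetical.
# DRY_RUN=1 (the default here) prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

VOLUME=testvol          # hypothetical volume name
BRICK=/export/brick1    # hypothetical brick directory on every server

create_volume() {
    # Run on serverA: add the other three servers to the trusted pool.
    for peer in serverB serverC serverD; do
        run gluster peer probe "$peer"
    done
    # With "replica 2", consecutive bricks form a mirror pair:
    # (serverA, serverB) and (serverC, serverD).
    run gluster volume create "$VOLUME" replica 2 transport tcp \
        "serverA:$BRICK" "serverB:$BRICK" "serverC:$BRICK" "serverD:$BRICK"
    run gluster volume start "$VOLUME"
    run gluster volume info "$VOLUME"
}

create_volume
```

Run as is, it only prints the six gluster commands. Setting DRY_RUN=0 on a
machine with GlusterFS 3.1 installed would execute them.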
> 3. Has anyone noticed any considerable differences in using 1x 1GB NIC
> & 2x 1GB NIC's bonded together? Or should I rather use a Quad port NIC
> if / where possible?
> 4. How do clients (i.e. users) connect if I want to give them normal
> FTP / SMB / NFS access? Or do I need to mount the exported Gluster to
> another Linux server first which runs these services already?
Gluster 3.1 has a native NFS v3 implementation, so you can mount any Gluster
volume as a normal NFS mount. For SMB you need to configure Samba to share the
volume, and you can easily access the files on any of the bricks via SCP or FTP
if you have an SSH or FTP server configured. For Linux the recommended way is
to use the glusterfs module to mount the volume as a Gluster file system.
> 5. If there's 10 Gluster servers, for example, with a lot of data
> spread out across them. How do the clients connect, exactly? I.e. do
> they all connect to a central server which then just "fetches and
> delivers" the content to the clients, or do the client's connect
> directly to the specific server where their content is? i.e. is the
> network traffic split evenly across the servers, according to where
> the data is stored?
This is also something I would like to know. When connecting clients I use the
following command:
mount -t [nfs|glusterfs] <ip-address>:<volume-name> /mount/point
where ip-address is the IP of any of the servers that have the volume
configured. It is not clear to me how the reliability part works here. If I
disconnect the server with that ip-address I lose access to the files. True,
the files are still accessible via the other servers, but I need to manually
point the mount at another server, which is not exactly high-availability.
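One way to at least automate the "point the mount at another server" step is to
try a list of servers in order when mounting. This is a rough sketch, not a
Gluster feature: the function name mount_first_available, the server names, the
volume name and the mount point are all made up, and run is a dry-run helper
that prints the command unless DRY_RUN=0 is set. It only picks a server at mount
time and does nothing for a mount that is already established.

```shell
#!/bin/sh
# Sketch only: server names, volume name and mount point are hypothetical.
# DRY_RUN=1 (the default here) prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: mount_first_available <volume> <mountpoint> <server>...
mount_first_available() {
    volume=$1
    mountpoint=$2
    shift 2
    for server in "$@"; do
        # Same form as the mount command above: <ip-address>:<volume-name>
        if run mount -t glusterfs "$server:$volume" "$mountpoint"; then
            echo "mounted $volume from $server"
            return 0
        fi
    done
    echo "no server reachable for $volume" >&2
    return 1
}

mount_first_available testvol /mnt/gluster serverA serverB
```

In dry-run mode the first server always "succeeds", so only one mount command
is printed. With DRY_RUN=0 the loop moves on to the next server whenever mount
returns a non-zero exit status.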
> tia :)