[Gluster-users] PLEASE READ ! We need your opinion. GSOC-2014 and the Gluster community

Sabuj Pattanayek sabujp at gmail.com
Thu Mar 13 13:12:30 UTC 2014


Has the 32 group limit been fixed yet? If not, how about that? :)
https://bugzilla.redhat.com/show_bug.cgi?id=789961
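For context, a minimal sketch of server-side group resolution with glibc's
getgrouplist(), which can return a user's full supplementary group list rather
than a fixed 32-entry slice. This is illustrative only -- not GlusterFS code and
not a description of the actual fix; the user name and starting buffer size are
just examples:

    /* resolve all supplementary groups for a user, growing the buffer
     * instead of capping it at a fixed limit such as 32 */
    #include <grp.h>
    #include <pwd.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const char *user = (argc > 1) ? argv[1] : "nobody";
        struct passwd *pw = getpwnam(user);
        if (!pw) {
            fprintf(stderr, "no such user: %s\n", user);
            return 1;
        }

        int ngroups = 32;   /* a hard cap like this is exactly the problem */
        gid_t *groups = malloc(ngroups * sizeof(gid_t));

        /* getgrouplist() returns -1 and updates ngroups when the buffer is
         * too small, so retry once with the size it asked for */
        if (getgrouplist(user, pw->pw_gid, groups, &ngroups) == -1) {
            groups = realloc(groups, ngroups * sizeof(gid_t));
            getgrouplist(user, pw->pw_gid, groups, &ngroups);
        }

        printf("%s is in %d groups\n", user, ngroups);
        for (int i = 0; i < ngroups; i++)
            printf("  gid %u\n", (unsigned) groups[i]);
        free(groups);
        return 0;
    }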


On Thu, Mar 13, 2014 at 8:01 AM, Jeff Darcy <jdarcy at redhat.com> wrote:

> > I am a little surprised by the lack of action on this topic. I hate to
> > be "that guy", especially being new here, but it has to be done.
> > If I've got this right, we have here a chance of developing Gluster even
> > further, sponsored by Google, with a dedicated programmer for the summer.
> > In other words, if we play our cards right, we can get a free programmer
> > and at least a good start/advance on this fantastic project.
>
> Welcome, Carlos.  I think it's great that you're taking initiative here.
> However, it's also important to set proper expectations for what a GSoC
> intern could reasonably be expected to achieve.  I've seen some amazing
> stuff out of GSoC, but if we set the bar too high then we end up with
> incomplete code and the student doesn't learn much except frustration.
>
> GlusterFS consists of 430K lines of code in the core project alone.  Most
> of it's written in a style that is generally hard for newcomers to pick
> up - both callback-oriented and highly concurrent, often using our own
> "unique" interpretation of standard concepts.  It's also in an area
> (storage) that is not well taught in most universities.  Given those facts
> and the short duration of GSoC, it's important to focus on projects that
> don't require deep knowledge of existing code, to keep the learning curve
> short and productive time correspondingly high.  With that in mind, let's
> look at some of your suggestions.
>
> > I think it would be nice to listen to the COMMUNITY (yes, that means YOU),
> > for either suggestions, or at least a vote.
>
> It certainly would have been nice to have you at the community IRC meeting
> yesterday, at which we discussed release content for 3.6 based on the
> feature proposals here:
>
>    http://www.gluster.org/community/documentation/index.php/Planning36
>
> The results are here:
>
>    http://titanpad.com/glusterfs-3-6-planning
>
> > My opinion, which is also my vote, in order of PERSONAL preference:
> > 1) There is a project going on ( https://forge.gluster.org/disperse ) that
> > consists of re-writing the stripe module in Gluster. This is especially
> > important because it has a HUGE impact on Total Cost of Implementation
> > (customer side), Total Cost of Ownership, and also on matching what the
> > competition has to offer. Among other things, it would allow Gluster to
> > implement a RAIDZ/RAID5 type of fault tolerance, which is much more
> > efficient, and would, as far as I understand, allow a minimum of 3 nodes
> > for stripe+replication. This means 25% less money in computer hardware,
> > with increased data safety/resilience.
>
> This was decided as a core feature for 3.6.  I'll let Xavier (the feature
> owner) answer w.r.t. whether there's any part of it that would be
> appropriate for GSoC.
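(Rough arithmetic behind the "25% less hardware" claim above -- my numbers, not
from the original post: to hold two bricks' worth of data with single-failure
tolerance,

    replica 2:      2 data + 2 copies     = 4 bricks   (50% usable)
    disperse 2+1:   2 data + 1 redundancy = 3 bricks  (~67% usable)

so the same usable capacity needs 3 servers instead of 4, i.e. roughly 25%
fewer.)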
>
> > 2) We have a recurring issue with split-brain resolution. There is an
> > entry on Trello asking/suggesting a mechanism that arbitrates this
> > resolution automatically. I think this could come together with another
> > feature: a file replication consistency check.
>
> This is also core for 3.6 under the name "policy based split brain
> resolution":
>
>
> http://www.gluster.org/community/documentation/index.php/Features/pbspbr
>
> Implementing this feature requires significant knowledge of AFR, which both
> causes split brain and would be involved in its repair.  Because it's also
> one of our most complicated components, and the person who just rewrote it
> won't be around to offer help, I don't think this project *as a whole*
> would be a good fit for GSoC.  On the other hand, there might be specific
> pieces of the policy implementation (not execution) that would be a good
> fit.
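To make "pieces of the policy implementation (not execution)" a bit more
concrete, here is a hypothetical sketch of what a pluggable resolution policy
could look like: a pure function that picks a winning copy from per-replica
metadata and refuses to guess on ties. None of these names are real AFR or
GlusterFS interfaces, and the actual repair step is a separate problem
entirely.

    #include <stdio.h>
    #include <sys/types.h>
    #include <time.h>

    struct replica_info {
        int    brick_id;   /* which brick holds this copy        */
        time_t mtime;      /* last modification time of the copy */
        off_t  size;       /* size of the copy in bytes          */
    };

    enum sb_policy { SB_NEWEST_MTIME, SB_LARGEST_FILE };

    /* the attribute the policy compares */
    static long long attr(enum sb_policy p, const struct replica_info *ri)
    {
        return (p == SB_NEWEST_MTIME) ? (long long) ri->mtime
                                      : (long long) ri->size;
    }

    /* return the index of the single best copy under the policy, or -1
     * when two or more copies tie and the policy cannot decide */
    static int resolve_split_brain(enum sb_policy p,
                                   const struct replica_info *r, int n)
    {
        int best = 0;
        for (int i = 1; i < n; i++)
            if (attr(p, &r[i]) > attr(p, &r[best]))
                best = i;

        for (int i = 0; i < n; i++)
            if (i != best && attr(p, &r[i]) == attr(p, &r[best]))
                return -1;              /* tie: refuse to guess */

        return best;
    }

    int main(void)
    {
        struct replica_info copies[2] = {
            { .brick_id = 0, .mtime = 1394700000, .size = 4096 },
            { .brick_id = 1, .mtime = 1394700123, .size = 1024 },
        };
        int w = resolve_split_brain(SB_NEWEST_MTIME, copies, 2);
        if (w < 0)
            printf("policy cannot decide, escalate to an admin\n");
        else
            printf("winner: brick %d\n", copies[w].brick_id);
        return 0;
    }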
>
> > 3) Accelerator node project. Some storage solutions out there offer an
> > "accelerator node", which is, in short, an extra node with a lot of RAM,
> > possibly fast disks (SSD), that works like a proxy to the regular
> > volumes. Active chunks of files are moved there, logs (ZIL style) are
> > recorded on fast media, among other things. There is NO active project
> > for this, or Trello entry, because it is something I started discussing
> > with a few fellows just a couple of days ago. I thought of starting to
> > play with RAM disks (tmpfs) as scratch disks, but, since we have an
> > opportunity to do something more efficient, or at the very least start
> > it, why not?
>
> Looks like somebody has read the Isilon marketing materials.  ;)
>
> A full production-level implementation of this, with cache consistency and
> so on, would be a major project.  However, a non-consistent prototype good
> for specific use cases - especially Hadoop, as Jay mentions - would be
> pretty easy to build.  Having a GlusterFS server (for the real clients)
> also be a GlusterFS client (to the real cluster) is pretty straightforward.
> Testing performance would also be a significant component of this, and IMO
> that's something more developers should learn about early in their careers.
> I encourage you to keep thinking about how this could be turned into a real
> GSoC proposal.
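Since the "server that is also a client" idea may not be obvious on first
read, here is a toy, deliberately non-consistent sketch of its smallest
possible form: a read-through cache that copies a file from a slow backing
mount (which could simply be a GlusterFS client mount) into a fast local
directory (e.g. a tmpfs) on first access, then serves later opens from the
copy. The two mount paths are made up for illustration, the namespace is
assumed flat, and invalidation is ignored entirely -- exactly the kind of
prototype described above, not production code.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BACKING "/mnt/realvol"    /* slow: the real cluster (assumed mount) */
    #define CACHE   "/mnt/fastcache"  /* fast: tmpfs / SSD scratch (assumed)    */

    /* open `relpath`, filling the cached copy from the backing store if needed */
    static int accel_open(const char *relpath)
    {
        char cached[4096], backing[4096];
        snprintf(cached,  sizeof(cached),  "%s/%s", CACHE,   relpath);
        snprintf(backing, sizeof(backing), "%s/%s", BACKING, relpath);

        int fd = open(cached, O_RDONLY);
        if (fd >= 0)
            return fd;                        /* cache hit */

        /* cache miss: copy backing -> cache, then serve from the cache */
        int src = open(backing, O_RDONLY);
        if (src < 0)
            return -1;
        int dst = open(cached, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (dst < 0) { close(src); return -1; }

        char buf[1 << 16];
        ssize_t n;
        while ((n = read(src, buf, sizeof(buf))) > 0)
            if (write(dst, buf, (size_t) n) != n) { n = -1; break; }
        close(src);
        close(dst);
        if (n < 0)
            return -1;

        return open(cached, O_RDONLY);
    }

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <relative-path>\n", argv[0]);
            return 2;
        }
        int fd = accel_open(argv[1]);
        if (fd < 0) { perror("accel_open"); return 1; }
        printf("serving %s from the cache tier\n", argv[1]);
        close(fd);
        return 0;
    }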
>
>
> Keep the ideas coming!
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>