[Bugs] [Bug 1185950] New: adding replication to a distributed volume makes the volume unavailable

bugzilla at redhat.com bugzilla at redhat.com
Mon Jan 26 16:36:47 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1185950

            Bug ID: 1185950
           Summary: adding replication to a distributed volume makes the
                    volume unavailable
           Product: GlusterFS
           Version: 3.6.1
         Component: replicate
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: pille+redhat+bugzilla at struction.de
                CC: bugs at gluster.org, gluster-bugs at redhat.com



Description of problem:
adding redundancy (replication) to a distributed volume blows up the fuse
mountpoint.
(note: i started with a distributed volume and never tried whether this works
for a single volume.)
additionally, until the mountpoint fails, the speed of ingesting files into it
drops significantly (from XX MB/s to YY KB/s). the resources of the machines
involved don't show any bottleneck (so this is not simply healing
bandwidth/cpu).

How reproducible:
not 100% deterministic, but it will fail eventually. just access some files
through the mountpoint (running find over it helps).

Steps to Reproduce:
1. create distributed volume STORAGE with two servers each having one brick
2. add some files to it (everything is smooth)
3. 'add-brick STORAGE replica 2 ...' with two additional servers each having
one brick
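
the steps above can be sketched as a command sequence. this is only a sketch:
the hostnames (server1..server4), brick paths, mount point and test data path
are placeholders that do not come from the report, and the script only prints
the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Hostnames, brick paths, the mount point
# and the test data path are hypothetical placeholders.
VOLUME=STORAGE
MOUNTPOINT=/mnt/storage

# Print each command instead of running it unless DRY_RUN=0 is set,
# so the sequence can be reviewed before touching a real cluster.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce() {
    # Step 1: plain distributed volume, one brick on each of two servers.
    run gluster volume create "$VOLUME" server1:/bricks/b1 server2:/bricks/b1
    run gluster volume start "$VOLUME"
    run mount -t glusterfs "server1:/$VOLUME" "$MOUNTPOINT"

    # Step 2: add some files (any ingest workload will do).
    run cp -r /path/to/testdata "$MOUNTPOINT/"

    # Step 3: convert to replica 2 by adding one brick on each new server.
    run gluster volume add-brick "$VOLUME" replica 2 server3:/bricks/b1 server4:/bricks/b1

    # Afterwards, access files through the mount until the failure shows up.
    run find "$MOUNTPOINT" -type f
}

reproduce
```

running it as-is only lists the commands; DRY_RUN=0 would execute them and
should only be used on a disposable test cluster.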

Actual results:
mountpoint becomes slow and blows up eventually with: 'Transport endpoint is
not connected (107)'

Expected results:
ingest continues uninterrupted (mountpoint stays alive).
slight slowdown would be ok, b/c of added replication (more bandwidth).

Additional info:
please specify which logfiles would be helpful.
i had this issue on 2/2 setups so i bet you could replicate it.
there seems to be no significant healing traffic between the nodes (Z MBit/s) and
no high load.
a 'gluster volume heal STORAGE info' times out after 10min with return code 146.
when i mount this volume again, entire directories are missing from the
mountpoint. those are still present on the brick fs.
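
one way to pin down which directories went missing is to compare the directory
tree on a brick with what the mountpoint shows. this is a rough sketch, not an
official gluster tool: the function name missing_dirs and the paths in the
usage line are made up for illustration, and it assumes the brick's internal
.glusterfs directory should be ignored.

```shell
#!/bin/sh
# List directories that exist on a brick but are not visible through the mount.
# Usage: missing_dirs BRICK_DIR MOUNT_DIR   (hypothetical helper)
missing_dirs() {
    brick=$1
    mount=$2
    brick_list=$(mktemp)
    mount_list=$(mktemp)

    # Directory tree of the brick, skipping the internal .glusterfs tree.
    (cd "$brick" && find . -path ./.glusterfs -prune -o -type d -print | LC_ALL=C sort) > "$brick_list"

    # Directory tree as seen through the mountpoint.
    (cd "$mount" && find . -type d -print | LC_ALL=C sort) > "$mount_list"

    # comm -23 prints lines that appear only in the first (brick) listing.
    LC_ALL=C comm -23 "$brick_list" "$mount_list"

    rm -f "$brick_list" "$mount_list"
}
```

for example, 'missing_dirs /bricks/b1 /mnt/storage' (both paths are
placeholders) would be run on a server that holds a brick and also has the
volume mounted.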

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.