[Gluster-users] GlusterFS healing questions

Thu Nov 16 07:18:17 UTC 2017

Hi,

On Thu, Nov 9, 2017 at 7:47 PM, <ingard at jotta.no> wrote:

> Someone on the #gluster-users irc channel said the following :
> "Decreasing features.locks-revocation-max-blocked to an absurdly low
> number is letting our distributed-disperse set heal again."
>
> Is this something to consider? Does anyone else have experience with
> tweaking this to speed up healing?
>

That option releases the currently granted lock on a file when a new lock
request arrives and more than features.locks-revocation-max-blocked lock
requests are already pending. The real effect is that all pending requests
can proceed, but the client that held the revoked lock will continue working
without knowing that it no longer has the lock, so anything it does can
cause corruption. The option was created to prevent a single bad client from
blocking the entire cluster, but if you set it to a value that is very small
compared to your workload, it will probably cause unwanted effects even on
perfectly healthy and well-behaved clients.

This option shouldn't be used unless you are very sure of what your users
are doing with the volume and how, and you understand the implications of
this option. Otherwise it is a likely source of data corruption.
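
As a hedged sketch, the current value could be inspected, and the option put
back to its default, with 'gluster volume get' and 'gluster volume reset'.
The volume name is a placeholder you pass in, and the GLUSTER variable is a
hypothetical hook added here only so the commands can be dry-run with
GLUSTER=echo:

```shell
# GLUSTER is an illustrative override, not a gluster feature.
GLUSTER="${GLUSTER:-gluster}"
OPT="features.locks-revocation-max-blocked"

# Print the value currently in effect for the given volume.
show_revocation_limit() {
    "$GLUSTER" volume get "$1" "$OPT"
}

# Return the option to its default for the given volume.
reset_revocation_limit() {
    "$GLUSTER" volume reset "$1" "$OPT"
}
```

For example: show_revocation_limit <volname>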

Anyway, if setting this option improves speed, it means that there is heavy
lock usage. It should be determined whether that usage is normal or not.
For example, disperse needs to take locks for reads and writes. If a file
is being accessed simultaneously by multiple clients, lock usage will be
high because of contention between clients, but that is normal. Forcing the
release of some locks while another client (for example, self-heal) is
trying to write will probably cause read errors on other clients.
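
One possible way to get a rough picture of lock usage is to take a statedump
of the bricks ('gluster volume statedump <volname>') and count the lock
entries in the resulting files. The sketch below assumes that lock entries
in the dump are tagged (ACTIVE) when granted and (BLOCKED) when waiting, and
that you pass it the dump files yourself (they are written to the directory
configured by server.statedump-path). The function name is made up for
illustration:

```shell
# Count granted (ACTIVE) and waiting (BLOCKED) lock entries in the given
# statedump files. Assumes the (ACTIVE)/(BLOCKED) tags described above.
# Usage: count_lock_states dumpfile [dumpfile ...]
count_lock_states() {
    awk '/\(ACTIVE\)/ { a++ } /\(BLOCKED\)/ { b++ }
         END { printf "active=%d blocked=%d\n", a, b }' "$@"
}
```

A blocked count that stays high across several dumps would suggest real
contention rather than a short burst.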

If a real problem is detected, it's better to file a bug with as much
information as you can give, either to resolve the problem (if there is
one) or to improve performance (if everything works correctly but slowly).

Xavi

> > On 9 Nov 2017, at 18:00, Serkan Çoban <cobanserkan at gmail.com> wrote:
> >
> > Hi,
> >
> > You can set disperse.shd-max-threads to 2 or 4 in order to make heal
> > faster. This makes my heal times 2-3x faster.
> > You can also experiment with disperse.self-heal-window-size to read more
> > bytes at one time, but I have not tested it.
> >
> >> On Thu, Nov 9, 2017 at 4:47 PM, Xavi Hernandez <jahernan at redhat.com> wrote:
> >> Hi Rolf,
> >>
> >> answers follow inline...
> >>
> >>> On Thu, Nov 9, 2017 at 3:20 PM, Rolf Larsen <rolf at jotta.no> wrote:
> >>>
> >>> Hi,
> >>>
> >>> We ran a test on GlusterFS 3.12.1 with erasure-coded 8+2 volumes of 10
> >>> bricks (default config; tested with 100gb, 200gb and 400gb brick sizes;
> >>> 10gbit NICs).
> >>>
> >>> 1.
> >>> Tests show that healing takes about double the time for 200gb vs 100gb,
> >>> and a bit under double for 400gb vs 200gb brick sizes. Is this expected
> >>> behaviour? In light of this, 6.4 tb bricks would take ~377 hours to
> >>> heal.
> >>>
> >>> 100gb brick heal: 18 hours (8+2)
> >>> 200gb brick heal: 37 hours (8+2), about 2.06x the 100gb time
> >>> 400gb brick heal: 59 hours (8+2), about 1.59x the 200gb time
> >>>
> >>> Each 100gb is filled with 80000 x 10mb files (200gb is 2x and 400gb is
> >>> 4x)
> >>
> >>
> >> If I understand correctly, you are storing 80,000 files of 10 MB each
> >> when you are using 100GB bricks, but you double this for 200GB bricks
> >> (160,000 files of 10 MB each), and for 400GB bricks you create 320,000
> >> files. Have I understood it correctly?
> >>
> >> If this is true, it's normal that twice the data requires approximately
> >> twice the heal time. The healing time depends on the contents of the
> >> brick, not on the brick size. The same number of files should take the
> >> same time to heal, whatever the brick size is.
> >>
> >>>
> >>>
> >>> 2.
> >>> Is there any way to show the progress of a heal? At the moment we run
> >>> 'gluster volume heal <volname> info', but this exits when a brick is
> >>> done healing, and when we run heal info again the command continues
> >>> showing gfids until the brick is done again. This gives quite a poor
> >>> picture of the status of a heal.
> >>
> >>
> >> The output of 'gluster volume heal <volname> info' shows the list of
> >> files pending heal on each brick. The heal is complete when the list is
> >> empty. A faster alternative, if you don't want to see the whole list of
> >> files, is 'gluster volume heal <volname> statistics heal-count'. This
> >> only shows the number of pending files on each brick.
> >>
> >> I don't know of any other way to track the progress of self-heal.
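
A small sketch of how the per-brick counts could be reduced to a single
number to watch over time. It assumes the heal-count output contains one
"Number of entries: N" line per brick; the function name is hypothetical:

```shell
# Sum the "Number of entries:" lines found on stdin and print one total.
# Non-numeric values are counted as 0.
# Usage:
#   gluster volume heal <volname> statistics heal-count | total_pending
total_pending() {
    awk -F': *' '/^Number of entries:/ { sum += $2 } END { print sum + 0 }'
}
```

Logging this total at regular intervals gives a rough heal rate, since the
number should fall towards 0 as the heal proceeds.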
> >>
> >>>
> >>>
> >>> 3.
> >>> What kind of config tweaks are recommended for this kind of EC volume?
> >>
> >>
> >> I usually use the following values (specific only for ec):
> >>
> >> client.event-threads 4
> >> server.event-threads 4
> >> performance.client-io-threads on
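
As an illustration, the three values above could be applied with 'gluster
volume set'. The volume name is whatever you pass in, and the GLUSTER
variable is a hypothetical hook that allows a dry run with GLUSTER=echo:

```shell
# GLUSTER is an illustrative override, not a gluster feature.
GLUSTER="${GLUSTER:-gluster}"

# Apply the three settings listed above to the given volume.
apply_ec_tuning() {
    vol="$1"
    "$GLUSTER" volume set "$vol" client.event-threads 4
    "$GLUSTER" volume set "$vol" server.event-threads 4
    "$GLUSTER" volume set "$vol" performance.client-io-threads on
}
```

For example: apply_ec_tuning <volname>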
> >>
> >> Regards,
> >>
> >> Xavi
> >>
> >>
> >>
> >>>
> >>>
> >>>
> >>> $ gluster volume info
> >>> Volume Name: test-ec-100g
> >>> Type: Disperse
> >>> Volume ID: 0254281d-2f6e-4ac4-a773-2b8e0eb8ab27
> >>> Status: Started
> >>> Snapshot Count: 0
> >>> Number of Bricks: 1 x (8 + 2) = 10
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: dn-304:/mnt/test-ec-100/brick
> >>> Brick2: dn-305:/mnt/test-ec-100/brick
> >>> Brick3: dn-306:/mnt/test-ec-100/brick
> >>> Brick4: dn-307:/mnt/test-ec-100/brick
> >>> Brick5: dn-308:/mnt/test-ec-100/brick
> >>> Brick6: dn-309:/mnt/test-ec-100/brick
> >>> Brick7: dn-310:/mnt/test-ec-100/brick
> >>> Brick8: dn-311:/mnt/test-ec-2/brick
> >>> Brick9: dn-312:/mnt/test-ec-100/brick
> >>> Brick10: dn-313:/mnt/test-ec-100/brick
> >>> Options Reconfigured:
> >>> nfs.disable: on
> >>> transport.address-family: inet
> >>>
> >>> Volume Name: test-ec-200
> >>> Type: Disperse
> >>> Volume ID: 2ce23e32-7086-49c5-bf0c-7612fd7b3d5d
> >>> Status: Started
> >>> Snapshot Count: 0
> >>> Number of Bricks: 1 x (8 + 2) = 10
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: dn-304:/mnt/test-ec-200/brick
> >>> Brick2: dn-305:/mnt/test-ec-200/brick
> >>> Brick3: dn-306:/mnt/test-ec-200/brick
> >>> Brick4: dn-307:/mnt/test-ec-200/brick
> >>> Brick5: dn-308:/mnt/test-ec-200/brick
> >>> Brick6: dn-309:/mnt/test-ec-200/brick
> >>> Brick7: dn-310:/mnt/test-ec-200/brick
> >>> Brick8: dn-311:/mnt/test-ec-200_2/brick
> >>> Brick9: dn-312:/mnt/test-ec-200/brick
> >>> Brick10: dn-313:/mnt/test-ec-200/brick
> >>> Options Reconfigured:
> >>> nfs.disable: on
> >>> transport.address-family: inet
> >>>
> >>> Volume Name: test-ec-400
> >>> Type: Disperse
> >>> Volume ID: fe00713a-7099-404d-ba52-46c6b4b6ecc0
> >>> Status: Started
> >>> Snapshot Count: 0
> >>> Number of Bricks: 1 x (8 + 2) = 10
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: dn-304:/mnt/test-ec-400/brick
> >>> Brick2: dn-305:/mnt/test-ec-400/brick
> >>> Brick3: dn-306:/mnt/test-ec-400/brick
> >>> Brick4: dn-307:/mnt/test-ec-400/brick
> >>> Brick5: dn-308:/mnt/test-ec-400/brick
> >>> Brick6: dn-309:/mnt/test-ec-400/brick
> >>> Brick7: dn-310:/mnt/test-ec-400/brick
> >>> Brick8: dn-311:/mnt/test-ec-400_2/brick
> >>> Brick9: dn-312:/mnt/test-ec-400/brick
> >>> Brick10: dn-313:/mnt/test-ec-400/brick
> >>> Options Reconfigured:
> >>> nfs.disable: on
> >>> transport.address-family: inet
> >>>
> >>> --
> >>>
> >>> Regards
> >>> Rolf Arne Larsen
> >>> Ops Engineer
> >>> rolf at jottacloud.com
> >>>
> >>> _______________________________________________
> >>> Gluster-users mailing list
> >>> Gluster-users at gluster.org
> >>> http://lists.gluster.org/mailman/listinfo/gluster-users