[Gluster-users] Replicated striped data loss

Mahdi Adnan mahdi.adnan at earthlinktele.com
Sun Mar 13 16:07:54 UTC 2016


My HBAs are LSISAS1068E, and the filesystem is XFS.
I tried EXT4 and it did not help.
I have created a striped volume on a single server with two bricks; same issue.
I also tried a replicated volume with just sharding enabled; same issue.
As soon as I disable sharding it works just fine. Neither sharding
nor striping works for me.
I did follow up on some threads in the mailing list and tried some
of the fixes that worked for others; none worked for me. :(
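For anyone wanting to reproduce the single-server test, a minimal
two-brick striped volume can be created along these lines (volume name
and brick paths are placeholders, not necessarily the exact ones used;
append "force" if the CLI warns about the brick paths):

    gluster volume create testvol stripe 2 gfs001:/bricks/t1/s gfs001:/bricks/t2/s
    gluster volume start testvol

The sharding-only test was simply a plain replicated volume with only
features.shard enabled and everything else left at defaults.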

On 03/13/2016 06:54 PM, David Gossage wrote:
>
>
>
> On Sun, Mar 13, 2016 at 8:16 AM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>
>     Okay, so I have enabled sharding on my test volume and it did not
>     help. Stupidly enough, I enabled it on a production volume
>     ("Distributed-Replicate") and it corrupted half of my VMs.
>     I have updated Gluster to the latest version and nothing seems to
>     have changed in my situation.
>     Below is the info for my volume:
>
>
> I was pointing at the settings in that email as an example for
> corruption fixing. I wouldn't recommend enabling sharding if you
> haven't gotten the base working yet on that cluster. What HBAs are
> you using, and what is the filesystem layout for the bricks?
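> (For reference, a quick way to gather that on CentOS; the brick path
> below is just an example:
>
>     lspci | grep -i -e sas -e raid     # HBA/controller model
>     xfs_info /bricks/b001              # XFS geometry of one brick
>     mount | grep /bricks               # mount options of the bricks
> )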
>
>
>     Number of Bricks: 3 x 2 = 6
>     Transport-type: tcp
>     Bricks:
>     Brick1: gfs001:/bricks/b001/vmware
>     Brick2: gfs002:/bricks/b004/vmware
>     Brick3: gfs001:/bricks/b002/vmware
>     Brick4: gfs002:/bricks/b005/vmware
>     Brick5: gfs001:/bricks/b003/vmware
>     Brick6: gfs002:/bricks/b006/vmware
>     Options Reconfigured:
>     performance.strict-write-ordering: on
>     cluster.server-quorum-type: server
>     cluster.quorum-type: auto
>     network.remote-dio: enable
>     performance.stat-prefetch: disable
>     performance.io-cache: off
>     performance.read-ahead: off
>     performance.quick-read: off
>     cluster.eager-lock: enable
>     features.shard-block-size: 16MB
>     features.shard: on
>     performance.readdir-ahead: off
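>     (For reference, options like the above are set per volume with the
>     gluster CLI; <VOLNAME> below is a placeholder for the actual volume
>     name:
>
>         gluster volume set <VOLNAME> features.shard on
>         gluster volume set <VOLNAME> features.shard-block-size 16MB
>         gluster volume info <VOLNAME>
>
>     An individual option can be reverted to its default with
>     "gluster volume reset <VOLNAME> <option>".)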
>
>
>     On 03/12/2016 08:11 PM, David Gossage wrote:
>>
>>     On Sat, Mar 12, 2016 at 10:21 AM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>
>>         Both servers have HBAs, no RAID, and I can set up a replicated
>>         or dispersed volume without any issues.
>>         The logs are clean, and when I tried to migrate a VM and got
>>         the error, nothing showed up in the logs.
>>         I tried mounting the volume on my laptop and it mounted fine,
>>         but if I use dd to create a data file it just hangs and I
>>         can't cancel it, and I can't unmount it or anything; I just
>>         have to reboot.
>>         The same servers have another volume on other bricks in a
>>         distributed-replicated layout, and it works fine.
>>         I have even tried the same setup in a virtual environment
>>         (created two VMs, installed Gluster, and created a replicated
>>         striped volume) and again the same thing: data corruption.
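>>         (A minimal version of that mount-and-dd test, with placeholder
>>         volume name and mount point, would be roughly:
>>
>>             mount -t glusterfs gfs001:/<VOLNAME> /mnt/test
>>             dd if=/dev/zero of=/mnt/test/testfile bs=1M count=1024
>>
>>         It is the dd that hangs and cannot be interrupted.)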
>>
>>
>>     I'd look through the mail archives for a topic called "Shard in
>>     Production", I think. The shard portion may not be relevant, but
>>     it does discuss certain settings that had to be applied to avoid
>>     corruption with VMs. You may also want to try disabling
>>     performance.readdir-ahead.
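>>     That would be, with the real volume name substituted for the
>>     placeholder:
>>
>>         gluster volume set <VOLNAME> performance.readdir-ahead off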
>>
>>
>>
>>         On 03/12/2016 07:02 PM, David Gossage wrote:
>>>
>>>
>>>         On Sat, Mar 12, 2016 at 9:51 AM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>>
>>>             Thanks David,
>>>
>>>             My settings are all defaults; I have just created the
>>>             volume and started it.
>>>             I have applied the settings you recommended and it seems
>>>             to be the same issue:
>>>
>>>             Type: Striped-Replicate
>>>             Volume ID: 44adfd8c-2ed1-4aa5-b256-d12b64f7fc14
>>>             Status: Started
>>>             Number of Bricks: 1 x 2 x 2 = 4
>>>             Transport-type: tcp
>>>             Bricks:
>>>             Brick1: gfs001:/bricks/t1/s
>>>             Brick2: gfs002:/bricks/t1/s
>>>             Brick3: gfs001:/bricks/t2/s
>>>             Brick4: gfs002:/bricks/t2/s
>>>             Options Reconfigured:
>>>             performance.stat-prefetch: off
>>>             network.remote-dio: on
>>>             cluster.eager-lock: enable
>>>             performance.io-cache: off
>>>             performance.read-ahead: off
>>>             performance.quick-read: off
>>>             performance.readdir-ahead: on
>>>
>>>
>>>         Is there a RAID controller perhaps doing any caching?
>>>
>>>         In the Gluster logs, are any errors being reported during the
>>>         migration process?
>>>         Since they aren't in use yet, have you tested making just
>>>         mirrored bricks, using different pairings of servers two at a
>>>         time, to see if the problem follows a certain machine or
>>>         network port?
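>>>         For example, a plain two-brick mirror between any pair of
>>>         servers can be created and tested on its own; the volume name
>>>         and brick paths below are placeholders:
>>>
>>>             gluster volume create testrep replica 2 gfs001:/bricks/t1/r gfs002:/bricks/t1/r
>>>             gluster volume start testrep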
>>>
>>>
>>>
>>>
>>>
>>>
>>>             On 03/12/2016 03:25 PM, David Gossage wrote:
>>>>
>>>>
>>>>             On Sat, Mar 12, 2016 at 1:55 AM, Mahdi Adnan <mahdi.adnan at earthlinktele.com> wrote:
>>>>
>>>>                 Dear all,
>>>>
>>>>                 I have created a replicated striped volume with two
>>>>                 bricks and two servers, but I can't use it because
>>>>                 when I mount it in ESXi and try to migrate a VM to
>>>>                 it, the data gets corrupted.
>>>>                 Does anyone have any idea why this is happening?
>>>>
>>>>                 Dell 2950 x2
>>>>                 Seagate 15k 600GB
>>>>                 CentOS 7.2
>>>>                 Gluster 3.7.8
>>>>
>>>>                 Appreciate your help.
>>>>
>>>>
>>>>             Most reports of this I have seen end up being settings
>>>>             related. Post your gluster volume info. Below are the
>>>>             settings I have most commonly seen recommended (an
>>>>             example of applying them follows the list); I'd hazard a
>>>>             guess you have the read-ahead cache or prefetch on.
>>>>
>>>>             quick-read=off
>>>>             read-ahead=off
>>>>             io-cache=off
>>>>             stat-prefetch=off
>>>>             eager-lock=enable
>>>>             remote-dio=on
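>>>>             These shorthand names correspond to regular volume
>>>>             options and can be applied with gluster volume set;
>>>>             <VOLNAME> is a placeholder for the volume name:
>>>>
>>>>                 gluster volume set <VOLNAME> performance.quick-read off
>>>>                 gluster volume set <VOLNAME> performance.read-ahead off
>>>>                 gluster volume set <VOLNAME> performance.io-cache off
>>>>                 gluster volume set <VOLNAME> performance.stat-prefetch off
>>>>                 gluster volume set <VOLNAME> cluster.eager-lock enable
>>>>                 gluster volume set <VOLNAME> network.remote-dio on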
>>>>
>>>>
>>>>                 Mahdi Adnan
>>>>                 System Admin
>>>>
>>>>
>>>>                 _______________________________________________
>>>>                 Gluster-users mailing list
>>>>                 Gluster-users at gluster.org
>>>>                 http://www.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>
>>>
>>>
>>
>>
>
>
