[Gluster-users] Strange file corruption - it happened again

Pranith Kumar Karampuri pkarampu at redhat.com
Wed Dec 16 10:25:37 UTC 2015



On 12/14/2015 04:44 PM, Udo Giacomozzi wrote:
> Hi,
>
> it happened again:
>
> today I've upgraded some packages on node #3. Since the Kernel had a 
> minor update, I was asked to reboot the server, and did so.
>
> At that time only one (non-critical) VM was running on that node. I've 
> checked twice and Gluster was *not* healing when I rebooted.
>
> After rebooting, and while *automatic* healing was in progress, one VM 
> started to get HDD corruption again, up to the point that it wasn't 
> able to boot anymore(!).
>
> That poor VM was one of the only two VMs that were still using NFS for 
> accessing the Gluster storage - if that matters.
> The second VM survived the healing, even though it has rather large 
> disks (~380 GB) and is rather busy.
>
> All other ~13 VMs had been moved to a native glusterfs mount days before 
> and had no problem with the reboot. The Gluster access type may be 
> related or not - I don't know...
>
> All Gluster packages are at version "3.5.2-2+deb8u1" on all three 
> servers - so Gluster has *not* been upgraded this time.
> Kernel on node #3: Linux metal3 4.2.6-1-pve #1 SMP Wed Dec 9 10:49:55 
> CET 2015 x86_64 GNU/Linux
> Kernel on nodes #1 & #2: Linux metal1 4.2.3-2-pve #1 SMP Sun Nov 15 16:08:19 
> CET 2015 x86_64 GNU/Linux
Could you give us the logs from all the nodes in your setup and the 
name/gfid of the file that was corrupted?

Pranith
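
A minimal sketch of how that information could be gathered on each node 
(the brick path is taken from the add-brick command quoted below; the 
image file name is only a placeholder, and the log file names are the 
usual GlusterFS defaults under /var/log/glusterfs):

    # gfid of the affected file, read directly from the brick as root:
    getfattr -n trusted.gfid -e hex /data/gluster/systems/images/<vm-disk-image>

    # logs that are typically relevant, per node:
    #   /var/log/glusterfs/glustershd.log   (self-heal daemon)
    #   /var/log/glusterfs/nfs.log          (Gluster NFS server - the corrupted VM used NFS)
    #   /var/log/glusterfs/bricks/*.log     (brick logs)
    #   /var/log/glusterfs/<mount>.log      (FUSE client log, named after the mount point)
    tar czf gluster-logs-$(hostname).tar.gz /var/log/glusterfs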
>
>
> Any idea??
>
> Udo
>
>
> On 10.12.2015 at 16:12, Udo Giacomozzi wrote:
>> On 09.12.2015 at 22:33, Lindsay Mathieson wrote:
>>>
>>>
>>> On 10/12/2015 3:15 AM, Udo Giacomozzi wrote:
>>>> These were the commands executed on node #2 during step 6:
>>>>
>>>>     gluster volume add-brick "systems" replica 3
>>>>     metal1:/data/gluster/systems
>>>>     gluster volume heal "systems" full   # to trigger sync
>>>>
>>>>
>>>> Then I waited for replication to finish before doing anything else 
>>>> (about 1 hour or maybe more), checking _gluster volume heal 
>>>> "systems" info_
>>>
>>>
>>> Did you execute the heal command from host #2? It might be related to a 
>>> possible issue I encountered recently while testing brick additions; I'm 
>>> still in the process of reproducing and testing it.
>>
>>
>> I'm afraid I can't tell anymore. Could be, I'm not sure, sorry...
>>
>>
>> Udo
>>
>>
