[Gluster-users] Disperse volumes on armhf

Fox foxxz.net at gmail.com
Sat Aug 4 01:18:17 UTC 2018


Replying to the last batch of questions I've received...

To reiterate, I am only having problems writing files to disperse volumes
when they are mounted on an armhf system. Mounting the same volume on an
x86-64 system works fine.
Disperse volumes running on arm also cannot heal.

Replica volumes mount and heal just fine.
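
(For reference, the heal state in both cases was checked with the usual
heal-info query; testvol1 is the test volume from the original post below:)

# gluster volume heal testvol1 info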


All bricks are up and running. I have ensured connectivity and that the MTU
is correct and identical across nodes.
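
(Connectivity and MTU were checked with ordinary tools; roughly, a
non-fragmenting ping sized for a 1500-byte MTU between peers, with gluster02
here just as an example target:)

# ping -c 3 -M do -s 1472 gluster02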

Armhf is 32-bit:
# uname -a
Linux gluster01 4.14.55-146 #1 SMP PREEMPT Wed Jul 11 22:31:01 -03 2018
armv7l armv7l armv7l GNU/Linux
# file /bin/bash
/bin/bash: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV),
dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux
3.2.0, BuildID[sha1]=e0a53f804173b0cd9845bb8a76fee1a1e98a9759, stripped
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.1 LTS
Release:        18.04
Codename:       bionic
# free
              total        used        free      shared  buff/cache   available
Mem:        2042428       83540     1671004        6052      287884     1895684
Swap:             0           0           0
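
To answer the endianness question below: armhf is little-endian (the "LSB"
in the file output above already says as much). It can also be checked
directly with something like:

# lscpu | grep -i "byte order"

which reports "Little Endian" on these boards.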


8 cores total: 4x running at 2 GHz and 4x at 1.4 GHz. Relevant /proc/cpuinfo
entries, one core of each type:
processor       : 0
model name      : ARMv7 Processor rev 3 (v7l)
BogoMIPS        : 24.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 3

processor       : 4
model name      : ARMv7 Processor rev 3 (v7l)
BogoMIPS        : 72.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc0f
CPU revision    : 3



There IS a 98MB /core file from the fuse mount crash, so that's cool:
# file /core
/core: ELF 32-bit LSB core file ARM, version 1 (SYSV), SVR4-style, from
'/usr/sbin/glusterfs --process-name fuse --volfile-server=gluster01
--volfile-id', real uid: 0, effective uid: 0, real gid: 0, effective gid:
0, execfn: '/usr/sbin/glusterfs', platform: 'v7l'
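
I have not pulled a backtrace from it yet; the plan is roughly the following,
assuming matching debug symbols (e.g. a glusterfs dbg/dbgsym package) are
available:

# gdb /usr/sbin/glusterfs /core
(gdb) bt full
(gdb) thread apply all bt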

I will try to get a bug report with logs filed over the weekend.
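For that report I expect to just bundle the client, brick and self-heal
daemon logs from the standard location, something like:

# tar czf gluster-logs.tar.gz /var/log/glusterfs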

This is just an experimental home cluster; I don't have anything on it yet.
It's possible I could grant someone SSH access to the cluster if it helps
further the gluster project, but the results should be reproducible on
something like a Raspberry Pi. I was hoping to run a disperse volume on it
eventually; otherwise I would never have found this issue.

Thank you for the troubleshooting ideas.


-Fox




On Fri, Aug 3, 2018 at 3:33 AM, Milind Changire <mchangir at redhat.com> wrote:

> What is the endianness of the armhf CPU?
> Are you running a 32-bit or 64-bit operating system?
>
>
> On Fri, Aug 3, 2018 at 9:51 AM, Fox <foxxz.net at gmail.com> wrote:
>
>> Just wondering if anyone else is running into the same behavior with
>> disperse volumes described below and what I might be able to do about it.
>>
>> I am using Ubuntu 18.04 LTS on Odroid HC-2 hardware (armhf) and have
>> installed Gluster 4.1.2 via PPA. I have 12 member nodes, each with a single
>> brick. I can successfully create a working volume via the command:
>>
>> gluster volume create testvol1 disperse 12 redundancy 4
>> gluster01:/exports/sda/brick1/testvol1 gluster02:/exports/sda/brick1/testvol1
>> gluster03:/exports/sda/brick1/testvol1 gluster04:/exports/sda/brick1/testvol1
>> gluster05:/exports/sda/brick1/testvol1 gluster06:/exports/sda/brick1/testvol1
>> gluster07:/exports/sda/brick1/testvol1 gluster08:/exports/sda/brick1/testvol1
>> gluster09:/exports/sda/brick1/testvol1 gluster10:/exports/sda/brick1/testvol1
>> gluster11:/exports/sda/brick1/testvol1 gluster12:/exports/sda/brick1/testvol1
>>
>> And start the volume:
>> gluster volume start testvol1
>>
>> Mounted on an x86-64 system, the volume performs as expected.
>>
>> Mounting the same volume on an armhf system (such as one of the cluster
>> members), I can create directories, but trying to create a file produces an
>> error and the file system unmounts/crashes:
>> root@gluster01:~# mount -t glusterfs gluster01:/testvol1 /mnt
>> root@gluster01:~# cd /mnt
>> root@gluster01:/mnt# ls
>> root@gluster01:/mnt# mkdir test
>> root@gluster01:/mnt# cd test
>> root@gluster01:/mnt/test# cp /root/notes.txt ./
>> cp: failed to close './notes.txt': Software caused connection abort
>> root@gluster01:/mnt/test# ls
>> ls: cannot open directory '.': Transport endpoint is not connected
>>
>> I get many of these in the glusterfsd.log:
>> The message "W [MSGID: 101088] [common-utils.c:4316:gf_backtrace_save]
>> 0-management: Failed to save the backtrace." repeated 100 times between
>> [2018-08-03 04:06:39.904166] and [2018-08-03 04:06:57.521895]
>>
>>
>> Furthermore, if a cluster member drops out (reboots, loses connection,
>> etc.) and needs healing, the self-heal daemon logs messages similar to the
>> one above and cannot heal: there is no disk activity (verified via iotop)
>> despite very high CPU usage, and the volume heal info command indicates the
>> volume still needs healing.
>>
>>
>> I tested all of the above in virtual environments using x86-64 VMs and
>> self-heal worked as expected.
>>
>> Again, this only happens when using disperse volumes. Should I be filing a
>> bug report instead?
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
> --
> Milind
>
>

