[Gluster-users] Disperse volumes on armhf

Xavi Hernandez jahernan at redhat.com
Mon Aug 6 07:23:23 UTC 2018


Hi,

On Sat, Aug 4, 2018 at 3:19 AM Fox <foxxz.net at gmail.com> wrote:

> Replying to the last batch of questions I've received...
>
> To reiterate, I am only having problems writing files to disperse volumes
> when mounting them on an armhf system. Mounting the same volume on an x86-64
> system works fine.
> Disperse volumes running on ARM cannot heal.
>
> Replica volumes mount and heal just fine.
>
>
> All bricks are up and running. I have ensured connectivity and that MTU is
> correct and identical.
>
> Armhf is 32-bit:
> # uname -a
> Linux gluster01 4.14.55-146 #1 SMP PREEMPT Wed Jul 11 22:31:01 -03 2018
> armv7l armv7l armv7l GNU/Linux
> # file /bin/bash
> /bin/bash: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV),
> dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux
> 3.2.0, BuildID[sha1]=e0a53f804173b0cd9845bb8a76fee1a1e98a9759, stripped
> # lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description:    Ubuntu 18.04.1 LTS
> Release:        18.04
> Codename:       bionic
> # free
>               total        used        free      shared  buff/cache   available
> Mem:        2042428       83540     1671004        6052      287884     1895684
> Swap:             0           0           0
>
>
> 8 cores total: 4 running at 2 GHz and 4 running at 1.4 GHz.
> processor       : 0
> model name      : ARMv7 Processor rev 3 (v7l)
> BogoMIPS        : 24.00
> Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva
> idivt vfpd32 lpae
> CPU implementer : 0x41
> CPU architecture: 7
> CPU variant     : 0x0
> CPU part        : 0xc07
> CPU revision    : 3
>
> processor       : 4
> model name      : ARMv7 Processor rev 3 (v7l)
> BogoMIPS        : 72.00
> Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva
> idivt vfpd32 lpae
> CPU implementer : 0x41
> CPU architecture: 7
> CPU variant     : 0x2
> CPU part        : 0xc0f
> CPU revision    : 3
>
>
>
> There IS a 98 MB /core file from the fuse mount, so that's cool.
> # file /core
> /core: ELF 32-bit LSB core file ARM, version 1 (SYSV), SVR4-style, from
> '/usr/sbin/glusterfs --process-name fuse --volfile-server=gluster01
> --volfile-id', real uid: 0, effective uid: 0, real gid: 0, effective gid:
> 0, execfn: '/usr/sbin/glusterfs', platform: 'v7l'
>

One possible cause is a 64/32-bit inconsistency. If you have the debug
symbols installed and can provide a backtrace from the core dump, it would
help to identify the problem.
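
For example, a rough sketch of what would help (assuming gdb and the
matching glusterfs debug symbol packages for the 4.1.2 PPA build are
installed; the exact package names on Ubuntu may differ):

    gdb /usr/sbin/glusterfs /core
    (gdb) set pagination off
    (gdb) thread apply all bt full

Attaching the full output of "thread apply all bt full" to the bug report
would make it much easier to see where the 32-bit build goes wrong.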

Xavi


> I will try and get a bug report with logs filed over the weekend.
>
> This is just an experimental home cluster. I don't have anything on it
> yet. It's possible I could grant someone SSH access to the cluster if it
> helps further the Gluster project. But the results should be reproducible
> on something like a Raspberry Pi. I was hoping to run a dispersed volume on
> it eventually; otherwise I would never have found this issue.
>
> Thank you for the troubleshooting ideas.
>
>
> -Fox
>
>
>
>
> On Fri, Aug 3, 2018 at 3:33 AM, Milind Changire <mchangir at redhat.com>
> wrote:
>
>> What is the endianness of the armhf CPU?
>> Are you running a 32-bit or 64-bit operating system?
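>> (Both can be checked on the node itself, e.g. lscpu reports the byte order,
>> and getconf LONG_BIT or file /bin/bash shows the userspace word size.)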
>>
>>
>> On Fri, Aug 3, 2018 at 9:51 AM, Fox <foxxz.net at gmail.com> wrote:
>>
>>> Just wondering if anyone else is running into the same behavior with
>>> disperse volumes described below and what I might be able to do about it.
>>>
>>> I am using Ubuntu 18.04 LTS on Odroid HC-2 hardware (armhf) and have
>>> installed Gluster 4.1.2 via PPA. I have 12 member nodes, each with a single
>>> brick. I can successfully create a working volume via the command:
>>>
>>> gluster volume create testvol1 disperse 12 redundancy 4
>>> gluster01:/exports/sda/brick1/testvol1
>>> gluster02:/exports/sda/brick1/testvol1
>>> gluster03:/exports/sda/brick1/testvol1
>>> gluster04:/exports/sda/brick1/testvol1
>>> gluster05:/exports/sda/brick1/testvol1
>>> gluster06:/exports/sda/brick1/testvol1
>>> gluster07:/exports/sda/brick1/testvol1
>>> gluster08:/exports/sda/brick1/testvol1
>>> gluster09:/exports/sda/brick1/testvol1
>>> gluster10:/exports/sda/brick1/testvol1
>>> gluster11:/exports/sda/brick1/testvol1
>>> gluster12:/exports/sda/brick1/testvol1
>>>
>>> And start the volume:
>>> gluster volume start testvol1
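>>> (Volume and brick state can be double-checked afterwards with the usual
>>> gluster volume info testvol1 and gluster volume status testvol1.)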
>>>
>>> Mounting the volume on an x86-64 system it performs as expected.
>>>
>>> Mounting the same volume on an armhf system (such as one of the cluster
>>> members), I can create directories, but trying to create a file produces an
>>> error and the file system unmounts/crashes:
>>> root at gluster01:~# mount -t glusterfs gluster01:/testvol1 /mnt
>>> root at gluster01:~# cd /mnt
>>> root at gluster01:/mnt# ls
>>> root at gluster01:/mnt# mkdir test
>>> root at gluster01:/mnt# cd test
>>> root at gluster01:/mnt/test# cp /root/notes.txt ./
>>> cp: failed to close './notes.txt': Software caused connection abort
>>> root at gluster01:/mnt/test# ls
>>> ls: cannot open directory '.': Transport endpoint is not connected
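>>> (If it helps, the client-side trace for this should be in the fuse mount
>>> log, typically /var/log/glusterfs/mnt.log for a mount on /mnt.)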
>>>
>>> I get many of these in the glusterfsd.log:
>>> The message "W [MSGID: 101088] [common-utils.c:4316:gf_backtrace_save]
>>> 0-management: Failed to save the backtrace." repeated 100 times between
>>> [2018-08-03 04:06:39.904166] and [2018-08-03 04:06:57.521895]
>>>
>>>
>>> Furthermore, if a cluster member ducks out (reboots, loses connection,
>>> etc.) and needs healing, the self-heal daemon logs messages similar to the
>>> one above and cannot heal: there is no disk activity (verified via iotop)
>>> despite very high CPU usage, and the volume heal info command indicates the
>>> volume still needs healing.
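>>> (For reference, 'needs healing' above is what commands like gluster volume
>>> heal testvol1 info and gluster volume heal testvol1 statistics heal-count
>>> report.)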
>>>
>>>
>>> I tested all of the above in virtual environments using x86-64 VMs and
>>> could self-heal as expected.
>>>
>>> Again, this only happens when using disperse volumes. Should I be filing
>>> a bug report instead?
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>
>> --
>> Milind
>>
>>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users