[Gluster-users] GFID attr is missing after adding large amounts of data
Ben Turner
bturner at redhat.com
Fri Sep 1 02:19:01 UTC 2017
I re-added gluster-users to get some more eyes on this.
----- Original Message -----
> From: "Christoph Schäbel" <christoph.schaebel at dc-square.de>
> To: "Ben Turner" <bturner at redhat.com>
> Sent: Wednesday, August 30, 2017 8:18:31 AM
> Subject: Re: [Gluster-users] GFID attr is missing after adding large amounts of data
>
> Hello Ben,
>
> thank you for offering your help.
>
> Here are outputs from all the gluster commands I could think of.
> Note that we had to remove the terabytes of data to keep the system
> operational, because it is a live system.
>
> # gluster volume status
>
> Status of volume: gv0
> Gluster process TCP Port RDMA Port Online Pid
> ------------------------------------------------------------------------------
> Brick 10.191.206.15:/mnt/brick1/gv0 49154 0 Y 2675
> Brick 10.191.198.15:/mnt/brick1/gv0 49154 0 Y 2679
> Self-heal Daemon on localhost N/A N/A Y 12309
> Self-heal Daemon on 10.191.206.15 N/A N/A Y 2670
>
> Task Status of Volume gv0
> ------------------------------------------------------------------------------
> There are no active volume tasks
OK, so your bricks are all online and you have two nodes with 1 brick per node.
>
> # gluster volume info
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: 5e47d0b8-b348-45bb-9a2a-800f301df95b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.191.206.15:/mnt/brick1/gv0
> Brick2: 10.191.198.15:/mnt/brick1/gv0
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
You are using a replicate volume with 2 copies of your data; it looks like you are using the defaults, as I don't see any tuning.
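If you want to double check, something like this (just a sketch, run on either node) will dump every option the volume currently has so we can spot anything non-default:

# show all options for gv0, defaults included
gluster volume get gv0 all

# or check a single option, e.g. the ping-timeout you set in your setup script
gluster volume get gv0 network.ping-timeout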
>
> # gluster peer status
>
> Number of Peers: 1
>
> Hostname: 10.191.206.15
> Uuid: 030a879d-da93-4a48-8c69-1c552d3399d2
> State: Peer in Cluster (Connected)
>
>
> # gluster --version
>
> glusterfs 3.8.11 built on Apr 11 2017 09:50:39
> Repository revision: git://git.gluster.com/glusterfs.git
> Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU General
> Public License.
You are running Gluster 3.8, which is the latest upstream release marked stable.
>
> # df -h
>
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/vg00-root 75G 5.7G 69G 8% /
> devtmpfs 1.9G 0 1.9G 0% /dev
> tmpfs 1.9G 0 1.9G 0% /dev/shm
> tmpfs 1.9G 17M 1.9G 1% /run
> tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
> /dev/sda1 477M 151M 297M 34% /boot
> /dev/mapper/vg10-brick1 8.0T 700M 8.0T 1% /mnt/brick1
> localhost:/gv0 8.0T 768M 8.0T 1% /mnt/glusterfs_client
> tmpfs 380M 0 380M 0% /run/user/0
>
Your brick is:
/dev/mapper/vg10-brick1 8.0T 700M 8.0T 1% /mnt/brick1
The block device is 8TB. Can you tell me more about your brick? Is it a single disk or a RAID? If it's a RAID, can you tell me about the disks? I am interested in:
-Size of disks
-RAID type
-Stripe size
-RAID controller
I also see:
localhost:/gv0 8.0T 768M 8.0T 1% /mnt/glusterfs_client
So you are mounting your volume on the local node; is this the mount you are writing the data to?
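If it helps, the following should gather most of what I'm asking about (just a sketch; the exact RAID controller tool depends on your hardware, so I'm leaving that part out):

# disk layout under the brick device
lsblk -o NAME,SIZE,TYPE,ROTA,MOUNTPOINT

# LVM layout for vg10/brick1
pvs && vgs && lvs

# XFS geometry on the brick (sunit/swidth show whether stripe alignment is set)
xfs_info /mnt/brick1

# only relevant if you happen to use Linux software RAID
cat /proc/mdstat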
>
>
> The setup of the servers is done via shell script on CentOS 7 containing the
> following commands:
>
> yum install -y centos-release-gluster
> yum install -y glusterfs-server
>
> mkdir /mnt/brick1
> ssm create -s 999G -n brick1 --fstype xfs -p vg10 /dev/sdb /mnt/brick1
I haven't used system-storage-manager before; do you know if it takes care of properly tuning your storage stack (if you have a RAID, that is)? If you don't have a RAID it's probably not a big deal, but if you do we should make sure everything is aware of your stripe size and tuned appropriately.
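For reference, when there is a hardware RAID underneath I normally create the XFS filesystem with the stripe geometry spelled out. A rough sketch below; the su/sw numbers are made-up examples, you would plug in your controller's stripe unit and data-disk count, and obviously only ever do this on an empty brick:

# EXAMPLE ONLY - 12 disk RAID6 (10 data disks), 256KB stripe unit.
# This reformats the device, so never run it on a brick that has data on it.
mkfs.xfs -f -i size=512 -d su=256k,sw=10 /dev/mapper/vg10-brick1

# after mounting, confirm sunit/swidth picked up the geometry
xfs_info /mnt/brick1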
>
> echo "/dev/mapper/vg10-brick1 /mnt/brick1 xfs defaults 1 2" >>
> /etc/fstab
> mount -a && mount
> mkdir /mnt/brick1/gv0
>
> gluster peer probe OTHER_SERVER_IP
>
> gluster pool list
> gluster volume create gv0 replica 2 OWN_SERVER_IP:/mnt/brick1/gv0 OTHER_SERVER_IP:/mnt/brick1/gv0
> gluster volume start gv0
> gluster volume info gv0
> gluster volume set gv0 network.ping-timeout "10"
> gluster volume info gv0
>
> # mount as client for archiving cronjob, is already in fstab
> mount -a
>
> # mount via fuse-client
> mkdir -p /mnt/glusterfs_client
> echo "localhost:/gv0 /mnt/glusterfs_client glusterfs defaults,_netdev 0 0" >>
> /etc/fstab
> mount -a
>
>
> We untar multiple files (around 1300 tar files), each around 2.7 GB in size.
> The tar files are not compressed.
> We untar the files with a shell script containing the following:
>
> #! /bin/bash
> for f in *.tar; do tar xfP $f; done
Your script looks good. I am not that familiar with the tar flag "P", but it looks to mean:
-P, --absolute-names
Don't strip leading slashes from file names when creating archives.
I don't see anything strange here, everything looks OK.
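One small thing I usually do, unrelated to your problem: quote the variable and log any archive that fails, so a bad tarball doesn't get lost in the noise. A minimal sketch:

#!/bin/bash
# untar every archive; record any tar that exits non-zero
for f in *.tar; do
    tar xfP "$f" || echo "$f" >> failed_tars.txt
done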
>
> The script is run as user root, the processes glusterd, glusterfs and
> glusterfsd also run under user root.
>
> Each tar file consists of a single folder with multiple folders and files in
> it.
> The folder tree looks like this (note that the "=" is part of the folder
> name):
>
> 1498780800/
> - timeframe_hour=1498780800/ (about 25 of these folders)
> -- type=1/ (about 25 folders total)
> --- data-x.gz.parquet (between 100MB and 1kb in size)
> --- data-x.gz.parquet.crc (around 1kb in size)
> -- …
> - ...
>
> Unfortunately I cannot share the file contents with you.
That's no problem, I'll try to recreate this in the lab.
>
> We have not seen any other issues with glusterfs, when untaring just a few of
> those files. I just tried writing a 100GB with dd and did not see any issues
> there, the file is replicated and the GFID attribute is set correctly on
> both nodes.
ACK. I do this all the time, if you saw an issue here I would be worried about your setup.
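For anyone following along, the check I mean looks roughly like this (a sketch - the file name is just an example, adjust the paths to your mount and brick):

# write a 100GB test file through the FUSE mount
dd if=/dev/zero of=/mnt/glusterfs_client/ddtest.img bs=1M count=102400

# then on EACH node, confirm the gfid xattr exists on the brick copy
getfattr -m . -d -e hex /mnt/brick1/gv0/ddtest.img | grep trusted.gfid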
>
> We are not able to reproduce this in our lab environment which is a clone
> (actual cloned VMs) of the other system, but it only has around 1TB of
> storage.
> Do you think this could be an issue with the number of files generated by
> tar (over 1.5 million files)?
> What I can say is that it is not an issue with inodes; I checked that when
> all the files were unpacked on the live system.
Hmm, I am not sure. It's strange that you can't repro this on your other config. In the lab I have a ton of space to work with, so I can run a lot of data through my repro.
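(For completeness, the inode check you mention is just something like this on each brick, so we can rule it out:)

# IFree should be nowhere near zero on either node
df -i /mnt/brick1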
>
> If you need anything else, let me know.
Can you help clarify your reproducer so I can give it a go in the lab? From what I can tell you have:
1498780800/ <-- Just a string of numbers, this is the root dir of your tarball
- timeframe_hour=1498780800/ (about 25 of these folders) <-- This is the second level dir of your tarball, there are ~25 of these dirs that mention a timeframe and an hour
-- type=1/ (about 25 folders total) <-- This is the 3rd level of your tar, there are about 25 different type=$X dirs
--- data-x.gz.parquet (between 100MB and 1kb in size) <-- This is your actual data. Is there just 1 pair of these files per dir, or multiple?
--- data-x.gz.parquet.crc (around 1kb in size) <-- This is a checksum for the above file?
I have almost everything I need for my reproducer, can you answer the above questions about the data?
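In the meantime, here is roughly the generator I plan to use for the reproducer, so you can sanity check it against your layout. Purely a sketch - the file names, counts and sizes are guesses based on your description above:

#!/bin/bash
# build a fake dataset matching the described layout, then tar it up
ROOT=1498780800
mkdir -p "$ROOT"
for h in $(seq 1 25); do
    for t in $(seq 1 25); do
        d="$ROOT/timeframe_hour=$((1498780800 + h * 3600))/type=$t"
        mkdir -p "$d"
        # dummy parquet + crc pair; real sizes range from ~1KB to ~100MB
        dd if=/dev/urandom of="$d/data-$h.gz.parquet" bs=1K count=$((RANDOM % 1024 + 1)) 2>/dev/null
        dd if=/dev/urandom of="$d/data-$h.gz.parquet.crc" bs=1K count=1 2>/dev/null
    done
done
tar cfP "$ROOT.tar" "$ROOT"
echo "created $ROOT.tar with $(find "$ROOT" -type f | wc -l) files"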
-b
>
> Thank you for your help,
> Christoph
> > On 29.08.2017 at 06:36, Ben Turner <bturner at redhat.com> wrote:
> >
> > Also include gluster v status, I want to check the status of your bricks
> > and SHD processes.
> >
> > -b
> >
> > ----- Original Message -----
> >> From: "Ben Turner" <bturner at redhat.com>
> >> To: "Christoph Schäbel" <christoph.schaebel at dc-square.de>
> >> Cc: gluster-users at gluster.org
> >> Sent: Tuesday, August 29, 2017 12:35:05 AM
> >> Subject: Re: [Gluster-users] GFID attr is missing after adding large
> >> amounts of data
> >>
> >> This is strange, a couple of questions:
> >>
> >> 1. What volume type is this? What tuning have you done? gluster v info
> >> output would be helpful here.
> >>
> >> 2. How big are your bricks?
> >>
> >> 3. Can you write me a quick reproducer so I can try this in the lab? Is
> >> it
> >> just a single multi TB file you are untarring or many? If you give me the
> >> steps to repro, and I hit it, we can get a bug open.
> >>
> >> 4. Other than this are you seeing any other problems? What if you untar
> >> a
> >> smaller file(s)? Can you read and write to the volume with say DD without
> >> any problems?
> >>
> >> It sounds like you have some other issues affecting things here, there is
> >> no
> >> reason why you shouldn't be able to untar and write multiple TBs of data
> >> to
> >> gluster. Go ahead and answer those questions and I'll see what I can do
> >> to
> >> help you out.
> >>
> >> -b
> >>
> >> ----- Original Message -----
> >>> From: "Christoph Schäbel" <christoph.schaebel at dc-square.de>
> >>> To: gluster-users at gluster.org
> >>> Sent: Monday, August 28, 2017 3:55:31 AM
> >>> Subject: [Gluster-users] GFID attr is missing after adding large amounts
> >>> of data
> >>>
> >>> Hi Cluster Community,
> >>>
> >>> we are seeing some problems when adding multiple terabytes of data to a
> >>> 2
> >>> node replicated GlusterFS installation.
> >>>
> >>> The version is 3.8.11 on CentOS 7.
> >>> The machines are connected via 10Gbit LAN and are running 24/7. The OS is
> >>> virtualized on VMWare.
> >>>
> >>> After a restart of node-1 we see that the log files are growing to
> >>> multiple
> >>> Gigabytes a day.
> >>>
> >>> Also there seem to be problems with the replication.
> >>> The setup worked fine until sometime after we added the additional data
> >>> (around 3 TB in size) to node-1. We added the data to a mountpoint via
> >>> the
> >>> client, not directly to the brick.
> >>> What we did is add tar files via a client-mount and then untar them while
> >>> in
> >>> the client-mount folder.
> >>> The brick (/mnt/brick1/gv0) is using the XFS filesystem.
> >>>
> >>> When checking the file attributes of one of the files mentioned in the
> >>> brick
> >>> logs, I can see that the gfid attribute is missing on node-1. On node-2
> >>> the
> >>> file does not even exist.
> >>>
> >>> getfattr -m . -d -e hex
> >>> mnt/brick1/gv0/.glusterfs/40/59/40598e46-9868-4d7c-b494-7b978e67370a/type=type1/part-r-00002-4846e211-c81d-4c08-bb5e-f22fa5a4b404.gz.parquet
> >>>
> >>> # file:
> >>> mnt/brick1/gv0/.glusterfs/40/59/40598e46-9868-4d7c-b494-7b978e67370a/type=type1/part-r-00002-4846e211-c81d-4c08-bb5e-f22fa5a4b404.gz.parquet
> >>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> >>>
> >>> We repeated this scenario a second time with a fresh setup and got the
> >>> same
> >>> results.
> >>>
> >>> Does anyone know what we are doing wrong?
> >>>
> >>> Is there maybe a problem with glusterfs and tar?
> >>>
> >>>
> >>> Log excerpts:
> >>>
> >>>
> >>> glustershd.log
> >>>
> >>> [2017-07-26 15:31:36.290908] I [MSGID: 108026]
> >>> [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0:
> >>> performing entry selfheal on fe5c42ac-5fda-47d4-8221-484c8d826c06
> >>> [2017-07-26 15:31:36.294289] W [MSGID: 114031]
> >>> [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote
> >>> operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No
> >>> data available]
> >>> [2017-07-26 15:31:36.298287] I [MSGID: 108026]
> >>> [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0:
> >>> performing entry selfheal on e31ae2ca-a3d2-4a27-a6ce-9aae24608141
> >>> [2017-07-26 15:31:36.300695] W [MSGID: 114031]
> >>> [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote
> >>> operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No
> >>> data available]
> >>> [2017-07-26 15:31:36.303626] I [MSGID: 108026]
> >>> [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0:
> >>> performing entry selfheal on 2cc9dafe-64d3-454a-a647-20deddfaebfe
> >>> [2017-07-26 15:31:36.305763] W [MSGID: 114031]
> >>> [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote
> >>> operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No
> >>> data available]
> >>> [2017-07-26 15:31:36.308639] I [MSGID: 108026]
> >>> [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0:
> >>> performing entry selfheal on cbabf9ed-41be-4d08-9cdb-5734557ddbea
> >>> [2017-07-26 15:31:36.310819] W [MSGID: 114031]
> >>> [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote
> >>> operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No
> >>> data available]
> >>> [2017-07-26 15:31:36.315057] I [MSGID: 108026]
> >>> [afr-self-heal-entry.c:833:afr_selfheal_entry_do] 0-gv0-replicate-0:
> >>> performing entry selfheal on 8a3c1c16-8edf-40f0-b2ea-8e70c39e1a69
> >>> [2017-07-26 15:31:36.317196] W [MSGID: 114031]
> >>> [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-gv0-client-1: remote
> >>> operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No
> >>> data available]
> >>>
> >>>
> >>>
> >>> bricks/mnt-brick1-gv0.log
> >>>
> >>> [2017-07-26 15:31:36.287831] E [MSGID: 115050]
> >>> [server-rpc-fops.c:156:server_lookup_cbk] 0-gv0-server: 6153546: LOOKUP
> >>> <gfid:d99930df-6b47-4b55-9af3-c767afd6584c>/part-r-00001-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet
> >>> (d99930df-6b47-4b55-9af3-c767afd6584c/part-r-00001-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet)
> >>> ==> (No data available) [No data available]
> >>> [2017-07-26 15:31:36.294202] E [MSGID: 113002] [posix.c:266:posix_lookup]
> >>> 0-gv0-posix: buf->ia_gfid is null for
> >>> /mnt/brick1/gv0/.glusterfs/e7/2d/e72d9005-b958-432b-b4a9-37aaadd9d2df/type=type1/part-r-00001-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet
> >>> [No data available]
> >>> [2017-07-26 15:31:36.294235] E [MSGID: 115050]
> >>> [server-rpc-fops.c:156:server_lookup_cbk] 0-gv0-server: 6153564: LOOKUP
> >>> <gfid:fe5c42ac-5fda-47d4-8221-484c8d826c06>/part-r-00001-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet
> >>> (fe5c42ac-5fda-47d4-8221-484c8d826c06/part-r-00001-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet)
> >>> ==> (No data available) [No data available]
> >>> [2017-07-26 15:31:36.300611] E [MSGID: 113002] [posix.c:266:posix_lookup]
> >>> 0-gv0-posix: buf->ia_gfid is null for
> >>> /mnt/brick1/gv0/.glusterfs/33/d4/33d47146-bc30-49dd-ada8-475bb75435bf/type=type2/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet
> >>> [No data available]
> >>> [2017-07-26 15:31:36.300645] E [MSGID: 115050]
> >>> [server-rpc-fops.c:156:server_lookup_cbk] 0-gv0-server: 6153582: LOOKUP
> >>> <gfid:e31ae2ca-a3d2-4a27-a6ce-9aae24608141>/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet
> >>> (e31ae2ca-a3d2-4a27-a6ce-9aae24608141/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet)
> >>> ==> (No data available) [No data available]
> >>> [2017-07-26 15:31:36.305671] E [MSGID: 113002] [posix.c:266:posix_lookup]
> >>> 0-gv0-posix: buf->ia_gfid is null for
> >>> /mnt/brick1/gv0/.glusterfs/33/d4/33d47146-bc30-49dd-ada8-475bb75435bf/type=type1/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet
> >>> [No data available]
> >>> [2017-07-26 15:31:36.305711] E [MSGID: 115050]
> >>> [server-rpc-fops.c:156:server_lookup_cbk] 0-gv0-server: 6153600: LOOKUP
> >>> <gfid:2cc9dafe-64d3-454a-a647-20deddfaebfe>/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet
> >>> (2cc9dafe-64d3-454a-a647-20deddfaebfe/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet)
> >>> ==> (No data available) [No data available]
> >>> [2017-07-26 15:31:36.310735] E [MSGID: 113002] [posix.c:266:posix_lookup]
> >>> 0-gv0-posix: buf->ia_gfid is null for
> >>> /mnt/brick1/gv0/.glusterfs/df/71/df715321-3078-47c8-bf23-dec47abe46d7/type=type2/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet
> >>> [No data available]
> >>> [2017-07-26 15:31:36.310767] E [MSGID: 115050]
> >>> [server-rpc-fops.c:156:server_lookup_cbk] 0-gv0-server: 6153618: LOOKUP
> >>> <gfid:cbabf9ed-41be-4d08-9cdb-5734557ddbea>/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet
> >>> (cbabf9ed-41be-4d08-9cdb-5734557ddbea/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet)
> >>> ==> (No data available) [No data available]
> >>> [2017-07-26 15:31:36.317113] E [MSGID: 113002] [posix.c:266:posix_lookup]
> >>> 0-gv0-posix: buf->ia_gfid is null for
> >>> /mnt/brick1/gv0/.glusterfs/df/71/df715321-3078-47c8-bf23-dec47abe46d7/type=type3/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet
> >>> [No data available]
> >>> [2017-07-26 15:31:36.317146] E [MSGID: 115050]
> >>> [server-rpc-fops.c:156:server_lookup_cbk] 0-gv0-server: 6153636: LOOKUP
> >>> <gfid:8a3c1c16-8edf-40f0-b2ea-8e70c39e1a69>/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet
> >>> (8a3c1c16-8edf-40f0-b2ea-8e70c39e1a69/part-r-00002-becc67f0-1665-47b6-8566-fa0245f560ad.gz.parquet)
> >>> ==> (No data available) [No data available]
> >>>
> >>>
> >>> Regards,
> >>> Christoph
> >>> _______________________________________________
> >>> Gluster-users mailing list
> >>> Gluster-users at gluster.org
> >>> http://lists.gluster.org/mailman/listinfo/gluster-users
> >>>
> >>
>
>