[Gluster-users] symbolic links changed to empty files

Allan Latham alatham at flexsys-group.de
Sun Jul 14 15:10:24 UTC 2013

Hi Vijay

Thanks - I didn't expect a reply on a Sunday.

It amazes me sometimes that I can use a tool like tar as a familiar
friend without ever thinking about the inner workings!

-P in the GNU version (in Debian wheezy) is used to preserve leading '/'
on file names. I'll try it later just in case it's an undocumented feature.

Please tell me more about hashed read-child - a link to a web page would
be great.

I had hoped that all reads would be to the local host (exactly 2 nodes
and exactly 2 replicas).

Thanks again


On 14/07/13 16:31, Vijay Bellur wrote:
> On 07/14/2013 02:34 PM, Allan Latham wrote:
>> Hi all
>> I'm running some initial sanity and performance checks on this:
>> root@h06 /root # glusterd -V
>> glusterfs 3.4.0beta4 built on Jul 10 2013 15:14:50
>> The setup is a two-node replicated cluster (h06 and h65); the client is
>> on h06.
>> The network is limited to 100 Mbit/s and the RTT is 0.6 ms, so
>> performance is not spectacular, but that is not the problem that
>> currently concerns me.
>> The test was to take a typical Linux rootfs with 52K files of various
>> sizes totaling 1.9 GByte and copy it to a gluster mount using tar:
>> root@h06 /root/mnt/h06 # sync;time (tar -c *|(cd /gluster/mnt/20 && tar -x);sync)
>> real    20m24.836s
>> user    0m3.517s
>> sys     0m39.814s
>> Times are not brilliant but it will be OK for the usage scenario.
>> Extracts from 'mount':
>> /dev/mapper/vs-h06 on /root/mnt/h06 type ext4
>> (rw,relatime,barrier=1,data=ordered)
>> /var/lib/glusterd/vols/gl/gl-fuse.vol on /gluster/mnt type
>> fuse.glusterfs
>> (rw,nosuid,nodev,noatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>> Among other tests, I wanted to check that the copy in /gluster/mnt/20 was
>> identical to the original in /root/mnt/h06.
>> The original has 5261 symbolic links:
>> root@h06 /root/mnt/h06 # find -type l |wc
>>     5261    5261  182824
>> The copy has only 4089:
>> root@h65 /gluster/mnt/20 # find -type l |wc
>>     4089    4089  141644
>> Here is an example:
>> root@h06 /root # ls -ld ~/mnt/h06/bin/netcat
>> lrwxrwxrwx 1 root root 24 Jun 30 09:03 /root/mnt/h06/bin/netcat -> /etc/alternatives/netcat
>> root@h06 /root # ls -ld /gluster/mnt/20/bin/netcat
>> ---------- 1 root root 0 Jul 14 06:37 /gluster/mnt/20/bin/netcat
>> A scan with md5sum on the original and the copy in gluster shows only
>> these links as being different. All normal files checksum the same.
>> The mirrored gluster filesystem on h65 (no surprise) shows the identical
>> result - some symbolic links have been changed into empty files.
>> To the best of my knowledge the gluster filesystems are identical on the
>> two nodes but differ from the original.
>> To me it appears that the command:
>> (cd source && tar -c *)|(cd gluster && tar -x)
>> has changed some symbolic links in 'source' into empty files in
>> 'gluster'.
> This seems related to the way tar extracts symbolic links. In a nutshell
> the following steps are performed by tar for creation of symbolic links
> on the destination:
> a) Create an empty regular placeholder file with permission bits set to
> 0 and the name being that of the symlink source file.
> b) Record the device, inode numbers and the mtime of the placeholder
> file through stat.
> c) After the first pass of extraction is complete, a second pass
> fixes up the symbolic links. In this phase a stat is performed on the
> placeholder file. Only if all attributes recorded in b) match the
> latest information from the stat buffer is the placeholder unlinked
> and a new symbolic link created. If any attribute is out of sync, the
> unlink and the creation of the symbolic link do not happen.
> With gluster's replication, the mtimes can vary across the nodes during
> the creation of placeholder files. If the stat calls in steps b) and c)
> land on different nodes, then there is a very good likelihood (due to
> different mtimes) that tar would skip creation of symbolic links and
> leave behind the placeholder file.
> A little more about this particular implementation of symlinks for tar
> can be found here:
> http://lists.debian.org/debian-user/2003/03/msg03249.html
> To overcome this behavior, we can use the -P option with tar during
> extraction. This creates the link file directly and does not involve the
> two-phase approach outlined above.
> In addition to this, using an option like hashed read-child (available
> with GlusterFS 3.4) can ensure that read calls for an inode/file always
> land on the same node. With that, tar should not get varying mtimes
> across calls and the placeholder file should get replaced with the
> actual symlink.
> -Vijay
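
To see the failure mode in isolation, here is a minimal Python sketch of
the placeholder check described in steps a) to c) above. It is not GNU
tar's code: the helper names create_placeholder and finish_symlink are
made up for illustration, and the mtime difference that replication may
introduce is only simulated with os.utime on a local temporary file, not
reproduced on a real gluster mount.

```python
import os
import stat
import tempfile


def create_placeholder(path):
    """Steps a) and b): create an empty mode-0 file and record its identity."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0)
    os.close(fd)
    st = os.lstat(path)
    return (st.st_dev, st.st_ino, st.st_mtime_ns)


def finish_symlink(path, target, recorded):
    """Step c): replace the placeholder only if it still looks untouched."""
    st = os.lstat(path)
    if (st.st_dev, st.st_ino, st.st_mtime_ns) != recorded:
        return False  # attributes differ: the placeholder is left behind
    os.unlink(path)
    os.symlink(target, path)
    return True


def demo():
    results = {}
    with tempfile.TemporaryDirectory() as d:
        # Case 1: both stat calls report the same mtime.
        ok = os.path.join(d, "netcat-ok")
        recorded = create_placeholder(ok)
        finish_symlink(ok, "/etc/alternatives/netcat", recorded)
        results["consistent"] = os.path.islink(ok)

        # Case 2: assumption for illustration only. Shift the mtime by one
        # second to mimic the second stat being answered by a replica whose
        # mtime differs from the one recorded in step b).
        bad = os.path.join(d, "netcat-bad")
        recorded = create_placeholder(bad)
        st = os.lstat(bad)
        os.utime(bad, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
        finish_symlink(bad, "/etc/alternatives/netcat", recorded)
        st = os.lstat(bad)
        results["skewed"] = os.path.islink(bad)
        results["leftover_size"] = st.st_size
        results["leftover_mode"] = stat.S_IMODE(st.st_mode)
    return results


if __name__ == "__main__":
    print(demo())
```

In the second case the file is left as an empty regular file with all
permission bits cleared, which is the same shape as the
"---------- 1 root root 0" entry shown for /gluster/mnt/20/bin/netcat.
On the real copy, something like "find . -type f -perm 0000 -size 0"
might help list candidate leftovers, though it could also match
unrelated empty files.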
