[Gluster-users] symbolic links changed to empty files

Vijay Bellur vbellur at redhat.com
Sun Jul 14 14:31:58 UTC 2013


On 07/14/2013 02:34 PM, Allan Latham wrote:
> Hi all
>
> I'm running some initial sanity and performance checks on this:
>
> root@h06 /root # glusterd -V
> glusterfs 3.4.0beta4 built on Jul 10 2013 15:14:50
>
> The setup is a two-node replicating cluster (h06 and h65); the client
> is on h06.
>
> The network is limited to 100Mbits/second and rtt is 0.6ms so
> performance is not spectacular but that is not the problem which
> currently concerns me.
>
> The test was to take a typical linux rootfs with 52K files of various
> sizes totaling 1.9GByte and copy it to a gluster mount using tar:
>
> root@h06 /root/mnt/h06 # sync;time (tar -c *|(cd /gluster/mnt/20 && tar -x);sync)
>
> real    20m24.836s
> user    0m3.517s
> sys     0m39.814s
>
> Times are not brilliant but it will be OK for the usage scenario.
>
> Extracts from 'mount':
>
> /dev/mapper/vs-h06 on /root/mnt/h06 type ext4
> (rw,relatime,barrier=1,data=ordered)
>
> /var/lib/glusterd/vols/gl/gl-fuse.vol on /gluster/mnt type
> fuse.glusterfs
> (rw,nosuid,nodev,noatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>
> Among other tests I wanted to see that the copy in /gluster/mnt/20 was
> identical to the original in /root/mnt/h06
>
> The original has 5261 symbolic links:
>
> root@h06 /root/mnt/h06 # find -type l |wc
>     5261    5261  182824
>
> The copy has only 4089:
>
> root@h65 /gluster/mnt/20 # find -type l |wc
>     4089    4089  141644
>
> Here is an example:
>
> root@h06 /root # ls -ld ~/mnt/h06/bin/netcat
> lrwxrwxrwx 1 root root 24 Jun 30 09:03 /root/mnt/h06/bin/netcat -> /etc/alternatives/netcat
>
> root@h06 /root # ls -ld /gluster/mnt/20/bin/netcat
> ---------- 1 root root 0 Jul 14 06:37 /gluster/mnt/20/bin/netcat
>
> A scan with md5sum on the original and the copy in gluster shows only
> these links as being different. All normal files checksum the same.
>
> The mirrored gluster filesystem on h65 (no surprise) shows the identical
> result - some symbolic links have been changed into empty files.
>
> To the best of my knowledge the gluster filesystems are identical on the
> two nodes but differ from the original.
>
> To me it appears that the command:
>
> (cd source && tar -c *)|(cd gluster && tar -x)
>
> has changed some symbolic links in 'source' into empty files in 'gluster'.

This seems related to the way tar extracts symbolic links. In a
nutshell, tar performs the following steps to create symbolic links on
the destination:

a) Create an empty regular placeholder file with its permission bits
set to 0, under the name that the symbolic link itself will have.

b) Record the device number, inode number and mtime of the placeholder
file, as returned by stat.

c) Once the first pass of extraction is complete, a second pass sets up
the symbolic links. In this pass tar stats the placeholder file again.
Only if every attribute recorded in b) matches the fresh stat result is
the placeholder unlinked and the symbolic link created in its place. If
any attribute differs, tar neither unlinks the placeholder nor creates
the symbolic link.
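
The three steps above can be sketched in shell. This is only an
illustration of the check, not tar's actual code: the paths are made up
for the example, it assumes GNU stat (stat -c), and it compares mtimes
at one-second resolution, which is coarser than what tar compares. As
far as I know, GNU tar defers only those links whose target is absolute
or contains "..", which would fit the /etc/alternatives/netcat example
and explain why only some of the links are affected.

```shell
#!/bin/sh
# Illustrative sketch of deferred symlink creation (not tar's real code).
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

link="$workdir/netcat"              # name the symlink will eventually have
target="/etc/alternatives/netcat"   # where the symlink should point

# a) create an empty placeholder with all permission bits cleared
: > "$link"
chmod 0 "$link"

# b) record device number, inode number and mtime of the placeholder
recorded=$(stat -c '%d %i %Y' "$link")

# ... the first pass would extract the rest of the archive here ...

# c) stat again; replace the placeholder only if nothing has changed
current=$(stat -c '%d %i %Y' "$link")
if [ "$current" = "$recorded" ]; then
    rm -f "$link"
    ln -s "$target" "$link"
    echo "symlink created"
else
    echo "placeholder changed, symlink skipped"
fi
```

On a local filesystem the two stat results match and the link is
created. The failure described in this thread corresponds to the else
branch.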

With gluster's replication, the mtime recorded for a placeholder file
can differ between the nodes. If the stat calls in steps b) and c) are
served by different nodes, tar is very likely to see different mtimes,
skip the creation of the symbolic link and leave the placeholder file
behind.
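
To see which entries were left behind as placeholders, one option is to
search the copy for empty regular files that have no permission bits
set, which is what the placeholder in step a) looks like. A minimal
sketch follows; the function name list_placeholders is made up for this
example.

```shell
# List empty regular files with mode 0000 below the given directory.
# These match the placeholders described in step a).
list_placeholders() {
    find "$1" -type f -perm 0000 -size 0 -print
}
```

For the setup in this thread it could be called as
"list_placeholders /gluster/mnt/20". A file that is legitimately empty
and has mode 0000 in the original would show up as well, so the result
should be compared against the list of symlinks in the source tree.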

A little more about this particular implementation of symlinks for tar 
can be found here:

http://lists.debian.org/debian-user/2003/03/msg03249.html

To overcome this behavior, use the -P (--absolute-names) option of tar
during extraction. tar then creates the symbolic link directly, without
the two-phase approach outlined above.
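
A small self-contained example of extracting with -P is sketched below.
The temporary directories and the link are made up for the example, and
"-f -" is spelled out so the archive goes through the pipe regardless
of tar's compiled-in default. On a local filesystem the link is created
with or without -P, so this only shows the invocation, not the gluster
failure.

```shell
#!/bin/sh
set -eu

src=$(mktemp -d)    # stands in for the original tree
dst=$(mktemp -d)    # stands in for the gluster mount

# A symlink with an absolute target, like bin/netcat in the report.
ln -s /etc/alternatives/netcat "$src/netcat"

# Same pipeline as in the report, with -P added on the extracting side.
(cd "$src" && tar -cf - .) | (cd "$dst" && tar -xPf -)

ls -l "$dst/netcat"
```

For the copy in this thread the equivalent would be along the lines of
(cd /root/mnt/h06 && tar -cf - *) | (cd /gluster/mnt/20 && tar -xPf -).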

In addition, an option like hashed read-child (available with
GlusterFS 3.4) can ensure that read calls for a given inode/file always
land on the same node. With that, tar should not see varying mtimes
across calls, and the placeholder file should be replaced with the
actual symlink.
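
As a rough sketch, setting this from the CLI might look like the
following. The option name cluster.read-hash-mode and the value 1 are
my assumption of how hashed read-child is exposed, and "gl" is the
volume name guessed from the volfile path in the mount output, so
please check both against "gluster volume set help" on your build
before using them.

```shell
# Assumption: hashed read-child is exposed as cluster.read-hash-mode,
# and "gl" is the volume name (taken from /var/lib/glusterd/vols/gl).
gluster volume set gl cluster.read-hash-mode 1

# Check which options are now set on the volume.
gluster volume info gl
```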

-Vijay



