[Gluster-users] symbolic links changed to empty files

Allan Latham alatham at flexsys-group.de
Sun Jul 14 15:10:24 UTC 2013


Hi Vijay

Thanks - I didn't expect a reply on a Sunday.

It amazes me sometimes that I can use a tool like tar as a familiar
friend without ever thinking about the inner workings!

-P in the GNU version (in Debian wheezy) is used to preserve leading '/'
on file names. I'll try it later just in case it's an undocumented feature.

Please tell me more about hashed read-child - a link to a web page would
be great.

I had hoped that all reads would be to the local host (exactly 2 nodes
and exactly 2 replicas).

Thanks again

Allan

On 14/07/13 16:31, Vijay Bellur wrote:
> On 07/14/2013 02:34 PM, Allan Latham wrote:
>> Hi all
>>
>> I'm running some initial sanity and performance checks on this:
>>
>> root@h06 /root # glusterd -V
>> glusterfs 3.4.0beta4 built on Jul 10 2013 15:14:50
>>
>> The setup is a two-node replicated cluster (h06 and h65); the client is
>> on h06.
>>
>> The network is limited to 100Mbits/second and rtt is 0.6ms so
>> performance is not spectacular but that is not the problem which
>> currently concerns me.
>>
>> The test was to take a typical linux rootfs with 52K files of various
>> sizes totaling 1.9GByte and copy it to a gluster mount using tar:
>>
>> root@h06 /root/mnt/h06 # sync;time (tar -c *|(cd /gluster/mnt/20 && tar
>> -x);sync)
>>
>> real    20m24.836s
>> user    0m3.517s
>> sys     0m39.814s
>>
>> Times are not brilliant but it will be OK for the usage scenario.
>>
>> Extracts from 'mount':
>>
>> /dev/mapper/vs-h06 on /root/mnt/h06 type ext4
>> (rw,relatime,barrier=1,data=ordered)
>>
>> /var/lib/glusterd/vols/gl/gl-fuse.vol on /gluster/mnt type
>> fuse.glusterfs
>> (rw,nosuid,nodev,noatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>>
>>
>> Among other tests I wanted to see that the copy in /gluster/mnt/20 was
>> identical to the original in /root/mnt/h06
>>
>> The original has 5261 symbolic links:
>>
>> root@h06 /root/mnt/h06 # find -type l |wc
>>     5261    5261  182824
>>
>> The copy has only 4089:
>>
>> root@h65 /gluster/mnt/20 # find -type l |wc
>>     4089    4089  141644
>>
>> Here is an example:
>>
>> root@h06 /root # ls -ld ~/mnt/h06/bin/netcat
>> lrwxrwxrwx 1 root root 24 Jun 30 09:03 /root/mnt/h06/bin/netcat ->
>> /etc/alternatives/netcat
>>
>> root@h06 /root # ls -ld /gluster/mnt/20/bin/netcat
>> ---------- 1 root root 0 Jul 14 06:37 /gluster/mnt/20/bin/netcat
>>
>> A scan with md5sum on the original and the copy in gluster shows only
>> these links as being different. All normal files checksum the same.
>>
>> The mirrored gluster filesystem on h65 (no surprise) shows the identical
>> result - some symbolic links have been changed into empty files.
>>
>> To the best of my knowledge the gluster filesystems are identical on the
>> two nodes but differ from the original.
>>
>> To me it appears that the command:
>>
>> (cd source && tar -c *)|(cd gluster && tar -x)
>>
>> has changed some symbolic links in 'source' into empty files in
>> 'gluster'.
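
A quick way to list exactly which links were lost is to walk the symlinks in
the source tree and test the counterpart of each one in the copy. This is
only a minimal sketch, assuming a POSIX shell and find; the function name
missing_links is made up for illustration, and file names containing
newlines are not handled.

```shell
#!/bin/sh
# Print every symlink under SRC whose counterpart under DST is not a symlink.
# missing_links is a hypothetical helper name, not part of tar or gluster.
missing_links() {
    src=$1
    dst=$2
    # List symlinks relative to SRC, then test the same relative path in DST.
    ( cd "$src" && find . -type l ) | while IFS= read -r path; do
        [ -L "$dst/$path" ] || printf '%s\n' "$path"
    done
}
```

For the trees above it would be called as
missing_links /root/mnt/h06 /gluster/mnt/20.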
> 
> This seems related to the way tar extracts symbolic links. In a nutshell
> the following steps are performed by tar for creation of symbolic links
> on the destination:
> 
> a) Create an empty regular placeholder file with permission bits set to
> 0, named after the symbolic link that is to be created.
> 
> b) Record the device, inode numbers and the mtime of the placeholder
> file through stat.
> 
> c) After the first pass of extraction is complete, a second pass fixes
> up the symbolic links. In this phase a stat is performed on the
> placeholder file. Only if all attributes recorded in b) match the
> latest information from the stat buffer is the placeholder unlinked and
> a new symbolic link created. If any attribute is out of sync, the
> unlink and the creation of the symbolic link do not happen.
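
Steps a) to c) can be imitated in a few lines of shell, which makes it easier
to see why a changed mtime leaves the empty file behind. This is only a
sketch of the logic as described above, not tar's actual code: it assumes
GNU stat (the -c option), compares mtime at one-second resolution, and the
function names make_placeholder and finish_symlink are hypothetical.

```shell
#!/bin/sh
# Sketch of the delayed-symlink logic described in steps a) to c).
# Assumes GNU stat; make_placeholder and finish_symlink are made-up names.

# Steps a) and b): create an empty mode-0 placeholder and print its
# identity as "device inode mtime" so the caller can record it.
make_placeholder() {
    : > "$1" && chmod 0 "$1" && stat -c '%d %i %Y' "$1"
}

# Step c): $1 = path, $2 = link target, $3 = identity recorded in step b).
# Replace the placeholder with the symlink only if its identity is unchanged.
finish_symlink() {
    if [ "$(stat -c '%d %i %Y' "$1")" = "$3" ]; then
        rm -f "$1" && ln -s "$2" "$1"
    else
        return 1    # identity differs: leave the placeholder in place
    fi
}
```

If the second stat reports a different mtime than the first, as can happen
when the two calls are answered by different replicas, finish_symlink
returns 1 and the zero-length, mode-0 file stays where the link should be.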
> 
> With gluster's replication, the mtimes can vary across the nodes during
> the creation of placeholder files. If the stat calls in steps b) and c)
> land on different nodes, then there is a very good likelihood (due to
> different mtimes) that tar would skip creation of symbolic links and
> leave behind the placeholder file.
> 
> A little more about this particular implementation of symlinks for tar
> can be found here:
> 
> http://lists.debian.org/debian-user/2003/03/msg03249.html
> 
> To overcome this behavior, we can use the -P option with tar during
> extraction. This will create the link file directly and not involve the
> two-phase approach outlined above.
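
For reference, the copy pipeline with -P on the extracting side might look
like the sketch below. As far as I can tell from the GNU tar sources, the
placeholder is only used for links whose target is absolute or contains
"..", and only when -P (--absolute-names) is not given, which would fit the
netcat -> /etc/alternatives/netcat example. Treat that as an assumption to
verify against your tar version. The function name copy_tree is made up, and
it archives "." rather than "*" so that dot files are included.

```shell
#!/bin/sh
# Copy the tree under $1 into the existing directory $2 through a tar pipe.
# -P on the extracting tar is the option suggested above; copy_tree is a
# hypothetical helper name. Assumes GNU tar.
copy_tree() {
    ( cd "$1" && tar -cf - . ) | ( cd "$2" && tar -xPf - )
}
```

With the paths from this thread that would be
copy_tree /root/mnt/h06 /gluster/mnt/20. Note that -P also stops tar from
stripping leading '/' from member names, so it is best kept to archives you
created yourself.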
> 
> In addition to this, using an option like hashed read-child (available
> with GlusterFS 3.4) can ensure that read calls for an inode/file land on
> the same node always. With that, tar should not get varying mtimes
> across calls and the placeholder file should get replaced with the
> actual symlink.
> 
> -Vijay
> 
> 



