[Gluster-users] Bareos backup from Gluster mount

Michael Mol mikemol at gmail.com
Wed Jul 29 21:37:09 UTC 2015


On Wed, Jul 29, 2015 at 5:17 PM Michael Mol <mikemol at gmail.com> wrote:

> On Mon, Jul 27, 2015 at 5:03 PM Ryan Clough <ryan.clough at dsic.com> wrote:
>
>> Hello,
>>
>> I have cross-posted this question in the bareos-users mailing list.
>>
>> Wondering if anyone has tried this, because I am unable to back up data
>> that is mounted via Gluster FUSE or Gluster NFS. Basically, I have the
>> Gluster volume mounted on the Bareos Director, which also has the tape
>> changer attached.
>>
>> Here is some information about versions:
>> Bareos version 14.2.2
>> Gluster version 3.7.2
>> Scientific Linux version 6.6
>>
>> Our Gluster volume consists of two nodes in a distribute-only setup. Here
>> is the configuration of our volume:
>> [root at hgluster02 ~]# gluster volume info
>>
>> Volume Name: export_volume
>> Type: Distribute
>> Volume ID: c74cc970-31e2-4924-a244-4c70d958dadb
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: hgluster01:/gluster_data
>> Brick2: hgluster02:/gluster_data
>> Options Reconfigured:
>> performance.io-thread-count: 24
>> server.event-threads: 20
>> client.event-threads: 4
>> performance.readdir-ahead: on
>> features.inode-quota: on
>> features.quota: on
>> nfs.disable: off
>> auth.allow: 192.168.10.*,10.0.10.*,10.8.0.*,10.2.0.*,10.0.60.*
>> server.allow-insecure: on
>> server.root-squash: on
>> performance.read-ahead: on
>> features.quota-deem-statfs: on
>> diagnostics.brick-log-level: WARNING
>>
>> When I try to back up a directory from a Gluster FUSE or Gluster NFS mount
>> and monitor the network traffic, I only see data being pulled from the
>> hgluster01 brick. When the job finishes, Bareos thinks it completed without
>> error, but the messages for the job include lots and lots of
>> permission-denied errors like this:
>> 15-Jul 02:03 ripper.red.dsic.com-fd JobId 613:      Cannot open
>> "/export/rclough/psdv-2014-archives-2/scan_111.tar.bak": ERR=Permission
>> denied.
>> 15-Jul 02:03 ripper.red.dsic.com-fd JobId 613:      Cannot open
>> "/export/rclough/psdv-2014-archives-2/run_219.tar.bak": ERR=Permission
>> denied.
>> 15-Jul 02:03 ripper.red.dsic.com-fd JobId 613:      Cannot open
>> "/export/rclough/psdv-2014-archives-2/scan_112.tar.bak": ERR=Permission
>> denied.
>> 15-Jul 02:03 ripper.red.dsic.com-fd JobId 613:      Cannot open
>> "/export/rclough/psdv-2014-archives-2/run_220.tar.bak": ERR=Permission
>> denied.
>> 15-Jul 02:03 ripper.red.dsic.com-fd JobId 613:      Cannot open
>> "/export/rclough/psdv-2014-archives-2/scan_114.tar.bak": ERR=Permission
>> denied.
>>
>> At first I thought this might be a root-squash problem but, if I try to
>> read/copy a file as the root user from the Bareos server that is doing
>> the backup, I can read files just fine.
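>>
>> (For what it's worth, root-squash can be toggled from the gluster CLI as a
>> quick test; a minimal sketch, assuming the volume name export_volume from
>> the info output above:)
>>
>> # confirm the current setting
>> gluster volume info export_volume | grep root-squash
>> # temporarily disable, re-run a small test job, then re-enable
>> gluster volume set export_volume server.root-squash off
>> gluster volume set export_volume server.root-squash on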
>>
>> When the job finishes it reports that it finished "OK -- with warnings"
>> but, again, the log for the job is filled with "ERR=Permission denied"
>> messages. In my opinion, this job did not finish OK and should be marked
>> Failed. Some of the files from the hgluster02 brick are backed up, but none
>> of the ones with permission errors are. When I restore the job, all of the
>> files that had permission errors are empty.
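>>
>> (One thing that might help narrow this down is comparing ownership and
>> permissions of an affected file directly on each brick's backend
>> filesystem; a rough sketch, assuming the volume is mounted at /export so
>> the brick-side path sits under /gluster_data -- run on both hgluster01 and
>> hgluster02:)
>>
>> stat /gluster_data/rclough/psdv-2014-archives-2/scan_111.tar.bak
>> getfattr -d -m . -e hex /gluster_data/rclough/psdv-2014-archives-2/scan_111.tar.bak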
>>
>> Has anyone successfully used Bareos to back up data from Gluster mounts?
>> This is an important use case for us because this is the largest single
>> volume we have for staging large amounts of data to be archived.
>>
>> Thank you for your time,
>>
> How did I not see this earlier? I'm seeing a very similar problem. I
> just posted this to the bareos-users list:
>
> Help! I've run out of know-how while trying to fix this myself...
>
> Environment: CentOS 7, x86_64
> Bareos version: 14.2.2-46.1.el7 (via
> http://download.bareos.org/bareos/release/14.2/CentOS_7/ repo)
> Gluster version: 3.7.3-1.el7 (via
> http://download.gluster.org/pub/gluster/glusterfs/3.7/LATEST/EPEL.repo/epel-$releasever/$basearch/
> repo)
>
> Symptom: Bareos attempts to mount a volume, and spits back a Permission
> Denied error, as though it didn't have permission to access the relevant
> file.
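>
> (For context, the storage daemon writes to the gluster volume through
> libgfapi rather than a FUSE mount; the relevant device resource in
> bareos-sd.conf looks roughly like the sketch below. The host name is a
> placeholder, not my exact config:)
>
>   Device {
>     Name = GlusterStorage4
>     Device Type = gluster
>     Archive Device = gluster://backup-stor-1.example/bareos/bareos
>     Media Type = GlusterFile
>     Label Media = yes
>     Random Access = yes
>     Automatic Mount = yes
>     Removable Media = no
>     Always Open = no
>   }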
>
> I've been seeing this at least since Gluster version 3.7.2, which I
> updated to because I needed to expand my backend storage and 3.7.1 (which
> otherwise worked fine) had a bug that broke bricks while rebalancing.
>
> I've verified that the bareos storage daemon is running as the bareos
> user, and I've also, by way of FUSE mount into the gluster volume, verified
> ownership of the volume:
>
> # ls -l Email-Incremental-0155
> -rw-r-----. 1 bareos bareos 1073728379 Jun 10 21:04 Email-Incremental-0155
>
> And uid/gid, for reference:
>
> # ls -ln Email-Incremental-0155
> -rw-r-----. 1 997 995 1073728379 Jun 10 21:04 Email-Incremental-0155
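>
> (To double-check that the daemon really runs under that uid/gid, something
> along these lines -- assuming the usual bareos-sd process name:)
>
> # ps -o user,group,comm -C bareos-sd
> # id bareos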
>
> And in the gluster volume, the storage owner-{uid,gid}:
> # gluster volume info bareos
>
> Volume Name: bareos
> Type: Distribute
> Volume ID: f4cb7aac-3631-41cc-9afa-f182a514d116
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: backup-stor-1[censored]:/var/gluster/bareos/brick-bareos
> Brick2: backup-stor-2[censored]:/var/gluster/bareos/brick-bareos
> Options Reconfigured:
> server.allow-insecure: on
> performance.readdir-ahead: off
> nfs.disable: on
> performance.cache-size: 128MB
> performance.write-behind-window-size: 256MB
> performance.cache-refresh-timeout: 10
> performance.io-thread-count: 16
> performance.cache-max-file-size: 4TB
> performance.flush-behind: on
> performance.client-io-threads: on
> storage.owner-uid: 997
> storage.owner-gid: 995
> features.bitrot: off
> features.scrub: Inactive
> features.scrub-freq: daily
> features.scrub-throttle: lazy
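>
> (The owner uid/gid values above were set with the usual volume-set
> commands, roughly along these lines:)
>
> # gluster volume set bareos storage.owner-uid 997
> # gluster volume set bareos storage.owner-gid 995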
>
> In this run, the storage daemon and the file daemon happen to be on the
> same node. Here's trace output at level 200, obtained running "tail -f
> *.trace" in bareos-sd's cwd:
>
> ==> backup-director-sd.trace <==
> backup-director-sd: fd_cmds.c:219-0 <filed: append open session
> backup-director-sd: fd_cmds.c:303-0 Append open session: append open
> session
> backup-director-sd: fd_cmds.c:314-0 >filed: 3000 OK open ticket = 1
> backup-director-sd: fd_cmds.c:219-0 <filed: append data 1
> backup-director-sd: fd_cmds.c:265-0 Append data: append data 1
> backup-director-sd: fd_cmds.c:267-0 <filed: append data 1
> backup-director-sd: append.c:69-0 Start append data. res=1
> backup-director-sd: acquire.c:369-0 acquire_append device is disk
> backup-director-sd: acquire.c:404-0 jid=924 Do mount_next_write_vol
> backup-director-sd: mount.c:71-0 Enter mount_next_volume(release=0)
> dev="GlusterStorage4" (gluster://backup-stor-1[censored]/bareos/bareos)
> backup-director-sd: mount.c:84-0 mount_next_vol retry=0
> backup-director-sd: mount.c:604-0 No swap_dev set
> backup-director-sd: askdir.c:246-0 >dird CatReq
> Job=server2-email.2015-07-29_16.32.34_09 GetVolInfo
> VolName=Email-Incremental-0155 write=1
> backup-director-sd: askdir.c:175-0 <dird 1000 OK
> VolName=Email-Incremental-0155 VolJobs=0 VolFiles=0 VolBlocks=0 VolBytes=1
> VolMounts=3 VolErrors=0 VolWrites=16646 MaxVolBytes=1073741824
> VolCapacityBytes=0 VolStatus=Recycle Slot=0 MaxVolJobs=0 MaxVolFiles=0
> InChanger=0 VolReadTime=0 VolWriteTime=8455280 EndFile=0
> EndBlock=1073728378 LabelType=0 MediaId=156 EncryptionKey= MinBlocksize=0
> MaxBlocksize=0
> backup-director-sd: askdir.c:211-0 do_get_volume_info return true slot=0
> Volume=Email-Incremental-0155, VolminBlocksize=0 VolMaxBlocksize=0
> backup-director-sd: askdir.c:213-0 setting dcr->VolMinBlocksize(0) to
> vol.VolMinBlocksize(0)
> backup-director-sd: askdir.c:215-0 setting dcr->VolMaxBlocksize(0) to
> vol.VolMaxBlocksize(0)
> backup-director-sd: mount.c:122-0 After find_next_append.
> Vol=Email-Incremental-0155 Slot=0
> backup-director-sd: autochanger.c:99-0 Device "GlusterStorage4"
> (gluster://backup-stor-1[censored]/bareos/bareos) is not an autochanger
> backup-director-sd: mount.c:144-0 autoload_dev returns 0
> backup-director-sd: mount.c:175-0 want vol=Email-Incremental-0155 devvol=
> dev="GlusterStorage4" (gluster://backup-stor-1[censored]/bareos/bareos)
> backup-director-sd: dev.c:536-0 open dev: type=5
> dev_name="GlusterStorage4"
> (gluster://backup-stor-1[censored]/bareos/bareos)
> vol=Email-Incremental-0155 mode=OPEN_READ_WRITE
> backup-director-sd: dev.c:540-0 call open_device mode=OPEN_READ_WRITE
> backup-director-sd: dev.c:941-0 Enter mount
> backup-director-sd: dev.c:610-0 open disk: mode=OPEN_READ_WRITE
> open(gluster://backup-stor-1[censored]/bareos/bareos/Email-Incremental-0155,
> 0x2, 0640)
>
> ==> backup-director-fd.trace <==
>
> ==> backup-director-sd.trace <==
> backup-director-sd: dev.c:617-0 open failed: dev.c:616 Could not open:
> gluster://backup-stor-1[censored]/bareos/bareos/Email-Incremental-0155,
> ERR=Permission denied
>
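> (For reference, a trace like the one above can be produced from bconsole
> with something along these lines -- substitute your own storage resource
> name:)
>
> * setdebug level=200 trace=1 storage=<your-storage-resource>
>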

In response to Pranith's suggestion that Ryan look at the logs, I did find
this interesting bit in root-bareos.log when I FUSE-mounted the volume.
(Interesting, because everything is running the same version of gluster, at
least as far as the packages are telling me.)
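
(For completeness, root-bareos.log is just the glusterfs client log for the
FUSE mount I did as root; a mount along these lines, with placeholder paths,
will send the client log wherever you point it:)

# mount -t glusterfs -o log-file=/root/root-bareos.log \
    backup-stor-1:/bareos /mnt/bareos-test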

==> root-bareos.log <==
[2015-07-29 21:26:39.465191] I [MSGID: 114057]
[client-handshake.c:1437:select_server_supported_programs]
0-bareos-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-29 21:26:39.465737] I [MSGID: 114057]
[client-handshake.c:1437:select_server_supported_programs]
0-bareos-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-29 21:26:39.465935] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-bareos-client-1: Connected
to bareos-client-1, attached to remote volume
'/var/gluster/bareos/brick-bareos'.
[2015-07-29 21:26:39.465999] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-bareos-client-1: Server
and Client lk-version numbers are not same, reopening the fds
[2015-07-29 21:26:39.466319] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-bareos-client-0: Connected
to bareos-client-0, attached to remote volume
'/var/gluster/bareos/brick-bareos'.
[2015-07-29 21:26:39.466344] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-bareos-client-0: Server
and Client lk-version numbers are not same, reopening the fds
[2015-07-29 21:26:39.471772] I [fuse-bridge.c:5053:fuse_graph_setup]
0-fuse: switched to graph 0
[2015-07-29 21:26:39.471953] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-bareos-client-1:
Server lk version = 1
[2015-07-29 21:26:39.472000] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-bareos-client-0:
Server lk version = 1
[2015-07-29 21:26:39.473230] I [fuse-bridge.c:3979:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel
7.22

On both bricks, there's this or similar, but the timestamps don't correlate
with the bareos errors:

The message "W [MSGID: 101095] [xlator.c:143:xlator_volopt_dynload]
0-xlator: /usr/lib64/glusterfs/3.7.3/xlator/features/bitrot.so: cannot open
shared object file: No such file or directory" repeated 3 times between
[2015-07-29 19:50:34.593333] and [2015-07-29 19:50:34.593486]
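
(That warning looks like the bitrot xlator simply isn't present on disk; it
can be checked with something along these lines -- the exact package split
is a guess on my part:

# ls /usr/lib64/glusterfs/3.7.3/xlator/features/ | grep -i bitrot
# rpm -qa | grep -i gluster

Since features.bitrot is off on this volume anyway, it's probably harmless,
but installing whatever package ships bitrot.so should silence it.)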