[Gluster-users] Gluster/libgfapi and bareos

Michael Mol mikemol at gmail.com
Thu Jul 30 17:32:13 UTC 2015


Help! I've run out of know-how while trying to fix this myself...

Environment: CentOS 7, x86_64
Bareos version: 14.2.2-46.1.el7 (via
http://download.bareos.org/bareos/release/14.2/CentOS_7/ repo)
Gluster version: 3.7.3-1.el7 (via
http://download.gluster.org/pub/gluster/glusterfs/3.7/LATEST/EPEL.repo/epel-$releasever/$basearch/
 repo)

Symptom: Bareos attempts to mount a backup volume (a file on the gluster
volume) and gets a Permission Denied error back, as though it didn't have
permission to access the relevant file.
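
For context, the device in question is a gfapi device. In Bareos 14.2 such
a device is declared in bareos-sd.conf roughly as sketched below; the Name
and URI match the trace further down, but the remaining options are
illustrative, not my literal config:

Device {
  # Name and URI as they appear in the SD trace below; the rest is a sketch
  Name = GlusterStorage4
  Device Type = gluster
  Archive Device = "gluster://backup-stor-1[censored]/bareos/bareos"
  Media Type = GlusterFile
  Label Media = yes
  Random Access = yes
  Automatic Mount = yes
}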

I've been seeing this at least since Gluster 3.7.2, which I upgraded to
because I needed to expand my backend storage and 3.7.1 (which otherwise
worked fine) had a bug that broke bricks while rebalancing.

I've verified that the bareos storage daemon is running as the bareos user,
and I've also verified ownership of the volume file by way of a FUSE mount
of the gluster volume:

# ls -l Email-Incremental-0155
-rw-r-----. 1 bareos bareos 1073728379 Jun 10 21:04 Email-Incremental-0155
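
For anyone retracing this, the checks were of the usual sort, something
like the following (the mountpoint is just a scratch directory of my
choosing):

# ps -C bareos-sd -o user=,comm=
bareos   bareos-sd
# mount -t glusterfs backup-stor-1[censored]:/bareos /mnt/bareos
# sudo -u bareos head -c 1M /mnt/bareos/Email-Incremental-0155 >/dev/null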

And uid/gid, for reference:

# ls -ln Email-Incremental-0155
-rw-r-----. 1 997 995 1073728379 Jun 10 21:04 Email-Incremental-0155
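
For reference, storage.owner-{uid,gid} are ordinary volume options, set
with:

# gluster volume set bareos storage.owner-uid 997
# gluster volume set bareos storage.owner-gid 995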

And in the gluster volume, the storage owner-{uid,gid}:
# gluster volume info bareos

Volume Name: bareos
Type: Distribute
Volume ID: f4cb7aac-3631-41cc-9afa-f182a514d116
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: backup-stor-1[censored]:/var/gluster/bareos/brick-bareos
Brick2: backup-stor-2[censored]:/var/gluster/bareos/brick-bareos
Options Reconfigured:
server.allow-insecure: on
performance.readdir-ahead: off
nfs.disable: on
performance.cache-size: 128MB
performance.write-behind-window-size: 256MB
performance.cache-refresh-timeout: 10
performance.io-thread-count: 16
performance.cache-max-file-size: 4TB
performance.flush-behind: on
performance.client-io-threads: on
storage.owner-uid: 997
storage.owner-gid: 995
features.bitrot: off
features.scrub: Inactive
features.scrub-freq: daily
features.scrub-throttle: lazy
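
One note on the options above: as far as I know, server.allow-insecure only
tells the bricks to accept connections from unprivileged ports. Since
bareos-sd runs as a non-root user, glusterd itself also has to accept
unprivileged clients, which takes an extra option in
/etc/glusterfs/glusterd.vol on each server, followed by a glusterd restart:

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    # ... existing options unchanged ...
    option rpc-auth-allow-insecure on
end-volume

Whether that's related to the failure here I can't say, but it's one of the
gfapi-specific knobs worth double-checking.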

In this run, the storage daemon and the file daemon happen to be on the
same node. Here's the trace output at level 200, obtained by running "tail
-f *.trace" in bareos-sd's cwd:

==> backup-director-sd.trace <==
backup-director-sd: fd_cmds.c:219-0 <filed: append open session
backup-director-sd: fd_cmds.c:303-0 Append open session: append open session
backup-director-sd: fd_cmds.c:314-0 >filed: 3000 OK open ticket = 1
backup-director-sd: fd_cmds.c:219-0 <filed: append data 1
backup-director-sd: fd_cmds.c:265-0 Append data: append data 1
backup-director-sd: fd_cmds.c:267-0 <filed: append data 1
backup-director-sd: append.c:69-0 Start append data. res=1
backup-director-sd: acquire.c:369-0 acquire_append device is disk
backup-director-sd: acquire.c:404-0 jid=924 Do mount_next_write_vol
backup-director-sd: mount.c:71-0 Enter mount_next_volume(release=0)
dev="GlusterStorage4" (gluster://backup-stor-1[censored]/bareos/bareos)
backup-director-sd: mount.c:84-0 mount_next_vol retry=0
backup-director-sd: mount.c:604-0 No swap_dev set
backup-director-sd: askdir.c:246-0 >dird CatReq
Job=server2-email.2015-07-29_16.32.34_09 GetVolInfo
VolName=Email-Incremental-0155 write=1
backup-director-sd: askdir.c:175-0 <dird 1000 OK
VolName=Email-Incremental-0155 VolJobs=0 VolFiles=0 VolBlocks=0 VolBytes=1
VolMounts=3 VolErrors=0 VolWrites=16646 MaxVolBytes=1073741824
VolCapacityBytes=0 VolStatus=Recycle Slot=0 MaxVolJobs=0 MaxVolFiles=0
InChanger=0 VolReadTime=0 VolWriteTime=8455280 EndFile=0
EndBlock=1073728378 LabelType=0 MediaId=156 EncryptionKey= MinBlocksize=0
MaxBlocksize=0
backup-director-sd: askdir.c:211-0 do_get_volume_info return true slot=0
Volume=Email-Incremental-0155, VolminBlocksize=0 VolMaxBlocksize=0
backup-director-sd: askdir.c:213-0 setting dcr->VolMinBlocksize(0) to
vol.VolMinBlocksize(0)
backup-director-sd: askdir.c:215-0 setting dcr->VolMaxBlocksize(0) to
vol.VolMaxBlocksize(0)
backup-director-sd: mount.c:122-0 After find_next_append.
Vol=Email-Incremental-0155 Slot=0
backup-director-sd: autochanger.c:99-0 Device "GlusterStorage4"
(gluster://backup-stor-1[censored]/bareos/bareos) is not an autochanger
backup-director-sd: mount.c:144-0 autoload_dev returns 0
backup-director-sd: mount.c:175-0 want vol=Email-Incremental-0155 devvol=
dev="GlusterStorage4" (gluster://backup-stor-1[censored]/bareos/bareos)
backup-director-sd: dev.c:536-0 open dev: type=5 dev_name="GlusterStorage4"
(gluster://backup-stor-1[censored]/bareos/bareos)
vol=Email-Incremental-0155 mode=OPEN_READ_WRITE
backup-director-sd: dev.c:540-0 call open_device mode=OPEN_READ_WRITE
backup-director-sd: dev.c:941-0 Enter mount
backup-director-sd: dev.c:610-0 open disk: mode=OPEN_READ_WRITE
open(gluster://backup-stor-1[censored]/bareos/bareos/Email-Incremental-0155,
0x2, 0640)

==> backup-director-fd.trace <==

==> backup-director-sd.trace <==
backup-director-sd: dev.c:617-0 open failed: dev.c:616 Could not open:
gluster://backup-stor-1[censored]/bareos/bareos/Email-Incremental-0155,
ERR=Permission denied


...
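
For reference, trace output at this level is switched on from bconsole
along these lines (the storage resource name here is a guess on my part):

* setdebug level=200 trace=1 storage=GlusterStorage4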

In response to Pranith's suggestion to Ryan (in another thread) to look at
the logs, I did find this interesting entry in root-bareos.log when I
FUSE-mounted the volume. (Interesting, because everything is running the
same version of gluster, at least as far as the packages are telling me.)

==> root-bareos.log <==
[2015-07-29 21:26:39.465191] I [MSGID: 114057]
[client-handshake.c:1437:select_server_supported_programs]
0-bareos-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-29 21:26:39.465737] I [MSGID: 114057]
[client-handshake.c:1437:select_server_supported_programs]
0-bareos-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-07-29 21:26:39.465935] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-bareos-client-1: Connected
to bareos-client-1, attached to remote volume
'/var/gluster/bareos/brick-bareos'.
[2015-07-29 21:26:39.465999] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-bareos-client-1: Server
and Client lk-version numbers are not same, reopening the fds
[2015-07-29 21:26:39.466319] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-bareos-client-0: Connected
to bareos-client-0, attached to remote volume
'/var/gluster/bareos/brick-bareos'.
[2015-07-29 21:26:39.466344] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-bareos-client-0: Server
and Client lk-version numbers are not same, reopening the fds
[2015-07-29 21:26:39.471772] I [fuse-bridge.c:5053:fuse_graph_setup]
0-fuse: switched to graph 0
[2015-07-29 21:26:39.471953] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-bareos-client-1:
Server lk version = 1
[2015-07-29 21:26:39.472000] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-bareos-client-0:
Server lk version = 1
[2015-07-29 21:26:39.473230] I [fuse-bridge.c:3979:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel
7.22

On both bricks there's this or something similar, but the timestamps don't
correlate with the bareos errors:

The message "W [MSGID: 101095] [xlator.c:143:xlator_volopt_dynload]
0-xlator: /usr/lib64/glusterfs/3.7.3/xlator/features/bitrot.so: cannot open
shared object file: No such file or directory" repeated 3 times between
[2015-07-29 19:50:34.593333] and [2015-07-29 19:50:34.593486]
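
That warning is at least easy to cross-check against what the package
ships:

# ls /usr/lib64/glusterfs/3.7.3/xlator/features/ | grep -i bit
# rpm -ql glusterfs-server | grep -i bit

My guess (and it is only a guess) is that this one is cosmetic: the option
loader asks for bitrot.so while the package ships bit-rot.so, so it looks
unrelated to the permission errors.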