[Gluster-users] [Gluster-devel] In a replica 2 server, file-updates on one server missing on the other server #Personal#

Fri Feb 20 14:00:43 UTC 2015

I found out the reason this happens a few of days back. Just to let you 
know..

It seems it has partly to do with the way we handle reboots on our setup. 
When we take down one of our replica servers (for testing/maintenance), to 
ensure that the bricks are unmounted correctly, we kill off the glusterfsd 
processes (short of stopping the volume and causing service disruption to 
the mount clients). Let us assume that serv1 is being rebooted. When we 
kill off glusterfsd,

For file-systems that are normally not accessed: 

1. ping between the mount client on serv0 and the brick's glusterfsd on 
serv1 times out. In our system, this ping is configured at 10 seconds. 

2. At this point, the mount client on serv0 destroys the now defunct TCP 
connection and querying the port of the remote brick with the remote 
glusterd process. 

3. But, since by this time serv1 is already down, no response arrives, and 
the local mount client retries the query till serv1 is up once more, upon 
which the glusterd on serv1  responds with the newly allocated port number 
for the brick, and a new connection is thus established.

For frequently accessed file-systems: 

1. it is one of the file operations (read/write) that times out. This 
happens much earlier than 10 seconds. 
This results in the connection being destroyed and the mount client on 
serv0 querying remote glusterd for the remote brick's port number. 

2. Because this happens so quickly, glusterd on serv1 is not yet down, and 
also unaware that the local brick is not alive anymore. So, it returns the 
port number of the dead process.

3. For the mount client on serv0, since the query succeeded, it does not 
attempt another port query, but instead tries to connect to the stale port 
number ad infinitum. 

Our solution to this problem is simple - before we kill glusterfsd and 
unmount the bricks, we stop glusterd:

/etc/init.d/glusterd stop

This ensures that the portmap queries by the mount client on serv0 are 
never honored.

Thanks,
Anirban

From:   A Ghoshal/MUM/TCS
To:     Ben England <bengland at redhat.com>
Cc:     gluster-users at gluster.org
Date:   02/05/2015 04:50 AM
Subject:        Re: [Gluster-devel] [Gluster-users] In a replica 2 server, 
file-updates on one server missing on the other server #Personal#
Sent by:        A Ghoshal

CC gluster-users.

No, there aren't any firewall rules in our server. As I write in one of my 
earlier emails, if I kill the mount client, and remount the volume, then 
the problem disappears. That is to say, this causes the client to refresh 
remote port data and from there everything's fine. Also, we dont' use 
gfapi - and bind() is always good.

From:   Ben England <bengland at redhat.com>
To:     A Ghoshal <a.ghoshal at tcs.com>
Date:   02/05/2015 04:40 AM
Subject:        Re: [Gluster-devel] [Gluster-users] In a replica 2 server, 
file-updates on one server missing on the other server #Personal#

could it be a problem with iptables blocking connections?  DO iptables 
--list and make sure gluster ports are allowed through, at both ends. 
Also, if you are using libgfapi, be sure you use rpc-auth-allow-insecure 
if you have a lot of gfapi instances, or else you'll run into problems.

----- Original Message -----
> From: "A Ghoshal" <a.ghoshal at tcs.com>
> To: "Ben England" <bengland at redhat.com>
> Sent: Wednesday, February 4, 2015 6:07:10 PM
> Subject: Re: [Gluster-devel] [Gluster-users] In a replica 2 server, 
file-updates on one server missing on the other
> server #Personal#
> 
> Thanks, Ben, same here :/ I actually get port numbers for glusterfsd in
> any of the three ways:
> 
> 1. gluster volume status <VOLNAME>
> 2. command line for glusterfsd on target server.
> 3. if you're really paranoid, get the glusterfsd PID and use netstat.
> 
> Looking at the code it seems to me that the whole thing operates on a
> statd-notify paradigm. Your local mount client registers for notify on 
all
> remote glusterfsd's. When remote brick goes down and comes back up, you
> are notified and then it calls portmap to obtain remote glusterfsd port.
> 
> I see here that both glusterd are up. But somehow the port number of the
> remote glusterfsd with the mount client is now stale - not sure how it
> happens. Now, the client keeps trying to connect on the stale port every 
3
> seconds. It gets the return errno of -111 (-ECONNREFUSED) which is 
clearly
> indicating that there is not listener on the remote host's IP at this
> port.
> 
> Design-wise, could it indicate to the mount client that the port number
> information needs to be refreshed? Would you say this is a bug of sorts?
> 
> 
> 
> 
> From:   Ben England <bengland at redhat.com>
> To:     A Ghoshal <a.ghoshal at tcs.com>
> Date:   02/05/2015 03:59 AM
> Subject:        Re: [Gluster-devel] [Gluster-users] In a replica 2 
server,
> file-updates on one server missing on the other server #Personal#
> 
> 
> 
> I thought Gluster was based on ONC RPC, which means there are no fixed
> port numbers except for glusterd (24007).  The client connects to
> Glusterd, reads the volfile, and gets the port numbers of the registered
> glusterfsd processes at that time, then it connects to glusterfsd.  Make
> sense?  What you need to know is whether glusterfsd is running or not, 
and
> whether glusterd is finding out current state of glusterfsd.
> /var/log/glusterfsd/bricks/*log has log files for each glusterfsd 
process,
> you might be able to see from that what glusterfsd port number is.
> /var/log/glusterfs/etc*log is glusterd's log file, it might say whether
> glusterd knows about glusterfsd.  I'm not as good at troubleshooting as
> some of the other people are so don't take my word for it.
> -ben
> 
> 
> ----- Original Message -----
> > From: "A Ghoshal" <a.ghoshal at tcs.com>
> > To: gluster-devel at gluster.org
> > Cc: gluster-users at gluster.org, gluster-users-bounces at gluster.org
> > Sent: Wednesday, February 4, 2015 4:36:02 PM
> > Subject: Re: [Gluster-devel] [Gluster-users] In a replica 2 server,
> file-updates on one server missing on the other
> > server #Personal#
> > 
> > Sorry for spamming you guys, but this is kind of important for me to
> debug,
> > so if you saw anything like this before, do let me know. Here's an
> update:
> > 
> > It seems the mount client is attempting connection with an invalid 
port
> > number. 49175 is NOT the port number of glusterfsd on serv1
> (192.168.24.8).
> > 
> > I got me an strace:
> > 
> > [pid 31026] open("/proc/sys/net/ipv4/ip_local_reserved_ports", 
O_RDONLY)
> = -1
> > ENOENT (No such file or directory)
> > [pid 31026] write(4, "[2015-02-04 20:39:02.793154] W ["..., 215) = 215
> > [pid 31026] write(4, "[2015-02-04 20:39:02.793289] W ["..., 194) = 194
> > [pid 31026] bind(10, {sa_family=AF_INET, sin_port=htons(1023),
> > sin_addr=inet_addr("192.168.24.80")}, 16) = 0
> > [pid 31026] fcntl(10, F_GETFL) = 0x2 (flags O_RDWR)
> > [pid 31026] fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> > [pid 31026] connect(10, {sa_family=AF_INET, sin_port=htons(49175),
> > sin_addr=inet_addr("192.168.24.81")}, 16) = -1 EINPROGRESS (Operation
> now in
> > progress)
> > [pid 31026] fcntl(10, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
> > [pid 31026] fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> > [pid 31026] epoll_ctl(3, EPOLL_CTL_ADD, 10, 
{EPOLLIN|EPOLLPRI|EPOLLOUT,
> > {u32=10, u64=8589934602}}) = 0
> > [pid 31026] nanosleep({1, 0}, <unfinished ...>
> > [pid 31021] <... epoll_wait resumed>
> {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP,
> > {u32=10, u64=8589934602}}}, 257, 4294967295) = 1
> > [pid 31021] getsockopt(10, SOL_SOCKET, SO_ERROR, [29422518842425455],
> [4]) =
> > 0
> > [pid 31021] shutdown(10, 2 /* send and receive */) = -1 ENOTCONN
> (Transport
> > endpoint is not connected)
> > 
> > Which basically told me that connection is attempted via a 
non-blocking
> > socket at port 49175. The errno from the failure is -ECONNREFUSED, 
which
> is
> > what is expected.
> > 
> > 807 in socket.c
> > (gdb) bt
> > #0 __socket_connect_finish (this=0x6887a0) at socket.c:807
> > #1 socket_connect_finish (this=0x6887a0) at socket.c:2147
> > #2 0x00007fc863de4c04 in socket_event_handler (fd=<value optimized 
out>,
> > idx=<value optimized out>, data=0x6887a0,
> > poll_in=1, poll_out=4, poll_err=<value optimized out>) at 
socket.c:2223
> > #3 0x00007fc867f7919f in event_dispatch_epoll_handler
> (event_pool=0x62db70)
> > at event-epoll.c:384
> > #4 event_dispatch_epoll (event_pool=0x62db70) at event-epoll.c:445
> > #5 0x0000000000406b06 in main (argc=4, argv=0x7fff25302c38) at
> > glusterfsd.c:1934
> > (gdb) print *optval
> > Cannot access memory at address 0x6f
> > (gdb) print optval
> > $1 = 111
> > 
> > Note that this agrees with the following debug log:
> > 
> > [2015-02-03 12:11:33.833647] D
> [socket.c:1962:__socket_proto_state_machine]
> > 0-replicated_vol-1: reading from socket failed. Error (No data
> available),
> > peer (192.168.24.81:49175)
> > 
> > There is, of course no service running on port 49175. In fact, listen
> port
> > for corresponding glusterd on serv1 is 49206. Where does the mount
> client
> > pick this port number from? I know that if I kill and restart the 
mount
> > client on serv0 from command line, then the problem will disappear. 
So,
> it's
> > not something that is up with the processes on serv1...
> > 
> > Thanks,
> > Anirban
> > 
> > 
> > From: A Ghoshal/MUM/TCS
> > To: A Ghoshal <a.ghoshal at tcs.com>
> > Cc: gluster-users at gluster.org, gluster-users-bounces at gluster.org,
> Pranith
> > Kumar Karampuri <pkarampu at redhat.com>
> > Date: 02/05/2015 02:03 AM
> > Subject: Re: [Gluster-users] In a replica 2 server, file-updates on 
one
> > server missing on the other server #Personal#
> > Sent by: A Ghoshal
> > 
> > 
> > 
> > Ok, more updates here:
> > 
> > I turned on trace and it seems bind to secure port on the mount client
> with
> > the remote brick is successful - afterwards the connect() fails to
> complete.
> > I saw these logs:
> > 
> > [2015-02-03 12:11:33.832615] T [rpc-clnt.c:422:rpc_clnt_reconnect]
> > 0-replicated_vol-1: attempting reconnect
> > [2015-02-03 12:11:33.832666] D [name.c:155:client_fill_address_family]
> > 0-replicated_vol-1: address-family not specified, guessing it to be 
inet
> > from (remote-host: serv1)
> > [2015-02-03 12:11:33.832683] T
> > [name.c:225:af_inet_client_get_remote_sockaddr] 0-replicated_vol-1:
> option
> > remote-port missing in volume replicated_vol-1. Defaulting to 24007
> > [2015-02-03 12:11:33.833083] D [common-utils.c:237:gf_resolve_ip6]
> > 0-resolver: returning ip-192.168.24.81 (port-24007) for hostname: 
serv1
> and
> > port: 24007
> > [2015-02-03 12:11:33.833113] T [socket.c:731:__socket_nodelay]
> > 0-replicated_vol-1: NODELAY enabled for socket 10
> > [2015-02-03 12:11:33.833128] T [socket.c:790:__socket_keepalive]
> > 0-replicated_vol-1: Keep-alive enabled for socket 10, interval 2, 
idle:
> 20
> > [2015-02-03 12:11:33.833188] W
> [common-utils.c:2247:gf_get_reserved_ports]
> > 0-glusterfs: could not open the file
> > /proc/sys/net/ipv4/ip_local_reserved_ports for getting reserved ports
> info
> > (No such file or directory)
> > [2015-02-03 12:11:33.833204] W
> > [common-utils.c:2280:gf_process_reserved_ports] 0-glusterfs: Not able 
to
> get
> > reserved ports, hence there is a possibility that glusterfs may 
consume
> > reserved port
> > [2015-02-03 12:11:33.833560] D [socket.c:605:__socket_shutdown]
> > 0-replicated_vol-1: shutdown() returned -1. Transport endpoint is not
> > connected
> > [2015-02-03 12:11:33.833594] T
> [rpc-clnt.c:519:rpc_clnt_connection_cleanup]
> > 0-replicated_vol-1: cleaning up state in transport object 0x6887a0
> > [2015-02-03 12:11:33.833622] D [socket.c:486:__socket_rwv]
> > 0-replicated_vol-1: EOF on socket
> > [2015-02-03 12:11:33.833635] W [socket.c:514:__socket_rwv]
> > 0-replicated_vol-1: readv failed (No data available)
> > [2015-02-03 12:11:33.833647] D
> [socket.c:1962:__socket_proto_state_machine]
> > 0-replicated_vol-1: reading from socket failed. Error (No data
> available),
> > peer (192.168.24.81:49175)
> > [2015-02-03 12:11:33.833659] D [socket.c:2236:socket_event_handler]
> > 0-transport: disconnecting now
> > [2015-02-03 12:11:33.833684] T
> [rpc-clnt.c:519:rpc_clnt_connection_cleanup]
> > 0-replicated_vol-1: cleaning up state in transport object 0x6887a0
> > 
> > After that, I attached the mount process to my gdb session and 
attached
> a
> > breakpoint at __socket_shutdown(). It gave me the following traceback
> > 
> > Breakpoint 1, __socket_shutdown (this=0x68a7f0) at socket.c:594
> > 594 socket.c: No such file or directory.
> > in socket.c
> > (gdb) bt
> > #0 __socket_shutdown (this=0x68a7f0) at socket.c:594
> > #1 0x00007f3391906553 in __socket_disconnect (this=0x68a7f0) at
> socket.c:623
> > #2 0x00007f339190856a in socket_connect_finish (this=0x68a7f0) at
> > socket.c:2160
> > #3 0x00007f339190dc04 in socket_event_handler (fd=<value optimized 
out>,
> > idx=<value optimized out>, data=0x68a7f0,
> > poll_in=1, poll_out=4, poll_err=<value optimized out>) at 
socket.c:2223
> > #4 0x00007f3395aa219f in event_dispatch_epoll_handler
> (event_pool=0x62d8b0)
> > at event-epoll.c:384
> > #5 event_dispatch_epoll (event_pool=0x62d8b0) at event-epoll.c:445
> > #6 0x0000000000406b06 in main (argc=4, argv=0x7fff71108c38) at
> > glusterfsd.c:1934
> > (gdb)
> > 
> > In case somebody could give me a clue or two on my next move (I'm
> looking at
> > the code in the meantime), I'd be very grateful.
> > 
> > Thanks,
> > Anirban
> > 
> > 
> > 
> > 
> > From: A Ghoshal <a.ghoshal at tcs.com>
> > To: Pranith Kumar Karampuri <pkarampu at redhat.com>
> > Cc: gluster-users at gluster.org
> > Date: 02/03/2015 05:24 AM
> > Subject: Re: [Gluster-users] In a replica 2 server, file-updates on 
one
> > server missing on the other server #Personal#
> > Sent by: gluster-users-bounces at gluster.org
> > 
> > 
> > 
> > 
> > Hi Pranith,
> > 
> > I finally understood what you meant the secure ports, because the 
issue
> > occurred in one of our setups once more. It seems one of the clients 
on
> > serv1 could not open a connection to the glusterfsd running on serv0.
> I'd
> > actually started a mail trail about it (believing it might be 
something
> > else, initially) here:
> > 
http://www.gluster.org/pipermail/gluster-users/2015-February/020465.html
> > 
> > I think I can write me a rudimentary kind of patch altering
> > af_inet_bind_to_port_lt_ceiling() to get it to call bind with port 0,
> rather
> > than specify a port explicitly when the client.bind-insecure is
> specified
> > ...
> > 
> > Then I'd need to create a way to set server.allow-insecure using the 
cli
> (or
> > if you already sent around the patch to do that like you said in the
> earlier
> > mail, do let me know). I'll keep you posted about it round here or @
> > [gluster-devel] if I can get it to work.
> > 
> > Thanks a lot,
> > 
> > Anirban
> > 
> > 
> > 
> > From: A Ghoshal/MUM/TCS
> > To: Pranith Kumar Karampuri <pkarampu at redhat.com>
> > Cc: gluster-users at gluster.org, Niels de Vos <ndevos at redhat.com>
> > Date: 01/23/2015 02:45 PM
> > Subject: Re: [Gluster-users] In a replica 2 server, file-updates on 
one
> > server missing on the other server #Personal#
> > Sent by: A Ghoshal
> > 
> > 
> > 
> > Oh, I didn't I only read a fragment of the IRC log and assumed
> > --xlator-option would be enough. Apparently it's a lot more work....
> > 
> > I do have a query, though. These connections, from one of our setups,
> are
> > these on secure ports? Or, maybe I didn't get it the first time.....
> > 
> > root at serv0:/root> ps -ef | grep replicated_vol
> > root 8851 25307 0 10:03 pts/2 00:00:00 grep replicated_vol
> > root 29751 1 4 Jan21 ? 01:47:20 /usr/sbin/glusterfsd -s serv0
> --volfile-id
> > replicated_vol.serv0.mnt-bricks-replicated_vol-brick -p
> > 
> 
/var/lib/glusterd/vols/_replicated_vol/run/serv0-mnt-bricks-replicated_vol-brick.pid
> > -S /var/run/dff9fa3c93e82f20103f2a3d91adc4a8.socket --brick-name
> > /mnt/bricks/replicated_vol/brick -l
> > /var/log/glusterfs/bricks/mnt-bricks-replicated_vol-brick.log
> > --xlator-option
> *-posix.glusterd-uuid=1a1d1ebc-4b92-428f-b66b-9c5efa49574d
> > --brick-port 49185 --xlator-option
> replicated_vol-server.listen-port=49185
> > root 30399 1 0 Jan21 ? 00:19:06 /usr/sbin/glusterfs
> > --volfile-id=replicated_vol --volfile-server=serv0 /mnt/replicated_vol
> > 
> > root at serv0:/root> netstat -p | grep 30399
> > tcp 0 0 serv0:969 serv0:49185 ESTABLISHED 30399/glusterfs
> > tcp 0 0 serv0:999 serv1:49159 ESTABLISHED 30399/glusterfs
> > tcp 0 0 serv0:1023 serv0:24007 ESTABLISHED 30399/glusterfs
> > root at serv0:/root>
> > 
> > Thanks again,
> > Anirban
> > 
> > 
> > 
> > 
> > From: Pranith Kumar Karampuri <pkarampu at redhat.com>
> > To: A Ghoshal <a.ghoshal at tcs.com>
> > Cc: gluster-users at gluster.org, Niels de Vos <ndevos at redhat.com>
> > Date: 01/23/2015 01:58 PM
> > Subject: Re: [Gluster-users] In a replica 2 server, file-updates on 
one
> > server missing on the other server #Personal#
> > 
> > 
> > 
> > 
> > 
> > On 01/23/2015 01:54 PM, A Ghoshal wrote:
> > Thanks a lot, Pranith.
> > 
> > We'll set this option on our test servers and keep the setup under
> > observation.
> > How did you get the bind-insecure option working?
> > I guess I will send a patch to make it 'volume set option'
> > 
> > Pranith
> > 
> > Thanks,
> > Anirban
> > 
> > 
> > 
> > From: Pranith Kumar Karampuri <pkarampu at redhat.com>
> > To: A Ghoshal <a.ghoshal at tcs.com>
> > Cc: gluster-users at gluster.org , Niels de Vos <ndevos at redhat.com>
> > Date: 01/23/2015 01:28 PM
> > Subject: Re: [Gluster-users] In a replica 2 server, file-updates on 
one
> > server missing on the other server #Personal#
> > 
> > 
> > 
> > 
> > 
> > On 01/22/2015 02:07 PM, A Ghoshal wrote:
> > Hi Pranith,
> > 
> > Yes, the very same ( chalcogen_eg_oxygen at yahoo.com ). Justin Clift 
sent
> me a
> > mail a while back telling me that it is better if we all use our
> business
> > email addresses so I made me a new profile.
> > 
> > Glusterfs complains about /proc/sys/net/ipv4/ip_local_reserved_ports
> because
> > we use a really old Linux kernel (2.6.34) wherein this feature is not
> > present. We plan to upgrade our Linux so often but each time we are
> > dissuaded from it by some compatibility issue or the other. So, we get
> this
> > log every time - on both good volumes and bad ones. What bothered me 
was
> > this (on serv1):
> > Basically to make the connections to servers i.e. bricks clients need 
to
> > choose secure ports i.e. port less than 1024. Since this file is not
> > present, it is not binding to any port as per the code I just checked.
> There
> > is an option called client-bind-insecure which bypasses this check. I
> feel
> > that is one (probably only way) to get around this.
> > You have to "volume set server.allow-insecure on" option and
> bind-insecure
> > option.
> > CC ndevos who seemed to have helped someone set bind-insecure option
> > correctly here ( http://irclog.perlgeek.de/gluster/2014-04-09/text )
> > 
> > Pranith
> > 
> > [2015-01-20 09:37:49.151744] T
> [rpc-clnt.c:1182:rpc_clnt_record_build_header]
> > 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
> > [2015-01-20 09:37:49.151780] T [rpc-clnt.c:1499:rpc_clnt_submit]
> 0-rpc-clnt:
> > submitted request (XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 
330,
> > Proc: 27) to rpc-transport (replicated_vol-client-0)
> > [2015-01-20 09:37:49.151810] T [rpc-clnt.c:1302:rpc_clnt_record]
> > 0-replicated_vol-client-1: Auth Info: pid: 7599, uid: 0, gid: 0, 
owner:
> > 0000000000000000
> > [2015-01-20 09:37:49.151824] T
> [rpc-clnt.c:1182:rpc_clnt_record_build_header]
> > 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
> > [2015-01-20 09:37:49.151889] T [rpc-clnt.c:1499:rpc_clnt_submit]
> 0-rpc-clnt:
> > submitted request (XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 
330,
> > Proc: 27) to rpc-transport (replicated_vol-client-1)
> > [2015-01-20 09:37:49.152239] T [rpc-clnt.c:669:rpc_clnt_reply_init]
> > 0-replicated_vol-client-1: received rpc message (RPC XID: 0x39563x
> Program:
> > GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport
> > (replicated_vol-client-1)
> > [2015-01-20 09:37:49.152484] T [rpc-clnt.c:669:rpc_clnt_reply_init]
> > 0-replicated_vol-client-0: received rpc message (RPC XID: 0x39620x
> Program:
> > GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport
> > (replicated_vol-client-0)
> > 
> > When I write on the good server (serv1), we see that an RPC request is
> sent
> > to both client-0 and client-1. While, when I write on the bad server
> > (serv0), the RPC request is sent only to client-0, which is why it is 
no
> > wonder that the writes are not synced over to serv1. Somehow I could 
not
> > make the daemon on serv0 understand that there are two up-children and
> not
> > just one.
> > 
> > One additional detail - since we are using a kernel that's too old, we
> do not
> > have the (Anand Avati's?) FUse readdirplus patches, either. I've 
noticed
> > that the fixes in the readdirplus version of glusterfs aren't always
> > guaranteed to be present on the non-readdirplus version of the 
patches.
> I'd
> > filed a bug around one such anomaly back, but never got around to
> writing a
> > patch for it (sorry!) Here it is:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1062287
> > I don't this has anything to do with readdirplus.
> > 
> > Maybe something on similar lines here?
> > 
> > Thanks,
> > Anirban
> > 
> > P.s. Please ignore the #Personal# in the subject line - we need to do
> that to
> > push mails to the public domain past the email filter safely.
> > 
> > 
> > 
> > From: Pranith Kumar Karampuri <pkarampu at redhat.com>
> > To: A Ghoshal <a.ghoshal at tcs.com> , gluster-users at gluster.org
> > Date: 01/22/2015 12:09 AM
> > Subject: Re: [Gluster-users] In a replica 2 server, file-updates on 
one
> > server missing on the other server
> > 
> > 
> > 
> > 
> > hi,
> > Responses inline.
> > 
> > PS: You are chalkogen_oxygen?
> > 
> > Pranith
> > On 01/20/2015 05:34 PM, A Ghoshal wrote:
> > Hello,
> > 
> > I am using the following replicated volume:
> > 
> > root at serv0:~> gluster v info replicated_vol
> > 
> > Volume Name: replicated_vol
> > Type: Replicate
> > Volume ID: 26d111e3-7e4c-479e-9355-91635ab7f1c2
> > Status: Started
> > Number of Bricks: 1 x 2 = 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: serv0:/mnt/bricks/replicated_vol/brick
> > Brick2: serv1:/mnt/bricks/replicated_vol/brick
> > Options Reconfigured:
> > diagnostics.client-log-level: INFO
> > network.ping-timeout: 10
> > nfs.enable-ino32: on
> > cluster.self-heal-daemon: on
> > nfs.disable: off
> > 
> > replicated_vol is mounted at /mnt/replicated_vol on both serv0 and
> serv1. If
> > I do the following on serv0:
> > 
> > root at serv0:~>echo "cranberries" > /mnt/replicated_vol/testfile
> > root at serv0:~>echo "tangerines" >> /mnt/replicated_vol/testfile
> > 
> > And then I check for the state of the replicas in the bricks, then I
> find
> > that
> > 
> > root at serv0:~>cat /mnt/bricks/replicated_vol/brick/testfile
> > cranberries
> > tangerines
> > root at serv0:~>
> > 
> > root at serv1:~>cat /mnt/bricks/replicated_vol/brick/testfile
> > root at serv1:~>
> > 
> > As may be seen, the replica on serv1 is blank, when I write into
> testfile
> > from serv0 (even though the file is created on both bricks).
> Interestingly,
> > if I write something to the file at serv1, then the two replicas 
become
> > identical.
> > 
> > root at serv1:~>echo "artichokes" >> /mnt/replicated_vol/testfile
> > 
> > root at serv1:~>cat /mnt/bricks/replicated_vol/brick/testfile
> > cranberries
> > tangerines
> > artichokes
> > root at serv1:~>
> > 
> > root at serv0:~>cat /mnt/bricks/replicated_vol/brick/testfile
> > cranberries
> > tangerines
> > artichokes
> > root at serv0:~>
> > 
> > So, I dabbled into the logs a little bit, after upping the diagnostic
> level,
> > and this is what I saw:
> > 
> > When I write on serv0 (bad case):
> > 
> > [2015-01-20 09:21:52.197704] T [fuse-bridge.c:546:fuse_lookup_resume]
> > 0-glusterfs-fuse: 53027: LOOKUP
> > /testfl(f0a76987-8a42-47a2-b027-a823254b736b)
> > [2015-01-20 09:21:52.197959] D
> > [afr-common.c:131:afr_lookup_xattr_req_prepare]
> > 0-replicated_vol-replicate-0: /testfl: failed to get the gfid from 
dict
> > [2015-01-20 09:21:52.198006] T [rpc-clnt.c:1302:rpc_clnt_record]
> > 0-replicated_vol-client-0: Auth Info: pid: 28151, uid: 0, gid: 0, 
owner:
> > 0000000000000000
> > [2015-01-20 09:21:52.198024] T
> [rpc-clnt.c:1182:rpc_clnt_record_build_header]
> > 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
> > [2015-01-20 09:21:52.198108] T [rpc-clnt.c:1499:rpc_clnt_submit]
> 0-rpc-clnt:
> > submitted request (XID: 0x78163x Program: GlusterFS 3.3, ProgVers: 
330,
> > Proc: 27) to rpc-transport (replicated_vol-client-0)
> > [2015-01-20 09:21:52.198565] T [rpc-clnt.c:669:rpc_clnt_reply_init]
> > 0-replicated_vol-client-0: received rpc message (RPC XID: 0x78163x
> Program:
> > GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport
> > (replicated_vol-client-0)
> > [2015-01-20 09:21:52.198640] D
> > [afr-self-heal-common.c:138:afr_sh_print_pending_matrix]
> > 0-replicated_vol-replicate-0: pending_matrix: [ 0 3 ]
> > [2015-01-20 09:21:52.198669] D
> > [afr-self-heal-common.c:138:afr_sh_print_pending_matrix]
> > 0-replicated_vol-replicate-0: pending_matrix: [ 0 0 ]
> > [2015-01-20 09:21:52.198681] D
> [afr-self-heal-common.c:887:afr_mark_sources]
> > 0-replicated_vol-replicate-0: Number of sources: 1
> > [2015-01-20 09:21:52.198694] D
> > [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type]
> > 0-replicated_vol-replicate-0: returning read_child: 0
> > [2015-01-20 09:21:52.198705] D
> > [afr-common.c:1380:afr_lookup_select_read_child]
> > 0-replicated_vol-replicate-0: Source selected as 0 for /testfl
> > [2015-01-20 09:21:52.198720] D
> > [afr-common.c:1117:afr_lookup_build_response_params]
> > 0-replicated_vol-replicate-0: Building lookup response from 0
> > [2015-01-20 09:21:52.198732] D
> > [afr-common.c:1732:afr_lookup_perform_self_heal]
> > 0-replicated_vol-replicate-0: Only 1 child up - do not attempt to 
detect
> > self heal
> > 
> > When I write on serv1 (good case):
> > 
> > [2015-01-20 09:37:49.151506] T [fuse-bridge.c:546:fuse_lookup_resume]
> > 0-glusterfs-fuse: 31212: LOOKUP
> > /testfl(f0a76987-8a42-47a2-b027-a823254b736b)
> > [2015-01-20 09:37:49.151683] D
> > [afr-common.c:131:afr_lookup_xattr_req_prepare]
> > 0-replicated_vol-replicate-0: /testfl: failed to get the gfid from 
dict
> > [2015-01-20 09:37:49.151726] T [rpc-clnt.c:1302:rpc_clnt_record]
> > 0-replicated_vol-client-0: Auth Info: pid: 7599, uid: 0, gid: 0, 
owner:
> > 0000000000000000
> > [2015-01-20 09:37:49.151744] T
> [rpc-clnt.c:1182:rpc_clnt_record_build_header]
> > 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
> > [2015-01-20 09:37:49.151780] T [rpc-clnt.c:1499:rpc_clnt_submit]
> 0-rpc-clnt:
> > submitted request (XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 
330,
> > Proc: 27) to rpc-transport (replicated_vol-client-0)
> > [2015-01-20 09:37:49.151810] T [rpc-clnt.c:1302:rpc_clnt_record]
> > 0-replicated_vol-client-1: Auth Info: pid: 7599, uid: 0, gid: 0, 
owner:
> > 0000000000000000
> > [2015-01-20 09:37:49.151824] T
> [rpc-clnt.c:1182:rpc_clnt_record_build_header]
> > 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
> > [2015-01-20 09:37:49.151889] T [rpc-clnt.c:1499:rpc_clnt_submit]
> 0-rpc-clnt:
> > submitted request (XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 
330,
> > Proc: 27) to rpc-transport (replicated_vol-client-1)
> > [2015-01-20 09:37:49.152239] T [rpc-clnt.c:669:rpc_clnt_reply_init]
> > 0-replicated_vol-client-1: received rpc message (RPC XID: 0x39563x
> Program:
> > GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport
> > (replicated_vol-client-1)
> > [2015-01-20 09:37:49.152484] T [rpc-clnt.c:669:rpc_clnt_reply_init]
> > 0-replicated_vol-client-0: received rpc message (RPC XID: 0x39620x
> Program:
> > GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport
> > (replicated_vol-client-0)
> > [2015-01-20 09:37:49.152582] D
> > [afr-self-heal-common.c:138:afr_sh_print_pending_matrix]
> > 0-replicated_vol-replicate-0: pending_matrix: [ 0 3 ]
> > [2015-01-20 09:37:49.152596] D
> > [afr-self-heal-common.c:138:afr_sh_print_pending_matrix]
> > 0-replicated_vol-replicate-0: pending_matrix: [ 0 0 ]
> > [2015-01-20 09:37:49.152621] D
> [afr-self-heal-common.c:887:afr_mark_sources]
> > 0-replicated_vol-replicate-0: Number of sources: 1
> > [2015-01-20 09:37:49.152633] D
> > [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type]
> > 0-replicated_vol-replicate-0: returning read_child: 0
> > [2015-01-20 09:37:49.152644] D
> > [afr-common.c:1380:afr_lookup_select_read_child]
> > 0-replicated_vol-replicate-0: Source selected as 0 for /testfl
> > [2015-01-20 09:37:49.152657] D
> > [afr-common.c:1117:afr_lookup_build_response_params]
> > 0-replicated_vol-replicate-0: Building lookup response from 0
> > 
> > We see that when you write on serv1, the RPC request is sent to both
> > replicated_vol-client-0 and replicated_vol-client-1, while when we 
write
> on
> > serv0, the request is sent only to replicated_vol-client-0, and the 
FUse
> > client is unaware of the presence of client-1 in the latter case.
> > 
> > I checked a bit more in the logs. When I turn on my trace, I found 
many
> > instances of these logs on serv0 but NOT on serv1:
> > 
> > [2015-01-20 09:21:15.520784] T [fuse-bridge.c:681:fuse_attr_cbk]
> > 0-glusterfs-fuse: 53011: LOOKUP() / => 1
> > [2015-01-20 09:21:17.683088] T [rpc-clnt.c:422:rpc_clnt_reconnect]
> > 0-replicated_vol-client-1: attempting reconnect
> > [2015-01-20 09:21:17.683159] D [name.c:155:client_fill_address_family]
> > 0-replicated_vol-client-1: address-family not specified, guessing it 
to
> be
> > inet from (remote-host: serv1)
> > [2015-01-20 09:21:17.683178] T
> > [name.c:225:af_inet_client_get_remote_sockaddr]
> 0-replicated_vol-client-1:
> > option remote-port missing in volume replicated_vol-client-1. 
Defaulting
> to
> > 24007
> > [2015-01-20 09:21:17.683191] T [common-utils.c:188:gf_resolve_ip6]
> > 0-resolver: flushing DNS cache
> > [2015-01-20 09:21:17.683202] T [common-utils.c:195:gf_resolve_ip6]
> > 0-resolver: DNS cache not present, freshly probing hostname: serv1
> > [2015-01-20 09:21:17.683814] D [common-utils.c:237:gf_resolve_ip6]
> > 0-resolver: returning ip-192.168.24.81 (port-24007) for hostname: 
serv1
> and
> > port: 24007
> > [2015-01-20 09:21:17.684139] D [common-utils.c:257:gf_resolve_ip6]
> > 0-resolver: next DNS query will return: ip-192.168.24.81 port-24007
> > [2015-01-20 09:21:17.684164] T [socket.c:731:__socket_nodelay]
> > 0-replicated_vol-client-1: NODELAY enabled for socket 10
> > [2015-01-20 09:21:17.684177] T [socket.c:790:__socket_keepalive]
> > 0-replicated_vol-client-1: Keep-alive enabled for socket 10, interval 
2,
> > idle: 20
> > [2015-01-20 09:21:17.684236] W
> [common-utils.c:2247:gf_get_reserved_ports]
> > 0-glusterfs: could not open the file
> > /proc/sys/net/ipv4/ip_local_reserved_ports for getting reserved ports
> info
> > (No such file or directory)
> > [2015-01-20 09:21:17.684253] W
> > [common-utils.c:2280:gf_process_reserved_ports] 0-glusterfs: Not able 
to
> get
> > reserved ports, hence there is a possibility that glusterfs may 
consume
> > reserved port
> > Logs above suggest that mount process couldn't assign a reserved port
> because
> > it couldn't find the file /proc/sys/net/ipv4/ip_local_reserved_ports
> > 
> > I guess reboot of the machine fixed it. Wonder why it was not found in
> the
> > first place.
> > 
> > Pranith.
> > [2015-01-20 09:21:17.684660] D [socket.c:605:__socket_shutdown]
> > 0-replicated_vol-client-1: shutdown() returned -1. Transport endpoint 
is
> not
> > connected
> > [2015-01-20 09:21:17.684699] T
> [rpc-clnt.c:519:rpc_clnt_connection_cleanup]
> > 0-replicated_vol-client-1: cleaning up state in transport object
> 0x68a630
> > [2015-01-20 09:21:17.684731] D [socket.c:486:__socket_rwv]
> > 0-replicated_vol-client-1: EOF on socket
> > [2015-01-20 09:21:17.684750] W [socket.c:514:__socket_rwv]
> > 0-replicated_vol-client-1: readv failed (No data available)
> > [2015-01-20 09:21:17.684766] D
> [socket.c:1962:__socket_proto_state_machine]
> > 0-replicated_vol-client-1: reading from socket failed. Error (No data
> > available), peer (192.168.24.81:49198)
> > 
> > I could not find a 'remote-port' option in /var/lib/glusterd on either
> peer.
> > Could somebody tell me where this configuration is looked up from? 
Also,
> > sometime later, I rebooted serv0 and that seemed to solve the problem.
> > However, stop+start of replicated_vol and restart of
> /etc/init.d/glusterd
> > did NOT solve the problem.
> > Ignore that log. If no port is given in that volfile, it picks 24007 
as
> the
> > port, which is the default port where glusterd 'listens'
> > 
> > 
> > Any help on this matter will be greatly appreciated as I need to 
provide
> > robustness assurances for our setup.
> > 
> > Thanks a lot,
> > Anirban
> > 
> > P.s. Additional details:
> > glusterfs version: 3.4.2
> > Linux kernel version: 2.6.34
> > 
> > =====-----=====-----=====
> > Notice: The information contained in this e-mail
> > message and/or attachments to it may contain
> > confidential or privileged information. If you are
> > not the intended recipient, any dissemination, use,
> > review, distribution, printing or copying of the
> > information contained in this e-mail message
> > and/or attachments to it are strictly prohibited. If
> > you have received this communication in error,
> > please notify us by reply e-mail or telephone and
> > immediately and permanently delete the message
> > and any attachments. Thank you
> > 
> > 
> > 
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
> > 
> > 
> > 
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> > 
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150220/8f38d533/attachment.html>