[Gluster-users] gluster volume status shows second node is offline

Dario Lesca d.lesca at solinos.it
Tue Sep 7 17:04:34 UTC 2021


I have set up a similar test environment with two VMs on my PC, exactly
identical to the one in production.

Everything works fine. But when I restart node 2, everything starts and
works again, except that the volume status shows the node2 brick as
offline:

   [root@virt2 ~]# gluster volume status gfsvol1
   Status of volume: gfsvol1
   Gluster process                             TCP Port  RDMA Port  Online  Pid
   ------------------------------------------------------------------------------
   Brick virt1.local:/gfsvol1/brick1           49152     0          Y       8591
   Brick virt2.local:/gfsvol1/brick1           N/A       N/A        N       N/A
   Self-heal Daemon on localhost               N/A       N/A        Y       970
   Self-heal Daemon on virt1.local             N/A       N/A        Y       8608
   
   Task Status of Volume gfsvol1
   ------------------------------------------------------------------------------
   There are no active volume tasks

I have found this solution:
https://bobcares.com/blog/gluster-bring-brick-online/

When I run "gluster volume start gfsvol1 force", the brick comes back
online:

   [root@virt2 ~]# gluster volume start gfsvol1 force
   volume start: gfsvol1: success
   [root@virt2 ~]# gluster volume status gfsvol1
   Status of volume: gfsvol1
   Gluster process                             TCP Port  RDMA Port  Online  Pid
   ------------------------------------------------------------------------------
   Brick virt1.local:/gfsvol1/brick1           49152     0          Y       8591
   Brick virt2.local:/gfsvol1/brick1           49153     0          Y       1422
   Self-heal Daemon on localhost               N/A       N/A        Y       970
   Self-heal Daemon on virt1.local             N/A       N/A        Y       8608
   
   Task Status of Volume gfsvol1
   ------------------------------------------------------------------------------
   There are no active volume tasks

But if I reboot the node 2 server, the brick is offline again as soon
as the system has started.

Also, if I restart the glusterd service on node2, the brick comes back
online:

   systemctl restart glusterd
   
This looks like a systemd service ordering problem at startup: it seems
glusterd is started before the network is online.
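
One way to verify the ordering, just as a sketch (the exact output will
differ from system to system), is to look at the boot-time dependency
chain and at the glusterd journal for the current boot:

   # show the ordering chain systemd computed for glusterd at boot
   systemd-analyze critical-chain glusterd.service

   # look for name-resolution or bind errors from glusterd at boot
   journalctl -b -u glusterd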

I have tried modifying the systemd unit:

   - from this 
   After=network.target
   Before=network-online.target
   
   - to this
   After=network.target network-online.target
   #Before=network-online.target

and now when I restart the node 2 server, everything works fine and the
volume is always online.
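
(Editing the installed unit file may be overwritten by a package
update; an alternative, just as a sketch, would be a systemd drop-in.
This assumes the nodes use NetworkManager:)

   # "systemctl edit glusterd" creates
   # /etc/systemd/system/glusterd.service.d/override.conf with:
   [Unit]
   Wants=network-online.target
   After=network-online.target

   # and, so that network-online.target really waits for the NICs:
   systemctl enable NetworkManager-wait-online.service
   systemctl daemon-reload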

What is wrong?

Let me know, and many thanks for your help.
Dario

On Tue, 07/09/2021 at 09.46 +0200, Dario Lesca wrote:
> These are the last lines in the
> /var/log/glusterfs/bricks/gfsvol1-brick1.log log file:
> 
> [2021-09-06 21:29:02.165238 +0000] I [addr.c:54:compare_addr_and_update] 0-/gfsvol1/brick1: allowed = "*", received addr = "172.16.3.1"
> [2021-09-06 21:29:02.165365 +0000] I [login.c:110:gf_auth] 0-auth/login: allowed user names: 12261a60-60a5-4791-a3f1-6da397046ee5
> [2021-09-06 21:29:02.165402 +0000] I [MSGID: 115029] [server-handshake.c:561:server_setvolume] 0-gfsvol1-server: accepted client from CTX_ID:444e0582-ac68-4f20-9552-c4dbc7724967-GRAPH_ID:0-PID:227500-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0 (version: 9.3) with subvol /gfsvol1/brick1 
> [2021-09-06 21:29:02.179387 +0000] W [socket.c:767:__socket_rwv] 0-tcp.gfsvol1-server: readv on 172.16.3.1:49144 failed (No data available)
> [2021-09-06 21:29:02.179451 +0000] I [MSGID: 115036] [server.c:500:server_rpc_notify] 0-gfsvol1-server: disconnecting connection [{client-uid=CTX_ID:444e0582-ac68-4f20-9552-c4dbc7724967-GRAPH_ID:0-PID:227500-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0}] 
> [2021-09-06 21:29:02.179877 +0000] I [MSGID: 101055] [client_t.c:397:gf_client_unref] 0-gfsvol1-server: Shutting down connection CTX_ID:444e0582-ac68-4f20-9552-c4dbc7724967-GRAPH_ID:0-PID:227500-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0 
> [2021-09-06 21:29:10.254230 +0000] I [addr.c:54:compare_addr_and_update] 0-/gfsvol1/brick1: allowed = "*", received addr = "172.16.3.1"
> [2021-09-06 21:29:10.254283 +0000] I [login.c:110:gf_auth] 0-auth/login: allowed user names: 12261a60-60a5-4791-a3f1-6da397046ee5
> [2021-09-06 21:29:10.254300 +0000] I [MSGID: 115029] [server-handshake.c:561:server_setvolume] 0-gfsvol1-server: accepted client from CTX_ID:fef710c3-11bf-4a91-b749-f52a536d6dad-GRAPH_ID:0-PID:227541-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0 (version: 9.3) with subvol /gfsvol1/brick1 
> [2021-09-06 21:29:10.272069 +0000] W [socket.c:767:__socket_rwv] 0-tcp.gfsvol1-server: readv on 172.16.3.1:49140 failed (No data available)
> [2021-09-06 21:29:10.272133 +0000] I [MSGID: 115036] [server.c:500:server_rpc_notify] 0-gfsvol1-server: disconnecting connection [{client-uid=CTX_ID:fef710c3-11bf-4a91-b749-f52a536d6dad-GRAPH_ID:0-PID:227541-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0}] 
> [2021-09-06 21:29:10.272430 +0000] I [MSGID: 101055] [client_t.c:397:gf_client_unref] 0-gfsvol1-server: Shutting down connection CTX_ID:fef710c3-11bf-4a91-b749-f52a536d6dad-GRAPH_ID:0-PID:227541-HOST:s-virt1.realdomain.it-PC_NAME:gfsvol1-client-1-RECON_NO:-0 
> 
> I have a dedicated network adapter directly connecting the two
> servers, with the IPs 172.16.3.1/30 and 172.16.3.2/30, named
> virt1.local and virt2.local via /etc/hosts.
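> 
> (For reference, the /etc/hosts entries are roughly the following; this
> is a sketch, the real file may contain more:)
> 
>    172.16.3.1   virt1.local
>    172.16.3.2   virt2.local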
> 
> In these logs I also see the real server name ( ... HOST:s-
> virt1.realdomain.it-PC_NAME: ...), which has another IP on another
> network.
> 
> Now this cluster is in production and hosts some VMs.
> 
> What is the best way to solve this dangerous situation without risk?
> 
> Many thanks
> Dario
> 
> On Tue, 07/09/2021 at 05.28 +0000, Strahil Nikolov wrote:
> > No, it's not normal.
> > Go to virt2; in the /var/log/glusterfs directory you will find
> > 'bricks'. Check the logs in bricks for more information.
> > 
> > Best Regards,
> > Strahil Nikolov
> > 
> > 
> > > On Tue, Sep 7, 2021 at 1:13, Dario Lesca
> > > <d.lesca at solinos.it> wrote:
> > > Hello everybody!
> > > I'm a novice with Gluster. I have set up my first cluster with two
> > > nodes.
> > > 
> > > This is the current volume info:
> > > 
> > >   [root@s-virt1 ~]# gluster volume info gfsvol1
> > >   Volume Name: gfsvol1
> > >   Type: Replicate
> > >   Volume ID: 5bad4a23-58cc-44d7-8195-88409720b941
> > >   Status: Started
> > >   Snapshot Count: 0
> > >   Number of Bricks: 1 x 2 = 2
> > >   Transport-type: tcp
> > >   Bricks:
> > >   Brick1: virt1.local:/gfsvol1/brick1
> > >   Brick2: virt2.local:/gfsvol1/brick1
> > >   Options Reconfigured:
> > >   performance.client-io-threads: off
> > >   nfs.disable: on
> > >   transport.address-family: inet
> > >   storage.fips-mode-rchecksum: on
> > >   cluster.granular-entry-heal: on
> > >   storage.owner-uid: 107
> > >   storage.owner-gid: 107
> > >   server.allow-insecure: on
> > > 
> > > For now everything seems to work fine.
> > > 
> > > I have mounted the Gluster volume on both nodes and run the VMs
> > > from it.
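> > > 
> > > (The mount is along these lines; the mount point and options shown
> > > here are an assumption, not the exact fstab entry:)
> > > 
> > >   virt1.local:/gfsvol1  /virt-data  glusterfs  defaults,_netdev,backup-volfile-servers=virt2.local  0 0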
> > > 
> > > But today I noticed that the second node (virt2) is offline:
> > > 
> > >   [root@s-virt1 ~]# gluster volume status
> > >   Status of volume: gfsvol1
> > >   Gluster process                             TCP Port  RDMA Port  Online  Pid
> > >   ------------------------------------------------------------------------------
> > >   Brick virt1.local:/gfsvol1/brick1           49152     0          Y       3090
> > >   Brick virt2.local:/gfsvol1/brick1           N/A       N/A        N       N/A
> > >   Self-heal Daemon on localhost               N/A       N/A        Y       3105
> > >   Self-heal Daemon on virt2.local             N/A       N/A        Y       3140
> > >   
> > >   Task Status of Volume gfsvol1
> > >   ------------------------------------------------------------------------------
> > >   There are no active volume tasks
> > >   
> > >   [root@s-virt1 ~]# gluster volume status gfsvol1 detail
> > >   Status of volume: gfsvol1
> > >   ------------------------------------------------------------------------------
> > >   Brick                : Brick virt1.local:/gfsvol1/brick1
> > >   TCP Port             : 49152
> > >   RDMA Port            : 0
> > >   Online               : Y
> > >   Pid                  : 3090
> > >   File System          : xfs
> > >   Device               : /dev/mapper/rl-gfsvol1
> > >   Mount Options        : rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=128,swidth=128,noquota
> > >   Inode Size           : 512
> > >   Disk Space Free      : 146.4GB
> > >   Total Disk Space     : 999.9GB
> > >   Inode Count          : 307030856
> > >   Free Inodes          : 307026149
> > >   ------------------------------------------------------------------------------
> > >   Brick                : Brick virt2.local:/gfsvol1/brick1
> > >   TCP Port             : N/A
> > >   RDMA Port            : N/A
> > >   Online               : N
> > >   Pid                  : N/A
> > >   File System          : xfs
> > >   Device               : /dev/mapper/rl-gfsvol1
> > >   Mount Options        : rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=128,swidth=128,noquota
> > >   Inode Size           : 512
> > >   Disk Space Free      : 146.4GB
> > >   Total Disk Space     : 999.9GB
> > >   Inode Count          : 307052016
> > >   Free Inodes          : 307047307
> > >   
> > > What does it mean?
> > > What's wrong?
> > > Is this normal, or am I missing some setting?
> > > 
> > > If you need more information let me know
> > > 
> > > Many thanks for your help
> > > 
> > > 