[Gluster-users] Problem with add-brick
Ravishankar N
ravishankar@redhat.com
Wed Sep 28 01:44:43 UTC 2016
On 09/27/2016 10:29 PM, Dennis Michael wrote:
>
>
> [root@fs4 bricks]# gluster volume info
> Volume Name: cees-data
> Type: Distribute
> Volume ID: 27d2a59c-bdac-4f66-bcd8-e6124e53a4a2
> Status: Started
> Number of Bricks: 4
> Transport-type: tcp,rdma
> Bricks:
> Brick1: fs1:/data/brick
> Brick2: fs2:/data/brick
> Brick3: fs3:/data/brick
> Brick4: fs4:/data/brick
> Options Reconfigured:
> features.quota-deem-statfs: on
> features.inode-quota: on
> features.quota: on
> performance.readdir-ahead: on
> [root@fs4 bricks]# gluster volume status
> Status of volume: cees-data
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick fs1:/data/brick                       49152     49153      Y       1878
> Brick fs2:/data/brick                       49152     0          Y       1707
> Brick fs3:/data/brick                       49152     0          Y       4696
> Brick fs4:/data/brick                       N/A       N/A        N       N/A
> NFS Server on localhost                     2049      0          Y       13808
> Quota Daemon on localhost                   N/A       N/A        Y       13813
> NFS Server on fs1                           2049      0          Y       6722
> Quota Daemon on fs1                         N/A       N/A        Y       6730
> NFS Server on fs3                           2049      0          Y       12553
> Quota Daemon on fs3                         N/A       N/A        Y       12561
> NFS Server on fs2                           2049      0          Y       11702
> Quota Daemon on fs2                         N/A       N/A        Y       11710
> Task Status of Volume cees-data
> ------------------------------------------------------------------------------
> There are no active volume tasks
> [root@fs4 bricks]# ps auxww | grep gluster
> root 13791 0.0 0.0 701472 19768 ? Ssl 09:06 0:00
> /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
> root 13808 0.0 0.0 560236 41420 ? Ssl 09:07 0:00
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
> /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
> /var/run/gluster/01c61523374369658a62b75c582b5ac2.socket
> root 13813 0.0 0.0 443164 17908 ? Ssl 09:07 0:00
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p
> /var/lib/glusterd/quotad/run/quotad.pid -l
> /var/log/glusterfs/quotad.log -S
> /var/run/gluster/3753def90f5c34f656513dba6a544f7d.socket
> --xlator-option *replicate*.data-self-heal=off --xlator-option
> *replicate*.metadata-self-heal=off --xlator-option
> *replicate*.entry-self-heal=off
> root 13874 0.0 0.0 1200472 31700 ? Ssl 09:16 0:00
> /usr/sbin/glusterfsd -s fs4 --volfile-id cees-data.fs4.data-brick -p
> /var/lib/glusterd/vols/cees-data/run/fs4-data-brick.pid -S
> /var/run/gluster/5203ab38be21e1d37c04f6bdfee77d4a.socket --brick-name
> /data/brick -l /var/log/glusterfs/bricks/data-brick.log
> --xlator-option
> *-posix.glusterd-uuid=f04b231e-63f8-4374-91ae-17c0c623f165
> --brick-port 49152 49153 --xlator-option
> cees-data-server.transport.rdma.listen-port=49153 --xlator-option
> cees-data-server.listen-port=49152 --volfile-server-transport=socket,rdma
> root 13941 0.0 0.0 112648 976 pts/0 S+ 09:50 0:00 grep
> --color=auto gluster
>
> [root@fs4 bricks]# systemctl restart glusterfsd glusterd
>
> [root@fs4 bricks]# ps auxww | grep gluster
> root 13808 0.0 0.0 560236 41420 ? Ssl 09:07 0:00
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
> /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
> /var/run/gluster/01c61523374369658a62b75c582b5ac2.socket
> root 13813 0.0 0.0 443164 17908 ? Ssl 09:07 0:00
> /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p
> /var/lib/glusterd/quotad/run/quotad.pid -l
> /var/log/glusterfs/quotad.log -S
> /var/run/gluster/3753def90f5c34f656513dba6a544f7d.socket
> --xlator-option *replicate*.data-self-heal=off --xlator-option
> *replicate*.metadata-self-heal=off --xlator-option
> *replicate*.entry-self-heal=off
> root 13953 0.1 0.0 570740 14988 ? Ssl 09:51 0:00
> /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
> root 13965 0.0 0.0 112648 976 pts/0 S+ 09:51 0:00 grep
> --color=auto gluster
>
> [root@fs4 bricks]# gluster volume info
> Volume Name: cees-data
> Type: Distribute
> Volume ID: 27d2a59c-bdac-4f66-bcd8-e6124e53a4a2
> Status: Started
> Number of Bricks: 3
> Transport-type: tcp,rdma
> Bricks:
> Brick1: fs1:/data/brick
> Brick2: fs2:/data/brick
> Brick3: fs3:/data/brick
> Options Reconfigured:
> performance.readdir-ahead: on
> features.quota: on
> features.inode-quota: on
> features.quota-deem-statfs: on
I'm not sure what's going on here. Restarting glusterd seems to change
the output of gluster volume info: the volume went from four bricks back
to three. I also see you are using RDMA; it's not clear why the RDMA
ports for fs2 and fs3 are not shown in the volume status output. CC'ing
some glusterd/RDMA devs for pointers.
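One way to confirm that kind of disagreement is to compare the volume definition each glusterd holds on disk. This is only a sketch: the `/var/lib/glusterd/vols/<VOLNAME>/info` path is the stock glusterd layout for 3.7.x, the `compare_vol_info` helper name is ours, and the `brick-N=` line format is illustrative.

```shell
# Hedged sketch: when peers disagree about a volume's brick list, their
# on-disk volume definitions usually differ. Each glusterd keeps its copy
# in /var/lib/glusterd/vols/<VOLNAME>/info; diffing two nodes' copies
# (fetched locally first, e.g. via ssh) shows the drift.
compare_vol_info() {
    # Args: two local copies of the "info" file.
    local a="$1" b="$2"
    if diff -u "$a" "$b" > /dev/null; then
        echo "in-sync"
    else
        echo "out-of-sync"
        # Show only the brick-list lines that differ.
        diff -u "$a" "$b" | grep '^[+-]brick' || true
    fi
}

# Possible usage (hypothetical hosts, matching this thread):
#   for h in fs1 fs2 fs3 fs4; do
#       ssh "$h" cat /var/lib/glusterd/vols/cees-data/info > "/tmp/info.$h"
#   done
#   compare_vol_info /tmp/info.fs1 /tmp/info.fs4
```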
-Ravi
> [root@fs4 bricks]# gluster volume status
> Status of volume: cees-data
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick fs1:/data/brick                       49152     49153      Y       1878
> Brick fs2:/data/brick                       49152     0          Y       1707
> Brick fs3:/data/brick                       49152     0          Y       4696
> NFS Server on localhost                     2049      0          Y       13968
> Quota Daemon on localhost                   N/A       N/A        Y       13976
> NFS Server on fs2                           2049      0          Y       11702
> Quota Daemon on fs2                         N/A       N/A        Y       11710
> NFS Server on fs3                           2049      0          Y       12553
> Quota Daemon on fs3                         N/A       N/A        Y       12561
> NFS Server on fs1                           2049      0          Y       6722
> Task Status of Volume cees-data
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> [root@fs4 bricks]# gluster peer status
> Number of Peers: 3
>
> Hostname: fs1
> Uuid: ddc0a23e-05e5-48f7-993e-a37e43b21605
> State: Peer in Cluster (Connected)
>
> Hostname: fs2
> Uuid: e37108f8-d2f1-4f28-adc8-0b3d3401df29
> State: Peer in Cluster (Connected)
>
> Hostname: fs3
> Uuid: 19a42201-c932-44db-b1a7-8b5b1af32a36
> State: Peer in Cluster (Connected)
>
> Dennis
>
>
> On Tue, Sep 27, 2016 at 9:40 AM, Ravishankar N
> <ravishankar@redhat.com> wrote:
>
> On 09/27/2016 09:53 PM, Dennis Michael wrote:
>> Yes, you are right. I mixed up the logs. I just ran the
>> add-brick command again after cleaning up fs4 and re-installing
>> gluster. This is the complete fs4 data-brick.log.
>>
>> [root@fs1 ~]# gluster volume add-brick cees-data fs4:/data/brick
>> volume add-brick: failed: Commit failed on fs4. Please check log
>> file for details.
>>
>> [root@fs4 bricks]# pwd
>> /var/log/glusterfs/bricks
>> [root@fs4 bricks]# cat data-brick.log
>> [2016-09-27 16:16:28.095661] I [MSGID: 100030]
>> [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfsd: Started running
>> /usr/sbin/glusterfsd version 3.7.14 (args: /usr/sbin/glusterfsd
>> -s fs4 --volfile-id cees-data.fs4.data-brick -p
>> /var/lib/glusterd/vols/cees-data/run/fs4-data-brick.pid -S
>> /var/run/gluster/5203ab38be21e1d37c04f6bdfee77d4a.socket
>> --brick-name /data/brick -l
>> /var/log/glusterfs/bricks/data-brick.log --xlator-option
>> *-posix.glusterd-uuid=f04b231e-63f8-4374-91ae-17c0c623f165
>> --brick-port 49152 --xlator-option
>> cees-data-server.transport.rdma.listen-port=49153 --xlator-option
>> cees-data-server.listen-port=49152
>> --volfile-server-transport=socket,rdma)
>> [2016-09-27 16:16:28.101547] I [MSGID: 101190]
>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started
>> thread with index 1
>> [2016-09-27 16:16:28.104637] I
>> [graph.c:269:gf_add_cmdline_options] 0-cees-data-server: adding
>> option 'listen-port' for volume 'cees-data-server' with value '49152'
>> [2016-09-27 16:16:28.104646] I
>> [graph.c:269:gf_add_cmdline_options] 0-cees-data-server: adding
>> option 'transport.rdma.listen-port' for volume 'cees-data-server'
>> with value '49153'
>> [2016-09-27 16:16:28.104662] I
>> [graph.c:269:gf_add_cmdline_options] 0-cees-data-posix: adding
>> option 'glusterd-uuid' for volume 'cees-data-posix' with value
>> 'f04b231e-63f8-4374-91ae-17c0c623f165'
>> [2016-09-27 16:16:28.104808] I [MSGID: 115034]
>> [server.c:403:_check_for_auth_option] 0-/data/brick: skip format
>> check for non-addr auth option auth.login./data/brick.allow
>> [2016-09-27 16:16:28.104814] I [MSGID: 115034]
>> [server.c:403:_check_for_auth_option] 0-/data/brick: skip format
>> check for non-addr auth option
>> auth.login.18ddaf4c-ad98-4155-9372-717eae718b4c.password
>> [2016-09-27 16:16:28.104883] I [MSGID: 101190]
>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started
>> thread with index 2
>> [2016-09-27 16:16:28.105479] I
>> [rpcsvc.c:2196:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service:
>> Configured rpc.outstanding-rpc-limit with value 64
>> [2016-09-27 16:16:28.105532] W [MSGID: 101002]
>> [options.c:957:xl_opt_validate] 0-cees-data-server: option
>> 'listen-port' is deprecated, preferred is
>> 'transport.socket.listen-port', continuing with correction
>> [2016-09-27 16:16:28.109456] W [socket.c:3665:reconfigure]
>> 0-cees-data-quota: NBIO on -1 failed (Bad file descriptor)
>> [2016-09-27 16:16:28.489255] I [MSGID: 121050]
>> [ctr-helper.c:259:extract_ctr_options] 0-gfdbdatastore: CTR
>> Xlator is disabled.
>> [2016-09-27 16:16:28.489272] W [MSGID: 101105]
>> [gfdb_sqlite3.h:239:gfdb_set_sql_params]
>> 0-cees-data-changetimerecorder: Failed to retrieve
>> sql-db-pagesize from params.Assigning default value: 4096
>> [2016-09-27 16:16:28.489278] W [MSGID: 101105]
>> [gfdb_sqlite3.h:239:gfdb_set_sql_params]
>> 0-cees-data-changetimerecorder: Failed to retrieve
>> sql-db-journalmode from params.Assigning default value: wal
>> [2016-09-27 16:16:28.489284] W [MSGID: 101105]
>> [gfdb_sqlite3.h:239:gfdb_set_sql_params]
>> 0-cees-data-changetimerecorder: Failed to retrieve sql-db-sync
>> from params.Assigning default value: off
>> [2016-09-27 16:16:28.489288] W [MSGID: 101105]
>> [gfdb_sqlite3.h:239:gfdb_set_sql_params]
>> 0-cees-data-changetimerecorder: Failed to retrieve
>> sql-db-autovacuum from params.Assigning default value: none
>> [2016-09-27 16:16:28.490431] I [trash.c:2412:init]
>> 0-cees-data-trash: no option specified for 'eliminate', using NULL
>> [2016-09-27 16:16:28.672814] W
>> [graph.c:357:_log_if_unknown_option] 0-cees-data-server: option
>> 'rpc-auth.auth-glusterfs' is not recognized
>> [2016-09-27 16:16:28.672854] W
>> [graph.c:357:_log_if_unknown_option] 0-cees-data-server: option
>> 'rpc-auth.auth-unix' is not recognized
>> [2016-09-27 16:16:28.672872] W
>> [graph.c:357:_log_if_unknown_option] 0-cees-data-server: option
>> 'rpc-auth.auth-null' is not recognized
>> [2016-09-27 16:16:28.672924] W
>> [graph.c:357:_log_if_unknown_option] 0-cees-data-quota: option
>> 'timeout' is not recognized
>> [2016-09-27 16:16:28.672955] W
>> [graph.c:357:_log_if_unknown_option] 0-cees-data-trash: option
>> 'brick-path' is not recognized
>> Final graph:
>> +------------------------------------------------------------------------------+
>> 1: volume cees-data-posix
>> 2: type storage/posix
>> 3: option glusterd-uuid f04b231e-63f8-4374-91ae-17c0c623f165
>> 4: option directory /data/brick
>> 5: option volume-id 27d2a59c-bdac-4f66-bcd8-e6124e53a4a2
>> 6: option update-link-count-parent on
>> 7: end-volume
>> 8:
>> 9: volume cees-data-trash
>> 10: type features/trash
>> 11: option trash-dir .trashcan
>> 12: option brick-path /data/brick
>> 13: option trash-internal-op off
>> 14: subvolumes cees-data-posix
>> 15: end-volume
>> 16:
>> 17: volume cees-data-changetimerecorder
>> 18: type features/changetimerecorder
>> 19: option db-type sqlite3
>> 20: option hot-brick off
>> 21: option db-name brick.db
>> 22: option db-path /data/brick/.glusterfs/
>> 23: option record-exit off
>> 24: option ctr_link_consistency off
>> 25: option ctr_lookupheal_link_timeout 300
>> 26: option ctr_lookupheal_inode_timeout 300
>> 27: option record-entry on
>> 28: option ctr-enabled off
>> 29: option record-counters off
>> 30: option ctr-record-metadata-heat off
>> 31: option sql-db-cachesize 1000
>> 32: option sql-db-wal-autocheckpoint 1000
>> 33: subvolumes cees-data-trash
>> 34: end-volume
>> 35:
>> 36: volume cees-data-changelog
>> 37: type features/changelog
>> 38: option changelog-brick /data/brick
>> 39: option changelog-dir /data/brick/.glusterfs/changelogs
>> 40: option changelog-barrier-timeout 120
>> 41: subvolumes cees-data-changetimerecorder
>> 42: end-volume
>> 43:
>> 44: volume cees-data-bitrot-stub
>> 45: type features/bitrot-stub
>> 46: option export /data/brick
>> 47: subvolumes cees-data-changelog
>> 48: end-volume
>> 49:
>> 50: volume cees-data-access-control
>> 51: type features/access-control
>> 52: subvolumes cees-data-bitrot-stub
>> 53: end-volume
>> 54:
>> 55: volume cees-data-locks
>> 56: type features/locks
>> 57: subvolumes cees-data-access-control
>> 58: end-volume
>> 59:
>> 60: volume cees-data-upcall
>> 61: type features/upcall
>> 62: option cache-invalidation off
>> 63: subvolumes cees-data-locks
>> 64: end-volume
>> 65:
>> 66: volume cees-data-io-threads
>> 67: type performance/io-threads
>> 68: subvolumes cees-data-upcall
>> 69: end-volume
>> 70:
>> 71: volume cees-data-marker
>> 72: type features/marker
>> 73: option volume-uuid 27d2a59c-bdac-4f66-bcd8-e6124e53a4a2
>> 74: option timestamp-file
>> /var/lib/glusterd/vols/cees-data/marker.tstamp
>> 75: option quota-version 1
>> 76: option xtime off
>> 77: option gsync-force-xtime off
>> 78: option quota on
>> 79: option inode-quota on
>> 80: subvolumes cees-data-io-threads
>> 81: end-volume
>> 82:
>> 83: volume cees-data-barrier
>> 84: type features/barrier
>> 85: option barrier disable
>> 86: option barrier-timeout 120
>> 87: subvolumes cees-data-marker
>> 88: end-volume
>> 89:
>> 90: volume cees-data-index
>> 91: type features/index
>> 92: option index-base /data/brick/.glusterfs/indices
>> 93: subvolumes cees-data-barrier
>> 94: end-volume
>> 95:
>> 96: volume cees-data-quota
>> 97: type features/quota
>> 98: option transport.socket.connect-path
>> /var/run/gluster/quotad.socket
>> 99: option transport-type socket
>> 100: option transport.address-family unix
>> 101: option volume-uuid cees-data
>> 102: option server-quota on
>> 103: option timeout 0
>> 104: option deem-statfs on
>> 105: subvolumes cees-data-index
>> 106: end-volume
>> 107:
>> 108: volume cees-data-worm
>> 109: type features/worm
>> 110: option worm off
>> 111: subvolumes cees-data-quota
>> 112: end-volume
>> 113:
>> 114: volume cees-data-read-only
>> 115: type features/read-only
>> 116: option read-only off
>> 117: subvolumes cees-data-worm
>> 118: end-volume
>> 119:
>> 120: volume /data/brick
>> 121: type debug/io-stats
>> 122: option log-level INFO
>> 123: option latency-measurement off
>> 124: option count-fop-hits off
>> 125: subvolumes cees-data-read-only
>> 126: end-volume
>> 127:
>> 128: volume cees-data-server
>> 129: type protocol/server
>> 130: option transport.socket.listen-port 49152
>> 131: option rpc-auth.auth-glusterfs on
>> 132: option rpc-auth.auth-unix on
>> 133: option rpc-auth.auth-null on
>> 134: option rpc-auth-allow-insecure on
>> 135: option transport.rdma.listen-port 49153
>> 136: option transport-type tcp,rdma
>> 137: option auth.login./data/brick.allow
>> 18ddaf4c-ad98-4155-9372-717eae718b4c
>> 138: option
>> auth.login.18ddaf4c-ad98-4155-9372-717eae718b4c.password
>> 9e913e92-7de0-47f9-94ed-d08cbb130d23
>> 139: option auth.addr./data/brick.allow *
>> 140: subvolumes /data/brick
>> 141: end-volume
>> 142:
>> +------------------------------------------------------------------------------+
>> [2016-09-27 16:16:30.079541] I [login.c:81:gf_auth] 0-auth/login:
>> allowed user names: 18ddaf4c-ad98-4155-9372-717eae718b4c
>> [2016-09-27 16:16:30.079567] I [MSGID: 115029]
>> [server-handshake.c:690:server_setvolume] 0-cees-data-server:
>> accepted client from
>> fs3-12560-2016/09/27-16:16:30:47674-cees-data-client-3-0-0
>> (version: 3.7.14)
>> [2016-09-27 16:16:30.081487] I [login.c:81:gf_auth] 0-auth/login:
>> allowed user names: 18ddaf4c-ad98-4155-9372-717eae718b4c
>> [2016-09-27 16:16:30.081505] I [MSGID: 115029]
>> [server-handshake.c:690:server_setvolume] 0-cees-data-server:
>> accepted client from
>> fs2-11709-2016/09/27-16:16:30:50047-cees-data-client-3-0-0
>> (version: 3.7.14)
>> [2016-09-27 16:16:30.111091] I [login.c:81:gf_auth] 0-auth/login:
>> allowed user names: 18ddaf4c-ad98-4155-9372-717eae718b4c
>> [2016-09-27 16:16:30.111113] I [MSGID: 115029]
>> [server-handshake.c:690:server_setvolume] 0-cees-data-server:
>> accepted client from
>> fs2-11701-2016/09/27-16:16:29:24060-cees-data-client-3-0-0
>> (version: 3.7.14)
>> [2016-09-27 16:16:30.112822] I [login.c:81:gf_auth] 0-auth/login:
>> allowed user names: 18ddaf4c-ad98-4155-9372-717eae718b4c
>> [2016-09-27 16:16:30.112836] I [MSGID: 115029]
>> [server-handshake.c:690:server_setvolume] 0-cees-data-server:
>> accepted client from
>> fs3-12552-2016/09/27-16:16:29:23041-cees-data-client-3-0-0
>> (version: 3.7.14)
>> [2016-09-27 16:16:31.950978] I [login.c:81:gf_auth] 0-auth/login:
>> allowed user names: 18ddaf4c-ad98-4155-9372-717eae718b4c
>> [2016-09-27 16:16:31.950998] I [MSGID: 115029]
>> [server-handshake.c:690:server_setvolume] 0-cees-data-server:
>> accepted client from
>> fs1-6721-2016/09/27-16:16:26:939991-cees-data-client-3-0-0
>> (version: 3.7.14)
>> [2016-09-27 16:16:31.981977] I [login.c:81:gf_auth] 0-auth/login:
>> allowed user names: 18ddaf4c-ad98-4155-9372-717eae718b4c
>> [2016-09-27 16:16:31.981994] I [MSGID: 115029]
>> [server-handshake.c:690:server_setvolume] 0-cees-data-server:
>> accepted client from
>> fs1-6729-2016/09/27-16:16:27:971228-cees-data-client-3-0-0
>> (version: 3.7.14)
>>
>
> Hmm, this shows the brick has started.
> Does gluster volume info on fs4 show all 4 bricks? (I guess it
> does, based on your first email.)
> Does gluster volume status on fs4 (or ps aux | grep glusterfsd)
> show the brick as running?
> Does gluster peer status on all nodes list the other 3 nodes as
> connected?
>
> If yes, you could try `service glusterd restart` on fs4 and see if
> it brings up the brick. I'm just shooting in the dark here for
> possible clues.
> -Ravi
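The restart-and-recheck step in that checklist can be scripted. A minimal sketch, with the caveats that `brick_online` and `restart_and_wait` are names of ours, and the awk field positions match the 3.7.x `gluster volume status` table shown earlier in this thread (they may shift on other releases):

```shell
# Hedged sketch: parse a "gluster volume status" table from stdin and
# print the Online flag (Y/N) for one brick. In the 3.7.x layout each
# brick row is "Brick <host:path> <tcp> <rdma> <online> <pid>", so the
# Online flag is the second-to-last field.
brick_online() {
    local brick="$1"   # e.g. "fs4:/data/brick"
    awk -v b="$brick" '$1 == "Brick" && $2 == b { print $(NF-1) }'
}

# Restart glusterd, then poll until the brick reports Online=Y (or give up).
restart_and_wait() {
    local vol="$1" brick="$2"
    systemctl restart glusterd
    for _ in 1 2 3 4 5; do
        [ "$(gluster volume status "$vol" | brick_online "$brick")" = "Y" ] && return 0
        sleep 2
    done
    return 1
}

# Possible usage:  restart_and_wait cees-data fs4:/data/brick
```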
>
>> On Tue, Sep 27, 2016 at 8:46 AM, Ravishankar N
>> <ravishankar@redhat.com> wrote:
>>
>> On 09/27/2016 09:06 PM, Dennis Michael wrote:
>>> Yes, the brick log /var/log/glusterfs/bricks/data-brick.log
>>> is created on fs4, and the snippets showing the errors were
>>> from that log.
>>>
>> Unless I'm missing something, the snippet below is from
>> glusterd's log and not the brick's, as is evident from the
>> function names.
>> -Ravi
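The distinction matters because glusterd's log and the per-brick logs live in different files but share one line format: `[timestamp] <severity> [MSGID ...] [file:line:function] ...`. That makes it easy to pull just the error-severity lines out of whichever file you are looking at. A small sketch (`log_errors` is our name; the field layout matches the 3.7.x log lines quoted in this thread):

```shell
# Hedged sketch: read gluster log lines on stdin and print only the
# error-severity ones. The timestamp spans the first two whitespace-
# separated fields ("[2016-09-26" and "22:44:39.254921]"), so the
# severity letter (I/W/E/...) is field 3.
log_errors() {
    awk '$3 == "E"'
}

# Possible usage:
#   log_errors < /var/log/glusterfs/bricks/data-brick.log
#   log_errors < /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
```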
>>> Dennis
>>>
>>> On Mon, Sep 26, 2016 at 5:58 PM, Ravishankar N
>>> <ravishankar@redhat.com> wrote:
>>>
>>> On 09/27/2016 05:25 AM, Dennis Michael wrote:
>>>
>>> [2016-09-26 22:44:39.254921] E [MSGID: 106005]
>>> [glusterd-utils.c:4771:glusterd_brick_start]
>>> 0-management: Unable to start brick fs4:/data/brick
>>> [2016-09-26 22:44:39.254949] E [MSGID: 106074]
>>> [glusterd-brick-ops.c:2372:glusterd_op_add_brick]
>>> 0-glusterd: Unable to add bricks
>>>
>>>
>>> Is the brick log created on fs4? Does it contain
>>> warnings/errors?
>>>
>>> -Ravi
>>>
>>>
>>
>>
>
>