[Gluster-users] Re: Strange behaviour with add-brick followed by remove-brick
Lalatendu Mohanty
lmohanty at redhat.com
Tue Nov 12 15:59:27 UTC 2013
On 11/06/2013 10:53 AM, B.K.Raghuram wrote:
> Here are the steps I used to reproduce the problem. Essentially,
> if you try to remove a brick that is not on the localhost (the node
> the command is run from), it seems to migrate the files on the
> localhost brick instead, and hence there is a lot of data loss. If,
> instead, I try to remove the localhost brick, it works fine. Can we
> try to get a fix for this into 3.4.2, as this seems to be the only
> way to replace a brick, given that replace-brick is being removed!
>
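For reference, the brick-replacement flow being exercised in the steps below
boils down to four commands (hostnames and paths here are just placeholders):

    # add the replacement brick to the volume
    gluster volume add-brick VOLNAME newhost:/data/brick
    # start draining the brick that is being retired
    gluster volume remove-brick VOLNAME oldhost:/data/brick start
    # check progress until the migration reports completed
    gluster volume remove-brick VOLNAME oldhost:/data/brick status
    # then drop the drained brick from the volume
    gluster volume remove-brick VOLNAME oldhost:/data/brick commit

The report below is exactly this sequence; the problem is that the drain step
migrates data from the wrong brick.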
> [root at s5n9 ~]# gluster volume create v1 transport tcp
> s5n9.testing.lan:/data/v1 s5n10.testing.lan:/data/v1
> volume create: v1: success: please start the volume to access data
> [root at s5n9 ~]# gluster volume start v1
> volume start: v1: success
> [root at s5n9 ~]# gluster volume info v1
>
> Volume Name: v1
> Type: Distribute
> Volume ID: 6402b139-2957-4d62-810b-b70e6f9ba922
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: s5n9.testing.lan:/data/v1
> Brick2: s5n10.testing.lan:/data/v1
>
> ***********Now I NFS-mounted the volume on my laptop and used a script
> to create 300 files in the mount. Distribution results below **********
> [root at s5n9 ~]# ls -l /data/v1 | wc -l
> 160
> [root at s5n10 ~]# ls -l /data/v1 | wc -l
> 142
>
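The script that created the 300 files was not included; a minimal equivalent
would be something along these lines, assuming the volume is NFS-mounted at
/mnt/v1 on the client (the mount point is a guess):

    #!/bin/bash
    # populate the NFS-mounted volume with 300 small files so that DHT
    # distributes them across both bricks (/mnt/v1 is a placeholder path)
    for i in $(seq 1 300); do
        echo "test data $i" > /mnt/v1/file$i
    done

The two counts above (159 + 141 files, plus the "total" line that ls -l prints
on each brick) account for all 300.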
> [root at s5n9 ~]# gluster volume add-brick v1 s6n11.testing.lan:/data/v1
> volume add-brick: success
> [root at s5n9 ~]# gluster volume remove-brick v1 s5n10.testing.lan:/data/v1 start
> volume remove-brick start: success
> ID: 8f3c37d6-2f24-4418-b75a-751dcb6f2b98
> [root at s5n9 ~]# gluster volume remove-brick v1 s5n10.testing.lan:/data/v1 status
> Node               Rebalanced-files    size  scanned  failures  skipped  status       run time in secs
> -----------------  ----------------  ------  -------  --------  -------  -----------  ----------------
> localhost                         0  0Bytes        0         0           not started              0.00
> s6n12.testing.lan                 0  0Bytes        0         0           not started              0.00
> s6n11.testing.lan                 0  0Bytes        0         0           not started              0.00
> s5n10.testing.lan                 0  0Bytes      300         0           completed                1.00
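Note that the status already claims completed after one second, with 300 files
scanned and 0 rebalanced, which is the first hint that nothing was actually
migrated off s5n10. The usual practice of waiting for "completed" before
committing, e.g. with a small polling loop like this rough sketch, therefore
does not protect against this bug:

    # block until the remove-brick migration on the retiring brick
    # reports "completed" before running the commit
    while ! gluster volume remove-brick v1 s5n10.testing.lan:/data/v1 status \
            | grep -q completed; do
        sleep 5
    done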
>
>
> [root at s5n9 ~]# gluster volume remove-brick v1 s5n10.testing.lan:/data/v1 commit
> Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
> volume remove-brick commit: success
>
> [root at s5n9 ~]# gluster volume info v1
>
> Volume Name: v1
> Type: Distribute
> Volume ID: 6402b139-2957-4d62-810b-b70e6f9ba922
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: s5n9.testing.lan:/data/v1
> Brick2: s6n11.testing.lan:/data/v1
>
>
> [root at s5n9 ~]# ls -l /data/v1 | wc -l
> 160
> [root at s5n10 ~]# ls -l /data/v1 | wc -l
> 142
> [root at s6n11 ~]# ls -l /data/v1 | wc -l
> 160
> [root at s5n9 ~]# ls /data/v1
> file10 file110 file131 file144 file156 file173 file19 file206
> file224 file238 file250 file264 file279 file291 file31 file44
> file62 file86
> file100 file114 file132 file146 file159 file174 file192 file209
> file225 file24 file252 file265 file28 file292 file32 file46
> file63 file87
> file101 file116 file134 file147 file16 file18 file196 file210
> file228 file240 file254 file266 file281 file293 file37 file47
> file66 file9
> file102 file12 file135 file148 file161 file181 file198 file212
> file229 file241 file255 file267 file284 file294 file38 file48
> file69 file91
> file103 file121 file136 file149 file165 file183 file200 file215
> file231 file243 file256 file268 file285 file295 file4 file50
> file7 file93
> file104 file122 file137 file150 file17 file184 file201 file216
> file233 file245 file258 file271 file286 file296 file40 file53
> file71 file97
> file105 file124 file138 file152 file170 file186 file202 file218
> file234 file246 file261 file273 file287 file297 file41 file54
> file73
> file107 file125 file140 file153 file171 file188 file203 file220
> file236 file248 file262 file275 file288 file298 file42 file55
> file75
> file11 file13 file141 file154 file172 file189 file204 file222
> file237 file25 file263 file278 file290 file3 file43 file58
> file80
>
> [root at s6n11 ~]# ls /data/v1
> file10 file110 file131 file144 file156 file173 file19 file206
> file224 file238 file250 file264 file279 file291 file31 file44
> file62 file86
> file100 file114 file132 file146 file159 file174 file192 file209
> file225 file24 file252 file265 file28 file292 file32 file46
> file63 file87
> file101 file116 file134 file147 file16 file18 file196 file210
> file228 file240 file254 file266 file281 file293 file37 file47
> file66 file9
> file102 file12 file135 file148 file161 file181 file198 file212
> file229 file241 file255 file267 file284 file294 file38 file48
> file69 file91
> file103 file121 file136 file149 file165 file183 file200 file215
> file231 file243 file256 file268 file285 file295 file4 file50
> file7 file93
> file104 file122 file137 file150 file17 file184 file201 file216
> file233 file245 file258 file271 file286 file296 file40 file53
> file71 file97
> file105 file124 file138 file152 file170 file186 file202 file218
> file234 file246 file261 file273 file287 file297 file41 file54
> file73
> file107 file125 file140 file153 file171 file188 file203 file220
> file236 file248 file262 file275 file288 file298 file42 file55
> file75
> file11 file13 file141 file154 file172 file189 file204 file222
> file237 file25 file263 file278 file290 file3 file43 file58
> file80
>
>
> ******* An ls of the mountpoint after this whole process only shows
> 159 files - the ones that are on s5n9. So everything that was on s5n10
> is gone!! ****
This matches the description in bug
https://bugzilla.redhat.com/show_bug.cgi?id=1024369.
The bug comments also confirm that the issue is not present in upstream
master, but we need to back-port the fix/fixes to the 3.4 branch.
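Until the backport lands, a crude guard is to look at the retiring brick
directly before running the commit, the same way the listings above were
gathered: after a successful drain the brick being removed should be
essentially empty, so a count like the 142 lines still on s5n10 above means
its data was never migrated and will be lost at commit. A rough sketch, with
host and path as in the report:

    # pre-commit sanity check on the brick that is about to be removed
    ssh s5n10.testing.lan 'ls -l /data/v1 | wc -l'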
-Lala