[Gluster-users] dude, where is my data ? AKA what is gluster doing with my file...?

Carlos Capriotti capriotti.carlos at gmail.com
Thu Mar 27 18:45:55 UTC 2014


Hello all.

I have a very curious puzzle here, and maybe you want to chip in with your
opinion.

My current setup, which I was hoping would be the last, is the following:

One node: a Dell 2950 with 8 GB of RAM and an external RAID (Ataboy2)
connected via a SCSI card, delivering about 70 MB/s (slow, I know).

One node: a Dell 2950 with internal RAID 5.

Both Dells have 4 (yep, four) bonded NICs using the useless mode 6. I
recently learned that REAL link aggregation is switch dependent, so if you
want SPEED, never mind playing around with software.

I have a REPLICATED volume using both servers with the respective bricks.

As part of the scenario I have an Isilon, which is my primary storage for a
few (10) fairly big VMware images.

One of the nodes mounts an NFS share from the Isilon and mounts the gluster
volume using the native glusterfs option.

I am in a unique situation where I can afford to suspend the VM servers for
a few hours for a backup, so I wrote a nice, simple bash script, run from
that node, that does exactly this:

pauses the VM
uses cp to copy the pertinent files from the Isilon to the gluster volume.
resumes the VM

and repeats that for all of the servers.

Simple and elegant, I like to think.
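
For anyone who wants to reproduce the flow, the pause/copy/resume loop above can be sketched roughly as follows. This is not the actual bash script, just a minimal Python illustration. It assumes one subdirectory per VM under the NFS mount, and the pause and resume callables are hypothetical placeholders, since the hypervisor commands are not shown here.

```python
import shutil
from pathlib import Path


def backup_vm(name, src_dir, dst_dir, pause, resume):
    """Pause one VM, copy its files, and always resume it afterwards."""
    pause(name)  # hypothetical hook: whatever suspends the VM
    try:
        target = Path(dst_dir) / name
        target.mkdir(parents=True, exist_ok=True)
        copied = []
        # Assumed layout: <src_dir>/<vm name>/<image files>
        for item in sorted((Path(src_dir) / name).iterdir()):
            if item.is_file():
                shutil.copy2(item, target / item.name)
                copied.append(item.name)
        return copied
    finally:
        resume(name)  # runs even if the copy fails


def backup_all(names, src_dir, dst_dir, pause, resume):
    """Repeat the pause/copy/resume cycle for every VM, one at a time."""
    return {n: backup_vm(n, src_dir, dst_dir, pause, resume) for n in names}
```

The try/finally is the one design choice worth keeping in any version of the script: a failed copy should never leave a VM suspended.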

BUT, here is the trouble:

For 20 SOLID minutes the system sits, reporting a steady connection to the
Isilon, receiving at 740 Mbps. Impressive, you would think, right? That
would be something like 110 GB of data.
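
A quick check of that volume estimate, assuming the 740 Mbps rate really holds for the full 20 minutes and counting in decimal units (1 GB = 10^9 bytes). The function name is just for illustration.

```python
def transferred_gb(rate_mbps, minutes):
    """Data volume in decimal GB for a steady rate given in megabits/s."""
    bits = rate_mbps * 1_000_000 * minutes * 60
    return bits / 8 / 1_000_000_000


print(round(transferred_gb(740, 20), 1))  # 111.0
```

So the inbound traffic adds up to roughly 111 GB over that window.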

Well, the problem is, NOTHING is written to the disks. NO disk at all. And,
with 8 GB of RAM, it is not going to memory either.

There is NOTHING in the gluster-related logs. Actually, the only entries
there are from two days ago, when I last rebooted the system.

There is nothing in the system logs either, and, regarding network
communication, there is no data flow to ANY other peer or point on the
servers.

After those 20 minutes of going nowhere fast, THEN the system decides to
start transmitting data to the other node, bandwidth usage falls to around
270-300 Mbps, and now FINALLY we have data recorded to both volumes.

While the issue is happening, gluster-related processes are doing nothing.
No processing reported by top.

Any idea about what is going on here?

I am not even sure this is gluster, but, well, I can't think of anything
else right now.

On the network, x.92 is the Isilon.
x.23 and x.24 are the nodes.

While that bizarre behavior was taking place, a quick tcpdump showed:


19:31:41.355065 IP 10.0.1.24.ssh > 10.0.1.23.50250: Flags [P.], seq 20534432:20534816, ack 7393, win 108, options [nop,nop,TS val 209602327 ecr 209546443], length 384
19:31:41.355123 IP 10.0.1.24.ssh > 10.0.1.23.50250: Flags [P.], seq 20534816:20535648, ack 7393, win 108, options [nop,nop,TS val 209602327 ecr 209546443], length 832
19:31:41.355164 IP 10.0.1.23.50250 > 10.0.1.24.ssh: Flags [.], ack 20534432, win 1122, options [nop,nop,TS val 209546443 ecr 209602327], length 0
19:31:41.355173 IP 10.0.1.24.ssh > 10.0.1.23.50250: Flags [P.], seq 20535648:20535856, ack 7393, win 108, options [nop,nop,TS val 209602327 ecr 209546443], length 208
19:31:41.355188 IP 10.0.1.92.nfs > 10.0.1.24.954: Flags [.], seq 602055725:602064673, ack 1211761, win 65535, options [nop,nop,TS val 200506019 ecr 209602324], length 8948
19:31:41.355208 IP 10.0.1.24.954 > 10.0.1.92.nfs: Flags [.], ack 602064673, win 6530, options [nop,nop,TS val 209602327 ecr 200506019], length 0
19:31:41.355213 IP 10.0.1.92.nfs > 10.0.1.24.954: Flags [.], seq 602064673:602073621, ack 1211761, win 65535, options [nop,nop,TS val 200506019 ecr 209602324], length 8948
19:31:41.355218 IP 10.0.1.24.ssh > 10.0.1.23.50250: Flags [P.], seq 20535856:20536064, ack 7393, win 108, options [nop,nop,TS val 209602327 ecr 209546443], length 208
19:31:41.355224 IP 10.0.1.92.nfs > 10.0.1.24.954: Flags [.], seq 602073621:602082569, ack 1211761, win 65535, options [nop,nop,TS val 200506019 ecr 209602324], length 8948
19:31:41.355235 IP 10.0.1.24.954 > 10.0.1.92.nfs: Flags [.], ack 602082569, win 6530, options [nop,nop,TS val 209602327 ecr 200506019], length 0
19:31:41.355239 IP 10.0.1.24.ssh > 10.0.1.23.50250: Flags [P.], seq 20536064:20536416, ack 7393, win 108, options [nop,nop,TS val 209602327 ecr 209546443], length 352
19:31:41.355266 IP 10.0.1.23.50250 > 10.0.1.24.ssh: Flags [.], ack 20535648, win 1122, options [nop,nop,TS val 209546443 ecr 209602327], length 0
19:31:41.355360 IP 10.0.1.23.50250 > 10.0.1.24.ssh: Flags [.], ack 20536416, win 1122, options [nop,nop,TS val 209546443 ecr 209602327], length 0
19:31:41.355429 IP 10.0.1.92.nfs > 10.0.1.24.954: Flags [.], seq 602082569:602091517, ack 1211761, win 65535, options [nop,nop,TS val 200506019 ecr 209602324], length 8948
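
To see at a glance which conversations carry the payload in a capture like this, the per-record lengths can be summed per source/destination pair. The sketch below is only an illustration: the regular expression is my own and is matched against the text format shown above, not against any official tcpdump grammar, and the embedded sample is three records copied from the excerpt.

```python
import re
from collections import Counter

# Matches "IP <src> > <dst>: ... length <n>" in one tcpdump text record.
RECORD = re.compile(r"IP (\S+) > (\S+): .*?length (\d+)")


def bytes_per_flow(capture_text):
    """Sum TCP payload lengths per (source, destination) pair."""
    # Mail clients wrap long lines, so collapse all whitespace first.
    flat = " ".join(capture_text.split())
    totals = Counter()
    for src, dst, length in RECORD.findall(flat):
        totals[(src, dst)] += int(length)
    return totals


SAMPLE = """
19:31:41.355065 IP 10.0.1.24.ssh > 10.0.1.23.50250: Flags [P.], seq
20534432:20534816, ack 7393, win 108, options [nop,nop,TS val 209602327 ecr
209546443], length 384
19:31:41.355164 IP 10.0.1.23.50250 > 10.0.1.24.ssh: Flags [.], ack
20534432, win 1122, options [nop,nop,TS val 209546443 ecr 209602327],
length 0
19:31:41.355188 IP 10.0.1.92.nfs > 10.0.1.24.954: Flags [.], seq
602055725:602064673, ack 1211761, win 65535, options [nop,nop,TS val
200506019 ecr 209602324], length 8948
"""

for flow, total in bytes_per_flow(SAMPLE).most_common():
    print(flow, total)
```

Run over the full excerpt, the NFS flow from x.92 to x.24 adds up to 35792 bytes against 1984 bytes on the ssh session between the nodes, with nothing else on the wire.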