[Gluster-users] VMs blocked for more than 120 seconds
Andrey Volodin
andrevolodin at gmail.com
Mon May 13 07:37:15 UTC 2019
What is the context from dmesg?
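For example, something along these lines (a rough sketch, adjust as needed) should pull out the
relevant kernel trace:

    dmesg -T | grep -B 2 -A 25 "blocked for more than 120 seconds"

The call trace printed after the warning shows what the blocked task was actually waiting on.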
On Mon, May 13, 2019 at 7:33 AM Andrey Volodin <andrevolodin at gmail.com>
wrote:
> as per
> https://helpful.knobs-dials.com/index.php/INFO:_task_blocked_for_more_than_120_seconds. ,
> the informational warning can be suppressed with:
>
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>
> Moreover, as per their website: "*This message is not an error*.
> It is an indication that a program has had to wait for a very long time,
> and what it was doing. "
> More reference:
> https://serverfault.com/questions/405210/can-high-load-cause-server-hang-and-error-blocked-for-more-than-120-seconds
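>
> Roughly, something like this should do it (an untested sketch based on the page above; setting the
> value to 0 disables the check entirely, and the file name under /etc/sysctl.d is just an example):
>
>     cat /proc/sys/kernel/hung_task_timeout_secs      # default is 120
>     echo 0 > /proc/sys/kernel/hung_task_timeout_secs
>     echo "kernel.hung_task_timeout_secs = 0" >> /etc/sysctl.d/99-hung-task.conf   # persist across reboots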
>
> Regards,
> Andrei
>
> On Mon, May 13, 2019 at 7:32 AM Martin Toth <snowmailer at gmail.com> wrote:
>
>> The cache mode in qemu is none. That should be correct. This is the full command:
>>
>> /usr/bin/qemu-system-x86_64 -name one-312 -S -machine
>> pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp
>> 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1
>> -no-user-config -nodefaults -chardev
>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
>> -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device
>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
>>
>> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4
>> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5
>> -drive file=/var/lib/one//datastores/116/312/*disk.0*
>> ,format=raw,if=none,id=drive-virtio-disk1,cache=none
>> -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1
>> -drive file=gluster://localhost:24007/imagestore/
>> *7b64d6757acc47a39503f68731f89b8e*
>> ,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none
>> -device
>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0
>> -drive file=/var/lib/one//datastores/116/312/*disk.1*
>> ,format=raw,if=none,id=drive-ide0-0-0,readonly=on
>> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0
>>
>> -netdev tap,fd=26,id=hostnet0
>> -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3
>> -chardev pty,id=charserial0 -device
>> isa-serial,chardev=charserial0,id=serial0
>> -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait
>> -device
>> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
>> -vnc 0.0.0.0:312,password -device
>> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device
>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
>>
>> I’ve highlighted the disks. The first is the VM context disk (accessed via FUSE), the second is
>> SDA, where the OS is installed (accessed via libgfapi), and the third is SWAP (accessed via FUSE).
>>
>> Krutika,
>> I will start profiling on the Gluster volumes and wait for the next VM to fail.
>> Then I will attach/send the profiling info once a VM has failed. I
>> suppose this is the correct profiling strategy.
>>
>> Thanks,
>> BR!
>> Martin
>>
>> On 13 May 2019, at 09:21, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>>
>> Also, what's the caching policy that qemu is using on the affected VMs?
>> Is it cache=none? Or something else? You can get this information from the
>> command line of the qemu-kvm process corresponding to your VM in the ps output.
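>>
>> For example, something like this (just a sketch; the grep patterns are illustrative) lists the
>> cache setting of every drive attached to the running qemu processes:
>>
>>     ps -ef | grep '[q]emu-system' | tr ' ' '\n' | grep -o 'cache=[a-z]*'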
>>
>> -Krutika
>>
>> On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay <kdhananj at redhat.com>
>> wrote:
>>
>>> What version of gluster are you using?
>>> Also, can you capture and share volume-profile output for a run where
>>> you manage to recreate this issue?
>>>
>>> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command
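>>>
>>> In short, per that doc (substitute your actual volume name for <VOLNAME>):
>>>
>>>     gluster volume profile <VOLNAME> start
>>>     # ... wait until a VM hangs again ...
>>>     gluster volume profile <VOLNAME> info > /tmp/profile-during-hang.txt
>>>     gluster volume profile <VOLNAME> stop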
>>> Let me know if you have any questions.
>>>
>>> -Krutika
>>>
>>> On Mon, May 13, 2019 at 12:34 PM Martin Toth <snowmailer at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> there is no healing operation, no peer disconnect, and no readonly
>>>> filesystem. Yes, the storage is slow and unavailable for 120 seconds, but why?
>>>> It’s SSD with 10G networking, and performance is otherwise good.
>>>>
>>>> > you'd have its log on qemu's standard output,
>>>>
>>>> If you mean /var/log/libvirt/qemu/vm.log, there is nothing there. I have been looking
>>>> into this problem for more than a month and have tried everything, but I can’t find anything. Any
>>>> more clues or leads?
>>>>
>>>> BR,
>>>> Martin
>>>>
>>>> > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote:
>>>> >
>>>> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote:
>>>> >> Hi all,
>>>> >
>>>> > Hi
>>>> >
>>>> >>
>>>> >> I am running replica 3 on SSDs with 10G networking. Everything works
>>>> OK, but VMs stored on the Gluster volume occasionally freeze with “Task XY
>>>> blocked for more than 120 seconds”.
>>>> >> The only solution is to power off the VM (hard) and then boot it up again. I
>>>> am unable to SSH in or log in via the console; it is probably stuck on some
>>>> disk operation. No error/warning logs or messages are stored in the VM’s logs.
>>>> >>
>>>> >
>>>> > As far as I know this should be unrelated; I get this during heals
>>>> > without any freezes. It just means the storage is slow, I think.
>>>> >
>>>> >> KVM/libvirt (qemu) uses libgfapi and a FUSE mount to access the VM disks
>>>> on the replica volume. Can someone advise how to debug this problem or what
>>>> could cause these issues?
>>>> >> It’s really annoying. I’ve tried to google everything, but nothing
>>>> came up. I’ve tried changing the virtio-scsi-pci disk driver to virtio-blk-pci,
>>>> but it’s not related.
>>>> >>
>>>> >
>>>> > Any chance your gluster goes readonly? Have you checked your gluster
>>>> > logs to see if maybe they lose each other sometimes?
>>>> > /var/log/glusterfs
>>>> >
>>>> > For libgfapi accesses you'd have its log on qemu's standard output;
>>>> > that might contain the actual error at the time of the freeze.
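>>>> >
>>>> > For example, a rough check like this (the patterns are just a guess at what to look for) might
>>>> > show whether the bricks dropped each other around the time of a freeze:
>>>> >
>>>> >     grep -iE 'disconnect|read-only|readonly' /var/log/glusterfs/*.log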