How stable is stable/diablo?
I have a long-running (several months) cluster based on stable/diablo, Ubuntu 11.10 and KVM, configured similarly to trystack. It usually works fine (and I run devstack on the VMs), but several times another user has reported losing ssh connectivity to various VMs. When they try to 'nova reboot' them, the VMs get stuck in REBOOT state according to 'nova list', although some are actually still running. When I go to the compute nodes, I usually, but not always, find that 'virsh list' hangs. Sometimes restarting libvirt fixes the problem, and sometimes I have to reboot the compute nodes. After that, most VMs recover once they are rebooted.

I don't feel I could deploy this technology in production. What has the trystack experience been, and has anyone run diablo for a long period without these kinds of problems?
I don't know whether these issues are hypervisor-specific either. This has happened 3 or 4 times since the cluster started running.
Question information
- Language:
- English
- Status:
- Answered
- Assignee:
- No assignee
- Asked by:
- David Kranz