Fail to run instance on RHEL6.2

Asked by David Geng

I am trying to setup a production env, here is my topology:
controller node:
         mysqld
         qpidd
         Horizon (httpd)
         openstack-nova-api
         openstack-nova-objectstore
         openstack-nova-scheduler
         openstack-nova-cert
identity node:
         openstack-keystone
storage node:
         openstack-glance-registry
         openstack-glance-api
         openstack-nova-volume
2 compute nodes:
         openstack-nova-compute
  openstack-nova-network
         openstack-nova-metadata-api

Here is my nova.conf on the compute node:

[DEFAULT]
verbose = True
osapi_host=172.16.1.20
#rabbit_host=172.16.1.20
qpid_hostname=172.16.1.20
ec2_host=172.16.1.20
image_service=nova.image.glance.GlanceImageService
glance_api_servers=172.16.1.28:9292
logdir = /var/log/nova
state_path = /var/lib/nova
lock_path = /var/lib/nova/tmp
dhcpbridge = /usr/bin/nova-dhcpbridge
dhcpbridge_flagfile = /etc/nova/nova.conf
force_dhcp_release = True
injected_network_template = /usr/share/nova/interfaces.template
libvirt_xml_template = /usr/share/nova/libvirt.xml.template
libvirt_nonblocking = True
vpn_client_template = /usr/share/nova/client.ovpn.template
credentials_template = /usr/share/nova/novarc.template
network_manager = nova.network.manager.FlatDHCPManager
iscsi_helper = tgtadm
sql_connection = mysql://nova:nova@172.16.1.20/nova
iscsi_ip_prefix = 172.16.1.20
connection_type = libvirt
firewall_driver = nova.virt.libvirt.firewall.IptablesFirewallDriver
rpc_backend = nova.rpc.impl_qpid
root_helper = sudo nova-rootwrap
auth_strategy = keystone
flat_interface = eth0
flat_network_bridge = br100
public_interface = eth1
multi_host = True
fixed_range=10.0.0.0/8
# VOLUMES
volume_group = nova-volumes
volume_name_template = volume-%08x
iscsi_helper = tgtadm
osapi_volume_listen = 172.16.1.28
libvirt_inject_partition = -1

Here is the output from my glance server:
# glance index
ID Name Disk Format Container Format Size
------------------------------------ ------------------------------ -------------------- -------------------- --------------
3241935f-d10e-4cbc-ab18-91ab1b28fef0 f16-jeos qcow2 ovf 104258723

I got following error in the compute.log when I try to run instance with the command:
nova boot mytest-gjh --flavor 2 --key_name mykey --image 3241935f-d10e-4cbc-ab18-91ab1b28fef0

2012-09-05 17:30:55 DEBUG nova.utils [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] Running cmd (subprocess): mkdir -p /var/lib/nova/instances/instance-00000007/ from (pid=27944) execute /usr/lib/python2.6/site-packages/nova/utils.py:220
2012-09-05 17:30:55 INFO nova.virt.libvirt.connection [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] Creating image
2012-09-05 17:30:55 DEBUG nova.virt.libvirt.connection [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] block_device_list [] from (pid=27944) _volume_in_mapping /usr/lib/python2.6/site-packages/nova/virt/libvirt/connection.py:1418
2012-09-05 17:30:55 DEBUG nova.utils [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] Running cmd (subprocess): mkdir -p /var/lib/nova/instances/_base from (pid=27944) execute /usr/lib/python2.6/site-packages/nova/utils.py:220
2012-09-05 17:30:55 DEBUG nova.utils [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] Attempting to grab semaphore "478158b1252149ce64d2174ee496caf8bd5f73bf" for method "call_if_not_exists"... from (pid=27944) inner /usr/lib/python2.6/site-packages/nova/utils.py:928
2012-09-05 17:30:55 DEBUG nova.utils [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] Got semaphore "478158b1252149ce64d2174ee496caf8bd5f73bf" for method "call_if_not_exists"... from (pid=27944) inner /usr/lib/python2.6/site-packages/nova/utils.py:932
2012-09-05 17:30:56 DEBUG nova.utils [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] Running cmd (subprocess): env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/478158b1252149ce64d2174ee496caf8bd5f73bf.part from (pid=27944) execute /usr/lib/python2.6/site-packages/nova/utils.py:220
2012-09-05 17:30:56 DEBUG nova.virt.images [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] 3241935f-d10e-4cbc-ab18-91ab1b28fef0 was qcow2, converting to raw from (pid=27944) fetch_to_raw /usr/lib/python2.6/site-packages/nova/virt/images.py:96
2012-09-05 17:30:56 DEBUG nova.utils [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] Running cmd (subprocess): qemu-img convert -O raw /var/lib/nova/instances/_base/478158b1252149ce64d2174ee496caf8bd5f73bf.part /var/lib/nova/instances/_base/478158b1252149ce64d2174ee496caf8bd5f73bf.converted from (pid=27944) execute /usr/lib/python2.6/site-packages/nova/utils.py:220
2012-09-05 17:30:58 DEBUG nova.utils [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] Result was 1 from (pid=27944) execute /usr/lib/python2.6/site-packages/nova/utils.py:236
2012-09-05 17:30:58 ERROR nova.compute.manager [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] Instance failed to spawn

2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] Traceback (most recent call last):
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 611, in _spawn
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] self._legacy_nw_info(network_info), block_device_info)
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] File "/usr/lib/python2.6/site-packages/nova/exception.py", line 114, in wrapped
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] return f(*args, **kw)
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/connection.py", line 935, in spawn
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] block_device_info=block_device_info)
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/connection.py", line 1249, in _create_image
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] size=size)
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/connection.py", line 1118, in _cache_image
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] call_if_not_exists(base, fn, *args, **kwargs)
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] File "/usr/lib/python2.6/site-packages/nova/utils.py", line 946, in inner
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] retval = f(*args, **kwargs)
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/connection.py", line 1115, in call_if_not_exists
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] fn(target=base, *args, **kwargs)
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/utils.py", line 296, in fetch_image
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] images.fetch_to_raw(context, image_id, target, user_id, project_id)
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] File "/usr/lib/python2.6/site-packages/nova/virt/images.py", line 99, in fetch_to_raw
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] path_tmp, staged)
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] File "/usr/lib/python2.6/site-packages/nova/utils.py", line 243, in execute
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] cmd=' '.join(cmd))
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] ProcessExecutionError: Unexpected error while running command.
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] Command: qemu-img convert -O raw /var/lib/nova/instances/_base/478158b1252149ce64d2174ee496caf8bd5f73bf.part /var/lib/nova/instances/_base/478158b1252149ce64d2174ee496caf8bd5f73bf.converted
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] Exit code: 1
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] Stdout: ''

2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] Stderr: 'qemu-img: error while reading sector 1052672: Input/output error\n'
2012-09-05 17:30:58 TRACE nova.compute.manager [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878]
2012-09-05 17:30:58 DEBUG nova.compute.manager [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] [instance: 55d36f58-83ec-4fcd-83c9-b8ca54d6a878] Deallocating network for instance from (pid=27944) _deallocate_network /usr/lib/python2.6/site-packages/nova/compute/manager.py:635
2012-09-05 17:30:58 DEBUG nova.rpc.amqp [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] Making asynchronous call on network ... from (pid=27944) multicall /usr/lib/python2.6/site-packages/nova/rpc/amqp.py:326
2012-09-05 17:30:58 DEBUG nova.rpc.amqp [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] MSG_ID is 121df9b51708482c908e710a99f01fb2 from (pid=27944) multicall /usr/lib/python2.6/site-packages/nova/rpc/amqp.py:329
2012-09-05 17:30:59 ERROR nova.utils [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] Original exception being dropped
2012-09-05 17:30:59 TRACE nova.utils Traceback (most recent call last):
2012-09-05 17:30:59 TRACE nova.utils File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 448, in _run_instance
2012-09-05 17:30:59 TRACE nova.utils injected_files, admin_password)
2012-09-05 17:30:59 TRACE nova.utils File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 611, in _spawn
2012-09-05 17:30:59 TRACE nova.utils self._legacy_nw_info(network_info), block_device_info)
2012-09-05 17:30:59 TRACE nova.utils File "/usr/lib/python2.6/site-packages/nova/exception.py", line 114, in wrapped
2012-09-05 17:30:59 TRACE nova.utils return f(*args, **kw)
2012-09-05 17:30:59 TRACE nova.utils File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/connection.py", line 935, in spawn
2012-09-05 17:30:59 TRACE nova.utils block_device_info=block_device_info)
2012-09-05 17:30:59 TRACE nova.utils File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/connection.py", line 1249, in _create_image
2012-09-05 17:30:59 TRACE nova.utils size=size)
2012-09-05 17:30:59 TRACE nova.utils File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/connection.py", line 1118, in _cache_image
2012-09-05 17:30:59 TRACE nova.utils call_if_not_exists(base, fn, *args, **kwargs)
2012-09-05 17:30:59 TRACE nova.utils File "/usr/lib/python2.6/site-packages/nova/utils.py", line 946, in inner
2012-09-05 17:30:59 TRACE nova.utils retval = f(*args, **kwargs)
2012-09-05 17:30:59 TRACE nova.utils File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/connection.py", line 1115, in call_if_not_exists
2012-09-05 17:30:59 TRACE nova.utils fn(target=base, *args, **kwargs)
2012-09-05 17:30:59 TRACE nova.utils File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/utils.py", line 296, in fetch_image
2012-09-05 17:30:59 TRACE nova.utils images.fetch_to_raw(context, image_id, target, user_id, project_id)
2012-09-05 17:30:59 TRACE nova.utils File "/usr/lib/python2.6/site-packages/nova/virt/images.py", line 99, in fetch_to_raw
2012-09-05 17:30:59 TRACE nova.utils path_tmp, staged)
2012-09-05 17:30:59 TRACE nova.utils File "/usr/lib/python2.6/site-packages/nova/utils.py", line 243, in execute
2012-09-05 17:30:59 TRACE nova.utils cmd=' '.join(cmd))
2012-09-05 17:30:59 TRACE nova.utils ProcessExecutionError: Unexpected error while running command.
2012-09-05 17:30:59 TRACE nova.utils Command: qemu-img convert -O raw /var/lib/nova/instances/_base/478158b1252149ce64d2174ee496caf8bd5f73bf.part /var/lib/nova/instances/_base/478158b1252149ce64d2174ee496caf8bd5f73bf.converted

And I can not find /var/lib/nova/instances/_base/478158b1252149ce64d2174ee496caf8bd5f73bf.part this file on my compute node, does that mean my image is not downloaded from the glance server?

Any thought?
Are there any error or missing in my nova.conf?

Thanks in advance!

Question information

Language:
English Edit question
Status:
Solved
For:
OpenStack Compute (nova) Edit question
Assignee:
No assignee Edit question
Solved by:
David Geng
Solved:
Last query:
Last reply:
Revision history for this message
David Geng (genggjh) said :
#1

My version info:
# nova-manage version
2012.1.1 (2012.1.1-LOCALBRANCH:LOCALREVISION

# rpm -qa |grep glance
python-glance-2012.1.1-1.el6.noarch
openstack-glance-2012.1.1-1.el6.noarch

Revision history for this message
Pádraig Brady (p-draigbrady) said :
#2

Note those packages remove the .part file on any error so that stale files are not left around.
Perhaps your system ran out of space?
I can't tell though, because you seem to have left out the "Exit code:" and "Stderr:" portions of the trace.

Revision history for this message
David Geng (genggjh) said :
#3

Here are the rc and stderr:

2012-09-05 17:30:59 TRACE nova.utils Exit code: 1
2012-09-05 17:30:59 TRACE nova.utils Stdout: ''
2012-09-05 17:30:59 TRACE nova.utils Stderr: 'qemu-img: error while reading sector 1052672: Input/output error\n'
2012-09-05 17:30:59 TRACE nova.utils
2012-09-05 17:30:59 ERROR nova.rpc.amqp [req-31714935-3df3-41af-a95c-fe9e8bd60d04 4ccef92427bc437a8b4f1197155f5ccc 5843e8eec3ec496ead2e35881a15ee56] Exception during message handling

And df output:
[root@gn03idp035 nova]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 49G 1.9G 44G 5% /
tmpfs 56G 0 56G 0% /dev/shm
/dev/sda3 1.1T 200M 990G 1% /iaas
There are still 44G available under /.

BTW, can I specify the .part files save to another location?

Revision history for this message
Pádraig Brady (p-draigbrady) said :
#4

So it's not a space issue. qemu-img is getting EIO. This could be due to any of:

* disk failing
* qemu-img bug: http://lists.gnu.org/archive/html/qemu-devel/2011-10/msg00222.html
* corrupted qcow2 image in glance
* corrupted qcow2 image transfer from glance

That last possibility is eerily similar to:
https://bugzilla.redhat.com/show_bug.cgi?id=832449

How reproducible is this for you?

If possible could you verify the image is OK in glance using something like:
qemu-img check /var/lib/glance/images/$the_image

If that's Ok then would it be possible to add debugging to the "qemu-img convert" line (99) in:
/usr/lib/python2.6/site-packages/nova/virt/images.py
and restarting the openstack-nova-compute service?
Something along the lines of comment 16 in the bug above, i.e.

try:
    utils.execute('qemu-img', 'convert', '-O', 'raw', path_tmp, staged)
except:
    glance_path = os.path.join('/var/lib/glance/images', image_href.split('/')[-1])
    out, err = utils.execute('cmp', path_tmp, glance_path)
    LOG.error("comparing fetched image %s to master %s: %s" % (path_tmp, glance_path, out))

thanks!

Revision history for this message
Eoghan Glynn (eglynn) said :
#5

> And I can not find /var/lib/nova/instances/_base/478158b1252149ce64d2174ee496caf8bd5f73bf.part
> this file on my compute node, does that mean my image is not downloaded from the glance server?

No, I think this is because the nova image code is written to remove the part file if the image conversion fails.

It would be useful to leave the .part file around for diagnostic purposes, to see whether it was truncated or corrupted or simply empty.

You could acheive this by temporarily commenting out the line:

    os.unlink(path_tmp)

that closely follows line 99 in /usr/lib/python2.6/site-packages/nova/virt/images.py and then restarting the nova-compute service.

Can you try that and report back?

Revision history for this message
Eoghan Glynn (eglynn) said :
#6

Ignore my reference to the unlink call, that was in a previous version of the code.

The part file removal is now handled by utils.remove_path_on_error.

Revision history for this message
David Geng (genggjh) said :
#7

I got error when check the image:

[root@gn02idp028 iaas]# qemu-img check /iaas/glance/images/092fe525-f647-49a8-88f7-3ae82537e55c
Warning: cluster offset=0x6370000 is after the end of the image file, can't properly check refcounts.
Warning: cluster offset=0x6370000 is after the end of the image file, can't properly check refcounts.
.....
Warning: cluster offset=0xcba0000 is after the end of the image file, can't properly check refcounts.

64 internal errors have occurred during the check.

An error has occurred during the check: Success
The check is not complete and may have missed error.

Very thanks for your support, I will download the image and try again.

It would be better if this problem can be caught when I try to add image through glance.

Anyway, I will update this question once I get the new image.

Revision history for this message
Eoghan Glynn (eglynn) said :
#8

> It would be better if this problem can be caught when I try to add image through glance.

Fair point, but as things currently stand, glance is completely agnostic about image content ... it doesn't perform any kind of validity checks, and instead assumes the error will emerge if and when the image is used.

Revision history for this message
David Geng (genggjh) said :
#9

Right, at least we can check the image before covert it in the nova-compute.

Revision history for this message
Eoghan Glynn (eglynn) said :
#10

True that, the output of qemu-img check is likely to be much more use for diagnosis than a failed convert.

One approach would be to only run the check *after* a failed conversion (prior to deletion), to avoid the overhead in the normal case.

In any case, can you file this enhancement request as a nova bug so it doesn't fall through the cracks.

Revision history for this message
David Geng (genggjh) said :
#11

Very thanks, after I registered the new image, the instance can be launched.
Bug #1046643 is opened.