Instance directly connected to provider network does not receive DHCP reply

Asked by Michiel K

Short description of the problem:
In an environment where virtual routers and floating IP addresses are fully working, connecting an instance directly to the provider network does not work. The instance is unable to communicate with the network. More specifically: it is able to send a DHCP request, but never receives the DHCP reply, even though the reply is visible on the physical network.

Details on the environment:
1 controller node
2 compute nodes
2 network nodes

All nodes have access to a management network (eth0). The compute and network nodes also have access to a tenant network (OVS GRE on eth1) and a provider network (eth2). eth0 and eth1 each have an IP address; eth2 is configured without one.

All nodes run as virtual machines in a manually maintained VMware environment. The only specialized change required was to allow promiscuous mode on the provider network; without it, for example, the virtual routers on the network nodes did not receive traffic on eth2.

What does work (the corresponding CLI calls are sketched below):
- create a tenant network
- create a provider network
- create a router
- create an instance connected to the tenant network
- create a floating IP and assign it to the instance
- test traffic from the instance to the provider network (verified that the source IP of the ping is the floating IP): works
- test traffic from the provider network to the floating IP (enable port 22 in the security group and ssh to the instance): works
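
For reference, the networks and router were set up with the usual quantum CLI calls, roughly along these lines (network names, CIDRs, the flat network type and the physnet1 label are illustrative, not necessarily my exact values):

quantum net-create provider --provider:network_type flat --provider:physical_network physnet1 --router:external=True
quantum subnet-create provider 192.168.103.0/24
quantum net-create tenant1
quantum subnet-create tenant1 10.0.0.0/24
quantum router-create router1
quantum router-gateway-set router1 <provider-net-id>
quantum router-interface-add router1 <tenant1-subnet-id>
quantum floatingip-create provider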

The following steps are of interest (a sketch of the commands follows the list):
- create a new instance connected to the provider network
- check (using "nova list" and "nova show") whether an IP address was provisioned from the provider network: yes
- once the instance boots, log in (via the console) and check whether it obtained the IP address from the DHCP server: no
- statically assign the IP address to the interface (inside the instance, using "ifconfig eth0 x.x.x.x netmask x.x.x.x") and test communication: fails in both directions, from the instance to the provider network and the other way around
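
A sketch of how the provider-attached instance was booted and checked (the image and flavor names are illustrative):

nova boot --flavor m1.tiny --image cirros --nic net-id=<provider-net-id> provider-test
nova list
nova show provider-test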

Starting from the drawing in http://docs.openstack.org/trunk/openstack-network/admin/content/under_the_hood_openvswitch.html I started debugging to find where the DHCP packets get lost.

The DHCP request packet travels all the way from the instance to the dnsmasq process on one of the network nodes. One thing that catches my attention is that all packets appear to be duplicated for some reason, though this should not be a problem in itself. The duplication only seems to happen on bridges connected to the provider network. When looking, for example, in the qdhcp-xxxx-xxx.. namespace on the network node, I see each packet only once, so I will ignore this for now.
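
For reference, the capture inside the DHCP namespace was done roughly like this (the hostname, namespace and interface names are placeholders; "ip netns exec" runs the command inside the namespace):

root@netnode1:~# ip netns list
root@netnode1:~# ip netns exec qdhcp-<network-uuid> ip addr   # find the interface name inside the namespace
root@netnode1:~# ip netns exec qdhcp-<network-uuid> tcpdump -i <interface> -n 'udp port 67 or udp port 68'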

dnsmasq on the network node answers with a DHCP reply, which travels across the provider network and arrives back at the eth2 interface of the compute node:

(node2 is the compute node where the instance is running. I verified that fa:16:3e:47:d9:1b is the MAC address of the instance, that 192.168.103.230 is the IP it should get according to nova, and that 192.168.103.231 is the IP of the DHCP namespace on the network node.)
root@node2:~# tcpdump -i eth2 -n 'udp port 67 or udp port 68'
tcpdump: WARNING: eth2: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
13:10:46.873239 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:10:46.873682 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:10:46.873922 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:10:46.873929 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:10:49.885679 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:10:49.886451 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:10:49.886705 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:10:49.886712 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325

The reply then reaches the phy-br-ex end of the veth pair:

root@node2:~# tcpdump -i phy-br-ex -n 'udp port 67 or udp port 68'
tcpdump: WARNING: phy-br-ex: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on phy-br-ex, link-type EN10MB (Ethernet), capture size 65535 bytes
13:12:09.591398 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:12:09.591831 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:12:09.592604 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:12:09.592616 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:12:12.594992 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:12:12.595972 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:12:12.596023 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:12:12.597883 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325

but at the other end of the veth pair some of the packets are lost (only the reply packets):

root@node2:~# tcpdump -i int-br-ex -n 'udp port 67 or udp port 68'
tcpdump: WARNING: int-br-ex: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on int-br-ex, link-type EN10MB (Ethernet), capture size 65535 bytes
13:13:41.167895 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:41.168812 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:41.168925 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:41.169358 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:44.801697 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:44.801984 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:47.806615 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:47.807393 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:47.807739 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:47.807741 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:50.810922 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:50.811742 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:50.811912 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:50.811913 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:54.758003 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:54.758673 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:57.762639 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:57.763403 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:13:57.763643 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325
13:13:57.763644 IP 192.168.103.231.67 > 192.168.103.230.68: BOOTP/DHCP, Reply, length 325

When bringing the br-int interface up on the hypervisor and capturing there, the reply packets are gone entirely:

root@node2:~# ifconfig br-int up && tcpdump -i br-int -n 'udp port 67 or udp port 68'
tcpdump: WARNING: br-int: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-int, link-type EN10MB (Ethernet), capture size 65535 bytes
13:20:08.545207 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:20:08.546072 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:20:12.402099 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:20:12.402607 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:20:15.407187 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280
13:20:15.407960 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:47:d9:1b, length 280

On the qvo and further interfaces/bridges towards the instance, the reply packet is also not visible. (Although on one occasion I did see a single reply message, I was unable to reproduce that reliably.)
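
To find the per-instance interfaces to capture on, I walked the bridge chain roughly like this (the XXXXXXXX-XX suffixes come from the instance's port ID and differ per instance):

root@node2:~# ovs-vsctl show    # shows the br-int/br-ex ports, including qvoXXXXXXXX-XX
root@node2:~# brctl show        # shows the qbrXXXXXXXX-XX Linux bridge with qvbXXXXXXXX-XX and the tap device
root@node2:~# tcpdump -i qvoXXXXXXXX-XX -n 'udp port 67 or udp port 68'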

Some additional information:

With verbose and debug set to true in /etc/quantum/quantum.conf and the quantum-plugin-openvswitch-agent restarted, the log at /var/log/quantum/openvswitch-agent.log shows a constant stream of "sudo quantum-rootwrap /etc/quantum/rootwrap.conf ovs-vsctl get/list-ports" commands. This causes some CPU load, but I do not believe it is the source of the problem.

top - 13:31:43 up 1:50, 2 users, load average: 0.88, 0.94, 0.83
Tasks: 102 total, 4 running, 98 sleeping, 0 stopped, 0 zombie
Cpu(s): 51.4%us, 7.9%sy, 0.0%ni, 40.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1019476k total, 649212k used, 370264k free, 109780k buffers
Swap: 1046524k total, 0k used, 1046524k free, 161596k cached
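
For completeness, the verbose/debug logging mentioned above was enabled like this before restarting the agent:

# in /etc/quantum/quantum.conf
[DEFAULT]
verbose = True
debug = True

root@node2:~# service quantum-plugin-openvswitch-agent restart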

Even though, according to ifconfig, promiscuous mode is not set on the eth2 interfaces of the compute and network nodes, the network node handles traffic fine (floating IP addresses and everything). Enabling promiscuous mode on the compute node has no effect as far as I can tell. I also believe this is not the problem, since I can see the traffic on eth2 and on the veth end connected to it.
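
Promiscuous mode was checked and toggled as follows (PROMISC shows up in the interface flags when enabled):

root@node2:~# ip link show eth2
root@node2:~# ip link set eth2 promisc on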

Question information

Language:
English
Status:
Solved
For:
neutron
Solved by:
Michiel K

This question was reopened

Revision history for this message
Michiel K (michiel-a) said :
#1

I discovered ask.openstack.org and posted the question there as well (https://ask.openstack.org/question/3555/instance-directly-connected-to-provider-network-does-not-receive-dhcp-reply/) - if this is unwanted, please let me know.

Revision history for this message
Michiel K (michiel-a) said :
#2

Posted a possible solution at https://ask.openstack.org/en/question/3555/instance-directly-connected-to-provider-network-does-not-receive-dhcp-reply/?answer=3740#post-id-3740

The environment might have been externally influenced, even though the tcpdumps strongly suggested a problem in Open vSwitch. After a complete redeploy of the environment over the last few days, provider networks work out of the box.

Revision history for this message
Michiel K (michiel-a) said :
#3

Unfortunately, the possible solution did not fully work out. Newly deployed environments still show this problem, so I still need some assistance with it.

Revision history for this message
Michiel K (michiel-a) said :
#4

Found the cause: the duplicated traffic (see the tcpdumps) corrupts the MAC learning tables. The duplication is due to an error in the physical network configuration.

Outgoing traffic from the compute node's eth2 (the compute node is a virtual machine within the VMware environment) is echoed back into eth2. See the tcpdumps above. These echoes corrupt the MAC learning tables of the Open vSwitch bridges, making them believe the MAC address of the instance is located out on eth2 instead of on the bridges deeper inside the compute node. As a result, when Open vSwitch receives the DHCP reply, it sends it out eth2 instead of forwarding it to the instance.
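
This can be verified by dumping the MAC learning table of the bridge while the echoes occur; the instance's MAC is then learned on the wrong port. A rough illustration (the port number is made up; "ovs-ofctl show br-int" maps port numbers to interface names - in this scenario the instance's MAC would appear on the int-br-ex port instead of its qvo port):

root@node2:~# ovs-appctl fdb/show br-int
 port  VLAN  MAC                Age
    3     0  fa:16:3e:47:d9:1b    1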

The echoing only happens if the policy on VMware's vSwitch is set to allow promiscuous traffic. The cause, however, does not lie within the VMware environment itself, but rather in the physical switches connecting the physical VMware servers. I have not found the exact cause there yet, but I did connect my environment to an isolated set of networks and it started working.
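
A quick way to see the echoes is to capture on eth2 with a filter on the instance's source MAC; every outgoing frame then shows up twice:

root@node2:~# tcpdump -i eth2 -e -n ether src fa:16:3e:47:d9:1b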