Multiple routes to external network

Asked by Graham Hemingway

I am very close to getting a working Folsom + Quantum setup, but am having some problems with external network connectivity. I am trying to set up the "Provider Router with Private Networks" case (see http://docs.openstack.org/trunk/openstack-network/admin/content/use_cases_single_router.html). I am also using EmilienM's guide to help me along, though I know there are some differences.

So, here is the issue. I can ping and SSH into the VM, but there are lots of problems. My SSH connection gets dropped all the time, and the VM cannot reach back out to the external network: no ping, no DNS lookup.

I think the issue stems from how I have set up my OVS bridges. I think that multiple conflicting routes are being set up. Here is some debug output for reference:

foo@cloud1:~# ip r
10.0.0.0/24 dev eth4 proto kernel scope link src 10.0.0.3
10.5.5.0/24 dev tapdbd89f9a-05 proto kernel scope link src 10.5.5.2
10.5.5.0/24 dev qr-c89b0922-f7 proto kernel scope link src 10.5.5.1
99.59.104.0/23 dev qg-21f69e18-c0 proto kernel scope link src 99.59.105.185
99.59.104.0/23 dev br-ex proto kernel scope link src 99.59.105.184
192.168.49.0/24 dev eth3 proto kernel scope link src 192.168.49.244
192.168.50.0/24 dev eth2 proto kernel scope link src 192.168.50.244

For reference, 99.59.x.x is my public network, 192.168.x.x is my management network, and 10.x.x.x is for VMs.
As far as I can tell, these interfaces are:
      tapdbd89f9a-05 (10.5.5.2) is the DHCP agent
      qr-c89b0922-f7 (10.5.5.1) is the port connecting the tenant network to the provider router
      qg-21f69e18-c0 (99.59.105.185) is the port connecting the provider router to the external network
      br-ex (99.59.105.184) is obviously the port for the br-ex bridge
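The name-to-role mapping above can be written down as a tiny lookup. This is a minimal sketch, assuming the device prefixes seen in this thread (tap for the DHCP port, qr- for the router's tenant-side port, qg- for its gateway port) on this network node; other plugins or releases may name devices differently, and `classify_iface` is a hypothetical helper, not part of Quantum.

```shell
# Hypothetical helper: guess a device's role from its name prefix.
# Assumption: Folsom/OVS naming as seen in this thread, on the network node
# (on compute nodes, instance VIFs also start with "tap").
classify_iface() {
    case "$1" in
        tap*)   echo "dhcp-port" ;;          # DHCP agent port on br-int
        qr-*)   echo "router-internal" ;;    # router port on the tenant network
        qg-*)   echo "router-gateway" ;;     # router port on the external network
        br-ex)  echo "external-bridge" ;;
        br-int) echo "integration-bridge" ;;
        br-tun) echo "tunnel-bridge" ;;
        *)      echo "other" ;;
    esac
}
```

For example, `classify_iface qg-21f69e18-c0` prints `router-gateway`.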

Following EmilienM's guide, I manually added the 99.59.105.184 address to br-ex; otherwise it would not have one. I don't see this step in any other guide.

Also as reference, here is the OVS config:

foo@cloud1:~# ovs-vsctl show
ba000344-9e7f-468d-9eeb-6d455be4938a
    Bridge br-int
        Port br-int
            Interface br-int
                type: internal
        Port "tapdbd89f9a-05"
            tag: 1
            Interface "tapdbd89f9a-05"
                type: internal
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "qr-c89b0922-f7"
            tag: 1
            Interface "qr-c89b0922-f7"
                type: internal
    Bridge br-tun
        Port br-tun
            Interface br-tun
                type: internal
        Port "gre-2"
            Interface "gre-2"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="10.0.0.26"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
        Port "eth0"
            Interface "eth0"
        Port "qg-21f69e18-c0"
            Interface "qg-21f69e18-c0"
                type: internal
    ovs_version: "1.4.0+build0"
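To confirm that br-ex carries the ports shown above (the physical NIC eth0 and the router's qg- gateway port), the bridge's port list can be compared against what you expect. A sketch only: `check_bridge_ports` is a hypothetical helper that reads the output of `ovs-vsctl list-ports br-ex` (one port per line, excluding the bridge's own internal port) on stdin.

```shell
# Hypothetical helper: report expected ports missing from a bridge.
# stdin: port names, one per line (e.g. from `ovs-vsctl list-ports br-ex`)
# args:  the port names you expect to find
# Prints "missing: <port>" for each absent port; returns 1 if any is missing.
check_bridge_ports() {
    ports=$(cat)
    missing=0
    for want in "$@"; do
        # -x: whole-line match, -F: fixed string, so names are not regexes
        if ! printf '%s\n' "$ports" | grep -qxF -- "$want"; then
            echo "missing: $want"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage on the network node: `ovs-vsctl list-ports br-ex | check_bridge_ports eth0 qg-21f69e18-c0`.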

Any help would be appreciated. I'm sorry I can't be more specific about the issue itself.
Thank you,
   Graham

Question information

Language: English
Status: Solved
For: neutron
Assignee: No assignee
Solved by: Graham Hemingway

This question was reopened

Graham Hemingway (graham-hemingway) said :
#1

I found this: https://answers.launchpad.net/quantum/+question/208377
And after enough fooling around, things started working. I am going to close this question, not because I know how it started working, but because it did start working.

G

dan wendlandt (danwent) said :
#2

Thanks for the detailed write-up.

It seems likely that you have set use_namespaces = False, which results in overlapping routes that confuse forwarding.

In particular, the overlapping routes are:

99.59.104.0/23 dev qg-21f69e18-c0 proto kernel scope link src 99.59.105.185
99.59.104.0/23 dev br-ex proto kernel scope link src 99.59.105.184
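Overlaps like these can be spotted mechanically by grouping `ip route` lines by destination prefix. A minimal sketch: `find_overlapping_routes` is a hypothetical helper that reads `ip route` output on stdin and prints each prefix reachable through more than one device.

```shell
# Hypothetical helper: list destination prefixes that appear on more than
# one device in `ip route` output read from stdin.
find_overlapping_routes() {
    awk '
        {
            # Field 1 is the destination prefix; record the device after "dev".
            for (i = 2; i < NF; i++)
                if ($i == "dev") { devs[$1] = devs[$1] " " $(i + 1); count[$1]++; break }
        }
        END { for (p in count) if (count[p] > 1) print p ":" devs[p] }
    ' | sort
}
```

Run as `ip r | find_overlapping_routes`. On the table in the question it would also list 10.5.5.0/24, where the DHCP tap port and the router's qr- port share a subnet in the same namespace.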

Basically, if you are not using namespaces, you should not assign an IP address to br-ex. If you are using namespaces, assigning the IP is required.
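The rule above can be turned into a quick consistency check. This is a sketch under stated assumptions: it expects the Folsom-era config path /etc/quantum/l3_agent.ini (pass another file to override) with a plain `use_namespaces = ...` line, and both `read_use_namespaces` and `check_brex_address` are hypothetical helpers that only encode the rule as stated in this reply.

```shell
# Hypothetical helper: print the last use_namespaces value in an l3 agent config.
# Assumption: default path below; commented-out lines are ignored.
read_use_namespaces() {
    sed -n 's/^[[:space:]]*use_namespaces[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' \
        "${1:-/etc/quantum/l3_agent.ini}" | tail -n 1
}

# Hypothetical helper: compare use_namespaces with br-ex's IPv4 addresses.
# $1 = use_namespaces value (True/False), $2 = output of `ip -4 -o addr show dev br-ex`
check_brex_address() {
    ns=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    if [ -n "$2" ]; then has_ip=yes; else has_ip=no; fi
    case "$ns:$has_ip" in
        false:yes) echo "conflict: use_namespaces=False but br-ex has an IPv4 address"; return 1 ;;
        true:no)   echo "missing: use_namespaces=True but br-ex has no IPv4 address"; return 1 ;;
        true:*|false:*) echo "ok"; return 0 ;;
        *)         echo "unknown: could not read use_namespaces"; return 2 ;;
    esac
}
```

Usage: `check_brex_address "$(read_use_namespaces)" "$(ip -4 -o addr show dev br-ex)"`.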

Graham Hemingway (graham-hemingway) said :
#3

Dan,

Thanks for the response. I do indeed have use_namespaces = False.
I am going to try removing the IP address from br-ex. Right now it looks like this (via ip a):

9: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 78:2b:cb:07:27:ed brd ff:ff:ff:ff:ff:ff
    inet 99.59.105.184/23 scope global br-ex
    inet6 fe80::7a2b:cbff:fe07:27ed/64 scope link
       valid_lft forever preferred_lft forever

I am going to run this:

ip addr del 129.59.105.184/23 dev br-ex

Is there anything else I need to do?
Thanks,
   Graham

Graham Hemingway (graham-hemingway) said :
#4

Ha, I meant

ip addr del 99.59.105.184/23 dev br-ex
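Because the first attempt had a mistyped address, one way to avoid retyping is to generate the command from what the bridge actually carries. A sketch: `print_addr_del_cmds` is a hypothetical helper that reads `ip -4 -o addr show dev br-ex` output on stdin and only prints the matching `ip addr del` commands, so they can be reviewed before running them as root.

```shell
# Hypothetical helper: turn `ip -4 -o addr show dev <bridge>` output into
# `ip addr del` commands. $1 = bridge name. Prints only; executes nothing.
print_addr_del_cmds() {
    awk -v dev="$1" '{
        # The address/prefix follows the "inet" keyword on each one-line record.
        for (i = 1; i < NF; i++)
            if ($i == "inet") print "ip addr del " $(i + 1) " dev " dev
    }'
}
```

Usage: `ip -4 -o addr show dev br-ex | print_addr_del_cmds br-ex`.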

Graham Hemingway (graham-hemingway) said :
#5

I can launch an instance and associate a floating IP with it. I can ping and SSH into the instance, but I cannot get back out from the instance to anything outside of the cloud. The instance gets metadata fine. Also, my SSH connection gets dropped frequently.

Graham Hemingway (graham-hemingway) said :
#6

I think that maybe I just needed to let the network settle after I made that ip del call. Now I can SSH into the instance and it has external connectivity.

I still get a broken pipe error if I let the connection sit idle for more than 5 seconds. If I leave top running in it, it works fine, so it seems I am hitting a timeout somewhere. I tried setting the timeouts in both the sshd config and ipv4/tcp_keepalive_time, but neither seemed to help.

Is there a keepalive setting for quantum/OVS somewhere?

Thanks,
   Graham

dan wendlandt (danwent) said :
#7

There are no explicit timeouts in Quantum. My guess would be that you're hitting something with iptables, but 5 seconds is far too low for that. I think the iptables timeout for an established connection defaults to days.
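To check that value rather than guess, the sysctl can be read directly. A sketch assuming a kernel using nf_conntrack, which exposes the setting at /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established (older ip_conntrack kernels use a different path); on many kernels the default is 432000 seconds, about five days. `show_established_timeout` is a hypothetical helper.

```shell
# Hypothetical helper: print the conntrack timeout for established TCP flows.
# Assumption: nf_conntrack path below; pass another path as $1 to override.
show_established_timeout() {
    f=${1:-/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established}
    [ -r "$f" ] || { echo "not readable: $f" >&2; return 1; }
    secs=$(cat "$f")
    # Integer division, so the day count is rounded down.
    echo "$secs s (about $(( $secs / 86400 )) days)"
}
```

With the 432000-second default this prints `432000 s (about 5 days)`.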

Graham Hemingway (graham-hemingway) said :
#8

Well, I feel a bit sheepish. Turns out I had allocated a public IP that was already in use by someone else in our network. Once I moved to a free IP all was good.
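A conflict like this can be caught before an address is handed out. A sketch assuming the iputils `arping`, whose -D (duplicate address detection) mode exits 0 when no host answers for the address; other arping implementations use different flags. `ip_in_use` is a hypothetical helper; run it as root, ideally before the address is configured on this host.

```shell
# Hypothetical helper: probe the external segment for a host already using an IP.
# $1 = interface facing the external network (e.g. br-ex), $2 = IPv4 address
# Returns 0 if the address appears taken, 1 if it looks free.
ip_in_use() {
    # iputils arping in DAD mode exits 0 when no reply arrives (address free).
    if arping -D -q -c 2 -I "$1" "$2"; then
        return 1   # no reply: address appears unused
    else
        return 0   # a reply arrived (or arping failed): treat as in use
    fi
}
```

For example, `ip_in_use br-ex 99.59.105.185 && echo "address already taken"`.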

Thanks Dan.