NSX: Bridging between VXLAN and VLAN

After preparing your vSphere clusters for VXLAN, you’re eager to start building your SDDC network. You provision some logical switches, a distributed logical router and maybe even an edge services gateway. Before you know it you are doing full-fledged network virtualization. It’s simple enough, right?

But then you realize you still have other virtual workloads and possibly all kinds of physical equipment residing on that other network: the VLAN-based network.

Of course this is (hopefully) not the way you roll out VXLAN in your SDDC. Just like any other major change in your network (logical or physical), some planning is required here. You should at least create a solid design for the VXLAN structure, the IP subnets, IP routing, and how it all connects and propagates to the physical network before you start implementing VXLAN. Things might get really complicated if you don’t.

But even with all the planning in the world you might still end up with workloads and equipment that, for various reasons, are stuck on VLANs. On top of that, some of these workloads need to be on the same L2 segment as the virtual workloads you planned to migrate to VXLAN. This can be a short-term requirement (during a transition, for example) or a long-term one.

A helping hand

One component of NSX-V that comes in handy in a situation like this is the L2 bridge. The L2 bridge has a number of use cases including:

  • Migration: Physical to virtual, or virtual to virtual without requiring re-IP.
  • Connectivity: Physical workloads not suitable for virtualization can maintain connectivity with virtual workloads inside of NSX.
  • Service insertion: Transparent integration of any physical appliance such as a router, load balancer or firewall into NSX.

There are some prerequisites and limitations:

  • L2 bridging requires a distributed logical router with a control VM
  • The VXLAN network and VLAN-backed port groups must be on the same distributed virtual switch and use the same physical NICs.
  • The VLAN-backed port group must be configured with a VLAN ID (between 1 and 4094).
  • Don’t use an L2 bridge to connect a logical switch to another logical switch, a VLAN network to another VLAN network, or to interconnect data centers.
  • You can’t use a Universal logical router to configure bridging and you cannot add a bridge to a universal logical switch (cross-vCenter NSX objects).
  • A logical router can have multiple bridging instances; however, the routing and bridging instances cannot share the same VXLAN/VLAN network. Traffic to and from the bridged VLAN and bridged VXLAN cannot be routed to the bridged network and vice versa.
  • The recommended maximum is 500 bridging instances per distributed logical router. A number you’ll hopefully never need.
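Some of the prerequisites above are simple enough to check before you click anything. The following is a minimal Python sketch of such a pre-flight check. The BridgeRequest class and its field names are made up for illustration; they are not NSX API objects, and the check only covers the rules that can be expressed with plain values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BridgeRequest:
    """Hypothetical description of a planned L2 bridge (not an NSX API object)."""
    name: str
    logical_switch_dvs: str   # distributed switch carrying the VXLAN logical switch
    port_group_dvs: str       # distributed switch carrying the VLAN-backed port group
    vlan_id: int              # VLAN ID configured on the port group
    universal_router: bool    # True if the DLR is a universal logical router
    universal_switch: bool    # True if the logical switch is a universal logical switch


def check_bridge(req: BridgeRequest) -> list[str]:
    """Return the violated prerequisites; an empty list means none were found."""
    problems = []
    if req.logical_switch_dvs != req.port_group_dvs:
        problems.append("logical switch and port group are on different distributed switches")
    if not 1 <= req.vlan_id <= 4094:
        problems.append("port group VLAN ID must be between 1 and 4094")
    if req.universal_router:
        problems.append("bridging cannot be configured on a universal logical router")
    if req.universal_switch:
        problems.append("a bridge cannot be added to a universal logical switch")
    return problems


if __name__ == "__main__":
    request = BridgeRequest("bridge-1", "dvs-1", "dvs-1", 100, False, False)
    print(check_bridge(request) or "no problems found")
```

An empty result only means these particular rules pass. Things like the DLR control VM or the shared physical NICs still need to be verified in vCenter.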

Configuring an L2 Bridge

The L2 bridge is configured with a couple of clicks over at the distributed logical router. Yes, a couple of API calls do the trick too.

At the DLR click the Manage tab and then Bridging. Click the green plus sign to add a bridge:

Type a name for the bridge and select a logical switch (VXLAN) and a distributed virtual port group (VLAN-backed). Click OK and don’t forget to publish your changes.

That’s all there is to it. You’re now bridging between a VXLAN and a VLAN.

Conclusion

VXLAN-VLAN bridging is not necessarily something you want to do over a long period of time as it adds some complexity to your environment. That being said, there are scenarios (mentioned above) where the L2 bridge is the right solution and it’s good to know that setting this up in NSX-V is a breeze.

VMware NSX 6.4.2 Released

VMware has released NSX Data Center for vSphere 6.4.2 and it comes with some nice improvements.

Multicast routing is the big one in this release of course. It’s good to see that once again more NSX components can be managed/used with the vSphere Client (H5). Interesting as well is the addition of two new administrative roles: Network Engineer and Security Engineer.

Here’s a list of all that’s new:

Networking and Edge Services

  • Multicast Support: Adds ability to configure L3 IPv4 multicast on Distributed Logical Router and Edge Service Gateway through support of IGMPv2 and PIM Sparse Mode.
  • Default Limit of MAC identifiers: Increases from 2048 to 4096
  • Hardware VTEP: Added multi PTEP cluster capability to facilitate environments with multiple vCenters

Security Services

  • Context-Aware Firewall: Additional Layer 7 Application Context Support (EPIC, MSSQL, BLAST AppIDs)
  • Firewall Rule Hit Count: Monitor rule usage and easily identify unused rules for clean-up
  • Firewall Section Locking: Enables multiple security administrators to work concurrently on the firewall
  • NSX Application Rule Manager: Improved scale to 100 vNICs per session, further simplifying the process of creating security groups and whitelisting firewall rules for existing applications.

NSX User Interface

Operations and Troubleshooting

  • Authentication & Authorization: Introduces 2 new roles (Network Engineer and Security Engineer). Adds ability to enable/disable basic authentication.
  • NSX Scale Dashboard: Provides visibility into 25 new metrics. Adds ability to edit usage warning thresholds and filter for objects exceeding limits.
  • NSX Controller Cluster Settings: Specify common settings (DNS, NTP, Syslog) to apply to NSX Controller Cluster.
  • Support for VM Hardware version 11 for NSX components: For new installs of NSX 6.4.2, NSX appliances (Manager, Controller, Edge, Guest Introspection) are installed with VM HW version 11. For upgrades to NSX 6.4.2, please see Upgrade Notes for further details.

You can find the full release notes over here.

Updating my lab to 6.4.2 using the Upgrade Coordinator was a breeze. 🙂

A simple NSX-V load balancer lab

Recently I had to do a quick demo on the NSX-V logical load balancer. Setting up the NSX components in a small lab for such a demo is a pretty straightforward process. I thought I’d walk you through setting this up in case you want to play around with the logical load balancer or any of the other NSX components yourself.

The NSX-V logical load balancer in a nutshell

The NSX-V logical load balancer enables high availability service and distributes network traffic (L4 and L7) among multiple servers. Incoming requests are evenly distributed between backend servers that are grouped together in a pool.

It comes with configurable service monitors that perform health checking on backend servers. If one of the backend servers becomes unavailable, the service monitor marks that server as “DOWN” and stops distributing requests to it. When the backend server is available again, the service monitor marks it as “UP” and incoming requests are once again distributed to the server.

The NSX-V logical load balancer has all the functionality you can expect from a modern load balancer. There’s a variety of features and two deployment models (inline vs one-armed), but they are beyond the scope of this simple lab. I recommend you have a look at the Logical Load Balancer chapter of the NSX Administration Guide for detailed documentation on the NSX-V logical load balancer.

The Lab Environment

Let’s have a look at a small diagram showing the environment we’re going to build:

NSX-V Load Balancer PoC.png

 

Two virtual machines are connected to the “Web Tier” logical switch. The “Web Tier” uses IP subnet 172.16.1.0/24 as defined by the Distributed Logical Router (DLR). The DLR and the Edge Services Gateway (ESG) connect using the “Transit” logical switch. The “Transit” VXLAN uses the tiny 172.16.0.0/29 subnet. The ESG also acts as the gateway for the virtual network with its connection to the physical network.

The logical load balancer is a component of the ESG. Its job in this lab will be to distribute incoming web requests between the two web servers.

For this guide I assume NSX-V 6.4 is installed and configured and that your vSphere cluster is prepared for VXLAN.
We need to reserve four IP addresses on the physical network’s subnet to be used by the ESG. For this lab we also need Internet access from the physical network as well as access to a DNS server.
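If you want to sanity-check the addressing before building anything, Python’s standard ipaddress module is enough. This is just a small sketch of the lab’s IP plan. The two VXLAN subnets and the four ESG addresses (10.0.1.240 to 10.0.1.243) come from this post, while the /24 prefix length of the physical network is an assumption about my environment that you should replace with your own.

```python
import ipaddress

# Subnets from the lab diagram.
web_tier = ipaddress.ip_network("172.16.1.0/24")
transit = ipaddress.ip_network("172.16.0.0/29")

# Usable host addresses (network and broadcast address excluded).
web_hosts = web_tier.num_addresses - 2
transit_hosts = transit.num_addresses - 2
transit_range = list(transit.hosts())

print("Web Tier usable addresses:", web_hosts)
print("Transit usable addresses:", transit_hosts)
print("Transit host range:", transit_range[0], "to", transit_range[-1])

# ASSUMPTION: the physical network is a /24. Adjust to your environment.
physical = ipaddress.ip_network("10.0.1.0/24")

# The four addresses reserved for the ESG uplink interface.
esg_uplink_ips = [ipaddress.ip_address(f"10.0.1.{n}") for n in range(240, 244)]

# All reserved addresses must sit in the physical subnet,
# and the two VXLAN subnets must not overlap.
reserved_ok = all(ip in physical for ip in esg_uplink_ips)
overlap = web_tier.overlaps(transit)
print("Reserved ESG addresses inside physical subnet:", reserved_ok)
print("VXLAN subnets overlap:", overlap)
```

The /29 leaves six usable addresses on the “Transit” segment, which is plenty for the two interfaces (DLR and ESG) that live there.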

We don’t need a lot of resources for this lab. A single physical ESXi 6.5/6.7 host running a nested vSphere environment will do the job.

Building the NSX infrastructure

We start by creating the two logical switches “Web Tier” and “Transit”. The default settings for the switches are fine for this lab:

Skärmavbild 2018-08-13 kl. 20.19.10.png

Next we deploy the distributed logical router:

Skärmavbild 2018-08-13 kl. 20.24.02.png

At the “Configure interfaces” step we create and configure two interfaces: An uplink interface we’ll call “2ESG” which connects to the “Transit” logical switch and an internal interface that we’ll call “2WebTier” which connects to the “Web Tier” logical switch. We configure the interfaces using the IP addresses from the diagram above:

Skärmavbild 2018-08-13 kl. 20.32.18.png

The DLR’s default gateway is the IP address of the ESG’s internal interface, accessed via the “2ESG” interface:

Skärmavbild 2018-08-13 kl. 20.38.45.png

Once the DLR is deployed we go to its settings and configure DHCP relay. The DHCP relay server in this lab is the IP address of the ESG’s internal interface (we’ll configure a DHCP server on the ESG in a moment). We also need to enable a DHCP relay agent on the DLR’s “2WebTier” interface:

Skärmavbild 2018-08-13 kl. 20.52.10.png

Now it’s time to deploy the Edge Services Gateway appliance:

Skärmavbild 2018-08-13 kl. 20.55.05.png

We choose a compact appliance size and configure the ESG with two interfaces: An uplink interface we call “2Physical” which connects to a VLAN-backed port group on the distributed switch as well as an internal interface we call “2DLR” which connects to the “Transit” logical switch.

We’re going to configure four IP addresses on the “2Physical” interface: One primary IP address and three secondaries. One of the secondaries will be used as the VIP address of the load balancer. The other two will be used for NAT:

Skärmavbild 2018-08-13 kl. 21.40.50.png

The default gateway for the ESG resides on the physical network. In my environment it’s at 10.0.1.1 and reached using the “2Physical” interface:

Skärmavbild 2018-08-13 kl. 21.53.03.png

So let’s set up the DHCP server on the ESG. We do this under the “DHCP” tab of the ESG. We need to configure a DHCP pool and enable the DHCP service. Don’t forget to configure a primary name server. The “Web Tier” servers need one.

Skärmavbild 2018-08-13 kl. 21.58.55.png

Finally we need to let the ESG know how to reach the “Web Tier” subnet. We do this by adding a static route under “Routing”:

Skärmavbild 2018-08-21 kl. 19.52.38.png

Now the ESG knows that it should use the DLR to reach the “Web Tier” subnet.
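To see why the static route matters, here is a rough Python sketch of a longest-prefix-match lookup on the ESG’s routing table. The default gateway 10.0.1.1 and the “Web Tier” subnet are from this lab. The next hop 172.16.0.2 (the DLR’s “2ESG” interface) is an assumption, since the actual address is only shown in the diagram. Connected routes are left out to keep the example short.

```python
import ipaddress

# Illustrative ESG routing table.
# ASSUMPTION: 172.16.0.2 is the DLR's "2ESG" interface; use your own address.
ROUTES = [
    (ipaddress.ip_network("172.16.1.0/24"), "172.16.0.2"),  # static route to "Web Tier"
    (ipaddress.ip_network("0.0.0.0/0"), "10.0.1.1"),        # default gateway
]


def next_hop(destination: str) -> str:
    """Return the next hop of the most specific route that matches destination."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if dest in net]
    # The route with the longest prefix wins.
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


if __name__ == "__main__":
    print("172.16.1.50 ->", next_hop("172.16.1.50"))
    print("8.8.8.8     ->", next_hop("8.8.8.8"))
```

Without the first entry, traffic for 172.16.1.0/24 would match only the default route and be sent to the physical gateway instead of the DLR.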

Deploying the Virtual Machines

Now that the NSX infrastructure is in place we’ll continue with the deployment of our two virtual machines in the “Web Tier” VXLAN.

In this lab we use VMware’s Photon OS which is an open source minimal Linux container host optimized for vSphere. Download the OVA for vSphere 6.5 or newer and deploy two virtual machines from it. Make sure to connect both to the “Web Tier” logical switch.

Skärmavbild 2018-08-16 kl. 22.01.26.png

Once booted, the DHCP server on the ESG should have assigned IP addresses to the VMs. Take note of the IPv4 addresses as we need them for our NAT rules:

Skärmavbild 2018-08-19 kl. 16.27.43.png

Configuring NAT

NAT rules are defined on the ESG under the “NAT” tab. First of all we want to provide Internet access to our “Web Tier” virtual machines so we can patch the OS and pull the NGINX container. For this we have to create a source NAT (SNAT) rule that looks like this:

Skärmavbild 2018-08-16 kl. 22.19.32.png

The primary IP address of the ESG “2Physical” interface (10.0.1.240) is used to represent any IP address from the “Web Tier” subnet (172.16.1.0/24) on the physical network.
Don’t worry about the protocol setting for this lab.

As we also want to SSH into our virtual machines from the physical network, we need to create two destination NAT (DNAT) rules. One for each server. For my web-1 virtual machine the DNAT rule looks like this:

Skärmavbild 2018-08-16 kl. 22.27.53.png

IP address 10.0.1.242, one of the secondary IP addresses of the ESG’s “2Physical” interface, is translated to web-1’s IP address (172.16.1.50).
Don’t forget to create one more DNAT rule for the web-2 virtual machine using one of the other secondary IP addresses of the ESG’s “2Physical” interface (I chose 10.0.1.241 for example).
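Conceptually the three rules boil down to two small lookup tables. The Python sketch below models only the address rewriting, not ports or connection tracking, so treat it as an illustration of the idea rather than of how the ESG implements NAT. The addresses are the ones from my lab, except 172.16.1.51 for web-2, which is an assumption (use whatever DHCP handed out).

```python
import ipaddress

WEB_TIER = ipaddress.ip_network("172.16.1.0/24")
SNAT_ADDRESS = "10.0.1.240"  # primary IP of the ESG "2Physical" interface

# DNAT: address on the physical network -> address in the "Web Tier".
# ASSUMPTION: web-2 received 172.16.1.51 from the DHCP server.
DNAT_RULES = {
    "10.0.1.242": "172.16.1.50",  # web-1
    "10.0.1.241": "172.16.1.51",  # web-2
}


def translate_outbound(source_ip: str) -> str:
    """SNAT: hide any 'Web Tier' source address behind the ESG's primary IP."""
    if ipaddress.ip_address(source_ip) in WEB_TIER:
        return SNAT_ADDRESS
    return source_ip


def translate_inbound(destination_ip: str) -> str:
    """DNAT: rewrite a secondary ESG address to the matching web server."""
    return DNAT_RULES.get(destination_ip, destination_ip)


if __name__ == "__main__":
    print("outbound from 172.16.1.50 leaves as", translate_outbound("172.16.1.50"))
    print("inbound to 10.0.1.242 is delivered to", translate_inbound("10.0.1.242"))
```

Addresses that match no rule pass through unchanged, which mirrors the fact that the VIP address (10.0.1.243) is handled by the load balancer and not by NAT.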

Installing the Web Servers

With the NAT rules in place we should be able to SSH into our virtual machines from the physical network:

ssh root@10.0.1.242

The default root password on the Photon OS OVA is “changeme”, which we’ll have to change at first login.

We can now test if our SNAT rule is working by querying and installing available OS updates. Do this on both machines:

tdnf update

When it’s done updating reboot the servers:

reboot

After reboot, SSH back into the virtual machines to enable and start the Docker engine:

systemctl enable docker
systemctl start docker

Next pull the NGINX Docker image:

docker pull nginx

We want the NGINX server to use an “index.html” that is stored outside of the Docker container. Create an “index.html” on each of the servers:

mkdir -p ~/docker-nginx/html
vi ~/docker-nginx/html/index.html

Copy and paste this into index.html on web-1:

<h1>This is web-1</h1>

And paste this into index.html on web-2:

<h1>This is web-2</h1>

Set the following permissions on the html directory (never do this in production):

chmod -R 755 ~/docker-nginx

We can now run the container. On each of the virtual machines issue the following command:

docker run --name docker-nginx -p 80:80 -d -v ~/docker-nginx/html:/usr/share/nginx/html nginx

With the containers up and running we should be able to access the web servers from the physical network. In my case I browse to http://10.0.1.242 and hit web-1’s “index.html”:

Skärmavbild 2018-08-18 kl. 20.05.16

Make sure this works on both web servers before moving on.

Configuring the Logical Load Balancer

The web servers are up and running. It’s time to enable high availability service and start load balancing.

We start by defining a new server pool we’ll call “web-pool”. On the ESG click the “Load Balancer” tab and choose “Pools”. Add a new pool:

Skärmavbild 2018-08-18 kl. 22.02.43.png

Add our two virtual machines to the pool. Notice that you can specify vCenter/NSX objects (which require VMware Tools or ARP/DHCP Snooping) as well as IP addresses. The Photon OS virtual machines are running open-vm-tools meaning we can pick virtual machine objects for these servers.

The other thing we need to specify here is the service monitor. Choose “default_http_monitor” which by default executes an “HTTP GET” to the servers in the pool every 5 seconds.

Head over to “Virtual Servers”. Here we configure our virtual IP (VIP) and tie it to our server pool:

Skärmavbild 2018-08-18 kl. 22.30.32

Give the virtual server a name and enter the last available secondary IP address of the ESG’s “2Physical” interface (10.0.1.243 in my case). Select the pool that we just created as the default pool.

Finally enable the load balancer service under “Global Configuration”.

Testing the Logical Load Balancer

It’s time to test the load balancer. Open a web browser and navigate to http://VIP_IP. In my case http://10.0.1.243. You should now hit one of the web servers’ “index.html”:

Skärmavbild 2018-08-19 kl. 17.11.37.png

Now reload the page in your browser.

Skärmavbild 2018-08-19 kl. 17.14.31.png

Voilà! The load balancer sent the request to the next available server in the pool which in my case was web-1.

Every time you reload the web page you will end up on the next available server in the pool. This is the intended behavior when using the Round Robin load balancing algorithm.
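If Round Robin is new to you, it is nothing more than walking through the pool members in a fixed order and starting over at the end. A tiny Python illustration using the two server names from this lab (this is the general idea, not NSX code):

```python
from itertools import cycle

# The pool members in the order they were added to "web-pool".
pool = cycle(["web-1", "web-2"])

# Four page reloads in a row.
served_by = [next(pool) for _ in range(4)]
print(served_by)  # ['web-1', 'web-2', 'web-1', 'web-2']
```

Which server answers your very first request depends on where the load balancer happens to be in its cycle, so don’t be surprised if you start on web-2.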

Let’s stop the NGINX container on one of the virtual machines:

docker stop docker-nginx

Now reload the web page a couple of times. Instead of giving you a timeout every other time, the load balancer keeps sending the requests to the only healthy server left in the pool.

Start the NGINX container again:

docker start docker-nginx

Keep reloading the web page in your browser and you’ll notice that it only takes a few seconds before the load balancer picks up the server and starts sending requests to it again.
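What we just observed is Round Robin combined with the service monitor’s UP/DOWN status. Here is a simplified Python model of that behavior. The Pool class and its method names are invented for this example, and in the real load balancer it is the service monitor, not us, that flips the status after its health checks.

```python
class Pool:
    """Toy model of a server pool with Round Robin and health status."""

    def __init__(self, members):
        self.members = list(members)
        self.status = {member: "UP" for member in self.members}
        self._next = 0  # index of the member that is next in line

    def mark(self, member, status):
        """Record the result of a health check ('UP' or 'DOWN')."""
        self.status[member] = status

    def pick(self):
        """Return the next member that is UP, or None if all are DOWN."""
        for _ in range(len(self.members)):
            member = self.members[self._next]
            self._next = (self._next + 1) % len(self.members)
            if self.status[member] == "UP":
                return member
        return None


if __name__ == "__main__":
    pool = Pool(["web-1", "web-2"])
    print([pool.pick() for _ in range(2)])   # both servers take turns

    pool.mark("web-2", "DOWN")               # docker stop docker-nginx on web-2
    print([pool.pick() for _ in range(3)])   # only web-1 answers

    pool.mark("web-2", "UP")                 # docker start docker-nginx on web-2
    print([pool.pick() for _ in range(2)])   # web-2 is back in rotation
```

The short delay you notice in the lab comes from the monitor interval: a server is only marked “UP” again after the monitor has checked it successfully.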

Conclusion

NSX-V, a key component of the VMware vSphere SDDC, delivers a very competent load balancer that is easy to install and configure.
In this short exercise we kept things really simple. There are more things to consider in a production environment, but building a lab like this hopefully gives you some idea of the concepts and components involved in configuring an NSX-V logical load balancer.