rutgerblom.com

Kubernetes – NSX-T Lab

January 1, 2020
A while back Dumlu Timuralp published an excellent guide on integrating NSX-T 2.5 with K8s. If you haven’t read it already I strongly recommend that you have a look at it. The guide goes through every step of configuring the integration and does a great job explaining the architecture and components that make up this solution.

Today’s article is a quick walkthrough of my NSX-T integrated K8s lab which is based on Dumlu’s guide.

Bill of materials

The following components are used in my NSX-T – K8s lab:
1. vSphere 6.7 U3
2. NSX-T 2.5.1
3. Ubuntu 18.04
4. Docker CE 18.06
5. Kubernetes 1.16

The lab environment

The starting point before setting up the K8s integration:

A standard vSphere platform consisting of a couple of ESXi hosts and a vCenter server. NSX-T has been deployed and an overlay transport zone has been configured.

On the logical network side of things I have a very basic setup with just a Tier-0 gateway for the North-South connectivity.

The above infrastructure is pretty much always in place and mostly left untouched. The components for the NSX-T – K8s integration are connected to this existing infrastructure. Let’s have a look at how that’s done.

NSX-T constructs

A couple of NSX-T constructs are needed for the K8s integration:
- Tier-1 gateway for K8s node management
- Segment for K8s node management
- Segment for K8s node data plane
- IP block for K8s namespaces
- IP block for K8s namespaces not doing source NAT
- IP pool for K8s Ingress or LoadBalancer service type
- IP pool for source NATing K8s Pods in the namespaces
- Two distributed firewall policies
Placing the components on the diagram for some clarity:

Nothing too complex, but creating and configuring this by hand takes some time. Especially when doing this many times, which is not uncommon in my lab, it gets boring.

Luckily, the NSX-T hierarchical policy API helps me out here. I simply specify the desired topology and its configuration as a piece of code and then tell the API to create it for me.

So here’s the JSON code for the topology and components above. If you want to use it yourself, make sure that you change the values for:
- tier0_path – the path to your Tier-0 gateway
- transport_zone_path – the path to your overlay transport zone
I send this code as the body of a PATCH request to:
```
PATCH https://<nsx-mgr>/policy/api/v1/infra
```
And in a matter of seconds the components are in place.
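To give an idea of what such a hierarchical body looks like, here is a much smaller Python sketch that builds one with just a Tier-1 gateway and a single connected segment. It is not my lab file: the Tier-1 ID, the gateway address, and the placeholder paths are assumptions made up for illustration, and nothing is sent to an NSX Manager.

```python
import json


def build_infra_body(tier0_path, transport_zone_path):
    """Build a minimal hierarchical Policy API body: one Tier-1 and one segment."""
    tier1_id = "k8s-tier1"  # hypothetical ID, pick your own
    tier1 = {
        "resource_type": "ChildTier1",
        "Tier1": {
            "resource_type": "Tier1",
            "id": tier1_id,
            "display_name": tier1_id,
            "tier0_path": tier0_path,
        },
    }
    segment = {
        "resource_type": "ChildSegment",
        "Segment": {
            "resource_type": "Segment",
            "id": "k8s-nodemanagement",
            "display_name": "k8s-nodemanagement",
            "transport_zone_path": transport_zone_path,
            # Attach the segment to the Tier-1 declared in the same body.
            "connectivity_path": "/infra/tier-1s/" + tier1_id,
            # Hypothetical gateway address (documentation range).
            "subnets": [{"gateway_address": "192.0.2.1/24"}],
        },
    }
    return {"resource_type": "Infra", "children": [tier1, segment]}


body = build_infra_body(
    tier0_path="/infra/tier-0s/your-tier0-id",
    transport_zone_path=(
        "/infra/sites/default/enforcement-points/default/"
        "transport-zones/your-overlay-tz-id"
    ),
)
print(json.dumps(body, indent=2))
```

The printed JSON is what would go into the body of the PATCH request; the real topology simply has more children in the same list.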

Ubuntu VMs

On the compute side my K8s cluster consists of three Ubuntu VMs: A master and two worker nodes. Each VM is configured with two NICs where one connects to the “k8s-nodetransport” segment and the other to the “k8s-nodemanagement” segment:

To get these three VMs up and running as quickly as possible I built a vApp and stored it as a template in a vSphere content library:

Each of the VMs in this vApp template is pre-configured as follows:
- Hostname
- IP stack on the mgt NIC
- Persistent storage directories
- Python
- Docker
- Kubernetes (installed not initialized)
- NSX Container Plug-in installation files
- NSX Container Plug-in container image loaded to the local Docker repository

K8s cluster

Once the vApp is deployed the first thing I do is to initialize the K8s cluster:
```
k8s-master:~$ sudo kubeadm init
```
The two worker nodes are joined to the cluster. For example:
```
k8s-worker1:~$ sudo kubeadm join 10.190.22.10:6443 --token 8xlrqd.uuvi16c7bgacxihe --discovery-token-ca-cert-hash sha256:5ef8bae3ea509e9605bef2a931f0eeccce40da8ae857174df35fa9fd17d54371
```
At this point “kubectl get nodes” shows me:
```
kubectl get nodes
NAME          STATUS     ROLES    AGE     VERSION
k8s-master    NotReady   master   2m26s   v1.16.4
k8s-worker1   NotReady   <none>   67s     v1.16.4
k8s-worker2   NotReady   <none>   15s     v1.16.4
```
Without a CNI plug-in installed the “NotReady” status is expected.

NSX container plug-in

Before installing NCP I need to tag the three segment ports of the “k8s-nodetransport” segment as follows:

Scope	Tag (k8s-master)	Tag (k8s-worker1)	Tag (k8s-worker2)
ncp/node_name	k8s-master	k8s-worker1	k8s-worker2
ncp/cluster	k8s-cluster	k8s-cluster	k8s-cluster
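
If you prefer to apply these tags through the API rather than the UI, each one is a scope/tag pair. The following Python sketch only builds the tag lists from the table above; the helper name is my own and the call that actually updates the segment port is left out.

```python
def ncp_port_tags(node_name, cluster_name="k8s-cluster"):
    """Return the two scope/tag pairs for one node's segment port."""
    return [
        {"scope": "ncp/node_name", "tag": node_name},
        {"scope": "ncp/cluster", "tag": cluster_name},
    ]


nodes = ["k8s-master", "k8s-worker1", "k8s-worker2"]
tags_by_node = {name: ncp_port_tags(name) for name in nodes}

for name, tags in tags_by_node.items():
    print(name, tags)
```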

The ncp-ubuntu.yaml manifest that deploys NCP is already prepared for my lab environment. If you want to use it, make sure you change the values for the following settings so that they match your environment:
- nsx_api_managers
- nsx_api_user
- nsx_api_password
- overlay_tz
- tier0_gateway
The manifest is aligned with the JSON that I use to create the NSX-T components.

I install the NSX container plug-in from the master node by running:
```
kubectl apply -f ncp-ubuntu.yaml
```
After a minute or two the pods are running in their own “nsx-system” namespace:
```
kubectl get pods -n nsx-system
NAME                       READY   STATUS    RESTARTS   AGE
nsx-ncp-6978b9cb69-899q8   1/1     Running   0          2m8s
nsx-ncp-bootstrap-8879t    1/1     Running   0          2m8s
nsx-ncp-bootstrap-xlnqh    1/1     Running   0          2m8s
nsx-ncp-bootstrap-zqxh6    1/1     Running   0          2m8s
nsx-node-agent-7twld       3/3     Running   0          2m8s
nsx-node-agent-9n64w       3/3     Running   0          2m8s
nsx-node-agent-jww7g       3/3     Running   0          2m8s
```
The node status has changed to “Ready” now that NCP is installed:
```
NAME          STATUS   ROLES    AGE   VERSION
k8s-master    Ready    master   1h    v1.16.4
k8s-worker1   Ready    <none>   1h    v1.16.4
k8s-worker2   Ready    <none>   1h    v1.16.4
```
Deploy a workload

To have something to play around with I deploy a containerized WordPress in my K8s cluster. Here are the yaml files that I use to deploy WordPress in case you want to set this up yourself.

First I create a separate namespace for the workload:
```
kubectl create -f namespace.yaml
```
Next, I deploy WordPress in this namespace:
```
kubectl apply -k ./ -n wp
```
Running a “kubectl get pods -n wp” shows me something like this:
```
kubectl get pods -n wp
NAME                               READY   STATUS    RESTARTS   AGE
wordpress-55ddbf6d75-7zjc8         1/1     Running   1          109s
wordpress-mysql-78dddb6bf7-n8pvn   1/1     Running   0          109s
```
Running “kubectl get service -n wp” shows the external IP that is assigned by NSX-T from the “k8s-lb-pool”:
```
kubectl get service -n wp
NAME              TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)        AGE
wordpress         LoadBalancer   10.101.4.78   10.190.10.51   80:30008/TCP   3m38s
wordpress-mysql   ClusterIP      None          <none>         3306/TCP       3m38s
```
And browsing to “10.190.10.51” brings up a familiar page:

NSX-T container networking is operational. Happy blogging! 🙂

Summary

No rocket science here, but using the NSX-T hierarchical policy API is a time saver and so are vApp templates and yaml manifests. Put something like Ansible on top of this and you’re looking at a fully automated K8s with NSX-T deployment.

Hopefully this post inspires or maybe even helps you set up your own NSX-T – K8s integration. It’s a pretty awesome solution and one I plan on covering in future posts as I learn more about it myself.

Stay tuned!


NSX-T Distributed Firewall Threshold Monitoring

December 18, 2019

Like any other firewall the NSX-T Distributed Firewall (DFW) consumes memory and CPU. Unlike other firewalls the DFW’s resource consumption is distributed, taking place on the transport nodes where the workloads it protects reside.

Memory allocation

An ESXi transport node allocates a fixed amount of memory for the different DFW components. The amount of memory allocated depends on the total amount of RAM installed. For an ESXi host with 128GB RAM or more the allocation looks like this (NSX-T version 2.5):

DFW Component	Description	Memory Max Size (MB)
vsip-attr	Stores additional attributes used by the L7 context engine	1024
vsip-flow	Stores flow monitoring data	768
vsip-fqdn	Stores resolved FQDN addresses	512
vsip-module	Memory allocated to the vsip kernel process	2560
vsip-rules	Stores DFW rules, address sets and containers	3070
vsip-si	Memory allocated to the service insertion architecture	128
vsip-state	Stores DFW state (existing connections/connection table)	512

Thresholds

For both DFW memory and CPU usage the default threshold is set at 90%. You can see thresholds and current resource usage by running the “nsxcli -c get firewall thresholds” command on an ESXi transport node:

A similar command can be used from an NSX Manager node: “on <transport-node-id> exec get firewall thresholds”.

It’s nice that we can monitor the DFW resource usage on a per transport node basis, but in most environments this method isn’t very practical.

In today’s article I want to have a look at two things concerning DFW resource monitoring. Firstly, at how to configure custom thresholds for memory and CPU usage. Secondly, at how to set up central threshold monitoring with alerting.

Configuring custom DFW thresholds

Below are the steps at a high level for configuring custom DFW thresholds:

1. Create an NSGroup containing transport nodes
2. Create a threshold profile
3. Apply threshold profile
4. Verify

Time to get our hands dirty!

Step 1 – Create an NSGroup containing transport nodes

We need to group our transport nodes. Currently only NSGroups, the ones managed by the MP API, support having transport nodes as members.

NSGroups are managed under Advanced Networking & Security > Inventory > Groups. I’m creating an NSGroup called “esxi-tn” with a membership criteria that will add all the host transport nodes as members:

Copy the NSGroup ID to a text file as we need it at step 3:

Step 2 – Create a threshold profile

Using a REST API client we’re going to make a POST request to the NSX MP API:

POST https://{{nsx-manager-fqdn}}/api/v1/firewall/profiles

The request body contains the following piece of JSON code:

{
     "cpu_threshold_percentage" : 75,
     "display_name" : "ESXi DFW Threshold Profile",
     "mem_threshold_percentage" : 75,
     "resource_type" : "FirewallCpuMemThresholdsProfile"
 }

The values for “cpu_threshold_percentage” and “mem_threshold_percentage” will depend on your requirements. For this exercise I’m configuring a threshold at 75% for both memory and CPU usage.
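
If you want to generate this body instead of typing it, a small Python helper is enough. This is just a sketch: the function name is mine, and the 1 to 100 sanity check is an assumption about what a sensible percentage is, so the range NSX-T actually accepts may be narrower.

```python
import json


def threshold_profile(display_name, cpu_pct, mem_pct):
    """Build the request body for a DFW CPU/memory threshold profile."""
    for label, value in (("cpu", cpu_pct), ("mem", mem_pct)):
        # Assumed sanity check only; not the validation NSX-T itself performs.
        if not 1 <= value <= 100:
            raise ValueError("{} threshold must be 1-100, got {}".format(label, value))
    return {
        "cpu_threshold_percentage": cpu_pct,
        "display_name": display_name,
        "mem_threshold_percentage": mem_pct,
        "resource_type": "FirewallCpuMemThresholdsProfile",
    }


profile = threshold_profile("ESXi DFW Threshold Profile", 75, 75)
print(json.dumps(profile, indent=2))
```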

The POST request body and the result:

Copy the threshold profile’s ID from the result to a text file as we need it in the next step.

Step 3 – Apply threshold profile

The second API call configures a service-config that links the threshold profile to the NSGroup:

POST https://{{nsx-manager-fqdn}}/api/v1/service-configs

With the following JSON code as the request body:

{
     "display_name":"DfwCpuMemServiceConfig",
     "profiles":[
             {
                 "profile_type":"FirewallCpuMemThresholdsProfile",
                 "target_id":"c4a003e0-468d-4582-a24e-ada96742f0ca"
             }
         ],
     "precedence": 10,
     "applied_to": [
         {
             "target_id":"744db991-e504-4e4b-83eb-c60b94a7f785",
             "target_type" : "NSGroup"
         }
     ]
 }

The threshold profile and the NSGroup IDs that we copied to a text file earlier are used as the values for the two target_ids.
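
To make it harder to mix up the two IDs, the body can be built from named arguments. A minimal Python sketch, using the IDs from my lab as example values (yours will differ) and a helper name of my own choosing:

```python
import json


def service_config(profile_id, nsgroup_id, precedence=10):
    """Build a service-config body linking a threshold profile to an NSGroup."""
    return {
        "display_name": "DfwCpuMemServiceConfig",
        "profiles": [
            {
                "profile_type": "FirewallCpuMemThresholdsProfile",
                "target_id": profile_id,  # ID of the threshold profile (step 2)
            }
        ],
        "precedence": precedence,
        "applied_to": [
            {
                "target_id": nsgroup_id,  # ID of the NSGroup (step 1)
                "target_type": "NSGroup",
            }
        ],
    }


config = service_config(
    profile_id="c4a003e0-468d-4582-a24e-ada96742f0ca",
    nsgroup_id="744db991-e504-4e4b-83eb-c60b94a7f785",
)
print(json.dumps(config, indent=2))
```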

The POST request body and the result:

Step 4 – Verify

An easy way to verify that the new DFW thresholds have been applied is to run the “get firewall thresholds” NSXCLI command. This time I’ll run it from an NSX Manager node:

As we see the new threshold value of 75% has been applied.

Setting up alerting

You might wonder what actually happens when a threshold is crossed. Currently there’s no alarm framework in NSX-T, so the only thing that happens is that a threshold event is logged to syslog.

Luckily there’s always vRealize Log Insight. Configured as a syslog target for the NSX-T platform, DFW threshold events end up there too:

A quick look at a DFW threshold event. We see things like the transport node, the DFW component that crossed the threshold, as well as the configured threshold and the current usage.

Now that we know what a threshold event looks like, it’s easy to configure an alert based on the query in vRLI:

text contains “threshold event is raised”
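
The same match is easy to reproduce outside vRLI, for example when testing against a plain syslog file. The Python sketch below applies it to a few sample lines. Those lines are invented placeholders and not real NSX-T log output, and matching case-insensitively is my assumption.

```python
ALERT_PHRASE = "threshold event is raised"


def is_threshold_event(message):
    """Return True if a log line contains the alert phrase (case-insensitive)."""
    return ALERT_PHRASE in message.lower()


# Hypothetical log lines, made up for this example.
sample_lines = [
    "hypothetical: firewall rules published on host-a",
    "hypothetical: Threshold event is raised for a DFW component on host-b",
    "hypothetical: heartbeat ok",
]

matches = [line for line in sample_lines if is_threshold_event(line)]
print(matches)
```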

Click on “Create Alert from Query”:

Fill out the details for the new alert:

And that’s it. From now on you’ll be notified each time a DFW threshold is crossed.

Summary

Configuring custom DFW thresholds and monitoring these with Log Insight isn’t too hard to set up.
It’s true that with a proper DFW design and by sticking to good practices for implementation, problems related to DFW memory or CPU usage are rare. That being said, it’s not a bad idea to keep an eye on the DFW’s resource utilization. Just in case.

Locking NSX-T Firewall Policies

December 15, 2019
After receiving a couple of questions about the NSX-T firewall policy locking feature, I decided to write a short blog post about it.

The purpose of locking a firewall policy

The easy part first. As explained in the official NSX-T documentation we lock a firewall policy to prevent multiple users from editing the same section.

Locks could be short term, like when a team is working in the NSX Manager firewall UI at the same time and wants to avoid configuration collisions. Locks could also be long(er) term, for example when somebody is tasked with building a more complex firewall ruleset or when policies are subject to change management.

Let’s start locking then!

Here’s where it can get a bit confusing. While the option to lock a policy is always available, it won’t have any effect until you implement and use Role Based Access Control (RBAC) for NSX-T management. Why?

The default “admin” account, which is the only account you can work with in the NSX Manager UI without RBAC, has the “Enterprise Admin” role assigned to it. This superuser role has permission to make changes to firewall policies even when they are locked.

So, if your team is using this default account (very bad practice) or individual accounts with the “Enterprise Admin” role assigned, you can lock firewall policies all you want, but these locks won’t have any actual effect.

Let’s fix this then!

Yes. As said this requires that we implement and use RBAC for NSX-T management first. There’s documentation available that will help you set this up so I won’t go through that in this article. On a high level the process looks like this:
1. Deploy vIDM
2. Connect vIDM to Active Directory
3. Configure remote app access for the NSX Manager
4. Configure NSX Manager to use vIDM for AAA
When that’s done we can start assigning NSX-T roles to Active Directory users:

Example

In this example I’m assigning the “Security Engineer” role to two AD users:

The two security engineers have been configured:

Let’s pretend “jsmith@demo.local” logs in to the NSX Manager UI and starts working on a new DFW policy:

When called into a meeting jsmith locks the policy he’s working on to prevent anybody from making changes:

Next, the other security engineer “pgroot@demo.local” logs in to the NSX Manager UI. She has a look at the new policy and decides to make a minor change to it. When she tries to publish the change the following message appears:

The change can’t be realized with her account. This is the expected and desired behaviour. The policy lock is enforced with RBAC implemented.

Summary

While most organizations have the RBAC components for NSX-T management in place (vIDM, AD, etc), actually leveraging NSX-T management roles so that things like locking firewall policies work is perhaps another thing. Hopefully this short article gave you some better understanding of how to get started.

Tier-1 Failure Domain

December 1, 2019

With every new release of NSX-T interesting features are added to the platform. Take failure domain for example.

Introduced in version 2.5, failure domain adds another layer of protection for the centralized services running on Tier-1 Gateways. It basically facilitates a rack aware placement mechanism for the Tier-1 service router (SR) components.

In today’s article I’m going to do a simple failure domain proof of concept. I’ll walk through the configuration steps for setting up failure domain and verify its functionality.

The lab environment

For this exercise I installed a vSphere cluster consisting of four ESXi hosts divided over two racks. I’m calling these the Edge racks and made this very advanced diagram:

I then deployed four NSX-T Edge nodes (EN1 – EN4), one on each host, and added these to NSX-T Edge Cluster “T0 Cluster ECMP”:

I threw in a Tier-0 Gateway called “T0-01” which is running in Active-Active HA mode with ECMP enabled. The Tier-0’s 8 uplinks are all taking part in forwarding North-South traffic, simultaneously:

Finally, I deployed four more Edge nodes (EN5 – EN8), one on each host, and added these to Edge cluster “T1 Cluster”:

The eight Edge nodes in the NSX Manager UI:

Next step – Create Tier-1s

I will create Tier-1 Routers (Manager API) as opposed to Tier-1 Gateways (Policy API). This is because the API call to trigger a Tier-1 SR reallocation, which I want to run later on, only works on Tier-1 Routers. This has nothing to do with the failure domain feature itself, which is of course compatible with both Tier-1 Routers and Tier-1 Gateways.

Configuring the first Tier-1 called “T1-01”:

I’m selecting the “T1 Cluster” Edge Cluster and no specific Edge Cluster Members.

Both of the Tier-1s and the Tier-0 listed in the NSX Manager UI:

Tier-1 service routers

Selecting an Edge cluster for a Tier-1 indicates that you intend to run one or more centralized services on that Tier-1. This means that one active and one standby service router (SR) are instantiated on two different Edge nodes in that cluster (a Tier-1 SR always runs in Active-Standby HA mode).

By the way, you should not select an Edge Cluster for a Tier-1 if you don’t intend to run centralized services on it as this can lead to unintended hairpinning of traffic over the Edge nodes.

You noticed that I didn’t specify any Edge Cluster Members for the SRs. This results in the management plane picking them for me. So where did they end up?

Clicking the Active-Standby link for each of the Tier-1 Routers reveals the SRs location. “T1-01” has its active SR on Edge node EN7 and its standby SR on Edge node EN5:

“T1-02” has its active SR on Edge node EN8 and its standby SR on Edge node EN6:

Fine. Let’s have a look at the Edge rack again now that we have introduced these Tier-1 SRs to the environment:

My two Tier-1s are in separate racks. Great! Or is it?
With the current Tier-1 SR placement a single Edge rack failure will result in one of the Tier-1 Routers losing both its active and the standby SR. That’s pretty bad.

Failure Domain

Failure domain prevents this silly SR placement from happening. Correctly configured, failure domain ensures that the active and standby SRs of a Tier-1 are always placed in different racks.

Sounds great. Time to set this up.

Step 1 – Create two failure domains

Failure domains are created using a POST request to the NSX API at:

POST https://{{nsx-manager-fqdn}}/api/v1/failure-domains/

The request body for my first failure domain contains the following piece of JSON code:

{   
"display_name": "Rack-1"
}

The JSON code for my second failure domain:

{   
"display_name": "Rack-2"
}

Creating the first failure domain using Postman:

Copy the value for “id” from the request result for each of the failure domains as we need these in the next step.

Step 2 – Assign Edge nodes to failure domains

The Edge nodes in the “T1 Cluster” need to be assigned to their respective failure domains. This too is done through an API call to the Manager API.

For each Edge node we first retrieve its current configuration using the following GET request:

GET https://{{nsx-manager-fqdn}}/api/v1/transport-nodes/{{edge-node-id}}

You can find the ID of an Edge node in the NSX Manager UI (or via API):

The GET request for Edge node EN5:

Copy the request result to the body of a new PUT request and change the value for “failure_domain_id” to match the ID of one of the newly created failure domains.

PUT https://{{nsx-manager-fqdn}}/api/v1/transport-nodes/{{edge-node-id}}

Which failure domain ID to use depends on the rack location of the Edge node. The following table lists the failure domain plan for my Tier-1 Edge nodes:

Edge Node	Failure Domain	Failure Domain ID
EN5	Rack-1	7e1af661-8e2c-43f7-924f-68eabce0f40b
EN6	Rack-2	d78707df-2f7f-48a9-9e3e-98a5523901c7
EN7	Rack-1	7e1af661-8e2c-43f7-924f-68eabce0f40b
EN8	Rack-2	d78707df-2f7f-48a9-9e3e-98a5523901c7

Four GET/PUT requests later the Edge nodes have been assigned to the correct failure domains.
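
The edit between the GET and the PUT can be scripted. In the Python sketch below the GET result is a trimmed, made-up stand-in for a real transport node response, and the helper name is mine. The search for “failure_domain_id” is recursive so that the sketch doesn’t depend on where exactly the key sits in the response.

```python
import copy

FAILURE_DOMAINS = {
    "Rack-1": "7e1af661-8e2c-43f7-924f-68eabce0f40b",
    "Rack-2": "d78707df-2f7f-48a9-9e3e-98a5523901c7",
}
EDGE_RACKS = {"EN5": "Rack-1", "EN6": "Rack-2", "EN7": "Rack-1", "EN8": "Rack-2"}


def set_failure_domain(config, failure_domain_id):
    """Return a copy of the config with every failure_domain_id key updated."""
    updated = copy.deepcopy(config)

    def walk(node):
        found = False
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "failure_domain_id":
                    node[key] = failure_domain_id
                    found = True
                else:
                    found = walk(value) or found
        elif isinstance(node, list):
            for item in node:
                found = walk(item) or found
        return found

    if not walk(updated):
        raise KeyError("failure_domain_id not found in config")
    return updated


# Hypothetical, heavily trimmed GET result for Edge node EN5.
sample_get_result = {
    "display_name": "EN5",
    "failure_domain_id": "old-default-id",
    "_revision": 3,
}

put_body = set_failure_domain(
    sample_get_result, FAILURE_DOMAINS[EDGE_RACKS["EN5"]]
)
print(put_body)
```

Everything else from the GET result, including “_revision”, is passed through unchanged into the PUT body.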

Step 3 – Configure the Edge Cluster

The “T1 Cluster” Edge Cluster needs to be configured for failure domain based placement. This is also done via the API.

First a GET request to retrieve the current configuration of the Edge Cluster:

GET https://{{nsx-manager-fqdn}}/api/v1/edge-clusters/{{edge-cluster-id}}

The “edge-cluster-id” can be found in the NSX Manager UI (or via API):

The GET request’s result in JSON:

Again, you copy the request result to the body of a new PUT request. The only thing that we need to change here is the value for “allocation_rules”

from:

"allocation_rules": [],

to:

"allocation_rules": [ {"action": {"enabled": true,"action_type": "AllocationBasedOnFailureDomain" } } ],

Send the PUT request to:

PUT https://{{nsx-manager-fqdn}}/api/v1/edge-clusters/{{edge-cluster-id}}

And we’re done. From now on this Edge Cluster will perform failure domain based placement for new Tier-1 SRs.
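
The change to the Edge Cluster configuration is small enough to script as well. A Python sketch, where the GET result is a made-up, trimmed stand-in for the real response and the helper name is my own:

```python
import copy

FAILURE_DOMAIN_RULE = {
    "action": {
        "enabled": True,
        "action_type": "AllocationBasedOnFailureDomain",
    }
}


def enable_failure_domain_allocation(edge_cluster):
    """Return a copy of the Edge Cluster config with the allocation rule set."""
    updated = copy.deepcopy(edge_cluster)
    updated["allocation_rules"] = [copy.deepcopy(FAILURE_DOMAIN_RULE)]
    return updated


# Hypothetical, trimmed GET result for the "T1 Cluster" Edge Cluster.
sample_edge_cluster = {
    "display_name": "T1 Cluster",
    "allocation_rules": [],
    "_revision": 1,
}

put_body = enable_failure_domain_allocation(sample_edge_cluster)
print(put_body["allocation_rules"])
```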

A new Tier-1

Let’s put this to the test immediately by creating a new Tier-1.

Here comes “T1-03”:

Once again I’m selecting the “T1 Cluster” Edge cluster and no specific Edge nodes (= Auto Allocated). So where did the management plane decide to place the SRs this time?

The Active SR is on EN5 and the standby SR on EN6. They indeed ended up in separate racks!
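
With more than a handful of Tier-1s, checking the placement by eye gets tedious. The Python sketch below runs the check against the rack plan and the SR locations from this lab; the function name is my own.

```python
EDGE_RACKS = {"EN5": "Rack-1", "EN6": "Rack-2", "EN7": "Rack-1", "EN8": "Rack-2"}


def survives_rack_failure(active, standby, racks=EDGE_RACKS):
    """Return True if the active and standby SRs sit in different racks."""
    return racks[active] != racks[standby]


# (active, standby) Edge node per Tier-1, as observed in the NSX Manager UI.
placements = {
    "T1-01": ("EN7", "EN5"),
    "T1-02": ("EN8", "EN6"),
    "T1-03": ("EN5", "EN6"),
}

results = {
    name: survives_rack_failure(active, standby)
    for name, (active, standby) in placements.items()
}
for name, ok in results.items():
    print(name, "rack redundant" if ok else "both SRs in the same rack")
```

Only “T1-03”, created after failure domains were configured, passes the check.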

Existing Tier-1s

What about the Tier-1 SRs that were deployed before we configured failure domains? Can we trigger a reallocation so that they too are placed in accordance to the new failure domain configuration?

It turns out that we can, but it’s a data plane disruptive operation ~~and, as far as I know, only works for Tier-1s created through the manager API (or in the UI under Advanced Networking & Security).~~ Thank you Gary Hills for letting us know that the reallocate API call works for Tier-1s created in Policy UI/API as well by adding a header to the below request with key “X-Allow-Overwrite” and value of “true”:

A POST request on each of the existing Tier-1s will do the trick:

POST https://{{nsx-manager-fqdn}}/api/v1/logical-routers/{{logical-router-id}}?action=reallocate

The request body should contain the following JSON:

{
   "edge_cluster_id": "{{edge-cluster-id}}"
}

The values for “logical-router-id” and “edge-cluster-id” can be found in the NSX Manager UI (or via API).
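
For those who would rather script this than use Postman, here is a Python sketch that only builds the request without sending it. The manager FQDN and both IDs are placeholders, authentication is left out, and the function name is mine. The “X-Allow-Overwrite” header is added only for Tier-1s created through the Policy UI/API.

```python
import json
import urllib.request


def build_reallocate_request(manager, logical_router_id, edge_cluster_id,
                             policy_object=False):
    """Build (but do not send) the POST request that reallocates Tier-1 SRs."""
    url = "https://{}/api/v1/logical-routers/{}?action=reallocate".format(
        manager, logical_router_id
    )
    headers = {"Content-Type": "application/json"}
    if policy_object:
        # Needed for Tier-1s that were created in the Policy UI/API.
        headers["X-Allow-Overwrite"] = "true"
    body = json.dumps({"edge_cluster_id": edge_cluster_id}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values for illustration only.
req = build_reallocate_request(
    "nsx.example.com", "example-router-id", "example-cluster-id",
    policy_object=True,
)
req_manager_api = build_reallocate_request(
    "nsx.example.com", "example-router-id", "example-cluster-id"
)
print(req.get_method(), req.full_url)
print(req.header_items())
```

Note that urllib stores header names in its own capitalization (“X-allow-overwrite”); HTTP header names are case-insensitive, so this makes no difference on the wire.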

Request accepted by the API:

A reallocation process now takes place behind the scenes. A few moments later we see that the active and standby SRs of the existing Tier-1s are now in separate racks:

Let’s have a last look at the Edge rack after implementation and enforcement of the failure domains:

Looks so much better now!

Summary

Today we had a look at how to set up Tier-1 failure domain in NSX-T 2.5. The goal was to ensure that active and standby Tier-1 SRs ended up in separate racks.

Failure domain is a pretty cool and useful new feature adding extra protection for the Tier-1 SRs. It’s currently configurable via the API only, but that process was straightforward. With just a couple of requests we got failure domains up and running.

Whether Tier-1 failure domain makes sense in your environment will depend on your NSX Edge design, number of Edge nodes, and things like future growth.

Good luck!

Bulk Create NSX-T Segments Using A Postman Data File

November 23, 2019

Imagine this, you’ve been tasked with implementing micro-segmentation in your vSphere environment. You just deployed and configured NSX-T and the next step is to migrate VMs from their VDS port groups to N-VDS segments.

You fire up the vSphere Client and expand the VDS to have a look at the current situation:

It’s pretty bad.

Turns out your VMs are connected to no less than 784 different port groups! Overlay networking/consolidation and re-IP are currently not part of the plan so you’re stuck with these 784 VLANs. You now realize that you need to create 784 VLAN backed segments in NSX-T. Life sucks.

Postman to the rescue

In today’s short post I want to share an easy way that can help you out in a scenario like the one above. It involves the NSX-T Policy API, Postman, and a text file. Let’s go!

Step 1 – Prepare the CSV file

First we need to create a simple text file that contains values for the NSX-T segments and their corresponding VLAN IDs. The format of the comma separated text file is as follows:

segment_name, vlan_id
vlan-1000, 1000
vlan-1001, 1001
vlan-1002, 1002
vlan-1003, 1003
....
....
vlan-1783, 1783

For “segment_name” you use whatever fits your naming convention. I’m saving this file as “segments.csv”:
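
Nobody wants to type 784 rows by hand, so here is a Python sketch that generates the content for a contiguous VLAN range. The “vlan-<id>” naming and the 1000 to 1783 range come from the example above; the function name is mine. The rows are written without spaces after the commas, which is my own choice to keep the column names free of stray whitespace.

```python
import csv
import io


def build_segments_csv(first_vlan, count):
    """Return CSV text with a header row and one row per VLAN ID."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["segment_name", "vlan_id"])
    for vlan_id in range(first_vlan, first_vlan + count):
        writer.writerow(["vlan-{}".format(vlan_id), vlan_id])
    return buffer.getvalue()


csv_text = build_segments_csv(1000, 784)
lines = csv_text.splitlines()
print(lines[0])
print(lines[1])
print(lines[-1])
```

Writing csv_text to a file called “segments.csv” gives you the data file for the next steps.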

Step 2 – Prepare the Postman request

We’re going to leverage the NSX-T Hierarchical API to create these segments by making a PATCH request to:

https://{{nsx-manager-fqdn}}/policy/api/v1/infra

Only a small piece of JSON code is needed in the request body:

{
 "resource_type": "Infra", "children": [{
 "resource_type": "ChildSegment", "marked_for_delete": "false", "Segment": {
 "resource_type": "Segment",
 "type": "DISCONNECTED",
 "id": "{{segment_name}}",
 "display_name": "{{segment_name}}",
  "vlan_ids": [
         "{{vlan_id}}"
       ],
 "path": "/infra/segments/{{segment_name}}",
 "relative_path": "{{segment_name}}",
 "parent_path": "/infra/segments/{{segment_name}}",
 "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/e82afbae-c811-48e1-8946-6e1f62b67871"
 } }]
 }

As you can see the variables “{{segment_name}}” and “{{vlan_id}}” are used a couple of times in this piece of code. Their values will be fetched from the matching columns in the “segments.csv”.
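
To illustrate the substitution, the following Python sketch does roughly the same thing for two rows: it replaces each “{{name}}” placeholder with the value from the matching CSV column. It is only an approximation of what Postman does, the template is a shortened stand-in for the request body above, and unlike Postman it raises an error when a column is missing.

```python
import csv
import io
import json
import re

# Shortened stand-in for the request body, for illustration only.
TEMPLATE = """{
 "resource_type": "Segment",
 "id": "{{segment_name}}",
 "display_name": "{{segment_name}}",
 "vlan_ids": ["{{vlan_id}}"]
}"""

DATA = "segment_name,vlan_id\nvlan-1000,1000\nvlan-1001,1001\n"


def render(template, row):
    """Replace every {{column}} placeholder with the value from the row."""
    def substitute(match):
        return str(row[match.group(1)])
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


# One rendered request body per CSV row, parsed to prove it is valid JSON.
bodies = [
    json.loads(render(TEMPLATE, row))
    for row in csv.DictReader(io.StringIO(DATA))
]
for body in bodies:
    print(body["id"], body["vlan_ids"])
```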

The value for “transport_zone_path” is unique in every NSX-T deployment. You can easily find the ID of your VLAN transport zone in the NSX Manager UI under System > Configuration > Fabric > Transport Zones:

Putting it all together the Postman request will look like this:

I’m saving this request as “Create NSX-T Segments with data file” in a new collection folder called “NSX-T”.

Step 3 – Start the Postman Runner

Click the Runner button to start the collection runner:

In the next screen you select the saved request:

We need to configure a couple of things for this run. The table below lists my settings:

Setting	Value	Comment
Environment	Your NSX-T environment	Have a look at this post for more information about working with Postman environments.
Iterations	784	We have 784 segments in our data file.
Data	segments.csv	The data file.

After selecting your data file you can click the Preview button just to verify that Postman is interpreting the data correctly:

Looks pretty good to me. Time to press the big button:

Running these 784 iterations will take a couple of minutes. You can monitor the progress in the “Run Results” screen:

Notice the “200 OK” status for each iteration which is the NSX-T API’s response to the requests and means it was processed successfully.

Once the Runner is finished it’s time to have a look in the NSX Manager UI under Networking > Connectivity > Segments to see if new segments have been created:

That certainly seems to be the case. All of the 784 VLAN backed segments are there and configured with the correct transport zone and VLAN ID:

Summary

Bulk creating or modifying NSX-T objects can be done in a number of different ways. If coding is your thing you’ll probably have little trouble putting together a tool for this using your preferred language. If you’re more into scripting you can use something like PowerShell. And if you like to work really slowly you can always turn to the NSX Manager UI.

For everybody else there’s Postman. Using this tool in combination with data files offers an easy and quick way for creating or modifying large amounts of NSX-T objects.

You can read more about the NSX-T Policy API in the NSX Policy API: Getting Started Guide. To learn more about working with Postman data files check out this tutorial.

Have fun!

Deploying NSX-T in a Stretched Cluster – Part 2

November 12, 2019

Welcome back! I’m in the process of setting up NSX-T in a stretched cluster environment.

In part 1 I deployed the NSX manager cluster and configured the ESXi hosts as NSX transport nodes. The N-VDS was installed on the ESXi hosts and their vmkernel adapters migrated from the VDS to the N-VDS.

In this second part I will configure the NSX data plane for north-south and east-west networking. Again, there’s a lot to do so let’s begin!

The lab environment

A couple of things happened since the last time I had a look at the lab environment’s diagram:

The vSphere management cluster is now also hosting an NSX manager cluster and the ESXi hosts turned into NSX-T transport nodes.

Speaking of ESXi hosts, here’s a little closer look at one of them:

There’s now an N-VDS instead of a VDS with the three vmkernel adapters Management, vMotion, and vSAN. There are also two new vmkernel adapters which are acting as tunnel endpoints (TEPs) for the NSX overlay networking (Geneve encapsulation/decapsulation).

The infrastructure for east-west networking is largely in place, but without a north-south network path this cluster is pretty isolated.

NSX Edge

The NSX Edge provides a central entrance/exit point for network traffic entering and exiting the SDDC and is exactly what this environment needs.

Deploy edge VMs

I’m deploying a total of four edge VMs (two at each site). I’ll deploy them using the Edge VM OVA package so that I can connect the edge node’s management interface to the NSX-T segment at the time of deployment.

The table below contains the deployment details for the edge VMs:

Setting	en01-a	en01-b	en02-a	en02-b
Name	en01-a	en01-b	en02-a	en02-b
Network 0	site-a-nvds01-management	site-b-nvds01-management	site-a-nvds01-management	site-b-nvds01-management
Network 1	edge-uplink1	edge-uplink1	edge-uplink1	edge-uplink1
Network 2	edge-uplink2	edge-uplink2	edge-uplink2	edge-uplink2
Network 3	not used	not used	not used	not used
Mgmt IP	172.16.41.21/24	172.16.51.21/24	172.16.41.22/24	172.16.51.22/24

Deploying the edge VM using the OVA package:

Configure edge nodes

After deployment the edge nodes need to join the management plane. For this I use the “join management-plane” NSX CLI command:

Once the edge nodes have joined the management plane, I can pick them up in the NSX Manager UI to configure each of them as Edge Transport Nodes. I’m using the following configuration details for this:

Setting	en01-a	en01-b	en02-a	en02-b
Transport Zones	tz-vlan, tz-overlay	tz-vlan, tz-overlay	tz-vlan, tz-overlay	tz-vlan, tz-overlay
N-VDS Name	nvds01	nvds01	nvds01	nvds01
Uplink Profile	up-site-a-edge	up-site-b-edge	up-site-a-edge	up-site-b-edge
IP Assignment	Use Static IP List	Use Static IP List	Use Static IP List	Use Static IP List
Static IP List	172.16.49.30,172.16.49.31	172.16.59.30,172.16.59.31	172.16.49.32,172.16.49.33	172.16.59.32,172.16.59.33
Virtual NICs	fp-eth0 – uplink-1, fp-eth1 – uplink-2	fp-eth0 – uplink-1, fp-eth1 – uplink-2	fp-eth0 – uplink-1, fp-eth1 – uplink-2	fp-eth0 – uplink-1, fp-eth1 – uplink-2

Edge transport nodes are managed under System > Fabric > Nodes > Edge Transport Nodes.

Like the ESXi hosts, all four edge nodes are now fully configured transport nodes:

Edge cluster

The edge transport nodes need to be part of an edge cluster. I will create an edge cluster called edge-cluster01 and add all four nodes to this cluster.

Edge clusters are managed under System > Fabric > Nodes > Edge Clusters:

Anti-affinity rules

The edge VMs shouldn’t be running on the same ESXi host. To prevent this from happening I create two anti-affinity rules on the vSphere cluster; one for the edge VMs at Site A and another for the edge VMs at Site B:

Groups and rules

The edge VMs should also stick to their site. For this I create two host groups and two VM groups. A “virtual machine to host” rule will then make sure that the edge VMs stay pinned to their respective site.

The host group for Site A:

The VM group for the edge VMs at Site B:

The “virtual machine to host” rule keeping edge VMs belonging to Site A on the ESXi hosts of Site A:

The result of having these groups and rules in place becomes visible after a few seconds. Edge VMs are running at the correct site and on separate ESXi hosts within a site:
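The intent of the anti-affinity and “virtual machine to host” rules can be expressed as a small check. The sketch below is just a model of the two rules, not how DRS evaluates them, and the ESXi host names are hypothetical because the article doesn’t name the hosts.

```python
from collections import Counter


def check_placement(vm_host, host_site, vm_site):
    """Return violations of the VM-to-host and anti-affinity rules."""
    violations = []
    # VM-to-host rule: every edge VM runs on a host at its own site.
    for vm, host in vm_host.items():
        if host_site[host] != vm_site[vm]:
            violations.append(f"{vm} runs at the wrong site on {host}")
    # Anti-affinity rule: no ESXi host runs more than one edge VM.
    for host, count in Counter(vm_host.values()).items():
        if count > 1:
            violations.append(f"{host} runs {count} edge VMs")
    return violations


# Hypothetical host names; only the edge VM names come from the article.
host_site = {"esxi01-a": "a", "esxi02-a": "a", "esxi01-b": "b", "esxi02-b": "b"}
vm_site = {"en01-a": "a", "en02-a": "a", "en01-b": "b", "en02-b": "b"}
vm_host = {"en01-a": "esxi01-a", "en02-a": "esxi02-a",
           "en01-b": "esxi01-b", "en02-b": "esxi02-b"}

print(check_placement(vm_host, host_site, vm_site))
```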

That pretty much completes the NSX Edge infrastructure deployment in my stretched cluster.

Routing

Now that the NSX-T Edge is in place, it’s time to set up a connection with the physical network so that packets can actually get in and out of the environment.

Tier-0 gateway

A Tier-0 gateway provides the gateway service between the logical and the physical network and is just what I need.

I’m creating my Tier-0 gateway with the following configuration details:

Setting	Value
Name	tier0-01
High Availability Mode	Active-Active
Edge Cluster	edge-cluster01
Route Re-Distribution	all

Tier-0 gateways are managed under Networking > Connectivity > Tier-0 Gateways.

Interfaces

This Tier-0 will have eight external interfaces mapped to the different edge transport nodes at the two sites. The table below shows the interfaces and their configuration details:

Name	IP Address / Mask	Connected To	Edge Node	MTU
en01-a-uplink01	172.16.47.2/24	site-a-edge-transit01	en01-a	9000
en01-a-uplink02	172.16.48.2/24	site-a-edge-transit02	en01-a	9000
en02-a-uplink01	172.16.47.3/24	site-a-edge-transit01	en02-a	9000
en02-a-uplink02	172.16.48.3/24	site-a-edge-transit02	en02-a	9000
en01-b-uplink01	172.16.57.2/24	site-b-edge-transit01	en01-b	9000
en01-b-uplink02	172.16.58.2/24	site-b-edge-transit02	en01-b	9000
en02-b-uplink01	172.16.57.3/24	site-b-edge-transit01	en02-b	9000
en02-b-uplink02	172.16.58.3/24	site-b-edge-transit02	en02-b	9000
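The interface table follows a regular pattern, so it can also be generated instead of typed. The sketch below reproduces the eight rows from the naming and addressing scheme visible in the table; the function name and the dictionary keys are my own choice for illustration.

```python
def tier0_interfaces():
    """Build the eight Tier-0 external interfaces from the lab's naming scheme."""
    site_octet = {"a": 40, "b": 50}  # Site A uses 172.16.4x.0/24, Site B 172.16.5x.0/24
    rows = []
    for site in ("a", "b"):
        for node_no in (1, 2):
            for uplink_no in (1, 2):
                node = f"en{node_no:02d}-{site}"
                third_octet = site_octet[site] + 6 + uplink_no  # 47/48 or 57/58
                rows.append({
                    "name": f"{node}-uplink{uplink_no:02d}",
                    "ip": f"172.16.{third_octet}.{node_no + 1}/24",  # en01 -> .2, en02 -> .3
                    "segment": f"site-{site}-edge-transit{uplink_no:02d}",
                    "edge_node": node,
                    "mtu": 9000,
                })
    return rows


for row in tier0_interfaces():
    print(row["name"], row["ip"], row["segment"], row["edge_node"], row["mtu"])
```

Generating the rows this way makes it harder to end up with a mismatched IP address or transit segment on one of the eight interfaces.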

The Tier-0 external interfaces are now configured and active:

BGP

The TORs have been configured for BGP already and now I need to set up BGP at the Tier-0 gateway too.

The BGP settings that I will use on the Tier-0 gateway are:

Setting	Value
Local AS	65000
BGP	On
Graceful Restart	Off
Inter SR iBGP	On
ECMP	On
Multipath Relax	On

Configuring BGP details on the Tier-0 gateway:

I’m adding each TOR as a BGP neighbor to the Tier-0 gateway. The following table shows the configuration details for the four BGP neighbor entries:

IP address	BFD	Remote AS	Hold Down	Keep Alive
172.16.47.1	Enabled	65001	12	4
172.16.48.1	Enabled	65001	12	4
172.16.57.1	Enabled	65002	12	4
172.16.58.1	Enabled	65002	12	4
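A few properties of the neighbor table can be checked with a short sketch. It assumes that AS 65002 belongs to the Site B TORs, which I infer from the 172.16.57.x and 172.16.58.x uplink subnets. The "hold time is three times the keepalive interval" rule is the common BGP convention rather than anything NSX-specific.

```python
LOCAL_AS = 65000

# BGP neighbor entries, copied from the table above.
NEIGHBORS = [
    {"ip": "172.16.47.1", "bfd": True, "remote_as": 65001, "hold_down": 12, "keep_alive": 4},
    {"ip": "172.16.48.1", "bfd": True, "remote_as": 65001, "hold_down": 12, "keep_alive": 4},
    {"ip": "172.16.57.1", "bfd": True, "remote_as": 65002, "hold_down": 12, "keep_alive": 4},
    {"ip": "172.16.58.1", "bfd": True, "remote_as": 65002, "hold_down": 12, "keep_alive": 4},
]


def session_type(neighbor, local_as=LOCAL_AS):
    """A session to a different AS is eBGP, otherwise iBGP."""
    return "iBGP" if neighbor["remote_as"] == local_as else "eBGP"


def timers_consistent(neighbor):
    """Conventionally the hold time is three times the keepalive interval."""
    return neighbor["hold_down"] == 3 * neighbor["keep_alive"]


def neighbors_in_as(neighbors, remote_as):
    """Return the neighbor IPs that belong to the given remote AS."""
    return [n["ip"] for n in neighbors if n["remote_as"] == remote_as]


print([session_type(n) for n in NEIGHBORS])
print(all(timers_consistent(n) for n in NEIGHBORS))
print(neighbors_in_as(NEIGHBORS, 65002))  # the Site B TORs (assumption)
```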

The BGP neighbor status after the four TORs are added:

Route map

To prevent asymmetric traffic flows, the NSX Edge infrastructure at Site A should be the preferred ingress/egress point for the north-south traffic.

I achieve this by AS path prepending on the BGP paths to Site B. This is configured in a route map on the Tier-0 gateway.

First I need to create an IP prefix list. Both IP prefix lists and route maps are managed on the Tier-0 gateways under Routing:

The details of the IP prefix list:

Setting	Value
Name	any-prefix
Network	any
Action	Permit

The details of the route map:

Setting	Value
Route Map Name	siteb-route-map
Type	IP Prefix
Members	any-prefix
AS path prepend	65000 65000
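Why prepending steers traffic towards Site A is easy to show with a toy model. The sketch below is heavily simplified: it assumes an upstream router that hears the same prefix via a Site A TOR (AS 65001) and a Site B TOR (AS 65002), and that only AS path length decides between the two. Real BGP best path selection looks at several attributes before AS path length.

```python
LOCAL_AS = 65000
PREPEND = [65000, 65000]  # the "AS path prepend" value from the route map


def advertised_path(prepend=()):
    """AS path of a Tier-0 originated prefix as sent to an eBGP neighbor."""
    return list(prepend) + [LOCAL_AS]


def best_path(candidates):
    """Pick the candidate with the shortest AS path (simplified selection)."""
    return min(candidates, key=lambda c: len(c["as_path"]))


# What the assumed upstream router sees after each TOR adds its own AS.
candidates = [
    {"via": "site-a", "as_path": [65001] + advertised_path()},
    {"via": "site-b", "as_path": [65002] + advertised_path(PREPEND)},
]

for c in candidates:
    print(c["via"], c["as_path"])
print("preferred:", best_path(candidates)["via"])
```

The Site B path is two hops longer, so in this model Site A wins as long as it is available. If Site A disappears, the Site B path is the only candidate left and takes over.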

The route map needs to be attached to the BGP neighbor entries belonging to Site B. I configure the route map as Out Filter and In Filter:

The Site B neighbors now have filters configured:

This completes the Tier-0 gateway deployment.

Diagram

I’m just taking a step back to have a look at what it is I actually did here.

The diagram below shows the Tier-0 gateway’s L3 connectivity with the physical network:

It’s a pretty wild diagram, I’m aware, but hopefully it makes some sense.

East-West

The Tier-1 gateway is where the NSX-T segments for virtual machine networking will be connected. The Tier-1 gateway is linked to the Tier-0 gateway too, of course.

I’m creating a Tier-1 gateway with the following configuration details:

Setting	Value
Name	tier1-01
Linked Tier-0 Gateway	tier0-01
Fail Over	Non Preemptive
Edge Cluster	edge-cluster01
Route Advertisement	all

Tier-1 gateways are managed under Networking > Connectivity > Tier-1 Gateways.

Workload segments

With the Tier-1 gateway in place I can now attach some NSX-T segments for the workloads (VMs).

I’m creating three segments Web, App, and DB with the following configuration details:

Setting	Value
Connected Gateway & Type	tier1-01, flexible
Transport Zone	tz-overlay
Subnets (gateway)	10.0.1.1/24 (Web), 10.0.2.1/24 (App), 10.0.3.1/24 (DB)

Creating the segments:

I notice that downlink ports have been created on the Tier-1 gateway:

Provision VMs

It’s all about the VMs of course. So I deploy three VMs web01, app01, and db01. They are connected to the segments.

VM web01 connected to segment Web as seen at the N-VDS Visualization in the NSX Manager UI:

Connectivity test

Time to test connectivity.

East-west

First between the VMs which I place on different ESXi hosts and at different sites.

web01 (10.0.1.10) at Site B pinging db01 (10.0.3.10) at Site A:

Visualized by the Port Connection tool in the NSX Manager UI:

app01 (10.0.2.10) at Site A pinging web01 at Site B:

Once again visualized by the Port Connection tool:

East-west and cross-site logical networking seems to be working!

North-south

How about north-south? Let’s see.

db01 at Site A pinging a host on the physical network (10.2.129.86):

The Traceflow tool in the NSX Manager UI tells me a bit more about the network path. I can see that the traffic exits the SDDC through Site A (en02-a):

The other way around, a traceroute from the physical network to web01 at Site B:

Traffic entering the SDDC through Site A (en01-a). Perfect!

Summary

Wow! This has been quite an exercise. Are you still there? 😉

It all started with deploying the NSX Edge (virtual) infrastructure. On top of that infrastructure I deployed a Tier-0 gateway and configured dynamic routing between the Tier-0 and the TORs.

To facilitate east-west distributed logical networking, I deployed a Tier-1 gateway and linked it to the Tier-0. I connected some NSX-T segments to the Tier-1 gateway and some virtual machines to the segments.

Some simple connectivity testing showed that north-south and east-west networking were working as intended. Site A is consistently used for the north-south ingress/egress traffic flows thanks to the BGP AS prepending.

Thanks for staying tuned this long. I hope this and the previous article about deploying NSX-T in a stretched cluster environment have been interesting reads. I might return to this environment for some more NSX-T multisite scenarios in future articles.

Cheers!

Deploying NSX-T in a Stretched Cluster – Part 1

November 8, 2019

A stretched cluster architecture facilitates higher levels of availability and things like inter-site load balancing. It’s a common multisite solution and also part of VMware’s Validated Design for SDDCs with multiple availability zones.

Traditionally compute networking in an active-active multisite setup has had its challenges, but with vSAN storage and NSX networking technologies that’s a thing of the past.

In the coming two articles I want to have a closer look at NSX-T in an active-active multisite environment. Specifically I want to learn more about how the different NSX-T components are deployed and how the data plane is configured in a stretched cluster.

In this first part I will deploy the NSX-T 2.5 platform and perform the necessary configurations and preparations so that in part two I can focus solely on the data plane (north-south and east-west).

This is going to be quite an exercise so let’s get right to it!

The lab environment

Below is a high-level overview of the lab environment as it looks right now:

A vSAN cluster consisting of eight ESXi hosts stretched to a second site. A third site is hosting the vSAN witness appliance. A completely separate vSphere management cluster is only hosting the vCenter server right now.

A quick look at the vSphere environment then. I’m running vSphere 6.7 U3:

The hosts have two physical 10Gbit NICs:

Three vmkernel adapters have been configured: Management, vMotion, and vSAN:

As mentioned, this is a vSAN stretched cluster:

The following tables list the VLANs and the associated IP subnets that are currently configured per site:

Site A:

VLAN Function	VLAN ID	Subnet	Gateway
ESXi Management	1641	172.16.41.0/24	172.16.41.253
vMotion	1642	172.16.42.0/24	172.16.42.253
vSAN	1643	172.16.43.0/24	172.16.43.253

Site B:

VLAN Function	VLAN ID	Subnet	Gateway
ESXi Management	1651	172.16.51.0/24	172.16.51.253
vMotion	1652	172.16.52.0/24	172.16.52.253
vSAN	1653	172.16.53.0/24	172.16.53.253

Witness Site:

VLAN Function	VLAN ID	Subnet	Gateway
ESXi Management	1711	172.17.11.0/24	172.17.11.253
vSAN	1713	172.17.13.0/24	172.17.13.253

Management Cluster:

VLAN Function	VLAN ID	Subnet	Gateway
SDDC Management	1611	172.16.11.0/24	172.16.11.253
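The VLAN tables follow a simple convention: VLAN ID "abcd" maps to subnet 172.ab.cd.0/24 with the gateway on .253. A minimal sketch of that mapping is below. The convention is read off the tables above, and the function name is made up for illustration.

```python
import ipaddress


def vlan_to_subnet(vlan_id):
    """Map VLAN 'abcd' to 172.ab.cd.0/24 and its .253 gateway (lab convention)."""
    second_octet, third_octet = divmod(vlan_id, 100)
    network = ipaddress.ip_network(f"172.{second_octet}.{third_octet}.0/24")
    gateway = network.network_address + 253
    return network, gateway


for vlan in (1641, 1653, 1713, 1611):
    network, gateway = vlan_to_subnet(vlan)
    print(vlan, network, gateway)
```

Having the VLAN ID readable from the IP address is handy when troubleshooting, because a captured address immediately tells which VLAN it belongs to.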

NSX-T is not deployed yet, but that’s about to change pretty soon 😉

Deploying the NSX-T manager cluster

Installing NSX-T 2.5 always starts with deploying the manager cluster. It consists of three manager nodes and an optional virtual IP (VIP).

I will deploy the NSX manager cluster nodes in the vSphere management cluster and connect them to the SDDC Management VLAN (1611).

The IP plan for the NSX manager cluster looks like this:

Hostname	IP Address
nsxmanager01	172.16.11.82
nsxmanager02	172.16.11.83
nsxmanager03	172.16.11.84
nsxmanager	172.16.11.81 (virtual IP)

First manager node

I deploy the first manager node from the OVA package:

Filling out the configuration details and then kicking off the deployment.

When the first manager node is up and running I’m logging in to the NSX Manager UI:

Second and third manager nodes

The second and third manager nodes can be deployed from the NSX Manager UI. Before I can do that I need to add my vCenter server under System > Fabric > Compute Manager:

Now I’m able to deploy the second and third manager nodes via System > Appliances > Add Nodes.

Once done the three nodes are shown in the UI and the cluster connectivity is up:

Assign virtual IP address

I finalize the manager cluster deployment by configuring a virtual IP address. This is done under System > Appliances > Virtual IP:

A couple of minutes later the virtual IP is active:

Configuring the NSX-T data plane

Now that the NSX-T management plane is fully operational I will continue with the data plane preparations and configurations.

More VLANs

First I need to provision some more VLANs in the TORs at the data sites. At each site I need two VLANs for overlay and another two for connecting NSX with the physical network later on:

Site A:

VLAN Function	VLAN ID	Subnet	Gateway
Host overlay	1644	172.16.44.0/24	172.16.44.253
Uplink01	1647	172.16.47.0/24	172.16.47.253
Uplink02	1648	172.16.48.0/24	172.16.48.253
Edge overlay	1649	172.16.49.0/24	172.16.49.253

Site B:

VLAN Function	VLAN ID	Subnet	Gateway
Host overlay	1654	172.16.54.0/24	172.16.54.253
Uplink01	1657	172.16.57.0/24	172.16.57.253
Uplink02	1658	172.16.58.0/24	172.16.58.253
Edge overlay	1659	172.16.59.0/24	172.16.59.253

Transport zones

Two transport zones should do it I believe. I create them using the following details:

Name	N-VDS Name	Traffic Type
tz-vlan	nvds01	VLAN
tz-overlay	nvds01	Overlay

Transport zones are managed under System > Fabric > Transport Zones:

Uplink profiles

Next, I need to create four uplink profiles. The table below shows the configuration details for each of them:

Name	Teaming Policy	Active Uplinks	Transport VLAN	MTU
up-site-a-esxi	Load Balance Source	uplink-1, uplink-2	1644	9000
up-site-a-edge	Load Balance Source	uplink-1, uplink-2	1649	9000
up-site-b-esxi	Load Balance Source	uplink-1, uplink-2	1654	9000
up-site-b-edge	Load Balance Source	uplink-1, uplink-2	1659	9000

Uplink profiles are managed under System > Fabric > Profiles > Uplink Profiles:

In order to achieve VLAN pinning, deterministic routing, and ECMP I need to add two named teaming policies to the uplink profiles that I just created:

Name	Teaming Policy	Active Uplinks
Uplink01	Failover Order	uplink-1
Uplink02	Failover Order	uplink-2
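The difference between the default teaming policy and the two named teaming policies can be modelled in a few lines. This is a deliberately simplified sketch: the modulo on a port ID is my own stand-in for however NSX actually pins a source port to an uplink, and the function names are hypothetical.

```python
def failover_order(active_uplinks, link_up):
    """Return the first listed uplink that is up, or None if none is."""
    for uplink in active_uplinks:
        if link_up.get(uplink, False):
            return uplink
    return None


def load_balance_source(active_uplinks, link_up, port_id):
    """Pin a virtual port to one of the healthy uplinks based on its ID."""
    healthy = [u for u in active_uplinks if link_up.get(u, False)]
    if not healthy:
        return None
    return healthy[port_id % len(healthy)]


links = {"uplink-1": True, "uplink-2": True}

# Default teaming: ports are spread over both uplinks.
print([load_balance_source(["uplink-1", "uplink-2"], links, p) for p in range(4)])

# Named teaming policy Uplink01: always uplink-1, and nothing else.
print(failover_order(["uplink-1"], links))
print(failover_order(["uplink-1"], {"uplink-1": False, "uplink-2": True}))
```

With only one active uplink and no standby, a named policy never moves traffic to the other uplink. That is what pins a VLAN to one specific TOR and keeps the routing adjacencies deterministic.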

Adding the named teaming policies to the uplink profiles:

I also need to add the Uplink01 and Uplink02 named teaming policies to transport zone tz-vlan. This is so that they can be selected on segments belonging to that transport zone later on:

Network I/O Control profile

To allocate bandwidth to different types of network traffic I create a network I/O control profile. After long and hard thinking I decided to call it nioc-profile and it has the following settings:

Traffic Type / Traffic Name	Shares
Fault Tolerance (FT) Traffic	25
vSphere Replication (VR) Traffic	25
iSCSI Traffic	25
Management Traffic	50
NFS Traffic	25
vSphere Data Protection Backup Traffic	25
Virtual Machine Traffic	100
vMotion Traffic	25
vSAN Traffic	100
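Shares only matter when the link is congested. As a rough worked example, the sketch below applies the usual proportional-share arithmetic to one of the 10 Gbit NICs: each traffic type that is actively competing gets its shares divided by the sum of the shares of all competing types. This is a back-of-the-envelope model and ignores limits and reservations.

```python
# Shares per traffic type, copied from the table above.
SHARES = {
    "Fault Tolerance (FT) Traffic": 25,
    "vSphere Replication (VR) Traffic": 25,
    "iSCSI Traffic": 25,
    "Management Traffic": 50,
    "NFS Traffic": 25,
    "vSphere Data Protection Backup Traffic": 25,
    "Virtual Machine Traffic": 100,
    "vMotion Traffic": 25,
    "vSAN Traffic": 100,
}


def bandwidth_under_contention(shares, link_gbit, active=None):
    """Split link bandwidth between the competing traffic types by share."""
    active = active or list(shares)
    total = sum(shares[t] for t in active)
    return {t: link_gbit * shares[t] / total for t in active}


# Worst case: every traffic type competes for the same 10 Gbit NIC.
everything = bandwidth_under_contention(SHARES, 10)
print(everything["Virtual Machine Traffic"], everything["Management Traffic"])

# More realistic: only VM, vSAN, and vMotion traffic are busy.
busy = ["Virtual Machine Traffic", "vSAN Traffic", "vMotion Traffic"]
print(bandwidth_under_contention(SHARES, 10, busy))
```

With all nine types competing, the shares add up to 400, so virtual machine and vSAN traffic each get a quarter of the link.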

Network I/O control profiles are managed under System > Fabric > Profiles > NIOC Profiles:

Segments

VLAN-backed segments are needed for system, uplink/transit, and overlay traffic. The table below lists the segments with their settings that I will create:

Segment Name	Uplink & Type	Transport Zone	VLAN
site-a-nvds01-management	none	tz-vlan	1641
site-a-nvds01-vmotion	none	tz-vlan	1642
site-a-nvds01-vsan	none	tz-vlan	1643
site-a-edge-transit01	none	tz-vlan	1647
site-a-edge-transit02	none	tz-vlan	1648
site-b-nvds01-management	none	tz-vlan	1651
site-b-nvds01-vmotion	none	tz-vlan	1652
site-b-nvds01-vsan	none	tz-vlan	1653
site-b-edge-transit01	none	tz-vlan	1657
site-b-edge-transit02	none	tz-vlan	1658
edge-uplink1	none	tz-vlan	0-4094
edge-uplink2	none	tz-vlan	0-4094
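Creating twelve segments by hand is a bit tedious, so the table could also be generated as request bodies for the NSX-T policy API. The sketch below only builds the payloads and doesn’t send anything. The field names (display_name, vlan_ids, transport_zone_path) and the transport zone path format are how I remember the Segment object, so verify them against the API reference for your NSX-T version. The transport zone ID is a placeholder.

```python
# Placeholder (assumption): replace with the real ID of tz-vlan from NSX Manager.
TZ_VLAN_ID = "<tz-vlan-id>"
TZ_PATH = (
    "/infra/sites/default/enforcement-points/default/"
    f"transport-zones/{TZ_VLAN_ID}"
)

SITE_BASE = {"a": 1640, "b": 1650}                      # VLAN numbering per site
SYSTEM = {"management": 1, "vmotion": 2, "vsan": 3}     # nvds01 system segments
TRANSIT = {"transit01": 7, "transit02": 8}              # edge transit segments


def segment(name, vlan):
    """Build one VLAN-backed segment payload (field names are assumptions)."""
    return {
        "resource_type": "Segment",
        "display_name": name,
        "vlan_ids": [str(vlan)],
        "transport_zone_path": TZ_PATH,
    }


def build_segments():
    """Return the twelve segments in the same order as the table above."""
    segments = []
    for site, base in SITE_BASE.items():
        for function, offset in SYSTEM.items():
            segments.append(segment(f"site-{site}-nvds01-{function}", base + offset))
        for function, offset in TRANSIT.items():
            segments.append(segment(f"site-{site}-edge-{function}", base + offset))
    for number in (1, 2):
        segments.append(segment(f"edge-uplink{number}", "0-4094"))  # trunk segments
    return segments


for s in build_segments():
    print(s["display_name"], s["vlan_ids"])
```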

Segments are managed under Networking > Connectivity > Segments:

Uplink teaming policy

The uplink teaming policy for segments edge-uplink1 and edge-uplink2 needs to be modified so that the named teaming policies Uplink01 and Uplink02 are used instead of the default.

For this I have to edit these segments under Advanced Networking & Security > Networking > Switching:

Configure ESXi hosts

Now the time has come to configure the ESXi hosts and turn them into NSX-T transport nodes!

In the NSX Manager UI I navigate to System > Fabric > Nodes and change the “Managed by” to my vCenter server. The ESXi hosts are listed:

Unfortunately, I can’t make use of transport node profiles here as these are assigned at the vSphere cluster level. I will therefore configure my hosts one at a time.

The ESXi transport nodes in Site A will be configured with the following settings:

Setting	Values
Transport Zone	tz-vlan, tz-overlay
N-VDS Name	nvds01
NIOC Profile	nioc-profile
Uplink Profile	up-site-a-esxi
LLDP Profile	LLDP [Send Packet Disabled]
IP Assignment	Use DHCP
Physical NICS	vmnic0 – uplink-1 vmnic1 – uplink-2
vmk0	site-a-nvds01-management
vmk1	site-a-nvds01-vmotion
vmk2	site-a-nvds01-vsan

ESXi transport nodes in Site B use slightly different settings:

Setting	Values
Transport Zone	tz-vlan, tz-overlay
N-VDS Name	nvds01
NIOC Profile	nioc-profile
Uplink Profile	up-site-b-esxi
LLDP Profile	LLDP [Send Packet Disabled]
IP Assignment	Use DHCP
Physical NICS	vmnic0 – uplink-1 vmnic1 – uplink-2
vmk0	site-b-nvds01-management
vmk1	site-b-nvds01-vmotion
vmk2	site-b-nvds01-vsan

Selecting one host at a time and clicking Configure NSX:

The network mappings for install for vmkernel adapter migration:

When I click Finish the NSX installation and configuration process starts on the selected ESXi host. NSX bits are installed, the host receives the N-VDS, and the vmkernel adapters are migrated from VDS port groups to the N-VDS segments.

When all hosts have been configured I quickly check the status of the transport nodes:

And in vCenter I notice there’s now an N-VDS with a bunch of opaque port groups:

Summary

Most of the NSX-T platform is in place now and I think this is a good point to take a small break.

I started by deploying and configuring the NSX manager cluster (aka the central management plane). Next, I prepared the environment for the NSX data plane by provisioning some VLANs, profiles, and segments. Lastly, I prepared the ESXi hosts in the stretched cluster by installing the NSX VIBs and configuring them as NSX transport nodes. vSphere system networking (vmkernel adapters) was migrated to the N-VDS.

In the next part I will continue with the installation of the data plane and more specifically deployment and configuration of the NSX Edge as well as the logical networking components.

Stay tuned!

Single N-VDS per Edge VM

October 7, 2019
Recently a new version of the NSX-T Reference Design Guide was released. This guide, which now covers NSX-T versions 2.0 – 2.5, is a must read for anyone interested in the NSX-T solutions and their recommended design.

One of the things you’ll find in the updated guide is a new recommended deployment mode for the edge VM for NSX-T 2.5 and onwards. The new recommended design for the Edge VM looks like this:

This new design has a couple of advantages:
- One N-VDS carrying both overlay and VLAN traffic.
- Multi-TEP configuration for load balancing of overlay traffic.
- Distribution of VLAN traffic to specific TORs for deterministic point-to-point routing adjacencies.
- No change required in the vSphere distributed port group configuration when new workload VLAN segments are added.
This “single N-VDS per Edge VM” design is only supported with NSX-T version 2.5 and above. For NSX-T version 2.4 and lower you stick with the “three N-VDS per Edge VM” design that looks like this:

Getting to the 2.5 Edge VM design

The “three N-VDS per Edge VM” design is still perfectly valid and fully supported with NSX-T 2.5.

Upgrading NSX-T from 2.x to 2.5 won’t touch your Edge VM configuration so you automatically end up with the “three N-VDS per Edge VM” design in version 2.5.

And in most cases there’s no immediate reason to start messing around with the Edge VM design in a production environment just to have it aligned with the recommended design for version 2.5.

That being said, I wanted to go through the process just to see if it could be done with acceptable data plane disruption and of course to learn a thing or two in the process. Maybe you want to follow along and perhaps learn something too. Let’s have a look at what I did.

Step 1 – Create VLAN trunking port groups

I’m using my 2.5 Edge VM design diagram above as a blueprint and the first thing that I need to do is create two new port groups on the vSphere VDS. The Edge VM design requires two port groups configured as trunks. I will call these port groups Trunk1 and Trunk2.

Starting with Trunk1:

Setting the VLAN type to VLAN trunking:

For Teaming and failover I configure Uplink 1 as the active uplink and Uplink 2 as the standby uplink:

I then create the Trunk2 port group and configure it the same way except for the Failover order which is set the other way around:

The following port groups are now available on the VDS:

The idea here is that Trunk1 and Trunk2 will replace PG-OVERLAY, PG-UPLINK1, and PG-UPLINK2.

Step 2 – Create new Tier 0 transit segments

The current “three N-VDS per Edge VM” deployment in my lab environment is using Tier 0 transit segments with VLAN ID “0”. This means that they are backed by whatever VLAN ID is specified in the PG-UPLINK1 and PG-UPLINK2 VDS port groups.

An improvement upon this is to configure the VLAN ID at the NSX-T segment level instead. In this way we keep the VLAN configuration and control of it within the NSX platform which is a good thing.

I create two new segments called vlan1613 and vlan1614 and configure them with VLAN ID 1613 and 1614 respectively:

Step 3 – Create a new NSX-T uplink profile

The way the Edge VMs connect to the physical network is different with the 2.5 Edge VM design. I need to configure a new uplink profile that contains the required configuration.

Uplink profiles are managed under System > Fabric > Profiles > Uplink Profiles:

The new uplink profile called EdgeVM-Uplink-Profile contains three teaming configurations.

The [Default Teaming] is load balancing traffic between Uplink1 and Uplink2 and facilitates the multi-TEP capability of the 2.5 Edge VM design. The two other teaming configurations, VLAN-1613-Policy and VLAN-1614-Policy, are used for the point-to-point routing adjacencies.

Step 4 – Deploy new Edge VMs

As far as I know there is no easy way to reconfigure an N-VDS setup on existing edge transport nodes. I simply deploy two new Edge VMs that eventually will replace the existing Edge VMs:

It’s at the Configure NSX step I configure the Edge VM according to the version 2.5 Edge VM design. So what does that look like? Something like this:

A single N-VDS that is associated with both an overlay and a VLAN transport zone. The EdgeVM-Uplink-Profile gives me two DPDK Fastpath interfaces, each of which I assign to its own VDS trunk port group.

When deployment of the two new Edge VMs is finished I have the following situation under System > Fabric > Nodes > Edge Transport Nodes:

Edge nodes en03 and en04 are the new Edge transport nodes.

I add the new Edge transport nodes to the existing Edge cluster where they join en01 and en02:

Step 5 – Transition

At this point en01 and en02 are the only Edge transport nodes with logical network configuration linked to them. While en03 and en04 are members of the same Edge cluster, they are not doing much in terms of data plane services.

A diagram of the L3 topology in my lab from an NSX Edge perspective:

Transitioning to the new Edge transport nodes won’t and shouldn’t alter anything in the L3 topology above. Otherwise I would consider it a bad transition.

I’m ready to replace the current Edge transport nodes with the new ones. Unfortunately, the Replace Edge Cluster Member action won’t work here as the nodes have different configurations.

Instead I’m going to do a manual transition and in my simple lab environment that’s a pretty straight forward process. The only service hosted in the NSX Edge besides north-south routing is a DHCP server. So this should be easy.

Starting by placing the en01 transport node in maintenance mode:

Now en01 is not involved in any data plane operations anymore. With that in mind I’m feeling comfortable going ahead with the next step which is the removal of the Tier 0 interfaces that are linked to en01.

My Tier 0 gateway has an active-standby HA mode which means it can’t have its configuration mapped to more than two Edge transport nodes at a time. By deleting the configuration linked to one Edge transport node I’m making room for a new Edge transport node. One at a time.

Deleting the interfaces will break the Tier 0 gateway’s en01 connection with the TORs, but this is acceptable as en01 has been placed in maintenance mode and the data plane won’t experience any disruptions.

Once the two interfaces linked to en01 have been removed we can add them again with the same name and the same IP configuration as before, but this time I link them to en03 and select the newly created transit segments:

Once done with deleting and adding interfaces there’s a kind of hybrid situation where two Edge transport nodes (en02 and en03) each with a different deployment mode are serving the same Tier 0 gateway:

And it works!

Now I repeat the same process to replace en02 with en04:
1. Place en02 in maintenance mode (en03 takes over its duties).
2. Delete Tier 0 interfaces linked to en02.
3. Add Tier 0 interfaces, link them to en04, and select the new segments.
The final result is four Tier 0 gateway interfaces with the same name and IP as before, but linked to the new Edge transport nodes:
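The transition procedure can be summarized as a small simulation. This is only a model of the steps above and not an NSX API call. The interface names are hypothetical because the article doesn’t list them. The two-node limit reflects the active-standby HA mode of this Tier 0 gateway.

```python
def transition(interfaces, replacements, max_nodes=2):
    """Move Tier 0 interfaces from old to new edge nodes, one node at a time.

    interfaces:   dict mapping interface name -> edge transport node
    replacements: list of (old_node, new_node) tuples, processed in order
    """
    current = dict(interfaces)
    for old, new in replacements:
        # Steps 1 and 2: the old node goes into maintenance mode and its
        # interfaces are deleted from the Tier 0 gateway.
        moved = [name for name, node in current.items() if node == old]
        for name in moved:
            del current[name]
        # Step 3: the interfaces are re-created with the same name on the new node.
        for name in moved:
            current[name] = new
        # Active-standby: never more than two edge nodes behind the gateway.
        if len(set(current.values())) > max_nodes:
            raise ValueError("Tier 0 would span more than two edge nodes")
    return current


# Hypothetical interface names; two interfaces per edge node as in the article.
before = {"t0-if-1": "en01", "t0-if-2": "en01", "t0-if-3": "en02", "t0-if-4": "en02"}
after = transition(before, [("en01", "en03"), ("en02", "en04")])
print(after)
```

The interface names stay the same throughout; only the edge node behind each interface changes.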

Just the DHCP service left which is pretty easy.

I have to re-configure the DHCP service so that it uses the new Edge transport nodes. This is done under Advanced Networking & Security > Networking > DHCP > Server Profiles.

I edit the profile so that it only contains en03 and en04 as its members.

Step 6 – Clean up

After verifying that everything is working as it should the time has come to say goodbye to the old Edge transport nodes.

I first remove en01 and en02 from the Edge Cluster:

And then simply delete them from the fabric:

I can also delete the PG-OVERLAY, PG-UPLINK1, and PG-UPLINK2 port groups in vSphere as they are no longer needed.

This leaves the environment with the new en03 and en04 Edge transport nodes and the new NSX-T 2.5 recommended Edge VM design!

Summary

A summarization of the steps I took to go from a “three N-VDS Edge VM” design to a “single N-VDS Edge VM” design:
1. Create trunking port groups in vSphere.
2. Create new transit segments configured with VLAN ID.
3. Create new uplink profile for the Edge transport nodes.
4. Deploy two new Edge VMs and configure them with the “single N-VDS” design.
5. Replace the existing Edge transport nodes by doing a manual transition.
6. Verify and clean up.
Quite an operation but certainly doable. It might or might not be worth the effort. It comes down to whether the advantages that this new Edge VM design offers are important enough to you.

Keep in mind that placing Edge transport nodes in maintenance mode as I did in this article will trigger a fail-over between the nodes (with active-standby mode) which in turn causes short data plane disruptions. That’s not an issue in a lab, but something to consider in a production environment. For a Tier 0 gateway with an active-active HA mode and ECMP enabled this would be less of an issue.

NSX-T Recoverability – Part 2

September 25, 2019
Welcome back! In part 1 we had a look at some NSX-T management plane failure scenarios and how to recover from them. In this part we continue to investigate NSX-T recoverability at the data plane and more specifically the NSX Edge.

Quick note

If you ever experience an issue in your NSX-T production environment, the first and only thing you should do is open a VMware support request. Highly skilled experts who are dealing with all kinds of NSX-T issues on a daily basis will help you in the best possible way with your specific issue.

NSX data plane failure & recovery

Most will agree that failures at the data plane are more critical than for instance failures at the management plane. After all, the data plane is where the network packets that really matter are flowing around. Failures at the data plane can potentially impact service availability.

Luckily, the NSX data plane is robust by design. Largely distributed and where it’s centralized it’s also clustered. Combine this with a proper design for the physical and logical components and you’re looking at a pretty solid solution.

But sure, things can break down and when they do it’s important to understand how to get back on track again.

The lab environment

We’re still using the same small lab environment as in part 1. I just added a VM in the compute cluster for today’s article. Below is a diagram showing the main components from a high level perspective.

The NSX Edge

The NSX Edge is a centralized, often clustered, component. It provides a range of gateway services, but one of its main responsibilities is routing traffic between NSX logical networks and the physical network.

The worker bees of the NSX Edge are the edge nodes. They are available in two form factors (virtual machine and bare metal) and are organized in one or more edge clusters.

In my lab environment the NSX Edge consists of two edge node VMs and one edge node cluster.
Let’s have a quick look at the deployment details of one of these edge node VMs.

A pretty common NSX-T 2.4 edge node deployment configuration for the VM form factor.

Below the layer 3 topology running on top of the NSX Edge.

As you can see the layer 3 network is making good use of the lower layer’s redundant paths.

Lastly, the Tier 0 gateway in this lab has been set up with an Active-Standby HA mode.

Current state of the NSX Edge

Life is good at the edge. The edge node VMs are up and running.

The edge transport nodes configuration state and node status are looking good.

The Tier 0 gateway’s BGP summary shows that BGP connections are established with both of the TORs.

The Tier 0 gateway’s routing table contains IP routes advertised by the TORs through BGP.

And last but not least the VM in the compute cluster can access the physical network. A “traceroute” to the PING host on the physical network shows that traffic is routed to TOR-Right (172.16.14.253) at the moment:

North-south networking is running beautifully! What can possibly go wrong on a day like this?

TOR down

Not exactly an NSX Edge failure, but definitely a failure scenario that concerns the NSX Edge.

TOR-Right broke down. What’s the impact? Let’s have a look.

The BGP summary indeed shows us that we’ve lost connection with TOR-Right. BGP connections with TOR-Left are still intact though.

The Tier 0’s routing table now only contains BGP routes advertised by TOR-Left.

All of this is expected, but how is the data plane affected by this TOR failure?

It seems to be working fine. Sure, the “traceroute” reveals that traffic is now passing through TOR-Left (172.16.13.254), but that’s about it.

The redundant infrastructure and BGP making use of that ensured that this TOR failure had minimal impact on the NSX data plane.

TOR down recovery

Basically we would just rack and stack a new TOR, configure it, and restore redundancy. The only thing we need to do within NSX is verify that the BGP connections are restored.

Edge node down

Last time I checked there were two edge node VMs in that Edge cluster. en01 is gone!

What’s the impact? How do we recover?

Let’s first investigate the impact this failure has on the north-south traffic.

Alright, none whatsoever.

The VM can still reach the physical network. The surviving edge transport node must have taken over the duties of the failed node.

But of course, the NSX Edge is now running on a single edge transport node and NSX Manager clearly shows us that we are dealing with a degraded state.

Without a standby node we’re living on the edge (pun intended). We need that second transport node up and running again.

Edge node down recovery

In a situation like this it’s good to remember that there’s nothing unique about an edge node. During its lifetime it is much like a container receiving and executing configuration from the management plane. In other words, losing the edge node is in itself nothing traumatic. We just need to get a new one.

The first step when recovering from a permanent edge node failure is to deploy a new edge node. Once it’s deployed three edge transport nodes are listed in the NSX Manager UI.
- en01 with status “Unknown” is the node that is missing.
- en02 with status “Degraded” because it can’t find its HA buddy.
- en03 with status “Up” is alive and happy but not doing much.
The second step is to tell the management plane that we want to replace the missing edge transport node with the one we just deployed.

This is done under Edge Clusters in the NSX Manager UI (or via the API).

After clicking the small gear icon we select Replace Edge Cluster Member. This starts the process of re-mapping logical network configuration from one edge transport node to another.

In our scenario we want to re-map from en01 to en03.

If the edge transport node were still operational, we would put it in maintenance mode here to minimize data plane disruptions. In our failure scenario the node is already gone so maintenance mode is not relevant.

After clicking Save, the management plane springs into action and links configurations and other related logical network constructs to the new edge transport node.

Once the process is done we can delete the orphaned edge transport node, and after a minute or so we see two healthy edge transport nodes again.

A look at the Tier-0’s logical router ports shows us what happened.

Two of the logical router ports previously mapped to en01 have been relocated to en03.

BGP connections are established again.

Replacement successful! The fabric’s state is restored to normal operations.

Summary

Today we looked at two failure scenarios concerning the NSX Edge:
- Failure of a top-of-rack switch
  - limited impact on the data plane
- Failure of an NSX edge node
  1. deploy new edge node
  2. run edge transport node replace process
  3. remove orphaned edge transport node from fabric
Not too bad. This is a small environment, but the recovery procedures will be largely the same regardless of environment size.

Sure, more things can break. An ESXi host hosting an edge node, a physical NIC, cables, and so on. The bottom line is that unless we’re dealing with a complete meltdown, a properly designed NSX Edge will minimize the impact of component failure and make recovery a piece of cake.
Installing NSX Intelligence

September 21, 2019
With NSX-T 2.5 comes NSX Intelligence 1.0. This component, which is part of NSX Data Center Enterprise Plus, is something I’ve been looking forward to since it was announced.

NSX Intelligence adds a powerful analytics engine to the NSX-T platform. It provides workload and network context that is unique to NSX. Application owners and operations people can use the NSX Intelligence interface for configuration and monitoring.

Besides the NSX Intelligence data platform itself, this 1.0 release provides visualization and security rule and grouping recommendations.

Cool stuff. Let’s have a look at how to get it up and running.

Installation preparations

The preparation and installation steps are explained in detail in the official installation documentation. I strongly recommend you follow these guides when installing NSX Intelligence. Some things to point out:
- NSX Intelligence 1.0 requires NSX-T version 2.5. The first thing I had to do was upgrade my NSX-T lab to version 2.5. In a production environment the 2.5 upgrade requires its own planning and preparations, of course.
- The NSX Intelligence installation comes as a tar-file. Its contents need to be extracted and placed on a web server somewhere that can be accessed by your NSX Manager cluster.
- The NSX Intelligence appliance must be deployed on ESXi managed by vCenter.
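For a lab, the tar-file step can be handled with nothing more than Python’s standard library. This is a minimal sketch, not the documented procedure: the file name, directory, and port are hypothetical, and Python’s built-in HTTP server is only a stand-in for a proper web server.

```python
import functools
import http.server
import tarfile
from pathlib import Path


def extract_bundle(tar_path, web_root):
    """Unpack the installation tar-file into the directory that will be served.

    Only use this on an archive from a trusted source.
    """
    web_root = Path(web_root)
    web_root.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as bundle:
        bundle.extractall(web_root)
    return sorted(p.name for p in web_root.iterdir())


def make_server(web_root, port=8080):
    """Create a simple HTTP server that serves the extracted files."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(web_root)
    )
    return http.server.ThreadingHTTPServer(("0.0.0.0", port), handler)


# Example usage (hypothetical file name and path):
# print(extract_bundle("nsx-intelligence.tar", "/srv/nsx-intelligence"))
# make_server("/srv/nsx-intelligence").serve_forever()
```

The URL entered in the deployment wizard would then point at the OVF file below this web root, and the NSX Manager cluster must be able to reach the host and port.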
Installation

Once the environment is prepared we can start the NSX Intelligence installation.

In NSX Manager navigate to Plan & Troubleshoot > Discover & Take Action:

Click on Go to system, scroll down on the Appliances page and click Add NSX Intelligence Appliance. This starts the appliance deployment wizard:

Enter the URL to the OVF file and the appliance network configuration:

I’m deploying the small NSX Intelligence appliance which is suitable for labs or PoCs. For a production environment you would select the large form factor.

In the next step we configure the vSphere details for the virtual appliance:

Configure the appliance credentials at the third and final step:

Click on Install Appliance to start the deployment:

Deployment took about 5 minutes to complete in my lab environment.

First look

Although it’s a separate virtual appliance, the NSX Intelligence UI seamlessly integrates with the NSX Manager UI. It can be found under Plan & Troubleshoot > Discover & Take Action.

The two objects we can work with here are virtual machines and groups:

We can choose to display only certain VMs/groups or all:

And apply a filter based on tags, flows, and rules:

After powering on two Windows VMs it took about 20 seconds before the NSX Intelligence engine started to draw the communication paths of these VMs. Impressive!

In full screen mode you can switch to dark mode. Much appreciated.

To get actual firewall rule recommendations you need to start a new recommendation process:

After clicking the Start Recommendation button you can configure some parameters. Time range being the most important:

Click on Start Discovery to kick off the recommendation process. This process can be monitored under Recommendations:

Once the analysis is done, the recommended rules, groups, and services can be reviewed and modified:

At step 2 we choose the placement for the new recommendation-based security policy:

Clicking on Publish will create the objects and enforce the security policy:

The recommended rules are in place:

Summary

Installing NSX Intelligence is a straightforward process (apart from its web-based OVF installation requiring a web server).

We took the NSX Intelligence engine for a really quick test drive and deployed some recommended firewall rules including service and group objects minutes after deployment (a longer period for analyzing is strongly recommended).

Even as version “1.0”, NSX Intelligence is going to make micro-segmentation much easier and much faster. It’s a big step towards self-driving micro-segmentation operations. Not to mention the slick visualization and visibility it gives us for our VMs’ communication paths.