Deploying NSX-T in a Stretched Cluster – Part 2

Welcome back! I’m in the process of setting up NSX-T in a stretched cluster environment.

In part 1 I deployed the NSX manager cluster and configured the ESXi hosts as NSX transport nodes. The N-VDS was installed on the ESXi hosts and their vmkernel adapters migrated from the VDS to the N-VDS.

In this second part I will configure the NSX data plane for north-south and east-west networking. Again, there’s a lot to do so let’s begin!

The lab environment

A couple of things happened since the last time I had a look at the lab environment’s diagram:

The vSphere management cluster is now also hosting an NSX manager cluster and the ESXi hosts turned into NSX-T transport nodes.

Speaking of ESXi hosts, here’s a little closer look at one of them:

There’s now an N-VDS instead of a VDS, carrying the three vmkernel adapters Management, vMotion, and vSAN. There are also two new vmkernel adapters acting as tunnel endpoints (TEPs) for NSX overlay networking (Geneve encapsulation/decapsulation).

The infrastructure for east-west networking is largely in place, but without a north-south network path this cluster is pretty isolated.

NSX Edge

The NSX Edge provides a central entry and exit point for network traffic into and out of the SDDC, which is exactly what this environment needs.

Deploy edge VMs

I’m deploying a total of four edge VMs (two at each site). I’ll deploy them using the Edge VM OVA package so that I can connect the edge node’s management interface to the NSX-T segment at the time of deployment.

The table below contains the deployment details for the edge VMs:

Setting   | Site A                   | Site B                   | Site A                   | Site B
Network 0 | site-a-nvds01-management | site-b-nvds01-management | site-a-nvds01-management | site-b-nvds01-management
Network 1 | edge-uplink1             | edge-uplink1             | edge-uplink1             | edge-uplink1
Network 2 | edge-uplink2             | edge-uplink2             | edge-uplink2             | edge-uplink2
Network 3 | not used                 | not used                 | not used                 | not used
Mgmt IP   | 172.16.41.21/24          | 172.16.51.21/24          | 172.16.41.22/24          | 172.16.51.22/24
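The management addresses in the table fall into two /24 networks, one per site. A minimal sketch of how that grouping could be checked with Python's standard ipaddress module follows; the function name is my own and only the four addresses come from the table.

```python
import ipaddress

# Management interfaces as listed in the deployment table, in column order.
MGMT_INTERFACES = [
    "172.16.41.21/24",
    "172.16.51.21/24",
    "172.16.41.22/24",
    "172.16.51.22/24",
]


def group_by_network(interfaces):
    """Group interface addresses by the network they belong to."""
    groups = {}
    for text in interfaces:
        iface = ipaddress.ip_interface(text)
        groups.setdefault(str(iface.network), []).append(str(iface.ip))
    return groups


if __name__ == "__main__":
    for network, hosts in group_by_network(MGMT_INTERFACES).items():
        print(network, hosts)
```

Two edge VMs land in each management network, which matches the two-per-site layout described above.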

Deploying the edge VM using the OVA package:

[Screenshot: edge VM OVF deployment]

Configure edge nodes

After deployment the edge nodes need to join the management plane. For this I use the “join management-plane” NSX CLI command:

[Screenshot: CLI join]
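For reference, the join command takes the manager address, a username, and the manager's API certificate thumbprint. The helper below is a rough sketch that only assembles that command line so it can be pasted into each edge node's console. The manager address 192.0.2.10 and the thumbprint are placeholders rather than values from this lab, and the argument order is the one I remember from the NSX-T documentation, so check it against the CLI reference for your version.

```python
import ipaddress
import re


def build_join_command(manager_ip, username, thumbprint):
    """Return the NSX CLI line that registers an edge node with a manager."""
    ipaddress.ip_address(manager_ip)  # raises ValueError on a malformed address
    if not re.fullmatch(r"[0-9a-fA-F]{64}", thumbprint):
        raise ValueError("expected a 64-character SHA-256 hex thumbprint")
    return (
        f"join management-plane {manager_ip} "
        f"username {username} thumbprint {thumbprint.lower()}"
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_join_command("192.0.2.10", "admin", "ab" * 32))
```

The thumbprint itself can be read on an NSX Manager node with "get certificate api thumbprint"; the CLI prompts for the password when it is not given on the command line.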

Once the edge nodes have joined the management plane, I can pick them up in the NSX Manager UI and configure each of them as an Edge Transport Node. I’m using the following configuration details for this:

Setting         | Site A                                 | Site B                                 | Site A                                 | Site B
Transport Zones | tz-vlan, tz-overlay                    | tz-vlan, tz-overlay                    | tz-vlan, tz-overlay                    | tz-vlan, tz-overlay
N-VDS Name      | nvds01                                 | nvds01                                 | nvds01                                 | nvds01
Uplink Profile  | up-site-a-edge                         | up-site-b-edge                         | up-site-a-edge                         | up-site-b-edge
IP Assignment   | Use Static IP List                     | Use Static IP List                     | Use Static IP List                     | Use Static IP List
Static IP List  | 172.16.49.30,,,,
Virtual NICs    | fp-eth0 – uplink-1, fp-eth1 – uplink-2 | fp-eth0 – uplink-1, fp-eth1 – uplink-2 | fp-eth0 – uplink-1, fp-eth1 – uplink-2 | fp-eth0 – uplink-1, fp-eth1 – uplink-2

Edge transport nodes are managed under System > Fabric > Nodes > Edge Transport Nodes.

[Screenshot: en01-a transport node configuration]

Like the ESXi hosts, all four edge nodes are now fully configured transport nodes:

[Screenshot: edge transport nodes]

Edge cluster

The edge transport nodes need to be part of an edge cluster. I will create an edge cluster called edge-cluster01 and add all four nodes to this cluster.

Edge clusters are managed under System > Fabric > Nodes > Edge Clusters:

Anti-affinity rules

The edge VMs shouldn’t be running on the same ESXi host. To prevent this from happening I create two anti-affinity rules on the vSphere cluster; one for the edge VMs at Site A and another for the edge VMs at Site B:

[Screenshot: VM/host rule]

Groups and rules

The edge VMs should also stick to their site. For this I create two host groups and two VM groups. A “virtual machine to host” rule for each site then makes sure that the edge VMs stay pinned to their respective site.

The host group for Site A:

[Screenshot: host group]

The VM group for the edge VMs at Site B:

[Screenshot: VM group]

The “virtual machine to host” rule keeping edge VMs belonging to Site A on the ESXi hosts of Site A:

[Screenshot: VM to host rule]

The result of having these groups and rules in place becomes visible after a few seconds. The edge VMs are running at the correct site and on separate ESXi hosts within each site:

[Screenshot: correctly placed VMs]
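The two constraints (one edge VM per host, and each edge VM on a host at its own site) are simple enough to express as a check. The following is only an illustrative sketch: the ESXi host names and the Site B edge names (en01-b, en02-b) are assumptions of mine, and in practice the placement would come from vCenter rather than a hard-coded dictionary.

```python
from collections import Counter

# Hypothetical inventory: host names and the "-b" edge names are assumed.
HOST_SITE = {
    "esxi01-a": "A", "esxi02-a": "A",
    "esxi01-b": "B", "esxi02-b": "B",
}
EDGE_SITE = {"en01-a": "A", "en02-a": "A", "en01-b": "B", "en02-b": "B"}


def placement_violations(placement):
    """Return readable violations for a mapping of edge VM -> ESXi host."""
    problems = []
    # Site pinning: each edge VM must run on a host at its own site.
    for vm, host in sorted(placement.items()):
        if HOST_SITE[host] != EDGE_SITE[vm]:
            problems.append(f"{vm} runs on {host}, outside Site {EDGE_SITE[vm]}")
    # Anti-affinity: no host may run more than one edge VM.
    for host, count in sorted(Counter(placement.values()).items()):
        if count > 1:
            problems.append(f"{host} runs {count} edge VMs")
    return problems


if __name__ == "__main__":
    good = {
        "en01-a": "esxi01-a", "en02-a": "esxi02-a",
        "en01-b": "esxi01-b", "en02-b": "esxi02-b",
    }
    print(placement_violations(good))
```

An empty list means both the anti-affinity rules and the site pinning are satisfied.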

That pretty much completes the NSX Edge infrastructure deployment in my stretched cluster.


Now that the NSX-T Edge is in place, it’s time to set up a connection with the physical network so that packets can actually get in and out of the environment.

Tier-0 gateway

A Tier-0 gateway provides the gateway service between the logical and the physical network and is just what I need.

I’m creating my Tier-0 gateway with the following configuration details:

High Availability Mode: Active-Active
Edge Cluster: edge-cluster01
Route Re-Distribution: all

Tier-0 gateways are managed under Networking > Connectivity > Tier-0 Gateways.

[Screenshot: Tier-0 gateway]


This Tier-0 will have eight external interfaces mapped to the different edge transport nodes at the two sites. The table below shows the interfaces and their configuration details:

Name | IP Address / Mask | Connected To | Edge Node | MTU

The Tier-0 external interfaces are now configured and active:

[Screenshot: Tier-0 interfaces]


The TORs have been configured for BGP already and now I need to set up BGP at the Tier-0 gateway too.

The BGP settings that I will use on the Tier-0 gateway are:

Local AS: 65000
Graceful Restart: Off
Inter SR iBGP: On
Multipath Relax: On

Configuring BGP details on the Tier-0 gateway:

I’m adding each TOR as a BGP neighbor to the Tier-0 gateway. The following table shows the configuration details for the four BGP neighbor entries:

IP address | BFD | Remote AS | Hold Down | Keep Alive

The BGP neighbor status after the four TORs are added:

[Screenshot: BGP neighbors]

Route map

To prevent asymmetric traffic flows, the NSX Edge infrastructure at Site A should be the preferred ingress/egress point for the north-south traffic.

I achieve this by AS path prepending on the BGP paths to Site B. This is configured in a route map on the Tier-0 gateway.
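To illustrate why prepending steers traffic, here is a deliberately simplified sketch that compares routes on AS path length only. It assumes the TORs are eBGP peers and that everything evaluated earlier in BGP best-path selection (such as local preference) is equal. The function names are my own; the local AS 65000 and the two prepended copies come from the settings in this article.

```python
LOCAL_AS = 65000


def advertised_as_path(prepend_count):
    """AS path a TOR receives: the Tier-0's own AS plus any prepended copies."""
    return [LOCAL_AS] * (1 + prepend_count)


def preferred_sites(paths_by_site):
    """Sites advertising the shortest AS path (ties are kept, as with ECMP)."""
    shortest = min(len(path) for path in paths_by_site.values())
    return sorted(
        site for site, path in paths_by_site.items() if len(path) == shortest
    )


if __name__ == "__main__":
    paths = {
        "Site A": advertised_as_path(0),  # no route map applied
        "Site B": advertised_as_path(2),  # route map prepends 65000 65000
    }
    for site, path in paths.items():
        print(site, path)
    print("preferred:", preferred_sites(paths))
```

With no prepending both sites tie and the physical network may use either one; with two extra copies of 65000 on the Site B advertisements, only Site A remains the shortest path.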

First I need to create an IP prefix list. Both IP prefix lists and route maps are managed on the Tier-0 gateways under Routing:

[Screenshot: route maps]

The details of the IP prefix list:


The details of the route map:

Route Map Name: siteb-route-map
Type: IP Prefix
AS path prepend: 65000 65000

The route map needs to be attached to the BGP neighbor entries belonging to Site B. I configure the route map as Out Filter and In Filter:

[Screenshot: route map out filter]

The Site B neighbors now have filters configured:

[Screenshot: filters configured for Site B]

This completes the Tier-0 gateway deployment.


I’m just taking a step back to have a look at what it is I actually did here.

The diagram below shows the Tier-0 gateway’s L3 connectivity with the physical network:

[Diagram: Tier-0 BGP]

It’s a pretty wild diagram, I’m aware, but hopefully it makes some sense.


Tier-1 gateway

The Tier-1 gateway is where the NSX-T segments for virtual machine networking will be connected. The Tier-1 gateway is also linked to the Tier-0 gateway.

I’m creating a Tier-1 gateway with the following configuration details:

Linked Tier-0 Gateway: tier0-01
Fail Over: Non Preemptive
Edge Cluster: edge-cluster01
Route Advertisement: all

Tier-1 gateways are managed under Networking > Connectivity > Tier-1 Gateways.

[Screenshot: Tier-1 gateway]

Workload segments

With the Tier-1 gateway in place I can now attach some NSX-T segments for the workloads (VMs).

I’m creating three segments (Web, App, and DB) with the following configuration details:

Connected Gateway & Type: tier1-01, flexible
Transport Zone: tz-overlay
Subnets (gateway): (Web), (App), (DB)

Creating the segments:


I notice that downlink ports have been created on the Tier-1 gateway:

[Screenshot: downlink ports]

Provision VMs

It’s all about the VMs, of course, so I deploy three of them: web01, app01, and db01. They are connected to the segments.

VM web01 connected to segment Web as seen at the N-VDS Visualization in the NSX Manager UI:


Connectivity test

Time to test connectivity.


First between the VMs, which I place on different ESXi hosts and at different sites.

web01 at Site B pinging db01 at Site A:

[Screenshot: web01 pings db01]

Visualized by the Port Connection tool in the NSX Manager UI:

[Screenshot: port connection]

app01 at Site A pinging web01 at Site B:

[Screenshot: app01 pings web01]

Once again visualized by the Port Connection tool:

[Screenshot: port connection]

East-west and cross-site logical networking seem to be working!


How about north-south? Let’s see.

db01 at Site A pinging a host on the physical network:

[Screenshot: db01 pings physical]

The Traceflow tool in the NSX Manager UI tells me a bit more about the network path. I can see that the traffic exits the SDDC through Site A (en02-a):


The other way around, a traceroute from the physical network to web01 at Site B:

[Screenshot: traceroute from physical]

Traffic enters the SDDC through Site A (en01-a). Perfect!


Wow! This has been quite an exercise. Are you still there? 😉

It all started with deploying the NSX Edge (virtual) infrastructure. On top of that infrastructure I deployed a Tier-0 gateway and configured dynamic routing between the Tier-0 and the TORs.

To facilitate east-west distributed logical networking, I deployed a Tier-1 gateway and linked it to the Tier-0. I connected some NSX-T segments to the Tier-1 gateway and some virtual machines to the segments.

Some simple connectivity testing showed that north-south and east-west networking were working as intended. Site A is consistently used for the north-south ingress/egress traffic flows thanks to the BGP AS path prepending.

Thanks for staying tuned this long. I hope this and the previous article about deploying NSX-T in a stretched cluster environment have been interesting reads. I might return to this environment for some more NSX-T multisite scenarios in future articles.



  1. Manoj g

    You have deployed everything at Site A and Site B, but what about the witness site? Should the witness site be reachable over the traditional network, or can it be implemented on NSX-T?

    Just curious to know.

    Thank you


    • rutgerblom

      Hi Manoj,

      NSX-T was not deployed at the witness site as it was a site dedicated for the witness functionality.

      Thank you for reading my blog.



  2. Florian Meier

    Thank you for the really nice blog! As I understand it, you route everything over the DCI except the SDDC Management VLAN, which really is stretched between both sites (L2), right?

    Where do you attach the two Host-Overlay VLANs, vMotion VLANs and ESXi VLANs for routing between the two datacenter sites in the real world?

    Thanks for help


    • rutgerblom

      Thanks for reading my blog.

      It will depend on your network topology but with a leaf-spine topology these VLANs would be terminated in the leaves (ToRs) and traffic routed to a spine. From that spine traffic would then be routed to the leaf where the destination host is connected.
      Hope this makes sense.


      • Florian Meier

        Thanks for the clarification, but then it’s unclear to me why you split the ESXi hosts per site into different VLANs.
        Do you really want to route the HA traffic between the datacenters? Isn’t it better to just move all ESXi hosts that are on the same vCenter per site into the same VLAN?


      • rutgerblom

        In this case the scenario is a stretched cluster. There will be some traffic (L3 and L2) crossing the data center interconnect as a result of that design. The primary reason for using different VLANs per site is to create smaller layer 2 fault domains.
        On a side note this is a vSAN stretched cluster and HA traffic goes over the vSAN network.


  3. Richard Granados-Rueda

    Hi, thanks for your brilliant guide on stretched clusters. One thing that is still puzzling me: say, for example, CustomerA has a presence at Site A with some VMs and also has some VMs at Site B. Is it possible for these customer VMs to be on the same subnet? I’m just thinking of a failure scenario where the Site A hardware is lost for whatever reason. Of course they would need the relevant vSAN policy applied to be able to tolerate this level of failure.

    Any ideas would be much appreciated.



    • rutgerblom

      Thanks Richard,
      The VMs can be on the same logical network regardless of which site they’re running at. This is accomplished using Geneve overlay in NSX-T.


  4. Pankaj


    The north-south trace shows the Site B Tier-0 in the path. Is there a reason for that? Also, the VMware documentation says that with an active-active setup the Tier-0 plumbs to all the edge nodes in a cluster with ECMP. Can that be overcome with the inbound AS path prepend that you have applied?

    Thank you.


    • rutgerblom


      The traceflow was done from a VM on site B. I realize that wasn’t clear at all.
      By AS-path prepending at site B, inbound traffic will enter through site A (under normal circumstances).



  5. LAB


    Brilliant work, thx…

    Do you have plans to update this lab solution with the newest version, NSX-T 3.1?

    I have plans to deploy NSX-T 3.1 to the prod system with an existing vSAN 7 stretched cluster.
    It has VDS 6.6 and there is no management cluster, so just one consolidated compute cluster of 10 ESXi hosts where management components like vCenter, the NSX management cluster and the Edge must be implemented.

    Any suggestions welcome…


    • rutgerblom

      Thanks. That should be fine. Any reason the VDS is still on 6.6? How many pNICs do your hosts have?


  6. Ben

    Thanks for the blog post, very helpful. Because the T0 is configured with ECMP, will the traffic not traverse all links within the edge cluster regardless of the site?


    • rutgerblom

      Hi Ben. With this particular setup that’s indeed the case for traffic from NSX to the physical network. Incoming traffic will be routed to the site with the shortest BGP AS path. If you would also like to pin the outgoing traffic to a certain site, you can do so by using route priorities.


  7. Andreas Wirén

    Hi Rutger, thanks for a very good guide on this. We are deploying a VCF solution and are having trouble understanding where to route the vSAN and vMotion VLANs. Is your suggestion to do this L3-routing in the ToR instead of doing it through NSX overlay?


    • rutgerblom

      Thanks Andreas.
      This will be dictated by VCF and depends on the version of VCF you are deploying and some of the design decisions. In the latest versions you will have everything routed except for the management network which is still stretched. So with VCF, regardless of what you and I think is the best way of doing it, it’s very important to stick to that intended design for supportability reasons.



