• A stretched cluster architecture allows for higher levels of availability and enables things like inter-site load balancing. It’s a common multisite solution and also part of VMware’s Validated Design for SDDCs with multiple availability zones.

    Traditionally compute networking in an active-active multisite setup has had its challenges, but with vSAN storage and NSX networking technologies that’s a thing of the past.

    In the coming two articles I want to have a closer look at NSX-T in an active-active multisite environment. Specifically I want to learn more about how the different NSX-T components are deployed and how the data plane is configured in a stretched cluster.

    In this first part I will deploy the NSX-T 2.5 platform and perform the necessary configurations and preparations so that in part two I can focus solely on the data plane (north-south and east-west).

    This is going to be quite an exercise so let’s get right to it!

    The lab environment

    Below is a high-level overview of the lab environment as it looks right now:

    [Image: lab environment]

    A vSAN cluster consisting of eight ESXi hosts stretched to a second site. A third site is hosting the vSAN witness appliance. A completely separate vSphere management cluster is only hosting the vCenter server right now.

    A quick look at the vSphere environment then. I’m running vSphere 6.7 U3:

    [Image: vcenter]

    The hosts have two physical 10Gbit NICs:

    [Image: physical nics]

    Three vmkernel adapters have been configured: Management, vMotion, and vSAN:

    [Image: vmkernel adapters]

    As mentioned, this is a vSAN stretched cluster:

    [Image: vsan stretched cluster]

    The following tables list the VLANs and the associated IP subnets that are currently configured per site:

    Site A:

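    | VLAN Function   | VLAN ID | Subnet         | Gateway       |
    |-----------------|---------|----------------|---------------|
    | ESXi Management | 1641    | 172.16.41.0/24 | 172.16.41.253 |
    | vMotion         | 1642    | 172.16.42.0/24 | 172.16.42.253 |
    | vSAN            | 1643    | 172.16.43.0/24 | 172.16.43.253 |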

    Site B:

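    | VLAN Function   | VLAN ID | Subnet         | Gateway       |
    |-----------------|---------|----------------|---------------|
    | ESXi Management | 1651    | 172.16.51.0/24 | 172.16.51.253 |
    | vMotion         | 1652    | 172.16.52.0/24 | 172.16.52.253 |
    | vSAN            | 1653    | 172.16.53.0/24 | 172.16.53.253 |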

    Witness Site:

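    | VLAN Function   | VLAN ID | Subnet         | Gateway       |
    |-----------------|---------|----------------|---------------|
    | ESXi Management | 1711    | 172.17.11.0/24 | 172.17.11.253 |
    | vSAN            | 1713    | 172.17.13.0/24 | 172.17.13.253 |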

    Management Cluster:

    | VLAN Function   | VLAN ID | Subnet         | Gateway       |
    |-----------------|---------|----------------|---------------|
    | SDDC Management | 1611    | 172.16.11.0/24 | 172.16.11.253 |

    NSX-T is not deployed yet, but that’s about to change pretty soon 😉

    Deploying the NSX-T manager cluster

    Installing NSX-T 2.5 always starts with deploying the manager cluster. It consists of three manager nodes and an optional virtual IP (VIP).

    I will deploy the NSX manager cluster nodes in the vSphere management cluster and connect them to the SDDC Management VLAN (1611).

    The IP plan for the NSX manager cluster looks like this:

    | Hostname     | IP Address                |
    |--------------|---------------------------|
    | nsxmanager01 | 172.16.11.82              |
    | nsxmanager02 | 172.16.11.83              |
    | nsxmanager03 | 172.16.11.84              |
    | nsxmanager   | 172.16.11.81 (virtual IP) |

    First manager node

    I deploy the first manager node from the OVA package:

    [Image: first nsx manager node]

    I fill out the configuration details and kick off the deployment.

    When the first manager node is up and running I log in to the NSX Manager UI:

    [Image: first nsx manager ui login]

    Second and third manager nodes

    The second and third manager nodes can be deployed from the NSX Manager UI. Before I can do that I need to add my vCenter server under System > Fabric > Compute Manager:

    [Image: compute manager added]

    Now I’m able to deploy the second and third manager nodes via System > Appliances > Add Nodes.

    Once done the three nodes are shown in the UI and the cluster connectivity is up:

    [Image: three nodes deployed]

    Assign virtual IP address

    I finalize the manager cluster deployment by configuring a virtual IP address. This is done under System > Appliances > Virtual IP:

    [Image: change vip]

    A couple of minutes later the virtual IP is active:

    [Image: vip configured]
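
    For those who prefer the API over the UI, the sketch below checks the IP plan and builds the request URL for setting the virtual IP. It leans on two assumptions that should be verified against the API reference for your NSX-T version: that the VIP has to sit in the same subnet as all three manager nodes, and that the action is exposed as a POST to /api/v1/cluster/api-virtual-ip (written from memory of the NSX-T 2.5 API guide). The function names are made up for the example and nothing is sent over the network.

```python
import ipaddress
from urllib.parse import urlencode

# IP plan copied from the table above.
MGMT_SUBNET = ipaddress.ip_network("172.16.11.0/24")  # SDDC Management, VLAN 1611
MANAGER_NODES = {
    "nsxmanager01": "172.16.11.82",
    "nsxmanager02": "172.16.11.83",
    "nsxmanager03": "172.16.11.84",
}
CLUSTER_VIP = "172.16.11.81"


def vip_is_valid(vip, nodes, subnet):
    """True if the VIP and all node addresses share the subnet and the VIP is unused."""
    node_addresses = [ipaddress.ip_address(ip) for ip in nodes.values()]
    vip_address = ipaddress.ip_address(vip)
    return (
        vip_address in subnet
        and vip_address not in node_addresses
        and all(address in subnet for address in node_addresses)
    )


def set_vip_url(manager_ip, vip):
    """Build the URL for the assumed set_virtual_ip action (to be sent as a POST)."""
    query = urlencode({"action": "set_virtual_ip", "ip_address": vip})
    return f"https://{manager_ip}/api/v1/cluster/api-virtual-ip?{query}"


if vip_is_valid(CLUSTER_VIP, MANAGER_NODES, MGMT_SUBNET):
    print(set_vip_url(MANAGER_NODES["nsxmanager01"], CLUSTER_VIP))
```

    Sending the request would additionally need authentication and certificate handling, which I left out on purpose.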

    Configuring the NSX-T data plane

    Now that the NSX-T management plane is fully operational I will continue with the data plane preparations and configurations.

    More VLANs

    First I need to provision some more VLANs in the TORs at the data sites. At each site I need two VLANs for overlay and another two for connecting NSX with the physical network later on:

    Site A:

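    | VLAN Function | VLAN ID | Subnet         | Gateway       |
    |---------------|---------|----------------|---------------|
    | Host overlay  | 1644    | 172.16.44.0/24 | 172.16.44.253 |
    | Uplink01      | 1647    | 172.16.47.0/24 | 172.16.47.253 |
    | Uplink02      | 1648    | 172.16.48.0/24 | 172.16.48.253 |
    | Edge overlay  | 1649    | 172.16.49.0/24 | 172.16.49.253 |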

    Site B:

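    | VLAN Function | VLAN ID | Subnet         | Gateway       |
    |---------------|---------|----------------|---------------|
    | Host overlay  | 1654    | 172.16.54.0/24 | 172.16.54.253 |
    | Uplink01      | 1657    | 172.16.57.0/24 | 172.16.57.253 |
    | Uplink02      | 1658    | 172.16.58.0/24 | 172.16.58.253 |
    | Edge overlay  | 1659    | 172.16.59.0/24 | 172.16.59.253 |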
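
    The VLAN tables in this article follow a simple numbering convention: VLAN 1644 maps to 172.16.44.0/24, VLAN 1659 to 172.16.59.0/24, and so on. Below is a small sketch that checks a VLAN plan against that convention before anything is provisioned on the TORs. The convention is just how this lab happens to be numbered, not an NSX-T requirement, and the function names are made up for the example.

```python
import ipaddress

# (site, function, VLAN ID, subnet, gateway) copied from the tables in this article.
VLAN_PLAN = [
    ("Site A", "ESXi Management", 1641, "172.16.41.0/24", "172.16.41.253"),
    ("Site A", "vMotion", 1642, "172.16.42.0/24", "172.16.42.253"),
    ("Site A", "vSAN", 1643, "172.16.43.0/24", "172.16.43.253"),
    ("Site A", "Host overlay", 1644, "172.16.44.0/24", "172.16.44.253"),
    ("Site A", "Uplink01", 1647, "172.16.47.0/24", "172.16.47.253"),
    ("Site A", "Uplink02", 1648, "172.16.48.0/24", "172.16.48.253"),
    ("Site A", "Edge overlay", 1649, "172.16.49.0/24", "172.16.49.253"),
    ("Site B", "ESXi Management", 1651, "172.16.51.0/24", "172.16.51.253"),
    ("Site B", "vMotion", 1652, "172.16.52.0/24", "172.16.52.253"),
    ("Site B", "vSAN", 1653, "172.16.53.0/24", "172.16.53.253"),
    ("Site B", "Host overlay", 1654, "172.16.54.0/24", "172.16.54.253"),
    ("Site B", "Uplink01", 1657, "172.16.57.0/24", "172.16.57.253"),
    ("Site B", "Uplink02", 1658, "172.16.58.0/24", "172.16.58.253"),
    ("Site B", "Edge overlay", 1659, "172.16.59.0/24", "172.16.59.253"),
]


def expected_subnet(vlan_id):
    """Lab convention: VLAN 'AABB' maps to 172.AA.BB.0/24 (e.g. 1644 -> 172.16.44.0/24)."""
    digits = str(vlan_id)
    return ipaddress.ip_network(f"172.{digits[:2]}.{int(digits[2:])}.0/24")


def check_plan(plan):
    """Return a list of human-readable problems; an empty list means the plan is consistent."""
    problems = []
    seen_vlans = set()
    for site, function, vlan_id, subnet, gateway in plan:
        network = ipaddress.ip_network(subnet)
        if network != expected_subnet(vlan_id):
            problems.append(f"{site} {function}: VLAN {vlan_id} does not match {subnet}")
        if ipaddress.ip_address(gateway) not in network:
            problems.append(f"{site} {function}: gateway {gateway} is outside {subnet}")
        if vlan_id in seen_vlans:
            problems.append(f"{site} {function}: VLAN {vlan_id} is used twice")
        seen_vlans.add(vlan_id)
    return problems


print(check_plan(VLAN_PLAN) or "VLAN plan is consistent")
```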

    Transport zones

    Two transport zones should do it I believe. I create them using the following details:

    | Name       | N-VDS Name | Traffic Type |
    |------------|------------|--------------|
    | tz-vlan    | nvds01     | VLAN         |
    | tz-overlay | nvds01     | Overlay      |

    Transport zones are managed under System > Fabric > Transport Zones:

    [Image: transport zones]
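
    The two transport zones could also be described as API request bodies. The sketch below builds them from the table above. The field names (display_name, host_switch_name, transport_type) are how I remember the NSX-T 2.5 manager API for POST /api/v1/transport-zones, so treat them as an assumption and check the API reference before using them.

```python
import json

# Transport zones as listed in the table above.
TRANSPORT_ZONES = [
    {"name": "tz-vlan", "nvds": "nvds01", "traffic_type": "VLAN"},
    {"name": "tz-overlay", "nvds": "nvds01", "traffic_type": "Overlay"},
]


def transport_zone_body(zone):
    """Translate one table row into an (assumed) transport zone request body."""
    return {
        "display_name": zone["name"],
        "host_switch_name": zone["nvds"],
        "transport_type": zone["traffic_type"].upper(),  # "VLAN" or "OVERLAY"
    }


for zone in TRANSPORT_ZONES:
    print(json.dumps(transport_zone_body(zone), indent=2))
```

    Both zones point at the same N-VDS name, which is what lets a single N-VDS on each host carry VLAN as well as overlay traffic.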

    Uplink profiles

    Next, I need to create four uplink profiles. The table below shows the configuration details for each of them:

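    | Name           | Teaming Policy      | Active Uplinks     | Transport VLAN | MTU  |
    |----------------|---------------------|--------------------|----------------|------|
    | up-site-a-esxi | Load Balance Source | uplink-1, uplink-2 | 1644           | 9000 |
    | up-site-a-edge | Load Balance Source | uplink-1, uplink-2 | 1649           | 9000 |
    | up-site-b-esxi | Load Balance Source | uplink-1, uplink-2 | 1654           | 9000 |
    | up-site-b-edge | Load Balance Source | uplink-1, uplink-2 | 1659           | 9000 |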

    Uplink profiles are managed under System > Fabric > Profiles > Uplink Profiles:

    [Image: uplink profiles]

    In order to achieve VLAN pinning, deterministic routing, and ECMP I need to add two named teaming policies to the uplink profiles that I just created:

    | Name     | Teaming Policy | Active Uplinks |
    |----------|----------------|----------------|
    | Uplink01 | Failover Order | uplink-1       |
    | Uplink02 | Failover Order | uplink-2       |

    Adding the named teaming policies to the uplink profiles:

    [Image: add named teaming policies]

    I also need to add the Uplink01 and Uplink02 named teaming policies to transport zone tz-vlan. This is so that they can be selected on segments belonging to that transport zone later on:

    [Image: named teaming policies to transport zone]
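
    To make the relationship between the default teaming and the named teaming policies a bit more tangible, here is a sketch that builds the four uplink profiles as request bodies. The structure (UplinkHostSwitchProfile with teaming and named_teamings, and uplink_teaming_policy_names on the transport zone) is how I remember the NSX-T 2.5 manager API. It is an assumption to verify, not something taken from the UI screenshots.

```python
def uplink(name):
    """One entry in a teaming policy's active list."""
    return {"uplink_name": name, "uplink_type": "PNIC"}


def uplink_profile_body(name, transport_vlan, mtu=9000):
    """Build an (assumed) UplinkHostSwitchProfile body from one row of the table above."""
    return {
        "resource_type": "UplinkHostSwitchProfile",
        "display_name": name,
        "mtu": mtu,
        "transport_vlan": transport_vlan,
        # Default teaming: load balance source over both uplinks.
        "teaming": {
            "policy": "LOADBALANCE_SRCID",
            "active_list": [uplink("uplink-1"), uplink("uplink-2")],
        },
        # Named teamings: each one pins traffic to a single uplink.
        "named_teamings": [
            {"name": "Uplink01", "policy": "FAILOVER_ORDER", "active_list": [uplink("uplink-1")]},
            {"name": "Uplink02", "policy": "FAILOVER_ORDER", "active_list": [uplink("uplink-2")]},
        ],
    }


# Profile name -> transport VLAN, copied from the uplink profile table.
PROFILES = {
    "up-site-a-esxi": 1644,
    "up-site-a-edge": 1649,
    "up-site-b-esxi": 1654,
    "up-site-b-edge": 1659,
}
bodies = [uplink_profile_body(name, vlan) for name, vlan in PROFILES.items()]

# The same names are then listed on tz-vlan so that segments can select them.
tz_vlan_teaming_names = {"uplink_teaming_policy_names": ["Uplink01", "Uplink02"]}
```

    A segment that selects Uplink01 only ever leaves through uplink-1, which is what makes the VLAN pinning deterministic.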

    Network I/O Control profile

    To allocate bandwidth to different types of network traffic I create a network I/O control profile. After long and hard thinking I decided to call it nioc-profile and it has the following settings:

    | Traffic Type / Traffic Name            | Shares |
    |----------------------------------------|--------|
    | Fault Tolerance (FT) Traffic           | 25     |
    | vSphere Replication (VR) Traffic       | 25     |
    | iSCSI Traffic                          | 25     |
    | Management Traffic                     | 50     |
    | NFS Traffic                            | 25     |
    | vSphere Data Protection Backup Traffic | 25     |
    | Virtual Machine Traffic                | 100    |
    | vMotion Traffic                        | 25     |
    | vSAN Traffic                           | 100    |

    Network I/O control profiles are managed under System > Fabric > Profiles > NIOC Profiles:

    [Image: nioc profile]
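
    Shares are relative values, so it can help to translate them into bandwidth. The sketch below does that for a saturated 10Gbit NIC. It is a simplified model that assumes shares only matter under contention, that only the traffic types that are actually active compete, and that no reservations or limits are configured. The function name is made up for the example.

```python
# Shares copied from the nioc-profile table above.
SHARES = {
    "Fault Tolerance (FT) Traffic": 25,
    "vSphere Replication (VR) Traffic": 25,
    "iSCSI Traffic": 25,
    "Management Traffic": 50,
    "NFS Traffic": 25,
    "vSphere Data Protection Backup Traffic": 25,
    "Virtual Machine Traffic": 100,
    "vMotion Traffic": 25,
    "vSAN Traffic": 100,
}


def bandwidth_gbit(shares, active, link_gbit=10.0):
    """Split a saturated link between the active traffic types in proportion to their shares."""
    active = list(active)
    total = sum(shares[name] for name in active)
    return {name: link_gbit * shares[name] / total for name in active}


# Worst case: every traffic type is busy at the same time (400 shares in total).
everything = bandwidth_gbit(SHARES, SHARES)
print(everything["vSAN Traffic"], everything["Management Traffic"])

# More realistic: only VM, vSAN, and vMotion traffic compete for the NIC.
busy = bandwidth_gbit(SHARES, ["Virtual Machine Traffic", "vSAN Traffic", "vMotion Traffic"])
print({name: round(gbit, 2) for name, gbit in busy.items()})
```

    With all nine traffic types competing, vSAN and virtual machine traffic each get a quarter of the link.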

    Segments

    VLAN-backed segments are needed for system, uplink/transit, and overlay traffic. The table below lists the segments with their settings that I will create:

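    | Segment Name             | Uplink & Type | Transport Zone | VLAN   |
    |--------------------------|---------------|----------------|--------|
    | site-a-nvds01-management | none          | tz-vlan        | 1641   |
    | site-a-nvds01-vmotion    | none          | tz-vlan        | 1642   |
    | site-a-nvds01-vsan       | none          | tz-vlan        | 1643   |
    | site-a-edge-transit01    | none          | tz-vlan        | 1647   |
    | site-a-edge-transit02    | none          | tz-vlan        | 1648   |
    | site-b-nvds01-management | none          | tz-vlan        | 1651   |
    | site-b-nvds01-vmotion    | none          | tz-vlan        | 1652   |
    | site-b-nvds01-vsan       | none          | tz-vlan        | 1653   |
    | site-b-edge-transit01    | none          | tz-vlan        | 1657   |
    | site-b-edge-transit02    | none          | tz-vlan        | 1658   |
    | edge-uplink1             | none          | tz-vlan        | 0-4094 |
    | edge-uplink2             | none          | tz-vlan        | 0-4094 |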

    Segments are managed under Networking > Connectivity > Segments:

    [Image: vlan-backed segments]

    Uplink teaming policy

    The uplink teaming policy for segments edge-uplink1 and edge-uplink2 needs to be modified so that the named teaming policies Uplink01 and Uplink02 are used instead of the default.

    For this I have to edit these segments under Advanced Networking & Security > Networking > Switching:

    [Image: change segment teaming]

    Configure ESXi hosts

    Now the time has come to configure the ESXi hosts and turn them into NSX-T transport nodes!

    In the NSX Manager UI I navigate to System > Fabric > Nodes and change the “Managed by” to my vCenter server. The ESXi hosts are listed:

    [Image: unconfigured hosts]

    Unfortunately, I can’t make use of transport node profiles here, as these are assigned at the vSphere cluster level and the hosts in this stretched cluster need different settings per site. I will therefore configure my hosts one at a time.

    The ESXi transport nodes in Site A will be configured with the following settings:

    | Setting        | Values                                |
    |----------------|---------------------------------------|
    | Transport Zone | tz-vlan, tz-overlay                   |
    | N-VDS Name     | nvds01                                |
    | NIOC Profile   | nioc-profile                          |
    | Uplink Profile | up-site-a-esxi                        |
    | LLDP Profile   | LLDP [Send Packet Disabled]           |
    | IP Assignment  | Use DHCP                              |
    | Physical NICs  | vmnic0 – uplink-1, vmnic1 – uplink-2  |
    | vmk0           | site-a-nvds01-management              |
    | vmk1           | site-a-nvds01-vmotion                 |
    | vmk2           | site-a-nvds01-vsan                    |

    ESXi transport nodes in Site B use slightly different settings:

    | Setting        | Values                                |
    |----------------|---------------------------------------|
    | Transport Zone | tz-vlan, tz-overlay                   |
    | N-VDS Name     | nvds01                                |
    | NIOC Profile   | nioc-profile                          |
    | Uplink Profile | up-site-b-esxi                        |
    | LLDP Profile   | LLDP [Send Packet Disabled]           |
    | IP Assignment  | Use DHCP                              |
    | Physical NICs  | vmnic0 – uplink-1, vmnic1 – uplink-2  |
    | vmk0           | site-b-nvds01-management              |
    | vmk1           | site-b-nvds01-vmotion                 |
    | vmk2           | site-b-nvds01-vsan                    |
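
    Since the two tables differ only in a few site-specific values, the per-host settings can be derived from the site alone. The sketch below shows the idea with a plain Python dictionary. The keys are illustrative and not NSX-T API field names, and the host names in the example are hypothetical.

```python
def host_settings(site):
    """Return the transport node settings for an ESXi host in site 'a' or 'b'."""
    site = site.lower()
    if site not in ("a", "b"):
        raise ValueError(f"unknown site: {site!r}")
    prefix = f"site-{site}-nvds01"
    return {
        "transport_zones": ["tz-vlan", "tz-overlay"],
        "nvds_name": "nvds01",
        "nioc_profile": "nioc-profile",
        "uplink_profile": f"up-site-{site}-esxi",
        "lldp_profile": "LLDP [Send Packet Disabled]",
        "ip_assignment": "Use DHCP",
        "physical_nics": {"vmnic0": "uplink-1", "vmnic1": "uplink-2"},
        # vmkernel adapter -> segment it is migrated to during installation.
        "vmk_migration": {
            "vmk0": f"{prefix}-management",
            "vmk1": f"{prefix}-vmotion",
            "vmk2": f"{prefix}-vsan",
        },
    }


# Hypothetical host names, one per site, just to show the per-host loop.
HOSTS = {"esxi-a-01.lab.local": "a", "esxi-b-01.lab.local": "b"}
for host, site in HOSTS.items():
    settings = host_settings(site)
    print(host, settings["uplink_profile"], settings["vmk_migration"]["vmk2"])
```

    Only the uplink profile and the vmkernel segment mappings change between the sites. Everything else is identical.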

    I select one host at a time and click Configure NSX:

    [Image: configure nsx]

    The network mappings for install take care of the vmkernel adapter migration:

    [Image: network mappings for install]

    When I click Finish the NSX installation and configuration process starts on the selected ESXi host. NSX bits are installed, the host receives the N-VDS, and the vmkernel adapters are migrated from VDS port groups to the N-VDS segments.

    When all hosts have been configured I quickly check the status of the transport nodes:

    [Image: transport node status]

    And in vCenter I notice there’s now an N-VDS with a bunch of opaque port groups:

    [Image: n-vds installed]

    Summary

    Most of the NSX-T platform is in place now and I think this is a good point to take a small break.

    I started by deploying and configuring the NSX manager cluster (aka the central management plane). Next, I prepared the environment for the NSX data plane by provisioning some VLANs, profiles, and segments. Lastly, I prepared the ESXi hosts in the stretched cluster by installing the NSX VIBs and configuring them as NSX transport nodes. vSphere system networking (vmkernel adapters) was migrated to the N-VDS.

    In the next part I will continue with the installation of the data plane and more specifically deployment and configuration of the NSX Edge as well as the logical networking components.

    Stay tuned!

  • Recently a new version of the NSX-T Reference Design Guide was released. This guide, which now covers NSX-T versions 2.0 – 2.5, is a must read for anyone interested in the NSX-T solutions and their recommended design.

    One of the things you’ll find in the updated guide is a new recommended deployment mode for the edge VM for NSX-T 2.5 and onwards. The new recommended design for the Edge VM looks like this:

    [Image: one n-vds edge vm]

    This new design has a couple of advantages:

    • One N-VDS carrying both overlay and VLAN traffic.
    • Multi-TEP configuration for load balancing of overlay traffic.
    • Distribution of VLAN traffic to specific TORs for deterministic point-to-point routing adjacencies.
    • No change required in the vSphere distributed port group configuration when new workload VLAN segments are added.

    This “single N-VDS per Edge VM” design is only supported with NSX-T version 2.5 and above. For NSX-T version 2.4 and lower you stick with the “three N-VDS per Edge VM” design that looks like this:

    [Image: three n-vds edge vm]

    Getting to the 2.5 Edge VM design

    The “three N-VDS per Edge VM” design is still perfectly valid and fully supported with NSX-T 2.5.

    Upgrading NSX-T from 2.x to 2.5 won’t touch your Edge VM configuration so you automatically end up with the “three N-VDS per Edge VM” design in version 2.5.

    And in most cases there’s no immediate reason to start messing around with the Edge VM design in a production environment just to have it aligned with the recommended design for version 2.5.

    That being said, I wanted to go through the process just to see if it could be done with acceptable data plane disruption and of course to learn a thing or two in the process. Maybe you want to follow along and perhaps learn something too. Let’s have a look at what I did.

    Step 1 – Create VLAN trunking port groups

    I’m using my 2.5 Edge VM design diagram above as a blueprint and the first thing that I need to do is create two new port groups on the vSphere VDS. The Edge VM design requires two port groups configured as trunks. I will call these port groups Trunk1 and Trunk2.

    Starting with Trunk1:

    [Image: trunk 1]

    Setting the VLAN type to VLAN trunking:

    [Image: VLAN trunking]

    For Teaming and failover I configure Uplink 1 as the active uplink and Uplink 2 as the standby uplink:

    [Image: uplink 1 active]

    I then create the Trunk2 port group and configure it the same way except for the Failover order which is set the other way around:

    [Image: uplink 2 active]

    The following port groups are now available on the VDS:

    The idea here is that Trunk1 and Trunk2 will replace PG-OVERLAY, PG-UPLINK1, and PG-UPLINK2.

    Step 2 – Create new Tier 0 transit segments

    The current “three N-VDS per Edge VM” deployment in my lab environment is using Tier 0 transit segments with VLAN ID “0”. This means that they are backed by whatever VLAN ID is specified in the PG-UPLINK1 and PG-UPLINK2 VDS port groups.

    An improvement upon this is to configure the VLAN ID at the NSX-T segment level instead. In this way we keep the VLAN configuration and control of it within the NSX platform which is a good thing.

    I create two new segments called vlan1613 and vlan1614 and configure them with VLAN ID 1613 and 1614 respectively:

    [Image: new transit segments]

    Step 3 – Create a new NSX-T uplink profile

    The way the Edge VMs connect to the physical network is different with the 2.5 Edge VM design. I need to configure a new uplink profile that contains the required configuration.

    Uplink profiles are managed under System > Fabric > Profiles > Uplink Profiles:

    The new uplink profile called EdgeVM-Uplink-Profile contains three teaming configurations.

    [Image: edge uplink profile]

    The [Default Teaming] is load balancing traffic between Uplink1 and Uplink2 and facilitates the multi-TEP capability of the 2.5 Edge VM design. The two other teaming configurations, VLAN-1613-Policy and VLAN-1614-Policy, are used for the point-to-point routing adjacencies.

    Step 4 – Deploy new Edge VMs

    As far as I know there is no easy way to reconfigure an N-VDS setup on existing edge transport nodes. I simply deploy two new Edge VMs that eventually will replace the existing Edge VMs:

    [Image: deploy new edge vms]

    It’s at the Configure NSX step that I configure the Edge VM according to the version 2.5 Edge VM design. So what does that look like? Something like this:

    [Image: one n-vds config]

    A single N-VDS that is associated with both an overlay and a VLAN transport zone. The EdgeVM-Uplink-Profile gives me two DPDK Fastpath interfaces, each of which I assign to its own VDS trunk port group.

    When deployment of the two new Edge VMs is finished I have the following situation under System > Fabric > Nodes > Edge Transport Nodes:

    [Image: edge transport nodes]

    Edge nodes en03 and en04 are the new Edge transport nodes.

    I add the new Edge transport nodes to the existing Edge cluster where they join en01 and en02:

    [Image: edge cluster 4 nodes]

    Step 5 – Transition

    At this point en01 and en02 are the only Edge transport nodes with logical network configuration linked to them. While en03 and en04 are members of the same Edge cluster, they are not doing much in terms of data plane services.

    A diagram of the L3 topology in my lab from an NSX Edge perspective:

    Transitioning to the new Edge transport nodes won’t and shouldn’t alter anything in the L3 topology above. Otherwise I would consider it a bad transition.

    I’m ready to replace the current Edge transport nodes with the new ones. Unfortunately, the Replace Edge Cluster Member action won’t work here as the nodes have different configurations.

    Instead I’m going to do a manual transition and in my simple lab environment that’s a pretty straightforward process. The only service hosted in the NSX Edge besides north-south routing is a DHCP server. So this should be easy.

    Starting by placing the en01 transport node in maintenance mode:

    [Image: en01 maintenance mode]

    Now en01 is not involved in any data plane operations anymore. With that in mind I’m feeling comfortable going ahead with the next step which is the removal of the Tier 0 interfaces that are linked to en01.

    My Tier 0 gateway has an active-standby HA mode which means it can’t have its configuration mapped to more than two Edge transport nodes at a time. By deleting the configuration linked to one Edge transport node I’m making room for a new Edge transport node. One at a time.

    [Image: tier 0 interfaces]
    [Image: confirm deletion]

    Deleting the interfaces will break the Tier 0 gateway’s en01 connection with the TORs, but this is acceptable as en01 has been placed in maintenance mode and the data plane won’t experience any disruptions.

    Once the two interfaces linked to en01 have been removed I can add them again with the same name and the same IP configuration as before, but this time I link them to en03 and select the newly created transit segments:

    [Image: new transit interfaces]

    Once done with deleting and adding interfaces there’s a kind of hybrid situation where two Edge transport nodes (en02 and en03) each with a different deployment mode are serving the same Tier 0 gateway:

    [Image: en02 and en03]

    And it works!

    Now I repeat the same process to replace en02 with en04:

    1. Place en02 in maintenance mode (en03 takes over its duties).
    2. Delete Tier 0 interfaces linked to en02.
    3. Add Tier 0 interfaces, link them to en04, and select the new transit segments.
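
    The reason the nodes have to be swapped one at a time can be shown with a toy model. The sketch below is not NSX-T code. It only models the rule that an active-standby Tier 0 gateway can have its configuration mapped to at most two Edge transport nodes, and it uses hypothetical interface names. Maintenance mode (step 1) is left out of the model.

```python
MAX_EDGE_NODES = 2  # active-standby Tier 0: configuration mapped to at most two edge nodes


class Tier0Gateway:
    """Toy model: interface name -> Edge transport node the interface is linked to."""

    def __init__(self, interfaces):
        self.interfaces = dict(interfaces)

    def edge_nodes(self):
        return set(self.interfaces.values())

    def delete_interfaces(self, node):
        """Remove all interfaces linked to `node` and return their names."""
        removed = [name for name, owner in self.interfaces.items() if owner == node]
        for name in removed:
            del self.interfaces[name]
        return removed

    def add_interface(self, name, node):
        nodes = self.edge_nodes()
        if node not in nodes and len(nodes) >= MAX_EDGE_NODES:
            raise ValueError(f"no room for {node}: already mapped to {sorted(nodes)}")
        self.interfaces[name] = node


def replace_node(gateway, old, new):
    """Steps 2 and 3: delete the old node's interfaces, re-add them on the new node."""
    for name in gateway.delete_interfaces(old):
        gateway.add_interface(name, new)


# Hypothetical interface names; the article only says they keep their names.
t0 = Tier0Gateway({
    "uplink1-a": "en01", "uplink2-a": "en01",
    "uplink1-b": "en02", "uplink2-b": "en02",
})
replace_node(t0, "en01", "en03")  # hybrid state: en02 and en03 serve the gateway
replace_node(t0, "en02", "en04")  # final state: en03 and en04 serve the gateway
print(sorted(t0.edge_nodes()))
```

    Trying to add an interface on a third node while two are still mapped raises an error in this model, which mirrors why the old interfaces have to go first.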

    The final result is four Tier 0 gateway interfaces with the same name and IP as before, but linked to the new Edge transport nodes:

    [Image: four interfaces in new ens]

    Just the DHCP service left which is pretty easy.

    I have to re-configure the DHCP service so that it uses the new Edge transport nodes. This is done under Advanced Networking & Security > Networking > DHCP > Server Profiles.

    I edit the profile so that it only contains en03 and en04 as its members.

    Step 6 – Clean up

    After verifying that everything is working as it should the time has come to say goodbye to the old Edge transport nodes.

    I first remove en01 and en02 from the Edge Cluster:

    [Image: remove en01 and en02 from cluster]

    And then simply delete them from the fabric:

    [Image: delete edge node]

    I can also delete the PG-OVERLAY, PG-UPLINK1, and PG-UPLINK2 port groups in vSphere as they are no longer needed.

    This leaves the environment with the new en03 and en04 Edge transport nodes and the new NSX-T 2.5 recommended Edge VM design!

    Summary

    A summary of the steps I took to go from a “three N-VDS Edge VM” design to a “single N-VDS Edge VM” design:

    1. Create trunking port groups in vSphere.
    2. Create new transit segments configured with VLAN ID.
    3. Create new uplink profile for the Edge transport node.
    4. Deploy two new Edge VMs and configure them with the “single N-VDS” design.
    5. Replace the existing Edge transport nodes by doing a manual transition.
    6. Verify and clean up.

    Quite an operation but certainly doable. It might or might not be worth the effort. It comes down to whether the advantages that this new Edge VM design offers are important enough to you.

    Keep in mind that placing Edge transport nodes in maintenance mode as I did in this article will trigger a fail-over between the nodes (with active-standby mode) which in turn causes short data plane disruptions. That’s not an issue in a lab, but something to consider in a production environment. For a Tier 0 gateway with an active-active HA mode and ECMP enabled this would be less of an issue.

  • Welcome back! In part 1 we had a look at some NSX-T management plane failure scenarios and how to recover from them. In this part we continue to investigate NSX-T recoverability at the data plane and more specifically the NSX Edge.

    Quick note

    If you ever experience an issue in your NSX-T production environment, the first and only thing you should do is open a VMware support request. Highly skilled experts who are dealing with all kinds of NSX-T issues on a daily basis will help you in the best possible way with your specific issue.

    NSX data plane failure & recovery

    Most will agree that failures at the data plane are more critical than for instance failures at the management plane. After all, the data plane is where the network packets that really matter are flowing around. Failures at the data plane can potentially impact service availability.

    Luckily, the NSX data plane is robust by design. Largely distributed and where it’s centralized it’s also clustered. Combine this with a proper design for the physical and logical components and you’re looking at a pretty solid solution.

    But sure, things can break down and when they do it’s important to understand how to get back on track again.

    The lab environment

    We’re still using the same small lab environment as in part 1. I just added a VM in the compute cluster for today’s article. Below is a diagram showing the main components from a high level perspective.

    The NSX Edge

    The NSX Edge is a centralized, often clustered, component. It provides a range of gateway services, but one of its main responsibilities is routing traffic between NSX logical networks and the physical network.

    The worker bees of the NSX Edge are the edge nodes. They are available in two form factors (virtual machine and bare metal) and are organized in one or more edge clusters.

    In my lab environment the NSX Edge consists of two edge node VMs and one edge node cluster.
    Let’s have a quick look at the deployment details of one of these edge node VMs.

    A pretty common NSX-T 2.4 edge node deployment configuration for the VM form factor.

    Below is the layer 3 topology running on top of the NSX Edge.

    As you can see the layer 3 network is making good use of the lower layer’s redundant paths.

    Lastly, the Tier 0 gateway in this lab has been set up with an Active-Standby HA mode.

    Current state of the NSX Edge

    Life is good at the edge. The edge node VMs are up and running.

    The edge transport nodes’ configuration state and node status are looking good.

    The Tier 0 gateway’s BGP summary shows that BGP connections are established with both of the TORs.

    The Tier 0 gateway’s routing table contains IP routes advertised by the TORs through BGP.

    And last but not least the VM in the compute cluster can access the physical network. A “traceroute” to the PING host on the physical network shows that traffic is routed to TOR-Right (172.16.14.253) at the moment:

    North-south networking is running beautifully! What can possibly go wrong on a day like this?

    TOR down

    Not exactly an NSX Edge failure, but definitely a failure scenario that concerns the NSX Edge.

    TOR-Right broke down. What’s the impact? Let’s have a look.

    The BGP summary indeed shows us that we’ve lost connection with TOR-Right. BGP connections with TOR-Left are still intact though.

    The Tier 0’s routing table now only contains BGP routes advertised by TOR-Left.

    All of this is expected, but how is the data plane affected by this TOR failure?

    It seems to be working fine. Sure, the “traceroute” reveals that traffic is now passing through TOR-Left (172.16.13.254), but that’s about it.

    The redundant infrastructure, and BGP making use of it, ensured that this TOR failure had minimal impact on the NSX data plane.

    TOR down recovery

    Basically we would just rack and stack a new TOR, configure it, and restore redundancy. The only thing we need to do within NSX is verify that the BGP connections are restored.

    Edge node down

    Last time I checked there were two edge node VMs in that Edge cluster. en01 is gone!

    What’s the impact? How do we recover?

    Let’s first investigate the impact this failure has on the north-south traffic.

    Alright, none whatsoever.

    The VM can still reach the physical network. The surviving edge transport node must have taken over the duties of the failed node.

    But of course, the NSX Edge is now running on a single edge transport node and NSX Manager clearly shows us that we are dealing with a degraded state.

    Without a standby node we’re living on the edge (pun intended). We need that second transport node up and running again.

    Edge node down recovery

    In a situation like this it’s good to remember that there’s nothing unique about an edge node. During its lifetime it is much like a container receiving and executing configuration from the management plane. In other words, losing the edge node is in itself nothing traumatic. We just need to get a new one.

    The first step when recovering from a permanent edge node failure is to deploy a new edge node. Once it’s deployed three edge transport nodes are listed in the NSX Manager UI.

    • en01 with status “Unknown” is the node that is missing.
    • en02 with status “Degraded” because it can’t find its HA buddy.
    • en03 with status “Up” is alive and happy but not doing much.

    The second step is to tell the management plane that we want to replace the missing edge transport node with the one we just deployed.

    This is done under Edge Clusters in the NSX Manager UI (or via the API).

    After clicking the small gear icon we select Replace Edge Cluster Member. This starts the process of re-mapping logical network configuration from one edge transport node to another.

    In our scenario we want to re-map from en01 to en03.

    If the edge transport node were still operational, we would put it in maintenance mode here to minimize data plane disruptions. In our failure scenario the node is already gone so maintenance mode is not relevant.

    After clicking Save the management plane comes into action and links configurations and other related logical network constructs to the new edge transport node.

    Once the process is done we can delete the orphaned edge transport node and after a minute or so we’re seeing two healthy edge transport nodes again.
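
    For the API route mentioned earlier, the replacement boils down to a single request against the edge cluster. The sketch below only builds that request. The replace_transport_node action and the member_index/transport_node_id body are how I remember the NSX-T manager API, so verify them against the API reference for your version. The manager address and all IDs are hypothetical placeholders.

```python
import json


def replace_member_request(manager, edge_cluster_id, member_index, new_transport_node_id):
    """Build (method, url, body) for replacing one edge cluster member; nothing is sent."""
    url = (
        f"https://{manager}/api/v1/edge-clusters/{edge_cluster_id}"
        "?action=replace_transport_node"
    )
    body = {
        "member_index": member_index,                # slot of the failed node (en01)
        "transport_node_id": new_transport_node_id,  # ID of the replacement (en03)
    }
    return "POST", url, json.dumps(body)


# Hypothetical values; the real IDs come from the edge cluster and transport node listings.
method, url, body = replace_member_request(
    manager="nsx.example.com",
    edge_cluster_id="edge-cluster-uuid",
    member_index=0,
    new_transport_node_id="en03-transport-node-uuid",
)
print(method, url)
print(body)
```

    The member index identifies the slot in the edge cluster that the failed node occupied, which is why the logical configuration ends up on the replacement node.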

    A look at the Tier 0’s logical router ports shows us what happened.

    Two of the logical router ports previously mapped to en01 have been relocated to en03.

    BGP connections are established again.

    Replacement successful! The fabric’s state is restored to normal operations.

    Summary

    Today we looked at two failure scenarios concerning the NSX Edge:

    • Failure of a top-of-rack switch
      • limited impact on the data plane
    • Failure of an NSX edge node
      1. deploy new edge node
      2. run edge transport node replace process
      3. remove orphaned edge transport node from fabric

    Not too bad. This is a small environment, but the recovery procedures will be largely the same regardless of environment size.

    Sure, more things can break. An ESXi host hosting an edge node, a physical NIC, cables, and so on. The bottom line is that unless we’re dealing with a complete meltdown, a properly designed NSX Edge will minimize the impact of component failure and make recovery a piece of cake.

  • With NSX-T 2.5 comes NSX Intelligence 1.0. This component, which is part of NSX Data Center Enterprise Plus, is something I’ve been looking forward to since it was announced.

    NSX Intelligence adds a powerful analytics engine to the NSX-T platform. It provides workload and network context that is unique to NSX. Application owners and operations people can use the NSX Intelligence interface for configuration and monitoring.

    Besides the NSX Intelligence data platform itself, this 1.0 release provides visualization and security rule and grouping recommendations.

    Cool stuff. Let’s have a look at how to get it up and running.

    Installation preparations

    The preparation and installation steps are explained in detail in the official installation documentation. I strongly recommend you follow these guides when installing NSX Intelligence. Some things to point out:

    • NSX Intelligence 1.0 requires NSX-T version 2.5. The first thing I had to do was upgrade my NSX-T lab to version 2.5. In a production environment the 2.5 upgrade requires its own planning and preparations of course.
    • The NSX Intelligence installation comes as a tar-file. Its contents need to be extracted and placed on a web server somewhere that can be accessed by your NSX Manager cluster.
    • The NSX Intelligence appliance must be deployed on ESXi managed by vCenter.
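
    If there is no web server at hand, the extract-and-serve step can be handled with a few lines of Python in a lab. The sketch below unpacks the tar file into a directory and serves that directory over plain HTTP. It assumes the NSX Manager cluster can reach the machine running it, and the file and directory names in the usage comment are hypothetical. Whether unauthenticated plain HTTP is acceptable is something to decide for your own environment.

```python
import functools
import http.server
import tarfile
import threading
from pathlib import Path


def extract_bundle(tar_path, web_root):
    """Extract the installation tar file into web_root and return the extracted file names."""
    web_root = Path(web_root)
    web_root.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as bundle:
        bundle.extractall(web_root)  # only do this with archives from a trusted source
    return sorted(path.name for path in web_root.rglob("*") if path.is_file())


def serve(web_root, port=8080):
    """Serve web_root over HTTP in a background thread and return the server object."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(web_root)
    )
    server = http.server.ThreadingHTTPServer(("0.0.0.0", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Usage (hypothetical file and directory names):
#   files = extract_bundle("nsx-intelligence-bundle.tar", "/srv/nsx-intelligence")
#   server = serve("/srv/nsx-intelligence")
#   ... deploy the appliance, pointing the wizard at the OVF file's URL ...
#   server.shutdown()
```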

    Installation

    Once the environment is prepared we can start the NSX Intelligence installation.

    In NSX Manager navigate to Plan & Troubleshoot > Discover & Take Action:

    Click on Go to system, scroll down on the Appliances page and click Add NSX Intelligence Appliance. This starts the appliance deployment wizard:

    Enter the URL to the OVF file and the appliance network configuration:

    I’m deploying the small NSX Intelligence appliance which is suitable for labs or PoCs. For a production environment you would select the large form factor.

    In the next step we configure the vSphere details for the virtual appliance:

    Configure the appliance credentials at the third and final step:

    Click on Install Appliance to start the deployment:

    Deployment took about 5 minutes to complete in my lab environment.

    First look

    Although it’s a separate virtual appliance, the NSX Intelligence UI seamlessly integrates with the NSX Manager UI. It can be found under Plan & Troubleshoot > Discover & Take Action.

    The two objects we can work with here are virtual machines and groups:

    We can choose to display only certain VMs/groups or all:

    And apply a filter based on tags, flows, and rules:

    After powering on two Windows VMs it took about 20 seconds before the NSX Intelligence engine started to draw the communication paths of these VMs. Impressive!

    In full screen mode you can switch to dark mode. Much appreciated.

    To get actual firewall rule recommendations you need to start a new recommendation process:

    After clicking the Start Recommendation button you can configure some parameters. Time range being the most important:

    Click on Start Discovery to kick off the recommendation process. This process can be monitored under Recommendations:

    Once the analysis is done, the recommended rules, groups, and services can be reviewed and modified:

    At step 2 we choose placement for the new recommendation based security policy:

    Clicking on Publish will create the objects and enforce the security policy:

    The recommended rules are in place:

    Summary

    Installing NSX Intelligence is a straightforward process (apart from its web-based OVF installation requiring a web server).

    We took the NSX Intelligence engine for a really quick test drive and deployed some recommended firewall rules including service and group objects minutes after deployment (a longer period for analyzing is strongly recommended).

    Even as version “1.0” NSX Intelligence is going to make micro segmentation much easier and much faster. It’s a big step towards self-driving micro segmentation operations. Not to mention the slick visualization and visibility it gives us for our VMs’ communication paths.

  • Like everything else in life, stuff can break in your NSX-T environment too. When that happens it’s important to understand how to get things back on track again.

    In the following blog articles I’m going through a couple of NSX-T failure scenarios and looking at how to recover from them. As usual I’m doing this mainly to learn something myself. By writing about it in a blog article others might learn something as well.

    Quick note

    Before starting I just want to state the obvious: If you ever experience an issue in your NSX-T production environment, the first and only thing you should do is open a VMware support request. Highly skilled experts who are dealing with all kinds of NSX-T issues on a daily basis will help you in the best possible way with your specific issue.

    Management plane failure & recovery

    As you might know the NSX architecture consists of three main components: The management plane, the control plane, and the data plane. As of NSX-T version 2.4 the management plane and the control plane are living in the same virtual appliance, but from a functional point of view they are still as decoupled as before.

    One of the advantages with this architecture is that failure of one plane will not (immediately) affect the functioning of another. For example, a management plane failure won’t cause immediate issues at the data plane.

    Of course, NSX-T without a properly functioning management plane gets pretty annoying after a while and restoring it to normal operations will probably be a high priority in most environments. Let’s have a closer look at that.

    The lab environment

    The following is a very simple and high level overview of my lab environment:

    This is by no means a VMware validated or supported design. Just my lab environment.

    A compute cluster with two ESXi transport nodes on the left, an edge cluster with two edge VMs on the right, and a management cluster containing vCenter and the collapsed NSX-T management/controller cluster in the middle. Today our focus is the management cluster.

    Current state of the management plane

    The NSX-T management cluster is currently in a healthy and stable state. We see three manager nodes connected and synced. Life is good.

    Backup is configured and seems to be working fine too:

    In other words, we’re ready for some mayhem!

    One manager/controller node down

    One of the management/controller nodes is gone. Somebody accidentally deleted the virtual appliance. It happens (in my lab).

    The “get cluster status” NSX manager CLI command output clearly shows that the group status is degraded and that one of the nodes is down:

    Even though the management cluster is now running in a degraded state, the majority of the cluster is still up and running (we have quorum) and NSX-T management operations aren’t affected at this point.

    It’s not an optimal situation and we do want to return to a stable three node management cluster as soon as possible.

    One manager/controller node down recovery

    To recover from a one node failure we simply need to deploy a new manager/controller appliance that will replace the missing one.

    We’ll first remove the orphaned node from the management cluster using the “detach node <node-id>” NSX manager CLI command:

    The management cluster now returns to a stable state:

    This is just a cosmetic improvement of course. We still need to deploy a new manager/controller appliance and join it to the cluster to get back to a production-grade management cluster.

    Joining the new appliance to the cluster with the “join” command:

    And after 10 minutes or so we once again have a stable three node management cluster. Easy!

    Two manager/controller nodes down

    Is a two nodes down scenario all that different from the previous one? The answer is yes.

    With two nodes missing, the cluster loses quorum and can’t operate as a cluster anymore. The management plane goes into read-only mode. It’s not a disaster, but not a good situation either.
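
    The quorum rule behind these scenarios can be sketched in a few lines of Python. This is only an illustration of a simple majority check, assuming that is how the three-node cluster decides whether it can keep operating. The function name has_quorum is made up for this example and is not part of NSX-T.

```python
def has_quorum(total_nodes: int, nodes_up: int) -> bool:
    """Return True when a strict majority of the cluster nodes is up."""
    return nodes_up > total_nodes // 2


# Three-node management cluster, as used in this lab.
for nodes_up in (3, 2, 1, 0):
    state = "quorum" if has_quorum(3, nodes_up) else "no quorum"
    print(f"{nodes_up} of 3 nodes up: {state}")
```

    With three nodes, losing one still leaves a majority of two, while losing two leaves a single node that cannot form a majority on its own.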

    You’ll notice things like not being able to connect to the cluster’s VIP anymore and when logging in to the surviving node’s UI directly you are greeted with something like this:

    We need to fix this, but how?

    Two manager/controller nodes down recovery

    We first need to deactivate the cluster. We do this by running the “deactivate cluster” manager CLI command on the surviving node:

    This leaves us with a single, but operational manager/controller node. We should now be able to log in to the manager UI on the VIP address again.

    Checking under System > Overview we see a single operational node:

    From here we can just deploy two new manager/controller nodes, join them with the surviving node to form a cluster, and get back to this:

    Three manager/controller nodes down

    Now we lost all three of the manager/controller nodes. We are running NSX-T without a management plane (and central control plane for that matter). Don’t panic! Packets are still flowing as the data plane is not affected by the management plane outage, remember?

    But of course, it’s now just a matter of time before things will get really problematic. We need to do something.

    Three manager/controller nodes down recovery

    Without any surviving nodes the only option we have is to perform a restore from an NSX-T backup.

    The first step here is to deploy a new manager/controller appliance. Once deployed we need to configure and perform a restore.

    Navigate to System > Backup & Restore > Restore. Click on Edit to enter the details of your SFTP server containing the NSX-T backups:

    After clicking on Save you should be presented with a list of the available backups:

    Pick the backup you want to restore from and click Restore. Read the warning message. It tells you what to expect during the restore process:

    Pretty soon after starting the restore you are disconnected from the UI. After 10 minutes or so we should be able to log in again. When navigating to System > Backup & Restore > Restore, the following message is displayed:

    Okay, this makes sense. We’ll deploy two more appliances, form a cluster, and then resume the restore process. Once we’ve done that we should see the following confirmation under System > Backup & Restore > Restore:

    Don’t mind the timestamps in the screenshot above. That was just me taking a break 😉 The whole recovery process, deployment of manager nodes included, actually took less than an hour. Not too bad.

    Summary

    Recoverability of the NSX-T management plane is rather good I would say. We went through a couple of failure/recovery scenarios:

    • Loss of one manager/controller node:
      1. clean up orphaned node
      2. deploy new node
    • Loss of two manager/controller nodes:
      1. deactivate cluster
      2. deploy two new nodes
    • Loss of three manager/controller nodes:
      1. deploy new node
      2. restore from NSX-T backup
      3. deploy two more nodes
      4. finish restore operation
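
    The scenarios above could be kept as a small lookup table, for example in a runbook script. The following is a minimal Python sketch of that idea. The names RECOVERY_STEPS and recovery_plan are made up for this example, and the steps are simply the ones listed in this summary.

```python
# Recovery steps per number of lost manager/controller nodes,
# taken from the summary list in this article.
RECOVERY_STEPS = {
    1: ["clean up orphaned node", "deploy new node"],
    2: ["deactivate cluster", "deploy two new nodes"],
    3: [
        "deploy new node",
        "restore from NSX-T backup",
        "deploy two more nodes",
        "finish restore operation",
    ],
}


def recovery_plan(nodes_lost: int) -> list:
    """Return the ordered recovery steps for a three-node cluster."""
    if nodes_lost not in RECOVERY_STEPS:
        raise ValueError("expected 1, 2, or 3 lost nodes")
    return RECOVERY_STEPS[nodes_lost]


for step_number, step in enumerate(recovery_plan(3), start=1):
    print(f"{step_number}. {step}")
```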

    In my very simple lab environment it was never any problem to get the management plane back up and running again. Thanks to NSX’s architecture the actual networking (data plane functions) was not affected at any point during the management plane outages.

    In theory size and complexity of an NSX-T environment shouldn’t matter much for management plane recovery procedures. As I said before, open a VMware support case if anything ever breaks in your NSX-T environment. It’s the smartest thing to do.

  • On August 30th 2019 I passed the brand new VMware Advanced Design NSX-T Data Center 2.4 exam.

    Let me tell you a little more about this exam and how I prepared for it.

    About the exam

    As far as I know 3V0-41.19 is the first ever design exam within the network virtualization certification track. A deploy exam based on NSX for vSphere has been around for a while (which I reviewed a while back). As the name implies the design exam is based entirely on NSX-T 2.4.

    Let’s look at a couple of things from the all important exam guide:

    • 50 questions
    • 105 minutes (plus additional time for non-native English speakers)
    • Passing score 300

    So on average you have a couple of minutes for each question.
    In my experience most of the questions took less than a minute to answer so that left me with some more time for the trickier ones.

    • 1 year of experience designing NSX-T Data Center solutions
    • 2 or more years designing physical and virtual data centers/networks

    The above aren’t strict requirements of course. I can imagine that candidates having this experience will have an easier time preparing for the exam and are also more likely to pass it.
    Experience with (VMware) design and the fact that I had worked extensively with NSX-T for the last 8 months certainly helped me.

    Preparing for the exam

    How do you prepare for an exam like this? The following are my personal recommendations partly based on my own experience.

    Know your NSX-T [V2.4]

    You might think this exam is all about high-level architecture and designs, but you would be wrong.

    You will need a very good technical understanding of the NSX-T solutions. Preferably you are at VCP-NV 2019 (NSX-T 2.4) level before going for this exam. Read the previous post to learn a little more about what it takes to get to that level.

    Know your NSX-T design [V2.4]

    Naturally. You should also have good knowledge about the different NSX-T designs and design decisions within areas like:

    • Physical infrastructure design
    • Compute host cluster design
    • Edge design
    • Logical networking design

    The Architecture and Design for VMware NSX-T Workload Domains document (part of VMware Validated Design 5.1) is a really good read and covers a lot of the designs and design decisions.

    Then there is another piece of documentation that I recommend you study: The VMware NSX-T Reference Design Guide. Outdated as it is (based on NSX-T 2.0) it still is a great resource if you want to learn about NSX-T design.

    Identify requirements, risks, constraints, and assumptions

    You might have seen these before. As a VMware architect working on a design, you are expected to be able to categorize the information collected during interviews and workshops with project stakeholders as either being a requirement, a risk, a constraint, or an assumption.

    It’s not all that hard really, but you really need to understand this. When I was studying this for the VCAP-DCV Design exam I found this site to have a pretty good explanation of it all.

    Attend the VMware NSX-T Data Center: Design [V2.4] course

    This is the recommended course as preparation for the exam. Full disclosure, I did not attend this course (yet) so I can’t say much about it, but it’s supposed to be really good.

    Watch the VMworld sessions about NSX-T design

    Every year William Lam puts together a nice list with links to all the recorded VMworld sessions. As a matter of fact he just did this for the VMworld 2019 US recordings.

    At a minimum you should watch the following sessions:

    • CNET1072BU – NSX-T Design for Small to Mid-Sized Data Centers
    • CNET1334BU – NSX-T Design for Multi-Site Networking and Disaster Recovery

    There are many other recorded sessions on NSX-T that you might want to check out as well.

    Summary

    I really enjoyed this exam. The scenarios in the questions were realistic and easy to relate to. Some of the questions were challenging while others were pretty easy.

    I felt pretty well prepared ahead of the exam. And it turned out well for me so there’s really no reason it shouldn’t for you.

    Good luck!

  • I recently passed the Professional NSX-T Data Center 2.4 (2V0-41.19) exam and in this short post I would like to share my experience with the exam and some recommendations for preparing for the exam.

    About the exam

    As with all VMware exams there is an excellent exam guide available. That guide should be the first thing you study. It contains all the information you need to plan your preparation.

    Some things from the current exam guide (it’s subject to change):

    • 70 questions
    • 100 minutes (plus additional time for non-native English speakers)
    • Passing score 300

    70 questions to be answered within 100 minutes. With good preparation this is fair. Time was not an issue for me during this exam. I did receive extra time being a non-native English speaker, but 100 minutes would’ve been enough.

    • 6 months experience installing, configuring, managing, and troubleshooting NSX-T 2.4 solutions
    • 6 months hands-on experience with Linux and KVM
    • 1 year of experience working in IT and with VMware vSphere and its command line

    This candidate profile looks reasonable to me. I recommend that you have a good look at NSX-T on Linux and KVM. VMware is serious about running NSX-T on these platforms.

    Preparing for the exam

    Everyone has their own preferred way of preparing for an exam. Also, different exams require different ways of preparing. The following is what helped me pass 2V0-41.19.

    Working with NSX-T

    I was lucky enough to be part of several PSO engagements that involved NSX-T. Actually, my last project was a complete NSX-T implementation where I participated from the design phase all the way to the day 2 operations.

    Thanks to this kind of real-life exposure to NSX-T I developed a pretty solid understanding of the product and its components which in turn was very beneficial in preparing for an exam like this.

    Studying the NSX-T 2.4 documentation

    Over and over again. Which I was kind of forced to do for my work anyway.

    The official NSX-T documentation contained much (but not all) of the theory I needed to master. I knew it more or less by heart ahead of the exam.

    Playing with NSX-T

    This exam tests one’s theoretical knowledge of pretty much all of the solutions within NSX-T. For me building, breaking and trying out things in an NSX-T lab environment was a great way to learn the product.

    So if you have the possibility to install an NSX-T lab environment I strongly recommend you do that.

    If setting up your own NSX-T lab is not an option you can always have a look at one of the VMware NSX-T hands-on labs. Especially HOL-1926-01-NET is a good one to play around with.

    Attending the VMware NSX-T Data Center: Install, Configure, Manage [v2.4] course

    This course is the recommended course for the exam. It’s also one of the four courses you can choose from in case you want to become certified (VCP-NV 2019) on passing this exam.

    This course is one of the better VMware courses I’ve attended. I also want to give a lot of credit to teacher Adrian Vizoso. A subject matter expert with the ability to explain relatively complex matter in an easy to understand way.

    The course goes through a lot of theory, but also contains plenty of labs. So you will gain hands on experience with the different NSX-T solutions.

    Perhaps I wasn’t exactly new to NSX-T at the time I attended this course, but about half of the attendees were. By the end of the course they had developed a good basic understanding of the platform and definitely felt better prepared for the exam.

    Summary

    The Professional NSX-T Data Center 2.4 exam is by no means an easy exam. Even though I felt well prepared it was actually a bit harder than I expected.

    That being said, with good preparation and hands-on experience you should be able to pass this exam without too much trouble.

    I wish you good luck with the studies and the exam itself!

  • Hi and welcome back. We’re looking into the NSX-T data path and investigating different points at which we can capture network traffic.

    You may remember from part one that virtual machine “app01” (172.16.2.50) is trying to ping another virtual machine called “web01” (172.16.1.53), but it’s receiving “request timeout”. We’re trying to find out where in the data path we’re having an issue.

    So far we have seen the ICMP packets at the following points:

    We captured traffic pre (1) and post (2) distributed firewall. Next, we ran captures before (3) and after (4) traffic got routed by the distributed logical router. In part two we first observed the ICMP packets going through the N-VDS uplink (5) and then again at the vmnic (6). Here we had to dig a bit deeper as the packets got Geneve encapsulated.

    Reference points

    In this final part of the series we look at the remainder of the data path for the ICMP packets. More precisely we will cover these points today:

    Reference point 1 – vmnic

    After the encapsulated packets traveled the overlay transport VLAN, they arrive at the destination ESXi host’s physical NIC. What do we see at this point? Let’s see:

    nsxcli -c start capture interface vmnic2 direction input file vmnic2_in.pcap

    Opening the pcap file in Wireshark reveals the following:

    So, this confirms that the ICMP packets survived their journey over the physical network and have indeed arrived at ESXi02’s physical NIC.

    Reference point 2 – Post N-VDS Uplink

    At this point the ICMP packets should have been decapsulated by the receiving TEP. To see the packets right after decapsulation, we need to use the pktcap-uw command which supports the “stage” parameter when capturing traffic at the uplink level. This is something nsxcli does not support yet.

    As pktcap-uw doesn’t interpret packet contents, we will capture traffic to a pcap file:

    pktcap-uw --uplink vmnic2 --dir 0 --stage post --srcip 172.16.2.50 -o /tmp/post_uplink.pcap

    To read the contents of the pcap file we can use Wireshark, but in this case we’ll do a quick read using the tcpdump-uw command:

    tcpdump-uw -r /tmp/post_uplink.pcap

    And there we see the ICMP packets right after being decapsulated.

    Reference point 3 – Pre DVfilter

    After decapsulation the packets are heading towards the destination virtual machine (web01). Remember they were already routed to the destination NSX-T segment on the source ESXi host. Distributed routing FTW!

    Before the packets get access to web01’s guest OS, they once again need to go through a distributed firewall filter. This time the one that is applied to web01’s vNIC slot 2.

    First we need to get hold of the filter’s name:

    summarize-dvfilter | less -p web01

    Next we capture traffic pre this filter:

    nsxcli -c start capture dvfilter nic-2780521-eth0-vmware-sfw.2 stage pre expression srcip 172.16.2.50

    And we’re seeing the ICMP packets!

    Reference point 4 – Post DVfilter

    Now let’s see if we can see the packets after they have been processed by the distributed firewall rules:

    nsxcli -c start capture dvfilter nic-2780521-eth0-vmware-sfw.2 stage post expression srcip 172.16.2.50

    Nothing coming through! Some DFW rule must still be dropping the ICMP packets.

    Let’s have a look in NSX Manager:

    Another DFW rule dropping ICMP traffic destined for web01! How could we have missed that! 😉

    After changing the rule’s action to “allow” the capture immediately starts showing output:

    And the app01 virtual machine starts receiving its replies:

    What we should’ve done instead

    The whole ICMP timeout issue was kind of hypothetical of course. If we ever need to troubleshoot VM-to-VM communication in NSX-T the first tool that comes to mind is Traceflow in NSX Manager. Using Traceflow the ICMP issue in our scenario would have been solved in a second. Let’s have a look.

    In NSX Manager navigate to Advanced Network & Security > Tools > Traceflow:

    We pick the source (app01) and destination (web01) virtual machines and click “Trace“:

    In a matter of seconds the trace result shows us exactly where in the data path we’re having an issue.

    Summary

    It’s been quite a journey. Let’s summarize:

    We looked at 10 points in the NSX-T data path. At each of these points we captured traffic. The following table is an overview of the command used at each point:

    Point  Capture command
    1      nsxcli -c start capture dvfilter nic-2103285-eth0-vmware-sfw.2 stage pre expression dstip 172.16.1.53
    2      nsxcli -c start capture dvfilter nic-2103285-eth0-vmware-sfw.2 stage post expression dstip 172.16.1.53
    3      nsxcli -c start capture interface vdrPort direction output expression dstip 172.16.1.53
    4      nsxcli -c start capture interface vdrPort direction input expression dstip 172.16.1.53
    5      nsxcli -c start capture interface uplink1 direction output expression dstip 172.16.1.53
    6      nsxcli -c start capture interface vmnic2 direction output file vmnic2_out.pcap
    7      nsxcli -c start capture interface vmnic2 direction input file vmnic2_in.pcap
    8      pktcap-uw --uplink vmnic2 --dir 0 --stage post --srcip 172.16.2.50 -o /tmp/post_uplink.pcap
    9      nsxcli -c start capture dvfilter nic-2780521-eth0-vmware-sfw.2 stage pre expression srcip 172.16.2.50
    10     nsxcli -c start capture dvfilter nic-2780521-eth0-vmware-sfw.2 stage post expression srcip 172.16.2.50

    There are other tools available to gain visibility into the NSX-T data path. We talked briefly about the Traceflow tool in NSX Manager. The Port Mirror module in NSX Manager is another very powerful tool. And then of course we have VMware’s flagship product for analyzing, optimizing and troubleshooting network traffic: vRealize Network Insight.

    I still believe it’s important to also have a good understanding of the NSX-T data path on the level we discussed in these articles. It makes troubleshooting and working with higher level tools easier.

  • Welcome back! We’re looking at how to gain visibility at different points in the NSX-T data path.

    You may remember from part one that virtual machine “app01” (172.16.2.50) is trying to ping another virtual machine called “web01” (172.16.1.53), but it’s receiving “request timeout”. We’re trying to find out where in the data path we’re having an issue.

    So far we captured packets at several points in the NSX-T data path:

    We looked at traffic pre (1) and post (2) distributed firewall. Here we actually found a small issue with a missing DFW rule. Then we ran captures before (3) and after (4) traffic got routed by the distributed logical router.

    Reference points

    The NSX-T control plane provided information about the location of the destination IP address. Traffic is now heading towards the physical network.

    The points we’ll have a closer look at today are the N-VDS uplink and the vmnic as shown in the diagram below:

    Reference point 1 – N-VDS Uplink

    To see the relevant traffic at the N-VDS uplink, we run the following command:

    nsxcli -c start capture interface uplink1 direction output expression dstip 172.16.1.53

    As you can see the ICMP echo requests are arriving at the uplink. No issues here it seems so we’ll move on to the next point.

    Reference point 2 – vmnic

    So this should be easy. We’re starting a new capture by running:

    nsxcli -c start capture interface vmnic2 direction output expression dstip 172.16.1.53

    Nothing? No output? Traffic seems to have vanished between the N-VDS uplinks and the physical NICs.

    Let’s see if there’s traffic on vmnic2 at all:

    nsxcli -c start capture interface vmnic2 direction output

    There’s plenty of traffic. So where are the ICMP packets?

    Welcome to Geneve

    As in Generic Network Virtualization Encapsulation that is. The protocol providing network overlay capability in NSX-T.

    The Geneve protocol’s job is to set up a tunnelling mechanism between NSX-T transport nodes. Each transport node has one or more tunnel endpoints (TEPs) that encapsulate and decapsulate layer 2 frames that are originating from or destined to NSX-T logical networks.
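
    To make the encapsulation a bit more concrete, here is a minimal Python sketch that decodes the fixed 8-byte Geneve header as laid out in RFC 8926. The function name parse_geneve_header and the sample bytes (including the VNI value) are made up for illustration and do not come from my lab capture.

```python
import struct

GENEVE_UDP_PORT = 6081  # IANA-assigned UDP destination port for Geneve


def parse_geneve_header(data: bytes) -> dict:
    """Decode the 8-byte fixed Geneve header found at the start of the UDP payload."""
    if len(data) < 8:
        raise ValueError("the fixed Geneve header is 8 bytes long")
    first, second, protocol_type, vni_and_reserved = struct.unpack("!BBHI", data[:8])
    return {
        "version": first >> 6,                  # top 2 bits
        "options_length": (first & 0x3F) * 4,   # stored in 4-byte words, returned in bytes
        "oam": bool(second & 0x80),             # O bit
        "critical": bool(second & 0x40),        # C bit
        "protocol_type": protocol_type,         # 0x6558 means an Ethernet frame follows
        "vni": vni_and_reserved >> 8,           # 24-bit virtual network identifier
    }


# Hypothetical header: version 0, no options, Ethernet payload, VNI 73728.
sample = bytes([0x00, 0x00, 0x65, 0x58, 0x01, 0x20, 0x00, 0x00])
header = parse_geneve_header(sample)
print(header)
```

    The encapsulated layer 2 frame starts right after this header plus any options, which is why a decoder can still get to the inner packet.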

    This encapsulation/decapsulation process is running close to the physical NICs. Some server grade NICs even have capabilities to offload the TEPs.
    See the VMware Compatibility Guide for supported NICs that offer Geneve offload capabilities:

    So that’s all great and cool, but now the ICMP packets went down a Geneve tunnel and we lost visibility!

    Nope.

    Wireshark to the rescue

    The good thing with Geneve is that it’s open to third party tools, like Wireshark, that can decode the encapsulated packets. Let’s have a look at that.

    First we need to capture vmnic2’s outbound traffic once more, but this time we’re saving the output to a pcap formatted file:

    nsxcli -c start capture interface vmnic2 direction output file vmnic2_out.pcap

    After 30 seconds or so we terminate the capture. The pcap file can be found under /tmp on the ESXi host. Using a tool like WinSCP (Windows) or Cyberduck (macOS) or just native scp (Linux/macOS) we can easily copy the file to a machine with Wireshark installed.
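
    Before opening the file in Wireshark, a quick sanity check on the copied file can be done with a few lines of Python. This is a rough sketch that assumes the capture was written in the classic pcap format (not pcapng), which I have not verified for every NSX-T version. The function name count_pcap_packets is made up for this example.

```python
import struct


def count_pcap_packets(data: bytes) -> int:
    """Count the packet records in a classic pcap file held in memory."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # little-endian file
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file")
    offset = 24  # the global header is 24 bytes
    count = 0
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec, captured length, original length.
        _, _, captured_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        if offset + 16 + captured_len > len(data):
            break  # truncated last record
        offset += 16 + captured_len
        count += 1
    return count


# Build a tiny in-memory pcap with two 4-byte records to try the function.
global_header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
record = struct.pack("<IIII", 0, 0, 4, 4) + b"abcd"
print(count_pcap_packets(global_header + record + record))
```

    For a real file, the bytes would come from something like open("vmnic2_out.pcap", "rb").read().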

    Let’s open this file in Wireshark and have a look:

    Yes, a bunch of packets, but we want to find the ICMP packets. Could it be as easy as typing “ip.dst == 172.16.1.53” as the display filter?

    Yes, it is that easy! Wireshark decoded the Geneve packets and we can query on the attributes of the encapsulated packets.

    A closer look at one of these captured frames:

    Wireshark makes it quite easy to understand the Geneve frame’s structure. Below the Geneve header we have the entire encapsulated ICMP packet originating from the NSX-T logical network (segment “App”). Pretty nice!

    We could also use the tcpdump-uw command on the ESXi host to look at the contents of a pcap file, but I prefer to use Wireshark with its excellent Geneve decoder for a more user friendly experience.

    Conclusion

    For a moment I thought we had lost visibility into the NSX-T data path. It turned out that the ICMP packets were being encapsulated. The encapsulation does mean some additional steps for detailed visibility, but all in all it’s a pretty quick process.

    In the next part we’ll continue the journey down the NSX-T data path as traffic arrives at the destination ESXi host. Hopefully I’ll find out why the virtual machine isn’t receiving ICMP replies in the process.

  • Having good insight into the different components of a network communication path is key when managing networks. This goes for physical networks and for software defined networks.

    Today I’m having a closer look at the NSX-T data path and more specifically how to capture network traffic at different reference points in the NSX-T data path.

    The environment

    A semi logical overview of my lab environment:

    A simple setup. Two ESXi hosts participating in logical NSX-T networking. Two segments each with a virtual machine connected. The two segments are linked to the same distributed logical router which is a Tier-1 gateway in my environment.

    The reference points

    The beauty of the NSX-T data path is that you can tap into it from many different points. Thanks to this it’s quite easy to get a holistic picture of a traffic flow. This is invaluable in just about any network troubleshooting scenario.

    Below a simplified diagram of the reference points in the part of the NSX-T data path that we’ll have a closer look at in this article:

    This might look a bit intimidating at first, but it is not too bad actually. All these reference points can be looked into from one and the same ESXi host and by using the same command. Let’s have a look.

    Reference point 1 – Pre DVfilter

    Perhaps we want to capture a virtual machine’s network traffic before it’s being processed by NSX distributed firewall rules.

    For this we first need to find the name of the filter in vNIC slot 2. Slot 2 is where the applied NSX DFW rules live.

    On the ESXi host where the virtual machine of interest is running, type the following command:

    summarize-dvfilter | less -p virtual_machine_name 

    For example:

    summarize-dvfilter | less -p app01

    Now that I know the filter name (nic-2103285-eth0-vmware-sfw.2) I can start capturing packets before they enter the filter.

    For example:

    nsxcli -c start capture dvfilter nic-2103285-eth0-vmware-sfw.2 stage pre expression dstip 172.16.1.53

    Here the “stage pre” parameter instructs the “nsxcli start capture” command to capture packets before they enter the filter. The output of the command looks like this in my case:

    The output shows that virtual machine app01 (172.16.2.50) is trying to ping virtual machine web01 (172.16.1.53). I’m only seeing echo requests though and no echo replies. Something is not working. Time to look on the other side of the filter!

    Reference point 2 – Post DVfilter

    So, I run the same command but change “stage pre” to “stage post” to capture packets after they’ve gone through the filter:

    nsxcli -c start capture dvfilter nic-2103285-eth0-vmware-sfw.2 stage post expression dstip 172.16.1.53

    After waiting 20 seconds or so I still don’t see any echo requests coming through. In other words the filter in slot 2 (DFW) is dropping the ICMP echo requests. No wonder that I don’t see any echo replies!

    I quickly create a DFW rule that allows outbound ICMP to 172.16.1.53. All good now, right?

    Wrong! It wasn’t that easy. It seldom is.

    Alright, but let’s look at traffic after the filter once more:

    nsxcli -c start capture dvfilter nic-2103285-eth0-vmware-sfw.2 stage post expression dstip 172.16.1.53

    This time the echo requests are coming through at least. But no, still no echo replies. The problem must be further down the data path.
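
    Since the pre and post captures differ only in the stage parameter, the command line is easy to generate. Below is a small Python sketch that builds the same nsxcli command strings used in this article. The helper name dvfilter_capture_command is made up, and the sketch only assembles the text of the command, it does not run anything on the ESXi host.

```python
def dvfilter_capture_command(filter_name: str, stage: str, expression: str) -> str:
    """Build the nsxcli dvfilter capture command used in this article."""
    if stage not in ("pre", "post"):
        raise ValueError("stage must be 'pre' or 'post'")
    return (
        f"nsxcli -c start capture dvfilter {filter_name} "
        f"stage {stage} expression {expression}"
    )


# Filter name as found with summarize-dvfilter for app01 in my lab.
APP01_FILTER = "nic-2103285-eth0-vmware-sfw.2"

for stage in ("pre", "post"):
    print(dvfilter_capture_command(APP01_FILTER, stage, "dstip 172.16.1.53"))
```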

    Reference point 3 – vdrPort output

    The next component down the NSX-T data path that we can tap into is the vdrPort. The vdrPort is the component that funnels traffic to and from the distributed logical router on an ESXi host:

    So let’s continue to investigate the “mystery” with my ICMP traffic. To capture traffic as it enters the vdrPort I run the following command:

    nsxcli -c start capture interface vdrPort direction output expression dstip 172.16.1.53

    The “direction output” parameter might seem a bit counter intuitive here, but understand that the flow direction is with regard to the underlying N-VDS and not the T1 downlink interface (LIF). A quick diagram hopefully explains it:

    The capture output clearly shows that the ICMP packets arrive at the vdrPort so that seems fine:

    Reference point 4 – vdrPort input

    How about after the DLR kernel module has spent some quality time with the packets?

    It’s as simple as changing “output” to “input” in the nsxcli command:

    nsxcli -c start capture interface vdrPort direction input expression dstip 172.16.1.53

    The ICMP packets got routed to the Segment Web. If you look closely you’ll note that the source MAC address has changed. So everything is working as expected here.
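
    Why the source MAC address changes can be illustrated with a tiny model of a routing hop. The Python sketch below shows what a conventional router does to a frame: it rewrites the source and destination MAC addresses and decrements the TTL, while the IP addresses stay the same. I'm assuming here that the distributed router behaves the same way for the destination MAC and TTL. All MAC addresses and the names Frame and route are made up for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    ttl: int


def route(frame: Frame, router_mac: str, next_hop_mac: str) -> Frame:
    """Return the frame as it looks after one routed hop."""
    return replace(
        frame,
        src_mac=router_mac,    # the router becomes the layer 2 source
        dst_mac=next_hop_mac,  # the destination VM becomes the layer 2 target
        ttl=frame.ttl - 1,     # each routed hop decrements the TTL
    )


# Hypothetical MAC addresses, the IP addresses are the ones from this lab.
before = Frame("00:50:56:aa:aa:01", "00:50:56:ff:ff:01", "172.16.2.50", "172.16.1.53", 64)
after = route(before, router_mac="00:50:56:ff:ff:01", next_hop_mac="00:50:56:bb:bb:01")
print(before)
print(after)
```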

    Summary

    In this part we captured traffic at four different reference points on the NSX-T data path. We followed virtual machine traffic from when it just left the guest OS all the way to where it got routed by the Tier-1 Gateway onto the next NSX-T segment.

    In the next part we will continue our journey on the NSX-T data path as the packets hit the physical network. Stay tuned!