Nested vSphere 7 And NSX-T 3.0 Deployed With Ansible

With VMware releasing new major versions of vSphere and NSX-T last week, it’s high season for nested lab deployments. My Norwegian Proact colleague Rudi Martinsen just published a great two-part series on how to deploy a nested lab using vRealize Automation. My Dutch buddy at VMware, Iwan Hoogendoorn, is doing something very exciting with Terraform. And William Lam, who has been building nested labs since the day he was born I believe, has been busy with something too.

There are many tools available to automate deployment of a nested lab. Which is your favorite one?

Ansible Playbooks

About a week ago I stumbled upon Yasen Simeonov’s GitHub repository. It contains a collection of Ansible Playbooks that automate the deployment of a nested vSphere environment. After trying it out a couple of times, I decided to adopt Yasen’s somewhat neglected pet and give it some new love and attention (with Yasen’s blessing, of course).

GitHub repository

Today I’m presenting my own GitHub repository, vsphere-nsxt-lab-deploy, which is largely based on Yasen’s but with a couple of updates and additions.

First of all, I updated the code so that it can deploy the brand new vSphere 7. Then I took things one step further and added Playbooks for a complete NSX-T 2.5/3.0 deployment leveraging VMware’s new Ansible NSX-T 3.0 modules.

Runbook

I won’t go through the deployment process in detail. The repository’s single source of truth, README.md, is hopefully informative enough. Right now the runbook looks like this:

  1. Create a vSwitch and port groups on the physical ESXi host.
  2. Deploy and configure a vCenter Server Appliance (via automated CLI install).
  3. Deploy 5 ESXi virtual machines (via ISO install and KS.cfg).
  4. Configure the nested vSphere environment:
    1. Configure the ESXi hosts.
    2. Create and configure a VDS.
    3. Create Compute and Edge vSphere cluster and add ESXi hosts.
  5. Deploy NSX-T:
    1. Deploy NSX Manager.
    2. Register vCenter as a Compute Manager in NSX Manager.
    3. Create NSX-T Transport Zones (VLAN, Overlay, Edge).
    4. Create IP pool (TEP pool).
    5. Create Uplink Profiles.
    6. Create NSX-T Transport Node Profile.
    7. Deploy two NSX-T Edge Transport Nodes.
    8. Create and configure NSX-T Edge Cluster.
    9. Attach NSX-T Transport Node Profile to the “Compute” vSphere cluster (this effectively installs the NSX-T bits and configuration on the ESXi hosts in that cluster).
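For reference, the whole runbook is driven from the Ansible control node. Based on the commands mentioned in the comments below (check README.md for the authoritative instructions), a typical run looks like this:

```shell
# Full deployment (vSphere + NSX-T) from the repository root
ansible-playbook deploy.yml

# Re-run a single stage with extra verbosity, e.g. just the nested ESXi hosts
ansible-playbook playbooks/deployNestedESXi.yml -vvv
```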

The deployment time is around 1.5 hours on my hardware. Without NSX-T it takes about 45 minutes.

The deployment is easy to modify. Change the settings in answerfile.yml to fit your needs and edit deploy.yml to control which components are being deployed. For example, if you’re not interested in deploying NSX-T you can simply comment out those Playbooks in deploy.yml.
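As a sketch of that idea, deploy.yml is essentially a list of imported Playbooks, and skipping NSX-T is a matter of commenting out the relevant lines. The exact play names below are assumptions, not necessarily the names used in the repository:

```yaml
# deploy.yml (sketch) -- the vSphere part runs as usual
- import_playbook: playbooks/deployVC.yml
- import_playbook: playbooks/deployNestedESXi.yml
- import_playbook: playbooks/configureNestedESXi.yml

# NSX-T part commented out for a vSphere-only lab
# (Playbook names below are hypothetical)
# - import_playbook: playbooks/deployNsxManager.yml
# - import_playbook: playbooks/configureNsx.yml
```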

Work in progress

The repository and its code are a work in progress, and changes are committed on a regular basis. Although I’m pretty happy with what it currently does, there’s certainly room for improvement. I’m thinking about adding some optional Playbooks that set up NSX-T logical networking constructs like Tier-0/Tier-1 Gateways, segments, and so on. I’ll keep you posted via the README.md and social media.

Summary

Feel free to use the repository as it is or let it inspire you and create something better. Just don’t forget to thank Yasen who laid the groundwork here.

43 comments

    1. It’s hard to say as it depends on your deployment, for example the number of nested ESXi VMs and their configuration. Everything can be customized in answerfile.yml. I’m running the default deployment on a 2-CPU/256GB RAM physical server without any problems.

      1. Hi,

        Great blog!

        What CPUs do you have in the server where you’re labbing this? I thought I had it all good and ready…. I have got 2 x HP DL380 G8s with dual 8-core CPUs, 192GB RAM in each, 10G networking between them, and almost 2TB of NVMe on PCIe cards in each, and am really struggling with the reliability of the manager VMs.

        Any insight much appreciated.

  1. For nested environments you need promiscuous mode and forged transmits enabled. Do these Ansible Playbooks take care of that during deployment?

  2. Hi,

    This looks great. I am in the process of testing this in my home lab.
    I have a question:
    How does it resolve the nested ESXi and vCenter hostnames? Do I need to deploy a domain controller?
    If yes, on which physical ESXi port group do I need this DC01?

    1. Hi
      Good question.
      A domain controller is not required but certainly nice to have. DNS is needed as the ESXi hosts are added by DNS name to vCenter. You could change this behavior in configureNestedESXi.yml if you like so that they are added by IP address.

      My DNS server is on the physical network and queries from the nested environment are routed to this DNS server. This of course involves setting host routes on the DNS server pointing back to the nested environment’s management subnet.

      I’m looking at ways to include name resolution in the deployment so that we get rid of this dependency.
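As an illustration of the host route mentioned above, on a Linux-based DNS server it could look something like this (both addresses are examples; use your own nested management subnet and the VyOS router’s address on the physical network):

```shell
# Send replies destined for the nested management subnet (example: 172.16.1.0/24)
# via the VyOS router's public interface on the physical network (example IP)
ip route add 172.16.1.0/24 via 192.168.1.1
```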

      1. Thanks for the response.

        If I deploy DC01 and NFS on the nested VLAN 1611 network and change the DNS IP in answerfile.yml, would that work?

  3. Hi Rutger,

    Sorry to bug you again. I am stuck on an issue after running ansible-playbook deploy.yml.
    It only deploys the router and vCenter, then gets stuck and doesn’t deploy the ESXi hosts. I am deploying ESXi 6.7. I have checked that vCenter is successfully deployed and working, and I can log in.

    Is there a log I can check?

    1. No problem.

      The ESXi VMs are also deployed on your physical ESXi host, so vCenter is not used for this part of the deployment.

      Is it just stuck or do you receive any message? One option is to run the ansible-playbook command with the -vvv flag for more verbosity. You could do something like "ansible-playbook playbooks/deployNestedESXi.yml -vvv" to just run that part of the deployment with more output.

  4. Hi rutgerblom,

    I am getting this message when running deployNestedESXi.yml with the verbose switch:

    },
    "results_file": "/root/.ansible_async/485041661882.10013",
    "started": 1
    },
    "msg": "Unable to find host \"192.168.0.21\""
    }

    PLAY RECAP *********************************************************************
    127.0.0.1 : ok=3 changed=1 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0

    1. Hi Lalit,
      Did you resolve this problem? "msg": "Unable to find host \"192.168.0.21\""
      What mask is your physical network? Different than /24?
      BR

  5. When I run deploy.yml, it gets stuck here for a long time.

    TASK [Perform vCenter CLI-based installation] **********************************
    task path: /root/vsphere-nsxt-lab-deploy/playbooks/deployVC.yml:32
    Wednesday 29 April 2020 11:20:42 +0000 (0:00:00.165) 0:03:06.096 *******
    ESTABLISH LOCAL CONNECTION FOR USER: root
    EXEC /bin/sh -c 'echo ~root && sleep 0'
    EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir /root/.ansible/tmp/ansible-tmp-1588159243.1004767-4152-106655190239341 && echo ansible-tmp-1588159243.1004767-4152-106655190239341="` echo /root/.ansible/tmp/ansible-tmp-1588159243.1004767-4152-106655190239341 `" ) && sleep 0'
    Using module file /usr/local/lib/python3.6/dist-packages/ansible/modules/commands/command.py
    PUT /root/.ansible/tmp/ansible-local-356458vwl3fy/tmpnop2q_ty TO /root/.ansible/tmp/ansible-tmp-1588159243.1004767-4152-106655190239341/AnsiballZ_command.py
    EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1588159243.1004767-4152-106655190239341/ /root/.ansible/tmp/ansible-tmp-1588159243.1004767-4152-106655190239341/AnsiballZ_command.py && sleep 0'
    EXEC /bin/sh -c '/usr/bin/python3 /root/.ansible/tmp/ansible-tmp-1588159243.1004767-4152-106655190239341/AnsiballZ_command.py && sleep 0'

  6. Hello Rutger,
    Several steps further….
    I have a similar problem to Lalit. During the nested ESXi installation my physical host is "missing":
    "msg": "Unable to find host \"192.168.1.50\""
    }
    127.0.0.1 : ok=20 changed=10 unreachable=0 failed=1 skipped=14 rescued=0 ignored=0
    Do you have any additional hints for this? 🙂
    Thank you in advance.
    BR
    Darek

  7. Yes, and rather before the start.
    It always stops in this same place:
    Friday 01 May 2020 16:57:58 +0200 (0:00:01.887) 0:25:39.626 ************
    ===============================================================================
    Perform vCenter CLI-based installation ——————————- 1380.31s
    Copy ISO contents —————————————————— 69.28s
    Wait 30 seconds before we start checking whether the ESXi hosts are ready — 30.06s (extended for test )
    Upload the ESXi ISO to the datastore ———————————– 26.84s
    Wait 5 seconds for the port groups to become available —————— 5.04s
    Check if VCSA is already installed ————————————– 3.79s
    Deploy ESXi VMs ——————————————————— 3.38s
    Create Clusters ——————————————————— 2.46s
    Create custom ESXi ISO ————————————————– 2.29s
    Create a management port group for the lab environment —————— 1.89s
    Result check for deployment ——————————————— 1.89s
    Create trunk port group for the lab environment ————————- 1.48s
    Create Datacenter ——————————————————- 1.34s
    Create a VMware vSwitch on the ESXi host for the lab environment ——– 1.21s
    Create JSON template file for VCSA with embeded PSC ——————— 1.11s
    Check if the VyOS router is already depoyed —————————– 1.04s
    Unmount vCenter ISO —————————————————– 0.77s
    Mount vCenter ISO ——————————————————- 0.71s
    Delete the temporary template file for VyOS router ———————- 0.64s
    Edit boot.cfg ———————————————————– 0.64s

  8. The host is declared with its IP only. Ping from the Ansible VM to the physical host during installation: 100%.
    One more remark: VLAN 301 for VyOS is not created if it is missing. It stopped me for more than an hour 🙂

    1. Darek, that VLAN ID should correspond to a VLAN ID in your physical network environment. It’s the public interface of the VyOS router and should actually be on the same network as your Ansible control node. The same goes for "router_public_ip".
      Change these so they match your environment.
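In answerfile.yml terms that means something like the following fragment ("router_public_ip" is referenced above; the VLAN variable name and both values are examples, so match them to your own environment):

```yaml
# answerfile.yml (fragment, example values)
router_public_vlan: "301"            # must exist in your physical network
router_public_ip: "192.168.1.2/24"   # same network as the Ansible control node
```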

  9. Rutger, thank you for your reply. I have checked several times; my basic configuration seems to be OK. Today I ran the updated script. I stopped at:
    TASK [Deploy ESXi VMs]
    changed: [localhost] => (item={'key': 'esxi01', 'value': {'ip': '172.16.1.11', 'mask': '255.255.255.0', 'gw': '172.16.1.1', 'fqdn': 'esxi01.lab.local', 'vmname': 'nested-esxi01', 'cluster': 'Compute', 'vlan': '1611', 'vmotion_ip': '172.16.12.11', 'vmotion_mask': '255.255.255.0', 'vsan_ip': '172.16.13.11', 'vsan_mask': '255.255.255.0', 'username': 'root', 'password': 'VMware1!', 'cpu': '8', 'ram': '65536', 'boot_disk_size': '8', 'vsan_cache_size': '90', 'vsan_capacity_size': '180'}})
    TASK [Wait 3 seconds before we start checking whether the ESXi hosts are ready]
    TASK [Result check for deployment]
    failed: [localhost] (item={'started': 1, 'finished': 0, 'ansible_job_id………………results_file': '/root/.ansible_async/322638819862.21606', 'started': 1}, 'msg': 'Unable to find host \"192.168.1.50\"'}

    At this moment I have full communication (ping in both directions) between the "external" network (192.168.1.0/16, VyOS, Ansible VM) and the nested VMs (172.16.1.0/24: DC, vCenter). vCenter has no networks or datastores defined at this stage of the installation. Is that correct?

    Any advices are very welcomed 🙂

    Thank you in advance.

  10. Rutger,
    Unfortunately no.
    After the TASKS [Create Clusters], Enable DRS, Mount ESXi ISO, Copy ISO contents, Mount ESXi ISO, Unmount ESXi ISO, Edit boot.cfg, insert customks.tgz in boot.cfg modules section, copy customks.tgz, Create custom ESXi ISO, Upload the ESXi ISO to the datastore, Delete temporary directory,
    and finally….

    Deploy ESXi VMs (output from 5 VMs with customized parameters, AD is working) – changed: [localhost] => (item={'key': 'esxi01', 'value': {'ip': '172.16.1.11', 'mask': '255.255.255.0', 'gw': '172.16.1.1', 'fqdn': 'esxi01.lab.local', 'vmname': 'nested-esxi01', 'cluster': 'Compute', 'vlan': '1611', 'vmotion_ip': '172.16.12.11', 'vmotion_mask': '255.255.255.0', 'vsan_ip': '172.16.13.11', 'vsan_mask': '255.255.255.0', 'username': 'root', 'password': 'VMware1!', 'cpu': '8', 'ram': '65536', 'boot_disk_size': '8', 'vsan_cache_size': '90', 'vsan_capacity_size': '180'}})

    then
    TASK [Wait 3 seconds before we start checking whether the ESXi hosts are ready]
    TASK [Result check for deployment]
    failed: [localhost] (item={'started': 1, 'finished': 0, 'ansible_job_id': '768402499946.6246', 'results_file': '/root………
    "msg": "Unable to find host \"192.168.1.50\"" – for all 5 hosts.
    I have performed the script with the IPs included in the script – the same issue.

    All VMs are reachable (ping in both directions) between the "external" network (192.168.1.0/16, VyOS, Ansible VM) and the nested VMs (172.16.1.0/24: DC, vCenter)

    Thank you.

    1. Darek,

      We’ve been debugging this and it turns out that {{ PhysicalESX.host }} MUST resolve to a DNS name. You cannot use an IP address here.

      Can you verify that you are using a FQDN as the value for {{ PhysicalESX.host }} in answerfile.yml?
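For clarity, a minimal answerfile.yml fragment matching that requirement could look like this (the surrounding structure is inferred from the {{ PhysicalESX.host }} variable; the hostname is an example):

```yaml
# answerfile.yml (fragment)
PhysicalESX:
  host: "physical-esxi.lab.local"   # must be an FQDN that resolves in DNS; an IP address will not work
```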

      Thank you

  11. Hello Rutger,

    I’m trying to implement version 1.2.8: vCenter 7 + NSX-T 3.

    After the edge node implementation (up and working on vSAN) I’m stuck here….

    ….{'httpStatus': 'BAD_REQUEST', 'error_code': 9543, 'module_name': 'NsxSwitching service', 'error_message': 'Cannot create HostSwitch n-vds01 without HostSwitchMode and TransportZoneEndpoints.

    I noticed that esxi-tnp is configured with vds, not n-vds01.

    Any advices are very welcomed 🙂

    Thank you in advance.

  12. Hello Rutger,
    Since this script has not been fully tested by other users, I can add another remark.
    Claiming disks for vSAN storage doesn’t work correctly for both clusters, at least for me. I have to claim the (unused SSD) disks for vSAN manually and restart the script from the edge node implementation task.
    Of course I can’t exclude a hardware issue on my side.

    Best regards

    1. I can’t reproduce that issue in my environment. The issue regarding the edge n-vds is fixed now. I just pushed the fix to master tagged 1.2.83. Please pull/clone and try again.

      Cheers!

  13. Did anyone manage to optimize resources and deploy a minimal setup to explore and play with NSX-T 3.0 using the provided Ansible Playbooks? Basically, is it possible to install it on a single host that has only 32 or 64GB of memory? I currently run NSX-T 2.4 on a single ESXi host that has 32GB of RAM. I followed these instructions:

    https://www.virten.net/2016/05/deploy-vmware-nsx-in-homelabs-with-limited-resources/

    https://blogs.vmware.com/services-education-insights/2019/02/why-everyone-needs-an-nsx-nested-lab-sandbox.html

    If the Playbooks support a minimal setup, which files need to be modified? I guess that in addition to each ESXi host’s resources, the Playbooks for the NSX-T components should be modified as well.

  14. I have a double-nested environment: a Dell T5600 with 24 processors, 128GB of memory, and 5 terabytes of SSD storage, using FreeNAS to create 3 x 1TB datastores.
    On this I have Windows 7 and then VMware Workstation on top of that. I’m using the internal virtual network adapter for my networking and isolating my vDS and DPGs (static vmk IPs).
    On VMware Workstation I have 3 ESXi 6.7 host VMs and a FreeNAS VM.
    The ESXi VMs are Compute, Network, and Edge.
    On the Compute host (192.168.101.101) I have a VCSA 6.7 appliance (192.168.101.125).
    On the Network host (192.168.101.100) I have NSX Manager (192.168.101.126) and NSX Edge-INT (192.168.101.127).
    On the Edge host (192.168.101.100) I have Edge-EXT (192.168.101.127).
    My question is: how can I use your process to create your environment? (I can of course change the VMware Workstation VMnet adapters to reflect your IP scheme.) I have the VMware ISOs for 6.7/7.0 and will modify your scripts to use just the 3 hosts (not the 5 of your setup).

    1. The short answer is you can’t. This script comes with its own specific requirements and won’t work in an environment like yours without making substantial changes to the code.

  15. Hello, great work!
    I tested the prepareISO Playbook. It ends successfully, but when I test the ISO, the ESXi setup does not start when I select UEFI boot mode.
    A normal boot works successfully and ESXi installs without problems.

  16. Pingback: SDDC.Lab v2 |
