Building the Foundation for a VCF Automation All Apps Landing Zone with Terraform

In a previous post I looked at guardrails in VCF Automation 9.1. This post is related, but the angle is different.

Instead of describing the boundaries conceptually, I want to look at how the first layers of a VCF Automation All Apps landing zone can start to take shape using Terraform.

I say first layers deliberately. I am not trying to claim that the entire All Apps consumption model can be represented as one clean Terraform module today. Some parts of the model may still be better handled through the UI, API, Kubernetes provider or other automation.

The foundation I am looking at here includes the organization, identity provider configuration, initial access, regional quota, organization networking, regional networking and content sources. Together, these are the pieces that start to give a tenant a boundary, capacity, network attachment and something useful to consume.

The rest of the landing zone builds on top of that.

What are we building?

The goal is not to create the most complete production-grade implementation. The goal is to show the sequence of objects that turn an empty All Apps organization into something that starts to look like a consumable landing zone.

At a high level, the flow looks like this:

Create the organization
→ Add an identity provider
→ Add initial organization access
→ Assign regional quota
→ Enable organization networking
→ Configure regional networking
→ Add optional shared subnet
→ Create content library
→ Create a namespace

This is not meant to describe the full platform build order. In a real environment, many provider-side constructs already exist before a tenant landing zone is created. The sequence here is simply the order I use to explain the tenant-facing foundation: start with the organization, attach identity and access, assign regional capacity, connect the organization to existing network foundation, and then move toward content and namespace consumption.

I have put the lab files used for this article in a GitHub repository. The examples are not meant to be a production module, but they show the exact resources I used while testing the landing zone flow.

Provider configuration

The first part is the Terraform provider configuration.

In my lab examples I normally use simple variable-based provider configuration. This keeps credentials and endpoints outside the actual resource definitions.

versions.tf
terraform {
  required_providers {
    vcfa = {
      source  = "vmware/vcfa"
      version = "~> 1.1.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.38"
    }
  }
}

I use two vcfa provider configurations in the lab. The default provider uses the Provider Management context, mapped to the System organization. This is used for provider-side resources such as organizations, regional quota and organization networking.

provider.tf
provider "vcfa" {
  url                  = var.vcfa_url
  org                  = "System"
  auth_type            = "integrated"
  user                 = var.vcfa_user
  password             = var.vcfa_password
  allow_unverified_ssl = var.allow_unverified_ssl
}

The second provider configuration uses the tenant organization context. I use this later when the example moves into the tenant consumption path, specifically when retrieving the kubeconfig used for Supervisor Namespace creation.

provider.tf
provider "vcfa" {
  alias                = "tenant_blue"
  url                  = var.vcfa_url
  org                  = var.org_name
  auth_type            = "integrated"
  user                 = "tenant-admin"
  password             = var.tenant_admin_password
  allow_unverified_ssl = var.allow_unverified_ssl
}
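
Both provider blocks rely on input variables that are not shown in the article. A minimal sketch of the matching declarations could look like this. The variable names are taken from the `var.*` references above; the types, descriptions and the `false` default are my assumptions, not copied from the lab repository.

```hcl
# variables.tf (sketch: names match the var.* references above,
# types, descriptions and defaults are assumptions)
variable "vcfa_url" {
  type        = string
  description = "Base URL of the VCF Automation endpoint"
}

variable "vcfa_user" {
  type        = string
  description = "Provider administrator used for the System organization context"
}

variable "vcfa_password" {
  type        = string
  description = "Password for the provider administrator"
  sensitive   = true # keeps the value out of plan and apply output
}

variable "tenant_admin_password" {
  type        = string
  description = "Password for the bootstrap tenant-admin user"
  sensitive   = true
}

variable "org_name" {
  type        = string
  description = "Name of the tenant organization"
}

variable "allow_unverified_ssl" {
  type        = bool
  description = "Skip TLS verification; only meant for lab use"
  default     = false
}
```

Marking the passwords as `sensitive` redacts them in CLI output, but they still end up in the state file, so the state backend needs to be protected as well. The values themselves can be supplied through `TF_VAR_<name>` environment variables instead of being committed in a `.tfvars` file.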

Creating the All Apps organization

The organization is the first concrete object in the landing zone foundation.

It provides the tenant boundary and gives the consumer its own organization context and portal. Other parts of the landing zone, such as identity configuration, networking scope, projects and content visibility, are then attached to this organization.

main.tf
resource "vcfa_org" "lz" {
  name         = var.org_name
  display_name = var.org_display_name
  description  = "Terraform lab organization for VCF Automation All Apps landing zone validation"
  is_enabled   = true
}

Adding an identity provider

An organization without an identity provider is not very useful, so the next step is to connect it to one.

In this example I use OIDC. The exact identity provider is not the important part. The important part is that the IdP configuration becomes part of the landing zone definition instead of something configured manually after the fact.

This assumes that the OIDC endpoint is reachable from VCF Automation and that the certificate chain presented by the identity provider is already trusted by VCF Automation.

oidc.tf
resource "vcfa_org_oidc" "lz" {
  org_id                 = vcfa_org.lz.id
  enabled                = var.oidc_enabled
  prefer_id_token        = false
  client_id              = var.oidc_client_id
  client_secret          = var.oidc_client_secret
  max_clock_skew_seconds = 60
  wellknown_endpoint     = var.oidc_wellknown_endpoint

  claims_mapping {
    email      = "email"
    subject    = "sub"
    first_name = "given_name"
    last_name  = "family_name"
    full_name  = "name"
    groups     = "groups"
  }
}

This only configures the organization’s OIDC IdP settings. It does not, by itself, give anyone access to the organization.
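
The reachability assumption mentioned before the OIDC resource can be partly tested from Terraform itself. The following is a rough sketch, assuming Terraform 1.5 or later and the `hashicorp/http` provider, neither of which is part of the lab configuration. The check name and data source label are made up for illustration. It only proves that the discovery document is reachable from the machine running Terraform, which is not the same as being reachable and trusted from VCF Automation.

```hcl
# oidc-check.tf (sketch: requires Terraform >= 1.5 and the hashicorp/http
# provider; both are assumptions on top of the lab setup)
check "oidc_discovery_reachable" {
  # Scoped data source: fetched only to evaluate this check.
  data "http" "oidc_wellknown" {
    url = var.oidc_wellknown_endpoint
  }

  assert {
    condition     = data.http.oidc_wellknown.status_code == 200
    error_message = "The OIDC well-known endpoint did not return HTTP 200 from the Terraform runner."
  }
}
```

A failing check block is reported as a warning and does not stop the apply, which seems like the right behaviour for a lab sanity check.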

Adding initial organization access

Configuring an identity provider only solves the authentication part. The organization still needs an access assignment before anyone can actually use it.

I did not find a first-class resource in the current VCFA provider for assigning IdP users or groups to organization roles. For this lab, I therefore create an initial local organization administrator as a bootstrap access path.

org-access.tf
data "vcfa_role" "org_admin" {
  org_id = vcfa_org.lz.id
  name   = "Organization Administrator"
}

resource "vcfa_org_local_user" "tenant_admin" {
  org_id   = vcfa_org.lz.id
  username = "tenant-admin"
  password = var.tenant_admin_password
  role_ids = [
    data.vcfa_role.org_admin.id
  ]
}

This is not the access model I would normally use in production. There I would prefer IdP groups and role assignments based on those groups. For the purpose of this lab, the local user simply gives the new organization an initial administrator without relying on the provider administrator account.

Assigning regional quota

Once the organization exists and identity is configured, it needs access to capacity.

This is where region quota becomes important. The region connects the organization to the underlying VCF capacity, while the quota defines how much of that capacity the organization is allowed to consume.

data.tf
data "vcfa_vcenter" "vc" {
  name = var.vcenter_name
}

data "vcfa_supervisor" "supervisor" {
  name       = var.supervisor_name
  vcenter_id = data.vcfa_vcenter.vc.id
}

data "vcfa_region" "region" {
  name = var.region_name
}

data "vcfa_region_zone" "zone" {
  region_id = data.vcfa_region.region.id
  name      = var.region_zone_name
}

data "vcfa_region_vm_class" "small" {
  region_id = data.vcfa_region.region.id
  name      = "best-effort-small"
}

data "vcfa_region_storage_policy" "vsan" {
  region_id = data.vcfa_region.region.id
  name      = var.storage_policy_name
}

main.tf
resource "vcfa_org_region_quota" "lz" {
  org_id         = vcfa_org.lz.id
  region_id      = data.vcfa_region.region.id
  supervisor_ids = [data.vcfa_supervisor.supervisor.id]

  zone_resource_allocations {
    region_zone_id         = data.vcfa_region_zone.zone.id
    cpu_limit_mhz          = 20000
    cpu_reservation_mhz    = 0
    memory_limit_mib       = 65536
    memory_reservation_mib = 0
  }

  # References must use the data source labels declared in data.tf
  region_vm_class_ids = [
    data.vcfa_region_vm_class.small.id
  ]

  region_storage_policy {
    region_storage_policy_id = data.vcfa_region_storage_policy.vsan.id
    storage_limit_mib        = 524288
  }
}

This is where the landing zone starts to become bounded.

The organization can consume the region, but only within the limits defined here. That is an important distinction. Self-service without quota is just delegated risk. Self-service with quota becomes a controlled consumption model.
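
The quota values are expressed in MiB, which is not the unit most people think in when sizing a tenant. One way to make the numbers easier to review is to state them in GiB and derive the MiB values. This is only a sketch: the local names are hypothetical and not part of the lab files.

```hcl
# quota-locals.tf (sketch: local names are hypothetical)
locals {
  quota_memory_gib  = 64
  quota_storage_gib = 512

  # 1 GiB = 1024 MiB
  quota_memory_mib  = local.quota_memory_gib * 1024  # 65536
  quota_storage_mib = local.quota_storage_gib * 1024 # 524288
}
```

The quota resource could then use `memory_limit_mib = local.quota_memory_mib` and `storage_limit_mib = local.quota_storage_mib`, which gives the same 64 GiB of memory and 512 GiB of storage as the literal values in the lab example.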

Enabling organization networking

Before regional networking can be configured through the Terraform provider, organization networking has to exist for the organization.

The VCFA provider exposes this as a separate vcfa_org_networking resource:

main.tf
resource "vcfa_org_networking" "lz" {
  org_id   = vcfa_org.lz.id
  log_name = "tnblue"
}

The log_name is a provider resource argument rather than something I would treat as a major landing zone design decision. In the VCF Automation 9.1 UI, this part of the model is mostly hidden behind the organization networking and external connection workflow. In Terraform, however, it is explicit and must exist before the regional networking resource can be configured.
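
If the log name is passed in as a variable instead of being hardcoded, a validation block can catch a bad value at plan time. The sketch below assumes an 8-character upper limit, which is my reading of the provider documentation for `log_name` and should be verified against the provider version in use. The variable name `org_log_name` is hypothetical.

```hcl
# variables.tf (sketch: variable name is hypothetical, and the
# 8-character limit is an assumption to verify in the provider docs)
variable "org_log_name" {
  type        = string
  description = "Short identifier for the organization in the network provider logs"
  default     = "tnblue"

  validation {
    condition     = length(var.org_log_name) > 0 && length(var.org_log_name) <= 8
    error_message = "The log name must be between 1 and 8 characters long."
  }
}
```

The resource would then use `log_name = var.org_log_name`.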

Configuring regional networking

After organization networking is enabled, the next logical step is to connect the organization to regional networking.

In VCF Automation 9.1, the UI model is based on external connections. I did not find a first-class Terraform resource for creating external connections in the current VCFA provider. Instead, the provider still exposes the regional networking relationship through provider_gateway_id, which appears to map to the underlying centralized connectivity object rather than the 9.1 UI terminology.

That makes this part slightly awkward. The VCF Automation UI talks in terms of external connections, while the Terraform provider still wants a provider gateway reference.

For this example, I assume that the external connection already exists in VCF Automation. In other words, I am not creating the provider network foundation itself here. I am only showing how the organization is connected to an existing provider-side network construct through the current Terraform provider model.

The Terraform configuration looks like this:

data.tf
data "vcfa_provider_gateway" "default" {
  name      = var.provider_gateway_name
  region_id = data.vcfa_region.region.id
}

data "vcfa_edge_cluster" "default" {
  name      = var.edge_cluster_name
  region_id = data.vcfa_region.region.id
}

main.tf
resource "vcfa_org_regional_networking" "lz" {
  name      = "${var.org_name}${var.region_name}"
  org_id    = vcfa_org_networking.lz.id
  region_id = data.vcfa_region.region.id

  # References must use the "default" labels declared in data.tf
  provider_gateway_id = data.vcfa_provider_gateway.default.id
  edge_cluster_id     = data.vcfa_edge_cluster.default.id
}

In my lab, the organization, organization networking and regional quota were created successfully. The regional networking resource was the first part of the sequence that did not apply cleanly.

The referenced provider gateway is backed by an Active/Active Tier-0, and Terraform failed with the following error:

Unable to create Regional Networking Setting because the Provider Gateway eu-north-1-t0 backing Tier 0 has an unsupported HA mode, ACTIVE-ACTIVE.

The same regional networking configuration could be created through the VCF Automation 9.1 UI. So the practical result was:

Terraform create: failed
VCF Automation UI create: worked

After creating the regional networking setting in the UI, I wanted to see whether Terraform could at least import and read the object. The generated regional networking setting name in my lab was:

tenant-blueeu-north-1

I was then able to import the UI-created regional networking setting into Terraform state:

terraform import vcfa_org_regional_networking.lz \
  'tenant-blue.tenant-blueeu-north-1'

That import worked, and with the resource name aligned to the UI-generated name, Terraform could read the imported object cleanly.

VCF 9.1 supports Active/Active Tier-0 for this design, and the UI can create the configuration. The Terraform provider can also import and read the object after it exists. The part that failed in my lab was creating the regional networking setting directly through Terraform when the provider gateway was backed by an Active/Active Tier-0. I opened a GitHub issue for this against the VCFA Terraform provider so the behaviour can be tracked separately.
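
The import can also be kept in the configuration instead of being run as a one-off CLI command. This is a sketch assuming Terraform 1.5 or later, where `import` blocks are available. The address after `to` has to match the label of the resource block in the configuration, so it uses the `lz` label from the regional networking resource shown earlier. The ID is the same string used in the CLI import.

```hcl
# imports.tf (sketch: requires Terraform >= 1.5)
import {
  to = vcfa_org_regional_networking.lz
  id = "tenant-blue.tenant-blueeu-north-1"
}
```

With the block in place, `terraform plan` shows the pending import alongside any other changes, and the block can be removed again once the object is in state.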

Optional: shared subnet

A shared subnet is a way to expose a VLAN-backed network to an organization. This can be useful when a landing zone needs access to an existing network segment, for example for legacy services, migration scenarios or shared infrastructure dependencies.

A shared subnet can be created through the Terraform provider, but creating the subnet is only part of the workflow.

In the VCF Automation UI, this is a two-step workflow. The provider first creates the shared subnet, and then explicitly shares or assigns it to an organization.

In my lab, the vcfa_shared_subnet resource handled the first part. It created the shared subnet successfully and the object reached REALIZED state. I did not find an organization assignment argument on the vcfa_shared_subnet resource, so I treat the assignment step as outside this Terraform example for now.

shared-subnet.tf
resource "vcfa_shared_subnet" "legacy_services" {
  name         = "legacy-services-vlan-123"
  description  = "Shared VLAN subnet for legacy service connectivity"
  region_id    = data.vcfa_region.region.id
  subnet_type  = "VLAN"
  gateway_cidr = "10.123.0.1/24"
  vlan_id      = 123
}

Because the organization assignment is not part of the Terraform resource, I treat this as a provider-side subnet object in this example, not as a fully automated tenant-consumable landing-zone step.

Adding a content library

A landing zone also needs something to consume.

The content library is one way to make approved content available to the organization. The storage class must be available to the organization through the regional quota, so I make the content library depend on the quota assignment.

content-library.tf
data "vcfa_storage_class" "content_storage" {
  region_id = data.vcfa_region.region.id
  name      = var.storage_policy_name
}

resource "vcfa_content_library" "tenant_blue" {
  org_id           = vcfa_org.lz.id
  name             = "${var.org_name}-library"
  description      = "${var.org_display_name} content library"
  auto_attach      = false
  delete_recursive = true
  storage_class_ids = [
    data.vcfa_storage_class.content_storage.id
  ]

  depends_on = [
    vcfa_org_region_quota.lz
  ]
}

This worked cleanly in my lab. At this point the organization has an associated content library, but that does not mean there is a complete service catalog yet. A content library can provide images or content sources that later become part of the consumption model, while catalog items, blueprints and published services are a separate layer.

Creating a namespace

At this point the landing zone starts to move from provider-side foundation to consumer-facing workload scope.

A namespace gives an application team a bounded place to consume Supervisor-backed capabilities. Depending on how the platform is designed, that can include Kubernetes workloads, VKS clusters, VM-based workloads, storage classes, VM classes, content sources and network connectivity.

This part uses the tenant-scoped provider configuration introduced earlier. The vcfa_kubeconfig data source retrieves the kubeconfig details, and the Kubernetes provider uses those details when creating the Supervisor Namespace.

namespace.tf
data "vcfa_kubeconfig" "org" {
  provider = vcfa.tenant_blue
}

provider "kubernetes" {
  host     = data.vcfa_kubeconfig.org.host
  token    = data.vcfa_kubeconfig.org.token
  insecure = data.vcfa_kubeconfig.org.insecure_skip_tls_verify
}

The namespace resource also assumes that the target project and VPC already exist. I did not find first-class project or VPC resources in the current Terraform provider. In my lab, I used the default project and the default regional VPC that existed in the organization.

variables.tf
variable "project_name" {
  type        = string
  description = "VCF Automation project used for the Supervisor Namespace"
  default     = "default-project"
}

variable "vpc_name" {
  type        = string
  description = "VCF Automation VPC used for the Supervisor Namespace"
  default     = null
}

locals {
  # Falls back to the default regional VPC name when vpc_name is not set
  namespace_vpc_name = coalesce(var.vpc_name, "default-${var.region_name}")
}

With those prerequisites in place, the namespace resource looks like this:

namespace.tf
resource "vcfa_supervisor_namespace" "payments_dev" {
  provider     = vcfa.tenant_blue
  name_prefix  = "payments-dev"
  project_name = var.project_name
  class_name   = "small"
  description  = "Payments development namespace"
  region_name  = data.vcfa_region.region.name
  vpc_name     = local.namespace_vpc_name

  storage_classes_class_config_overrides {
    name  = var.storage_policy_name
    limit = "100Gi"
  }

  vm_classes_class_config_overrides {
    name = "best-effort-small"
  }

  zones_class_config_overrides {
    name               = var.region_zone_name
    cpu_limit          = "4000M"
    cpu_reservation    = "0M"
    memory_limit       = "8192Mi"
    memory_reservation = "0Mi"
  }
}

In the VCF Automation UI, the namespace normally inherits its resource settings from the selected Namespace Class unless they are overridden. In my lab, the Terraform resource still required explicit storage and zone configuration values before it would plan successfully, so I include them here.
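
Because the namespace is created from `name_prefix`, the final name is generated and not known up front. A small output can surface it after apply. This sketch assumes the resource exports the generated name as a `name` attribute, which should be checked against the provider documentation. The output name is hypothetical.

```hcl
# outputs.tf (sketch: assumes the resource exports a computed "name" attribute)
output "payments_dev_namespace_name" {
  description = "Generated name of the Supervisor Namespace created from the payments-dev prefix"
  value       = vcfa_supervisor_namespace.payments_dev.name
}
```

The value is then available through `terraform output payments_dev_namespace_name`, for example to hand the namespace name to an application team or a follow-up pipeline stage.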

What this gives us

If we look at the Terraform resources together, the landing zone starts to take shape.

vcfa_org
→ tenant boundary
vcfa_org_oidc
→ identity provider configuration
vcfa_org_local_user
→ initial bootstrap access
vcfa_org_region_quota
→ capacity envelope
vcfa_org_networking
vcfa_org_regional_networking
→ organization network foundation
vcfa_shared_subnet
→ optional provider-side shared subnet object
vcfa_content_library
→ content source for later consumption
vcfa_supervisor_namespace
→ consumer-facing workload scope

That is not the entire operating model, but it is a useful foundation.

It is useful because it shows where Terraform helps and where it stops. Some parts of the landing zone foundation can be expressed quite cleanly today. Other parts still depend on existing platform objects, UI workflows, imports or other automation.

What is still outside this example?

This example is incomplete. It does not cover the full project model, VPC creation, Transit Gateway configuration, firewall policy delegation, catalog item publishing, approvals, lease policies, naming standards, DNS automation, backup integration or CMDB updates. Those are all important parts of a real platform, but they are not all part of the same Terraform workflow today.

Some of these areas may already be possible through other APIs or automation methods. Some may become better covered by the Terraform provider over time. Others probably belong in a different part of the operating model altogether. I do not think that is a problem. A landing zone is not necessarily one module, one provider or one pipeline.

The good thing here is that a significant part of the foundation can still be expressed declaratively. That makes the model more repeatable, but it also makes it easier to review. The design becomes explicit. Which organization exists? Which region is assigned? What is the quota? Which network foundation is the organization connected to? Which content sources are available? Which namespace configuration is used?

These are the same things I would want to be clear about in the design anyway. Terraform just makes the answers explicit.

Final thought

I still like the landing zone term for VCF Automation All Apps, as long as we do not treat it as a single product feature. To me, a landing zone is the assembled consumption environment: organizations, identity providers, access, quota, networking, projects, VPCs, namespaces, content sources and policies working together.

Testing this with Terraform made the current boundary pretty clear. The provider can describe parts of the foundation quite well, but it is not yet a complete representation of the VCF Automation 9.1 All Apps model. Some constructs are missing, some still use older terminology, and some workflows still need the UI, import, API calls or another automation path.

That is fine, as long as we are honest about it. Terraform can still help make parts of the landing zone repeatable, versioned and testable. It is not the whole answer today, but it is part of the answer.
