Everyone knows how crucial it is to stay on top of the rapidly changing and expanding AI and hybrid-cloud security landscapes. But to do so, organizations must address shortcomings in their traditional provisioning processes, including:
- Slow, error-prone, manual workflows and ticketing systems
- A lack of built-in security controls or secure templates for infrastructure code reuse
- Inconsistent or non-existent policy enforcement processes
- No system to detect non-compliant or drifted infrastructure
- Insufficient auditing and observability
- Ad-hoc infrastructure decommissioning
To successfully tackle these issues, teams must first think about how infrastructure and applications interact with three core audiences with very different priorities:
- Developers and application teams (consumers) who consume the infrastructure to deploy and manage their applications. Their priority is to work with infrastructure in an easily consumable way that makes the deployment process easier.
- Operators or platform teams (providers) who provide the infrastructure in a self-service way for their end developers. These teams solve problems such as making the provisioning process repeatable, building policy guardrails, and removing blockers for downstream developers.
- Security and compliance teams (approvers) who are responsible for ensuring that all infrastructure deployed meets the security requirements of the organization.
As organizations progress in their cloud journey, it can quickly become challenging to maintain a balance between the needs and wants of these three groups of stakeholders. How can teams preserve productivity for developers while ensuring best practices set by security, compliance, and finance are met across the entire infrastructure estate?
The answer is infrastructure as code (IaC). Tools like HashiCorp Terraform codify infrastructure to make it versionable, scannable, and reusable, ensuring security and compliance are always at the forefront of the provisioning process. This post offers six fundamental practices for your Terraform workflow that can help ensure secure infrastructure from the first provision to months and years in the future.
1. Bridging the provisioning skills gap
Organizations consistently rank skills gaps as the most common barrier to multi-cloud adoption. Platform teams must design workflows with the needs of junior developers in mind. Not only does this help devs get up to speed sooner, but it also protects the organization's infrastructure from security and compliance issues caused by inexperience or a lack of standard processes.
Simplified workflows
The first way to leverage Terraform is to build your infrastructure using HashiCorp Configuration Language (HCL). The gentle learning curve for HCL is a big reason for Terraform's popularity in the IaC world. Its simple syntax lets you describe the desired state of infrastructure resources using a declarative approach, defining an intended end-state rather than the individual steps to reach that goal.
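To illustrate the declarative style, here is a minimal HCL sketch; the provider, resource, and names are illustrative rather than taken from any particular environment:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Describe the desired end state; Terraform works out the steps to reach it.
resource "aws_s3_bucket" "app_data" {
  bucket = "example-app-data-bucket"

  tags = {
    environment = "dev"
    owner       = "platform-team"
  }
}
```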
Terraform provides unified provisioning for multi-cloud, consolidating many workflows into a single golden provisioning workflow for any type of infrastructure. This allows operators and developers to limit their focus to one segment of the workflow and reduces misconfigurations caused by a lack of expertise. For instance, once an operator builds, validates, and approves a module, a developer can use Terraform's no-code provisioning to provision infrastructure from that module without writing a single line of HCL.
While Terraform provides a single workflow for all infrastructure, we understand that not all infrastructure resources are provisioned through Terraform today. Config-driven import provides an automated and secure way to plan multiple imports with existing workflows (VCS, UI, and CLI) within HCP Terraform. Developers can import during the Terraform plan and apply stages without needing to access the state file or credentials, enabling a self-serve workflow while keeping resources secure.
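As a hedged sketch of config-driven import (available in Terraform 1.5 and later), the block below plans the adoption of an existing bucket into configuration; the resource address and bucket name are hypothetical:

```hcl
# Adopt an existing bucket into Terraform management during plan/apply,
# without hand-editing state or exposing credentials to developers.
import {
  to = aws_s3_bucket.app_data
  id = "example-app-data-bucket" # hypothetical existing bucket name
}

resource "aws_s3_bucket" "app_data" {
  bucket = "example-app-data-bucket"
}
```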
Terraform can also connect directly to a version control system (VCS) to add additional features and improve workflows. For example, HCP Terraform can automatically initiate runs when changes are committed to a specific branch or simplify code review by predicting how pull requests will affect infrastructure. Terraform runs can also be directly managed by HCP Terraform via remote operations. These runs can be initiated by webhooks from your VCS provider, by UI controls within HCP Terraform, by API calls, or through the Terraform CLI.
For organizations with strict security controls, ensuring that your VCS provider is not accessible over the public internet is critical. HCP Terraform offers private VCS access, ensuring that private VCS repositories can be securely accessed without exposing sensitive data to the public internet.
Robust authentication methods
Credential management plays a key role in ensuring a secure provisioning workflow with Terraform. The days when static passwords and IP-based security were a viable security strategy are long gone. Organizations must adapt their authentication workflows to support a multi-cloud environment. Integrating a proven secrets management solution with automated secrets generation and rotation is a good start. HashiCorp Vault is a popular choice that integrates well with Terraform.
Users can also leverage single sign-on (SSO) and role-based access control (RBAC) to govern access to their Terraform projects, workspaces, and managed resources. These workflows help centralize the management of HCP Terraform users and other Software-as-a-Service (SaaS) vendors with supported providers including Okta, Microsoft Azure AD, and SAML.
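Team permissions can themselves be codified. The sketch below uses the tfe provider to grant a team plan-only access to a workspace; the organization, team, and workspace names are assumptions for illustration:

```hcl
# Hypothetical example: grant a developer team plan-only access to a workspace.
data "tfe_workspace" "app" {
  name         = "app-production"
  organization = "example-org"
}

resource "tfe_team" "app_devs" {
  name         = "app-developers"
  organization = "example-org"
}

resource "tfe_team_access" "app_devs_plan" {
  access       = "plan" # can queue plans but cannot apply
  team_id      = tfe_team.app_devs.id
  workspace_id = data.tfe_workspace.app.id
}
```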
Platform teams also need secure authentication to the providers Terraform interacts with, which can be achieved by implementing just-in-time (JIT) access. Terraform can help with its native dynamic provider credentials, which provide short-lived, JIT access to official cloud providers through the industry standard OpenID Connect (OIDC) protocol. These credentials are unique to each Terraform workload and can be generated on demand for Amazon Web Services (AWS), Microsoft Azure, Google Cloud, and the Vault provider, reducing the risk of potential exposure and reuse.
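As one way to picture this, the sketch below uses the tfe provider to set the environment variables that enable AWS dynamic provider credentials on a workspace; the role ARN and workspace ID input are hypothetical placeholders:

```hcl
# Hypothetical example: turn on AWS dynamic provider credentials for a
# workspace by setting the documented TFC_AWS_* environment variables.
variable "workspace_id" {
  type        = string
  description = "ID of the HCP Terraform workspace to configure"
}

resource "tfe_variable" "enable_aws_dynamic_creds" {
  key          = "TFC_AWS_PROVIDER_AUTH"
  value        = "true"
  category     = "env"
  workspace_id = var.workspace_id
}

resource "tfe_variable" "aws_run_role" {
  key          = "TFC_AWS_RUN_ROLE_ARN"
  value        = "arn:aws:iam::123456789012:role/hcp-terraform-oidc" # placeholder ARN
  category     = "env"
  workspace_id = var.workspace_id
}
```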
Terraform also offers Vault-backed dynamic credentials, a feature that combines dynamic provider credentials with Vault secrets engines to offer a consolidated workflow. This approach authenticates Terraform runs to Vault using workload identity tokens generated by HCP Terraform, then uses Vault secrets engines to generate dynamic credentials for the AWS, Azure, and Google Cloud providers.
For those looking for additional control, Terraform also offers Hold Your Own Key (HYOK), a security principle that gives organizations ownership of the encryption keys used to access their sensitive data. These authentication methods provide a significant enhancement for users already using Vault for on-demand cloud access and for any organization seeking to reduce the risks of managing its credentials.
2. Building and reusing secure modules
Writing infrastructure configurations from scratch can be time-consuming, error-prone, and difficult to scale in a multi-cloud environment. To alleviate this, Terraform provides the ability to codify infrastructure in reusable "modules" that contain your organization's security requirements and best practices.
Early in cloud migrations or Terraform adoption, operations teams often deploy Terraform in separate silos across the organization. This isolation can create issues such as duplication of code, non-secure or non-compliant configurations, and inconsistent processes for module creation and consumption. For many organizations, the solution is to set up a central internal Terraform module registry. This lets operators enable developers while limiting security risks by creating golden configurations that have been reviewed, tested, and validated. New Terraform users can also use the public registry to reference and deploy common infrastructure configurations. This registry contains more than 14,000 pre-written modules, providing foundational resources for a wide variety of provisioning efforts.
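Consuming such a module is a one-block change for developers. The following sketch references a hypothetical network module from a private registry; the organization, module name, and inputs are illustrative:

```hcl
# Hypothetical example: consume a reviewed module from the private registry.
module "network" {
  source  = "app.terraform.io/example-org/network/aws"
  version = "~> 2.1"

  vpc_cidr    = "10.0.0.0/16"
  environment = "prod"
}
```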
In addition to this, Terraform's test framework helps teams produce secure, higher-quality modules. Once enabled on a module, test runs will execute automatically based on version control events such as pull requests and merges, and can be initiated from the CLI or API. Just like workspace runs, tests execute remotely in a secure environment, eliminating the need for developers to handle sensitive cloud credentials on their workstations. With integrated tests and more direct control over publishing, platform teams can be confident that new module versions are well-tested before making them available to downstream users. For a walkthrough of the test framework, read Testing in HashiCorp Terraform.
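A test file might look like the following sketch (for example, tests/bucket.tftest.hcl); it assumes the module under test defines an aws_s3_bucket_public_access_block resource named app_data, which is purely illustrative:

```hcl
# Hypothetical test file, e.g. tests/bucket.tftest.hcl
run "bucket_blocks_public_access" {
  command = plan

  assert {
    condition     = aws_s3_bucket_public_access_block.app_data.block_public_acls == true
    error_message = "The bucket must block public ACLs."
  }
}
```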
Artifact governance
Many organizations notice that their image creation and management processes face similar issues surrounding manual, siloed workflows. These inconsistencies pose security risks as images lay the foundation for modern infrastructure in hybrid-cloud environments. HashiCorp Packer helps users mitigate these risks by codifying security and compliance requirements directly into their golden images. Similar to Terraform, image versions can be reviewed, approved, and published to the HCP Packer artifact registry for reference and consumption.
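As a rough sketch of what that codification can look like, the Packer HCL template below builds a hypothetical hardened base image and publishes its metadata to an HCP Packer bucket; the source AMI, region, and bucket name are placeholders:

```hcl
packer {
  required_plugins {
    amazon = {
      source  = "github.com/hashicorp/amazon"
      version = ">= 1.2.0"
    }
  }
}

locals {
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

source "amazon-ebs" "base" {
  region        = "us-east-1"
  instance_type = "t3.micro"
  source_ami    = "ami-0123456789abcdef0" # placeholder: an approved base AMI
  ssh_username  = "ubuntu"
  ami_name      = "golden-base-${local.timestamp}"
}

build {
  # Publish version metadata to a hypothetical HCP Packer bucket.
  hcp_packer_registry {
    bucket_name = "golden-base"
    description = "Hardened base image"
  }

  sources = ["source.amazon-ebs.base"]

  # Bake security patches into the image rather than applying them at runtime.
  provisioner "shell" {
    inline = [
      "sudo apt-get update -y",
      "sudo apt-get upgrade -y",
    ]
  }
}
```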
3. Creating policy guardrails
Rapid provisioning opens up tremendous possibilities, but organizations need to maintain compliant infrastructure and prevent over-provisioning. In the past, these security, compliance, and cost policies required manual validation and enforcement. This process was error-prone and challenging to scale, resulting in bottlenecks in provisioning workflows.
Similar to HashiCorp's approach to provisioning with IaC, policy as code can be used to reduce manual errors, enable scaling, and avoid bottlenecks. HashiCorp's policy as code framework, Sentinel, helps you to write custom policies automatically enforced in the provisioning workflow. Terraform also natively supports the open source policy engine Open Policy Agent (OPA), allowing users to migrate their existing Rego-based policies.
Users getting started can take inspiration from pre-written policy sets by trusted experts in the policy libraries section of the official Terraform Registry. For those running infrastructure in AWS, we have developed 500+ policy sets across various industry standards including CIS, NIST, and FSBP. Reusing these pre-written policies streamlines your provisioning workflows and reduces the chance of misconfiguration. Users can also leverage the more than 20 run task partners to directly integrate third-party tools and context into their Terraform workflows, such as code scanning, cost control, and regulatory compliance. For example, with the HCP provider's Packer data source, you can easily reference HCP Packer to pull in the most recent version of an image.
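For example, a Terraform configuration can look up the image currently assigned to a hypothetical HCP Packer bucket and channel, as in the sketch below; note that newer HCP provider releases rename these data sources (hcp_packer_version and hcp_packer_artifact), so check the documentation for your provider version:

```hcl
# Hypothetical example: pin new instances to the image currently assigned
# to the "production" channel of an HCP Packer bucket.
data "hcp_packer_image" "golden_base" {
  bucket_name    = "golden-base"
  channel        = "production"
  cloud_provider = "aws"
  region         = "us-east-1"
}

resource "aws_instance" "app" {
  ami           = data.hcp_packer_image.golden_base.cloud_image_id
  instance_type = "t3.micro"
}
```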
For enterprises looking to use policies and run tasks in private environments, HCP Terraform offers private run tasks, which execute tasks integrated from private or self-managed services so internal systems can take part in automated workflows without being exposed to the public internet. Similarly, private policy enforcement gives organizations the ability to enforce policies within private cloud environments, maintaining data confidentiality by keeping policy-related interactions within private infrastructure.
4. Enforcing guardrails at the time of provisioning
Terraform enables users to move security and compliance efforts upstream by enforcing guardrails during the provisioning process and automatically validating them against the code. For example, policies might validate that an end user is consuming published modules rather than creating custom code, the infrastructure is tagged for visibility, the data storage location adheres to GDPR, or that storage buckets are not accessible by externally facing IP addresses.
This automatic policy integration into your provisioning workflows can be customized with different enforcement levels:
- Advisory, which warns users when a policy is violated
- Soft mandatory, which requires an explicit override before a run that violates the policy can proceed
- Hard mandatory, which blocks any run that violates the policy, with no option to override
Terraform users can also use preconditions and postconditions: granular tests that validate your configuration before and after the Terraform apply phase. These allow practitioners and module authors to enforce custom conditions, such as checking for correctness during the plan or apply phase, or creating a precise software contract for modules. Defining conditions specific to your organization helps capture assumptions for future maintainers and returns useful information on errors to help consumers more easily diagnose and troubleshoot configuration issues.
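A hedged sketch of these custom conditions is shown below; the AMI filter and the specific attributes being checked are illustrative choices, not requirements:

```hcl
# Hypothetical example of plan-time and apply-time guardrails on a resource.
data "aws_ami" "app" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["golden-base-*"]
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.app.id
  instance_type = "t3.micro"

  lifecycle {
    # Checked before apply: only provision from an x86_64 image.
    precondition {
      condition     = data.aws_ami.app.architecture == "x86_64"
      error_message = "The selected AMI must be built for x86_64."
    }

    # Checked after apply: the instance should have reached the running state.
    postcondition {
      condition     = self.instance_state == "running"
      error_message = "The EC2 instance did not reach the running state."
    }
  }
}
```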
As organizations scale their multi-cloud infrastructure, they often see an accumulation of resources that are no longer relevant or in use, particularly in testing and development environments. These unused or forgotten resources may be outdated and contain vulnerabilities that pose security risks if not managed properly. With Terraform's ephemeral workspaces, you'll be able to define time-to-live policies and automate the cleanup of resources. Once your defined date is reached, Terraform will automatically queue and apply a destroy plan on the workspace, helping to mitigate the risk of outdated resources accumulating in your infrastructure.
5. Continuously enforcing guardrails
Lots of operational attention is focused on building and deploying infrastructure, but the biggest risks come after deployment, during ongoing maintenance. As organizations grow in size and complexity, it gets increasingly difficult to maintain consistent infrastructure. Even with a secure initial provisioning process, settings on infrastructure can still be undone or circumvented. This can open your infrastructure up to the possibility of configuration drift. To minimize outages, unnecessary costs, and emergent security holes, teams should have a system in place to monitor this drift. Organizations can try to build this into their processes, or they can use Terraform's native drift detection and health assessments. These continuous checks help you detect and respond to unexpected changes in provisioned resources on Day 2 and beyond.
Terraform's drift detection notifies you if your infrastructure has changed from its original state, so you can make sure security and compliance measures remain in place. With these infrastructure change alerts, you can quickly get to the root reason for that change, understand if it is necessary, and complete the change, or automatically remediate if not.
You can also schedule regular automated health checks using assertions defined in your Terraform code with HCP Terraform's continuous validation. Users can monitor whether these conditions continue to pass after Terraform provisions the infrastructure. These checks give customers flexible options to validate their infrastructure uptime, health, and security, all in one place without requiring additional tools.
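A continuous validation assertion is defined with a check block in your configuration. The sketch below probes a hypothetical health endpoint using the hashicorp/http data source:

```hcl
# Hypothetical example: assert that an application endpoint stays healthy.
check "app_health" {
  data "http" "health_endpoint" {
    url = "https://app.example.com/health"
  }

  assert {
    condition     = data.http.health_endpoint.status_code == 200
    error_message = "The application health endpoint did not return HTTP 200."
  }
}
```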
6. Observability and decommissioning
The final step to ensure a secure Day 2 is having general observability of your entire infrastructure estate and retiring infrastructure resources when no longer needed. In a Terraform environment, this means maintaining visibility into your workspaces with a clear audit trail and standardizing lifecycle management workflows.
Infrastructure visibility
A key part of infrastructure visibility is understanding every change and action taken across your environments, starting with audit logs. Audit logs are exposed in HCP Terraform via the audit trails API, and Terraform Enterprise provides log forwarding. These logs give you visibility into important events such as user logins, changes to organization and workspace settings, run logs, approvals, policy violations, and policy overrides.
Explorer for workspace visibility in HCP Terraform provides a consolidated view of workspace data across your organization, including information on providers, modules, Terraform versions, and health checks from drift detection and continuous validation. This consolidated view helps teams ensure their environments have the necessary up-to-date versions for Terraform, modules, and providers while tracking health checks to ensure security, reliability, and compliance.
The ability to find and drill into workspaces on your Terraform dashboard is key to speedy debugging and health checking. Terraform's filtering and tagging capabilities enable users to quickly discover and access their workspaces. Workspaces also act as the system of record for provisioned infrastructure by maintaining secure storage and versioning of Terraform state files.
You can check workspace activity, gaining insight into its users and usage. This allows you to answer questions like:
- Which users are accessing which workspaces?
- What configurations are they changing, at what time, and from where?
- Who is accessing, modifying, and removing sensitive variables?
- Which users are changing or attempting to change your policy sets?
This organization-wide audit trail gives platform teams visibility into their entire infrastructure estate, helping them keep security and compliance top of mind.
Lifecycle management
Now that you've successfully provisioned your infrastructure and enforced your guardrails, what happens next? If you leave outdated resources deployed, they can pose a security risk to your organization.
One way to prevent this from happening in the first place is through effective module lifecycle management. Over time there is a need to deprecate modules and replace them with updated versions, which requires:
- Visibility into where the modules are being used
- A way to comment and then push a notification to end users
- A deprecation process
HCP Terraform provides all of this functionality, standardizing your approach to decommissioning workflows. Its module lifecycle management features show where modules are referenced, warn users when a module is scheduled for deprecation, and stop users from referencing outdated module versions.
Looking forward
While transitioning to hybrid cloud infrastructure can be difficult, it also presents an opportunity for a fresh start in standardizing infrastructure workflows. Although these six fundamentals focus primarily on infrastructure security, they also help with speed, efficiency, cost savings, and reliability. Secure infrastructure automation enables innovation and provides a solid foundation for your hybrid-cloud estate, supporting success across other parts of your organization, such as networking and applications.
Terraform security is just one piece of your organization's overall cybersecurity strategy. To start understanding the broader framework of requirements for security in the AI era, we recommend reading this guide: The next generation of cloud security: Unified risk management, compliance, and zero trust.
And if you're curious about Terraform's role in your infrastructure as we move further into the AI era, read Building intelligent infrastructure automation with HashiCorp.
Get started with HCP Terraform for free to begin provisioning and managing your infrastructure in any environment.