IaC Configuration Management with User-Data
Configure the provisioned instance using User-Data Shell Scripts and Cloud-Init
User Data is used to perform common automated configuration tasks and run scripts after the instance starts. This data can be plain text or base64 encoded; the latter is required when using the IBM Cloud API. The content of the User-Data is sent to the Cloud-Init service of the provisioned instance.
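For instance, the base64 requirement can be met with the standard `base64` utility; a minimal sketch (the file names are illustrative):

```shell
# Write a user-data script, then encode it for an API call that expects base64
printf '#!/bin/bash\necho "configured"\n' > init.sh
base64 init.sh > init.b64       # payload to send in the API request
base64 --decode init.b64        # decodes back to the original script
```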
Cloud-Init is the de facto industry standard for early-stage initialization of virtual machines in the cloud. There are two common formats of user data: Shell Scripts and Cloud-Init directives.
Shell Scripts
This is the easiest way to send commands to an instance to execute once it has started. It’s very important that the shell script starts with a shebang: the `#!` characters followed by the path to the interpreter that executes the script. The interpreter is commonly Bash (`#!/bin/bash`) but could be any scripting language, such as Python or Perl, as long as it is pre-installed in your instance. It’s also very common to use the `env` command in the shebang to locate the interpreter, like this: `#!/usr/bin/env bash`.
The script is executed as the `root` user, so there is no need to use the `sudo` command. This also means the files it creates are owned by `root`, so make sure to assign the right owner and permissions once they are created. Avoid commands that require user input: make sure the scripts run non-interactively, either by using the appropriate command parameters (e.g. `-y`) or by using commands like `yes`.
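As an illustration, `yes` repeatedly prints an affirmative answer that can be piped into a prompting command (here simply truncated with `head` to show the stream):

```shell
# `yes` emits "y" forever; a prompting command reads these as confirmations.
# Here we just take the first three answers to show what is produced.
yes | head -n 3
```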
Notice that once Terraform or Schematics gets the instance up and running, the user data script starts while Terraform continues provisioning the other resources. This means the user data shell script executes in parallel with the rest of the Terraform run, and in some cases finishes after Terraform completes. Depending on the script, allow some time after Terraform is done to verify the shell script has finished.
If something goes wrong, or you’d like to see the status of the script execution, check the log file at `/var/log/cloud-init-output.log`. It may be useful to set the `-x` option to debug Bash scripts or to see which command is being executed, as the log file only shows the output of the commands by default. Set the `-x` option in the shebang, or enable/disable debugging for part of the code using `set`, like so:
```bash
#!/bin/bash -x
# Or using set:
set -x # enable debugging
echo "do something here"
set +x # disable debugging
```
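A quick way to see what `-x` tracing produces (the `+ ` prefix is Bash’s default `PS4`):

```shell
# Run a tiny script with tracing on; traced commands go to stderr with a "+ " prefix
bash -c 'set -x
echo "do something here"
set +x
echo "not traced"' 2>&1
```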
The execution of the shell script is performed by Cloud-Init. The content of the script is copied to a file under `/var/lib/cloud/instances/instance-id/`. The script is not deleted after execution, so you can use it for debugging or run it again later. However, if it’s not needed, or if it contains sensitive information, it’s recommended to delete it. For additional debugging options check the Cloud-Init section below.
The user data or Cloud-Init script is only executed during the provisioning of the instance. If the instance is rebooted, the script is not executed again; however, this behavior can be changed using Cloud-Init directives.
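For example, unlike `runcmd` (which runs only on the first boot), the `bootcmd` module runs its commands early on every boot; a sketch (the log file path is illustrative):

```yaml
#cloud-config
# bootcmd runs on every boot; runcmd runs only on the first one
bootcmd:
- echo "booted at $(date)" >> /var/log/boot-marker.log
runcmd:
- echo "first boot only" >> /var/log/boot-marker.log
```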
Here is one example of a user data script from the Terraform compute resource example:

```hcl
resource "ibm_is_instance" "iac_app_instance" {
  ...
  user_data = <<-EOUD
    #!/bin/bash
    echo '${data.local_file.db.content_base64}' | base64 --decode > /var/lib/db.min.json
    # https://askubuntu.com/questions/1154892/prevent-question-restart-services-during-package-upgrades-without-asking
    echo '* libraries/restart-without-asking boolean true' | debconf-set-selections
    ...
  EOUD
}
```
Cloud-Init Directives
All the data passed in the `user_data` parameter is interpreted by Cloud-Init, even if it’s a shell script as in the previous section. However, Cloud-Init can do more than execute a script, and it accepts multiple configuration settings.

Cloud-Init directives are also sent with the `user_data` parameter in the same way that a script is provided, but the syntax is different. The data begins with `#cloud-config` (this is not a shebang) and must be valid YAML.
When Cloud-Init starts, it runs a collection of modules. These modules are listed in the Cloud-Init documentation, and they are configured using directives in YAML syntax. Some modules are executed by IBM Cloud itself, such as the SSH module, to set up the SSH keys that allow you to log in to the instances, and the Mounts module, to mount the given volumes on the instance. Most common Cloud-Init tasks are achieved using modules.
The example used in Getting Started with Terraform is provided as a Bash script; the same code using Cloud-Init directives would be:

```yaml
#cloud-config
write_files:
- content: |
    Hello World
  path: /index.html
runcmd:
- nohup busybox httpd -f -p 8080 &
```
Cloud-Init directives must begin with `#cloud-config`, followed by the module directives. In this example we use the Write Files module to write the text `Hello World` (given in the parameter `content`) into the file `/index.html` (specified by the parameter `path`). Then we use the Runcmd module to execute a list of commands (in this example, the list has just one command).
Cloud-Init has five boot stages, and at every stage modules are executed in a specific order. The file `/etc/cloud/cloud.cfg` can be modified during the image build phase to configure the boot stages, the modules to load, and their execution order. The sections to modify in the `/etc/cloud/cloud.cfg` file are:
- `cloud_init_modules` runs the `disk_setup` and `mounts` modules during the Network stage
- `cloud_config_modules` runs the config modules, including the `runcmd` module, during the Config stage
- `cloud_final_modules` runs scripts and modules for package installation, configuration management (e.g. Puppet, Chef) and user scripts, during the Final stage
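As an illustration, the module lists in `/etc/cloud/cloud.cfg` look roughly like this; the exact names and order below are an assumption based on a typical Ubuntu image and vary by distribution:

```yaml
# Illustrative excerpt of /etc/cloud/cloud.cfg (actual lists vary by distro)
cloud_init_modules:
- disk_setup
- mounts
cloud_config_modules:
- runcmd
cloud_final_modules:
- package-update-upgrade-install
- scripts-user
```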
The `/etc/cloud/cloud.cfg` file also includes configuration data about which user runs Cloud-Init, data sources, and vendor data.
Cloud-Init Modules
Here are some of the most used Cloud-Init modules and examples. For the entire list of modules and details refer to the Cloud-Init Modules documentation.
Users and Groups defines new users with the key `users` and new groups with the key `groups`.

```yaml
users:
- name: jsmith
  groups: sudo
  shell: /bin/bash
  sudo: ['ALL=(ALL) NOPASSWD:ALL']
  ssh-authorized-keys:
  - ssh-rsa AAAA....
- name: johnsm
  sudo: false
```

Change Passwords changes or removes the password of an existing user, and enables/disables SSH password authentication. This module uses the directives `ssh_pwauth`, `chpasswd` and `password`. Notice that using `RANDOM` or `R` generates a random password that is visible in the `/var/log/cloud-init-output.log` log file.

```yaml
password: defaultInsecurePasswd
chpasswd:
  list: |
    johnsm:Sup3rP@ssw0rd
    jsmith:Th3-B35t_P@ssW0rd-Ev3r!
    httpuser:RANDOM
  expire: False
```

Write Files creates or appends the given content to a file at the given path. The file can be encoded (base64, gzip, or both), with optional permissions and owner. All the parameters but `path` are optional.

```yaml
write_files:
- path: /test.txt
  content: |
    Here is a line.
    Another line is here.
- path: /var/lib/db.min.json
  content: '{"movies": []}'
```

Update or Install Packages allows packages to be updated, upgraded or installed during boot using the keys `packages`, `package_update` and `package_upgrade`. If a package requires a reboot, use the directive `package_reboot_if_required`. Use the Apt Configure module to add source lists and configure `apt` if you are on an Ubuntu or Debian based OS; for RedHat use the Yum Add Repo module.

```yaml
packages:
- curl
- nodejs
- [libpython2.7, 2.7.3-0ubuntu3.1]
package_update: True
```

SSH Configuration is used to manage SSH keys: assign them to users with `ssh_authorized_keys` and generate keys with `ssh_keys`. This module is always executed by IBM Cloud when an instance is created, to assign the SSH keys.

```yaml
ssh_keys:
  rsa_private: |
    -----BEGIN RSA PRIVATE KEY-----
    your_rsa_private_key
    -----END RSA PRIVATE KEY-----
  rsa_public: your_rsa_public_key
disable_root: True
ssh_authorized_keys:
```

Trusted CA Certificates adds CA certificates to `/etc/ca-certificates.conf` and updates the SSL cert cache.

```yaml
ca-certs:
  remove-defaults: True
  trusted:
  - |
    -----BEGIN CERTIFICATE-----
    your_CA_cert
    -----END CERTIFICATE-----
```

Configure DNS uses the directive `resolv_conf` to configure the DNS service file `resolv.conf` to use your own DNS server. Make sure the directive `manage_resolv_conf` is set to `True`. If there is no DNS server but you need all the instances to know each other, you can use HashiCorp tools such as Serf or Consul to create a cluster of instances, collect the IP address of each node, and add them to `/etc/hosts`.

```yaml
manage_resolv_conf: True
resolv_conf:
  nameservers: ['8.8.4.4', '8.8.8.8']
  searchdomains:
  - foo.example.com
  - bar.example.com
  domain: example.com
  options:
    rotate: True
```

Hostname & Etc Hosts are used to set the instance hostname and domain name (`fqdn`) and update them in the `/etc/hosts` file. A template for the `/etc/hosts` file can be provided in `/etc/cloud/templates/hosts.tmpl`. It’s important to note that if `manage_etc_hosts` is set, the `/etc/hosts` file will be updated on every boot, so any change to this file has to be made in the template.

```yaml
hostname: app-node
fqdn: app-node.example.com
manage_etc_hosts: true
```

Run Commands is very helpful when there is no module for a required task. Using the directive `runcmd` you can execute one or multiple commands. Each command string is passed to the `sh` shell to run, and the output of all the commands is logged to the file `/var/log/cloud-init-output.log`.

```yaml
runcmd:
- [ npm, install, -g, json-server ]
- json-server --watch /var/lib/db.min.json --port 8080 --host 0.0.0.0 &
```
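These modules can be combined in a single `#cloud-config` document; for example, a sketch that merges the package, write-files and run-commands examples above:

```yaml
#cloud-config
package_update: True
packages:
- curl
write_files:
- path: /index.html
  content: |
    Hello World
runcmd:
- nohup busybox httpd -f -p 8080 &
```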
Variables Interpolation & Templates
When a Shell Script or Cloud-Init directives are used as User-Data in Terraform, they can include Terraform variables (input or local variables) or data sources. For example:
```hcl
resource "ibm_is_instance" "iac_app_instance" {
  ...
  user_data = <<-EOUD
    #cloud-config
    packages:
    - curl
    - python3-pip
    package_update: True
    write_files:
    ...
  EOUD
}
```
A best practice, and a very common way to use User-Data, is to have a local variable store the user data (shell script or Cloud-Init directives) and assign it to the `user_data` parameter. This variable can have the content hard-coded in the Terraform code, or can be read from a file or template.
For example, placing the User-Data Shell Script in the `scripts/init.sh` file:

```bash
#!/bin/bash
echo '${json_db_b64}' | base64 --decode > /var/lib/db.min.json
# https://askubuntu.com/questions/1154892/prevent-question-restart-services-during-package-upgrades-without-asking
echo '* libraries/restart-without-asking boolean true' | debconf-set-selections
apt update
apt install -y python3-pip
pip3 install json-server.py
```
Then we can reference the `scripts/init.sh` file as a template, using the `templatefile` function to set the `user_data` parameter, like this:

```hcl
resource "ibm_is_instance" "iac_app_instance" {
  ...
  user_data = templatefile("${path.module}/scripts/init.sh", {
    json_db_b64 = data.local_file.db.content_base64,
    port        = var.port
  })
  ...
}
```
If you are using Terraform 0.11 or lower, you need to use the `template_file` data source instead of the `templatefile` function.
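A sketch of the Terraform 0.11 equivalent (assuming the `template` provider is installed; the variable names mirror the example above):

```hcl
# Terraform 0.11 style: render the script with the template_file data source
data "template_file" "init" {
  template = "${file("${path.module}/scripts/init.sh")}"

  vars {
    json_db_b64 = "${data.local_file.db.content_base64}"
    port        = "${var.port}"
  }
}

resource "ibm_is_instance" "iac_app_instance" {
  # ...
  user_data = "${data.template_file.init.rendered}"
}
```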