IaC Configuration Management with User-Data

Configure the provisioned instance using User-Data Shell Scripts and Cloud-Init

Shell Scripts
Cloud-Init Directives
Cloud-Init Modules
Variables Interpolation & Templates

User Data is used to perform common automated configuration tasks and run scripts after the instance starts. This data can be plain text or base64 encoded, the latter is required when we use the IBM Cloud API. The content of the User-Data is sent to the Cloud-Init service of the provisioned instance.

Cloud-Init is the de facto industry standard for early-stage initialization of virtual machines in the cloud. There are two common formats of user data: Shell Scripts and Cloud-Init directives.

What other formats are supported?

Shell scripts and Cloud-Init directive are the most common, however, they are not the only user-data formats supported. Cloud-Init also supports Include Files, Upstart Jobs, Cloud Boothooks and Part Handler.

You can also disable User-Data using the directive allow_userdata: false to avoid accepting future deployed user-data.

Shell Scripts

This is the easiest way to send commands to an instance to execute once it has started. It’s very important the shell script starts with a shebang, the #! characters followed by a path pointing to the interpreter to execute the script. This script is commonly Bash (#! /bin/bash) but could be any scripting language like Python or Perl, as long as the interpreter is pre-installed in your instance. It’s also very common to use the env command in the shebang pointing to the interpreter like this #!/usr/bin/env bash.

The script is executed as root user, so there is no need to use the sudo command. This means the files created are owned by root so make sure to assign the right owner and permissions once they are created. Avoid commands that require user input, make sure the scripts run non-interactively, using the appropriate command parameters (i.e. -y) to make them non-interactive or use commands like yes.

Notice that when Terraform or Schematics gets the instance up and running the user data script starts and Terraform continues with the provisioning of the other resources. This means the user data shell script is being executed in parallel to other Terraform provisioning and in some cases after Terraform completes. As needed depending on the script, allow some time after Terraform is done to verify the shell script is complete.

If something goes wrong or you’d like to see the status of the script execution check the log file at /var/log/cloud-init-output.log. It may be useful to set the -x option to debug bash scripts or to know what command is being executed, as the log file only shows the output of the commands by default. Set the -x option in the shebang or enable/disable the debug option in part of the code using set, like so:

#!/bin/bash -x
# Or using set:
set -x    # enable debugging
echo "do something here"
set +x    # disable debugging

The execution of the shell script is performed by Cloud-Init. The content of the script is copied to a file located in /var/lib/cloud/instances/instance-id/. After the execution the script is not deleted, so you can use it for debugging or future execution. However if it’s not needed, or if it has sensitive information, it’s recommended to delete it. For additional debugging options check the Cloud-Init section below.

The user data or Cloud-Init script is only executed during the provisioning of the instance. If the instance is rebooted the script is not executed again, however, this can be changed using Cloud-Init directives.

Here is one example of a user data script from the terraform compute resource example:

resource "ibm_is_instance" "iac_app_instance" {
  ...
  user_data = <<-EOUD
            #!/bin/bash
            echo '${data.local_file.db.content_base64}' | base64 --decode > /var/lib/db.min.json
            # https://askubuntu.com/questions/1154892/prevent-question-restart-services-during-package-upgrades-without-asking
            echo '* libraries/restart-without-asking boolean true' | debconf-set-selections

Cloud-Init Directives

All the data passed in the user_data parameter is interpreted by Cloud-Init, even if it’s a shell script as in the previous section. However, Cloud-Init can do more tasks than execute a script and it accepts multiple configuration settings.

Cloud-Init directives are also sent with the user_data parameter in the same way that a script is provided, but the syntax is different. The data begins with #cloud-config (this is not a shebang) and must be valid YAML syntax.

When Cloud-Init starts it runs a collection of modules, these modules are listed in the Cloud-Init documentation and they are configured using directives in YAML syntax. Some modules are executed by IBM Cloud such as the SSH module to setup the SSH Keys to allow you to login to the instances, and Mounts to mount the given volumes to the instance. The most common tasks that are done with Cloud-Init are achieved using modules.

The example used in the Getting Started with Terraform is provided as a bash script, the same code using Cloud-Init would be:

#cloud-config
write_files:
  - content: |
      Hello World
    path: /index.html
runcmd:
  - nohup busybox httpd -f -p 8080 &

Cloud-Init directives must begin with #cloud-config, then followed by the the module directives. In this example we use the Write File module to write the text Hello World under the parameter content into the file /index.html specified by the parameter path. Then we use the module Runcmd to execute a list of command(s) (in this example, a list has just one command).

Cloud-Init has five boot stages and for every stage there are modules executed in an specific order. The file /etc/cloud/cloud.cfg can be modified during the image build phase to configure the boot stages, the modules to load and execution order. The sections to modify in the /etc/cloud/cloud.cfg file to change the modules to load and execution order are:

cloud_init_modules runs the disk_setup and mounts modules during the Network stage
cloud_config_modules runs the config modules, including the runcmd module, during the Config stage
cloud_final_modules runs scripts and modules for package installation, configuration management (i.e. puppet, chef) and user scripts. This happens during the Final stage.

The /etc/cloud/cloud.cfg file also includes configuration data about which user will run Cloud-Init, data sources, and vendor data.

Cloud-Init Modules

Here are some of the most used Cloud-Init modules and examples. For the entire list of modules and details refer to the Cloud-Init Modules documentation.

Users and Groups defines new users with the key users and new groups with the key groups.

users:
  - name: jsmith
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAA....
  - name: johnsm
    sudo: false

Change Passwords change or remove the password of an existing user, and enable/disable SSH password authentication. This module use the directive ssh_pwauth, chpasswd and password. Notice that using RANDOM or R generate a random password visible in the /var/log/cloud-init-output.log log file.

password: defaultInsecurePasswd
chpasswd:
  list: |
    johnsm:Sup3rP@ssw0rd
    jsmith:Th3-B35t_P@ssW0rd-Ev3r!
    httpuser:RANDOM
  expire: False

Write Files creates or appends the given content a file at the given path. The file can be encoded (base64, gzip, or both) with optional permissions and owner. All the parameters but path are optional.

write_files:
  - path: /test.txt
    content: |
      Here is a line.
      Another line is here.
  - path: /var/lib/db.min.json
    content: {"movies": []}

Update or Install Packages allows packages to be updated, upgraded or installed during boot using the keys packages, package_update and package_upgrade. If a package requires reboot use the directive package_reboot_if_required. Use module Apt Configure to add source list and configure apt if you are on Ubuntu or Debian based OS, for RedHat use the module Yum Add Repo.
```
packages:
    - curl
    - nodejs
    - [libpython2.7, 2.7.3-0ubuntu3.1]
package_update: True
```

SSH Configuration used to manage SSH keys: assign them to users with ssh_authorized_keys and generate keys with ssh_keys. This module is always executed by IBM Cloud when an instance is created to assign the SSH Keys.

ssh_keys:
  rsa_private: |
    -----BEGIN RSA PRIVATE KEY-----
    your_rsa_private_key
    -----END RSA PRIVATE KEY-----
  rsa_public: your_rsa_public_key
disable_root: True
ssh_authorized_keys:

Trusted CA Certificates to add CA certificates to /etc/ca-certificates.conf and update the SSL cert cache.

ca-certs:
  remove-defaults: True
  trusted:
    - |
      -----BEGIN CERTIFICATE-----
      your_CA_cert
      -----END CERTIFICATE-----

Configure DNS uses the directive resolv_conf to configure the DNS service file resolv.conf to use your own DNS server. MAke sure the directive manage-resolv-conf is set to True. If there is no DNS server but you need all the instancess know each other, you can use other Hashicorp tools such as Serf or Consul to create a cluster of instances, collect the IP address of each node and add them to /etc/hosts.

manage_resolv_conf: True
resolv_conf:
    nameservers: ['8.8.4.4', '8.8.8.8']
    searchdomains:
        - foo.example.com
        - bar.example.com
    domain: example.com
    options:
        rotate: True

Hostname & Etc Hosts are used to set the instance hostname, domain name (fqdn) and update them in the /etc/hosts file. It allows you to use a template for the /etc/hosts file located in /etc/cloud/templates/hosts.tmpl. It’s important note that if manage_etc_hosts is set the /etc/hosts file will be updated in every boot, so any change to this file has to be done in the template.
```
hostname: app-node
fqdn: app-node.example.com
manage_etc_hosts: true
```

Run Commands is very helpful when there is no module for a required task. Using the directive runcmd you can execute one or multiple commands. The string with the command to execute are passed to the sh shell process to run. The output of all the commands will be logged to the file /var/log/cloud-init-output.log.

runcmd:
  - [ npm, install, -g, json-server]
  - json-server --watch /var/lib/db.min.json --port 8080 --host 0.0.0.0 &

Variables Interpolation & Templates

When a Shell Script or Cloud-Init Directive are used with User-Data in Terraform they can include Terraform variables (input or local variables) or data sources. For example:

resource "ibm_is_instance" "iac_app_instance" {
  ...
  user_data = <<-EOUD
            #cloud-config
            packages:
              - curl
              - python3-pip
            package_update: True
            write_files:

A best practice and very common way to use User-Data is to have a local variable to store the user data (shell script or cloud-init directives) and set it to the user_data parameter. This variable can have the content hard-coded in the Terraform code or can be read from a file or template.

For example, placing the User-Data Shell Script in the scripts/init.sh file:

scripts/init.sh
#!/bin/bash
echo "'${json_db_b64}'" | base64 --decode > /var/lib/db.min.json
# https://askubuntu.com/questions/1154892/prevent-question-restart-services-during-package-upgrades-without-asking
echo '* libraries/restart-without-asking boolean true' | debconf-set-selections
apt update
apt install -y python3-pip
pip3 install json-server.py

Then we can reference and use the scripts/init.sh like a template using the templatefile function to set the user_data parameter, like this.

resource "ibm_is_instance" "iac_app_instance" {
  ...
  user_data = templatefile("${path.module}/scripts/init.sh", { json_db_b64 = data.local_file.db.content_base64, port = var.port })
  ...
}

If you are using Terraform 0.11 or lower you need to use the template_file data source instead of the templatefile function.

IaC Resources: Containers

IaC Configuration Management: Ansible