If you’re a Mac user you’ll probably be aware that a new version of OS X was recently released. This means I’ll soon be running a new, and hopefully better, version of Apple’s Operating System and I thought it would be interesting to document the process I use to switch my laptops to new OS versions.
Observant readers will note I have not used the word ‘upgrade’ so far in this post. That’s because I don’t do upgrades. I like my systems to be clean, lean and free from old cruft from previous installations.
This is important for me as I do a lot of development and a lot of the software I use has been compiled from source; Apple historically has a nasty habit of changing compilers, library locations and filesystem hierarchy between versions. I have neither the time nor the inclination to track down compilation and linking errors in the bowels of
Apple’s yearly release cadence since OS X 10.8 means that I am going to be starting from scratch pretty much every year. Setting up a machine takes time and energy, so there are a couple of things that I do to try to mitigate this.
The first, reliable backups, is not the topic of this post. You should have a robust backup system in place for any data that you care about. RAID doesn’t count, and neither does Time Machine or cloud storage.
I still like Jamie Zawinski’s public service announcement on the topic, although a lot of the technical advice is out of date now that Apple laptops are becoming consumer devices with no user-serviceable parts.
For the record, I tend towards option 1 in Jamie’s PSA. I listen to CDs and enjoy film photography. I find this to be more reliable than computers! What data I do care about is spread across some hard disks kept in different places using
The second thing I couldn’t do without is some kind of infrastructure automation and configuration system. There are loads of different options available, designed for many different purposes and infrastructure sizes. I use Chef at work but it required a bit too much bootstrapping for what I’m attempting, which is basically: how little work can I do to get from a fresh install to a usable system?
Ansible is an automation system. It’s reasonably new, and so changes frequently. It organises bundles of configuration into Playbooks, which are written in YAML. Ansible itself is written in Python.
I don’t know very much Python, and I think that writing configuration management tasks in YAML is a terrible idea but these things aside, there are lots of reasons why Ansible is suitable for situations like this:
Ansible is push based rather than pull based. Most configuration management frameworks I’ve used are predominantly client/server based: nodes run a client application that keeps configuration synchronised against a set of resources stored on a server. Puppet has its Puppetmaster and Chef has the Chef Server. It’s possible to run both of these systems in headless mode, without a server, but this can be more complex, and in some cases the solution is to start an in-memory temporary server anyway.
Ansible’s playbooks reside on the local workstation, and there is no server. Ansible provisions nodes by pushing the playbooks up over SSH and executing tasks. It doesn’t have Puppet’s catalog compilation or Chef’s compile and converge stages. It just executes commands, in the order you specify in your YAML. This simplicity lends itself to the task of provisioning a workstation, as your playbooks are normally simple enough that expensive dependency resolution is unnecessary overhead. And with the ansible_connection=local setting in your host inventory, it will even skip SSH completely, making the entire run much quicker and easier to set up.
The most important point is that Ansible is simple to set up. It’s written in Python, which is available by default on OS X, and it doesn’t require any kind of external processes (such as a server or client daemon). Assuming you have already written a playbook (mine lives on Bitbucket), the steps required to bootstrap a machine from fresh install are:
I have encapsulated steps 3-4 into a bootstrap script. It looks like this:
    #!/usr/bin/env bash
    set -e

    echo 'Installing Ansible'

    # prefer pip for installing python packages over the older easy_install
    #
    if [[ ! -x `which pip` ]]; then
        sudo easy_install pip
    fi

    if [[ -x `which pip` && ! -x `which ansible` ]]; then
        sudo CFLAGS=-Qunused-arguments CPPFLAGS=-Qunused-arguments pip install ansible
    fi

    echo 'Running Ansible to configure Dev machine'
    if [[ -x `which ansible` ]]; then
        ansible-playbook -i hosts site.yml --ask-sudo-pass
    fi
This automates the installation and running of Ansible. It attempts to be idempotent, so I can run bootstrap.sh every time I want to apply the playbook. It lives in my playbook repository, which I copied to my machine in step 2.
Ansible requires two main files to run, both of which you can see being passed into the ansible-playbook command in the script above. The first, hosts, is a list of hosts and host groups that you want Ansible to target. In this case my hosts file contains the single line
which tells Ansible that we’re using a local connection to the localhost.
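For comparison, the same idea can be written in Ansible's YAML inventory format. This is only a sketch: it assumes a newer Ansible release that ships the YAML inventory plugin, and the file name hosts.yml is a hypothetical choice rather than the file described above.

```yaml
# hosts.yml (hypothetical file name; assumes Ansible's YAML inventory plugin)
all:
  hosts:
    localhost:
      # Run tasks directly on this machine instead of connecting over SSH
      ansible_connection: local
```

An inventory like this would be passed to ansible-playbook with -i hosts.yml in place of -i hosts.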
The second, site.yml, is my primary playbook. It is convention to have a site playbook at the top level which delegates to other playbooks that provide the main configuration to the nodes. My site.yml looks like this:
    - hosts: all
      roles:
        - role: workstation
      vars:
        - home: /Users/
This says: For every host defined in the inventory, apply the main playbook from the workstation role and pass along a variable called home that has the value
Ansible uses Jinja2 for variable interpolation in its templates, and ansible_user_id is a built-in Fact (system provided variable) representing the username of the user currently running Ansible. This is an important point: the playbook has been written assuming it’s going to be run by a normal workstation user, not a privileged root account.
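To make the interpolation concrete, here is a small sketch of a play that builds a path from the ansible_user_id fact and prints it. The way home is composed here and the debug task are assumptions for illustration, not taken from the playbook above.

```yaml
# Illustrative play: the value given to home is an assumption
- hosts: all
  vars:
    # Jinja2 replaces the expression in braces with the fact's value
    home: "/Users/{{ ansible_user_id }}"
  tasks:
    - name: Show the interpolated value
      debug:
        msg: "home resolves to {{ home }} for {{ ansible_user_id }}"
```

Facts such as ansible_user_id are gathered at the start of the play, so the expression is only resolved once Ansible knows who is running it.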
A Role is a directory of playbooks with a certain semantic structure. My main workstation role consists of the following files:
tasks/main.yml: The main entry point into the playbook. In my site.yml when I apply the workstation role, this is the playbook that will be run.
handlers/main.yml: Lists of tasks that can be notified based on the output of other tasks. I don’t use these in my playbooks but the Ansible docs tell me that they’re mostly for triggering service restarts and reboots.
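As a rough illustration of how a handler gets triggered, the sketch below pairs a task with a handler. Everything in it (the redis.conf file, the destination path and the restart command) is hypothetical and assumes Redis was installed through Homebrew.

```yaml
# tasks/main.yml (hypothetical task)
- name: Install a Redis configuration file
  copy:
    src: redis.conf
    dest: /usr/local/etc/redis.conf
  # The handler is only notified when this task reports a change
  notify: restart redis

# handlers/main.yml (hypothetical handler, kept in its own file)
- name: restart redis
  command: brew services restart redis
```

The notify value has to match the handler's name exactly, and a notified handler runs once after the tasks have finished, however many tasks notified it.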
Playbooks consist of lists of tasks - my main playbook in the role above looks like this:
    ---
    - name: Make sure required system directories exist
      file: name= state=directory owner=
      sudo: yes
      with_items:
        - /opt/rubies

    - name: Make sure required home directories exist
      file: name= state=directory
      with_items:
        - '/code'

    - include: redis.yml
    - include: mysql.yml
    - include: ruby.yml

    - name: Install required software
      homebrew: name= state=present
      with_items:
        - ack
        - git
        - vim
        - bats
        - leiningen

    - include: emacs.yml
    - include: pow.yml
Hopefully it’s not hard to read and work out what this playbook is doing. Each task defines a name, which is displayed on stdout during a run, and then uses a module from the Ansible module library to perform a certain action. In this case file says: make sure the file identified by ‘name’ has a certain state and a certain owner.
You can flag that certain tasks need to run as root by using the sudo: yes flag and you can include other playbooks using the
The loop syntax is pretty grim. YAML is just a markup language so it doesn’t support these kinds of constructs. When you use the with_items parameter Ansible runs the parent task once for each item in the list and provides you with a variable called item that contains the current element of the list. There are a variety of other slightly clunky parameters for doing the other kinds of looping that you would expect in this kind of circumstance, but they all feel equally unpleasant to use.
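A minimal sketch of the loop in action, using hypothetical directory names under /tmp. Ansible runs the file task once per entry and substitutes the current entry wherever the item variable is referenced.

```yaml
- name: Make sure some scratch directories exist
  file:
    # item holds the current element of the with_items list
    path: "{{ item }}"
    state: directory
  with_items:
    - /tmp/example-one
    - /tmp/example-two
```

With two entries in the list, the task shows up twice in the run output, once for each directory.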
The Ansible documentation is pretty good and easy to search, which is a good thing because the number of programming concepts shoe-horned into a markup language will require you to look things up a lot!
Most of the time taken setting up a new OS X install seems to be in finding and installing applications from various sources. You can get software from the App Store, from .pkg files littered around the internet, from developer utilities like Homebrew or MacPorts, or just tarballs of source code that need to be compiled by hand.
There are several utilities that I like to use to help with all of this stuff.
Homebrew, which I’ve mentioned before, sells itself as a package manager, although realistically it’s more like Arch Linux’s AUR: it provides recipes for compiling software rather than the installable binary packages of more traditional package managers such as Apt or RPM.
I prefer Homebrew to its primary alternative MacPorts because, unlike MacPorts, it works with the Apple provided libraries, compiling against them where necessary. MacPorts maintains a Portage-like ports tree so it will compile its own versions of library dependencies, even when OS X provides a working version.
Homebrew Cask extends Homebrew to make the process of installing native Mac applications easier. It provides a set of Homebrew-like recipes for installing software like Google Chrome, VirtualBox, TextMate, Alfred, and a lot of other super useful desktop software. You can see the full list of software it supports on the GitHub project.
Both Homebrew and Cask have modules in the Ansible module library so automating all this installation is very easy:
    - name: Install ancillary applications
      homebrew_cask: name= state=present
      with_items:
        - alfred
        - firefox
        - google-chrome
        - virtualbox
        - textmate

    - name: Install required software
      homebrew: name= state=present
      with_items:
        - ack
        - git
        - vim
        - bats
Ansible also has support for most of the major source control systems so getting my Emacs configuration is a simple task:
    - name: install eightbitraptors emacs configuration
      git: repo=https://github.com/eightbitraptor/dotemacs.git dest=/.emacs.d update=no
And if you really want, you can automate that super secure installation pattern of ‘pipe a random script from the internet into my shell’:
    - name: fetch Pow installer
      get_url: url=http://get.pow.cx dest=/tmp/pow-install.sh

    - name: install Pow
      shell: /tmp/pow-install.sh creates=/.pow
I hope this insight into my process has been illuminating. There’s a lot about Ansible I don’t like:
But despite all this I think that Ansible is a perfectly adequate choice for a small, simple tool to configure a single developer Mac.
There are definitely other tools out there too: Thoughtbot’s Laptop, Boxen (which uses Puppet), Kitchenplan (which uses Chef), and probably more. But all of these tools are either very opinionated, require far too much bootstrapping, or are just overcomplicated in my opinion. I already know YAML, and the Ansible documentation is easy to use. Writing simple playbooks that get me 99% of the way to my optimal setup only took a couple of hours. In fact, I probably spent longer writing this post. And I can use my playbooks to keep multiple systems consistent, even on future versions of OS X.
I’m not going to make my Ansible repository available publicly. It contains setup artifacts that, whilst not as secret as private keys, I’d rather keep to myself.