Ansible on FreeBSD: Faster setup

August 18, 2016

On a stock FreeBSD install, Ansible’s “setup” task can take a very long time. Testing against a dual Xeon with 256GB of memory, I observed the task consistently taking over 15 seconds to complete. Compared to a 2-core Ubuntu 16.04 VM taking a couple of seconds, something feels very wrong!

For a single-jail test, my hosts file is as follows:

my_single_jail ansible_connection=jail ansible_python_interpreter=/usr/local/bin/python

and I make sure to keep the generated Ansible script on the target host when running the setup module in isolation:

sudo ANSIBLE_KEEP_REMOTE_FILES=1 ansible -m setup -i hosts my_single_jail

The remote file, a script, can be found under /root/.ansible on the target host (ansible_connection=jail requires Ansible be run as root rather than becoming root with something like sudo or doas).

Running the script under truss and following forked processes gives some interesting results…

truss -f /usr/local/bin/python .ansible/tmp/ansible-tmp-1471479743.01-38328629651633/setup

The output is filled with close syscalls against ascending file descriptors:

8975: close(117158)	ERR#9 'Bad file descriptor'
8975: close(117159)	ERR#9 'Bad file descriptor'
8975: close(117160)	ERR#9 'Bad file descriptor'
8975: close(117161)	ERR#9 'Bad file descriptor'
8975: close(117162)	ERR#9 'Bad file descriptor'
8975: close(117163)	ERR#9 'Bad file descriptor'
8975: close(117164)	ERR#9 'Bad file descriptor'
8975: close(117165)	ERR#9 'Bad file descriptor'
8975: close(117166)	ERR#9 'Bad file descriptor'
8975: close(117167)	ERR#9 'Bad file descriptor'
8975: close(117168)	ERR#9 'Bad file descriptor'
8975: close(117169)	ERR#9 'Bad file descriptor'
8975: close(117170)	ERR#9 'Bad file descriptor'

On my testbed this number grew into the millions, and it took a few minutes before my SIGINT was able to stop the process.

But what code is causing this? Fortunately, Python ships with a great module, cProfile, that lets us profile the execution of a script by function.

python -m cProfile -s cumtime .ansible/tmp/ansible-tmp-1471479743.01-38328629651633/setup

Running this we can see that most of the cumulative time of execution is caught up running subprocesses:

setup:64(<module>)
setup:131(main)
setup:81(run_setup)
setup:5154(ansible_facts)
setup:1890(run_command)
subprocess.py:650(__init__)
subprocess.py:1195(_execute_child)

The subprocess.py in that call chain belongs to the python27 package:

$ pkg list python27 | grep subprocess.py
/usr/local/lib/python2.7/subprocess.py
...
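
The same kind of profile can be gathered from inside Python. The following is a minimal Python 3 sketch (the post itself uses Python 2.7), not the setup script: it profiles a single subprocess call and sorts by cumulative time, as the -s cumtime flag does. The function name profile_subprocess_call and the trivial child command are made up for illustration.

```python
import cProfile
import io
import pstats
import subprocess
import sys


def profile_subprocess_call():
    """Profile one subprocess launch and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    # Hypothetical workload: start a child interpreter that does nothing.
    subprocess.call([sys.executable, "-c", "pass"])
    profiler.disable()

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time and keep only the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return buf.getvalue()


report = profile_subprocess_call()
print(report)
```

The entries from subprocess.py should appear near the top of the report, since almost all of the time in this workload is spent creating and waiting on the child.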

Running a subprocess requires fork’ing the Python process and exec’ing the new command. After the fork, a certain amount of tidying up and preparation is done in the new process before the exec. Part of this means closing any inherited file descriptors that are not required.

if close_fds:
    self._close_fds(but=errpipe_write)

What does this function do?

def _close_fds(self, but):
    if hasattr(os, 'closerange'):
        os.closerange(3, but)
        os.closerange(but + 1, MAXFD)

It closes every file descriptor from 3 up to MAXFD, except the error pipe. MAXFD is defined earlier in the module as:

try:
    MAXFD = os.sysconf("SC_OPEN_MAX")
except:
    MAXFD = 256

What does that sysconf evaluate to on our system?

$ python
>>> import os
>>> os.sysconf("SC_OPEN_MAX")
7546230

That’s over 7.5 million wasted syscalls every time we try to run a subprocess.
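
To make the arithmetic concrete, here is a small sketch that counts how many descriptors the two closerange calls shown earlier cover. The helper count_close_attempts and the error-pipe descriptor value of 5 are assumptions for illustration; the real descriptor number varies from run to run.

```python
def count_close_attempts(maxfd, but, first_fd=3):
    """Count the descriptors covered by closerange(3, but) and
    closerange(but + 1, maxfd), mirroring the _close_fds logic above."""
    below = max(0, but - first_fd)     # fds 3 .. but-1
    above = max(0, maxfd - (but + 1))  # fds but+1 .. maxfd-1
    return below + above


# MAXFD as reported on the test machine; but=5 is a hypothetical error pipe fd.
attempts = count_close_attempts(maxfd=7546230, but=5)
print(attempts)  # 7546226

# With the fallback MAXFD of 256 the same call covers only 252 descriptors.
fallback_attempts = count_close_attempts(maxfd=256, but=5)
print(fallback_attempts)  # 252
```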

$ ulimit -n
7546230
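
The same limit can also be read from Python through the standard resource module. This is a minimal sketch; the helper name open_file_limits is made up, and the values printed depend entirely on the machine it runs on.

```python
import os
import resource


def open_file_limits():
    """Return (soft, hard, sysconf_max) for the current process."""
    # RLIMIT_NOFILE is the per-process limit on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The value Python 2.7's subprocess module uses for MAXFD.
    sysconf_max = os.sysconf("SC_OPEN_MAX")
    return soft, hard, sysconf_max


soft, hard, sysconf_max = open_file_limits()
print("soft limit: ", soft)
print("hard limit: ", hard)
print("SC_OPEN_MAX:", sysconf_max)
```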

So the per-process open file limit sits at its maximum by default. Let’s lower it for a single run:

limits -n 1024 /usr/local/bin/python .ansible/tmp/ansible-tmp-1471479743.01-38328629651633/setup
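
For comparison, a similar effect can be had from Python by lowering the limit in the child just before it execs. This is a rough Python 3 sketch, not what limits(1) does internally: it only lowers the soft limit, and the names cap_open_files and run_with_capped_files are hypothetical.

```python
import resource
import subprocess
import sys


def cap_open_files(cap=1024):
    """Lower the soft RLIMIT_NOFILE to at most `cap`, never raising it."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = cap
    # The soft limit may not exceed a finite hard limit.
    if hard != resource.RLIM_INFINITY and new_soft > hard:
        new_soft = hard
    # Keep the existing soft limit if it is already lower than the cap.
    if soft != resource.RLIM_INFINITY and soft < new_soft:
        new_soft = soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))


def run_with_capped_files(argv, cap=1024):
    """Run argv with the lowered limit applied in the child before exec."""
    return subprocess.run(
        argv,
        preexec_fn=lambda: cap_open_files(cap),
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )


# Ask a child interpreter to report the soft limit it actually received.
child_code = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
result = run_with_capped_files([sys.executable, "-c", child_code])
child_soft_limit = int(result.stdout.strip())
print(child_soft_limit)
```

The parent process keeps its original limit, because the change is made only after the fork.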

With the limit lowered via limits(1), the setup code completes in under a second. How do we fix this for actual Ansible runs?

Solution 1

Only works on Ansible < 2.1

The BSD Support page on Ansible’s site notes that the ansible_python_interpreter host_var should be set to /usr/local/bin/python. We have to go one step further to include the maxfiles limit:

ansible_python_interpreter="limits -n 1024 /usr/local/bin/python"

This is broken in Ansible 2.1 as per this bug report. A patch was submitted and merged in, but it only fixes the case where /usr/bin/env <command> is used.

Solution 2

We can make a custom wrapper for Python that applies the limit and then runs Python.

#!/bin/sh
exec limits -n 1024 /usr/local/bin/python "$@"

Save this as /usr/local/bin/pythonwrapper, make it executable, and refer to it in host_vars:

ansible_python_interpreter="/usr/local/bin/pythonwrapper"

This adds complexity on a jail setup, as every jail must have its own copy of the wrapper.

Solution 3

Alter the limits for the user running Ansible (root for me) in /etc/login.conf (the openfiles capability of that user’s login class), then run cap_mkdb /etc/login.conf to update the login class database.

Aside: Why are the file limits so high?

The FreeBSD handbook section on tuning kernel limits covers the kern.maxfiles sysctl:

The read-only sysctl(8) variable kern.maxusers is automatically sized at boot based on the amount of memory available in the system

The beefier the box is, the slower it will run Ansible’s setup without modifications.

One of the fantastic things about FreeBSD is that the source code for the system can typically be found under /usr/src. The code that determines maxfiles and maxfilesperproc can be found under sys/kern/subr_param.c.