In a stock FreeBSD install, Ansible’s “setup” task can take a surprisingly long time. Testing against a dual-Xeon machine with 256GB of memory, I observed the task consistently taking over 15 seconds to complete. Compared to a 2-core Ubuntu 16.04 VM finishing in a couple of seconds, something feels very wrong!
In the single jail test I have my hosts file as follows:
my_single_jail ansible_connection=jail ansible_python_interpreter=/usr/local/bin/python
and I make sure to keep the generated ansible script on the target host when running the setup module in isolation.
sudo ANSIBLE_KEEP_REMOTE_FILES=1 ansible -m setup -i hosts my_single_jail
The remote file, a script, can be found under /root/.ansible on the target host (ansible_connection=jail requires Ansible to be run as root rather than becoming root with something like sudo).
Running the script under truss and following forked processes gives some interesting results…
truss -f /usr/local/bin/python .ansible/tmp/ansible-tmp-1471479743.01-38328629651633/setup
The output is filled with close syscalls against ascending file descriptors:
8975: close(117158) ERR#9 'Bad file descriptor'
8975: close(117159) ERR#9 'Bad file descriptor'
8975: close(117160) ERR#9 'Bad file descriptor'
8975: close(117161) ERR#9 'Bad file descriptor'
8975: close(117162) ERR#9 'Bad file descriptor'
8975: close(117163) ERR#9 'Bad file descriptor'
8975: close(117164) ERR#9 'Bad file descriptor'
8975: close(117165) ERR#9 'Bad file descriptor'
8975: close(117166) ERR#9 'Bad file descriptor'
8975: close(117167) ERR#9 'Bad file descriptor'
8975: close(117168) ERR#9 'Bad file descriptor'
8975: close(117169) ERR#9 'Bad file descriptor'
8975: close(117170) ERR#9 'Bad file descriptor'
On my testbed this number grew into the millions, and it took a few minutes before my SIGINT was able to stop the process.
But what code is causing this? Fortunately, Python ships with a great module, cProfile, that allows us to profile the execution of a script by function.
python -m cProfile -s cumtime .ansible/tmp/ansible-tmp-1471479743.01-38328629651633/setup
Running this, we can see that most of the cumulative execution time is spent running subprocesses:
setup:64(<module>)
setup:131(main)
setup:81(run_setup)
setup:5154(ansible_facts)
setup:1890(run_command)
subprocess.py:650(__init__)
subprocess.py:1195(_execute_child)
$ pkg list python27 | grep subprocess.py
/usr/local/lib/python2.7/subprocess.py
...
Running a subprocess requires fork’ing the Python process and exec’ing the new command. After the fork, a certain amount of tidying up and preparation is done in the new environment pre-exec. Part of this means closing any inherited file descriptors that are not required:
if close_fds:
    self._close_fds(but=errpipe_write)
What does this function do?
def _close_fds(self, but):
    if hasattr(os, 'closerange'):
        os.closerange(3, but)
        os.closerange(but + 1, MAXFD)
It closes all file descriptors that aren’t the error pipe, up to MAXFD, which is defined above as:
try:
    MAXFD = os.sysconf("SC_OPEN_MAX")
except:
    MAXFD = 256
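To see this code path outside of Ansible, here is a minimal sketch (plain subprocess, nothing Ansible-specific) that triggers the same pre-exec cleanup by passing close_fds=True:

```python
import subprocess

# Passing close_fds=True makes the forked child attempt a close() on every
# file descriptor from 3 up to MAXFD before exec'ing -- the loop visible in
# the truss output above.
p = subprocess.Popen(["echo", "hi"], close_fds=True, stdout=subprocess.PIPE)
out, _ = p.communicate()
print(out.strip())
```

On a system with a sane open-files limit this is imperceptible; with MAXFD in the millions, every such Popen pays the full cost.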
What does that sysconf evaluate to on our system?
$ python
>>> import os
>>> os.sysconf("SC_OPEN_MAX")
7546230
That’s over 7 million wasted syscalls every time we try to run a subprocess.
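Back-of-the-envelope, using the value observed above:

```python
# SC_OPEN_MAX as observed on the testbed above.
maxfd = 7546230
# _close_fds attempts a close() on every descriptor from 3 up to MAXFD
# (minus the one error pipe), so each fork costs on the order of:
wasted = maxfd - 3
print(wasted)  # 7546227
```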
$ ulimit -n
7546230
Yes, the per-process open-files limit is set to the maximum by default. Let’s fix it:
limits -n 1024 /usr/local/bin/python .ansible/tmp/ansible-tmp-1471479743.01-38328629651633/setup
The setup code now completes in under a second. How do we fix this for actual ansible runs?
Only works on Ansible < 2.1
The BSD Support page on Ansible’s site notes that the ansible_python_interpreter host_var should be set to /usr/local/bin/python. We have to go one step further to include the maxfiles limit:
ansible_python_interpreter="limits -n 1024 /usr/local/bin/python"
This is broken in Ansible 2.1, as per this bug report. A patch was submitted and merged, but it only fixes the case where /usr/bin/env <command> is used.
We can create a custom wrapper script that applies the limit and then runs Python:
#!/bin/sh
exec limits -n 1024 /usr/local/bin/python "$@"
Save this, make it executable, and refer to it in ansible_python_interpreter.
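For example, assuming the wrapper was saved as /usr/local/bin/ansible-python (a name picked here purely for illustration), the hosts entry from earlier becomes:

```
my_single_jail ansible_connection=jail ansible_python_interpreter=/usr/local/bin/ansible-python
```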
This requires additional complexity on a jail setup, as all of the jails must have a copy of this wrapper available.
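If shipping a wrapper into every jail is unappealing, the soft limit can also be lowered from inside Python with the standard resource module (for example from a sitecustomize.py). Note that on Python 2.7 this must run before subprocess is imported, since MAXFD is captured at import time. A sketch:

```python
import resource

# Lower this process's soft open-files limit (inherited by children),
# mirroring `limits -n 1024` without an external wrapper. The hard limit
# is left untouched, so no extra privileges are needed.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
new_soft = 1024 if hard == resource.RLIM_INFINITY else min(1024, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```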
Alter the limits for the user running Ansible (root for me) under /etc/login.conf, then run cap_mkdb /etc/login.conf to update the login class database.
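A sketch of what that change might look like; root’s existing entry and the appropriate value will vary per system, so treat this as illustrative only:

```
# /etc/login.conf (illustrative excerpt)
root:\
	:openfiles=1024:\
	:tc=default:
```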
Aside: Why are the file limits so high?
The FreeBSD Handbook section on tuning kernel limits covers the reason:
“The read-only sysctl(8) variable kern.maxusers is automatically sized at boot based on the amount of memory available in the system.”
The beefier the box is, the slower it will run Ansible’s setup without modifications.
One of the fantastic things about FreeBSD is that the source code for the system can typically be found under
/usr/src. The code that determines
maxfilesperproc can be found under