Devex Tech Blog

Devex.do(:good).well()

Performance Issues With Rails and VirtualBox


Two weeks ago, we noticed performance issues with Rails in our development setup, while all our other environments, some of them much less powerful, were performing much better. After confirming that no recent change had caused the slowdown, and after running some diagnostics and measurements to quantify it, we set off on a small trip into Ruby on Rails debugging inside a VirtualBox VM.

Our development setup

Most of us at Devex run a specific setup of our apps inside a VirtualBox machine, so we can keep a local development version of our site and check how new features integrate before marking the work as done. In some cases, for reasons of feature availability, performance, and workstation power, these apps and their components run entirely in this environment; in other cases, we don't run all the components in the virtual machine and instead use some components from the staging environment (we call it develop), which closely approximates what we need. So this development environment's architecture and infrastructure is quite different from what we have in staging, pre-production, and, of course, production.

Not only that: in all our other environments we run the apps using Unicorn as a daemon, with a varying number of workers, while in this development environment we run WEBrick inside a screen session, just to keep things simple.
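
For illustration, a minimal sketch of such a session; the session name, app path, and port are assumptions rather than our exact setup:

# Start WEBrick detached inside a screen session (path and port are illustrative)
$ screen -dmS front_end bash -c 'cd ~/apps/front_end && bundle exec rails server webrick -p 3002'
# Reattach later to inspect the server output
$ screen -r front_end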

The problem arises

The problem arose when we noticed Rails was underperforming in the local development setup. Some measurements showed that the problem seemed to be located in our front-end application, since the back-end was responding quickly:

$ time wget -pq --no-cache --delete-after http://localhost:3002/apps/front_end/api/system/health
real    0m0.523s
user    0m0.000s
sys 0m0.000s

$ time wget -pq --no-cache --delete-after http://localhost:3004/public/system/health
real    0m0.019s
user    0m0.000s
sys 0m0.000s

These two URIs produce more or less identical content: JSON-formatted status info for our front-end and back-end applications, respectively. So we took another measurement, this time timing how long it took to serve one complete page:

$ time wget -pq --no-cache --delete-after http://localhost:3002/people
real    0m13.393s
user    0m0.008s
sys 0m0.032s

Looking for similar cases on the Internet, we found some links related to WEBrick performance improvements, Rails performance within VirtualBox, and WEBrick reverse lookups.

None of the solutions we tried from those links fixed the problem. We then tried doubling the processors and memory available in the VM configuration, but that didn't solve the issue either. Measuring with atop showed no bottleneck. We also reproduced the issue on other computers and setups, so it was not tied to a specific machine or the hardware it was running on.
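
As a rough sketch of those checks (the VM name is a placeholder):

# Double the VM's resources while it is powered off
$ VBoxManage modifyvm "devbox" --cpus 4 --memory 4096
# Inside the guest, watch for CPU, memory, or disk saturation
$ atop 2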

What’s going on then?

Then we started analyzing the issue in more depth, first taking a look at what was going on when the browser issued a request. Using the Network view in the web inspector, we saw very long waiting times for static content, while the rest of the timings were fine, with no execution issues.

The next step was to take a closer look at the log to see what was happening when Rails received the requests for static content. We could confirm that log lines were written much more slowly when serving static files. So we decided to trace the WEBrick process.
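
A sketch of that check; the log path and asset URL are assumptions:

# Follow the Rails log in one terminal...
$ tail -f log/development.log
# ...and, in another, time a request for a static asset
$ time wget -q -O /dev/null http://localhost:3002/assets/application.css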

When we strace'd the WEBrick process, we couldn't see anything meaningful, but then we learned that WEBrick is threaded, so the actual requests were being served by threads, whose traces were not shown in the strace output.

The trick to identifying the thread IDs (TIDs) is to run the ps command with the -T option, which also lists a process's threads along with their IDs. You can then run strace on those TIDs. Doing so, we finally started to see errors about unavailable resources. We searched again and found out that VirtualBox uses a specific filesystem (vboxsf) for its shared folders, which has some known performance problems, and that NFS is one of the fastest alternatives you can use.
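
Roughly, the procedure looks like this; the pgrep pattern is an assumption about how the server process is named, and you substitute one of the listed SPID values for <TID>:

# List the threads of the WEBrick process; the SPID column holds the TIDs
$ ps -T -p $(pgrep -f 'rails server' | head -n 1)
# Attach strace to one of the TIDs and watch file-related system calls
$ sudo strace -e trace=file -p <TID>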

The fix

Some of us simply copied the shared folder's content to a regular directory on the VM disk. This fixes the issue, but it introduces the need to copy the content again every time it gets updated.
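
A one-line sketch of that workaround; both paths are assumptions:

# Mirror the shared folder into the VM's own filesystem
$ rsync -a --delete /vagrant/ ~/app/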

Others decided to go for NFS instead, which has a couple of drawbacks:

  • It does not work with a Windows host, but this is not our case.
  • It requires a private network to be set up between the host and the guest, but we solved that, and the whole problem, by adding the following to our Vagrantfile:

    ip = ENV['VMSETUP_IP'] || `vboxmanage list hostonlyifs | grep IPAddress | cut -d: -f2 | tr -d ' '`.to_s.tr("\n", "") + '0'
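    # The backtick expression takes the host-only interface's IP and appends
    # a "0" (e.g. 192.168.56.1 becomes 192.168.56.10), unless VMSETUP_IP is set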
    [...]
    Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
      [...]
      config.vm.network :private_network, ip: "#{ip}"
      [...]
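      # Mount the project over NFS instead of the default vboxsf shared folder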
      config.vm.synced_folder ".", "/vagrant", type: "nfs"
      [...]
    end
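
Reloading the machine then mounts the project over NFS; Vagrant asks for sudo on the host so it can edit /etc/exports. The address below is illustrative:

$ VMSETUP_IP=192.168.56.10 vagrant reload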
    

With the NFS-synced folder in place, the final measurements are as follows:

$ time wget -pq --no-cache --delete-after http://localhost:3004/public/system/health
real    0m0.016s
user    0m0.000s
sys 0m0.004s

$ time wget -pq --no-cache --delete-after http://localhost:3002/apps/front_end/api/system/health
real    0m0.066s
user    0m0.000s
sys 0m0.006s

$ time wget -pq --no-cache --delete-after http://localhost:3002/people
real    0m1.661s
user    0m0.012s
sys 0m0.026s
