Apache remote logging with rsyslog

One of the challenges of running a cluster of web servers is deciding where to store the log files for post-processing, assuming they matter to you at all. The first approach most people think of is to keep them locally and sync them to a central server with rsync from time to time. However, that approach can still lose entries if a machine goes down between synchronizations, which is especially likely when using Amazon Auto Scaling to dynamically add and remove servers from the grid (which is our case).

A much better approach is to log everything to a remote server as it happens, instead of relying on local files. The most common solution is to use rsyslog, which comes by default with the majority of Linux distributions, or syslog-ng, which claims to be a better tool than rsyslog.

To make it work with rsyslog, three things are necessary: configure the web server (Apache, in my case) to delegate the logging to an external program (/usr/bin/logger), configure the log server to accept connections on a UDP or TCP port, and configure the client (which should be the same machine where the web server is running) to send the information to the server.

To configure Apache, open its configuration file and change the CustomLog directive so that it pipes the contents to an external program, like this:

CustomLog "|/usr/bin/logger -t apache -p local6.info" combined
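
The "-p local6.info" argument names a syslog facility and a severity. In the syslog protocol the two are combined into a single number (facility code times 8, plus severity code), which is what you see between angle brackets if you inspect the raw packets. The sketch below is only an illustration of that arithmetic; the function name is made up, and the tables hold a small subset of the standard codes.

```python
# Subset of the numeric codes defined by the syslog protocol (RFC 3164 / RFC 5424).
FACILITIES = {
    "user": 1,
    "daemon": 3,
    "local0": 16, "local1": 17, "local2": 18, "local3": 19,
    "local4": 20, "local5": 21, "local6": 22, "local7": 23,
}
SEVERITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}


def syslog_pri(selector: str) -> int:
    """Return the numeric priority for a "facility.severity" string.

    syslog_pri is a hypothetical helper, not part of logger or rsyslog.
    """
    facility, severity = selector.split(".")
    return FACILITIES[facility] * 8 + SEVERITIES[severity]


if __name__ == "__main__":
    print(syslog_pri("local6.info"))
```

For "local6.info" this works out to 22 * 8 + 6 = 182, so the Apache entries should show up on the wire prefixed with <182>.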

The next step is to configure the rsyslog client, whose configuration usually lives at /etc/rsyslog.conf. At a minimum, we need to specify that the level “local6.info” (facility local6, priority info) should be sent to the remote server. You may choose to send everything to the remote log server, but in my case I only want to send the Apache entries. It works like this:

# Syntax:
# <level> @<IP>:<port>
local6.info @10.11.12.13:514

For “<level>” you can pass “*.*” to send everything. A single “at” symbol (@) means the messages are sent over UDP, while two (@@) means TCP.
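
To check that messages can reach the log server without going through Apache, you can send a test entry from any machine. The following is a minimal sketch using Python's standard logging.handlers.SysLogHandler over UDP, which corresponds to the single @ form. The function name build_remote_logger is an assumption of this example, and the message it sends is simplified (no timestamp or hostname header), so treat it as a connectivity check rather than an exact replica of what /usr/bin/logger emits.

```python
import logging
import logging.handlers
import socket


def build_remote_logger(host: str, port: int = 514) -> logging.Logger:
    """Return a logger that sends local6.info messages tagged "apache" over UDP.

    build_remote_logger is a hypothetical helper written for this example.
    """
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL6,
        socktype=socket.SOCK_DGRAM,  # UDP, like the single "@" in rsyslog.conf
    )
    handler.ident = "apache: "   # same tag that "logger -t apache" would set
    handler.append_nul = False   # do not append a trailing NUL byte

    logger = logging.getLogger(f"remote-syslog-test-{host}-{port}")
    # Drop handlers left over from an earlier call so messages are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Calling build_remote_logger("10.11.12.13").info("test entry") sends one message at INFO level, which the handler maps to the syslog priority "info". Because UDP is fire-and-forget, no error is reported if the server is unreachable; the only confirmation is the entry appearing on the server side.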

The last step is the server, also in /etc/rsyslog.conf, where we need to enable the UDP module and specify where the Apache logs should be written. The following is the minimum you need:

$ModLoad imuxsock.so
$ModLoad imklog.so

# Provides UDP syslog reception
$ModLoad imudp.so

# The port where to listen
$UDPServerRun 514

# Write all apache logs to this file (please note the comma)
$template apacheAccess,"/var/log/apache_access_log"

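# If the message comes from the program "apache" (the tag set
# by "logger -t apache") and matches the defined level, send it
# to a specific file. Note: $syslogtag would include the trailing
# colon ("apache:"), so $programname is compared instead.
if $programname == 'apache' then {
    local6.info ?apacheAccess
    & ~
}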

That’s all you need.

One last thing: officially, rsyslog limits each line to 4 KB, although I have heard that the kernel ring buffer size also plays a role. In any case, longer content will be truncated.
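
If you want to know in advance whether an entry is at risk of being cut, you can measure it in bytes rather than characters. This is a rough sketch under the assumption that the limit is the 4 KB mentioned above; the real limit depends on your rsyslog configuration, and the syslog header also counts toward it. Both function names are made up for this example.

```python
def exceeds_limit(line: str, limit_bytes: int = 4096) -> bool:
    """Return True if the UTF-8 encoding of line is longer than limit_bytes.

    4096 is the figure quoted in the text, not a value read from rsyslog.
    """
    return len(line.encode("utf-8")) > limit_bytes


def truncate_to_limit(line: str, limit_bytes: int = 4096) -> str:
    """Cut line so that its UTF-8 encoding fits in limit_bytes.

    A multi-byte character split by the cut is dropped instead of
    producing invalid text.
    """
    data = line.encode("utf-8")[:limit_bytes]
    return data.decode("utf-8", errors="ignore")
```

Counting bytes matters because a line with accented or non-Latin characters can be well under 4096 characters and still go over the limit.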

Zero downtime deploy script for Jetty

One of the challenges when developing Java web applications is deploying new versions of the app without any downtime perceptible to end users. In fact, this affects virtually any platform, although in Java it can be trickier than in PHP or Rails, for example. The problem is that most servlet containers first need to shut down the context in order to load it again, an operation that can take several seconds to complete (or, in the worst case, several minutes, depending on how your webapp is built).

When you have a cluster of servers serving the same app it may not be such a big problem, as one possible approach is to deploy the new version one box at a time. On the other hand, it is fairly common to have a single machine (regardless of its size) with a single web server doing all the work, and there lies a monster.

To address this issue, I have created a bash script that does some tricks with the Jetty and Apache configuration files, allowing us to deploy a new version of the application and switch to it (as well as switch back to the older version if necessary) with no noticeable downtime. Although it was created with the production environment at my workplace in mind, it is easy to adapt to your needs (or vice versa). The script assumes the following:

  • Jetty’s hot deploy feature should be disabled (basically, set “scanInterval” to 0 in jetty-contexts.xml)
  • Apache is in front of Jetty through mod_proxy
  • Your app is deployed as an open directory (i.e., not as a WAR), ideally using Capistrano or a similar tool
  • The ports 8080 and 8081 are available
  • The environment variable JETTY_HOME points to the Jetty installation directory
  • The environment variable APACHE_HOST_CONF points to the Apache configuration file for the host you are dealing with (ideally not httpd.conf, but “example.conf”)

It works this way: you use the script “jetty_deploy.sh” as the workhorse in place of the usual “jetty.sh”. To start a new instance, run “jetty_deploy.sh start_new”. The script will change the proper configuration files to listen on the “opposite” port (e.g., 8081 if 8080 is the current one, or vice versa), start a new Jetty server, and wait until the context has fully started. After that it will restart Apache, which will then proxy all requests to the new Jetty server. If something goes wrong, you can use “jetty_deploy.sh rollback”; if everything is OK, you can stop the previous instance by running “jetty_deploy.sh stop_previous”. Simple as that.
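
The two core ideas of that flow, picking the opposite port and waiting for the new server before switching, can be sketched in a few lines. The code below is not taken from jetty_deploy.sh (which is a bash script). It is an independent Python illustration, and the function names are hypothetical. It also only checks that the port accepts TCP connections, which is a weaker signal than "the context has fully started".

```python
import socket
import time

# The two ports the script assumes are available.
PORT_A, PORT_B = 8080, 8081


def opposite_port(current: int) -> int:
    """Return the port the new Jetty instance should listen on."""
    if current == PORT_A:
        return PORT_B
    if current == PORT_B:
        return PORT_A
    raise ValueError(f"unexpected port: {current}")


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 0.5) -> bool:
    """Poll until something accepts TCP connections on host:port.

    Returns True as soon as a connection succeeds, or False if the
    timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet; wait a little and try again.
            time.sleep(interval)
    return False
```

A deploy step would call wait_for_port("127.0.0.1", opposite_port(8080)) after starting the new instance and restart Apache only if it returns True, leaving the old instance untouched otherwise.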

The project is freely available at https://github.com/rafaelsteil/jetty-zero-downtime-deploy. Please make sure you read the instructions in the file “jetty_config”. It is also advisable to understand how “jetty_deploy.sh” does its job.