Varnish on CentOS - full guide

Varnish – how to make websites fly?

In this episode we will make our websites fly with Varnish on CentOS. What exactly is Varnish? Varnish is an HTTP accelerator designed for websites with high traffic. It can speed up your website significantly, especially when you have more reads than writes. It's designed to handle heavily loaded websites as well as APIs, and it's used by many high-traffic sites such as The Guardian and The New York Times. In this episode we will learn how to install Varnish on CentOS.


How to install Varnish on CentOS?

Installation is pretty straightforward:

rpm --nosignature -i https://repo.varnish-cache.org/redhat/varnish-4.0.el6.rpm
yum install varnish -y
varnishd -V

Varnish comes in two major versions: the widely used version 3 and version 4, which is the recommended one at the moment. Version 3 has been deprecated since the end of April 2015, so we will install the latest version of Varnish 4. After installation, varnishd -V shows the version installed on the system. In our case it's 4.0.3.

Now we can try to run Varnish by executing the following command:

sudo service varnish start

...and you should see an error: Varnish failed to start. Why's that? Read the next section about the SELinux policy. If you don't see the error, you can move forward to the next part of this tutorial.

SELinux policy for Varnish on CentOS

The problem is that Varnish 4 requires some additional permissions in order to run under CentOS, and we have SELinux enabled by default. We shouldn't disable it (which is sometimes a common practice in such situations). Instead, we will add a rule to the policy so Varnish can start without any problems.

First of all we need a tool called audit2allow. Let's install it:

sudo yum install policycoreutils-python -y

Now, try to start Varnish one more time. It should fail again. After that, execute the following commands:

cd ~/sources
mkdir varnish
cd varnish
sudo grep varnishd /var/log/audit/audit.log | audit2allow -M varnishd2
sudo semodule -i varnishd2.pp

We will use our sources directory that we created at the beginning of our series. You can of course use any directory you want.

Next we look through audit.log and search for varnishd occurrences. The audit.log file contains activities blocked by SELinux. From these blocked activities we generate a module (-M) for SELinux that grants the required permissions.

Note the 2 at the end of the varnishd2 name. We need to add a new policy, because a varnishd policy already exists among the SELinux policies.

Enable the module and try to start the Varnish service again. In our case it still failed, so we repeated the procedure:

sudo grep varnishd /var/log/audit/audit.log | audit2allow -M varnishd3
sudo semodule -i varnishd3.pp
sudo service varnish start

and Varnish started without any errors!

Set up the firewall for Varnish

So now it's time to play around with Varnish. Installation alone is not enough to get Varnish fully working. Let's start by adding some firewall rules so that we can test Varnish on CentOS.

By default Varnish runs on port 6081, but we don't have matching rules in our firewall, so you won't be able to see the actual result. Open the firewall rules file and add these rules:

-A INPUT -p tcp --dport 6081 -m state --state NEW,ESTABLISHED -j ACCEPT
-A OUTPUT -p tcp --sport 6081 -m state --state ESTABLISHED -j ACCEPT
-A INPUT -p tcp --sport 6081 -m state --state ESTABLISHED -j ACCEPT
-A OUTPUT -p tcp --dport 6081 -m state --state NEW,ESTABLISHED -j ACCEPT

These are the same kind of rules as the ones for port 80 and Apache. For testing purposes we will allow access to our website on two different ports: 80 (Apache) and 6081 (Varnish). Port 80 will work without caching, while port 6081 will serve cached content. At the end we will change our setup so that only port 80 is available, and it will be the Varnish port, not Apache's as it is now.

Restart the firewall to apply the rules:

sudo service firewall restart

And now you can try to test it by calling http://example.com:6081

You should see an error (Varnish can't reach its backend yet), and that is perfectly fine.

Test with microtime()

Now it's time to test the Varnish cache. The easiest way is to create a PHP script with the following content in /var/www/example.com/htdocs:

<?php
#/var/www/example.com/htdocs/varnishtest.php
echo microtime();

Then try to open it in your browser: http://example.com/varnishtest.php and refresh a couple of times. On each refresh you should see different content. Now it's time to make http://example.com:6081/varnishtest.php work.

First of all we need to configure the backend for Varnish. Edit the file /etc/varnish/default.vcl

You should see the backend default section there. By default the port is set to 8080, but our Apache is working on port 80, so we need to change that:

vcl 4.0;

# Default backend definition. Set this to point to your content server.
backend default {
    .host = "127.0.0.1";
    .port = "80";
}

Exit the editor and reload Varnish:

sudo service varnish reload

Try to reload the page a couple of times. From the second refresh on you should see exactly the same thing as before, so the result from microtime() will not change. If you get this effect, you can be happy! Varnish is working correctly :D

If you look at the response headers, you will notice some additional ones. The most important is the Age header. It should have a value > 0, which tells you that Varnish is caching your website correctly. But if you can see that it's still not working (the content keeps changing), read the next section of this article.
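You can also check these headers from the command line. Here's a quick sketch with curl, assuming the varnishtest.php script from above:

curl -I http://example.com:6081/varnishtest.php

On the second request the Age header should be greater than 0, which means the response came from the cache.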

Troubleshooting Varnish Cache

There are plenty of reasons why Varnish might not cache your website, and it's not always easy to diagnose what's wrong with your cache or website. Here are a few tips that might be helpful.

ModSecurity is running

If you have ModSecurity enabled, there might be an issue with caching, or you might sometimes see 403 pages. There's an easy fix for that: you need to disable rule 960020. In order to do that, edit the httpd-security.conf file that we created during the Apache setup and add a line so the file looks like this:

# File location: /usr/local/apache2/httpd/conf/extra/httpd-security.conf

#...
<IfModule security2_module>
      Include conf/crs/modsecurity_crs_10_setup.conf
      Include conf/crs/base_rules/*.conf
      # Include conf/crs/experimental_rules/*.conf
      # Include conf/crs/optional_rules/*.conf


      #.... configure ModSecurity here
      
      #Add this line: 
        SecRuleRemoveById 960020
</IfModule>
#... Rest of the file

Remember that you can't do this in .htaccess. It's important that the rule is removed after you include the rules from OWASP!

After that, restart the HTTP daemon and check whether the page is being cached:

sudo service httpd restart

Missing caching header

Another issue might be connected to a missing caching header. You can set how long each page can be cached by sending the Cache-Control header. Try to change the varnishtest.php file like this:

<?php
header( 'Cache-Control: max-age=60' );
echo microtime();

It means that this file should be kept in cache for no longer than 60 seconds. Try to refresh the page and see if it helps.
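If you want to verify the effect without a browser, you can fetch the page a few times in a row through the Varnish port (a small sketch, using the same test URL):

for i in 1 2 3; do curl -s http://example.com:6081/varnishtest.php; echo; done

With caching working, all three requests should print the same microtime() value.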

Double caching header

If you are using mod_expires for Apache, there might be a situation where you have two caching headers: one set from the PHP script and a second one set by Apache. The easiest way to check is to look at the headers, in Chrome for instance. If you see two caching headers like this:

Cache-Control:max-age=10
Cache-Control:max-age=0
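
You can also spot the duplicate header from the command line, for instance:

curl -sI http://example.com:6081/varnishtest.php | grep -i 'Cache-Control'

Two Cache-Control lines in the output mean that both PHP and Apache are setting the header.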

You need to adjust the caching settings in your vhost file. For instance, you can sometimes find settings like these in your httpd.conf or virtual host file:

<IfModule mod_expires.c>
    #If you are using caching for static contents this line must be present, don't remove it!
    ExpiresActive on

    #Default cache is set to 0, comment it out
     ExpiresDefault 0
   
     #Disable cache per content type, comment it out
     ExpiresByType application/json                      "access plus 0 seconds"
     ExpiresByType application/xml                       "access plus 0 seconds"
     ExpiresByType text/html                             "access plus 0 seconds"
</IfModule>

If you plan to use the Varnish cache, you should control caching inside your application and not rely on the Apache cache when it comes to website content, API endpoints etc. You can still cache static assets like images, scripts or styles with Apache. A double Cache-Control header will prevent Varnish from caching your PHP scripts.

Those cookies...

This is really important. By default Varnish will NOT cache any output if there are any cookies in the request. That's how authentication mechanisms usually work for logged-in users: if you are logged in, you want to see the most recent content instead of a cached one. Make sure that you are not sending any cookies. If you have problems with tracking cookies, like those from Optimizely or Google Analytics, you can always unset them in Varnish. Try this approach: open the default.vcl file where you set the backend port and change the sub vcl_recv section to something like this:

sub vcl_recv {
    if (req.http.Cookie) {
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        set req.http.Cookie = regsuball(req.http.Cookie, ";(PHPSESSID)=", "; \1=");
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.Cookie == "") {
            unset req.http.Cookie;
        }
    }
}

This example is taken from the Varnish website. It will unset all cookies except the one containing PHPSESSID. Save the changes in the file and restart Varnish:

sudo service varnish restart

Check your page now.

Nothing helps 🙁

If nothing above helped, you need to dig into the issue yourself. There is also great documentation that comes with Varnish - see the Varnish troubleshooting guide. If still nothing helps, you can post a comment below, use StackOverflow or some forums. We have learned that Varnish can sometimes be a pain in the a** and it's hard to debug why something doesn't work...

Purging Varnish Cache

Great! Now we have our cache up and running! We change something in our script, refresh the page, but hey - there is still cached content! So now the question is: how do we clear the Varnish cache?

Method 1 - varnishadm

Let's introduce the first way to purge the Varnish cache: using the Varnish tools. One of them is varnishadm, which will help us clear the cache. Let's try it:

sudo varnishadm ban 'req.http.host == "example.com:6081" && req.url ~ "/varnishtest.php"'

This is very basic usage. We use the ban command from varnishadm to add a ban to Varnish. In the ban expression we can use any parameters that Varnish exposes, but here are the two basic things you will need.

First we check the host: req.http.host == "example.com:6081". This is very useful if we have multiple domains on our server - we need to be sure that we are clearing the cache only for one particular website, not for all of them. The port is necessary for now, because Varnish is not on port 80 yet.

Next is the most important thing - the URL: req.url ~ "/varnishtest.php". We can use any regular expression here, which is really helpful. This command will clear only one particular URL. You can use req.url ~ "^/article" to ban all URLs that start with /article. Sometimes a helpful expression is req.url ~ ".*", which will clear the cache for every page and file.
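Putting this together, a few ban variants might look like this (a sketch, assuming the example.com domain used throughout this article):

# Ban a single URL
sudo varnishadm ban 'req.http.host == "example.com:6081" && req.url ~ "/varnishtest.php"'
# Ban everything that starts with /article
sudo varnishadm ban 'req.http.host == "example.com:6081" && req.url ~ "^/article"'
# Ban the whole cache for the domain (use with care)
sudo varnishadm ban 'req.http.host == "example.com:6081" && req.url ~ ".*"'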

A short note about clearing the cache: if you have really high traffic on your website, you need to be very careful when clearing the Varnish cache. You don't want to kill your website with traffic that suddenly goes straight to Apache. So sometimes it's better to wait for the Varnish cache to expire by itself instead of clearing it manually. It all depends on your website and architecture.

Method 2 - PURGE request

OK, but how do we clear the cache from a PHP script, for instance? It's not safe to execute shell commands directly from a script. Therefore Varnish comes with another way of clearing the cache - the PURGE request.

You can send a PURGE request with cURL, for instance:

curl -X PURGE http://example.com:6081/varnishtest.php

ModSecurity fix

If you have ModSecurity enabled, you will get a 403 error. That's because the PURGE request method is not allowed. We need to edit the OWASP setup file located at /usr/local/apache2/conf/crs/modsecurity_crs_10_setup.conf and find the line with tx.allowed_methods. Then we add the PURGE method, so it looks like this:

setvar: 'tx.allowed_methods=GET HEAD POST OPTIONS PURGE', \

Restart Apache to load the new policy:

sudo service httpd restart

Then try to execute the curl command from above. You should get the contents of our test file instead of an error.

However, even though the request now succeeds, it won't clear the Varnish cache for this URL yet. We need to prepare Varnish to accept PURGE requests. Edit the VCL file located at /etc/varnish/default.vcl and add the following section:

acl purge {
        "localhost";
        "192.168.77.1"/24;
}

sub vcl_recv {
        #....

        if (req.method == "PURGE") {
                if (!client.ip ~ purge) {
                        return(synth(405,"Not allowed."));
                }
                return (purge);
        }
        #.....

}

In acl purge we define the addresses that are allowed to send PURGE requests. Localhost is obvious if we want to send requests from PHP scripts, but sometimes we need to clear the cache from an external IP address, so it's good to add it here. If you only need to clear the cache from a PHP script and never from outside, leave localhost only.

In the sub vcl_recv section we check whether the request method is PURGE and, if so, whether client.ip is within the list allowed to send PURGE requests.

Try to send the request again. It should clear the cache.
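A quick way to verify that the purge worked is to look at the Age header before and after (a sketch, assuming the varnishtest.php URL from earlier):

# Warm the cache and note the Age header
curl -sI http://example.com:6081/varnishtest.php | grep -i '^Age'
# Purge the URL
curl -X PURGE http://example.com:6081/varnishtest.php
# The next request is a miss, so Age should drop back to 0
curl -sI http://example.com:6081/varnishtest.php | grep -i '^Age'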

VCL

So what is this magic VCL file? In short, it's a file designed to define request handling and caching policies for Varnish. We can specify additional rules, like removing request cookies or disabling the cache if a user is logged in. The configuration strongly depends on your website or the CMS you are using. You can use ready-made VCL files like this one: mattiasgeniar/varnish-4.0-configuration-templates. It's a really cool and well commented example VCL file. You can find multiple tricks for Varnish there, and it's really worth checking out. It has configuration for WordPress and Drupal, but it should also work well with Symfony2 or Zend Framework. You can also find more details about Varnish and Symfony2 on their website.

It all really depends on your website, the CMS or framework you use, installed plugins etc., so it's really hard to write one VCL for everyone. You need to do it by trial and error or by googling. But a really good start is to check the VCL file mentioned above.

Varnish settings per domain/Virtual Host

What if we have multiple websites and we want a separate configuration per website? For instance, one website is based on Drupal and the second one is just pure PHP. A single global Varnish configuration is not possible due to conflicts between them. How do we deal with such an issue?

The VCL file and language are pretty powerful and we can resolve such conflicts. It's not that easy, but doable. Edit the default.vcl file in /etc/varnish and take a look at the example below:

vcl 4.0;

backend default {
    .host = "127.0.0.1";
    .port = "80";
}

# ...

sub vcl_recv {

    # ... global rules for all websites here

    # Return a 404 error for requests without a Host header. It's optional but helpful
    if(! req.http.Host){
        return (synth(404, "Host header missing"));
    }

    if(req.http.Host == "example.com"){
        include "/etc/varnish/example.com.vcl";
    }else if (req.http.Host == "secondexample.com"){
        # you can also place normal rules here, no need to include files
    }
}

Varnish has an include command. With it you can include pieces of code kept in different files, so you can have a kind of per-domain settings. It's not perfect, because you need to do it inside a sub directive - it's not possible to include whole sub vcl_recv sections etc.

This brings a second disadvantage: if you have larger pieces of per-domain logic and you also have to modify another sub, like vcl_backend_response, you need to make a similar if statement inside each section.

You can modify the if statement as you wish. Remember that you can also use ~ to match domains together with subdomains, www etc.

In this example there's no port. If you are testing Varnish on a different port, i.e. it's not running on port 80, you need to specify the port as well: if(req.http.Host == "example.com:6081")

This is not a perfect solution, but it works. In real life you probably won't have to create statements like that - most of the rules can be applied globally. But if you need to make an exception for a particular domain, now you know how to do it ;)

Varnish useful tools

Here are a couple of really useful tools that come with Varnish.

varnishstat

This command gives you access to Varnish statistics. You can see a lot of data there, but the really important counters are cache_hit and cache_miss. If you have a lot of cache_miss, it means that your content is not served from Varnish but from the backend, like Apache. It also means that something is not OK and you should really look through your configuration and caching.

If you have a great number of cache_hit and not that many cache_miss, it's perfectly alright. When the Varnish cache expires for a particular URL, the request goes directly to the backend and the cache_miss value is incremented.
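If you want to check just these two counters without the interactive view, you can use the one-shot mode (a small sketch; in Varnish 4 the counters are prefixed with MAIN.):

varnishstat -1 | grep -E 'cache_hit|cache_miss'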

varnishtop

varnishtop is yet another useful piece of software you can find in the Varnish toolbox. It shows the URLs with the most hits to your backend. It is also a great debugging tool. The cool thing is that you are able to filter the logs - there are so many filters that it's not possible to list all of them here, but the tool has great documentation.

Here are two examples that are really useful:

  • varnishtop -i ReqURL shows the URLs requested from Varnish most frequently
  • varnishtop -i BereqURL shows the URLs that go to your backend the most. This is a really great command, because you can easily catch the most frequent misses.

varnishlog

This is another great tool. It logs all requests in real time with tons of data. It's great for debugging why a particular URL doesn't work (i.e. it's not cached, but it should be). It has a very similar syntax to varnishtop, so if you master varnishtop, you can use varnishlog without any problems.
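For example, to watch only the traffic for our test script you can filter with a query (a sketch, using the varnishtest.php URL from earlier; see the varnishlog documentation for the full query syntax):

sudo varnishlog -q 'ReqURL ~ "/varnishtest.php"'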

By default, Varnish doesn't log to a file. This is really important. Why's that? Think about this situation: you have an API endpoint or one particular page in WordPress under heavy load, let's say 1000 requests per second. Apache can't handle it, so you install Varnish and suddenly everything is OK. But you don't know what's going on in the logs - in Apache's access.log you only have the few requests that were cache misses in Varnish, and Varnish doesn't log anything to a file. To be honest, that is correct behavior. First of all, it speeds up the entire service, because you don't waste time on hard-drive I/O operations for every request. Second of all, digging through a really big access.log is a pain in the a** and it's difficult to find anything there.

But if you need logging to a file, because, let's say, you don't have really high traffic, you can enable it by executing:

sudo service varnishlog start

and if you want to run it on system start:

sudo chkconfig varnishlog on

Varnish logs can be found by default in /var/log/varnish. But please be aware that this will generate a lot of writes to your hard drive and the log file will grow pretty fast on heavily loaded websites, so this solution is not recommended.

varnishncsa

This is another useful logging tool, with different output than varnishlog. varnishlog logs every detail about each request; if you need logs similar to, for instance, Apache logs, you should use the varnishncsa command. By default it's disabled, just like varnishlog. If you want to log to a file, start the varnishncsa service and add it to autostart with chkconfig:

sudo service varnishncsa start
sudo chkconfig varnishncsa on

Performance

OK, awesome, Varnish is super cool, my websites go blazing fast! But hey, is it possible to speed it up even more? Well, theoretically yes. There are a few tips that can increase your throughput. But you must remember one thing: the MOST important part is to have a high hit rate. If you have a low hit rate and a high miss rate, these tips won't help you that much. So remember, achieve a high hit rate first, then take care of Varnish tuning :)

A few tips before you start tuning Varnish:

  • There is no single copy-and-paste recipe that will improve results on every machine
  • Change one thing at a time and then WAIT to see whether it helps or not
  • Monitor statistics with the Varnish tools. They will tell you whether the change works or not
  • Remember that Varnish is blazing fast by default. If you don't have enormous traffic, you probably don't need these tips
  • Increasing particular values too much can make your server struggle. It can eat up all your CPU or RAM resources, so please be careful
  • Monitor, monitor and monitor your system :)

OK, let's start tuning. First, I suggest reading the Varnish book and its chapter about tuning. It will help you better understand what's going on.

Varnish has a configuration file that contains important options that will help you increase Varnish's power. Edit the file with sudo vi /etc/sysconfig/varnish

Varnish supports a couple of ways to configure the Varnish daemon. They are all present in this file, so figure out which one you are using. By default they are called Alternative 1, Alternative 2 and Alternative 3.

In my opinion Alternative 3 is the best, because you only need to change variable values instead of the whole DAEMON_OPTS string. In Varnish 4 it's enabled by default and the other alternatives should be commented out; if not, comment out Alternative 1 and Alternative 2 and uncomment Alternative 3. But you can stick to any alternative you wish - it's just a matter of how the configuration is passed to the Varnish daemon.

You can set the Varnish port there, the secret file etc. Among the settings you will also find the threads section and the storage type. Let's start with the storage type.

Files or RAM for cache? 

# # Cache file size: in bytes, optionally using k / M / G / T suffix,
# # or in percentage of available disk space using the % suffix.
VARNISH_STORAGE_SIZE=256M
#
# # Backend storage specification
VARNISH_STORAGE="malloc,${VARNISH_STORAGE_SIZE}"

Now it's up to you and your server resources how to set these values. By default Varnish will use your RAM as storage. It can also use the file method, but keeping things in RAM is a lot faster than reading them from the hard drive, especially if it's not an SSD. But if you have little RAM on your server and a lot of URLs to cache, use file instead of malloc.
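For reference, a file-backed setup in the same sysconfig file might look roughly like this (a sketch; the storage path and size are example values, adjust them to your disk layout):

# Keep the cache in a file on disk instead of RAM
VARNISH_STORAGE_SIZE=10G
VARNISH_STORAGE="file,/var/lib/varnish/varnish_storage.bin,${VARNISH_STORAGE_SIZE}"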

So this is one thing that will speed up your Varnish: using RAM. How much? A larger value is definitely better, but keep in mind that it's not a strict limit. Varnish adds overhead per object in the cache (around 1 kB), so if you have a large number of objects, the actual memory usage will exceed the storage size. You need to play around with this value a bit and check how much is enough. Keep in mind that you need to leave enough free RAM for other processes like httpd or php-fpm.

If you use RAM, check whether the storage size is big enough to handle all your requests. Use this command:

varnishstat -1 | grep n_lru_nuked

If the number of LRU nuked objects is greater than 0, it means that either your storage size is not big enough or something is wrong with your caching - for instance, you are trying to cache assets or images for a very long period of time.

Threads settings

The second thing that can increase Varnish performance is threads. In the same file you will find settings for them:

# # The minimum number of worker threads to start
VARNISH_MIN_THREADS=50
#
# # The Maximum number of worker threads to start
VARNISH_MAX_THREADS=1000

There are multiple threading options in Varnish, but you should mostly play around with these two values. You can increase both of them, but you should keep within the Varnish guidelines. Here is an explanation of how Varnish threads work, from the Varnish documentation:

When a connection is accepted, the connection is delegated to one of these thread pools. The thread pool will further delegate the connection to available thread if one is available, put the connection on a queue if there are no available threads or drop the connection if the queue is full.

The Varnish documentation also explains how to calculate how many threads you should run so that you don't kill your server.

Varnish and security

There are various methods to further secure your Varnish. We will not cover them here, but to make things clear: the most important thing is to secure your purging requests. Do not allow just anyone to purge your cache from outside. An attacker could purge the entire domain, and if you have huge traffic, that can kill your website when all the requests suddenly hit your backend server.

You can also install a firewall for Varnish - something really similar to ModSecurity for Apache. There are a couple of ready-made options out there; they are just additional rules in VCL that block unwanted requests.

Our opinion? They are OK, but we like to keep the Varnish VCL as simple as possible and not add extra overhead to request parsing. In addition, we have ModSecurity installed for Apache, and it does basically the same job. So is it worth having two security layers and adding exception rules to both of them? We will just stick to Apache's ModSecurity :)

Varnish on port 80

Now comes the most important part. So far we have been using Varnish on the "test" port 6081. Before you switch Varnish to port 80, make sure that IT WORKS - that requests are cached correctly etc. It's important to put a working Varnish as the entry point to your server, instead of just an additional layer that forwards all your requests to Apache.

First you need to pick a new port for Apache. Common practice is to use 8080, but this port is often occupied by other tools like Solr. We usually set port 81 for Apache. After you pick a port number, you need to edit a few files.

Note that if you are not following our tutorial from the start, your files might be located elsewhere.

httpd.conf

We need to change the main port for Apache in the httpd.conf file:

# File located in /usr/local/apache2/conf/httpd.conf
Listen 81

All VirtualHosts files

All virtual host files also contain the port. You need to update every single one of them with the new port:

#All files with .conf extension located in /usr/local/apache2/conf/vhosts
<VirtualHost *:81>
        ServerName example.com

Firewall rules for new Apache port (optional step)

This is an optional step. We have a working firewall on our server. If you need to access the website directly (not through the Varnish cache), you can open the new port. It's usually useful for debugging, e.g. to check whether an issue is caused by your app or by the Varnish cache. If you need to enable the new port, add the appropriate lines to your firewall rules:

# Firewall rules are in /etc/iptables.firewall.rules
-A INPUT -p tcp --dport 81 -m state --state NEW,ESTABLISHED -j ACCEPT
-A OUTPUT -p tcp --sport 81 -m state --state ESTABLISHED -j ACCEPT
-A INPUT -p tcp --sport 81 -m state --state ESTABLISHED -j ACCEPT
-A OUTPUT -p tcp --dport 81 -m state --state NEW,ESTABLISHED -j ACCEPT

Restart the firewall to load the new rules:

sudo service firewall restart

Varnish daemon settings

Next, we need to change the Varnish daemon settings so that it listens on port 80:

# File is located in /etc/sysconfig/varnish
VARNISH_LISTEN_PORT=80

Varnish VCL file

The last thing is to change the backend port in the VCL file and update all rules to remove the previous port 6081:

# File is located in /etc/varnish/default.vcl
backend default {
    .host = "127.0.0.1";
    .port = "81";
}

# Remove :6081 if present in host
if(req.http.Host == "example.com"){

Remove old Varnish port from firewall

Remember when you added port 6081 to the firewall in order to test things? You will no longer need that port to be open. You can remove those lines from your firewall rules and restart the firewall.

Restart httpd and Varnish

Once you have changed all the files, it's time to restart the daemons and check that everything works as expected.

sudo service httpd restart
sudo service varnish restart

Now check your website. It should be powered by Varnish now!
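A quick sanity check is to look at the response headers on port 80 - Varnish adds headers such as X-Varnish and Via (a sketch, assuming the example.com domain used throughout):

curl -I http://example.com/

If you see those headers and the Age header grows on repeated requests, Varnish is sitting in front of Apache as expected.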

What's next?

Lengthy tutorial, wasn't it? But now your websites should be blazing fast! Remember to check from time to time whether you have a good hit/miss ratio :)

As always you can do everything here with Ansible and our repo, which you can find on GitHub.

Now you should be at a point where you have a secure CentOS, a working LAMP stack and Varnish on top of everything. What's next? I believe we should add nginx to our setup to keep everything nice and slim, and after that start installing some really useful things like Redis and RabbitMQ, and add some good tools to PHP. See you next time!
