Squid doesn’t cache? Check your headers
I am a sysadmin at Ivinco and want to share my recent experience configuring Squid. We faced a problem with Squid caching which took me quite a while to fix. This may be an interesting read for anyone who wants to understand what problems to expect when configuring web page caching with Squid.
Our configuration on this project is simple and quite standard: we use Squid in front of the Apache servers. There are two Squid servers, load-balanced on the firewall, and they share their caches with each other (they are configured as siblings). We force caching in Squid by sending HTTP headers from the web app.
Recently we noticed that our Squid was not caching all dynamic pages. After initial research I found a couple of weird things:
– Squid was not caching any files larger than ~50K
– Sometimes a file larger than 50K would get cached, but even when it happened, the sibling Squid server would still not use this cached object.
Deeper investigation showed that static content larger than 50K was not being cached either. The following options have something to do with that; setting them made at least caching of big static files work:
range_offset_limit 16 MB
I also found that there had been some recent bugs in Squid related to the maximum_object_size option: if it was defined after cache_dir, it was silently ignored. I moved it before cache_dir just in case. Additionally, I added the max-size=16777216 option (16 MB expressed in bytes) to the cache_dir definition, i.e.:
cache_dir aufs /mnt/data/squid/cache 256000 256 512 max-size=16777216
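The ordering rule above is easy to break again during later edits, so it can help to check it mechanically. Below is a minimal Python sketch of such a check. The function name check_object_size_order is hypothetical, and the sample config assumes a maximum_object_size value of 16 MB, which is my guess based on the max-size value and not something taken from our real squid.conf.

```python
def check_object_size_order(conf_text):
    """Return warnings about maximum_object_size / cache_dir ordering.

    Illustrative helper only, not part of Squid. It flags the two things
    described above: maximum_object_size placed after a cache_dir line,
    and a cache_dir line without an explicit max-size= option.
    """
    warnings = []
    first_cache_dir = None
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        tokens = line.split()
        if tokens[0] == "cache_dir":
            if first_cache_dir is None:
                first_cache_dir = lineno
            if not any(t.startswith("max-size=") for t in tokens[1:]):
                warnings.append(
                    "line %d: cache_dir has no explicit max-size=" % lineno)
        elif tokens[0] == "maximum_object_size" and first_cache_dir is not None:
            warnings.append(
                "line %d: maximum_object_size comes after cache_dir (line %d)"
                % (lineno, first_cache_dir))
    return warnings


# Sample fragment. The maximum_object_size value is an assumption.
SAMPLE_CONF = """\
maximum_object_size 16 MB
range_offset_limit 16 MB
cache_dir aufs /mnt/data/squid/cache 256000 256 512 max-size=16777216
"""

# 16 MB expressed in bytes matches the max-size value used above.
SIXTEEN_MB = 16 * 1024 * 1024

print(check_object_size_order(SAMPLE_CONF))
print(SIXTEEN_MB)
```

An empty list means neither problem was found. This only looks at directive order in a single file and does not follow include directives.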
But all this helped only with static files. There was still a problem with our PHP-generated content: it was sometimes getting cached and sometimes not, without any clear pattern.
Researching further, I found a cool service: http://redbot.org. It can analyze any URL for its “cacheability” and give some recommendations. Using this service, I found the key problem. Our PHP scripts were returning the “Expires” header using PHP’s date('r') function. This format does not match what Squid expects to see in the header: date('r') produces an RFC 2822 date with a numeric timezone offset, while Squid wants the HTTP date format (the RFC 822 style as updated by RFC 1123), in which the timezone is always written as GMT. The difference is in how the timezone is specified:
date('r'): "Expires: Tue, 10 Feb 2015 14:40:43 +0000"
what Squid respects: "Expires: Tue, 10 Feb 2015 14:40:43 GMT"
This minor difference seriously lowered the chance of object caching. After fixing this our dynamic page caching started working properly.
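To make the two formats concrete, here is a small Python sketch that produces both spellings of the same moment in time. It illustrates the format difference and is not the code from our PHP app. The helper names http_expires and uses_gmt_zone are made up for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def http_expires(dt):
    """Format a timezone-aware datetime as an HTTP date ending in 'GMT'."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def uses_gmt_zone(header_value):
    """Rough check that a date header value spells its timezone as GMT."""
    return header_value.strip().endswith(" GMT")


# The moment used in the example headers above.
when = datetime(2015, 2, 10, 14, 40, 43, tzinfo=timezone.utc)

numeric_form = format_datetime(when)  # numeric offset, like PHP's date('r')
gmt_form = http_expires(when)         # the form Squid respected for us

print("Expires: " + numeric_form)
print("Expires: " + gmt_form)
```

In PHP the same result is commonly obtained with gmdate('D, d M Y H:i:s') . ' GMT' in place of date('r').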
There was one more issue, this time caused by incorrect testing. While testing, I was running something like this:
"for i in `seq -w 1 1000`; do curl -is http://domain.com/test.html | head -20 | grep Age; done;"
This prints the ‘Age: NNN’ header line, whose presence means that the object was served from cache. But what I saw was that when the request went to the neighbor Squid server, my testing script returned no such header, which suggested the object was still not properly cached. After digging deeper I found that the problem was just in my method of testing. Using “| head -20” or any other way of cutting off part of the response terminates the request early, and Squid then skips caching because the client disconnected before receiving the full response. After changing the test method to this:
"for i in `seq -w 1 1000`; do wget -S http://domain.com/test.html 2>&1 | grep Age; done;"
the problem was gone.
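The same idea, always read the whole response before looking at the headers, can be sketched in Python. This is only an illustration of the testing approach, not the script I actually used. The names parse_age, fetch_age and report are hypothetical, and the URL in the usage note is the same placeholder as in the commands above.

```python
import urllib.request


def parse_age(headers):
    """Return the Age header as an int, or None if absent or malformed."""
    value = headers.get("Age")
    if value is None:
        return None
    try:
        return int(value.strip())
    except ValueError:
        return None


def fetch_age(url, timeout=10):
    """Fetch url, read the full body, then return the Age value (or None)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Consume the entire body so the client never disconnects early.
        resp.read()
        return parse_age(resp.headers)


def report(url, count):
    """Request url `count` times and print the Age seen on each request."""
    for i in range(1, count + 1):
        print(i, fetch_age(url))
```

Calling report("http://domain.com/test.html", 1000) mirrors the loops above. A printed None means the response carried no Age header, so it most likely did not come from cache.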
Basically that’s it. To sum up: at first we thought that Squid was glitching, since its caching behavior was quite unpredictable, but after deeper research we found a number of issues on our own side, and once they were fixed, Squid started working properly.
Just be careful with the HTTP headers you return to control Squid’s behavior, because it won’t give you any feedback or clue when you do this wrong. Otherwise Squid proved to be a good web caching solution.