Monday, August 8, 2011

10 Minute Caching with Apache

The problem with caching is that it encompasses 3 layers of the stack, and requires a good solid read to understand what's really happening. You should enable browser caching on every site with more than 10 users, and proxy caching for resources pretty soon thereafter. This is my attempt to distill this down into a method that will get you "far enough".

Goal: cache everything, except that which is content in the browser and in a disk cache.
  1. Install and activate plone.app.caching (aka HTTP Caching). This is included by default after Plone 4.1. Use it. On the first tab: enable gzip and enable caching. Leave caching proxies and in memory cache at the default. On the third tab: Content files and images get moderate, content folder and content items get weak, and file and image and stable file and image get strong. You may want unstable file and image to be moderate but... I doubt it. When in doubt, use less caching. Last but not least, in the last tab, weak caching is configured to use last modified (that's it), moderate caching is default with just max age set, strong caching is same as moderate but up the expiration time to 864,000, and also check store in RAM cache. This number means those resources will be 10 days to be in the proxy/browser cache, and you can probably up that significantly as your caching chops get sharper. You may be tempted to put everything in ram cache at this point in time, but don't. That's for people who like to debug caches.
  2. Install Apache 2 and Configure Your Site. You only need to do this if this is a new site and/or it's not already configured. Many linux installs actually have this by default.  Some good ubuntu instructions are here. You probably want to enable rewrite and proxy as well, in which case you will need to a2enmod rewrite, proxy, proxy_http as well if you want to do rewriting, but this post isn't about that :)
  3. Configure Apache as a disk cache. Basic instructions are here. You will need to add one key line: "CacheIgnoreNoLastMod On". This is because the js and css resources don't have a modification time by default and we still want them cached on disk. In fact, these are the resources that we care the MOST about being cached. Make sure to set up the cache cleaning process. Check out the example site config to see how simple it can be. Note that this is just a balls to the wall quick write up. All apache is doing here is looking at your caching headers from p.a.caching and doing the same thing that the browser would. When configured properly, after the first fetch, Plone will never render any css, js, or image files again.
  4. Validate. Use LiveHeaders in Firefox or look at the network diagrams in Chrome/Safari. JS, CSS, and [non-content] images should never 304. They should download on the first pass and then always pull from cache. Pages, folders, or any other content should always 200. If you see any 404s in the meanwhile, stop immediately and fix them. Clear the cache. Validate that all new page requests come from the cache.
FAQ
  1. How do I know its hitting the cache? I'm so glad I pretended like you asked. There is no built in way to do this but with some clever jiggering it's easy. From the command line, tail the Z2.log(s) for each running zope instance. (Pst: you can tail multiple logs at once by just listing them with a space in between). Then, open up the browser. Load a page. Notice that everything is pulled the first time. Clear the cache. Load the page again. Note that all the resources have 200 OK status (meaning they were actually retrieved from the servers) but there is no entry in your Z2.log. It's a festivus miracle! This means that they were served from apache.
  2. What about HAProxy? That's nice, and I totally recommend it. But not required. 
  3. Should I use an apache in buildout? No. Unless you want to hate yourself or you are an expert sysadmin [that is trying to get fired]. Tutorials don't care about buildout and you want to have access to tutorials.
  4. Shouldn't I be using varnish too? Perhaps. I personally have never liked it, and there is no one config out of the box that makes me happy. All of my sites are closed, pretty dynamic and include 3rd party integration so caching individual pages never suits me. Remember, this is a 10 minute way to get basic results. Varnish is really overkill for a lot of sites and will just give you more headache than its worth. 
  5. Purging? That's for another tutorial. 
  6. What exactly is cached here and where? This does NOT cache and views or pages. That's the point. It's simple enough for debugging and small setups while still redirecting "crap traffic" to another server. It caches just enough and no more. I'm pretty sure there is a your mom joke in here somewhere...
  7. Isn't apache for old people? Beat it punk.
  8. Did you test these instructions? No, they were written in anger and rush. Comment if I messed up and I'll help you then update this. If there is significant interest I can do a more detailed post. This assumes a minimum sysadmin knowledge.
  9. What if I want to force fresh css/js? Easy. Just reinstall your product OR turn development mode on then off for js and/or css. This causes a new file name to be generated and the caching cycle to start over again. For static images, you may be in a harder situation. I have just relegated to create a new name for the images each time. I find that these rarely change since these images are almost always going to be from design. If you can wait the 10 days (or whatever time you set for strong caching) you don't need to do anything. \o/

4 comments:

  1. Nice! Folks need to remember that for most cases, a little caching is cheap and goes a long way.

    BTW, Nginx configuration would be only trivially different.

    ReplyDelete
  2. Thanks for the writeup! Regarding FAQ #9, just clicking the SAVE button at the end of the portal_css or portal_javascripts ZMI pages is enough to generate the new filename. No need to turn development mode on and then off, although I used to do that too :)

    BTW I'd love to see a redundancy of that SAVE button at the top of the page, not sure if it'd be possible. We'd surely need a warning so people don't save too soon, before the page has fully loaded, otherwise some entries could be lost.

    ReplyDelete
  3. I loved this post for so many reasons:

    * I love PloneChix! (who doesn't?)

    * I too use Apache and it's built-in caching/proxying (and load-balancing) capabilities until a project proves it needs more, to keep things simple

    * I haven't expirmented with p.a.caching yet on Plone 4.1, and this comes at a perfect time, as I start rolling out 4.1 sites.

    Thanks for the info and see you at the Plone Conference in San Francisco for more brilliant sysadmin insights! (Like the plug?)

    ReplyDelete
  4. @Davilima oh man good to know! that will save me at least a minute a month (every second counts)

    @ctxlken go go go! I love the plug :)

    ReplyDelete