Nginx as a front-end proxy cache for WordPress

The short version:

We put an nginx caching proxy in front of our WordPress MU install and sped it up dramatically – in some cases a thousandfold. I’ve packaged this up as a plugin, along with installation instructions, here: WordPress Nginx proxy cache integrator.

The long version:

Here at blogs.law.harvard.edu, our WordPress MU install was struggling. We get a fair amount of traffic (650k+ visits/month) – combine that with ‘bots (good and bad) and we had serious problems. RSS feeds (we serve many, from some pretty prominent blogs) are expensive to create, files are gatewayed through PHP on wpmu, and letting PHP dynamically create each page meant we were VERY close to maxing out our capacity – which we frequently did, bringing our blogs to a crawl.

WordPress – as lovely as it is – needs some kind of caching system in place once you start to see even moderate levels of traffic. There are many, many high quality and well-maintained options for caching – however, none of them really made me happy, or fit my definition of the “holy grail” of how a web app cache should work.

In my mind, caching should:

  • be high performance (Digg- and Slashdot-proof),
  • be lightweight,
  • avoid invoking the heavy application frameworks it sits in front of – if you hit your app server (in this case, wordpress), you’ve failed,
  • be as unobtrusive as possible: caching should be a completely separate layer that lives above your web apps,
  • have centralized and easily tweaked rules, and
  • be flexible enough to work for any type (or amount) of traffic.

So I decided to put a proxy in front of wordpress to statically cache as much as possible. ALL non-authenticated traffic is served directly from the nginx file cache, taking some requests (such as RSS feed generation) from 6 pages/second to 7,000+ pages/second. Oof. Nginx also handles logging and gzipping, leaving the heavier backend apaches to do what they do best: serve dynamic wordpress pages only when needed.
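
The moving parts are just a few directives from nginx’s stock proxy module. A minimal sketch (the cache path, zone name, TTL, and backend address here are illustrative, not our production values – the plugin ships the full, tested config):

# In the http{} context: where cached responses live on disk.
proxy_cache_path /var/nginx/cache levels=1:2 keys_zone=staticfilecache:180m max_size=500m;

upstream wordpressapache {
    server 127.0.0.1:8080;    # the heavier apache backend
}

server {
    listen 80;
    location / {
        proxy_pass http://wordpressapache;   # seed the cache from apache...
        proxy_cache staticfilecache;         # ...then serve repeats from the file cache
        proxy_cache_valid 200 120m;          # only cache successful responses
        proxy_set_header Host $host;         # wordpress needs the real hostname
    }
}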

A frontend proxy also handles “lingering closes” – clients that fail to close a connection, or that take a long time to do so (say, for instance, because they’re on a slow connection). Taken to an extreme, lingering closes amount to a Slowloris attack, and without a frontend proxy your heavy apaches are left tied up. With a lightweight frontend proxy, you can handle more connections with less memory. Throw a cache in the mix and you can bypass the backend entirely, giving you absolutely SILLY scalability.

On nginx – it’s so efficient it’s scary. I’ve never seen it use more than 10 to 15 meg of RAM and a blip of CPU, even under our heaviest load. Our ganglia graphs don’t lie: we halved our memory requirements, doubled our outgoing network throughput and completely leveled out our load. We have had basically no problems since we set this up.

To make a long story short (too late), I packaged this up as a plugin along with detailed installation and configuration info. Check it out! Feedback appreciated: WordPress Nginx proxy cache integrator.

86 thoughts on “Nginx as a front-end proxy cache for WordPress”

  1. Pingback: How-to: Nginx as a front-end proxy cache for WordPress « WordPress Publisher Blog

  2. Matt! Great question. This is an issue, but not a huge one for us.

    For now, invalidation for anonymous users is fairly coarse – based simply on the cache timing out. However, if you submit a comment or log in, html is served uncached. For most blogs and for most users, this makes the cache fairly transparent.

    Ideally, I’d like to make nginx de-cache when wordpress sees posts/pages/comments change. I’d also like to give admins more knobs to tweak – X-Accel-Expires values emitted by wordpress could manage cache expiration times entirely.

    The caching rules I’ve articulated in the nginx conf probably need tweaking and may hold URLs too long for some sites. I have – so far – had only one “hey, this page isn’t updating!” complaint – a blog author made a change and logged out immediately afterwards.

    The freshness / performance tradeoff is one we’ve been happy to make so far.
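
    To sketch how that would work: nginx’s proxy cache already honors an X-Accel-Expires response header, so the wordpress side is the only missing piece. TTL values here are illustrative:

    location / {
        proxy_pass http://wordpressapache;
        proxy_cache staticfilecache;
        # Fallback TTL when the backend says nothing about caching:
        proxy_cache_valid 200 120m;
        # An "X-Accel-Expires: <seconds>" header from wordpress overrides
        # proxy_cache_valid, and "X-Accel-Expires: 0" disables caching
        # for that response entirely.
    }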

  3. Dan,

    Thank you for a great plugin. I’ve looked at and tested the awesome caching plugins that already exist for WP, but I ended up running into issues and had to undo changes.

    I will try it on a couple of my large sites and see how it does. If it does great here, I don’t see why it wouldn’t perform well elsewhere.

    Thanks again,

    Aaron

  4. Hi Dan,

    is there any way to make the plugin compatible with WP Ban? None of the caching plugins support it, and that virtually defeats their purpose. I believe caching plugins are primarily intended for heavy-traffic websites, and if you own one of those, you also have to put up with a lot of abuse.

    Now, by themselves, many caching plugins work great, and by itself, WP Ban works great as well – but if you use both together, a page intended for a banned user gets cached and served to legitimate visitors. This limits each plugin to low-traffic websites only, because low-traffic sites either don’t need a caching plugin or don’t need a banning plugin.

    Any chance you would consider looking into it and force Nginx to never cache pages served by WP Ban?

    Mark

  5. @Mark: if wp-ban uses HTTP status codes properly – say, 403 Forbidden – then it will work fine with this plugin. The example nginx config in this plugin specifies that only HTTP 200 response codes are cached.

    If wp-ban doesn’t use HTTP status codes properly and returns the ban page with a 200 status code – well, then it needs to be fixed. This is what HTTP status codes are for: you should return a 403 when a client’s access to content is forbidden.

  6. Hey Dan,

    actually, WP Ban lets you display a custom HTML page, so it’s not a 403 Forbidden error. It simply redirects all requests from banned clients to your custom page, on which you can display a custom message or make it look like something else. Can that be worked with?

    Mark

  7. @mark – It looks like the wp-ban plugin is short-circuiting page rendering and issuing a 200 status code along with the banned-page content. This is incorrect. HTTP status codes exist for a reason: they let servers tell clients about the request and what the server is configured to give as a response.

    wp-ban really should be fixed to use status codes properly.

    I think if you added:
    header("HTTP/1.1 403 Forbidden");
    above line 82 in wp-ban.php, it’d work fine with my nginx setup.

  8. Why not use nginx to serve the static files directly and use FastCGI to handle the PHP requests?
    It saves your server much more load than sending PHP requests straight to Apache.

  9. @xitong: That’s a possibility. It’d be kind of a pain to have nginx serve files directly in front of WordPress Mu, as it virtualizes the wp-content directory, so your frontend would have to be made aware of the backend location of wp-content for files and theme resources (I’m sure someone has already posted the nginx rewrite rules somewhere).

    It’s also fairly nice to be able to manipulate URL patterns and use all the awesomeness that is Apache – for instance, we use mod_auth_ldap for SSO to restrict access to some wpmu blogs that’re treated as intra-department intranet resources. For us, we can’t leave apache until we can re-implement our moderately complex mod_auth_ldap rules.

    Generally, though: for anonymous users, they are going to see things right out of the cache. I can’t imagine the overhead apache imposes versus fastcgi will be an issue when apache is just seeding the cache for most pageviews.

    Also, I’m not entirely convinced that FastCGI is all that much quicker, from direct experience and research. Here’s a relevant benchmark:
    http://www.brandonturner.net/blog/2009/07/fastcgi_php_opcode_cache_benchmarks/

    I also like that apache gives you more knobs to tweak in terms of child processes – last time I used FastCGI, it was fairly coarse.

  10. @Klark We do run the admin backend behind SSL. Currently the HTTPS traffic is handled directly by apache, bypassing nginx entirely (shh! don’t tell anyone!). This hasn’t been a problem – only a small minority of our traffic comes through SSL.

    If you did want to proxy, though, I think you’d just configure your keys and add a new listen parameter to the main “server{}” clause. I’ll try it out and update the plugin. I’m sure this is drop-dead easy.

  11. @djcp: Check out the X-Accel-Redirect support I contributed to WPMU (which Donncha committed a fair while back)… Look for WPMU_ACCEL_REDIRECT in wp-content/blogs.php. It’s perfect for nginx. :-)

  12. Sweet! That’ll be another nice addition to this plugin – though I’d imagine that once a file is seeded in the cache you’re not going to get much more efficient than that. Still, unobtrusively avoiding the apache backend in more situations is a worthy feature.

  13. Pingback: Nginx as a front-end proxy cache for WordPress | Inside CoPress

  14. Pingback: Barely about the Apple iPad | Marketing Over Coffee Marketing Podcast

  15. Thanks. Just so I understand, your public server is http://blogs.law.harvard.edu/ and that is running Nginx.

    The backend server is something like backend.law.harvard.edu. That’s where people edit?

    What does WordPress MU think its name is? The public one, or the backend one?

    Put another way, when you set it up, did you do it through Nginx?

  16. Nope, people edit through “blogs.law.harvard.edu” as well. I have the wordpress/apache backend listening on a private IP for port 80 and a public IP for port 443 (though that’s not really important). It could listen on an alternate port as well, which is how the (tested, and live on another machine) example configs work.

    The apache/wordpress server thinks its name is “blogs.law.harvard.edu” because nginx sets the Host: header via:

    proxy_set_header Host $host;

    and passes that value through to the backend.

    It’s fairly magic. The apache backend acts just like it would if it were publicly available.

    Say you have multiple standalone wordpress blogs on their own domain names – the nginx config works without modification. And in the apache backend, you’d just set up a normal virtualhost for each.

    As to installation – you can always layer nginx in later by modifying the IP/port your apache listens to, along with the virtualhost containers. If you did get the configuration right, though, the install process would “just work.” Apache doesn’t know any different.
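
    To sketch the shape of it (addresses illustrative – the plugin’s example configs are the authoritative version):

    upstream wordpressapache {
        server 127.0.0.1:8200;             # apache on a private IP / alternate port
    }

    server {
        listen 80;                         # one frontend server{} for every blog domain
        location / {
            proxy_cache staticfilecache;
            proxy_set_header Host $host;           # the original hostname rides through...
            proxy_pass http://wordpressapache;     # ...and apache picks the matching virtualhost
        }
    }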

  17. Thanks. This all sounds so simple, and cool. I am going to give it a shot. Thanks again for sharing.

  18. Sorry for another question: doesn’t WordPress MU complain about running on an alternate port?

  19. @klark No. An alternate port works fine. In my experience, if you’re proxying wordpress you have to ensure that the proxy sets the hostname – that’s the “proxy_set_header” thing above.

    If you were using apache as a frontend proxy, you’d use (depending on how you deployed. . .) “ProxyPreserveHost on”. Another data point.

  20. @ajay I link to the nginx installation page in the plugin docs. Otherwise, google is your friend.

  21. I *might* be able to figure that out.

    Just thinking out loud here:
    The big issue is that I have to test whether a user is logged in, or whether they’ve taken an action that should cause them to see a customized page – e.g., posting a comment.

    If they should get a customized page, the response bypasses the cache entirely, because wordpress emits an X-Accel-Expires header of 0. It knows to do this by running essentially the same cookie-based test that the nginx frontend does.

    If I could figure out a way to represent the three states:
    1) Not logged in (cache and get content from frontend proxy),
    2) Not logged in but has done something to cause a customized page (don’t cache or get content from the frontend proxy), and
    3) Logged in, along with some kind of unique identifier (don’t cache or get content from the frontend proxy).

    by inspecting cookies alone, then I can probably make this work without the wordpress side. I’ll take another stab at it. I still think it’d be nice to have some kind of wordpress-level cache-time control panel.
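
    For reference, the nginx side of that cookie test can be sketched roughly like this (the cookie-name prefixes are the standard wordpress ones; proxy_cache_bypass / proxy_no_cache assume a reasonably recent nginx):

    # Skip the static cache when wordpress auth or comment cookies are present.
    if ($http_cookie ~* "wordpress_logged_in_|comment_author_") {
        set $do_not_cache 1;
    }

    location / {
        proxy_cache staticfilecache;
        proxy_cache_bypass $do_not_cache;   # don't answer these users from the cache
        proxy_no_cache $do_not_cache;       # and don't store their responses either
        proxy_pass http://wordpressapache;
    }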

  22. This could be just what we’ve been looking to develop ourselves for a couple of months :) – something like this is vital for scaling WordPress once you get to the levels of which you speak.

    Thanks for sharing this – I will certainly spread the word once we have this configured and working properly ourselves

  23. The configuration you suggest, combined with your plugin, made my load average go up a lot.

    I’ve had zero problems with Nginx as the frontend, Apache as the backend, and WP Super Cache creating static html and gz files for Nginx to serve. I am curious to know if you tried this combination.

    If your configuration were able to cache the html and gz files created by WP Super Cache, and serve those files from the Nginx cache, there might be some benefit without the increased load average. It seems the missing link is the lack of html and gz files being served directly from the cache. Contact me if you need the WP Super Cache rewrite rules for the nginx.conf.

  24. @NG – Yeah, do whatever you need to negotiate statically rendered content.

    I’ve seen the wp-super-cache / nginx frontend deployment option before: I figured I’d bypass wp-super-cache entirely and handle everything with nginx. It’s been working great here, and I really can’t see why it’d be less performant than your combo.

    Do you get a significant amount of authenticated traffic? If so, you should still use an object (and opcode) cache.

    I’m surprised to hear about your load average problems. Are you certain that content was getting served from the static cache properly? I’ve confirmed this by putting a snippet in my theme footer that echoes when the page was generated, then looking to see whether that changes on subsequent requests.

  25. @NG – wait – re-reading your post: if you installed my plugin and set up the nginx config properly, you would be seeing content pulled from the nginx cache. That’s the whole point of this setup. If you didn’t see nginx static caching for you, then you had it configured incorrectly.

  26. Thanks for a great guide to setting up the proxy cache, I’ve used it very successfully.

    I wonder though, if anyone has rewrite rules for nginx to access timthumb cache files directly if a cache file exists already?

    My home page (http://xpdnews.com) uses timthumb a fair bit and obviously each image fires up apache to serve even the timthumb-cached files, which isn’t so crash hot.

  27. @Nick McIntosh:

    The rule:

    location ~* wp\-.*\.php|wp\-admin {
        # Don't static file cache admin-looking things.
        proxy_pass http://wordpressapache;
    }

    is probably what’s passing all timthumb requests to the backend.

    If you put a rule something like this above it:

    location ~* timthumb\.php {
        proxy_cache_valid 200 120m;
        expires 864000;
        proxy_pass http://wordpressapache;
        proxy_cache staticfilecache;
    }

    then you’ll catch all URLs with “timthumb.php” in them and static-cache on the front-end for 120 minutes. Given that this solution caches based on the full URL (including query parameters) this should work just fine for anonymous access.

  28. Pingback: thanks « Fried Chicken and Okra

  29. Any thoughts on how to use nginx as the backend instead of apache? What would the nginx config for the backend look like?

    Thanks for your work on this!

    Cheers,
    Bill

  30. The backend doesn’t matter. It just sits there, serving pages. The frontend is almost entirely transparent to the backend wordpress apache|nginx|lighttpd|whatever. I can’t give you configs for an nginx backend because I don’t have them. I’m sure they’re around on the internets.

    Use whatever you want for the backend. We use apache and will probably never switch from it: we rely heavily on mod_auth_ldap to provide restricted access to several intranet blogs and the nginx ldap options do not look as robust or trustworthy.

  31. Pingback: Nginx front-end proxy cache for WordPress « hakre on wordpress

  32. Hey, I want to boost a friend’s wordpress installation (I’m not using it myself).
    I just used nginx as a frontend cache for TYPO3 (together with the nginx boost plugin, which puts the dynamically generated pages in a memcached server for a configurable number of minutes – the compiled page, without the binary assets, e.g. pictures).

    nginx can then serve the page directly out of RAM (while apache delivers the jpegs etc.) – PHP doesn’t even start for a cached page.

    Is it possible to tune your setup to do that?
    I have no idea about wordpress, but I think it should be possible, and it would speed up wordpress sites a lot.

    greets

  33. With enough RAM and a moderately sized corpus of cached resources, much of your nginx file cache will probably live in RAM buffers, giving you much of the benefit of memcached. Memcached makes a huge amount of sense when you have multiple frontend machines sharing a single cache, though.

  34. Hi Dan,

    Thanks for this great plugin and for this blog – it’s not easy to find info on this topic!

    On Feb 3rd, 2010 you wrote that your given nginx configuration would work for multiple standalone WordPress blogs on separate domains. I guess it would work as well for membership sites running on PHP, like Butterfly Software, or am I wrong?

    In the backend on Apache, you stated that it’s necessary to create a virtualhost for each standalone install. Referring to your install file, you create a virtualhost like this:

    ServerName blog.example.org
    RewriteEngine on
    DocumentRoot /home/wp/wordpress/
    ServerAdmin you@example.com

    # Block POSTs without referrers.
    RewriteCond %{REQUEST_METHOD} POST
    RewriteCond %{REQUEST_URI} wp-comments-post\.php
    RewriteCond %{HTTP_REFERER} !.*blog.example.org.* [OR]
    RewriteCond %{HTTP_USER_AGENT} ^$
    RewriteRule (.*) - [R=403,L]

    If each virtualhost uses the same IP and port (127.0.0.1:8200), wouldn’t there be any problems?

    My last question concerns the opcode cache, I’m looking forward to using either xcache or eAccelerator, any recommendation?

  35. I set up a clean ubuntu install with nginx 0.7.66

    When I paste the example config, I get a lot of errors:

    Starting nginx: [warn]: duplicate MIME type “text/html” in /etc/nginx/nginx.conf:22
    [emerg]: unknown directive “proxy_cache_path” in /etc/nginx/nginx.conf:34

    I have been searching a bit but I can’t find a good answer that fixes the problem.

    Any ideas?

  36. This is a great thread and awesome input. I have had trouble with increasing traffic and haven’t found a decent web application cache – at least not one I could readily wrap my head around (not a true techie). Thanks again for the information.

    Dean

  37. @Patrick – it sounds like you haven’t merged the config file changes correctly: they don’t all go in /etc/nginx/nginx.conf.

    For instance, in the example configs where I say:

    # And then in the file /etc/nginx/sites-enabled/default

    That section goes in that file – essentially an nginx <virtualhost> container. Not all directives are valid in all contexts; you need to put the right things in the right places.
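
    Roughly, the split looks like this (filenames per a Debian/Ubuntu-style layout, which is what the example configs assume):

    # /etc/nginx/nginx.conf -- http-level directives only:
    http {
        proxy_cache_path /var/nginx/cache levels=1:2 keys_zone=staticfilecache:180m;
        gzip on;
        include /etc/nginx/sites-enabled/*;
    }

    # /etc/nginx/sites-enabled/default -- the server{} ("virtualhost") container:
    server {
        listen 80;
        location / {
            proxy_cache staticfilecache;
            proxy_pass http://wordpressapache;
        }
    }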

  38. Dan, thanks for the plugin. I’ve been using it for several months now.

    Some time ago I came up with this idea of using AJAX for cache invalidation. Just yesterday I finally managed to create an actual plugin and so far it seems to be working perfectly.

    I would be really interested in hearing your feedback on the idea: how well do you think it could scale, and what might the drawbacks of such an implementation be?

  39. @Kaspars – 1 additional file stat per page shouldn’t be an undue burden, especially if it allows you to have longer cache times because you know that your pages are getting invalidated only when they need to.

    File stat()s CAN become a factor when you’re rendering pages if:
    1) You’re using a network filesystem... though some newer filesystems (NFSv4?) might make this more efficient.
    2) You’re stat’ing all of your framework – if you’re using an opcode cache (APC, etc.) you can factor this out.

    As always, I would benchmark. It sounds fairly solid; I would think through what happens when a user doesn’t accept cookies (maybe you already have).

  40. @Kaspars – I tried out that plugin of yours. Such a simple idea! It works great on a couple of my medium-sized sites. Not sure about scaling to really large installations, though.

  41. I’m getting ready to set up a new server starting on Tuesday. I’m confused only by two things in the nginx config file you offer:

    upstream wordpressapache {

    and

    proxy_pass http://wordpressapache;

    Am I supposed to change these? I’m sorry, but the answer isn’t obvious.

  42. @RT Cunningham – you can, but it doesn’t matter. “wordpressapache” is just used internally by nginx to refer to the upstream server. It’s just a name.
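
    In other words, the name is arbitrary – it just has to match between the two directives:

    upstream mybackend {                  # any label you like
        server 127.0.0.1:8080;            # where your apache actually listens
    }

    server {
        location / {
            proxy_pass http://mybackend;  # must match the upstream label above
        }
    }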

Comments are closed.