About our wordpress deployment
We are frequently asked about our wordpress deployment by universities, NGOs and other institutions that’re interested in setting up their own multiuser blogging platform. We’ve been answering those questions on an ad-hoc basis – this page will serve to collect the most common questions and hopefully be something we can refer to interested parties. First – see our Project Info page if you’re interested in the early history of blogging at Berkman.
Notes: Answers are current as of 1/27/2011. This document does not represent the official position of the Berkman Center, Harvard Law School or any other entity.
What OS do you use?
Ubuntu LTS (multiple flavors). Any *nix would be great.
What is your current hardware for the blogs web server, and database server (we assume MySQL)? Does any of this run in virtual machines?
WordPress requires MySQL. There was a PostgreSQL fork a while back that died off pretty quickly. We have a well appointed database server (you’ll get a cert warning from this link as it’s using a self-signed SSL cert: the SHA1 fingerprint is CF:DD:34:9D:B8:CD:E0:B9:EE:E8:1D:0F:FE:A9:1F:33:36:58:0D:7C) that shares duties with many other sites and applications – the database server is not a virtual machine and has directly attached storage to maximize IO (all the normal stuff you’d do to create a high performance database server).
Our wordpress application server (again, you’ll get a cert warning because of a self-signed SSL cert, same SHA1 as above) is a xen VM with 3 gig of ram and 4 cores. We run nginx as a caching front-end proxy to our apache backend. I packaged up this nginx config as a plugin, along with sample configs, info here. My talk about high-performance wordpress (along with an overview of our nginx deployment) at Wordcamp Boston 2010 is here.
We’ve read about some of the improvements you have made via your news page : which improvement has been most important?
Hands down – the nginx caching proxy. Some requests are VERY expensive – RSS feeds, for instance. A caching proxy (or perhaps WP Super Cache) is a necessity. A default, uncached wordpress deploy IS NOT going to get you far.
You definitely want a physical machine to maximize MySQL IO. You should tune it properly for the large amount of RAM you’ve surely installed in it.
Your wordpress app server needs multiple cores to maximize concurrency.
Be sure to use a PHP opcode cache – APC has been nothing but unicorns and rainbows for us.
We could probably handle double the traffic with our current hardware, and nginx can load balance for us if/when we need to use multiple wordpress application servers. Our performance problems have not been related to our MySQL server so far.
Do you let users install custom themes or plugins?
No. We will install custom themes or plugins occasionally for special projects, but only after a thorough audit and after all development has taken place on a completely separate system.
Do you allow SSH access to blog owners / theme developers?
Any core hacks?
A couple, but we’re factoring those out and have even contributed one to the wordpress core. We expect to be on a completely clean wordpress core by Summer 2011.
Do you integrate with LDAP or another directory service?
Yes and no. We use apache’s mod_auth_ldap to protect some private blogs, but we don’t use it to populate users inside wordpress. This has worked out fine, with few complaints from users about having a separate account. It also has the advantage of allowing those who wouldn’t be in a university LDAP server to have accounts – alumni, contractors, collaborators, consultants, etc.
Who gets a blog?
Anyone with a harvard.edu address.
How do you deal with spam?
For comments, we use Akismet. It does a pretty good job, but it seems to be losing effectiveness over time. Either that, or the sheer volume of blog spam has been increasing – most likely it’s a combination of both. We also suggest that blog authors have comments close automatically on old posts (after, say, 30 days), and that they moderate comments to devalue us as a target.
For spam blogs or malicious users – requiring a harvard.edu address is a pretty high barrier. That said, we do have issues with compromised accounts, or university affiliates attempting to exploit us via linkfarming. We enforce our terms of service and view linkfarming as injurious to the university and against the spirit of this endeavor. Defining what’s spam can be a bit like defining obscenity – to paraphrase Justice Potter Stewart’s concurrence in Jacobellis v. Ohio, “you know it when you see it.”
What kind of traffic do you see? How many blogs are you hosting?
- 800+ live blogs, probably 200 are what you’d consider active, and maybe 100 are what you’d consider REALLY active.
- 700k visits per month, around 3 million+ page views by actual humans. Probably 7 to 8 million total page views counting bots.
- TONS of bot visits. It’s kinda like we’re under a continuous DDOS attack. See our minimal robots.txt – we attempt to enforce the Crawl-delay value through the excellent limiting features provided by nginx.
What’re people blogging about?
We host personal blogs, project blogs, the entire web presence for various working groups, archives of administrative updates, and a whole slew of other types of content. It’s perhaps best shown rather than told through a very small selection:
We wanted an open source multi-user blogging platform and it seemed the best choice at the time. We’ve been very happy with it, and there have been real improvements to the core features WITHOUT the core team throwing backwards compatibility under the bus.