We’ve installed a Google Analytics plugin – you should see a “Google Analytics” choice under your “Settings” menu in the WordPress backend.
We’ve also installed a network-wide google analytics code to let us see activity on a broader basis. This doesn’t disclose any information to the blogs.law tech team that isn’t already available in the web server logs. We may not keep this network-wide tracking code in place permanently, depending on how well it jibes with our internal site-wide analytics.
Thanks for your patience. A lot of you have been waiting for better analytics for quite some time.
The datacenter that hosts blogs.law.harvard.edu is shutting down from Friday 6/10/2011 at 9pm to approximately Saturday 6/11/2011 at 9am, a duration of about 12 hours. This maintenance window is necessary to allow University Information Services to upgrade power infrastructure.
Most Harvard staff, students, and employees should already be aware of this planned outage, but we were remiss in notifying our non-local or alumni bloggers earlier. We apologize for the short notice.
We will be returning 503 error codes for page requests. More info on why at google’s webmaster central blog. We will have staff and extra equipment on hand to ensure a smooth transition.
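For the curious, here is a minimal sketch of what "returning a 503" means in practice. It is not our actual setup, just an illustration in Python: a tiny WSGI app that answers every request with 503 Service Unavailable and a Retry-After header, which tells well-behaved crawlers the outage is temporary. The name maintenance_app and the 12-hour Retry-After value are assumptions made up for this example.

```python
# Hypothetical value: seconds until the maintenance window is expected to end.
RETRY_AFTER_SECONDS = 12 * 60 * 60


def maintenance_app(environ, start_response):
    """Answer every request with 503 Service Unavailable."""
    body = b"Down for scheduled maintenance. Please check back soon.\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Hint to clients and crawlers about when to try again.
        ("Retry-After", str(RETRY_AFTER_SECONDS)),
    ]
    start_response("503 Service Unavailable", headers)
    return [body]

# To try it locally (port 8080 is arbitrary):
#   from wsgiref.simple_server import make_server
#   make_server("", 8080, maintenance_app).serve_forever()
```

The point of a 503 rather than a 404 or a plain "sorry" page with a 200 status is that the status code itself says "temporary", so a crawler has no reason to treat the maintenance page as the site's real content.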
Please see this post for details on new features in WordPress 3.1. Most noticeable will be the site wide toolbar you see when you’re logged in.
Also, be sure to check out the “Gravity Forms” plugin, which is a great way to create custom forms to collect data or create feedback systems for your blog.
We are frequently asked about our wordpress deployment by universities, NGOs and other institutions that are interested in setting up their own multiuser blogging platform. We’ve been answering those questions on an ad-hoc basis – this page will serve to collect the most common questions and hopefully be something we can refer interested parties to. First – see our Project Info page if you’re interested in the early history of blogging at Berkman.
Notes: Answers are current as of 1/27/2011. This document does not represent the official position of the Berkman Center, Harvard Law School or any other entity.
What OS do you use?
Ubuntu LTS (multiple flavors). Any *nix would be great.
What is your current hardware for the blogs web server, and database server (we assume MySQL)? Does any of this run in virtual machines?
WordPress requires MySQL. There was a PostgreSQL fork a while back that died off pretty quickly. We have a well appointed database server (you’ll get a cert warning from this link as it’s using a self-signed SSL cert: the SHA1 fingerprint is CF:DD:34:9D:B8:CD:E0:B9:EE:E8:1D:0F:FE:A9:1F:33:36:58:0D:7C) that shares duties with many other sites and applications – the database server is not a virtual machine and has directly attached storage to maximize IO (all the normal stuff you’d do to create a high performance database server).
Our wordpress application server (again, you’ll get a cert warning because of a self-signed SSL cert, same SHA1 as above) is a Xen VM with 3 GB of RAM and 4 cores. We run nginx as a caching front-end proxy to our apache backend. I packaged up this nginx config as a plugin, along with sample configs, info here. My talk about high-performance wordpress (along with an overview of our nginx deployment) at Wordcamp Boston 2010 is here.
We’ve read about some of the improvements you have made via your news page : which improvement has been most important?
Hands down – the nginx caching proxy. Some requests are VERY expensive – RSS feeds, for instance. A caching proxy (or perhaps WP Super Cache) is a necessity. A default, uncached wordpress deploy IS NOT going to get you far.
You definitely want a physical machine to maximize MySQL IO. You should tune it properly for the large amount of RAM you’ve surely installed in it.
Your wordpress app server needs multiple cores to maximize concurrency.
Be sure to use a PHP opcode cache – APC has been nothing but unicorns and rainbows for us.
We could probably handle double the traffic with our current hardware, and nginx can load balance for us if/when we need to use multiple wordpress application servers. Our performance problems have not been related to our MySQL server so far.
Do you let users install custom themes or plugins?
No. We will install custom themes or plugins occasionally for special projects, but only after a thorough audit and after all development has taken place on a completely separate system.
Do you allow SSH access to blog owners / theme developers?
Any core hacks?
A couple, but we’re factoring those out and have even contributed one to the wordpress core. We expect to be on a completely clean wordpress core by Summer 2011.
Do you integrate with LDAP or another directory service?
Yes and no. We use apache’s mod_auth_ldap to protect some private blogs, but we don’t use it to populate users inside wordpress. This has worked out fine, with few complaints from users about having a separate account. It also has the advantage of allowing those who wouldn’t be in a university LDAP server to have accounts – alumni, contractors, collaborators, consultants, etc.
Who gets a blog?
Anyone with a harvard.edu address.
How do you deal with spam?
For comments, we use Akismet. It does a pretty good job, but it seems to be losing effectiveness over time. Either that, or the sheer volume of blog spam has been increasing – most likely it’s a combination of both. We also suggest that blog authors have comments close automatically on old posts (after, say, 30 days), and that they moderate comments to devalue us as a target.
For spam blogs or malicious users – requiring a harvard.edu address is a pretty high barrier. That said, we do have issues with compromised accounts, or university affiliates attempting to exploit us via linkfarming. We enforce our terms of service and view linkfarming as injurious to the university and against the spirit of this endeavor. Defining what’s spam can be a bit like defining obscenity – to paraphrase Justice Potter Stewart’s concurrence in Jacobellis v. Ohio, “you know it when you see it.”
What kind of traffic do you see? How many blogs are you hosting?
- 800+ live blogs, probably 200 are what you’d consider active, and maybe 100 are what you’d consider REALLY active.
- 700k visits per month, around 3 million+ page views by actual humans. Probably 7 to 8 million total page views counting bots.
- TONS of bot visits. It’s kinda like we’re under a continuous DDOS attack. See our minimal robots.txt – we attempt to enforce the Crawl-delay value through the excellent limiting features provided by nginx.
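To make the Crawl-delay idea concrete, here is a rough sketch in Python of per-client request limiting. This is not how nginx implements its limiting and it is not our configuration. The class name CrawlDelayLimiter, the client key, and the 10-second delay are all hypothetical, chosen only for illustration.

```python
import time


class CrawlDelayLimiter:
    """Allow each client at most one request per `delay_seconds`."""

    def __init__(self, delay_seconds, clock=time.monotonic):
        self.delay = delay_seconds
        self.clock = clock
        # client key -> time of the last allowed request.
        # A real limiter would also expire old entries; this sketch does not.
        self.last_allowed = {}

    def allow(self, client):
        now = self.clock()
        last = self.last_allowed.get(client)
        if last is not None and now - last < self.delay:
            return False  # too soon: the caller would reject this request
        self.last_allowed[client] = now
        return True


# Usage with a fake clock so the behaviour is easy to follow.
fake_now = [0.0]
limiter = CrawlDelayLimiter(delay_seconds=10, clock=lambda: fake_now[0])
first = limiter.allow("bot-a")   # True: first request from this client
second = limiter.allow("bot-a")  # False: no time has passed yet
fake_now[0] = 10.0
third = limiter.allow("bot-a")   # True: the full delay has elapsed
```

Keying on the client (an IP address, say) means one greedy crawler gets slowed down without affecting anyone else.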
What’re people blogging about?
We host personal blogs, project blogs, the entire web presence for various working groups, archives of administrative updates, and a whole slew of other types of content. It’s perhaps best shown rather than told through a very small selection:
We wanted an open source multi-user blogging platform and it seemed the best choice at the time. We’ve been very happy with it, and there have been real improvements to the core features WITHOUT the core team throwing backwards compatibility under the bus.
Boring but important – we’re now running wordpress 3.0.3. The last couple point releases have served to plug a few privilege escalation bugs – nothing too scary, but needing attention nonetheless.
We’ve also contributed a patch to the wordpress core that should be released in version 3.2. Hopefully we’ll have more to contribute in the future.
— Happy Holidays!
Upgrade to WordPress 3.0.1
The changes that matter most to me (as the sysadmin) aren’t of much interest to most: wordpress 3.0 merged vanilla wordpress and wordpress-mu into a single codebase. This means there’s only one wordpress to contend with, and as plugins get “certified” to work under wordpress 3.x, they should work for us as well (ceteris paribus, especially around security and privacy considerations).
Changes that might matter to you – as a blogger – include:
- A new default theme – “twentyten”. It’s clean and very customizable – check it out on my blog,
- Further refinement of the backend interface,
- A “get shortlink” feature on the post edit page for use in twitter and other size-constrained social services.
We’re still building out the infrastructure, but we’re going to offer piwik and google as improved analytics options. Piwik is a fairly impressive open source analytics program (demo here) that’s a great option for the privacy-conscious: we will run the piwik analytics server and your visitor data won’t be leaked to third parties, ever. We’ll provide google analytics through a plugin if that’s your preference.
We’re looking to expand our selection of plugins and themes in our WordPress install, and would love your help figuring out what would be most useful to the blogs.law community.
Here’s the official WordPress plugin directory. Keep in mind that not all plugins can be run in a WordPress Mu environment, and that we reserve the right to reject plugins of dubious quality, security, or function.
Here’s the official WordPress theme directory. We don’t have a budget to pay for any premium themes (so only nominate free ones, please) and the same caveats apply around quality, security, and function.
Please comment below with your nominations, including the full URL to the item of concern. Happy blogging, and thanks for your help.
We’ve implemented a new page caching system that should help improve response time and ensure better uptime for the blogs.law.harvard.edu server. A “page cache” will intercept requests and serve pages WITHOUT invoking all of the wordpress code – allowing our site to serve many more requests over all. The page cache only comes in to play for non-logged-in users, and for those that haven’t posted a comment recently to a blog we host.
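For those wondering how a page cache decides when to step aside, here is a simplified sketch in Python. It is an illustration only, not our actual implementation. It assumes the cache keys its decision on the cookies WordPress sets for logged-in users and recent commenters (names starting with wordpress_logged_in_ and comment_author_). The names PageCache and should_bypass_cache are hypothetical.

```python
# Assumption: these cookie-name prefixes mark logged-in users and recent commenters.
BYPASS_COOKIE_PREFIXES = ("wordpress_logged_in_", "comment_author_")


def should_bypass_cache(method, cookie_names):
    """Return True when the request must go to the full application."""
    if method not in ("GET", "HEAD"):
        return True  # never cache form posts and the like
    return any(name.startswith(BYPASS_COOKIE_PREFIXES) for name in cookie_names)


class PageCache:
    def __init__(self, render):
        self.render = render  # function: path -> page body (the expensive step)
        self.pages = {}       # path -> cached body; no expiry in this sketch

    def fetch(self, method, path, cookie_names=()):
        if should_bypass_cache(method, cookie_names):
            return self.render(path)  # personalised: always render fresh
        if path not in self.pages:
            self.pages[path] = self.render(path)  # first anonymous hit fills the cache
        return self.pages[path]
```

A real cache also has to expire pages when posts or comments change, which this sketch leaves out entirely.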
The page cache has one significant side effect: WP-slimstats will be even less accurate, because most pages served to non-logged-in users come directly from the page cache and WP-slimstats never sees that traffic. WP-slimstats already over-counts illegitimate traffic and is inefficient, unsupported and inaccurate – we’re looking into other options to provide some kind of analytics for our sites that will work with the new page cache.
We get a significant amount of legitimate – and not so legitimate – traffic. We served over 3 million legitimate page views and around 300,000 unique visitors this August – on top of all the ‘bot traffic. Badly behaved ‘bots request too many pages simultaneously and can cause a significant service interruption in combination with all the legitimate traffic we’re already handling.
Our traffic – bad and good – has been increasing over time and we have been seeing more service interruptions due to high load: we have reached the point where we simply can’t continue to provide a high quality service without a page cache in place. We’ve already implemented a behaviour-based ‘bot catcher and numerous other tricks to optimize how we serve content: the page cache is the newest weapon in our arsenal.
Thanks for your patience! The site should feel snappier. If you want to browse a fully cached version of your site, log out and clear any “blogs.law.harvard.edu” cookies. Logging out by itself isn’t enough: you have to clear your blogs.law cookies, too.
And there you have it. We’ve been upgraded to WordPress 2.8.4a, the latest stable WordPress Mu release.
You should notice some minor changes in the administrator backend for your blog – nothing major, just a nice set of refinements to the look-and-feel and a few new features (like the redone “widget” control under appearance -> widgets).
Please contact techhelp at cyber dot law dot harvard dot edu if you’re seeing any oddness. Thanks!