You are viewing a read-only archive of the Blogs.Harvard network.

~ Archive for metrics ~

Hey, didja catch that BBC Focus article?

BBC Focus put out a micro-comparison of Wikipedia, Britannica Online, Encarta, and Infoplease, asking three experts to review one article apiece. Suburbia describes it well.

Reporters running a statistically insignificant comparison with other references, is becoming as popular as vandalizing Wikipedia, when it comes to coming up with a story to publish.

“Fatally Flawed” — Internal Britannica Review Tackles Nature Methods

Below is a letter that Encyclopedia Britannica sent out today to some of its customers, in response to the December Nature article comparing the accuracy of articles in Wikipedia and Britannica.  A more detailed review of the Nature study, including responses to each alleged error and omission, is linked from the front page of www.eb.com; you can also see an HTML version of the review here (thanks to Ben Yates).

In one of its recent issues, the science journal Nature published an article
that claimed to compare the accuracy of the online Encyclopædia Britannica
with Wikipedia, the Internet database that allows anyone, regardless of
knowledge or qualifications, to write and edit articles on any subject.
Wikipedia had recently received attention for its alleged inaccuracies, but
Nature’s article claimed that Britannica’s science coverage was only
slightly more accurate than Wikipedia’s.

Arriving amid the revelations of vandalism and errors in Wikipedia, such a
finding was, not surprisingly, big news. Perhaps you even saw the story
yourself. It’s been reported around the world.

Those reports were wrong, however, because Nature’s research was invalid. As
our editors and scholarly advisers have discovered by reviewing the research
in depth, almost everything about the Nature’s investigation was wrong and
misleading. Dozens of inaccuracies attributed to the Britannica were not
inaccuracies at all, and a number of the articles Nature examined were not
even in the Encyclopædia Britannica. The study was so poorly carried out and
its findings so error-laden that it was completely without merit.

Since educators and librarians have been among Britannica’s closest
colleagues for many years, I would like to address you personally with an
explanation of our findings and tell you the truth about the Nature study.

Almost everything Nature did showed carelessness and indifference to basic
research standards. Their numerous errors and spurious procedures included
the following:

*       Rearranging, reediting, and excerpting Britannica articles. Several
of the “articles” Nature sent its outside reviewers were only sections of,
or excerpts from Britannica entries. Some were cut and pasted together from
more than one Britannica article. As a result, Britannica’s coverage of
certain subjects was represented in the study by texts that our editors
never created, approved or even saw.
*       Mistakenly identifying inaccuracies. The journal claimed to have
found dozens of inaccuracies in Britannica that didn’t exist.
*       Reviewing the wrong texts. They reviewed a number of texts that were
not even in the encyclopedia.
*       Failing to check facts. Nature falsely attributed inaccuracies to
Britannica based on statements from its reviewers that were themselves
inaccurate and which Nature’s editors failed to verify.
*       Misrepresenting its findings. Even according to Nature’s own
figures, (which grossly exaggerated the number of inaccuracies in
Britannica) Wikipedia had a third more inaccuracies than Britannica. Yet the
headline of the journal’s report concealed this fact and implied something
very different.

Britannica also made repeated attempts to obtain from Nature the original
data on which the study’s conclusions were based. We invited Nature’s
editors and management to meet with us to discuss our analysis, but they
declined.

The Nature study was thoroughly wrong and represented an unfair affront to
Britannica’s reputation.

Britannica practices the kind of sound scholarship and rigorous editorial
work that few organizations even attempt. This is vital in the age of the
Internet, when there is so much inappropriate material available. Today,
having sources like Britannica is more important than ever, with content
that is reliable, tailored to the age of the user, correlated to curriculum,
and safe for everyone.

Whatever may have prompted Nature to do such careless and sloppy research,
it’s now time for them to uphold their commitment to good science and
retract the study immediately. We have urged them strongly to do so.

Nature responded with a polite but firm declination.

1 Million What??

The original English Wikipedia turns 1 million this week.  Kudos to KG, who won the millionth-article pool… the two-millionth pool is now closed, but you can still place (gentleman’s) bets on when the eleventy-billionth article will be written.  (Full disclosure: My money’s on 2021.)

New Hitwise Data (generated for WP)

New traffic data from Hitwise (.doc)
suggests that, by their standards, Wikipedia is also among the top 20
organizations with popular websites, though some, such as Yahoo, MSN,
Google and Myspace, have more than one site ahead of it.  Thanks to
Hitwise for sharing their results for the millionth-article press release.

I hope that some of these leisure sites will start to integrate more
useful content into their portals rather than remain paeans to the id; it
is heartwarming to see useful content providers (such as pure search
engines and news portals) near the top of the list.

Wikipedia fields 11% of education-related traffic and 0.17%
of all traffic they measured, with Answers.com getting 1/3 of
that.  I asked for details on their methodology and sample size;
they claim 25 million users, but I don’t know how those users are
distributed, geographically or otherwise.  They also show a fairly flat age
distribution from 18 through 44, and an even split along gender lines.

Pulses, Zeitgeists

Wikipulse is gone. But its spirit lives on.  Perhaps it can be revitalized on a New Machine.  We can rebuild it: the Six Million Dollar Analytic.

17 lovers around the world rejoice

This week Wikipedia briefly broke into the top-17 list of most visited websites, as gauged by Alexa Toolbar users, snagging the attention of 3% of them that day.  Rock on…

In other news, if you want to find out more about Wikipedia and are in the Boston area, come to the upcoming presentation at Simmons on Feb. 13.

The Open Society : Myth or Catastrophic Novelty?

Earlier today, Jay-Zed pointed out the humor in juxtaposing fears of a Closed Web, and a resulting closed society, with the dramatic changes in openness, penetration, and reusability of information and tools over the past decade.  He posited that the existence of certain types of platforms
(for instance, inverted-hourglass networks and PC architectures)
was the result of a particularly enabling design decision, one that was
somewhat arbitrary and potentially outmoded.  The implication was that
without these platforms, those changes would have been far less dramatic.

I also enjoy the juxtaposition of the recent explosive openness
with current fears about open channels of communication being closed
off, and at times I find myself laughing at overly pessimistic
statements about the world today.  On the other hand, I don’t
think that focusing on architectures, or on historical platform
choices, is very relevant to the changes we have seen.  Penetration and
reuse are more firmly associated with the availability of ever-better
toolchains and factories for mass production.

A methodical Gutenberg was not the sole harbinger of
the modern newspaper; that took many revolutions in pulp processing and
printing-press design.  Today’s cheap, colorful paper production
is the result of tens of thousands of excellent, focused
innovations.  Likewise, ENIAC was not the harbinger of Ruby on
Rails (or of any other modern library that allows someone with basic
programming skills to leverage 10 hours of familiarization into
a fully customized and appealing application); that took many
revolutions in software abstraction and philosophy.  Nor were
ARPANET, IBM, and Microsoft the natural father, mother, and holy concubine
of the modern “all-purpose computer”; this too was many scores of
years, and thousands of mathematical, engineering, and social
innovations, in the making.

It is certainly charming that I can now find out what the Ohio
newspapers and TV stations are printing and showing by looking online
or flipping through my satellite service.  But all the same, we hardly
live in the ‘most open’ environment our modern world has ever
known.  In many ways, we remain less open and networked than, say,
a cozy, classed Greek city-state, with a shared educational, social, and financial gossip network; shared religious, historical, and cultural anecdotes; and shared metrics
for success, civilization-wide goals, and honour, all far more intimate
than their parallels in my country today.  Even the most all-telling of
tell-all [auto]biographies is diluted by this lack of openness.

Let us end on a positive note.  What further expansions in
openness may be expected or hoped for in the coming decades? 

  • An improvement in open sharing and classification of ideas,
    so that a good idea in one place is recognized and taken up in many
    others.  Great window-hinge, washing-machine, hobbyist and diaper
    designs should traverse the oceans; great experimental designs the
    fields; &c.
  • A new consciousness of making information public:
    people actively choosing every day to free and share their
    observations, discoveries, thoughts, and analyses, rather than only
    on special occasions, and this consciousness filtering out into
    processes, organizations, and governments.
  • A renaissance in the libraries of methods
    available for accessing information: one’s own, that of one’s family,
    that of one’s community and office, that of the world at large.
    This does not depend on a simple ‘application layer’ provided by a few
    organizations, any more than the question “where can I find a copy of Anna Karenina?” depends on the ‘layer’ of friends’ shelves, bookstores, libraries, and online booksellers I have access to.
  • … add your own!  Good comments will be added to this list.

authority : an idea

Joho wrote recently about distributed authority, providing trusted
views of Wikipedia content.  An excerpt from my reply follows:

Distributed authority — in the ‘stamp and seal’
sense — is not my idea.  And what I would like to see happen with research groups has
been suggested by others before me; there is simply growing interest in
it now. I want to make it easy for people who already work on and
review content in a field to do so in a way that directly improves
Wikipedia.

At the moment, individual authors ‘adopt’ certain articles and try
to keep them fresh and free of errors. And various organizations
maintain their own internal knowledge bases with content that overlaps
a good deal with relevant Wikipedia articles.

Rather than trying to hack an authority system into MediaWiki, you
can do something simpler to encourage both of the above : have groups
that maintain their own small clusters of articles — 10 or 20 or 100
— on a local wiki, with its own portal page. Give them an easy way to
offer their work for merging with WP, without requiring them to all
join the site. The edits they make are implicitly ‘approved’ by them.

This is not a good verification method within WP, however;
software changes are required for that (and Seth’s suggestion is one
specific path one might take). At the moment, Nature can link to
revisions of 100 articles that they approve. But once you follow a link
through to a Nature-edited revision of [[DNA]] and then follow a link to
another WP article, you’ve already returned to the realm of public
editing.

The motivation for this comes from a few professors and talented writers who
began editing on WP but remarked that editing Wikipedia directly can
be offensive and off-putting (they are readily offended by trolling,
and have no patience for even trivial wiki-lawyering).

We’re making progress towards Wikipedia 1.0, slowly but surely; I
think along the way we will improve both the default view of content
and the selection of optional views suggested above.

Community metrics: Size

I have seen many estimates of the size
of Wikipedia’s community, all of them too low.  And what surprises
me most of all is that no one cares much about the lack of real metrics
in their speech, their writing, their journalism, their research.
Okay, that last is going a bit far; many researchers are very careful
about defining their metrics and terms.  But that is what makes
those who are not stick out so severely.

Here are some basic statistics, courtesy of Erik Zachte’s scripts, the Wikimedia Foundation’s server farms, and over 100,000 active contributors over the past four years (user statistics often exclude the 15% of edits that come from editors without named accounts).

On the size of the user community:

  • There are more than 15,000 active English-language editors, at least 1,500 of them editing ‘very actively’ (100 or more times a month).
  • There are 30,000 active editors, and 4,500 very active editors, in all languages combined.
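The ‘active’ and ‘very active’ figures above are threshold counts over each editor’s monthly edit total. Here is a minimal sketch in Python of that kind of count, assuming you already have a mapping from account name to edits made in one month. The 100-edit cutoff for ‘very active’ follows the post; the 5-edit cutoff for ‘active’ and the names `classify_editors`, `ACTIVE_MIN`, and `VERY_ACTIVE_MIN` are hypothetical choices for illustration, not taken from Erik Zachte’s scripts.

```python
# Hypothetical thresholds: VERY_ACTIVE_MIN follows the post (100 edits a month);
# ACTIVE_MIN = 5 is an assumed cutoff for "active", not stated in the post.
ACTIVE_MIN = 5
VERY_ACTIVE_MIN = 100


def classify_editors(monthly_edits: dict[str, int]) -> dict[str, int]:
    """Count active and very active editors for one month.

    monthly_edits maps an account name to its number of edits that month.
    Edits made without a named account are not in this mapping, so they
    are not counted, much like the user statistics mentioned above.
    """
    active = sum(1 for n in monthly_edits.values() if n >= ACTIVE_MIN)
    very_active = sum(1 for n in monthly_edits.values() if n >= VERY_ACTIVE_MIN)
    return {"active": active, "very_active": very_active}


# Made-up sample data for illustration only, not real Wikipedia figures.
sample = {"Alice": 3, "Bob": 12, "Carol": 150, "Dan": 100, "Eve": 5}
print(classify_editors(sample))  # {'active': 4, 'very_active': 2}
```

Very active editors are also counted among the active ones, matching the post’s phrasing that the very active editors are a subset of the active total.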

As a reminder of the casual power of thousands of zealous volunteers
with a variety of content addictions: some of the scripted data above
has a hand-generated and hand-updated wiki cousin, with its own original additions.

As for where I personally draw the line at counting community size, I
would say the English Wikipedia has this year passed the
10,000-volunteer mark, and is currently around 20,000.  We would
know better if we counted not only edits but page views per user:
there are those who edit infrequently but keep up with all
aspects of the community, and also many who edit occasionally but
haven’t taken the time to learn the community’s policies or norms,
whom one might discount.

I would estimate 60,000 in the ‘copyediting’ community (active
readers who are familiar with the interface and act as typo and vandalism
monitors, plus anonymous contributors), and roughly ten times as many
regular readers, around 500,000.

For all languages combined : 40,000 volunteers, perhaps 120,000 in the
‘copyediting’ community (people in other languages are, on average, less
likely to understand that they can edit, an awareness I would expect to grow
more than linearly with the size of the community and press coverage in
that language), and some 2M active readers.

Good Samaritans : the strength of ten normal men?

There’s been some hubbub lately about the usefulness of anonymous
contributions to the information commons.  In particular, Monday
saw a somewhat ad hoc test of the effect of forcing account creation on
the quality of contributions to the* English Wikipedia.

I have some statistics of my own to add about that particular
experiment.  However, for the moment I would simply like to point
to a lovely Wikipedia contribution analysis, “Explaining Quality in Internet Collective Goods: Zealots and Good Samaritans in the case of Wikipedia” (pdf) by researcher Denise Anthony, who presented it this past Monday at MIT.  Her research suggested to her that “the highest quality contributions come from the vast numbers of anonymous ‘Good Samaritans’ who contribute infrequently.”

http://web.mit.edu/iandeseminar/Papers/Fall2005/anthony.pdf

* Note : the definite article is appropriate here because of the
“English” adjective before Wikipedia. For more detail, see my old reply
to JDL at Joho’s house.
