Archiving Web links: Building global layers of caches and mirrors
The Web is highly distributed and in flux; the people using it, even more so. Many projects exist to optimize its use, including:
- Reducing storage and bandwidth: compressing parts of the web; deduplicating files that exist in many places, replacing many with pointers to a single copy of the file [Many browsers & servers, *Box]
- Reducing latency and long-distance bandwidth: caching popular parts of the web locally around the world [CDNs, clouds, &c]
- Increasing robustness & permanence of links: caching linked pages (with timestamps or snapshots, for dynamic pages) [Memento, Wayback Machine, perma, amber]
- Increasing interoperability of naming schemes for describing or pointing to things on the Web, so that it’s easier to cluster similar things and find copies or versions of them [HvdS’s 15-year overview of advancing interop]
This week I was thinking about the 3rd point. What would a comprehensively backed-up Web of links look like? How resilient can we make references to all of the failure modes we’ve seen and imagined? Some threads for a map:
- Links should include timestamps; important ones should request archival permalinks.
- When creating a reference, sites should notify each of the major cache-networks, asking them to store a copy.
- Robust links can embed information about where to find a cache in the a tag that generates the link (and possibly a fuzzy content hash?).
- Permalinks can use an identifier system that allows searching for the page across any of the nodes of the local network, and across the different cache-networks. (Browsers can know how to attempt to find a copy.)
- Sites should have a coat of amber: a local cached snapshot of anything linked from that site, stored on their host or a nearby supernode. So as long as that site is available, snapshots of what it links to are, too.
- We can comprehensively track whether sites have signalled they have an amber layer. If a site isn’t yet caching what they link to, readers can encourage them to do so or connect them to a supernode.
- Libraries should host amber supernodes: caches for sites that can’t host those snapshots on their host machine.
- Snapshots of entire websites should be archived regularly
- Both public snapshots for search engines and private ones for long-term archives.
- A global network of mirrors (a la [C]LOCKSS) should maintain copies of permalink and snapshot databases
- Consortia of libraries, archives, and publishers should commit to a broad geographic distribution of mirrors.
- Mirrors should be available within any country that has expensive interconnects with the rest of the world.
- Prioritization should lead to a kernel of the cached web that is stored in ‘seed bank’ style archives, in the most secure vaults and other venues.
- There should be a clear way to scan for fuzzy matches for a broken link. Especially handy for anyone updating a large archive of broken links.
- Is the base directory there? Is the base URL known to have moved?
- Are distant-timestamped versions of the file available? [some robustlink implementations do this already]
- Are there exact matches elsewhere in the web for a [rare] filename? Can you find other documents with the same content hash? [if a hash was included in the link]
- Are there known ways to contact the original owner of the file/directory/site?
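A few of the threads above can be sketched in code. This is a minimal sketch under stated assumptions, not a reference implementation: the data-versionurl and data-versiondate attribute names are modeled on the Robust Links convention, data-contenthash is a hypothetical attribute invented here for the content-hash idea, and the Wayback Machine URL pattern is assumed to resolve to the snapshot nearest the given date. The function names are illustrative only.

```python
import hashlib
from datetime import date
from html import escape
from urllib.parse import urlsplit, urlunsplit


def robust_link(url, text, snapshot_url, linked_on, content=None):
    """Return an <a> tag carrying the link date, a snapshot URL, and an optional hash."""
    attrs = {
        "href": url,
        "data-versionurl": snapshot_url,
        "data-versiondate": linked_on.isoformat(),
    }
    if content is not None:
        # Hypothetical attribute. This is an exact SHA-256 digest;
        # a fuzzy content hash would need a separate library.
        attrs["data-contenthash"] = "sha256:" + hashlib.sha256(content).hexdigest()
    rendered = " ".join(f'{key}="{escape(value)}"' for key, value in attrs.items())
    return f"<a {rendered}>{escape(text)}</a>"


def wayback_lookup_url(url, when):
    """Build a Wayback Machine URL asking for the snapshot closest to a date (assumed pattern)."""
    return f"https://web.archive.org/web/{when:%Y%m%d}/{url}"


def parent_urls(url):
    """List the ancestor directories of a URL, nearest first, ending at the site root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if depth:
            path += "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


if __name__ == "__main__":
    linked_on = date(2015, 6, 5)
    original = "http://example.org/reports/2015/summary.html"
    print(robust_link(original, "Summary", wayback_lookup_url(original, linked_on), linked_on))
    # For a broken link: is the base directory still there?
    for candidate in parent_urls(original):
        print("check:", candidate)
```

Someone repairing a large archive of broken links could walk parent_urls() outward until something responds, then fall back to the snapshot URL. Searching by filename or by content hash would need an index that this sketch does not assume exists.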
Related questions: What other aspects of robustness need consideration? How are people making progress at each layer? What more is needed to have a mesh of archived links at every scale? For instance, WordPress supports a chunk of the Web; top CDNs cache more than that. What other players can make this happen? What is needed for them to support this?
Cop dines with homeless mother of four, gets kudos. Her plight is ignored.
Recent news blurbs across our fair state applaud a state trooper for “sharing lunch with a homeless mother of four”. (Headline language.)
This was noticed and photographed by a passerby; the trooper was then identified by the state police, who praised him for his good deed on their webpage; a CBS affiliate spent hours tracking down both the photographer and the woman for a video interview. They got quotes from her about being a ‘homeless panhandler’, his common decency, and her surprise. She was described by her motherhood, her panhandling, and being down on her luck.
And that’s it! Nothing thoughtful about why this young mother is homeless in Fall River, or what will become of her family. No opportunities to reach out and fix a tragedy. She clearly needs more than one good meal and healthcare, but the outpouring of interest in the viral photo is entirely directed towards how and whether to applaud the police officer [who, quite decently, refused to be interviewed], how this reflects on police officers everywhere, how this perhaps restores faith in humanity.
Reader: Discover the effect of happiness on your health today
“When I was 5 years old, my mother always told me that happiness was the key to life. When I went to school, they asked me what I wanted to be when I grew up. I wrote down happy. They told me I didn’t understand the assignment, and I told them they didn’t understand life.” —Lennon
From the BODYWORLDS exhibit in Amsterdam, full of flayed and preserved human bodies.
WMF Audit Committee update – Call for Volunteers
Friday June 05th 2015, 7:07 pm
Filed under: wikipedia
The Wikimedia Foundation has an Audit Committee that represents its Board in overseeing financial and accounting matters. This includes reviewing the foundation’s financials, its annual tax return, and an independent audit by KPMG. For details, and the current committee members, see the WMF’s Audit Committee page and the Audit Committee charter.
I currently serve as the Audit Committee chair. We are forming the committee for 2015-16, and are looking for volunteers from the community.
Members serve on the Committee for one year, from July through July. The Foundation files its annual tax return in the U.S. in April, and publishes its annual plan in June. Committee members include trustees from the Foundation’s board and contributors from across the Wikimedia movement.
Time commitment for the committee is modest: reviews are carried out via three or four conference calls over the course of the year. The primary requirement is financial literacy: some experience with finance, accounting or auditing.
If you are interested in joining the Committee for the coming year, please email me at sj at wikimedia.org with your CV, and your thoughts on how you could contribute. Thank you!
Soft, distributed review of public spaces: Making Twitter safe
Successful communities have learned a few things about how to maintain healthy public spaces. We could use a handbook for community designers gathering effective practices. It is a mark of the youth of interpublic spaces that spaces such as Twitter and Instagram [not to mention niche spaces like Wikipedia, and platforms like WordPress] rarely have architects dedicated to designing and refining this aspect of their structure, toolchains, and workflows.
Some say that ‘overly’ public spaces enable widespread abuse and harassment. But the “publicness” of large digital spaces can help make them more welcoming, in some ways, than physical ones – where it is harder to remove graffiti or eggs from homes or buildings – and niche ones – where clique formation and systemic bias can dominate. For instance, here are a few ‘soft’ (reversible, auditable, post-hoc) tools that let a mixed ecosystem review and maintain their own areas in a broad public space:
Allow participants to change the visibility of comments: Let each control what they see, and promote or flag it for others.
- Allow blacklists and whitelists, in a way that lets people block out harassers or keywords entirely if they wish. Make it easy to see what has been hidden.
- Rating (both average and variance) and tags for abuse or controversy can allow for locally flexible display. Some simple models make this hard to game.
- Allow things to be incrementally hidden from view. Group feedback is more useful when the result is a spectrum.
Increase the efficiency ratio of moderation and distribute it: automate review, filter and slow down abuse.
- Tag contributors by their level of community investment. Many who spam or harass try to cloak in new or fake identities.
- Maintain automated tools to catch and limit abusive input. There’s a spectrum of response: from letting only the poster and moderators see the input (cocooning), to tagging and not showing by default (thresholding), to simply tagging as suspect (flagging).
- Make these and other tags available to the community to use in their own preferences and review tools
- For dedicated abuse: hook into penalties that make it more costly for those committed to spoofing the system.
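Two of the ideas above are concrete enough to sketch: summarizing ratings by both average and variance, and the spectrum of responses (flagging, thresholding, cocooning). This is a rough sketch, not a description of any platform's actual system. The abuse score, the cutoff values, and the function names are all hypothetical.

```python
from statistics import mean, pvariance


def rating_summary(ratings):
    """Return (average, variance). High variance suggests a controversial item."""
    return mean(ratings), pvariance(ratings)


def response_for(abuse_score, flag_at=0.3, threshold_at=0.6, cocoon_at=0.9):
    """Map a score in [0, 1] to a soft response. The cutoffs are hypothetical defaults."""
    if abuse_score >= cocoon_at:
        return "cocoon"      # only the poster and moderators see it
    if abuse_score >= threshold_at:
        return "threshold"   # tagged, and hidden by default
    if abuse_score >= flag_at:
        return "flag"        # shown, but tagged as suspect
    return "show"


def is_visible(response, viewer_is_poster_or_moderator, viewer_shows_hidden=False):
    """Decide whether one viewer sees an item, given its response level."""
    if response == "cocoon":
        return viewer_is_poster_or_moderator
    if response == "threshold":
        return viewer_is_poster_or_moderator or viewer_shows_hidden
    return True


if __name__ == "__main__":
    print(rating_summary([1, 5, 1, 5]))  # same average as [3, 3, 3, 3], larger variance
    print(response_for(0.7), is_visible(response_for(0.7), False))
```

Because every response here only changes visibility, each one stays reversible and auditable: a moderator can lower a score, and a reader can opt in to seeing thresholded items.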
You can’t make everyone safe all of the time, but you can dial down behavior that is socially unwelcome (by any significant subgroup) by a couple of orders of magnitude. Of course these ideas are simple and only go so far. For instance, in a society at civil war, where each half is literally threatened by the sober political and practical discussions of the other half, public speech may simply not be safe.
Righteousness and peace, and recovering at last what we threw away
Dinesen’s short story Babette’s Feast includes a lovely riff on Psalm 85. This is quoted in full towards the end, and refined in the film. In one of those revealing errors highlighting the fragility of citation, there is a canonical English misquote online, repeated in a thousand places, but the correct quote did not exist.
I leave the quote here in honor of the season. And I wish you, dear reader, a confident and grateful year, full of potential and choiceness.
Mercy and truth are met together.
Righteousness and peace have kissed one another.
Man, in his weakness and short-sightedness,
believes he must make choices in this life.
He trembles at the risks he must take.
We know that fear.
But, no - our choice is of no importance.
There comes a time when our eyes are opened.
And we come to realize at last that mercy is infinite.
We need only await it with confidence,
and receive it with gratitude.
Mercy imposes no conditions.
And, see: Everything we have chosen has been granted to us.
And everything we renounced — has also been granted.
Yes, we even get back what we threw away.
For mercy and truth are met together.
And righteousness and peace have kissed one another.
Snow Use’s Kitchen: dishes fit to make hearts melt and mouths water…
In Snow Use’s kitchen there stood a large stove,
And what she cooked on it she cooked with much love.
She used chunks of chocolate, melted in steam,
And sugar and egg-whites and oodles of cream.
(And, for effect, an occasional scream!)
She stirred it and mashed it into a thick paste,
And added some cognac to give it more taste.
(As to the calories: they went to waist)
She poured the concoction into a strange mold;
Then into the freezer until it got cold.
(With a note saying: Please do not spindle or fold)
And when it was frozen so-o-o pleased was Snow Use,
For she had made Thidwick, the chocolate mousse
Digital rights groups in Europe are gaining ground: a model to watch
The recent historic wins for net neutrality in the EU demonstrate an organized and informed advocacy network that is still not echoed in the US or in many other parts of the world. We should celebrate and learn from their work.
Thanks to Axel Arnbak for his thorough and delightful writeup of this.
Aksyonov predicts Crimean takeover in ’79 novel
Vassily Aksyonov wrote The Island of Crimea in 1979 – about an imagined future. It looks surprisingly like the present.
Kudos to Michael Idov at the New Yorker for writing about it beautifully, with all of its spooky accuracy.
(Night Wolves! Aksyonov again!)
Women’s Public Voice: points left out of Mary Beard’s history of speech
Bruce recently recommended this essay on the historical public voice of women, by noted classicist Mary Beard.
Beard is a fine and provocative writer; it is good rhetoric.
But I don’t think it gives much insight into historical causes, or ways we can bring about change. Women face deeply gendered and hateful criticism today, particularly online. The argument that this is due to Greco-Roman rhetorical traditions, or the Western literary canon, is unconvincing. I was discouraged by the selection bias in the examples used.
I would love to see a revision of this essay that gets nuances right, and tries to explain changes in the past century based on its arguments.
+ The complexity of women’s voice in Rome, from Fulvia and Livia to Irene of Athens;
+ Greek admiration of Gorgo, Roman admiration of Zenobia;
+ Conflicting views of leaders in adjacent cultures (Boudica, Cleopatra, Dido);
+ The Old Testament (Deborah and Esther come to mind).
Misused for effect:
– Ovid: No metamorphs of any gender could speak; Io for one was changed back.
– Fulvia: First by describing her as someone’s wife, though she was one of the most powerful figures in Rome; then by framing her hatred of Cicero as a matter of gender.
On a tangent: here are two speeches I love, to lift the spirits. (Both American; I know less about oratory from the rest of the world. Suggestions welcome!):
Frances Wright on global patriotism and change:
# Independence Day speech at New Harmony (1828)
Margaret Chase Smith on an issue too great to be obscured by eloquence, thankfully no longer a concern today:
# Declaration of Conscience (1950)
“BRB singularity” : A comic on love, death, and robots
XBRB – stories from the Singularity.
A Blue/Red/Brown production.
Ty Burr examines the Aaron Swartz biopic in Sundance context
Thursday January 23rd 2014, 12:44 am
Filed under: Seraphic
A lovely combined review of four different biographies, helping to highlight the topography of each.
Ravalomanana v. Rajaonarimampianina
Madagascar’s presidential election, after 4 years of being couped up, heats up in a neck-and-neck runoff with apparent vote-rigging and complaints about fraud on both sides.
It is a beautiful island of 22 million people; also a microcosm of regional political hijinks.