
The Longest Now


Anonymizing data on the users of Wikipedia
Wednesday July 25th 2018, 12:22 pm
Filed under: chain-gang,citation needed,Glory, glory, glory,wikipedia

Updated for the new year: with specific things we can all start doing 🙂

Wikipedia currently tracks and stores almost no data about its readers and editors.  This persistently foils researchers and analysts inside the WMF and its projects; and is largely unnecessary.

Not tracked last I checked: sessions, clicks, where on a page readers spend their time, time spent on page or site, returning users.  There is a small exception: data that can fingerprint a user’s use of the site is stored for a limited time, made visible only to developers and checkusers, in order to combat sockpuppets and spam.

This is all done in the spirit of preserving privacy: not gathering data that could be used by third parties to harm contributors or readers for reading or writing information that some nation or other powerful group might want to suppress.  That is an essential concern, and Wikimedia’s commitment to privacy and pseudonymity is wonderful and needed.

However, the data we need to improve the site and understand how it is used in aggregate doesn’t require storing personally identifiable data that can be meaningfully used to target editors in specific. Rather than throwing out data that we worry would expose users to risk, we should be fuzzing and hashing it to preserve the aggregates we care about.  Browser fingerprints, including the username or IP, can be hashed; timestamps and anything that could be interpreted as geolocation can have noise added to them.
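A minimal sketch of what "hashing and fuzzing" could look like in practice. Everything named here (the per-period key, the function names, the jitter sizes) is an assumption for illustration, not anything Wikimedia runs. A keyed hash is used rather than a bare one, since a plain hash of an IPv4 address can be reversed by simply hashing every possible address.

```python
import hashlib
import hmac
import random
import secrets

# Assumption: a secret key generated per reporting period and destroyed afterwards,
# so pseudonyms cannot be linked across periods or brute-forced later.
PERIOD_KEY = secrets.token_bytes(32)


def pseudonymize(identifier, key=PERIOD_KEY):
    """Map a username, IP, or browser fingerprint to a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def fuzz_timestamp(epoch_seconds, max_jitter=300.0):
    """Shift a timestamp by a uniform random offset of at most max_jitter seconds."""
    return epoch_seconds + random.uniform(-max_jitter, max_jitter)


def fuzz_location(lat, lon, max_offset_degrees=0.5):
    """Add independent uniform noise to latitude and longitude, clamped to valid ranges."""
    noisy_lat = lat + random.uniform(-max_offset_degrees, max_offset_degrees)
    noisy_lon = lon + random.uniform(-max_offset_degrees, max_offset_degrees)
    return max(-90.0, min(90.0, noisy_lat)), max(-180.0, min(180.0, noisy_lon))
```

Within one period the same visitor always maps to the same pseudonym, so sessions and return visits can still be counted; once the key is thrown away, the mapping cannot be rebuilt.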

We could then know things such as, for instance:

  • the number of distinct users in a month, by general region
  • how regularly each visitor comes to the projects; which projects + languages they visit [throwing away user and article-title data, but seeing this data across the total population of ~1B visitors]
  • particularly bounce rates and times: people finding the site, perhaps running one search, and leaving
  • the number of pages viewed in a session, its tempo, or the namespaces they are in [throwing away titles]
  • the reading + editing flows of visitors on any single page, aggregated by day or week
  • clickflows from the main page or from search results [this data is gathered to some degree; I don’t know how reusably]

These are just rough descriptions; great care must be taken to vet each aggregate for preserving privacy. But this is a known practice that we could do with expert attention.
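One known practice for vetting aggregates is to add calibrated noise before release, in the style of differential privacy. The sketch below is an illustration only: the post does not name a technique, the function names and the `epsilon` default are assumptions, and it presumes each visitor contributes at most 1 to any single count.

```python
import random


def laplace_noise(scale):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return random.expovariate(rate) - random.expovariate(rate)


def noisy_count(true_count, epsilon=1.0):
    """Release a count with Laplace noise of scale 1/epsilon (assumes sensitivity 1)."""
    noisy = true_count + laplace_noise(1.0 / epsilon)
    # Clamping and rounding are post-processing and do not weaken the guarantee.
    return max(0, round(noisy))


# Hypothetical example: distinct visitors per region in one month.
monthly_visitors = {"region-a": 120345, "region-b": 872, "region-c": 3}
released = {region: noisy_count(n, epsilon=0.5) for region, n in monthly_visitors.items()}
```

Large counts barely move, while very small counts (where one person's presence is visible) are blurred the most in relative terms, which is the property we want.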

What keeps us from doing this today?  Some aspects of this are surely discussed in places, but those discussions are hard to find.  Past discussions I recall were brought to an early end by [devs worrying about legal] or [legal worrying about what is technically possible].

Discussion of obstacles and negative-space is generally harder to find on wikis than discussion of works-in-progress and responses to them: a result of a noun-based document system that requires discussions to be attached to a clearly-named topic!

What we can do, both researchers and data fiduciaries:

  • As site-maintainers: Start gathering this data, and appoint a couple privacy-focused data analysts to propose how to share it.
    • Identify challenges, open problems, solved problems that need implementing.
  • Name the (positive, future-crafting, project-loving) initiative to do this at scale, and the reasons to do so.
    • By naming the positive aspect, distinguish this from a tentative caveat to a list of bad things to avoid, which leads to inaction.  (“never gather data!  unless you have extremely good reasons, someone else has done it before, it couldn’t possibly be dangerous, and no one could possibly complain.”)
  • As data analysts (internal and external): write about what better data enables.  Expand the list above, include real-world parallels.
    • How would this illuminate the experience of finding and sharing knowledge?
  • Invite other sociologists, historians of knowledge, and tool-makers to start working with stub APIs that at first may not return much data.

Without this we remain in the dark; and, like libraries that have found patrons leaving their privacy-preserving (but less helpful) environs for data-hoarding (and very handy) book-explorers, we remain vulnerable to disuse.



In eternal rhyme: as Cyberiad draws nigh, a tiny Lem shrine
Sunday March 25th 2018, 1:49 pm
Filed under: chain-gang,Glory, glory, glory,noetic,poetic justice

Stanislaw Lem‘s Cyberiad is a miracle of 20th century literature, and of translation. I want to preserve parts of two stories here in as many languages as I can find.  Sources wanted for both, if you have a copy in your language:

  • The poems of Trurl’s Electronic Bard, with their exquisite compact wordplay.
  • (to come!) The story How the World Was Saved — where everything beginning with a single letter is destroyed. Douglas Hofstadter’s paean to translation, Le Ton Beau de Marot, touches on the challenges with translating this story.  

(more…)



Mental battlefield: How we are forfeiting the zeroth AI war
Monday August 07th 2017, 6:03 pm
Filed under: %a la mod,chain-gang,knowledge,metrics,popular demand

Part 2: Forging Social Proof – the Networked Turing Test Rules the First AI War

Last week, Jean Twenge wrote the latest in a series of reflections on connected culture: “Have smartphones destroyed a generation?”

Some commentators wrote off her concerns as the periodic anxiety of an older generation seeing technology changing the world of their children, comparing it to earlier concerns about books or television.  But I don’t see this as Yet Another Moral Panic about changing tech or norms. I see it as an early AI conflict, one that individuals have lost to embryonic corporate AI.

The struggle is real

We have greatly advanced algorithms for claiming and retaining human attention, prominently including bulk attacks on shared Commons such as quiet spaces, spare time, empty mailboxes. This predates the net, but as in many areas, automation has conclusively outpaced capacity to react. There’s not even an arms race today: on the one hand, we have a few attention-preserving tools, productive norms that increasingly look like firewall instructions, a few dated regulations in some countries. On the other hand, we have a $T invested in persuasion, segmentation, attention, engagement: a growing portion of our economy, dinner conversations, and self-image as a civilization.

Persuasion is much more than advertising. The libraries of mind hacks and distractions we have developed are prominent in every networked app and social tool. Including simple things like adding a gloss of guilt and performative angst to increase engagement — like Snap or Duolingo adding publicly visible streaks to keep up daily participation.

We know people can saturate their capacity to track goals and urgencies. We know minds are exploitable, hard sells are possible — but (coming from a carny, or casino, or car salesman) unethical, bad for you.  Yet when the exploit happens at a scale of billions, one new step each week, with a cloak of respectability — we haven’t figured out how to think about it.  Indeed most growth hackers & experience designers, at companies whose immersive interfaces absorb centuries of spare time each day, would firmly deny that they are squeezing profit out of the valuable time + focus + energy of users :: even as they might agree that in aggregate, the set of all available interfaces are doing just that.

Twenge suggests people are becoming unhappier the more their attention is hacked: that seems right, up to a point. But past that point, go far enough and people will get used to anything, create new norms around it. If we lose meaningful measures of social wellbeing, then new ones may be designed for us, honoring current trends as the best of all possible worlds. A time-worn solution of cults, castes, frontiers, empires. Yet letting the few and the hawkers of the new set norms for all, doesn’t always work out well.

Let us do better.
+ Recognize exploits and failure modes of reason, habit, and thought. Treat these as important to healthy life, not simply a prize for whoever can claim them.
+ Measure maluses like addiction, negative attractors like monopoly, self-dealing, information asymmetry.
+ Measure things like learning speed, adaptability, self sufficiency, teamwork, contentment over time.
+ Reflect on system properties that seem to influence one or the other.
And build norms around all of this, countering the idea that “whatever norms we have are organic, so they must be good for us.”



“‘I participate in contact origami’, The Book”, The Movie
Thursday March 09th 2017, 8:48 pm
Filed under: fly-by-wire,Glory, glory, glory,indescribable,noetic

Footprints in a self-similar river. The occasional passing act of will that remains and is amplified downstream, so that at some future moment, perhaps fording at another spot altogether, you discover a print announcing to you alone that you have been there before.

A decade ago, I once spent too long creating a stylesheet for a tiny “how-to” template: the numbers in boxes laying out a three-step process, whether to switch fonts, bold, padding, background and border colors. Making the CSS just right to work on screens of all sizes.

It looked something like this.  >>

In fact, almost exactly like that.  Some things worked, some didn’t.  I tried to add padding to the left of the roman numerals, tried to remove the pixel of whitespace above the bordered boxes, without success.  Should the roman numerals be left-aligned but the boxed text centered?  Since then, scores of similar templates have copied and remixed it, changing text and context but not style.  The color palette I settled on, almost content with it, shows up on hundreds of pages. It would now take a script and many hours to find and tweak each instance of the design.

I run across one myself every few months, and experience river-shock: the sense of seeing something simple you did once that has a quiet, pervasive mark that cannot be undone.  This is quite different from the sense of pride or dismay that comes from seeing the expected result of a major endeavor: a book in someone’s hands, a clinic building in use or in disrepair, a student now teaching others.

Another memory: One week I set about compiling a collection for a museum, a complete series of parts, diagrams and XO laptops: a few boxes full.  I had sent background context by mail, but at the last minute took a fine-tipped sharpie and attached clarifying notes to post-its on each cluster.

Years later, visiting the museum with a friend, I ran across the display as part of a history of computing; the electronics beautifully preserved as I had hoped, as I saw with pride.  And – river shock – a handful of my post-its, with small diagrams and 8pt-font notes to the curator, exactly where I had placed them.  Anyone with access to the materials could have chosen one of each and put them in a box; my handwriting made it seem like my own workdesk, enclosed in perspex and on display.

On occasion a visitor will find one of the historical texts I’ve preserved against linkrot and plagiarism, like the acquaintance checking up on the man trapped in Charles de Gaulle airport, or a friend running across their favorite college essay or spellpoem, and I have a shadow of that frisson.  A passing fancy, created to be found anonymously by others, appearing at least once more in the endless river of daily life.



The Memetic Zoo – Collaborative space where woke creatures share slang
Monday January 23rd 2017, 10:47 pm
Filed under: Glory, glory, glory,noetic,poetic justice

January is gray, and this now is the Ur-Jan, but today for a change was bright.  Thanks to the snain, red lettering, h’rissa and light.

And to you, dear Reader, for truth, presence, and ideas catalysed in telling – rarities that should be commonplaces. And for this box of potential thoughts about thoughts I will think Thursday.

I still have so many things to ask: what you know of supersimultaneity, quantified serendipity, if you feel a cool thrill in the small of your back when a crux or potentiality approaches, foreshadowing and afterimages. Sometimes I wake with the certainty I must pursue such things with all who might answer, before my pulse cools and I file it away as dream residue for review. Next time.

Today drew out instead improvements in preservation and propagation: idiogenics, cryogenic Seed Vaults and Culture Vaults, a vicariant Greenland. Advances in meme propagation as a critical piece of biological development, including RNA and human speech, but countless other innovations besides. Revisited a recurring dream of a summer camp (memezoo!) for animals who have learned to communicate with humans, to demonstrate relevant universalities.  These prodigies & their humans could spend time with the most precocious of their own species, sharing their newfound memetics, solving puzzles together, creating cross-species pidgins and developing contextual slang, to see what emerges // a place where Batyr and Kosik could have met, and Kanzi could build language bridges with more than just his step-sister.

And what is Earth herself but a planet methodically coated by a memetic zoo?  Once we have a more balanced sense of non-human memetics, we may be able to see our own more clearly, in both historical and current context.


Then Tech Sølidarity met in a converted warehouse, 150 people totally focused on the moment, technologists listening and thinking for over an hour. At most a handful of computers out, checking data or taking notes for the room. But the same narrow cross-section as before, 80% men, 90% white.

We agended aligning national efforts towards: visualizing data for local politics, streamlining calls to city pols; visualizing gerrymandering & voter disempowerment; securing voting machines; coordinating and sustaining responses to alt-facts (like the Guardian’s dangerously wrong WhatsApp bashing); listing things tech design decisions have broken & proposing fixes; devising mottos for technologists (Protect the Vulnerable?); building toolkits for curators and reviewers to ward off vandals and trolls.  And finally, looking for interfaith groups holding similar gatherings of religious leaders, with which we might cross-pollinate.

I worked with the group gathering voting-machine tech & policy wonks who could provide checklists and advice that we could adopt and share with city councils and mayors. We glissed an arpeggio of steps from procurement & policy to auditing & security, which could each be adopted by someone. Only pranksters whispered about blockchain, but agreed we needed a tacky sign to raise whenever the word came up. Next time.


They too agreed not to wait too long before the next, and to endeavour|fail|evolve rather than simply passing on the meme.



Designing life for episodic tyranny | 2: Social networks
Saturday November 12th 2016, 8:17 pm
Filed under: Uncategorized

For background, see also Part 1: Secure toolchains

Motivation

Imagine a Stasinario: while in a Tier 3 environment, you expect your social networks to be subverted, with people pressured to report on one another, and casual gatherings discouraged or explicitly outlawed.  Your contact with local colleagues and neighbors is always tinged with the certainty that eventually, one of them will report on the others, if only to stay out of trouble themselves. 
Assume that a few community members will be willing informants, and that everyone else would rather not inform, but will periodically be questioned by an adversary trying to prevent organizing or information-passing of any kind.  When questioned, you will be punished for sharing any information that can be shown to be false.  What sorts of preparation can you make in advance, for both offline and online gatherings? [Input needed from people facing this in closed systems, and in heavily-monitored activist movements.]

Social design options

1. Make gathering information more expensive.  Add plausible noise to the system; report frequently rather than rarely?  

2.  Human ddos/noise: instead of LOIC [Anons], have collective noise generation pointed at some unethical public db or data-collection.  1) setting your devices to signal to such networks; 2) sending your info / generating random info to send there; 

  2a. For human / minority-tracking databases:  blacklists, registering refugees, or migrants from specific regions/religions.  Consider self-registration, auto-registration of valid-looking but random identities.
 2b. Try SETI@Home style noise, where a large number of devices compute/produce small amounts of signal sent out along a given channel
3.  Social steganography? Embed real discussions among a few friends with lots of chatbots? so it’s hard to know which comments are real to find participants to trace or lean on.  [Or even change which apparent participant in a channel is the real person communicating, over time].  Possibly not helpful if subversion happens at the human level using the tapped-in comms device.
4.  Find ways to confound tracking and data-tracing.
 4a. Make mixing (or air-gap) services widely / anonymously available 
 4b.  Fake geo-tag generation. Fake GPS data from a group of users’ phones so it can’t be seen that they are all gathering together. Emit randomized (but logical) GPS coordinates when requested if turned on. ++
5.  Randomized salting of communication, to provide plausible deniability for those who pass on wrong information, and to spot-check members of a group for currently being a leak.
Ex: Encrypted group chat has pairwise encryption now.  No guarantee you get the same message as someone else in the group?  You could implement round-robin disinformation where one member of a group chat gets different info than the rest [and you could randomly select who gets bad info to see if outsiders sweep in / show up at the wrong place]
6.  Signalling: Be open about some of the above preparation, so that all parties know there are less certain returns on relying on such information.  Share how to build a system like this [specifics?] that anyone can adopt unilaterally without active coordination.
7.  Open books: imagine ways to share access to your toolchain to friends, self-surveillance to let everyone observe there is no or limited collaboration with dangerous parties.
8.  Collective multi-national insurance? to offset risks of a bubble of tyranny in one place: a pool that will help you relocate, find jobs/home in another jurisdiction…  Similarly: flesh out details of potential future costs, currently handled by the public, that might become individual costs under f – in case you have to start paying for them yourself.
   8a.  Related: collective libersurance: investing in a libertarian solution, that stops relying on government to provide those shared services (EPA protection, health insurance, &c) : leaving less on the table for a governmental shift to distort.
   8b.  Counterpoint: you might be prevented from doing this? if the government is explicitly propping up one industry (coal) over another.  Gov occupies a bunch of fields that individuals can’t use.
   
9.  Reduce reliance on your region’s infrastructure. Practice living through blackouts, emphasize taking your gadgets off-grid on a regular basis, ensuring they still work.  Ditto for plumbing.
10.  Preserve multinational free-trade zones, black markets, networks outside of national jurisdictions, not as terribly large or strong, but with reasonable burst capacity and robust to crushing.  So that there is always a functioning side channel.  [Ex: ?? falls in Lat Am, Kowloon City]
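The round-robin idea in option 5 above can be sketched in a few lines. This is only an illustration of the bookkeeping, with hypothetical names throughout: each round, one randomly chosen member receives a decoy variant, and the sender remembers who it was.

```python
import random


def assign_variants(members, real_message, decoy_message, rng=random):
    """Pick one member at random to receive the decoy; everyone else gets the real message.

    Returns (canary, deliveries), where deliveries maps each member to their text.
    """
    canary = rng.choice(members)
    deliveries = {
        member: (decoy_message if member == canary else real_message)
        for member in members
    }
    return canary, deliveries


# Hypothetical round: if outsiders act on the decoy, this round's canary is suspect.
canary, deliveries = assign_variants(
    ["ana", "ben", "chi", "dov"],
    real_message="meet at the library, 6pm",
    decoy_message="meet at the cafe, 6pm",
)
```

Because the choice is random and openly part of the design, a member who passes on wrong details under questioning can plausibly say that was what they were told.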

Related ideas

1. Fix security holes in current distributed communication.
  1a.  Metadata about who’s using what network and when is still sharable;  WeChat is not very secure – even being in a channel can make you guilty and rounded up.  IPFS is great as far as it goes, but their routing mechanism still shows the node-interconnection-graph, which as with bittorrent can show who seeds/shares/acts as a hub.
  1b.  Iterated/ decentralization? needed.  A mostly-decentral system with central elements can be more vulnerable than a robustly-central system that acknowledges this as a weakness and prepares for it. 
2. Consider multinational/extranational decision-making and stakeholding, so no core stakeholder group can be entirely dominated by a central national actor
3. Keep doing this work transparently and publicly.  Increase security for discussing & updating & suggesting new ideas. 


Designing life for episodic tyranny | 1: Secure toolchains
Friday November 11th 2016, 6:00 pm
Filed under: Aasw,Blogroll,chain-gang

See also Part 2: social networks

Motivation

Classify your local environment according to how much freedom you have to create and share tools, access those of others, and communicate across secure networks.  
  • In a “Tier 1” environment you have access to all popular security technology, and can build whatever infrastructure you want, entirely within your control.  
  • In a “Tier 2” environment, central network nodes and critical infrastructure all have backdoors and logging, and no one is allowed to distribute strong cryptography that some central group is unable to break.  
  • In a “Tier 3” environment, using secure tools and all but trivial cryptography is illegal – you shouldn’t have anything to hide.  Even talking about such tools may put you on a blacklist.  A central group that enforces the law may also access, modify, or reassign your work and possessions at will.
Say you live in a Tier 1 jurisdiction, which controls land, banks, and physical infrastructure.  Periodically, it shifts for a time to a Tier 3 regime, which may make abrupt changes at any depth in society to suit the fashion of the moment.
 
While in the latter regime, you can’t always trust the law or social norms to preserve
  • Your right to communicate with others
  • Your right to use your own tools and resources
  • The visibility (to you and those around you) of how your rights and tools are changing, if these are taken away

Most infrastructure in such an environment becomes untrustworthy.  Imagine losing trust in AT&T, Google, Symantec, Cisco.  (Even if you trust the people who remain running the system, they might no longer be in full control, or may not be able to inform you if your access was altered, filtered, compromised.)  

What can you do while in a Tier 1 regime to moderate the periods where you have fewer rights?

These are some quick thoughts on the topic, from a recent discussion.  Improvements and other ideas are most welcome.

Technical design decisions to improve resilience:

1.  multi-homing, letting users choose their jurisdiction.  for instance, let you choose from a number of wholly independent services running almost the same stack, each within a different jurisdiction.
  1a.  Be able to choose who hosts your data, tools, funds.  E.g., fix current US-EU policy – give users choice of where data resides and under which laws.
  1b.  Measure: how long it takes to shift key storage / control elements between jurisdictions, copying rather than mirroring any required pieces.  Make it possible to shift on the timescale of expected transition between Tiers.
 
2. Give users advance warning that the threat to their data/account is rising; make it possible to quickly change what is stored [not just what is shared with other users].
2a. Learn explicitly from how banking does this (cf. concerns among many users about funds being frozen, for less-than-fascist conflicts).
 
3. work with telcos to add built-in IP and egress-fuzzing
   3a.  consider what china does: blocking per IP, by each egress point.  harder but possible in the US.
 
4. multi-source hardware, and any other needed ‘raw materials’ at each level of abstraction
  4a.  Both multiple sources w/in a jurisdiction (for the first stages when only some producers have lost control of their own production), and in different jurisdictions.
 
5. have systems that can’t be subverted too quickly: relying on the temporary nature of the fascist trend.  (if it lasts long enough, everything mentioned here can be undone; design to make that take a reasonable amount of time and a lot of humanpower)
  5a.  add meshes – like the electrical grid, that have local robustness. When central management disappears or ‘shuts things off’, local communities can build a smaller-scale replica that uses the physical infrastructure [even if they have to go in and replace control nodes, like generators, by hand]. 
  5b.  make change happen on the lifescale of hardware that has to be replaced.  e.g. a bulk of investment in dumb pipes that have to be replaced or removed by hand.  Systems with high upfront infrastructure costs that are easy to maintain but relatively hard to replace.
 
6. design alternate solutions for each level of the stack that have minimal central requirements.  E.g. fuel-powered USB chargers, gas generators, solar panels, desktop fabs and factories.  Make it easy to produce inferior, but usable, components if the high-economy-of-scale sources dry up.
 
7. keep strong contacts with someone in the existing [government], even when there’s nothing that you need to lobby for. that makes transitions smoother, and you less likely to be surprised by change.  Cf. Idea 3: invest heavily into those social relations.
 
8. distribute end-user tools that let individuals adapt under hostile conditions.  Examples:
  8a.  Ship antennas or power sources flexible enough to be modded.  
  8b.  Allow broadcast updates to the latest version, but allow users to freeze the version at one they support.  
  8c.  Support unblockable rollbacks to earlier revisions: something like a hardware button that rolls back to one of a few previous versions, if you realize you’ve installed malware or controlware.  You can still push updates as aggressively as you like, as long as the provider can hint that a new snapshot is useful as the risk of overtaking increases.
  8d.  provide some sort of checksum to see if firmware has changed [even with above may be possible for new software to change that option; but users should at least know]
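For 8d, the simplest version of such a check is a cryptographic hash of the firmware image compared against a value recorded when the image was known to be good. A minimal sketch, assuming the image is readable as a file and the known-good digest is stored somewhere the new software cannot rewrite (both assumptions; the function names are illustrative):

```python
import hashlib
import hmac


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def firmware_unchanged(path, known_good_hex):
    """Return True if the image at path still matches the recorded SHA-256 digest."""
    return hmac.compare_digest(file_sha256(path), known_good_hex.strip().lower())
```

As the bracketed caveat in 8d says, compromised software could also tamper with the checker itself, so the comparison is most useful when run from a separate, trusted device.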

Related ideas

1. consider reasonable steps to degrade control:  
  1a.  starting with increased infra for those who align with government views.  (or decreased for those breaking new / stringent laws)
  1b.  compare how voting is restricted, liquidity is restricted.
 
2. consider: is it better to be asset-heavy or asset-light?  
  2a.  usefulness of land and resources to use, vs. having things that can’t be claimed / revoked. networks rather than assets – land, tools?  
  2b.  compare liquidity of favors to that of funds or items.
 
3. compare current work with regulations/regulators.  in politics, relationships w/in a commission made it valuable to have a rotating door.  Invest in those relations, considering also 2) above – invest before assets are frozen to offset risk.
 
4. compare how US corps plan for inter-state shifts within the country.  Including being flexible enough to move to a new state for favorable regs, or shift ops/people among different centers.
5. Currently there’s network-tracking of IP addresses in malls, &c.  There are tools now that have a ‘War mode’ that randomizes your MAC or other address all the time.  Injecting noise into bluetooth and other tracking is straightforward.
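To make the MAC-randomizing idea in item 5 concrete: a randomized address is just six random bytes with two bits of the first byte fixed, so that it is marked "locally administered" and is not a multicast address. The function name is an assumption, and actually applying the address is OS-specific and not shown here.

```python
import secrets


def random_private_mac():
    """Generate a random unicast, locally administered MAC address as a string."""
    octets = bytearray(secrets.token_bytes(6))
    # Set the locally-administered bit (0x02) and clear the multicast bit (0x01).
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)
```

Rotating to a fresh address per network or per session keeps a tracker from tying successive sightings to the same device.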


Psych statistics wars: new methods are shattering old-guard assumptions
Thursday October 20th 2016, 12:51 pm
Filed under: %a la mod,chain-gang,citation needed,Glory, glory, glory,knowledge,meta,metrics

Recently, statistician Andrew Gelman has been brilliantly breaking down the transformation of psychology (and social psych in particular) through its adoption of and creative use of statistical methods, leading to an improved understanding of how statistics can be abused in any field, and of how empirical observations can be [unwittingly and unintentionally] flawed. This led to the concept of p-hacking and other methodological fallacies which can be observed in careless uses of statistics throughout scientific and public analyses. And, as these new tools were used to better understand psychology and improve its methods, existing paradigms and accepted truths have been rapidly changed over the past 5 years. This shocks and anguishes researchers who are true believers in “hypotheses vague enough to support any evidence thrown at them”, and have built careers around work supporting those hypotheses.

Here is Gelman’s timeline of transformations in psychology and in statistics, from Paul Meehl’s argument in the 1960s that results in experimental psych may have no predictive power, to PubPeer, Brian Nosek’s reproducibility project, and the current sense that “the emperor has no clothes”.

Here is a beautiful discussion a week later, from Gelman, about how researchers respond to statistical errors or other disproofs of part of their work.  In particular, how co-authors handle such new discoveries, either together or separately.

At the end, one of its examples turns up a striking example of someone taking these sorts of discoveries and updates to their work seriously: Dana Carney‘s public CV includes inline notes next to each paper wherever significant methodological or statistical concerns were raised, or significant replications failed.

Carney makes an appearance in his examples because of her most controversially popular research, with Cuddy and Yap, on power posing.  A non-obvious result (that holding certain open physical poses leads to feeling and acting more powerfully) became extremely popular in the popular media, and has generated a small following of dozens of related extensions and replication studies, which starting in 2015 began to be run with large samples and at high power, at which point the effects disappeared.  Interest within social psychology in the phenomenon, as an outlier of “a popular but possibly imaginary effect”, is so great that the journal Comprehensive Results in Social Psychology has an entire issue devoted to power posing coming out this Fall.
Perhaps motivated by Gelman’s blog post, perhaps by knowledge of the results that will be coming out in this dedicated journal issue [which she suggests are negative], she put out a full two-page summary of her changing views on her own work over time, from conceiving of the experiment, to running it with the funds and time available, to now deciding there was no meaningful effect.  My hat is off to her.  We need this sort of relationship to data, analysis, and error to make sense of the world. But it is a pity that she had to publish such a letter alone, and that her co-authors didn’t feel they could sign onto it.

Update: Nosek also wrote a lovely paper in 2012 on Restructuring incentives to promote truth over publishability [with input from the estimable Victoria Stodden] that describes many points at which researchers have incentives to stop research and publish preliminary results as soon as they have something they could convince a journal to accept.



Archiving Web links: Building global layers of caches and mirrors
Sunday June 12th 2016, 4:23 pm
Filed under: international,knowledge,meta,metrics,popular demand,wikipedia

The Web is highly distributed and in flux; the people using it, even more so.  Many projects exist to optimize its use, including:

  1. Reducing storage and bandwidth:  compressing parts of the web; deduplicating files that exist in many places, replacing many with pointers to a single copy of the file [Many browsers & servers, *Box]
  2. Reducing latency and long-distance bandwidth:  caching popular parts of the web locally around the world [CDNs, clouds, &c]
  3. Increasing robustness & permanence of links: caching linked pages (with timestamps or snapshots, for dynamic pages) [Memento, Wayback Machine, perma, amber]
  4. Increasing interoperability of naming schemes for describing or pointing to things on the Web, so that it’s easier to cluster similar things and find copies or versions of them [HvdS’s 15-year overview of advancing interop]

This week I was thinking about the 3rd point. What would a comprehensively backed-up Web of links look like?  How resilient can we make references to all of the failure modes we’ve seen and imagined?  Some threads for a map:

  1. Links should include timestamps; important ones should request archival permalinks.
    • When creating a reference, sites should notify each of the major cache-networks, asking them to store a copy.
    • Robust links can embed information about where to find a cache in the <a> tag that generates the link (and possibly a fuzzy content hash?).
    • Permalinks can use an identifier system that allows searching for the page across any of the nodes of the local network, and across the different cache-networks. (Browsers can know how to attempt to find a copy.)
  2. Sites should have a coat of amber: a local cached snapshot of anything linked from that site, stored on their host or a nearby supernode.  So as long as that site is available, snapshots of what it links to are, too.
    • We can comprehensively track whether sites have signalled they have an amber layer.  If a site isn’t yet caching what they link to, readers can encourage them to do so or connect them to a supernode.
    • Libraries should host amber supernodes: caches for sites that can’t host those snapshots on their host machine.
  3. Snapshots of entire websites should be archived regularly
    • Both public snapshots for search engines and private ones for long-term archives.
  4. A global network of mirrors (a la [C]LOCKSS) should maintain copies of permalink and snapshot databases
    • Consortia of libraries, archives, and publishers should commit to a broad geographic distribution of mirrors.
      • mirrors should be available within any country that has expensive interconnects with the rest of the world;
      • prioritization should lead to a kernel of the cached web that is stored in ‘seed bank’ style archives, in the most secure vaults and other venues
  5. There should be a clear way to scan for fuzzy matches for a broken link. Especially handy for anyone updating a large archive of broken links.
    • Is the base directory there? Is the base URL known to have moved?
    • Are distant-timestamped versions of the file available?  [some robustlink implementations do this already]
    • Are there exact matches elsewhere in the web for a [rare] filename?  Can you find other documents with the same content hash? [if a hash was included in the link]
    • Are there known ways to contact the original owner of the file/directory/site?
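
A rough sketch of two of the threads above, in Python: building a link that carries its own archival pointers, and listing the places to look first when a link breaks. The data-versionurl and data-versiondate attribute names follow the Robust Links convention as I understand it; data-contenthash, the function names, and the example URLs are hypothetical, and the hash is an exact SHA-256 rather than a fuzzy one.

```python
import hashlib
from html import escape
from typing import List, Optional
from urllib.parse import urlsplit, urlunsplit


def content_hash(body: bytes) -> str:
    """Exact SHA-256 digest of a page body (a fuzzy hash would need another library)."""
    return hashlib.sha256(body).hexdigest()


def robust_link(url: str, text: str, snapshot_url: str, snapshot_date: str,
                body: Optional[bytes] = None) -> str:
    """Render an <a> tag that records where and when a snapshot was taken."""
    attrs = {
        "href": url,
        "data-versionurl": snapshot_url,    # where a cached copy lives
        "data-versiondate": snapshot_date,  # when the link was created
    }
    if body is not None:
        # Hypothetical attribute: lets a reader search for copies by content.
        attrs["data-contenthash"] = "sha256:" + content_hash(body)
    rendered = " ".join(f'{k}="{escape(v, quote=True)}"' for k, v in attrs.items())
    return f"<a {rendered}>{escape(text)}</a>"


def fallback_candidates(url: str) -> List[str]:
    """Parent directories of a broken URL, nearest first, ending at the site root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if depth:
            path += "/"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


if __name__ == "__main__":
    print(robust_link("https://example.org/a/b/file.pdf", "a report",
                      "https://archive.example/snap/1", "2016-06-12", b"hello"))
    for candidate in fallback_candidates("https://example.org/a/b/file.pdf"):
        print(candidate)
```

This only covers the "is the base directory there?" check; looking for distant-timestamped versions or matching content hashes would need a cache-network to query.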

Related questions: What other aspects of robustness need consideration? How are people making progress at each layer?  What more is needed to have a mesh of archived links at every scale? For instance, WordPress supports a chunk of the Web; top CDNs cache more than that. What other players can make this happen?  What is needed for them to support this?



Cop dines with homeless mother of four, gets kudos. Her plight is ignored.
Thursday May 19th 2016, 2:48 pm
Filed under: fly-by-wire,Not so popular,Rogue content editor,Uncategorized

Recent news blurbs across our fair state applaud a state trooper for “sharing lunch with a homeless mother of four” (headline language).

This was noticed and photographed by a passerby; the trooper was then identified by the state police, who posted about it on their webpage, praising him for his good deed; a CBS affiliate spent hours tracking down both the photographer and the woman for a video interview.  They got quotes from her about: being a ‘homeless panhandler’, his common decency, and her surprise.  She was described by her motherhood, her panhandling, and being down on her luck.

And that’s it!  Nothing thoughtful about why this young mother is homeless in Fall River, or what will become of her family.  No opportunities to reach out and fix a tragedy. She clearly needs more than one good meal and healthcare, but the outpouring of interest in the viral photo is entirely directed towards how and whether to applaud the police officer [who, quite decently, refused to be interviewed], how this reflects on police officers everywhere, how this perhaps restores faith in humanity.

(Update: It seems the trooper and one local news affiliate did find a way to help her temporarily with material support, a bit after that event. And a few cases like this have famously included a crowdfunding campaign. But the most newsworthy issue is: how does this happen in our society, what can we do to fix that, and what permanent fixes could work for the family in the spotlight.)



The Underlay — Brazing public knowledge graphs for the public good
Sunday February 07th 2016, 2:09 pm
Filed under: Uncategorized

Lately I have been dreaming of knowledge graphs, iteratively refined and detailed, that allow us to pore over what we know and enhance our knowledge.  A framework and language for amplifying, amending, annotating, qualifying, contextualizing, decomposing, reconstituting, synthesizing and comparing specific and uniquely-named elements of trains of logic, thought, computation, interpolation, and other inference.

One shared element that I keep coming back to is an underlayer of data, assertions, and reported knowledge, designed to support many different mesh sizes (for the conceptual mesh used to describe an observation), to let you expand observations into increasingly granular bits, and to add context and background, tracing them back to their original observation.  To make use of it efficient, it would also support clusterings (for the equivalence class of names that, in the current context, resolve to the same thing), and filters (for deciding what data to include or exclude in a given view).

For all of this, I propose a collective project to which we can all contribute: an Underlay, comprised of networks of interlinked, structured data.  Each point versioned, meshed, linked to its sources, and linking likewise to the composites and analyses that have relied on it.  Each underlay a composite of many different layers, each with its own canonical mesh-grain; and the global Underlay project a constellation of the individual underlays, providing a way to name and disambiguate an idea or claim or discussion.



Reader: Discover the effect of happiness on your health today
Wednesday November 25th 2015, 11:44 pm
Filed under: %a la mod,wikipedia

“When I was 5 years old, my mother always told me that happiness was the key to life.  When I went to school, they asked me what I wanted to be when I grew up.  I wrote down happy. They told me I didn’t understand the assignment, and I told them they didn’t understand life.”  —Lennon

From the BODYWORLDS exhibit in Amsterdam, full of flayed and preserved human bodies.



WMF Audit Committee update – Call for Volunteers
Friday June 05th 2015, 7:07 pm
Filed under: wikipedia

The Wikimedia Foundation has an Audit Committee that represents its Board in overseeing financial and accounting matters.  This includes reviewing the Foundation’s financials, its annual tax return, and an independent audit by KPMG. For details, and the current committee members, see the WMF’s Audit Committee page and the Audit Committee charter.

I currently serve as the Audit Committee chair.  We are forming the committee for 2015-16, and are looking for volunteers from the community.

Members serve on the Committee for one year, from July through July.  The Foundation files its annual tax return in the U.S. in April, and publishes its annual plan in June.  Committee members include trustees from the Foundation’s board and contributors from across the Wikimedia movement.

Time commitment for the committee is modest: reviews are carried out via three or four conference calls over the course of the year.  The primary requirement is financial literacy: some experience with finance, accounting or auditing.

If you are interested in joining the Committee for the coming year, please email me at sj at wikimedia.org with your CV, and your thoughts on how you could contribute. Thank you!



Soft, distributed review of public spaces: Making Twitter safe
Monday October 27th 2014, 2:56 pm
Filed under: %a la mod,ideonomy,knowledge,popular demand,wikipedia

Successful communities have learned a few things about how to maintain healthy public spaces. We could use a handbook for community designers gathering effective practices. It is a mark of the youth of interpublic spaces that spaces such as Twitter and Instagram [not to mention niche spaces like Wikipedia, and platforms like WordPress] rarely have architects dedicated to designing and refining this aspect of their structure, toolchains, and workflows.

Some say that ‘overly’ public spaces enable widespread abuse and harassment. But the “publicness” of large digital spaces can help make them more welcoming in some ways than physical ones, where it is harder to remove graffiti or eggs from homes or buildings, and than niche ones, where clique formation and systemic bias can dominate. For instance, here are a few ‘soft’ (reversible, auditable, post-hoc) tools that let a mixed ecosystem review and maintain their own areas in a broad public space:

Allow participants to change the visibility of comments:  Let each control what they see, and promote or flag it for others.

  • Allow blacklists and whitelists, in a way that lets people block out harassers or keywords entirely if they wish. Make it easy to see what has been hidden.
  • Rating (both average and variance) and tags for abuse or controversy can allow for locally flexible display.  Some simple models make this hard to game.
  • Allow things to be incrementally hidden from view.  Group feedback is more useful when the result is a spectrum.
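
As a small illustration of rating by both average and variance, here is a Python sketch that maps a comment's ratings onto a spectrum of display states. The 1-to-5 scale, the function name, the state names, and both thresholds are my own assumptions, not anything a particular platform uses.

```python
from statistics import fmean, pvariance


def summarize(ratings, hide_below=2.0, controversy_above=2.0):
    """Pick a display state from the mean and variance of 1-5 ratings.

    Both thresholds are illustrative defaults; a community (or a single
    reader) would tune them to taste.
    """
    if not ratings:
        return {"mean": None, "variance": None, "display": "show"}
    mean = fmean(ratings)
    variance = pvariance(ratings)
    if variance > controversy_above:
        display = "collapse"  # contested: show collapsed, tagged as controversial
    elif mean < hide_below:
        display = "hide"      # broadly disliked: hidden by default, still recoverable
    else:
        display = "show"
    return {"mean": mean, "variance": variance, "display": display}


if __name__ == "__main__":
    for sample in ([5, 5, 4, 5], [1, 1, 2, 1], [1, 5, 1, 5]):
        print(sample, summarize(sample)["display"])
```

Checking variance before the mean keeps a comment that splits its audience from being treated the same as one that nearly everyone dislikes, which is the point of tracking both numbers.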

Increase the efficiency ratio of moderation and distribute it: automate review, filter and slow down abuse.

  • Tag contributors by their level of community investment. Many who spam or harass try to cloak in new or fake identities.
  • Maintain automated tools to catch and limit abusive input. There’s a spectrum of response: from letting only the poster and moderators see the input (cocooning), to tagging and not showing by default (thresholding), to simply tagging as suspect (flagging).
  • Make these and other tags available to the community to use in their own preferences and review tools
  • For dedicated abuse: hook into penalties that make it more costly for those committed to spoofing the system.
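
The spectrum of responses above (flagging, thresholding, cocooning) can be sketched in Python as follows. The abuse score, the investment discount, and every threshold are hypothetical placeholders; a real system would have to supply its own classifier and tune these numbers.

```python
from enum import Enum


class Response(Enum):
    ALLOW = "allow"
    FLAG = "flag"            # shown, but tagged as suspect
    THRESHOLD = "threshold"  # tagged and not shown by default
    COCOON = "cocoon"        # visible only to the poster and moderators


def choose_response(abuse_score: float, author_investment: float) -> Response:
    """Map an abuse score in [0, 1] to a response.

    author_investment in [0, 1] discounts the score, so long-standing
    contributors get more benefit of the doubt than brand-new accounts.
    """
    adjusted = abuse_score * (1.0 - 0.5 * author_investment)
    if adjusted >= 0.9:
        return Response.COCOON
    if adjusted >= 0.6:
        return Response.THRESHOLD
    if adjusted >= 0.3:
        return Response.FLAG
    return Response.ALLOW


def visible_to(response: Response, viewer: str, author: str,
               moderators: set, opted_in: bool = False) -> bool:
    """Decide whether a viewer sees the input under a given response."""
    privileged = viewer == author or viewer in moderators
    if response is Response.COCOON:
        return privileged
    if response is Response.THRESHOLD:
        return privileged or opted_in
    return True


if __name__ == "__main__":
    print(choose_response(0.95, 0.0))
    print(visible_to(Response.COCOON, "reader", "poster", {"mod"}))
```

All three responses are reversible: nothing is deleted, so a moderator or the community can later lift a tag or change who sees the input.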

You can’t make everyone safe all of the time, but you can dial down behavior that is socially unwelcome (by any significant subgroup) by a couple of orders of magnitude.  Of course these ideas are simple and only work so far.  For instance, in a society at civil war, where each half is literally threatened by the sober political and practical discussions of the other half, public speech may simply not be safe.



Utter License, n.: A minimal way to grant all rights to a work
Tuesday October 21st 2014, 3:03 am
Filed under: %a la mod,Aasw,null,poetic justice,wikipedia

[You may do UTTERLY ANYTHING with this work.]

UTTER ♥2

 

Utter details and variants



Righteousness and peace, and recovering at last what we threw away
Saturday October 04th 2014, 6:10 pm
Filed under: Uncategorized

Dinesen’s short story Babette’s Feast includes a lovely riff on Psalm 85. This is quoted in full towards the end, and refined in the film. In one of those revealing errors highlighting the fragility of citation, there is a canonical English misquote online, repeated in a thousand places, but the correct quote did not exist.

I leave the quote here in honor of the season. And I wish you, dear reader, a confident and grateful year, full of potential and choiceness.


Mercy and truth are met together.
Righteousness and peace have kissed one another.

Man, in his weakness and short-sightedness,
believes he must make choices in this life.
He trembles at the risks he must take.
We know that fear.
But, no - our choice is of no importance.
There comes a time when our eyes are opened.
And we come to realize at last that mercy is infinite.

We need only await it with confidence,
and receive it with gratitude.
Mercy imposes no conditions.
And, see: Everything we have chosen has been granted to us.
And everything we renounced — has also been granted.
Yes, we even get back what we threw away.

For mercy and truth are met together.
And righteousness and peace have kissed one another.



Lila Tretikov named as Wikimedia’s upcoming ED
Thursday May 01st 2014, 5:49 pm
Filed under: fly-by-wire,ideonomy,knowledge,popular demand,wikipedia

And there was much rejoicing. Welcome, Lila!



