Forging Social Proof: the Networked Turing Test Rules the First AI War
A few years ago I wrote about how our civilization was forfeiting the zeroth AI war — allowing individual attention hacks, deployed at scale, to diminish and replace our natural innovation and productivity in every society. We gained efficiency in every area of life, and then let our new wealth and spare time get absorbed by newly-efficient addictive spirals.
Exploit culture
This war for attention affects what sort of society we can hope to live in. Channeling so much wealth to attention-hackers, and the networks of crude AI tools and gambling analogs that support them, has strengthened an entire industry of exploiters, allowing a subculture of engineers and dealmakers to flourish. That industry touches on fraud, propaganda, manipulation of elections and regulation, and more, all of which influence what social equilibria are stable.
The first real AI war
Now we are facing the first real artificial-intelligence war — dominated by entities that appear as avatars of independent, intelligent people, but are artificial, scripted, automated.
What is new here? Earlier, low-tech versions of this required no machine learning or programming: they used the veil of pseudonymity to fake authorship, votes, and small-scale consensus. In response, we developed layers of law and regulation around those earlier attacks — fraud, impersonation, and scams are illegal. AI can smoothly scale such attacks to millions of comments on public bills, and to forged, microtargeted social proof in millions of smaller group interactions online. And these scaled attacks are often still legal, or only lightly penalized and rarely enforced.
(more…)
Trump’s tee-totalling: why are so many meetings held on the golf course?
It is time we stop talking about “golf time” as leisure time away from the presidency, and start treating it as a primary channel for meetings, negotiations, and decision-making. (See for instance the last line of this remarkable story.)
Trump’s presidential schedule is full of empty days and golf weekends – roughly two days a week have been spent at his own resorts throughout his presidency. Combined with his historically light work schedule, averaging under two hours of meetings per day, this means the majority of his small-group meetings may be taking place at his resorts.
He has also directed hundreds of government groups, and countless diplomatic partners and allies, to stay at his resorts and properties.
On his properties, his private staff control the access list, security video, and other records. They can also provide a degree of privacy from both the press and government representatives that no federal property could match.
How might we address the issues involved with more clarity?
Paying himself with government funds
To start with, this is self-dealing on an astronomical scale: the 300+ days spent at his golf clubs and other properties have cost the US government, by a conservative estimate, $110 million. The cost of encouraging the entire government to stay at Trump properties is greater still, if harder to estimate. (more…)
Anonymizing data on the users of Wikipedia
Updated for the new year: with specific things we can all start doing 🙂
Wikipedia currently tracks and stores almost no data about its readers and editors. This persistently foils researchers and analysts inside the WMF and its projects, and much of that self-imposed limitation is unnecessary.
Not tracked last I checked: sessions, clicks, where on a page readers spend their time, time spent on page or site, returning users. There is a small exception: data that can fingerprint a user’s use of the site is stored for a limited time, made visible only to developers and checkusers, in order to combat sockpuppets and spam.
This is all done in the spirit of preserving privacy: not gathering data that could be used by third parties to harm contributors or readers for reading or writing information that some nation or other powerful group might want to suppress. That is an essential concern, and Wikimedia’s commitment to privacy and pseudonymity is wonderful and needed.
However, the data we need to improve the site and understand how it is used in aggregate doesn’t require storing personally identifiable data that could meaningfully be used to target specific editors. Rather than throwing out data that we worry would expose users to risk, we should be fuzzing and hashing it to preserve the aggregates we care about. Browser fingerprints, including the username or IP, can be hashed; timestamps and anything that could be interpreted as geolocation can have noise added to them.
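As a minimal sketch of what “hash and fuzz” could mean in practice (this is purely illustrative, not anything the projects run; the salt, jitter window, and grid size are made-up parameters):

```python
# Illustrative sketch: keyed-hash a browser fingerprint so it cannot be
# reversed or joined across salt periods, and coarsen timestamps and location.
import hashlib
import hmac
import random

SALT = b"rotate-me-periodically"  # hypothetical secret, discarded on rotation


def pseudonymous_id(fingerprint: str) -> str:
    """Keyed hash of a fingerprint (UA + IP + ...): stable enough to count
    distinct users within one salt period, useless for identifying anyone
    once the salt is thrown away."""
    return hmac.new(SALT, fingerprint.encode(), hashlib.sha256).hexdigest()[:16]


def fuzz_timestamp(ts: float, jitter_seconds: float = 900) -> float:
    """Add uniform noise (here +/- 15 minutes) so exact visit times are lost."""
    return ts + random.uniform(-jitter_seconds, jitter_seconds)


def coarse_region(lat: float, lon: float, cell_degrees: float = 5.0):
    """Snap geolocation to a large grid cell instead of storing coordinates."""
    return (round(lat / cell_degrees) * cell_degrees,
            round(lon / cell_degrees) * cell_degrees)
```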
We could then know things such as (a toy sketch of one such aggregate follows this list):
- the number of distinct users in a month, by general region
- how regularly each visitor comes to the projects; which projects + languages they visit [throwing away user and article-title data, but seeing this data across the total population of ~1B visitors]
- particularly bounce rates and times: people finding the site, perhaps running one search, and leaving
- the number of pages viewed in a session, its tempo, or the namespaces they are in [throwing away titles]
- the reading + editing flows of visitors on any single page, aggregated by day or week
- clickflows from the main page or from search results [this data is gathered to some degree; I don’t know how reusably]
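Here is a toy illustration of computing two of these aggregates (distinct visitors and bounce rates) from events that contain only the hashed id, fuzzed timestamp, and grid-cell region of the sketch above. The field names and the bounce definition are assumptions; at the scale of ~1B visitors one would use a cardinality sketch such as HyperLogLog rather than in-memory sets.

```python
# Toy aggregate: distinct visitors and bounce rate per (month, region),
# computed from pseudonymized events. Event fields are invented for this sketch.
from collections import defaultdict


def monthly_aggregates(events):
    """events: iterable of dicts like
    {"id": "ab12...", "month": "2017-01", "region": (45.0, -120.0), "pages": 1}"""
    visitors = defaultdict(set)             # (month, region) -> distinct hashed ids
    sessions = defaultdict(lambda: [0, 0])  # (month, region) -> [bounces, sessions]
    for e in events:
        key = (e["month"], e["region"])
        visitors[key].add(e["id"])
        sessions[key][1] += 1
        if e["pages"] <= 1:                 # one page viewed, then gone: a bounce
            sessions[key][0] += 1
    return {key: {"distinct_visitors": len(ids),
                  "bounce_rate": sessions[key][0] / sessions[key][1]}
            for key, ids in visitors.items()}
```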
These are just rough descriptions — great care must be taken to vet each aggregate for how well it preserves privacy. But this is a known practice that we could carry out with expert attention.
What keeps us from doing this today? Some aspects of this are surely discussed in places, but such discussion is hard to find. Past discussions I recall were brought to an early end by [devs worrying about legal] or [legal worrying about what is technically possible].
Discussion of obstacles and negative-space is generally harder to find on wikis than discussion of works-in-progress and responses to them: a result of a noun-based document system that requires discussions to be attached to a clearly-named topic!
What we can do, as researchers and as data fiduciaries:
- As site-maintainers: start gathering this data, and appoint a couple of privacy-focused data analysts to propose how to share it.
- Identify challenges, open problems, solved problems that need implementing.
- Name the (positive, future-crafting, project-loving) initiative to do this at scale, and the reasons to do so.
- By naming the positive aspect, distinguish this from a tentative caveat to a list of bad things to avoid, which leads to inaction. (“Never gather data! Unless you have extremely good reasons, someone else has done it before, it couldn’t possibly be dangerous, and no one could possibly complain.”)
- As data analysts (internal and external): write about what better data enables. Expand the list above, include real-world parallels.
- How would this illuminate the experience of finding and sharing knowledge?
- Invite other sociologists, historians of knowledge, and tool-makers to start working with stub APIs that at first may not return much data.
Without this we remain in the dark — and, like libraries that have found patrons leaving their privacy-preserving (but less helpful) environs for data-hoarding (and very handy) book-explorers, we remain vulnerable to disuse.
Psych statistics wars: new methods are shattering old-guard assumptions
Recently, statistician Andrew Gelman has been brilliantly breaking down the transformation of psychology (and social psych in particular) through its adoption and creative use of statistical methods, leading to an improved understanding of how statistics can be abused in any field, and of how empirical observations can be [unwittingly and unintentionally] flawed. This has led to the concept of p-hacking and other methodological fallacies, which can be observed in careless uses of statistics throughout scientific and public analyses. And as these new tools were used to better understand psychology and improve its methods, existing paradigms and accepted truths have been rapidly overturned over the past 5 years. This shocks and anguishes researchers who are true believers in “hypotheses vague enough to support any evidence thrown at them”, and who have built careers around work supporting those hypotheses.
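A back-of-the-envelope simulation (mine, not anything from Gelman’s posts) shows how one flavor of p-hacking inflates false positives: measure 20 unrelated outcomes on pure noise, report whichever test clears p < 0.05, and you “find” an effect most of the time. The sample sizes and outcome counts here are arbitrary.

```python
# Simulate many experiments with NO true effect, 20 measured outcomes each,
# and count how often at least one outcome reaches p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_outcomes, n_per_group = 2000, 20, 30

false_positives = 0
for _ in range(n_experiments):
    a = rng.normal(size=(n_outcomes, n_per_group))  # group A, pure noise
    b = rng.normal(size=(n_outcomes, n_per_group))  # group B, pure noise
    pvals = stats.ttest_ind(a, b, axis=1).pvalue
    if pvals.min() < 0.05:                           # report the "best" outcome
        false_positives += 1

print(false_positives / n_experiments)  # roughly 0.64, not the nominal 0.05
```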
Here is Gelman’s timeline of transformations in psychology and in statistics, from Paul Meehl’s argument in the 1960s that results in experimental psych may have no predictive power, to PubPeer, Brian Nosek’s reproducibility project, and the current sense that “the emperor has no clothes”.
Here is a beautiful discussion a week later, from Gelman, about how researchers respond to statistical errors or other disproofs of part of their work. In particular, how co-authors handle such new discoveries, either together or separately.
At the end, it turns up a striking example of someone taking these sorts of discoveries and updates to their work seriously: Dana Carney’s public CV includes inline notes next to each paper wherever significant methodological or statistical concerns were raised, or significant replications failed.
Carney makes an appearance in his examples because of her most controversially popular research, with Cuddy and Yap, on power posing. A non-obvious result (that holding certain open physical poses leads to feeling and acting more powerfully) became extremely popular in the mass media, and generated a small following of dozens of related extensions and replication studies — which, starting in 2015, began to be run with large samples and at high power, at which point the effects disappeared. Interest within social psychology in the phenomenon, as an outlier of “a popular but possibly imaginary effect”, is so great that the journal Comprehensive Results in Social Psychology has an entire issue devoted to power posing coming out this Fall.
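The “large samples and high power” point is easy to make concrete with rough power arithmetic: the smaller the true effect, the more subjects a study needs before a null result means much. This is a standard normal-approximation formula, and the effect sizes below are illustrative, not estimates from the power-pose literature.

```python
# Approximate subjects per group for a two-sample test to detect a
# standardized effect d with 80% power at alpha = 0.05:
#   n ~= 2 * (z_{1-alpha/2} + z_{1-power})^2 / d^2
from scipy.stats import norm


def n_per_group(d, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2


for d in (0.8, 0.5, 0.2):
    print(d, round(n_per_group(d)))  # about 25, 63, and 392 per group
```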
Perhaps motivated by Gelman’s blog post, perhaps by knowledge of the results that will be coming out in this dedicated journal issue [which she suggests are negative], she put out a full two-page summary of her changing views on her own work over time, from conceiving of the experiment, to running it with the funds and time available, to now deciding there was no meaningful effect. My hat is off to her. We need this sort of relationship to data, analysis, and error to make sense of the world. But it is a pity that she had to publish such a letter alone, and that her co-authors didn’t feel they could sign onto it.
Update: Nosek also wrote a lovely paper in 2012 on Restructuring incentives to promote truth over publishability [with input from the estimable Victoria Stodden] that describes many points at which researchers have incentives to stop research and publish preliminary results as soon as they have something they could convince a journal to accept.
Real-estate investors in today’s Chile
In Puerto Varas, to be precise. An article by Sebastian. ᔥ madre.
“There are extraordinary landscapes, I think, and then there is this one. Those fields and towns hold a century-old pride that is moving.”

To “snub” you must find someone who can be made to feel inferior
“A snub,” defined Lady Roosevelt, “is the effort of a person who feels superior to make someone else feel inferior. To do so, he has to find someone who can be made to feel inferior.”
ᔥ Quote Investigator, ↬ Meredith Patterson
A New ‘Pedia: planning for the future of Wikipedia
Wikipedia has gotten more elaborate and complex to use. Adding a reference, marking something for review, uploading a file, or creating a new article now take many steps — and failing to follow them can lead to starting all over. The curators of the core projects are concerned with uniformly high quality, and impatient with contributors who don’t have the expertise and wiki-experience to create something according to policy. Good stubs or photos are deleted for failing to comply with one of a dozen policies, or for inadequate cites or license templates, even when they are in fact derived from reliable sources and freely licensed.
The Article Creation Wizard has a five-step process for drafting an article, after which it is submitted for review by a team of experienced editors, and finally moved to the article namespace. Seven steps from draft to approval is too much overhead for many. And the current notability guidelines on big Wikipedias exclude most local and specialist knowledge.
We need a simpler scratch-space to develop new material:
- A place not designed to be high quality, where everything can be in flux, possibly wrong, in need of clarification and polishing and correction.
- A place that can be used to build draft articles, images, and other media before posting them to Wikipedia
- A place where everyone is welcome to start a new topic, and share what they know: relying on verifiability over time (but not requiring it immediately), and without any further standard for notability
- A place with no requirements to edit: possibly style guidelines to aspire to, but where newbies who don’t know how the tools or system works are welcomed and encouraged to contribute more, and not chastised for getting things wrong.
Since this will be a new sort of compendium or comprehensive cyclopedia, covering all topics, it should have a new name. Something simple, say Newpedia. Scripts can be written to help editors work through the most polished Newpedia items and push them to Wikipedia, Wikisource, and Commons (a rough sketch of such a script follows). We could invite editors to start doing their rough work on Newpedia, to avoid the conflict and fast reversion on the larger wiki references that make them hard to use for quick new work.
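Such a script could be a thin pass over the MediaWiki Action API of both wikis. A hypothetical sketch: the newpedia.example.org endpoint is invented, and a real run would need a logged-in session and a fresh CSRF token before editing anything.

```python
# Hypothetical "promote a polished draft" sketch using the MediaWiki Action API.
# The Newpedia endpoint is invented; pushing an edit requires authentication.
import requests

SOURCE_API = "https://newpedia.example.org/w/api.php"  # hypothetical wiki
TARGET_API = "https://en.wikipedia.org/w/api.php"


def fetch_wikitext(session, api, title):
    """Fetch the current wikitext of a page."""
    r = session.get(api, params={
        "action": "query", "prop": "revisions", "rvprop": "content",
        "rvslots": "main", "titles": title,
        "format": "json", "formatversion": "2",
    })
    page = r.json()["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]


def push_draft(session, api, title, text, token):
    """Create or update a page. 'token' must be a CSRF token obtained from
    action=query&meta=tokens on a logged-in session; anonymous or tokenless
    edits will be rejected."""
    return session.post(api, data={
        "action": "edit", "title": title, "text": text,
        "summary": "Imported polished draft from Newpedia",
        "token": token, "format": "json",
    }).json()


if __name__ == "__main__":
    s = requests.Session()
    draft = fetch_wikitext(s, SOURCE_API, "Example draft")
    print(draft[:200])  # inspect before pushing anywhere
```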
Update: Mako discussed Newpedia (or double-plus-newpedia) in his panel about “Wikipedia in 2022”, and Erik Moeller talked about how the current focus on notability is keeping all of our projects from growing, in his “Ghosts of Wikipedia Future”. I look forward to the video and transcripts.
What do you think? I started a mailing list for people who are interested in developing such a knowledge-project. I look forward to your thoughts, both serious and otherwise 😉
Plumpy’Nut Patent – Has their “patentleft” option seen wide use so far?
In 1996, two French food scientists, André Briend and Michel Lescanne, developed a nut-based food formulation to serve as an emergency food relief product in famine-stricken areas. The goal was to have a high-density balanced food with a long and robust shelf life – one which, unlike the previous standard of milk-based therapeutic food, could be taken at home rather than in a hospital.
They soon formed the company Nutriset to further develop and commercialize the idea. Their most popular product, Plumpy’Nut, has shipped millions of units and currently makes up roughly 90% of UNICEF’s stocks of ready-to-use therapeutic foods [RUTFs] for famine relief.
In forming their company, they captured their idea in the form of a patent (a standard way to declare ownership of, and investment in, an idea) and went on to build a production chain around it. This included tweaked formulas and a family of products; production and packaging factories; and grant-writing and research to get certification + field-feedback + approval from various UN bodies. This involved a few years of up-front investment and reputation-building, and then ramping up mass production of millions of pounds of Plumpy’Nut and its derivatives. They later set up a novel “patentleft” process allowing companies in developing countries to use the patent commercially, and make derivatives from it, at no cost — after a brief online registration. This is something that has received surprisingly little attention since, considering how simple and elegant their solution is. Read on for details! (more…)
Web <30 – the Future of the Web is Intertextual
From a recent discussion about Web 3.0 and the far future, on the AIR-L list:
In fact, the Web is currently developing Web <30, to be rolled out with Chrome 25, Firefox 20, Opera 15, and IE 10 later this winter.

If you are interested in cutting-edge research and convolving observation with participation, you can take part in the design of Web <30 yourself. It is being developed through a massively multistakeholder open online crowd-refined platform generation (MMOOCRPG) design.

Building on the exponential success of past efforts, the development mailing list includes a periodic distributed auto-immolating critique of its own work, where the future web is continuously redefined as its own dual.
Three Copyright Myths and Where to Start to Fix it – a policy brief
A lovely short policy brief on designing a better copyright regime was published on Friday – before being quickly taken offline again. I’ve reposted it here with light cleanup of its section headings.
If you care at all about copyright and its quirks, this is short and worth reading in full.
Recursive β-Metafunctions In the Case of Polypolice
I just finished reading about how bogus transmogrification conversion on an oscillating harmonic field of glass bells, with green gig and kerosene lamps for diversion, can be solved by beastly incarceration-concatenation. I was reminded of how much the great scienxplorers such as Watterson and others owe to this cloud of novel scientific inquiry from the ’60s and ’70s.
It makes me simultaneously want to immortalize Lem and Kandel in an eternally entangled quantum fringe, and to fire up a Trurlapaucius abstract-generator based on snarXiv code.
Bigipedia 2.0 – Britain sends up the wisdom of crowds
“At last, the long-awaited release of Bigipedia 2.0 – the infallible, ever-present cyberfriend is back! Now with all errors and mistakes.”
Every episode of Bigipedia is worth listening to. From David Tyler and #Pozzitive, via the UK wikivine.