Archive for the 'architecture' Category

One Web or Multiple Webs


This article from the Korea Times provides an interesting commentary on the question of whether there is one Web or multiple Webs. It indicates that Korean web applications are cut off from the rest of the world for two major reasons, one technological and one regulatory. First, Korean Internet users have become accustomed to websites laden with graphics, animations, and banners – features that rely upon a high-speed connection for expeditious Web surfing. In other countries where broadband connections are not as prevalent (although the article notes that Korea is no longer the world leader in high-speed broadband, it remains among the leaders), these web pages take an excessively long time to load and are consequently unpopular. Second, Korean web applications have not attained international popularity because of government regulation. The Korean government enforces identification and security requirements that significantly exceed those of other countries. Score one point for the multiple Webs theory.

Monday’s class notes (long post!)


Monday, March 3, 2008

Today’s topic was “Knowing on the Web.” For the first part of class, we delved into the murky world of philosophy, talking about the history of knowledge and Descartes’ Meditations. We finished by examining several controversial Wikipedia entries and the accompanying discussion threads.

Perfection and the Web
David Weinberger (DW) began class by bringing up the perfection-related idea we talked about last time: that the Web is always a little “broken.” DW thinks this is important because we tend to think systems “work” when they approach perfection: the more perfect they are, the better they are. But this is actually a trick we play on ourselves by defining what is “broken.” We could interpret a busy signal or 404 page as a sign that the system is broken, but we don’t do that. DW suggests that much of what’s good about the Web comes from the fact that it doesn’t even pretend to aim for “perfection.” We could easily have built into the architecture of the Web a mechanism by which broken links don’t show up as hyperlinks. But the cost of reworking that architecture would be so high that, in attaining a more perfect Web, we would end up with a very different Web.

DW then asked us to come up with other examples of things that can be taken as “imperfections” on the Web: things that are not controlled, but could have been. Tom pointed to newspaper articles that we know contain errors but that are left up as a historical record of the error. This led us to wonder: how often do bloggers take down or edit posts they later learn are mistaken?

In an informal poll of bloggers in the class, we found that sometimes bloggers go back and change mistaken posts, and sometimes they don’t. It may depend on whether the error is content-related or grammatical. Things like getting names or genders wrong can be embarrassing, and DW and others often just fix those. But the students (i.e., Kevin) who don’t strike mistaken posts said they leave them up so that readers don’t “think they’re going crazy.” DW said he sometimes leaves errors visible in case people have already linked to the post, and he also cited the craziness factor (i.e., “I could have sworn Chris was a guy!”). This is certainly an example of leaving the Web imperfect because it’s better that way. Another example, as Justin reminded us, was the “flaw” in MySpace code that allowed MySpacers to design their own pages in HTML.

Cutting against this idea that leaving “mistakes” online is a good thing, Dorcas noted that most people don’t bother to remove the mistaken or outdated parts. This can mislead visitors who arrive at the page assuming it’s current. DW pointed out that there would be ways to minimize this, but that in doing so we’d lose a lot of valuable info.

Another student questioned whether this “mistakes” thing is really a web-specific difference, and argued that these things happen with print media too. I argued the other side – I do think there’s a web difference here. Because it’s easier to make corrections, people online have a greater expectation that corrections will be made. (Although perhaps people often expect mistakes online in a way they don’t with traditional media – as DW pointed out, making it easier to publish also makes it easier to publish mistakes.)

This part of the discussion concluded with most of us recognizing that systems make decisions about the proper balance of control and accuracy. CNN makes these kinds of decisions all the time: they may “get it wrong,” but they can go back later and correct the record. Typically, though, CNN will want more control than, say, DW wants to have over his blog.

Knowledge: history and meaning

We then took up the question of knowledge: what is it really, and where does it come from?

(DW got pretty excited about using his “visual aid,” which led to a brief but animated discussion about various outdated media like overhead projectors.)

He showed us a USA Today crossword puzzle that he himself (and not a research assistant!) had completed – much to John Palfrey’s surprise. (Perhaps law school professors have become too reliant on RAs?) Anyway, the right answer to the “have no doubts” clue was “know.” But DW pointed out that knowledge didn’t start out that way – as being about certainty. It began as a practical distinction ancient Greeks had to make in everyday political life. In Athens, citizens (at least rich, white, land-owning ones) could get up and argue their views to the people. But some way of sorting out the stupid opinions from the interesting ones was needed. This is how philosophers started out investigating the meaning of “knowledge.” It wasn’t enough that what the person said was TRUE – the person had to be “justified” in believing it. Most of the subsequent philosophical efforts centered on the meaning of justification.

DW then asked us what “true” means. Is it “true” that John Palfrey has a Thinkpad? Yes. We could see “true” as meaning “everyone can/would/does agree on it.” But we could also see it as being independent from what people think, and being purely about existence (Conor’s view). Yelena thinks it includes both objective and subjective components.

DW drew some interesting figures on the board, one representing a person’s head (the “knower”), and another representing a laptop (the world). He suggested that if the statement represents or matches the world, then it’s “true.” Kevin pointed out that it’s not the thought that’s true; it’s more like running an experiment – determining truth doesn’t require people. The experiment verifies the claim, and the process is what makes something true. DW said that there’s lots of debate about this stuff, but reiterated that truth requires some type of correspondence between the statement and the world. This is called the “correspondence theory of truth,” and it has dominated Western culture for many centuries.

Descartes and his Meditations
We then turned to discussion of Descartes’ Meditations. Damien outlined Descartes’ basic idea: he wants to question the veracity of everything that isn’t verifiably true. To that end, he starts on a project: can he tell whether he’s awake versus dreaming?

Conor elaborated on how Descartes goes about this project in the Meditations. Descartes eliminated all thoughts or beliefs that were based on potentially fallible sources. For example, anything he gets through his senses is potentially faulty because his senses have deceived him in the past.

So, in the Meditations, Descartes sets out to examine all of his beliefs. Senses are out because they’ve fooled him before, and could fool him again. DW asked: why don’t we doubt our senses the way Descartes did? JP suggested that it’s because we’re not philosophers… Richard said it’s because we don’t care whether what we’re getting from our senses is “fundamentally true” – we are pragmatists and the senses work for us most of the time. Yelena added that we trust our senses out of necessity – there are no alternatives, except paralysis. Finally, another student noted that we’re usually right about our senses, and when we’re wrong, we learn from it – so it’s a workable system of reliance. There’s a difference between mistakes in our senses and random hallucinations.

DW then asked: how can Descartes doubt that 2+2=4? Doesn’t he “know” that for sure? Damien responded by saying that this part depends on Descartes’ belief in a deity – someone who could arrange the world in a way that would completely convince him of its truth.

Then, after a very confusing moment about whether 2+2 really does equal 4, DW brought up the idea of “the malignant demon.” (Apparently Descartes had to talk about this as a demon because of the religious constraints of the time – one couldn’t talk about God as being deceptive.) DW asked why Descartes engaged in this extreme experiment (so extreme that he doubted whether he had hands!). (Ultimately Descartes goes on to discover “I think, therefore I am,” and on the basis of that, he’s able to build back everything he reasonably knew before. DW admitted that it doesn’t hold together very well.)

So again — why do this crazy experiment? One student used the analogy of creating art on a blank canvas – you need to start with a clean slate. Richard added that Descartes seems to be cutting it all down to the most basic thing that can’t be disproven – he’s creating a super-strong foundation for the “building,” which makes it harder to refute.

On a side note, DW asked: how helpful is Descartes’ method for making political decisions? The answer seems obvious: not very. We tried to envision two politicians debating, and then one asking, “How do we know we’re all here?” Suffice it to say that we don’t need that kind of certainty in the political realm.

Stepping back, DW concluded that Descartes was aiming for the perfection of knowledge. But the problem is that, in seeking such perfection, you end up with basically nothing. And the only way Descartes is able to build it back up is by saying that God wouldn’t allow him to be wrong about various things (his senses, for example). Descartes has tried to say the only things we can put in the “knowledge” category are those which pass the really strict test. And over time, the bar of certainty has been continually raised. We circled back to the crossword puzzle, which defined knowledge as that which we know for sure – it’s about certainty, not justification.

Phew. After all this profound talk of philosophy and Descartes, we turned to Wikipedia.

Wikipedia and “knowledge”
We started with Shakespeare’s sonnets. The Encyclopedia article is mostly about the sexuality of the object of the sonnets. DW was surprised that the article might as well have been called, “Just how gay are the sonnets?” The Wikipedia article is much different. It lists a number of “contenders” for the mysterious W.H., for example, instead of simply saying who it was.

Then we talked about the discussion section attached to the Wikipedia article. Apparently most of the discussion centers on the identity of W.H., but also on the sexuality of Shakespeare himself (and what to say about it). There’s a question of whether or not the W.H. question should even be talked about at all in an article about the Sonnets. So even settling the fact – who is W.H.? – wouldn’t necessarily settle the issues. (Christina was initially impressed that the Wikipedia community was keeping all these comments up, but it turns out that you can’t delete discussions…who knew?)

The discussion pages raise the broader issue of the purpose of Wikipedia entries. Is it just about presenting competing views (what other people have said is true)? Or is it about presenting “truth”? DW made the point that there are decisions being made about what’s a mainstream dispute and what’s not. (For example, there’s very little discussion about Shakespeare’s death date.)

We then moved from Elizabethan England to Colonial America to talk about Sally Hemings. The controversial issue was whether she was the mother of Jefferson’s child. The Wikipedia discussion was interesting in that there was no real consensus or closure about this stuff. There was also lots of talk about the DNA evidence – and that there were four Jefferson males who could have been the father. DW thought it was interesting that this is all back in the discussion page and not on the entry page. Others commented on the debate over the more stylistic aspects of the piece (opening with “Sally Hemmings was the chambermaid of Thomas Jefferson” etc.).

Chiming in appropriately, JP talked about his interviews with young people in which he and Dana ask how students start research projects these days. Most said they start with Google and then Wikipedia, linking to external sources for verification. But, interestingly enough, they almost never edit the entries, even when they find mistakes. He wondered why more people don’t edit (other than the alleged group of drunk German grad students at the helm).

DW said that in all of these controversial articles, there are statements put forward as the clear truth; and then there are statements qualified by phrases like “some believe that.” We looked at the Swiftboating entry for signs of these “weasel words” (which Kevin argued were not as commonplace as DW suggested). “Were criticized” was one example. Conor gave us an overview of the Swiftboating discussion pages: most of the argument was about the appropriateness of the introductory paragraph. As for the article, DW thought that it wasn’t an unreasonable presentation of the basic issue.

Finally, the JFK assassination page. Students noted that the tone of the discussion is pretty contentious, especially with regard to the authority of the Warren Commission report. Evan suggested that this article is an outlier. He would compare it to the 9/11 conspiracy theories and the corresponding Wikipedia pages. (DW explained that a separate assassination page was created because the JFK page was getting overrun with conspiracy theories.)

Class ended with DW posing a broader “web difference” question: what is the analog in the real world for what we’re seeing here in the discussion pages of Wikipedia? Justin suggested that it is talking to friends and reading books. But DW said he doesn’t know anyone with the depth of knowledge and obsession demonstrated in these Wikipedia discussions. Another analogy was made to peer review in academia.

Ultimately we were all left wondering whether Wikipedia has any kind of analog in the non-Internet world. Is knowledge all about certainty? Do you only “know” something if you can logically justify why? Are traditional notions of perfection simply inapplicable to the internet – and is that a good thing? These were all questions we addressed in class, and ones we’ll no doubt revisit throughout the remainder of the course.

Pakistan blocks YouTube… from the entire world



The worldwide “block” lasted only two hours, but it’s still pretty ridiculous stuff. I can’t quite wrap my head around how they managed to change the “internet’s routing tables” for the entire world. From my understanding, you have to be pretty huge (like an entire country in this case) to get other countries’ routers to misroute traffic. I don’t think it was their intention to take YouTube down for the entire world, but it’s pretty scary that they had the power to at all. Meanwhile, the block in Pakistan will continue until further notice.
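How could one ISP misroute traffic for everyone? The widely reported explanation is that Pakistan Telecom announced a narrower block of addresses (a /24) lying inside YouTube’s own announcement (a /22), and once that announcement leaked out to other networks, routers preferred it, because IP forwarding picks the longest, most specific matching prefix. On that account, specificity mattered more than size. Here is a minimal Python sketch of longest-prefix matching, assuming a two-entry routing table; the table and the `route_for` helper are hypothetical, and real BGP route selection involves much more than this.

```python
import ipaddress
from typing import Optional

# Illustrative routing table: prefix -> origin of the announcement.
# Prefixes as widely reported for the February 2008 incident; real routers
# also weigh BGP path attributes, which this sketch ignores entirely.
ROUTES = {
    ipaddress.ip_network("208.65.152.0/22"): "YouTube",
    ipaddress.ip_network("208.65.153.0/24"): "Pakistan Telecom",
}


def route_for(address: str, routes: dict = ROUTES) -> Optional[str]:
    """Return the origin of the longest (most specific) prefix covering address."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in routes if ip in net]
    if not matches:
        return None  # no covering prefix in this toy table
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]


print(route_for("208.65.153.238"))  # Pakistan Telecom: the /24 beats the /22
print(route_for("208.65.152.10"))   # YouTube: only the /22 covers this address
```

Addresses in 208.65.153.0/24 go to whoever announced the /24, even though the /22 still covers them; that is why the hijack reached beyond Pakistan once the route propagated.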

Are other ISPs doing what Comcast does?


In light of the scandal regarding Comcast’s blocking (or “deprioritization”) of certain p2p applications, I thought it might be interesting to investigate the disclosures of other ISPs regarding prioritization. Here is what I found. (It is worth noting that some of this may change very soon in light of Comcast’s troubles.)

Most telecom and cable companies do not publicly provide much information regarding their prioritization techniques. Time Warner Cable is one of the most forthright companies in this respect. In its Operator Acceptable Use Policy (…), the company gives some insight into its approach to prioritization. The policy explains that Time Warner Cable “may use various tools and techniques in order to efficiently manage its networks and to ensure compliance with its Acceptable Use Policy.” Such tools may include “limiting the number of peer-to-peer sessions a user may conduct at one time” and “limiting the aggregate bandwidth available for certain usage protocols such as peer-to-peer and newsgroups.” These statements suggest that Time Warner gives p2p applications low priority relative to other types of data packets.
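Time Warner’s policy names its tools without saying how they work. Below is a minimal Python sketch of what the two quoted techniques could look like, assuming a per-subscriber counter for the session limit and a token bucket for the aggregate bandwidth limit. The class names, the limits, and the choice of a token bucket are all hypothetical; this is not Time Warner’s implementation.

```python
from collections import defaultdict


class P2PSessionCap:
    """Hypothetical cap on concurrent peer-to-peer sessions per subscriber."""

    def __init__(self, max_sessions: int):
        self.max_sessions = max_sessions
        self.active = defaultdict(int)  # subscriber -> open p2p sessions

    def open(self, subscriber: str) -> bool:
        """Admit a new session only if the subscriber is under the cap."""
        if self.active[subscriber] >= self.max_sessions:
            return False
        self.active[subscriber] += 1
        return True

    def close(self, subscriber: str) -> None:
        if self.active[subscriber] > 0:
            self.active[subscriber] -= 1


class TokenBucket:
    """Hypothetical token bucket capping aggregate bytes/second for one protocol class."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens (bytes) added per second
        self.capacity = capacity  # maximum burst size in bytes
        self.tokens = capacity
        self.last = 0.0           # time of the previous check, in seconds

    def allow(self, nbytes: int, now: float) -> bool:
        """Refill for elapsed time, then admit the packet if enough tokens remain."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # over the class budget: drop or queue the packet
```

With `TokenBucket(rate=100, capacity=100)`, an 80-byte p2p packet at t=0 is admitted, a second one at t=0.1 is refused (only about 30 tokens remain), and one at t=1.0 is admitted again. Only traffic in the limited class is held back; everything else flows normally.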

Other ISPs offer less information. Verizon promises prospective high-speed internet service subscribers “a dedicated connection to the Verizon central office so that you don’t have to share your local access connection with other users” (…). Nonetheless, Verizon acknowledges that upstream congestion may hinder connection speed. Verizon also gives the vague explanation that “other factors” may influence connection speed. An analogous set of representations appears in relation to FiOS, Verizon’s fiber-optic broadband internet service (…). AT&T does not admit to any degree of prioritization. The AT&T website cites only “heavy Internet traffic, the condition of your telephone lines, and the distance of your home to the telephone company’s central switching station” as factors that may affect download speed (…).

Net neutrality article


There’s a good article on some aspects of Net neutrality (which is related to the end-to-end principle) at Macworld. (Note: I’m biased on this topic.)

Mandating edge-based filtering


The recording industry is now suggesting that copyright filters be imposed on end-users’ machines. This is yet another case where the center/end distinction seems to me to be difficult to apply. If all machines were, through law or market forces, to include copyright filters, should we say that the filter is violating the end-to-end principle? Law and markets can create the effect of center-based alterations without ever actually touching the center.

Of course, that leaves untouched the question of whether we think end-based copyright filters are a good idea.

Verizon won’t join AT&T in policing for copyright violation


There’s a very interesting discussion at the NY Times blog with a Verizon representative about why they’re not going to join AT&T in blocking packets that they algorithmically decide violate copyright.

Besides bearing on our copyright discussions, this also bears on the question of the import and extent of the end-to-end argument. If there were only one carrier (note the hypothetical), and that carrier decided to inspect packets and treat them differently, that would not be occurring at the architectural level, but one could still argue that it violates the end-to-end principle. (Of course, the e2e principle is not inviolable; as the e2e paper itself says, it’s a design principle that should be violated when there are good reasons to.)

What I meant (class 2)


I completely blew my explanation of HTML 5, because I didn’t leave myself enough time. The handout I distributed at the beginning of class goes over the dreary details if you care about them, but I want to at least try to clarify why I brought it up at all. (Blogs mean never having to say you’re sorry?)

I wanted to give an example of answering the question “What is the Web?” So, let’s answer by saying, “The Web is a standard.” My plan then was to compare HTML to the Dublin Core; thankfully I didn’t try to get <u>that</u> into the last few minutes, too. Anyway, the Dublin Core is a standard for online documents that includes fields for author, language, and publisher, all of which are lacking in HTML. The proposed new version of HTML (HTML 5) makes a different set of decisions about what elements to include. My point was supposed to be that if the Web is a standard, that standard consists of decisions based upon anticipated uses…which is exactly what the end-to-end principle says we should <u>avoid</u> in network design. That’s not a criticism or a contradiction. In fact, standards always require us to make such decisions, based on anticipated and desired uses. Even the Internet’s design overall assumes that it’s good to pass information openly, freely, and in mass quantities.
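To make the “a standard is a set of decisions” point concrete: the Dublin Core Metadata Element Set fixes fifteen elements (its “creator” plays the role of author), and anything outside that list simply isn’t Dublin Core. Here is a minimal Python sketch, assuming the DCMI convention of embedding elements in HTML meta tags with a `DC.` prefix; the `dublin_core_meta` helper and the sample record values are hypothetical.

```python
from html import escape

# The fifteen elements of the Dublin Core Metadata Element Set (version 1.1).
DC_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)


def dublin_core_meta(record: dict) -> str:
    """Render a Dublin Core record as HTML <link>/<meta> tags (hypothetical helper).

    Keys outside the standard are rejected: the standard has already decided
    which fields exist.
    """
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name in DC_ELEMENTS:  # emit in the standard's own order
        if name in record:
            lines.append(f'<meta name="DC.{name}" content="{escape(record[name])}">')
    return "\n".join(lines)


# Hypothetical sample record.
print(dublin_core_meta({"creator": "Jane Doe", "language": "en"}))
```

Passing `{"author": ...}` raises an error: the standard already decided that field is called “creator,” which is exactly the kind of anticipated-use decision the paragraph above is about.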

But I jumped so far into the weeds of HTML 5, and tried to say too much too quickly, that there was no possibility that I communicated any of that. Sorry!

I hope the larger point of the session was clear, however. I’d say it’s something like this: We’re not going to be able to define the “it” of the course too clearly, but that’s fine. The Web is deep and important enough to resist easy definition, and there’s no reason why we should rule out of discussion areas of the Net based merely upon their technical protocols. Further, there are many useful ways of taking the Web: As standard, medium, social phenomenon, market, sphere, technical infrastructure, new public space, etc. These ways themselves stand in complex relations, each raising its own set of questions and issues. Since it’s not going to be a neat and tidy topic, we will do well to pay attention to how we’re taking the Web as we proceed with our discussions…

A cost of an end-to-end network


Slashdot is discussing the discovery that the Snopes site — a popular and trusted urban-myth buster — installs adware on the machines of the unwary. Here’s how one commenter (patio11) explains it:

A quick primer in online advertising, for those of you who block it:

At one end of the chain, we have Content Provider A. At the other end of the chain, we have Service Provider Z. Z wants to place advertising on A’s site but, importantly, doesn’t know how to do it, doesn’t generally know specifically who A is, and needs this to scale to potentially thousands of As. This is where participants B, C, D, E, F, Google, H… etc come in. There are advertising aggregators, affiliate networks, affiliates, affiliates of affiliates, affiliates of affiliates of networks of affiliates who subdivide the advertising market into smaller and smaller slices before it finally gets on A’s site.

Now, somewhere in the chain, let us inject one person who is less than scrupulous. He doesn’t work at Snopes; that would tarnish a brand for a week’s worth of income, not a smart play. He probably has a steady stream of relationships with each of the numerous advertising concerns on the Internet, moving on from one after he has collected a check or three and then had the banstick for TOS violations catch up with him. He is the one working for, most probably, an affiliate of an affiliate of an affiliate of Zango.

This is the way most malware makes its way onto ad networks and, from there, onto high-trust sites. Volokh Conspiracy, one of my favorite blogs, had a nasty browser hijacker which affected non-US users for months before their advertising network caught wind of it. A few popular MMORPG sites have ended up hosting keyloggers in the same fashion. It is an unintended consequence of a system without central control — much like the Internet itself, actually. (The system being split up this way does have its advantages, for both endpoints of the chain and for everybody between. Google’s business model is based on snapping the chain and replacing it with a big cloud labeled Gooooooogle, but they’re not yet the only game in town.)
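The commenter’s point is structural: site A can only vet the party it deals with directly, while the ad itself may have been altered several hops upstream. Here is a toy Python model of that chain, assuming each intermediary can optionally rewrite the ad markup before passing it on; the `Intermediary` class, `serve_ad` helper, party names, and script URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Intermediary:
    """One link in the ad-delivery chain (network, affiliate, sub-affiliate...)."""
    name: str
    # Hypothetical hook: a party may rewrite the creative before passing it on.
    rewrite: Optional[Callable[[str], str]] = None


def serve_ad(creative: str, chain: list[Intermediary]) -> tuple[str, str]:
    """Pass an advertiser's creative down the chain toward the publisher.

    `chain` is ordered from the advertiser's side to the publisher's side.
    Returns (markup the publisher's visitors receive, the only party the
    publisher deals with directly).
    """
    for party in chain:
        if party.rewrite is not None:
            creative = party.rewrite(creative)
    direct_partner = chain[-1].name if chain else "advertiser"
    return creative, direct_partner


chain = [
    Intermediary("ad network"),
    Intermediary("affiliate of an affiliate",
                 rewrite=lambda c: c + '<script src="adware.example/install.js"></script>'),
    Intermediary("reputable aggregator"),
]
markup, partner = serve_ad('<img src="ad.png">', chain)
print(partner)  # reputable aggregator
print(markup)   # the original image tag followed by the injected script
```

The publisher can vet the “reputable aggregator” all it likes; the injected script arrives anyway, because no single point in the chain sees, or controls, the whole path.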