Benkler on network topology and its implications for democracy (From Chapter Seven, section 5 of The Wealth of Networks)
Developments in network topology theory and its relationship to the structure of the empirically mapped real Internet offer a map of the networked information environment that is indeed quite different from the naïve model of "everyone a pamphleteer."
However, that is the wrong baseline.
There never has been a complex, large modern democracy in which everyone could speak and be heard by everyone else.
The correct baseline is the one-way structure of the commercial mass media.
The normatively relevant descriptive questions are whether the networked public sphere provides broader intake, participatory filtering, and relatively incorruptible platforms for creating public salience.
I suggest that it does.
Four characteristics of network topology structure the Web and the blogosphere in an ordered, but nonetheless meaningfully participatory form.
First, at a microlevel, sites cluster - in particular, topically related and interest-based sites link much more heavily to each other than to other sites.
Second, at a macrolevel, the Web and the blogosphere have giant, strongly connected cores - "areas" where 20-30 percent of all sites are highly and redundantly interlinked; that is, tens or hundreds of millions of sites, rather than ten, fifty, or even five hundred television stations.
That pattern repeats itself in smaller subclusters as well.
Third, as the clusters get small enough, the obscurity of sites participating in the cluster diminishes, while the visibility of the superstars remains high, forming a filtering and transmission backbone for universal intake and local filtering.
Fourth and finally, the Web exhibits "small-world" phenomena, making most Web sites reachable through shallow paths from most other Web sites.
I will explain each of these below, as well as how they interact to form a reasonably attractive image of the networked public sphere.
First, links are not smoothly distributed throughout the network.
Computer scientists have looked at clustering from the perspective of what topical or other correlated characteristics describe these relatively high-density interconnected regions of nodes.
What they found was perhaps entirely predictable from an intuitive perspective of the network users, but important as we try to understand the structure of information flow on the Web.
Web sites cluster into topical and social/organizational clusters.
Early work done at the IBM Almaden Research Center on how link structure could be used as a search technique showed that by mapping densely interlinked sites without looking at content, one could find communities of interest that identify very fine-grained topical connections, such as Australian fire brigades or Turkish students in the United States.[20]
A later study out of the NEC Research Institute more formally defined the interlinking that would identify a "community" as one in which the nodes were more densely connected to each other than they were to nodes outside the cluster by some amount.
The study also showed that topically connected sites meet this definition.
For instance, sites related to molecular biology clustered with each other - in the sense of being more interlinked with each other than with off-topic sites - as did sites about physics and black holes.[21]
Lada Adamic and Natalie Glance recently showed that liberal political blogs and conservative political blogs densely interlink with each other, mostly pointing within each political leaning but with about 15 percent of links posted by the most visible sites also linking across the political divide.[22]
Physicists analyze clustering as the property of transitivity in networks: the increased probability that if node A is connected to node B, and node B is connected to node C, that node A also will be connected to node C, forming a triangle.
Newman has shown that the clustering coefficient of a network with a power law distribution of connections, or degrees - that is, the network's tendency to cluster - is related to the exponent of the distribution.
At low exponents, below 2.333, the clustering coefficient becomes high.
This explains analytically the empirically observed high level of clustering on the Web, whose exponent for inlinks has been empirically shown to be 2.1.[23]
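The notion of transitivity described above can be made concrete in a few lines of code. The sketch below computes the global clustering coefficient of an undirected network as three times the number of triangles divided by the number of connected triples; the graph and all names are a minimal illustrative example, not Web data:

```python
from itertools import combinations

def transitivity(adj):
    """Global clustering coefficient: 3 * (triangles) / (connected triples)."""
    closed_triples = 0
    triples = 0
    for node, neighbors in adj.items():
        k = len(neighbors)
        triples += k * (k - 1) // 2           # triples centered on this node
        for u, v in combinations(neighbors, 2):
            if v in adj[u]:
                closed_triples += 1           # this centered triple closes a triangle
    return closed_triples / triples if triples else 0.0

# Toy undirected graph: a triangle (A, B, C) plus a pendant node D.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(transitivity(graph))  # → 0.6
```

On this toy graph, three of the five connected triples close into a triangle, giving a coefficient of 0.6; a real measurement would substitute a crawled link structure for the hand-built dictionary.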
Second, at a macrolevel and in smaller subclusters, the power law distribution does not resolve into everyone being connected, in a mass-media model relationship, to a small number of major "backbone" sites.
Instead, a very large number of sites occupy a giant, strongly connected core: nodes within this core are heavily linked and interlinked, with multiple redundant paths among them.
Empirically, as of 2001, this core comprised about 28 percent of all nodes.
At the same time, about 22 percent of nodes had links into the core, but were not linked to from it - these may have been new sites, or relatively lower-interest sites.
The same proportion of sites was linked-to from the core, but did not link back to it - these might have been ultimate depositories of documents, or internal organizational sites.
Finally, roughly the same proportion of sites occupied "tendrils" or "tubes" that cannot reach, or be reached from, the core.
Tendrils can be reached from the group of sites that link into the strongly connected core or can reach into the group that can be connected to from the core.
Tubes connect the inlinking sites to the outlinked sites without going through the core.
About 10 percent of sites are entirely isolated.
This structure has been called a "bow tie" - with a large core and equally sized in- and outflows to and from that core (see figure 7.5).
Figure 7.5: Bow Tie Structure of the Web
One way of interpreting this structure as counterdemocratic is to say: This means that half of all Web sites are not reachable from the other half - the "IN," "tendrils," and disconnected portions cannot be reached from any of the sites in SCC and OUT.
On the other hand, one could say that half of all Web pages, the SCC and OUT components, are reachable from IN and SCC.
That is, hundreds of millions of pages are reachable from hundreds of millions of potential entry points.
This represents an intake function, and a freedom to speak in a way potentially accessible to others, very different from the five-hundred-channel, mass-media model.
More significant yet, Dill and others showed that the bow tie structure appears not only at the level of the Web as a whole, but repeats itself within clusters.
That is, the Web appears to show characteristics of self-similarity, up to a point - links within clusters also follow a power law distribution and cluster, and have a bow tie structure of similar proportions to that of the overall Web.
Tying together the two points about clustering and the presence of a strongly connected core, Dill and his coauthors showed that what they called "thematically unified clusters," such as geographically or content-related groupings of Web sites, themselves exhibit strongly connected cores that provide a thematically defined navigational backbone to the Web.
It is not that one or two major sites were connected to by all thematically related sites; rather, as at the network level, on the order of 25-30 percent of sites were highly interlinked, and another 25 percent were reachable from within the strongly connected core.[25]
Moreover, when the data was pared down to treat only the home page, rather than each Web page within a single site as a distinct "node" (that is, everything that came under www.foo.com was treated as one node, as opposed to the usual method where www.foo.com, www.foo.com/nonsuch, and www.foo.com/somethingelse are each treated as a separate node), fully 82 percent of the nodes were in the strongly connected core, and an additional 13 percent were reachable from the SCC as the OUT group.
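The bow-tie decomposition reported in these studies can be sketched, under simplifying assumptions, by intersecting forward and backward reachability from a node known to sit in the core. The toy graph and names below are illustrative; a real study would compute all strongly connected components of a crawl rather than start from a chosen seed:

```python
from collections import deque

def reachable(adj, start):
    """All nodes reachable from `start` by BFS along directed edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def bow_tie(adj, core_seed):
    """Split nodes into SCC / IN / OUT / OTHER relative to core_seed's component."""
    reverse = {n: set() for n in adj}
    for n, outs in adj.items():
        for m in outs:
            reverse.setdefault(m, set()).add(n)
    forward = reachable(adj, core_seed)       # reachable from the seed
    backward = reachable(reverse, core_seed)  # can reach the seed
    scc = forward & backward                  # strongly connected core
    in_set = backward - scc                   # link into the core, not linked back
    out_set = forward - scc                   # linked to from the core only
    other = set(adj) - scc - in_set - out_set # tendrils, tubes, disconnected
    return scc, in_set, out_set, other

# Toy directed Web: {A, B} interlink (the core); I links in; O is linked to; X is isolated.
links = {
    "I": {"A"},
    "A": {"B", "O"},
    "B": {"A"},
    "O": set(),
    "X": set(),
}
scc, in_set, out_set, other = bow_tie(links, "A")
print(scc, in_set, out_set, other)
```

Here "A" and "B" form the core, "I" plays the role of the IN component, "O" the OUT component, and "X" the disconnected remainder, mirroring the regions of figure 7.5.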
Third, another finding of Web topology and critical adjustment to the basic Barabási and Albert model is that when the topically or organizationally related clusters become small enough - on the order of hundreds or even low thousands of Web pages - they no longer follow a pure power law distribution.
Instead of connectivity continuing to drop off as steeply as a pure power law would predict, many sites exhibit a moderate degree of connectivity.
Figure 7.6 illustrates how a hypothetical distribution of this sort would differ both from the normal and power law distributions illustrated in figure 7.4.
David Pennock and others, in their paper describing these empirical findings, hypothesized a uniform component added to the pure preferential-attachment mechanism of the original Barabási and Albert model.
This uniform component could be random (as they modeled it), but might also stand for quality of materials, or level of interest in the site by participants in the smaller cluster.
At large numbers of nodes, the exponent dominates the uniform component, accounting for the pure power law distribution when looking at the Web as a whole, or even at broadly defined topics.
In smaller clusters of sites, however, the uniform component begins to exert a stronger pull on the distribution.
The exponent keeps the long tail intact, but the uniform component accounts for a much more moderate body.
Many sites will have dozens, or even hundreds of links.
The Pennock paper narrowed the set of sites studied by looking only at the sites of certain kinds of organizations - universities or public companies.
Chakrabarti and others later confirmed this finding for topical clusters as well.
That is, when they looked at small clusters of topically related sites, the distribution of links still had a long tail of a small number of highly connected sites in every topic, but the body of the distribution diverged from a power law distribution and represented a substantial proportion of sites that were moderately linked.[26]
Even more specifically, Daniel Drezner and Henry Farrell reported that the Pennock modification better describes the distribution of links to and among political blogs.[27]
Figure 7.6: Illustration of a Skew Distribution That Does Not Follow a Power Law
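The mixture Pennock and his coauthors hypothesized can be illustrated with a toy growth model, sketched below under assumed parameter values (the function name, `alpha`, and the seed-network details are illustrative, not taken from their paper). Each new node makes a fixed number of links; with probability `alpha` a target is picked uniformly at random, otherwise in proportion to its current in-degree:

```python
import random

def pennock_growth(n, m=3, alpha=0.5, seed=42):
    """Grow a directed network in which each new node makes m links.
    With probability alpha the target is chosen uniformly at random (the
    'uniform component'); otherwise the target is chosen in proportion to
    its current in-degree (the preferential, rich-get-richer component)."""
    rng = random.Random(seed)
    indegree = {i: 0 for i in range(m)}  # a few seed nodes to link to
    endpoints = []  # one entry per link received; sampling this list is preferential
    for new in range(m, n):
        indegree[new] = 0
        chosen = set()
        while len(chosen) < m:
            if rng.random() < alpha or not endpoints:
                chosen.add(rng.randrange(new))     # uniform component
            else:
                chosen.add(rng.choice(endpoints))  # preferential component
        for target in chosen:
            indegree[target] += 1
            endpoints.append(target)
    return indegree

degrees = pennock_growth(2000, m=3, alpha=0.5)
# The uniform component lifts the low end of the distribution: most nodes
# end up moderately linked instead of languishing at zero in-degree.
print(sum(1 for d in degrees.values() if d == 0))
```

Setting `alpha=0` recovers pure preferential attachment and its power law; raising `alpha` lifts the body of the distribution toward moderate connectivity while the long tail of highly linked sites survives, which is the pattern reported for small topical clusters.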
These findings are critical to the interpretation of the distribution of links as it relates to human attention and communication.
A pure power law distribution leaves all but the very few languishing in obscurity, with no one to look at them.
The moderated distribution, as explained in more detail below, offers a mechanism for topically related and interest-based clusters to form a peer-reviewed system of filtering, accreditation, and salience generation.
It gives the long tail on the low end of the distribution heft (and quite a bit of wag).
The fourth and last piece of mapping the network as a platform for the public sphere is called the "small-worlds effect."
Based on Stanley Milgram's sociological experiment and on mathematical models later proposed by Duncan Watts and Steven Strogatz, both theoretical and empirical work has shown that the number of links that must be traversed from any point in the network to any other point is relatively small.[28]
Fairly shallow "walks" - that is, clicking through three or four layers of links - allow a user to cover a large portion of the Web.
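A minimal simulation illustrates why shallow walks suffice. The sketch below (sizes and names are illustrative) builds a ring lattice, adds a handful of random long-range shortcuts in the spirit of the Watts and Strogatz model, and measures the average shortest-path length by breadth-first search:

```python
import random
from collections import deque

def ring_with_shortcuts(n, k=2, shortcuts=20, seed=7):
    """A ring lattice (each node tied to its k nearest neighbors on each side)
    plus a few random long-range shortcuts, in the spirit of Watts-Strogatz."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            adj[i].add((i + d) % n)
            adj[(i + d) % n].add(i)
    for _ in range(shortcuts):
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def avg_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs, via BFS."""
    total, pairs = 0, 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

ring = ring_with_shortcuts(200, shortcuts=0)
small_world = ring_with_shortcuts(200, shortcuts=20)
print(avg_path_length(ring), avg_path_length(small_world))
```

On a plain 200-node ring the average path is about 25 hops; a mere 20 random shortcuts dramatically shortens it. That is the small-world effect: local clustering survives while global distances shrink.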
What is true of the Web as a whole turns out to be true of the blogosphere as well, and even of the specifically political blogosphere.
In two blog-based studies, Clay Shirky and then Jason Kottke published widely read explanations of how the blogosphere was simply exhibiting the power law characteristics common on the Web.[29]
The emergence in 2003 of discussions of this sort in the blogosphere is, it turns out, hardly surprising.
In a time-sensitive study also published in 2003, Kumar and others provided an analysis of the network topology of the blogosphere.
They found that it was very similar to that of the Web as a whole - both at the macro- and microlevels.
Interestingly, they found that the strongly connected core only developed after a certain threshold, in terms of total number of nodes, had been reached, and that it began to develop extensively only in 2001, reached about 20 percent of all blogs in 2002, and continued to grow rapidly.
They also showed that what they called the "community" structure - the degree of clustering or mutual pointing within groups - was high, an order of magnitude more than a random graph with a similar power law exponent would have generated.
Moreover, the degree to which a cluster is active or inactive, highly connected or not, changes over time.
In addition to time-insensitive superstars, there are also flare-ups of connectivity for sites depending on the activity and relevance of their community of interest.
This latter observation is consistent with what we saw happen for BoycottSBG.com.
Kumar and his collaborators explained these phenomena by the not-too-surprising claim that bloggers link to each other based on topicality - that is, their judgment of the quality and relevance of the materials - not only on the basis of how well connected they are already.[30]
This body of literature on network topology suggests a model for how order has emerged on the Internet, the World Wide Web, and the blogosphere.
We now know that the network at all its various layers follows a degree of order, where some sites are vastly more visible than most.
This order is loose enough, however, and exhibits a sufficient number of redundant paths from an enormous number of sites to another enormous number, that the effect is fundamentally different from that of the small number of commercial, professional editors of the mass media.
Individuals and individual organizations cluster around topical, organizational, or other common features.
Because even in small clusters the distribution of links still has a long tail, these smaller clusters still include high-visibility nodes.
These relatively high-visibility nodes can serve as points of transfer to larger clusters, acting as an attention backbone that transmits information among clusters.
Subclusters within a general category - such as liberal and conservative blogs clustering within the broader cluster of political blogs - are also interlinked, though less densely than within-cluster connectivity.
The higher level or larger clusters again exhibit a similar feature, where higher visibility nodes can serve as clearinghouses and connectivity points among clusters and across the Web.
These are all highly connected with redundant links within a giant, strongly connected core - comprising more than a quarter of the nodes in any given level of cluster.
The small-worlds phenomenon means that individual users who travel a small number of different links from similar starting points within a cluster cover large portions of the Web and can find diverse sites.
By then linking to these sites on their own Web sites, or passing them along to others by e-mail or blog post, users provide multiple redundant paths, open to many users, to and from most statements on the Web.
High-visibility nodes amplify and focus attention on given statements, and in this regard have greater power in the information environment they occupy.
However, there is sufficient redundancy of paths through high-visibility nodes that no single node or small collection of nodes can control the flow of information in the core and around the Web.
This is true both at the level of the cluster and at the level of the Web as a whole.
The result is an ordered system of intake, filtering, and synthesis that can in theory emerge in networks generally, and empirically has been shown to have emerged on the Web.
It avoids the generation of a din through which no voice can be heard, as the fears of fragmentation predicted.
And, while money may be useful in achieving visibility, the structure of the Web means that money is neither necessary nor sufficient to grab attention - because the networked information economy, unlike its industrial predecessor, does not offer simple points of dissemination and control for purchasing assured attention.
What the network topology literature allows us to do, then, is to offer a richer, more detailed, and empirically supported picture of how the network can be a platform for the public sphere that is structured in a fundamentally different way than the mass-media model.
The problem is approached through a self-organizing principle. It begins with communities of interest on smallish scales and practices of mutual pointing. Because individuals are free to choose what to see and whom to link to, and because those choices are somewhat codependent, highly connected points emerge even at small scales, and continue to be replicated with ever-larger visibility as the clusters grow.
Without forming or requiring a formal hierarchy, and without creating single points of control, each cluster generates a set of sites that offer points of initial filtering, in ways that are still congruent with the judgments of participants in the highly connected small cluster.
The process is replicated at larger and more general clusters, to the point where positions that have been synthesized "locally" and "regionally" can reach Web-wide visibility and salience.
It turns out that we are not intellectual lemmings.
We do not use the freedom that the network has made possible to plunge into the abyss of incoherent babble.
Instead, through iterative processes of cooperative filtering and "transmission" through the high visibility nodes, the low-end thin tail turns out to be a peer-produced filter and transmission medium for a vastly larger number of speakers than was imaginable in the mass-media model.
The effects of the topology of the network are reinforced by the cultural forms of linking, e-mail lists, and the writable Web.
The topology literature treats each node as a single speaker; the emergence of the writable Web, however, allows each node to itself become a cluster of users and posters who, collectively, gain salience as a node.
Slashdot is "a node" in the network as a whole, one that is highly linked and visible.
Slashdot itself, however, is a highly distributed system for peer production of observations and opinions about matters that people who care about information technology and communications ought to care about.
Some of the most visible blogs, like the dailyKos, are cooperative blogs with a number of authors.
More important, the major blogs receive input - through posts or e-mails - from their users.
Recall, for example, that the original discussion of a Sinclair boycott that would focus on local advertisers arrived on TalkingPoints through an e-mail comment from a reader.
TalkingPoints regularly solicits and incorporates input from and research by its users.
The cultural practice of writing to highly visible blogs with far greater ease than writing a letter to the editor and with looser constraints on what gets posted makes these nodes themselves platforms for the expression, filtering, and synthesis of observations and opinions.
Moreover, as Drezner and Farrell have shown, blogs have developed cultural practices of mutual citation - when one blogger finds a source by reading another, the practice is to link to the original blog, not only directly to the underlying source.
Jack Balkin has argued that the culture of linking more generally and the "see for yourself" culture also significantly militate against fragmentation of discourse, because users link to materials they are commenting on, even in disagreement.
Our understanding of the emerging structure of the networked information environment, then, provides the basis for a response to the family of criticisms of the first generation claims that the Internet democratizes.
The first claim was that the Internet would result in a fragmentation of public discourse.
The clustering of topically related sites, such as politically oriented sites, and of communities of interest, the emergence of high-visibility sites that the majority of sites link to, and the practices of mutual linking show quantitatively and qualitatively what Internet users likely experience intuitively.
While there is enormous diversity on the Internet, there are also mechanisms and practices that generate a common set of themes, concerns, and public knowledge around which a public sphere can emerge.
Any given site is likely to be within a very small number of clicks away from a site that is visible from a very large number of other sites, and these form a backbone of common materials, observations, and concerns.
All the findings of power law distribution of linking, clustering, and the presence of a strongly connected core, as well as the linking culture and "see for yourself," oppose the fragmentation prediction.
Users self-organize to filter the universe of information that is generated in the network.
This self-organization includes a number of highly salient sites that provide a core of common social and cultural experiences and knowledge that can provide the basis for a common public sphere, rather than a fragmented one.
The second claim was that fragmentation would cause polarization.
Given that the evidence demonstrates there is no fragmentation, in the sense of a lack of a common discourse, it would be surprising to find higher polarization because of the Internet.
Moreover, as Balkin argued, the fact that the Internet allows widely dispersed people with extreme views to find each other and talk is not a failure for the liberal public sphere, though it may present new challenges for the liberal state in constraining extreme action.
Only polarization of discourse in society as a whole can properly be considered a challenge to the attractiveness of the networked public sphere.
However, the practices of linking, "see for yourself," or quotation of the position one is criticizing, and the widespread practice of examining and criticizing the assumptions and assertions of one's interlocutors actually point the other way, militating against polarization.
A potential counterargument, however, was created by the most extensive recent study of the political blogosphere.
In that study, Adamic and Glance showed that only about 10 percent of the links on any randomly selected political blog linked to a site across the ideological divide.
The number increased for the "A-list" political blogs, which linked across the political divide about 15 percent of the time.
The picture that emerges is one of distinct "liberal" and "conservative" spheres of conversation, with very dense links within, and more sparse links between them.
On one interpretation, then, although there are salient sites that provide a common subject matter for discourse, actual conversations occur in distinct and separate spheres - exactly the kind of setting that Sunstein argued would lead to polarization.
Two of the study's findings, however, suggest a different interpretation.
The first was that there was still a substantial amount of cross-divide linking.
One out of every six or seven links in the top sites on each side of the divide linked to the other side in roughly equal proportions (although conservatives tended to link slightly more overall - both internally and across the divide).
The second was that, in an effort to see whether the more closely interlinked conservative sites therefore showed greater convergence "on message," Adamic and Glance found that greater interlinking did not correlate with less diversity in external (outside of the blogosphere) reference points.[31]
Together, these findings suggest a different interpretation.
Each cluster of more or less like-minded blogs tended to read each other and quote each other much more than they did the other side.
This operated not so much as an echo chamber as a forum for the working out of observations and interpretations internally, among like-minded people.
Many of these initial statements or inquiries die because the community finds them uninteresting or fruitless.
Some reach greater salience, and are distributed through the high-visibility sites throughout the community of interest.
Issues that in this form reached political salience became topics of conversation and commentary across the divide.
This is certainly consistent with both the BoycottSBG and Diebold stories, where we saw a significant early working out of strategies and observations before the criticism reached genuine political salience.
There would have been no point for opponents to link to and criticize early ideas kicked around within the community, like opposing Sinclair station renewal applications.
Only after a few days, when the boycott was crystallizing, would opponents have reason to point out the boycott effort and discuss it.
This interpretation also well characterizes the way in which the Trent Lott story described later in this chapter began percolating on the liberal side of the blogosphere, but then migrated over to the center-right.
The third claim was that money would reemerge as the primary source of power brokerage because of the difficulty of getting attention on the Net.
It differs from the other criticisms in its mechanism of concentration: attention will be concentrated not as the result of an emergent property of large-scale networks, but rather by an old, tried-and-true way of capturing the political arena - money.
But the peer-production model of filtering and discussion suggests that the networked public sphere will be substantially less corruptible by money.
In the interpretation that I propose, filtering for the network as a whole is done as a form of nested peer-review decisions, beginning with the speaker's closest information affinity group.
Consistent with what we have been seeing in more structured peer-production projects like Wikipedia, Slashdot, or free software, communities of interest use clustering and mutual pointing to peer produce the basic filtering mechanism necessary for the public sphere to be effective and avoid being drowned in the din of the crowd.
The nested structure of the Web, whereby subclusters form relatively dense higher-level clusters, which then again combine into even higher-level clusters, and in each case, have a number of high-end salient sites, allows for the statements that pass these filters to become globally salient in the relevant public sphere.
This structure, which describes the analytic and empirical work on the Web as a whole, fits remarkably well as a description of the dynamics we saw in looking more closely at the success of the boycott on Sinclair, as well as the successful campaign to investigate and challenge Diebold's voting machines.
The peer-produced structure of the attention backbone suggests that money is neither necessary nor sufficient to attract attention in the networked public sphere (although nothing suggests that money has become irrelevant to political attention given the continued importance of mass media).
These observations suggest that attention on the network has more to do with mobilizing the judgments, links, and cooperation of large bodies of small-scale contributors than with applying large sums of money.
There is no obvious broadcast station that one can buy in order to assure salience.
There are, of course, the highly visible sites, and they do offer a mechanism of getting your message to large numbers of people.
However, the degree of engaged readership, interlinking, and clustering suggests that, in fact, being exposed to a certain message in one or a small number of highly visible places accounts for only a small part of the range of "reading" that gets done.
More significantly, it suggests that reading, as opposed to having a conversation, is only part of what people do in the networked environment.
In the networked public sphere, receiving information or getting out a finished message are only parts, and not necessarily the most important parts, of democratic discourse.
The central desideratum of a political campaign that is rooted in the Internet is the capacity to engage users to the point that they become effective participants in a conversation and an effort, one that they have a genuine stake in and that is linked to a larger, society-wide debate.
This engagement is not easily purchased, nor is it captured by the concept of a well-educated public that receives all the information it needs to be an informed citizenry.
Instead, it is precisely the varied modes of participation in small-, medium-, and large-scale conversations, with varied but sustained degrees of efficacy, that make the public sphere of the networked environment different, and more attractive, than was the mass-media-based public sphere.
The networked public sphere is not only more resistant to control by money, but it is also less susceptible to the lowest-common-denominator orientation that the pursuit of money often leads mass media to adopt.
It begins with what irks you, the contributing peer, individually, the most.
This is, in the political world, analogous to Eric Raymond's claim that every free or open-source software project begins with programmers with an itch to scratch - something directly relevant to their lives and needs that they want to fix.
The networked information economy, which makes it possible for individuals alone and in cooperation with others to scour the universe of politically relevant events, to point to them, and to comment and argue about them, follows a similar logic.
This is why one freelance writer with lefty leanings, Russ Kick, is able to maintain a Web site, The Memory Hole, with documents that he gets by filing Freedom of Information Act requests.
In April 2004, Kick was the first to obtain the U.S. military's photographs of the coffins of personnel killed in Iraq being flown home.
No mainstream news organization had done so, but many published the photographs almost immediately after Kick had obtained them.
As with free software, and as with Davis and the bloggers who participated in the debates over the Sinclair boycott or the students who published the Diebold e-mails, the decision of what to publish does not start from a manager's or editor's judgment of what would be relevant and interesting to many people without being overly upsetting to too many others.
It starts with the question: What do I care about most now?
To conclude, we need to consider the attractiveness of the networked public sphere not from the perspective of the mid-1990s utopianism, but from the perspective of how it compares to the actual media that have dominated the public sphere in all modern democracies.
This nonmarket alternative can attenuate the influence over the public sphere that can be achieved through control over, or purchase of control over, the mass media.
It offers a substantially broader capture basin for intake of observations and opinions generated by anyone with a stake in the polity, anywhere.
It appears to have developed a structure that allows for this enormous capture basin to be filtered, synthesized, and made part of a polity-wide discourse.
This nested structure of clusters of communities of interest, typified by steadily increasing visibility of superstar nodes, allows for both the filtering and salience to climb up the hierarchy of clusters, but offers sufficient redundant paths and interlinking to avoid the creation of a small set of points of control where power can be either directly exercised or bought.
There is, in this story, an enormous degree of contingency and factual specificity.
My claims do not rest on any deterministic logic; they are instead based on, and depend on the continued accuracy of, a description of the economics of fabrication of computers and network connections, and a description of the dynamics of linking in a network of connected nodes.
As such, my claim is not that the Internet inherently liberates.
I do not claim that commons-based production of information, knowledge, and culture will win out by some irresistible progressive force.
That is what makes the study of the political economy of information, knowledge, and culture in the networked environment directly relevant to policy.
The literature on network topology suggests that, as long as there are widely distributed capabilities to publish, link, and advise others about what to read and link to, networks enable intrinsic processes that allow substantial ordering of the information.
The pattern of information flow in such a network is more resistant to the application of control or influence than was the mass-media model.
But things can change.
Google could become so powerful on the desktop, in the e-mail utility, and on the Web that it would effectively become a supernode, raising the prospect of a reemergence of a mass-media model.
The politics of search engines, as Lucas Introna and Helen Nissenbaum called it, would then become central.
The zeal to curb peer-to-peer file sharing of movies and music could lead to a substantial redesign of computing equipment and networks, to a degree that would make it harder for end users to exchange information of their own making.
Understanding what we will lose if such changes indeed warp the topology of the network, and through it the basic structure of the networked public sphere, is precisely the object of this book as a whole.
For now, though, let us say that the networked information economy as it has developed to this date has a capacity to take in, filter, and synthesize observations and opinions from a population that is orders of magnitude larger than the population that was capable of being captured by the mass media.
It has done so without re-creating identifiable and reliable points of control and manipulation that would replicate the core limitation of the mass-media model of the public sphere - its susceptibility to the exertion of control by its regulators, owners, or those who pay them.