Computer-Mediated Communication Magazine / Volume 1, Number 6 / October 1, 1994 / Page 9


Challenges for Web Information Providers

by John December (john@december.com)

Continued from page 8

Growth in Web Information Variety

While there's no quantifiable way to characterize Web growth in terms of the variety and extent of Web information available, it is possible to gain a qualitative sense of this growth by examining the "What's New with NCSA Mosaic and the WWW" page and its archives. The NCSA "What's New" page is a good indicator of the growth in the variety, quantity, and extent of information provided through the Web--particularly the growth spurred by the popularity of the Web following the development and use of Mosaic. The "What's New" page's archives go back to June 1993. The June 1993 page had 26 entries (11,426 bytes). A typical entry from that month is:
June 25, 1993

A Web server has been installed at the Centre Universitaire d'Informatique of the University of Geneva. Information about various research groups at the CUI is available, as well as a number of other experimental services.

Six months later, the December 1993 page had 124 entries (40,750 bytes), including not only institutional offerings such as:
December 10, 1993

A new Web server is online at the Nippon Telegraph and Telephone Corporation, in Tokyo, Japan, serving Japanese information. This server contains documents in Japanese, as well as notes on Japanese encoding methods and WWW browsers that can display Japanese.

but also more informal information:
December 10, 1993

The first ice hockey team on the Web!

and:
December 26, 1993

A J.R.R. Tolkien information page is now online at University of Waterloo.

By June 1994, the "What's New" page had grown to 297 entries (146,684 bytes) and included more specialized webs such as:
June 29, 1994

The protein H-Bond analysis software, HBPLUS, now has a WWW page.

June 27, 1994

For lovers of plastic arts a page on stone sculpture from Zimbabwe has been placed on the Web. See these pages on Shona Sculpture! There also is a list of exhibitions on the subject. These pages are regularly expanded as new information becomes available.

These entries illustrate the trend on the Web during the spring and summer of 1994 toward increasingly specialized webs. Also, during the early part of 1994, subject trees continued to flourish. Established subject-oriented webs, such as CERN's Virtual Library, continued to grow, while newcomers like Yahoo grew at an astounding rate. According to Yahoo developer Jerry Chih-Yuan Yang, Yahoo started in late March 1994 with about 100 links. It then grew approximately as shown in the following table.

Yahoo Database Growth

Date (1994)    Number of URL links
-----------    -------------------
late Mar            ~100
May 9               1666
Jun 1               2823
Jun 15              3607
Jul 11              5479
Jul 23              6121
Aug 5               7337
Aug 11              8265
Aug 17              8566
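
As a rough illustration of how fast that growth was, here is a minimal Python sketch (not from the original article) that estimates a doubling time from the first and last rows of the table. The late-March starting date of March 25 is an assumption, since the table gives only "late Mar" and "about 100 links":

from datetime import date
from math import log

# Link counts from the table above (all 1994). The first entry is an
# assumption: the article says only "late March ... with about 100 links".
counts = [
    (date(1994, 3, 25), 100),   # assumed late-March starting point
    (date(1994, 5, 9), 1666),
    (date(1994, 6, 1), 2823),
    (date(1994, 6, 15), 3607),
    (date(1994, 7, 11), 5479),
    (date(1994, 7, 23), 6121),
    (date(1994, 8, 5), 7337),
    (date(1994, 8, 11), 8265),
    (date(1994, 8, 17), 8566),
]

first_day, first_count = counts[0]
last_day, last_count = counts[-1]

days = (last_day - first_day).days
growth = last_count / first_count
# Assuming roughly exponential growth, the doubling time in days is:
doubling_days = days * log(2) / log(growth)

print(f"about {growth:.0f}-fold growth over {days} days")
print(f"estimated doubling time: roughly {doubling_days:.0f} days")

Under those assumptions, the table works out to roughly an 86-fold increase in about five months--a doubling about every three weeks.
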
Therefore, along with an explosion in the number of Web servers and in traffic, there has been an expansion in content--both in amount and in diversity--with many institutional "official" webs as well as informal, entertainment, and individual webs growing in the number of links they contain. While older subject trees such as CERN's Virtual Library operate according to a more conservative model involving distributed moderators (people who oversee the development of an individual topic page within the virtual library), some informal, single-site subject trees (exemplified by Yahoo) have grown very rapidly.

Growth Challenges for Information Providers

With all this growth--in Web servers, traffic, and information--what issues face the providers of Web information? Because an information provider's first concern should be the information user's needs, information discovery and retrieval tops the list. Web spiders--particularly newer spiders such as Lycos--and systems such as Harvest help solve parts of the information discovery and retrieval problem. Subject-oriented webs also assist users who want to browse information.

But another aspect of the user's needs arises after the user discovers and retrieves information: issues of content and presentation become very important. With rapid Web growth, the user can no longer easily browse alternate or multiple sources for the same or related information. The Web information space becomes "saturated"--that is, there is so much information that a human being can't adequately compare the value of the available sources on a particular subject or topic. Along with saturation, growth in the Web's information space leads to "pollution"--redundant, erroneous, or poorly maintained information that can obscure other information.

This is not to say that Web growth has created no valuable information, nor that there is a universal standard for information quality. Parts III and IV of this book highlight many valuable webs. Information saturation and pollution are an offshoot of diverse, distributed, creative, and chaotic development--much like the "noise" in Usenet. The goal of information providers is not to eliminate this "noise" on a global scale or to crush creative expression--that would lead to unacceptable levels of control, censorship, loss of diversity, and undue dominance of one taste for information over another. (What is excellent information presentation to me may be very poor information presentation to you.) Moreover, information space saturation often leads to competition among information sources, and this competition can drive web weavers to improve information quality in order to garner attention and use.

So while saturation and pollution are issues that information providers should consider, the goal is not to seek out or eliminate offending webs, or to excoriate web weavers. Diversity of web information and views is not pollution or saturation. Rather, pollution and saturation occur when individual information providers don't meet their users' needs or interests. The challenge is for information providers to define--according to the needs of their users--the content and presentation methods that constitute quality information, so that their webs can meet those needs. This may even include eliminating web structures and information that other webs already offer (reducing redundancy) or correcting erroneous or poorly presented information. In Part V of this book, "Weaving a Web," I propose a methodology web weavers can use to meet user needs and continuously improve a web.

While information spaces such as traditional media (TV, radio, newspapers) exhibit both saturation and pollution, these aspects are not as salient because of editorial and institutional control in these media. For example, there's a great deal of news during a single day, and I personally don't have time to sort it all out and decide what is important (saturation) or correct (pollution). Therefore, I choose various media--such as a network evening news broadcast or a weekly news magazine--to select and filter that information for me. So while I won't attend to wire reports or C-SPAN during the day, I can depend on Dan Rather or Bernard Shaw to provide me with a daily selection of important events. I choose information sources based on my previously developed trust of their work.

On the Web, however, there's no Dan Rather. Web information providers must become aware of the nature of their medium.

The challenge, then, is for information providers to be aware of these issues, which have arisen as a result of Web growth and the dynamic nature of Web information. Providers can consider how their information can:
  1. Meet user needs
  2. Cue the user as to the level of quality and completeness in the information
  3. Undergo some noise-reducing filtering or selectivity, through review or critique where appropriate
I see a tendency for an institutional imprimatur to become increasingly useful as a rule of thumb for ensuring some of these aspects of quality (borrowing the "trusted information source" idea described earlier for encountering news). For example, I might seek out the Web page of a university or government research center for information related to a particular topic. This information is valuable, ultimately, because specialists and experts maintain it.

Still other challenges for information providers go beyond these considerations of information space saturation and pollution resulting from Web growth. These challenges involve the details of implementing specific information structures and developing processes for gathering, selecting, and presenting online information. To illustrate these issues, in the next section I discuss some lessons I've learned from my experience in providing information.

Continued on page 10

Copyright © 1994 Sams Publishing. All rights reserved. Printed by Permission.

