 
 
 
Mike Finn              214 827-7917 (home)
Dallas Morning News            214 977-8705 (work)
Knight Fellow, University of North Carolina         quipson@aol.com
 
 Two men, two visions.
 The sizzle vs. the steak.
 USA Today vs. The Wall Street Journal.


 Former Time magazine design director Nigel Holmes believes that creativity and color in data graphics will draw the reader to the story. He views graphics as a door-opener to what otherwise might be seen as a boring story about numbers.

 Yale professor Edward Tufte teaches statistics and political science. He views quantitative graphics as an end in themselves. He coined the word chartjunk to refer to the decorative design that has nothing to do with data. He believed so strongly that the media were doing a poor job of presenting data that he took out a second mortgage on his home to self-publish The Visual Display of Quantitative Information. Since 1983, more than 250,000 copies have been sold. Tufte also conducts workshops to sold-out houses drawn by his colorful presentations.

 This view of these two giants in graphic design may be an oversimplification, but it does provide a perspective in which to evaluate the current state of media graphics. In fairness to Holmes, it should be noted that he has modified his position slightly: The value of the picture, as opposed to decorations, is to help people understand. But, if the picture takes over and obscures the information, then it would be better not to have it at all even if you risk losing the reader.

 Overall, the artistic approach works better when a visual explanation is called for. The statistical approach works better when large amounts of numerical data need to be summarized. The Tufte approach, exemplified by the New York Times, the Wall Street Journal and the Washington Post, not only reflects a decision to let the data speak for themselves but also reflects a decision to increase coverage of data-rich topics and to use anecdotal story-telling to add texture. Tufte's no-frills approach to data graphics is far from universal, but it is clearly the trend.

 This examination focuses on the current state of data graphics and does not attempt to assess the pros and cons of artistically driven graphics.

IMPROVEMENTS IN MEDIA GRAPHICS

 In his 1983 book, Tufte examined 12 of the world's leading newspapers and magazines and found that U.S. publications were less sophisticated in their use of data graphics. Of the 1,865 statistical graphics that he sampled from 1976-80 in U.S. publications, only seven showed more than one variable, excluding time-series and map graphics. A survey of the same U.S. publications in early 1998 showed an increase in multivariate data graphics, although the large majority of the 119 graphics surveyed (about 80 percent) were time-series charts. The following table shows the percentages for the five U.S. publications that Tufte surveyed two decades earlier.
                          Tufte Pct.    '98 Pct.
 Business Week               0.6%        10.0%
 New York Times              0.5%        17.6%
 Time                        0.0%        23.1%
 Wall Street Journal         0.0%        23.3%
 Washington Post             0.0%        10.5%

 One of the major complaints critics have made about media graphics involves distortions created by certain three-dimensional graphics. An oft-cited example shows a series of oil barrels depicting an increase in imported oil. If the smallest barrel were represented by a 2-inch-high graphic, the artist would draw a 3-inch-high graphic to represent a 50 percent increase. But assuming the height-width-diameter ratios remained the same, the larger barrel would hold more than three times as much oil, not the 50 percent more that the height ratio alone represented. Similarly, two-dimensional symbols can distort reality when they expand in both length and width.

 Researchers have found that individuals underestimate the differences between objects drawn in 3-D. This finding raises the question of whether 3-D objects should be drawn to allow for perceptual bias or to represent geometric honesty. The oil barrel that represented the 50 percent increase would be only 15 percent taller than the smaller barrel if it were drawn to represent geometric honesty. But clearly, the visual impact of the graphic would fail to represent the significance of the change.
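
 The barrel arithmetic in the two paragraphs above can be checked with a short sketch. It assumes the barrel keeps its proportions, so volume grows with the cube of the height ratio; the function names are illustrative only.

```python
def volume_ratio(height_ratio):
    """Volume ratio when height, width and depth all scale together."""
    return height_ratio ** 3


def honest_height_ratio(target_volume_ratio):
    """Height ratio needed for the volume to grow by the target ratio."""
    return target_volume_ratio ** (1 / 3)


# A 3-inch barrel drawn next to a 2-inch barrel holds far more than 1.5x.
print(volume_ratio(3 / 2))

# To hold exactly 50 percent more, the barrel should be only slightly taller.
print(round(honest_height_ratio(1.5), 3))
```

 The first figure is the "more than three times" distortion; the second is the roughly 15 percent height increase that geometric honesty would require.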

 In recent years, the print media have avoided this sticky issue by eliminating the offending 3-D data graphics. Some newspapers and many advertisers still use 3-D bar graphs but hold constant the width and depth dimensions. This approach doesn't create a visual distortion. In geometric terms, a 3x2x2 block is 50 percent larger than a 2x2x2 block and is perceived as 50 percent larger.

 The perceptual research on data graphics has been fairly consistent. The most accurately perceived graphic elements are line length, followed by bars, slopes, areas, volumes and colors.

Colors are a particularly troublesome area because the brain appears to be using both nominal and ordinal criteria. Subjects unfamiliar with the ROY G BIV acronym for the order of rainbow colors are not very proficient at arranging the colors in order.
 The other problem is the lack of agreed-upon standards. For example, severe weather on local television can be represented by a variety of colors, depending on which software package the station uses.

Most often, the best advice is to use as few colors as possible and vary the hue to show change.

OTHER IMPROVEMENTS

The print media have adopted many of Tufte's suggestions on gridlines. They are much less intrusive, often fewer (if present at all) and almost without exception thinner than they were two decades ago. One innovation involves the use of white over a shaded background, a twist on Tufte's suggestion that white space be used to show ticks on bar graphs. (Ticks are short lines on the axes that show the numerical values of the data points. Ticks can also be used on bar charts in place of gridlines to show where the gridlines would have intersected.)
Other elements in Tufte's theory of data graphics have shown varying degrees of improvement. Tufte calls the decorative frills that often accompany data graphics chartjunk. He believes nothing is gained by including such non-data elements; they simply distract from the data. He also introduces the concept of the data/ink ratio. Simply stated, the ratio tells how much of the graphic can be erased and still show the findings.

A third concept he uses is data density, or how many bits of numerical data are presented in a given area. One extremely appealing graphic is the New York Times' yearly summary of weather. The high and low temperatures for each day are shown in a display that looks similar to a cardiogram on a roller-coaster. Superimposed over this squiggly line are two bands that show the normal highs and lows. In all, more than 2,000 individual bits of information are represented.

THE NEED FOR IMPROVEMENT

One area that remains troublesome is indexing or adjusting data. Two major data sets that need to be indexed are inflation and population growth. When stories involve change over time, reporters and editors should be aware that some change may be a result of factors that are not immediately obvious and can lead to apples-and-oranges comparisons.

For example, one researcher cited a Forbes article in which an increase in SAT scores was linked to an increase in educational spending. The dollar amounts were adjusted for neither inflation nor population growth. After the data were adjusted to show per capita spending on education, there was no relationship evident.
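
The kind of adjustment described here can be sketched in a few lines. The figures below are hypothetical and are not taken from the Forbes article; the price index is assumed to be any consumer price series set to 100 in the base year.

```python
def real_per_capita(nominal_dollars, population, price_index, base_index=100.0):
    """Convert nominal spending to inflation-adjusted dollars per person."""
    real_dollars = nominal_dollars * base_index / price_index
    return real_dollars / population


# Hypothetical district: nominal spending doubles between two years,
# while prices rise 60 percent and population rises 25 percent.
early = real_per_capita(1_000_000, 10_000, price_index=100.0)
late = real_per_capita(2_000_000, 12_500, price_index=160.0)

print(early, late)
```

With these made-up numbers, a doubling of raw dollars leaves real spending per person unchanged, which is the apples-and-oranges trap the unadjusted comparison falls into.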

It's often prudent to use rates rather than raw numbers. Crime statistics are an example of data that local authorities usually don't adjust for population growth; they simply report the number of crimes. The FBI, on the other hand, reports crime rates: the number of crimes per 100,000 people. It is possible for a fast-growing community to show both an increase in total crimes and a decrease in the crime rate.
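
A minimal sketch of the rate calculation, using an invented fast-growing community rather than real crime data:

```python
def crime_rate(crimes, population, per=100_000):
    """Crimes per `per` residents (per 100,000 by default)."""
    return crimes * per / population


# Hypothetical community: crimes rise 20 percent, population rises 50 percent.
year_1 = crime_rate(5_000, 100_000)
year_2 = crime_rate(6_000, 150_000)

print(year_1, year_2)
```

Total crimes go up from 5,000 to 6,000 in this example, yet the rate per 100,000 residents goes down.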

TABLES VS. CHARTS

Tufte suggests that data with fewer than 20 numbers be presented in a tabular format and not as a graphic. Often, the only purpose of data graphics is to break up gray areas of type. In a perfect world, Tufte may be right. But adherence to his view of graphic purity often has to compete with newsroom constraints such as deadlines and limited personnel.

Tufte's theory implies that data graphics use more numbers. The number of data junkies among newspaper readers may surprise some, but sports editors and business editors have rarely, if ever, been criticized for running too much agate.

The difference between lists of numbers and graphical displays usually involves analysis, trying to link one set of numbers to another set, looking for possible cause-and-effect relationships. Tufte's criticism of data graphics that use only one variable plotted against time bears directly on the media's responsibility to look for connections.

THE ZERO BASELINE FALLACY

 The use of zero base lines continues to be a problem. It is one of those absolutes that many newspaper art directors accept as gospel. But the time has come to drop the stridency. Noted statistician William Cleveland suggests that the widespread adherence to, and fervor in defense of, this rule probably can be traced to a 1954 book, How to Lie With Statistics by Darrell Huff. A graphic cited by Huff eliminated 90 percent of the vertical axis and, consequently, a trivial change was magnified in the process. The lie was not the result of the data; it was the result of using a graphic to show meaningless variation. If the differences in data points are trivial, there is rarely any need to use a data graphic. A good editor would have said, "Big deal. So what? We aren't going to use a graphic to show such a piddly difference."

 The decision to use or not use a zero base should be based on news judgment, not some arbitrary rule. Most papers, even those with a strict adherence to the zero base rule, wisely ignore the zero base when dealing with the Dow Jones industrial average. Eliminating the zero base allows the variation in the Dow to be seen more readily. Meaningful differences can appear visually trivial when following the zero baseline rule. University of North Carolina professor Philip Meyer has compared this unnecessary use of a zero base line to plotting a hurricane surge against the depth of the ocean. Using a zero base line would be the flip side to Huff's argument: Significant differences are minimized visually by expanding the length of the vertical axis unnecessarily.
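
 One rough way to see the effect of the baseline is to ask what share of the vertical axis the data's range actually occupies. The sketch below assumes the axis runs from the chosen baseline to the highest value; the index closes are hypothetical and the function name is illustrative.

```python
def share_of_axis_showing_change(values, baseline):
    """Fraction of the axis height (baseline to max) taken up by the data's range."""
    top = max(values)
    return (top - min(values)) / (top - baseline)


# Hypothetical closes for a stock index over three days.
closes = [8_000, 8_200, 8_400]

with_zero_base = share_of_axis_showing_change(closes, baseline=0)
with_trimmed_base = share_of_axis_showing_change(closes, baseline=7_900)

print(f"{with_zero_base:.0%} vs. {with_trimmed_base:.0%}")
```

 With a zero base, a 400-point swing fills only a sliver of the chart; with a trimmed base it fills most of it. Which is appropriate remains a matter of news judgment about whether the swing is meaningful.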

 A similar problem in displaying data involves the slope of the line connecting data points or the general outline of bar charts. Readers perceive steep slopes as showing significant variation and flat lines as trivial. The slope can be manipulated by adjusting the ticks on the axes. Packing the data points closely together on the horizontal axis and expanding the area between points on the vertical axis will increase the steepness of the slope.
 Again, this is a judgment call.

 An example of small differences resulting in significant differences to the reader involves interest rates on home loans. Mortgage rates closely track bond rates, usually running about 2 percentage points higher than the 30-year bond rate. A drop of 1 percentage point in the bond rate could make it possible for home buyers to afford a house costing an extra $10,000 or could mean the difference between 20 years of mortgage payments vs. 30 years.
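
 The mortgage claim can be explored with the standard fixed-rate amortization formula. The $700 monthly payment and the 8 percent and 7 percent rates below are assumptions chosen for illustration, not figures from the text.

```python
def affordable_loan(monthly_payment, annual_rate, years):
    """Largest fixed-rate loan that a given monthly payment can pay off."""
    r = annual_rate / 12   # monthly interest rate
    n = years * 12         # number of monthly payments
    return monthly_payment * (1 - (1 + r) ** -n) / r


# Hypothetical buyer who can afford $700 a month on a 30-year loan.
at_8_percent = affordable_loan(700, 0.08, 30)
at_7_percent = affordable_loan(700, 0.07, 30)

print(round(at_7_percent - at_8_percent))
```

 For a buyer in this range, a one-point drop adds on the order of $10,000 in borrowing power, so a seemingly small move on the chart is far from trivial to the reader.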

 Perhaps the most unreadable chart is the one that looks like the cut-away view of a geological sample. Generally, this type of area chart has about five variables that appear as layers with the horizontal axis serving as a measure of time. This type of data graphic is usually attempting to show how the mix of variables changes over time. For example, the tastes in music have changed over time. The percentage of rock 'n' roll has certainly gone down in the past 25 years; country has probably gone up and down; hip-hop and rap have probably taken share away from rock. And overall, the sale of recorded music has increased.

 This type of data graphic makes nearly impossible demands on readers. First, they have to compare a specific type of music against the total sold, then compare it with the other types of music and then compare its share over time. And to top it off, only the music segment situated on the horizontal axis has a reference point that makes it easy to judge. The other segments are stacked on top of each other with hard-to-define base lines.

 The solutions largely depend on clarifying the intent of the graphic.

 If the object is to show that all music sales are increasing but the growth rates vary considerably, one innovative solution is to draw two vertical bars and locate the sales figures at Time 1 and Time 2 for each type of music. The respective sales figures are then connected with arrows. The steepness of the slope represents the rate of change.

 Another solution would be to compare the growth rates of individual segments against the industry average. This technique is appropriate when the goal is to show the winners and losers in a contest to gain market share. One limitation is that it doesn't show absolute share or sales figures, but this shortcoming could be overcome with an additional graphic or table.
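
 A sketch of the comparison against the industry average, using invented sales figures for three genres at Time 1 and Time 2:

```python
def growth_rate(start, end):
    """Fractional change from start to end."""
    return (end - start) / start


# Hypothetical sales (Time 1, Time 2) in millions of dollars.
sales = {"rock": (50, 55), "country": (20, 26), "rap": (10, 19)}

total_start = sum(start for start, _ in sales.values())
total_end = sum(end for _, end in sales.values())
industry = growth_rate(total_start, total_end)

# Positive values grew faster than the industry and gained share;
# negative values grew more slowly and lost share.
vs_industry = {
    genre: growth_rate(start, end) - industry
    for genre, (start, end) in sales.items()
}

for genre, gap in vs_industry.items():
    print(f"{genre}: {gap:+.0%} vs. industry growth")
```

 Every genre's sales rise in this example, yet the comparison makes plain which ones are winning and losing share.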

 In other cases, it may be necessary to combine certain categories or to eliminate segments to emphasize the changing trends.

 Other solutions include using separate graphics to show each segment or a single graphic superimposing the sales of the various segments. The experimental evidence on which technique is more effective is muddled and inconclusive. A visual examination of these alternatives is probably necessary to ascertain whether they convey the desired effect.
 The use of multiple graphics that use different units can also create problems. One series of charts showed the declining values of various Asian currencies against the dollar but failed to standardize the currencies against themselves. Consequently the slopes of the various declines did not mirror the percentage drop in the various currencies.
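
 One common way to standardize such series is to index each one to its own starting value, so that equal percentage drops produce equal slopes. The exchange rates below are hypothetical.

```python
def index_to_base(series, base=100.0):
    """Rescale a series so that its first value equals `base`."""
    first = series[0]
    return [base * value / first for value in series]


# Hypothetical dollar values of one unit of two currencies.
currency_a = [0.40, 0.30, 0.20]
currency_b = [0.0010, 0.00075, 0.0005]

print(index_to_base(currency_a))
print(index_to_base(currency_b))
```

 Both currencies lose half their value here, and once indexed they trace the same line even though their raw units differ by a factor of several hundred.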

 The use of pie charts remains a staple in the mass media but has been largely abandoned by the scientific journals. Opposition to their use has grown among experts in data graphics. Their opposition largely deals with perceptual problems in judging the relative sizes of slices. This difficulty is exacerbated when ovals and three-dimensional pie charts are used. Other problems that can plague pie charts include an excessive number of categories, extremely thin slices and cluttered designs.

LOOKING TO THE FUTURE

 The history of data graphics is relatively short compared to the history of mathematics and statistics. William Playfair is credited with their introduction in the late 1700s. The limitations of printing are generally cited as a major reason for their late introduction.
 The design of data graphics remained relatively stagnant until the latter half of the 20th century when statisticians began experimenting with new formats. John Tukey, to whom Tufte dedicated his book, was the foremost statistician who worked on new designs. Tufte was a colleague of Tukey when both men were on the faculty at Princeton.

 One of the new designs that has simple elegance has already been mentioned: the use of two vertical bars that connect data points with arrows to show the degree of change over time.
 Two other simple designs provide more information than bar or line charts and could easily be adopted by newspapers. A variation of Tukey's box-and-whisker plot would be extremely informative in showing distributions, such as income. Mean or median incomes for various groups are represented by a dot and lines extend from the dot to show where the 25th percentile and 75th percentile figures are. This design is far superior to simply reporting the mean income of a group of people. It is also relevant to reporting school test scores.
 This graphic may appear strange to readers, and a legend will probably be necessary to explain the design. But the additional information that the design provides seems well worth the effort.
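
 The three numbers this design needs can be computed with Python's standard library. The incomes below are invented, and the quartiles use the default method of `statistics.quantiles`; other quartile conventions give slightly different values.

```python
import statistics


def dot_and_whiskers(values):
    """Return the 25th percentile, median and 75th percentile of the values."""
    p25, median, p75 = statistics.quantiles(values, n=4)
    return {"p25": p25, "median": median, "p75": p75}


# Hypothetical household incomes, in thousands of dollars.
incomes = [20, 25, 30, 35, 40, 60, 120]

print(dot_and_whiskers(incomes))
print(round(statistics.mean(incomes), 1))
```

 A single high income pulls the mean well above the median in this example, which is exactly the information a lone "average income" figure hides.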
 Another viable design involves the use of logarithmic scales. These are particularly relevant in technological advances. Plotting the speed of various computer chips using conventional scaling would not be particularly informative because the exponential increase in chip speed would make the plot look like a backward L. It would be almost impossible to discern any difference in chip speed in the early years.

Log scaling can be accomplished in several ways. Investor's Business Daily decreases the distance between the ticks and maintains the traditional numerical labels. This allows its readers to visually inspect the chart and assess percentage changes. For example, the vertical distance between $20 and $25 is the same as the distance between $40 and $50, and both represent a 25 percent increase.

 Another technique, which could be applied to chip speed or number of Internet users, might use a log scale with the base 10. In this example, the ticks would be equally spaced but the numerical values would increase by a factor of 10, proceeding in this manner: 1, 10, 100, 1,000, 10,000, 100,000, etc.
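
 Both approaches rest on the same property: on a log scale, equal ratios occupy equal distances. A brief sketch, with illustrative function names:

```python
import math


def log_distance(low, high):
    """Vertical distance between two values on a base-10 log axis."""
    return math.log10(high) - math.log10(low)


# Equal percentage changes cover equal distances on the axis.
print(log_distance(20, 25))
print(log_distance(40, 50))

# Equally spaced ticks on a base-10 log scale.
ticks = [10 ** k for k in range(6)]
print(ticks)
```

 The two distances match because each represents a 25 percent increase, and the tick labels climb by a factor of 10 at every step.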

If readers are familiar with the subject of the graphic, they are likely to comprehend why a log scale was used, even if they don't understand the subtleties of the scales. The Richter scale of earthquakes is an example: Persons living along the San Andreas fault are much more likely to understand its significance than those who have never experienced an earthquake.
 University of North Carolina statistics professor Richard Smith agrees with Tufte that the only reason this type of data graphic is not used in newspapers is readers' lack of familiarity with the design, not any increased complexity.

FINAL THOUGHTS

 The Internet has opened vast warehouses of information that many reporters hadn't known existed. Combined with the computer revolution, the potential for in-depth investigations is greater now than ever. The time is ripe to go beyond the descriptive and proceed to the analytical.
 Tufte hypothesizes, and anecdotal evidence suggests, that graphic expertise and numerical aptitude are not correlated. Some journalists are skilled in both areas, some in neither, others in one or the other. Teamwork or the use of supervisors skilled in both areas may prove to be a more effective solution than training.

 The computer has increased the media's ability to present data graphics more easily and more colorfully. Several graphic designers have indicated in interviews that their newspapers, and they personally, have scaled back their use of the bells and whistles available in their software packages. Several graphics experts have compared the early immaturity of computer data graphics to the current state of Web pages on the Internet. Computer users go to the Net for information, not to look at colorful designs. They don't want to go through four or five screens to get the information they are seeking or, worse yet, waste all that time finding out that what they want isn't there.