The Web is a Patchy Information Environment

The distribution of valuable information on the Internet is not uniform. Hubs and authority sites tend to have the most outbound and inbound links (or citations). The Web's link structure (or graph) has evolved to resemble the network architecture of many other complex systems, including cellular metabolism and social networks. The architecture of such networks follows an inverse power law distribution, so the distribution of hub and authority sites on the Web appears linear when plotted in log-log coordinates. This scale-free hub/authority distribution means that the link distance (degrees of separation in the graph) to some target page will tend to be smaller from a randomly chosen large hub than from a small one. That is one reason why people tend to start searching at large web portals. The smaller the link distance, the more closely related two pages will be.
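
To see why an inverse power law looks like a straight line in log-log coordinates, here is a minimal sketch. The exponent of 2.1 and the range of link counts are illustrative assumptions, not values taken from Pirolli's book.

```python
import math

def power_law(k, gamma):
    """Unnormalized inverse power law: p(k) is proportional to k ** -gamma."""
    return k ** -gamma

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) plotted against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den

GAMMA = 2.1                      # assumed exponent, for illustration only
degrees = list(range(1, 101))    # hypothetical link counts per page
frequencies = [power_law(k, GAMMA) for k in degrees]

slope = loglog_slope(degrees, frequencies)
print(round(slope, 3))
```

Because log p(k) = -gamma * log k, the fitted slope is simply the negative of the exponent. A few pages with very many links and many pages with very few fall on one line.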

Pirolli gives an example in Chapter 3 where he shows how the topical similarity between web pages diminishes when the link distance grows between them. Figure 1 shows that the Web has topically related information patches.

Figure 1: Similarity of Pairs of Web Pages versus Minimum Link Distance

The topical similarity between web pages diminishes as the link distance between them grows. Similarities were computed using normalized correlations of word frequency vectors in a pair of documents. Data collected from the Xerox.com Web site, May 1998.

From Pirolli, P. (2007). "Information Foraging Theory: Adaptive Interaction with Information." New York, NY: Oxford University Press. Reprinted with permission.
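
One common way to compute a normalized correlation between word frequency vectors is cosine similarity. The sketch below assumes that reading of the caption. The three page texts are invented for illustration and are not from the Xerox.com data set.

```python
import math
from collections import Counter

def word_frequencies(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Dot product of two word-count vectors divided by their lengths."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical pages: two from the same topical patch, one from another.
page_a = word_frequencies("printer toner cartridge printer supplies")
page_b = word_frequencies("printer cartridge ordering and supplies")
page_c = word_frequencies("investor relations annual report")

near = cosine_similarity(page_a, page_b)
far = cosine_similarity(page_a, page_c)
print(round(near, 3), round(far, 3))
```

Pages that share vocabulary score close to 1, while pages with no words in common score 0. That is the quantity Figure 1 plots against link distance.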

Information Scent and Link Cues

Information scent is made up of the proximal cues associated with links that users use to determine the utility of distal content. Link cues such as link text, URL name, and surrounding text and graphics, are used by the hungry information forager to calculate the utility of different potential paths. Link cues are also used to determine when one has left an information patch. Foragers seek out rich patches of information, and leave when diminishing returns are detected.

Search engines sort query results in an easily scannable format, with the most relevant results listed first. As the forager scans down the results page, the likelihood of a relevant document diminishes with the number of items scanned. This law of diminishing returns is one reason why over 80% of searchers don't scan past the third page of search results.

Following Information Scent

Pirolli evaluated his theory by testing 14 Stanford students in a controlled search experiment to see how users followed information scent. He found that the difficulty of foraging on the Web appears to be related to the quality of the information scent cues available to the user. With strong scent, users moved quickly towards the target information. When the information scent was weak, a more random walk pattern of foraging was detected (see Figure 2). Our foraging behavior is not unlike that of the male grain beetle that catches a whiff of sex pheromone.

Figure 2: The Foraging Paths of the Male Grain Beetle and a Web User

(a) The beetle's search path following a 2-second nondirectional puff of sex pheromone, (b) the beetle's search path in the presence of a constant uniform wind bearing sex pheromone, (c) a Web Behavior Graph (WBG) for a user on a task with poor directional information scent, and (d) a WBG for a user on a task with good information scent. Each WBG box represents a state of the user-system interaction.

From Pirolli, P. (2007). "Information Foraging Theory: Adaptive Interaction with Information." New York, NY: Oxford University Press. Reprinted with permission.

Switching when Information Scent Drops

Pirolli found that participants tend not to flit between web sites: they make more transitions within sites than between them. Plotting the average scent ratings (assigned by a panel of experts) of all the web pages visited, Pirolli found the reason why people switch to another site (information patch). Initially the information scent is high, but when it falls below the average information scent of the pages encountered, users switch to another site or search engine (see Figure 3). Pirolli also found that starting with a high information scent was associated with longer runs at a web site (stickiness).

Figure 3: Information scent of sequence of pages visited at web sites prior to going to another web site

Each data point represents the geometric mean of ratings from ten judges.

From Pirolli, P. (2007). "Information Foraging Theory: Adaptive Interaction with Information." New York, NY: Oxford University Press. Reprinted with permission.
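
The switching rule described above can be sketched in a few lines. This is a simplified reading of the rule, not Pirolli's model: the forager leaves at the first page whose scent falls below the running average of the pages seen before it. The function name and the ratings are hypothetical.

```python
def leave_index(scent_ratings):
    """Return the 0-based index of the page that triggers a site switch.

    The forager leaves at the first page whose scent is below the average
    scent of the pages visited before it. Returns None if that never happens.
    """
    total = 0.0
    for index, scent in enumerate(scent_ratings):
        if index > 0 and scent < total / index:
            return index
        total += scent
    return None

# Hypothetical expert ratings for a run of pages on one site.
visit = [4.0, 4.5, 4.3, 2.0]
print(leave_index(visit))            # the fourth page (index 3) is below average
print(leave_index([3.0, 3.0, 3.0]))  # steady scent, so no switch
```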


The High Cost of Low Information Scent

Web sites tend to be organized in hierarchical tree-like structures. The deeper and wider a site, the more costly the "false alarms" of low information scent become (see Information Scent). In fact, small improvements in the false alarm factor associated with individual links can have dramatic effects on the cost of surfing large hypertext collections (see Figure 4). In the example below, when the false alarm factor exceeds 10%, the search cost of following unproductive paths (backtracking, traversing up and down trees) goes from linear to exponential. Therefore a small improvement in information scent can have a dramatic impact on usability. Slow websites exacerbate this phenomenon, especially as site breadth and depth increase.

Figure 4: Number of Branches searched versus false alarm factor

Once the false alarm factor (f) times the number of branches searched (b) exceeds 1 (fb > 1), search costs move from linear to exponential.

From Pirolli, P. (2007). "Information Foraging Theory: Adaptive Interaction with Information." New York, NY: Oxford University Press. Reprinted with permission.
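
The jump from linear to exponential cost can be illustrated with a simple branching model. This is a sketch under stated assumptions, not the formula from Pirolli's book: every page has b links, each wrong link is followed with probability f, and a wrongly entered subtree is explored in the same way all the way down. The function names are hypothetical.

```python
def wasted_visits(f, b, depth):
    """Expected pages visited inside one wrongly entered subtree.

    Each level multiplies the expected number of pages by f * b, so the
    total is a geometric sum that stays bounded only when f * b < 1.
    """
    total = 0.0
    level_size = 1.0
    for _ in range(depth + 1):
        total += level_size
        level_size *= f * b
    return total

def search_cost(f, b, depth):
    """Expected pages visited to reach a target `depth` levels down."""
    cost = 0.0
    for level in range(depth):
        remaining = depth - level - 1
        cost += 1.0                                            # page on the correct path
        cost += f * (b - 1) * wasted_visits(f, b, remaining)   # false alarms at this level
    return cost

# Assumed site shape: 10 links per page, target 10 levels deep.
for f in (0.0, 0.05, 0.10, 0.15, 0.20):
    print(f, round(search_cost(f, 10, 10), 1))
```

With 10 links per page, f = 0.10 is the point where fb reaches 1, which matches the 10% threshold mentioned in the text. Below it the cost grows roughly in proportion to depth, and above it the wasted visits multiply at every level.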

SNIF-ACT: Modeling Information Foraging on the Web

To formalize and test his theory of information foraging, Pirolli and his colleague Wai-Tat Fu developed a cognitive model called SNIF-ACT, which stands for Scent-based Navigation and Information Foraging in the ACT architecture. SNIF-ACT simulates how users navigate over a series of web pages. The model extends the rational analysis behind Anderson and Lebiere's ACT cognitive architecture (2000) with information scent. Pirolli and Fu validated the model against actual web user data called user traces, which are transcripts of users' interactions with the Web (see Figure 5).

Figure 5: The SNIF-ACT User Modeling Architecture

The SNIF-ACT user modeling architecture (top) builds on the ACT-R theory. The user tracer architecture (bottom) runs the SNIF-ACT simulation and compares it to user trace data.

From Pirolli, P. (2007). "Information Foraging Theory: Adaptive Interaction with Information." New York, NY: Oxford University Press. Reprinted with permission.

SNIF-ACT assumes spreading activation networks have computational properties that reflect the properties of the linguistic environment. The key takeaway from SNIF-ACT is that predictions can be made for web users with particular goals using spreading activation networks constructed beforehand (a priori) with no messy parameters needed. Unlike after-the-fact usability heuristics that are based on empirical research and often give conflicting advice, Pirolli's model can be used to predict behavior before the user acts based on sound theoretical foundations. Pirolli's hope is that these psychological models can give a deeper understanding and solve real-world problems like those outlined in Chapter 9 of his book. SNIF-ACT does not currently make use of the full set of ACT-R modeling capabilities such as eye movement and "seeking plans" characteristic of expert users, so there is room for further improvement in the model.
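
As a rough illustration of building association strengths a priori from the linguistic environment, the sketch below scores a link by summing log-likelihood-ratio strengths between goal words and link words. The strength formula log(P(i|j) / P(i)), the toy corpus, and the function names are assumptions made for this example and should not be taken as SNIF-ACT's actual implementation.

```python
import math

def association_strength(word_i, word_j, documents):
    """Estimate log(P(i | j) / P(i)) from document co-occurrence counts."""
    n = len(documents)
    with_i = sum(1 for d in documents if word_i in d)
    with_j = sum(1 for d in documents if word_j in d)
    with_both = sum(1 for d in documents if word_i in d and word_j in d)
    if with_i == 0 or with_j == 0 or with_both == 0:
        return 0.0  # assumption: unseen pairs contribute no activation
    return math.log((with_both / with_j) / (with_i / n))

def link_scent(goal_words, link_words, documents):
    """Sum of activation spreading from each link word to each goal word."""
    return sum(
        association_strength(i, j, documents)
        for i in goal_words
        for j in link_words
    )

# Toy corpus standing in for the linguistic environment (invented data).
corpus = [
    {"cancer", "treatment", "medical"},
    {"cancer", "medical", "research"},
    {"medical", "treatment", "patient"},
    {"football", "score", "league"},
    {"league", "football", "season"},
    {"season", "weather", "score"},
]

goal = ["cancer", "treatment"]
health_scent = link_scent(goal, ["medical"], corpus)
sports_scent = link_scent(goal, ["football"], corpus)
print(round(health_scent, 3), round(sports_scent, 3))
```

The strengths are computed from the corpus before any user is observed. That is the sense in which the predictions need no parameters fitted after the fact.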

Automated Information Scent Tools

Pirolli and his colleagues at PARC have created automated tools to predict link-following behavior, calculate information scent for particular paths, and analyze log files to glean user behavior. Bloodhound (information scent) and Lumberjack (log file analysis) are two tools they created based on his SNIF-ACT model. Unfortunately, both of these tools are currently in digital mothballs. However, Pirolli is collaborating with a team at Carnegie Mellon University (CMU) on a new tool called CogTool-Explorer that combines CogTool, a tool for designers built at CMU, with SNIF-ACT. You can mock up web pages, or menus on PDAs and cell phones, and have a cognitive model do an evaluation in the background. Leonghwee Teo is also working on automating the page mockup process for web page input.

Testing SNIF-ACT 1.0 and 2.0 against actual data

Testing the version 1.0 model for individual users against actual log files, Pirolli found good correlation of the information scent of links with link following and site-leaving actions. The results showed that SNIF-ACT was able to predict when people will leave a site by measuring the information scent. When people left a site, the scent was dropping. Interestingly, just before a participant left a site, the information scent was much lower than the web page he switched to. Again, when a user perceives that the current information scent is lower than the average, he switches to a more promising patch.

A more sophisticated SNIF-ACT 2.0 was then developed. It implemented more realistic satisficing behavior for link selection, an adaptive stopping rule for leaving web pages, and the ability to model the behavior of groups of users more accurately using Monte Carlo simulations. Tested against log files from Yahoo and ParcWeb, SNIF-ACT 2.0 accurately predicted the length of the paths taken through the sites (see Figure 6), and was slightly more accurate than the Law of Surfing (R² = 0.99 versus R² = 0.98).


Figure 6: Path Length Prediction for SNIF-ACT, Law of surfing, and observed data

The cumulative distribution function as a function of length of path (number of clicks) for the observed users on Yahoo and ParcWeb tasks, the fitted Law of Surfing (mean = 2.3 clicks, variance = 1.35), and SNIF-ACT 2.0.

From Pirolli, P. (2007). "Information Foraging Theory: Adaptive Interaction with Information." New York, NY: Oxford University Press. Reprinted with permission.
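
The Law of Surfing (Huberman et al., 1998) models path length with an inverse Gaussian distribution. As a minimal sketch, the cumulative curve for the fitted values in the caption can be computed as follows. It assumes that the caption's "variance" is the variance of the distribution, so the shape parameter is derived as mean³ / variance. The function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def inverse_gaussian_cdf(x, mean, shape):
    """Probability that a path is at most x clicks long."""
    if x <= 0:
        return 0.0
    root = math.sqrt(shape / x)
    first = normal_cdf(root * (x / mean - 1.0))
    second = math.exp(2.0 * shape / mean) * normal_cdf(-root * (x / mean + 1.0))
    return first + second

MEAN = 2.3                    # fitted mean from the Figure 6 caption
VARIANCE = 1.35               # fitted variance from the Figure 6 caption
SHAPE = MEAN ** 3 / VARIANCE  # inverse Gaussian variance equals mean**3 / shape

for clicks in range(1, 8):
    print(clicks, round(inverse_gaussian_cdf(clicks, MEAN, SHAPE), 3))
```

The curve rises steeply over the first few clicks and then flattens, so under this fit most sessions end after a handful of pages.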

Scatter/Gather Clustering and Re-Clustering Searches

Using search engines, we tend to select information clusters with keyphrase queries (gather), and further refine queries by refining the search terms (scatter) within search results or with entirely new queries. This selective refinement to enrich information patches is called scatter/gather interaction. Pirolli created a scatter/gather browser to test a refined ACT-IF model based on his SNIF-ACT model. The browser showed clusters of document titles and extracted keywords that users manipulated through a graphical interface. The ACT-IF model accurately predicted the observed ratings of relevant documents based on spreading activation and information scent (see Figure 7). This was one of the "aha" moments Pirolli refers to earlier in his text.

Figure 7: Observed versus Predicted Relevant Documents in Scatter/Gather

Observed ratings of the percentage of documents in each cluster that are relevant and the ratings predicted by activation-based assessment of information scent.

From Pirolli, P. (2007). "Information Foraging Theory: Adaptive Interaction with Information." New York, NY: Oxford University Press. Reprinted with permission.

Social Information Foraging

I found the last three chapters of the book to be the most interesting. Chapter 8, "Social Information Foraging," shows how a group of people can discover, invent, and innovate more efficiently than a single user. Undiscovered public knowledge can be found when groups forage for information to "connect the dots" and bridge two network clusters of knowledge and information. Pirolli gives the example of fish oil and Raynaud's syndrome, a blood circulation disorder associated with high blood viscosity and vasoconstriction that gives sufferers cold hands. Fish oil can help this disorder, but this was implicit public knowledge for 5 ± 3 years before someone put the two worlds together. There were two clusters of research papers and researchers, each citing and staying in their own worlds. One group discussed the benefits of fish oil, and the other discussed Raynaud's syndrome.

Swanson (1986) did a co-citation analysis that showed that the two groups cited some of the same outside papers. Investigating further, he found that fish oil could possibly help Raynaud's syndrome. Most authors tend to stay within their own discipline, but those who span multiple research areas are likely to be the ones who enable knowledge to flow from one area to another, and who see the big picture that leads to breakthrough discoveries.
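
The overlap described above can be sketched with plain set operations. This is only an illustration of the idea of finding outside papers cited by both clusters. The paper identifiers and the function name are hypothetical, and this is not a reconstruction of Swanson's actual procedure.

```python
def shared_outside_citations(cluster_a, cluster_b):
    """Papers cited by both clusters that belong to neither cluster.

    Each cluster maps a paper id to the set of paper ids it cites.
    """
    cited_by_a = set().union(*cluster_a.values())
    cited_by_b = set().union(*cluster_b.values())
    members = set(cluster_a) | set(cluster_b)
    return (cited_by_a & cited_by_b) - members

# Hypothetical citation data for two literatures that never cite each other.
fish_oil_papers = {"F1": {"F2", "X1"}, "F2": {"X2"}}
raynaud_papers = {"R1": {"R2", "X1"}, "R2": {"X3"}}

bridges = shared_outside_citations(fish_oil_papers, raynaud_papers)
print(bridges)
```

Here the only bridge is the outside paper X1. In a real analysis, such shared citations are the leads a researcher would follow to connect the two literatures.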

Pirolli gives another example of structural holes and clusters in social networks. Organizational studies have shown that effective work groups are the ones that share information and knowledge with external members. Their effectiveness is improved by the structural diversity of the group. Social networks are typically formed into densely connected clusters of people. The sparse linkages between these clusters are structural holes. People on the edge of these clusters who bridge the structural holes are exposed to a greater diversity of information and knowledge. By brokering information across groups they build social capital, which yields greater discovery of useful knowledge. Pirolli cites a study that found that managers who discussed issues with managers in other groups were not only better paid, but were more likely to receive positive reviews and be promoted.

Brokerage across groups has been shown to be important in success in many domains, including jazz, photography, engineering, and software development. Even serendipity in scientific discovery can be influenced by those few scientists that bridge disparate groups of people. It seems that your mother's maxim to get out and network has some theoretical foundations.

The Benefits of Cooperative Information Foraging

Pirolli gives a hypothetical example of solo versus group foraging. Like birds and other animals foraging for food en masse, humans can discover knowledge more quickly and thoroughly by foraging in groups. Figure 8 shows the benefits of joining a cooperative foraging group: the gain nearly triples at the optimum group size in this hypothetical example. Individuals tend to continue joining the group even after the optimal group size is reached, at least until the equilibrium state is reached, whereupon the group is slowed by communication costs within the group and becomes less efficient than the individual forager. The benefits of cooperative information foraging help explain the success of Web 2.0 sites that allow groups of people to discuss problems en masse and discover knowledge at a faster pace than the lone information forager.

Figure 8: Benefits of Cooperative Information Foraging

When interference costs are included, the gains from cooperative information foraging peak at an optimum group size of around n=7, whereas the equilibrium size of the group is n=45. The solitary information forager is the dashed line. People will join the group even after the optimum group size is reached.

From Pirolli, P. (2007). "Information Foraging Theory: Adaptive Interaction with Information." New York, NY: Oxford University Press. Reprinted with permission.
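
The difference between the optimum and the equilibrium group size can be sketched with a toy gain function. The functional form and the `pooling` and `interference` parameters are invented for illustration. They are not Pirolli's model and will not reproduce the n=7 and n=45 values in Figure 8.

```python
import math

def per_capita_gain(n, pooling=2.0, interference=0.05):
    """Hypothetical gain per member in a group of n foragers (n >= 1).

    Pooled discoveries grow with log(n); interference grows linearly with n.
    A solitary forager (n = 1) has a gain of exactly 1.
    """
    benefit = 1.0 + pooling * math.log(n)
    cost = 1.0 + interference * (n - 1)
    return benefit / cost

def optimum_size(max_size=1000):
    """Group size that maximizes the gain of each member."""
    return max(range(1, max_size + 1), key=per_capita_gain)

def equilibrium_size(max_size=1000):
    """Largest group that a newcomer still prefers to foraging alone."""
    solo = per_capita_gain(1)
    size = 1
    while size < max_size and per_capita_gain(size + 1) >= solo:
        size += 1
    return size

print(optimum_size(), equilibrium_size())
```

Newcomers keep joining as long as membership beats foraging alone, even though each arrival past the optimum lowers the gain of everyone in the group. That is why the equilibrium size is larger than the optimum size.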

Copyright © 2002-2017 Website Optimization, LLC. All Rights Reserved.
Last modified: February 18, 2009.