The distribution of valuable information on the Internet is not uniform. Hubs tend to have the most outbound links, and authority sites the most inbound links (or citations). The Web's link structure (or graph) has evolved to resemble the network architecture of many other complex systems, including cellular metabolism and social networks. The link distribution of such networks follows an inverse power law, so the distribution of hub and authority sites on the Web appears as a straight line when plotted in log-log coordinates. Because this hub/authority distribution is scale-free, the link distance (degrees of separation in the graph) to a given target page tends to be shorter from a randomly chosen large hub than from a smaller one. That is one reason why people tend to start searching at large web portals. The shorter the link distance between two pages, the more closely related they tend to be.
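Why does a power law look like a straight line on log-log axes? If the fraction of sites with k links is P(k) = C * k^(-alpha), then log P(k) = log C - alpha * log k, which is a line with slope -alpha. The minimal Python sketch below assumes an illustrative exponent of 2.1 and C = 1 (both hypothetical values, not measurements from the book) and recovers the exponent with a least-squares fit in log-log space.

```python
import math


def power_law(k, alpha, c=1.0):
    """P(k) = c * k**(-alpha): illustrative fraction of sites with k links."""
    return c * k ** (-alpha)


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var


# Hypothetical exponent; real Web degree exponents must be estimated from crawl data.
ALPHA = 2.1
degrees = list(range(1, 1001))
probs = [power_law(k, ALPHA) for k in degrees]
slope = loglog_slope(degrees, probs)
print(f"fitted log-log slope: {slope:.3f}")
```

On noisy real degree counts, a straight-line fit to a log-log plot is only a rough estimator; maximum-likelihood fitting is the more reliable way to estimate the exponent.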
In Chapter 3, Pirolli shows that the topical similarity between web pages diminishes as the link distance between them grows. Figure 1 shows that the Web is organized into patches of topically related information.
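Measuring this relationship takes two ingredients: link distance, which is a breadth-first search over the link graph, and a topical similarity score. The sketch below uses the cosine similarity of raw word counts as a simple stand-in for topical similarity (an assumption; it is not necessarily the measure used in the book) on a hypothetical four-page site.

```python
import math
from collections import Counter, deque


def link_distances(graph, start):
    """Breadth-first search: number of link hops from start to every reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, []):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist


def cosine(text_a, text_b):
    """Cosine similarity of raw word-count vectors (a crude proxy for topical similarity)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical site: a chain of four pages and the words on each page.
links = {"home": ["beetles"], "beetles": ["pheromones"], "pheromones": ["chemistry"]}
texts = {
    "home": "insect foraging beetles",
    "beetles": "grain beetles foraging pheromones",
    "pheromones": "beetles pheromones chemistry",
    "chemistry": "organic chemistry molecules",
}
dist = link_distances(links, "home")
for page, hops in sorted(dist.items(), key=lambda item: item[1]):
    print(page, hops, round(cosine(texts["home"], texts[page]), 3))
```

In this toy site, similarity to the home page falls with every extra hop, which is the pattern Pirolli reports for the real Web.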
Information scent consists of the proximal cues associated with a link that users rely on to judge the utility of the distal content it leads to. Link cues such as link text, URL name, and surrounding text and graphics are used by the hungry information forager to estimate the utility of different potential paths. Link cues also signal when one has left an information patch. Foragers seek out rich patches of information and leave when they detect diminishing returns.
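Once each link has a scent-based utility, the forager still has to pick one. A common way to turn utilities into choices is a softmax (random-utility) rule: higher-scent links are chosen more often, but not always. The sketch below is a generic softmax, not necessarily the mechanism SNIF-ACT uses, and the scent scores and the noise parameter `tau` are hypothetical.

```python
import math
import random


def choice_probabilities(scents, tau=1.0):
    """Softmax rule: P(link) is proportional to exp(scent / tau).

    tau is a hypothetical noise parameter: small tau means the highest-scent link
    is taken almost every time, large tau means nearly random choice.
    """
    top = max(scents.values())  # subtract the maximum to avoid overflow in exp
    weights = {link: math.exp((s - top) / tau) for link, s in scents.items()}
    total = sum(weights.values())
    return {link: w / total for link, w in weights.items()}


def choose_link(scents, tau=1.0, rng=random):
    """Sample one link according to the softmax probabilities."""
    probs = choice_probabilities(scents, tau)
    links = list(probs)
    return rng.choices(links, weights=[probs[link] for link in links], k=1)[0]


# Hypothetical scent scores for three links, for a user researching a medical question.
scents = {"Medical research": 2.0, "Cell biology": 1.0, "Sports news": -1.0}
print({link: round(p, 3) for link, p in choice_probabilities(scents).items()})
print(choose_link(scents, rng=random.Random(42)))
```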
Search engines present query results in an easily scannable format, with the most relevant results listed first. As the forager scans down the results page, the likelihood of finding a relevant document diminishes with each additional item scanned. This law of diminishing returns is one reason why over 80% of searchers don't look past the third page of search results.
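The diminishing-returns argument can be made concrete. Suppose, purely as an illustrative assumption, that the result at rank r is relevant with probability p * d^(r - 1), for a starting probability p and a decay factor d < 1. The expected number of relevant results found after scanning n items is then a sum whose increments shrink geometrically, so each additional page of results is worth less than the one before. The values of `p`, `decay`, and the ten-results-per-page layout below are hypothetical.

```python
def expected_relevant(n, p=0.6, decay=0.9):
    """Expected relevant results in the top n, assuming rank r is relevant
    with probability p * decay**(r - 1). p and decay are hypothetical values."""
    return sum(p * decay ** (r - 1) for r in range(1, n + 1))


RESULTS_PER_PAGE = 10  # assumed page size
previous = 0.0
for page in range(1, 6):
    total = expected_relevant(page * RESULTS_PER_PAGE)
    print(f"page {page}: +{total - previous:.2f} expected relevant results (cumulative {total:.2f})")
    previous = total
```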
Pirolli evaluated his theory in a controlled search experiment with 14 Stanford students to see how users followed information scent. He found that the difficulty of foraging on the Web appears to be related to the quality of the information scent cues available to the user. With strong scent, users moved quickly toward the target information; when the scent was weak, their foraging followed a more random-walk pattern (see Figure 2). Our foraging behavior is not unlike that of a male grain beetle tracking a whiff of sex pheromone.
Pirolli found that participants did not flit between websites: transitions within a website outnumbered transitions between websites. By plotting the average scent ratings (judged by a panel of experts) of all the web pages visited, Pirolli identified what prompts people to switch to another site (information patch). Initially the information scent is high, but when it falls below the average information scent of the pages encountered, users switch to another site or search engine (see Figure 3). Pirolli also found that starting with high information scent was associated with longer runs at a website (stickiness).
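This switching pattern resembles the patch-leaving rule of Charnov's marginal value theorem from optimal foraging theory: stay in a patch while it yields more than the environment's average, and leave once it drops below. A minimal sketch, assuming hypothetical expert ratings on a 0 to 10 scale and taking the mean over all pages in the session as the "average information scent" (one plausible reading of the description, not the book's exact procedure):

```python
from statistics import mean


def leave_index(site_scents, environment_average):
    """Index of the first page in a site whose scent falls below the environment's average.

    Returns None if the scent never drops that low, i.e. the forager stays in the site.
    """
    for i, scent in enumerate(site_scents):
        if scent < environment_average:
            return i  # this patch is now poorer than average: switch sites
    return None


# Hypothetical expert scent ratings (0-10) for every page seen in one session.
session_pages = [8, 7, 6, 5, 3, 2, 6, 5, 4]
average = mean(session_pages)
current_site = [8, 7, 6, 5, 3]  # successive pages within the first site
print(leave_index(current_site, average))
```

Here the session average is about 5.1, so the forager leaves at the fourth page of the site (index 3), the first one rated below that average.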
Websites tend to be organized in hierarchical, tree-like structures. The deeper and wider a site, the more costly "false alarms" (following links whose scent is misleading and that do not lead toward the target) become (see Information Scent). In fact, small improvements in the false alarm factor associated with individual links can have dramatic effects on the cost of navigating large hypertext collections (see Figure 4). In this example, when the false alarm factor exceeds 10%, the search cost of following unproductive paths (backtracking, traversing up and down the tree) grows exponentially rather than linearly. A small improvement in information scent can therefore have a dramatic impact on usability. Slow websites exacerbate this phenomenon, especially as site breadth and depth increase.
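To see why a threshold appears, consider a simplified model (an assumption for illustration, not Pirolli's exact formulation). Every page has `b` links. The forager always recognizes the correct link, but mistakenly follows each wrong link with false alarm probability `f`, and explores any wrongly entered subtree the same way before backtracking. Each wrongly entered page then triggers on average b * f further mistakes one level down. When b * f < 1 the wasted effort per mistake stays bounded and total cost grows linearly with depth; when b * f > 1 it grows exponentially. With a hypothetical branching factor of 10, the tipping point is f = 0.1, the same 10% as in the example.

```python
def wrong_subtree_cost(b, f, height):
    """Expected pages visited after mistakenly entering a page with `height` levels below it.

    W(0) = 1 (a leaf); W(h) = 1 + b * f * W(h - 1), because each of the page's
    b links is wrongly followed with probability f.
    """
    cost = 1.0
    for _ in range(height):
        cost = 1.0 + b * f * cost
    return cost


def expected_search_cost(b, f, depth):
    """Expected pages visited to reach a target `depth` levels below the home page."""
    total = 1.0  # the home page itself
    for level in range(1, depth + 1):
        remaining = depth - level
        # one correct step, plus (b - 1) wrong siblings, each entered with probability f
        total += 1.0 + (b - 1) * f * wrong_subtree_cost(b, f, remaining)
    return total


B = 10  # hypothetical branching factor
for f in (0.05, 0.10, 0.15):
    costs = [round(expected_search_cost(B, f, d), 1) for d in (3, 6, 9, 12)]
    print(f"false alarm rate {f:.2f}: {costs}")
```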
To formalize and test his theory of information foraging, Pirolli and his colleague Wai-tat Fu developed a cognitive model called SNIF-ACT, which stands for Scent-based Navigation and Information Foraging in the ACT architecture. SNIF-ACT simulates how users navigate through a series of web pages. The model extends the rational analysis underlying Anderson and Lebiere's ACT cognitive architecture (2000) with information scent. Pirolli and Fu validated the model against user traces, or transcripts of users' interactions with the Web (see Figure 5).
SNIF-ACT assumes that spreading activation networks have computational properties that reflect the properties of the linguistic environment. The key takeaway from SNIF-ACT is that predictions can be made for web users with particular goals using spreading activation networks constructed beforehand (a priori), without messy parameter fitting. Unlike after-the-fact usability heuristics, which are derived from empirical research and often give conflicting advice, Pirolli's model rests on sound theoretical foundations and can predict behavior before the user acts. Pirolli hopes that these psychological models can provide a deeper understanding and help solve real-world problems like those outlined in Chapter 9 of his book. SNIF-ACT does not yet use the full set of ACT-R modeling capabilities, such as eye movements and the "seeking plans" characteristic of expert users, so there is room to improve the model.
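Because the networks are built a priori from the statistics of the linguistic environment, their association strengths can be estimated from word co-occurrence counts before any user is observed. A minimal sketch, assuming pointwise mutual information (PMI) as the association strength, S(a, b) = log(P(a, b) / (P(a) * P(b))), document-level co-occurrence counts, and a hypothetical five-document corpus. A link's scent is taken as the activation its cue words receive from the goal words, with every goal word weighted equally. Real models estimate these statistics from very large corpora and use the full ACT-R activation equations.

```python
import math
from collections import Counter
from itertools import combinations


def build_pmi(documents):
    """Return a function giving the PMI association strength between two words.

    P(w) is the fraction of documents containing w; P(a, b) is the fraction containing both.
    """
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset(pair) for pair in combinations(sorted(words), 2))

    def pmi(a, b):
        joint = word_counts[a] if a == b else pair_counts[frozenset((a, b))]
        if joint == 0:
            return 0.0  # simplification: unseen pairs contribute nothing instead of -infinity
        return math.log(joint * n_docs / (word_counts[a] * word_counts[b]))

    return pmi


def scent(goal, link_text, pmi):
    """Activation reaching the link's cue words, summed over all goal words (equal weights)."""
    return sum(pmi(g, c) for g in goal.lower().split() for c in link_text.lower().split())


# Hypothetical corpus standing in for the linguistic environment.
corpus = [
    "fish oil reduces blood viscosity",
    "blood viscosity raynaud circulation",
    "fish oil omega fatty acids",
    "football scores league results",
    "league football transfer news",
]
pmi = build_pmi(corpus)
goal = "blood circulation"
for link in ("fish oil viscosity", "football league"):
    print(link, round(scent(goal, link, pmi), 3))
```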
Testing SNIF-ACT 1.0 against individual users' log files, Pirolli found a good correlation between the information scent of links and users' link-following and site-leaving actions. The results showed that SNIF-ACT could predict when people would leave a site from the information scent: when people left a site, the scent was dropping. Interestingly, just before a participant left a site, the information scent was much lower than that of the web page he switched to. Again, when a user perceives that the current information scent is lower than the average, he switches to a more promising patch.
The more sophisticated SNIF-ACT 2.0 added more realistic satisficing behavior for link selection, an adaptive stopping rule for leaving web pages, and the ability to model the behavior of groups of users more accurately using Monte Carlo simulations. Tested against log files from Yahoo and ParcWeb, SNIF-ACT 2.0 accurately predicted the length of the paths taken through the sites (see Figure 6) and was slightly more accurate than the Law of Surfing (R² = 0.99 versus R² = 0.98).
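Satisficing means accepting the first option that is good enough instead of exhaustively searching for the best one. The minimal sketch below assumes that links are evaluated from top to bottom and that a fixed, hypothetical aspiration level `threshold` defines "good enough". SNIF-ACT 2.0's actual rule is adaptive and more elaborate than this. Returning `None` stands in for backing out of the page.

```python
def satisficing_choice(links, scent_of, threshold):
    """Scan links in page order and take the first whose scent meets the threshold.

    Returns (chosen_link, links_evaluated); chosen_link is None when nothing is
    good enough, which here stands for leaving the page.
    """
    for count, link in enumerate(links, start=1):
        if scent_of(link) >= threshold:
            return link, count  # good enough: stop evaluating the rest of the page
    return None, len(links)


# Hypothetical page and scent values.
page = ["About us", "Research projects", "Publications", "Contact"]
hypothetical_scent = {"About us": 0.2, "Research projects": 0.7, "Publications": 0.9, "Contact": 0.1}
print(satisficing_choice(page, hypothetical_scent.get, threshold=0.5))
```

With a threshold of 0.5 the forager settles for "Research projects" after two evaluations, even though "Publications" scores higher. The saved evaluations are the point of satisficing.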
When using search engines, we tend to gather promising clusters of information with keyphrase queries and then narrow them by refining the search terms, either within the results or with entirely new queries. Pirolli studied this kind of selective enrichment of information patches with a Scatter/Gather browser, in which the system scatters a document collection into clusters, the user gathers the promising clusters, and the system re-scatters the gathered documents into new clusters. The browser showed clusters of document titles and extracted keywords that users manipulated through a graphical interface. Pirolli used it to test ACT-IF, an ACT-based information foraging model that, like SNIF-ACT, relies on spreading activation and information scent. The ACT-IF model accurately predicted the observed ratings of relevant documents based on spreading activation and information scent (see Figure 7). This was one of the "aha" moments Pirolli refers to earlier in the book.
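The scatter, gather, re-scatter cycle can be sketched with any document clustering method. The sketch below uses a simple single-pass "leader" clustering over word-count vectors with hypothetical similarity thresholds and documents. The original Scatter/Gather work used its own clustering algorithms, so treat this only as an illustration of the interaction loop.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def scatter(docs, threshold):
    """Single-pass leader clustering: each document joins the first cluster whose
    leader (first document) is at least `threshold` similar, else starts a new cluster."""
    clusters = []  # list of (leader_vector, member_documents)
    for doc in docs:
        vec = Counter(doc.lower().split())
        for leader, members in clusters:
            if cosine(vec, leader) >= threshold:
                members.append(doc)
                break
        else:
            clusters.append((vec, [doc]))
    return [members for _, members in clusters]


def gather(clusters, chosen):
    """Pool the documents of the clusters the user selected (by index)."""
    return [doc for i in chosen for doc in clusters[i]]


# Hypothetical document titles.
docs = [
    "fish oil blood",
    "fish oil omega",
    "raynaud blood circulation",
    "football league scores",
    "football transfer news",
]
first = scatter(docs, threshold=0.3)     # scatter the whole collection
picked = gather(first, chosen=[0])       # the user gathers the health cluster
second = scatter(picked, threshold=0.5)  # re-scatter the gathered subset more finely
print(first)
print(second)
```

Raising the threshold on the second pass splits the gathered documents more finely, mirroring how each round narrows the patch.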
I found the last three chapters of the book the most interesting. Chapter 8, "Social Information Foraging," shows how a group of people can discover, invent, and innovate more efficiently than a single user. Undiscovered public knowledge can be found when groups forage for information to "connect the dots" and bridge two network clusters of knowledge and information. Pirolli gives the example of fish oil helping Raynaud's syndrome, a blood circulation disorder associated with high blood viscosity and vasoconstriction that gives sufferers cold hands. That fish oil can help this disorder was implicit in the public literature for 5 ± 3 years before anyone put the two worlds together. There were two clusters of research papers and researchers, each citing work within its own world: one group discussed the benefits of fish oil, the other discussed Raynaud's syndrome.
Swanson (1986) performed a co-citation analysis showing that the two groups cited some of the same outside papers. Investigating further, he concluded that fish oil could help Raynaud's syndrome. Most authors stay within their own discipline, but those who span multiple research areas are likely to be the ones who enable knowledge to flow from one area to another and who see the big picture that makes breakthrough discoveries possible.
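The bridging signal described above, outside papers that two otherwise separate literatures both cite, reduces to set operations on citation lists. A minimal sketch with hypothetical paper IDs; it captures only this bridging step, not Swanson's full procedure.

```python
def shared_outside_references(cluster_a, cluster_b):
    """References cited from both clusters that belong to neither cluster.

    Each cluster maps a paper ID to the set of paper IDs that paper cites.
    """
    members = set(cluster_a) | set(cluster_b)
    cited_by_a = set().union(*cluster_a.values())
    cited_by_b = set().union(*cluster_b.values())
    return (cited_by_a & cited_by_b) - members


# Hypothetical IDs: F* = fish oil papers, R* = Raynaud's papers, X* = outside papers.
fish_oil = {"F1": {"F2", "X1", "X2"}, "F2": {"X1", "X4"}}
raynaud = {"R1": {"R2", "X1"}, "R2": {"X2", "X3"}}
print(sorted(shared_outside_references(fish_oil, raynaud)))
```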
Pirolli also discusses structural holes and clusters in social networks. Organizational studies have shown that effective work groups are those that share information and knowledge with external members, and that their effectiveness improves with the structural diversity of the group. Social networks typically form densely connected clusters of people; the sparse linkages between these clusters are structural holes. People on the edges of clusters who bridge these structural holes are exposed to a greater diversity of information and knowledge. By brokering information across groups, they build social capital that yields greater discovery of useful knowledge. Pirolli cites a study that found that managers who discussed issues with managers in other groups were not only better paid but also more likely to receive positive reviews and be promoted.
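Spotting brokers is straightforward once cluster membership is known: look for people with ties that cross into another cluster. The sketch below assumes a hypothetical network whose clusters are given in advance (in practice they would come from a community-detection step). Burt's network constraint is a more refined measure of how well a person spans structural holes.

```python
def brokers(edges, cluster_of):
    """Map each person with ties outside their own cluster to the other clusters they reach.

    edges: undirected ties as (person, person) pairs.
    cluster_of: each person's cluster label (assumed known in advance here).
    """
    reach = {}
    for a, b in edges:
        for person, other in ((a, b), (b, a)):
            if cluster_of[other] != cluster_of[person]:
                reach.setdefault(person, set()).add(cluster_of[other])
    return {person: sorted(clusters) for person, clusters in reach.items()}


# Hypothetical two-group network with one bridging tie (cat <-> dan).
cluster_of = {"ann": "A", "bob": "A", "cat": "A", "dan": "B", "eve": "B", "fay": "B"}
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"),
         ("dan", "eve"), ("eve", "fay"), ("dan", "fay"),
         ("cat", "dan")]
print(brokers(edges, cluster_of))
```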
Brokerage across groups has been shown to be important to success in many domains, including jazz, photography, engineering, and software development. Even serendipity in scientific discovery can be influenced by the few scientists who bridge disparate groups of people. It seems that your mother's advice to get out and network has some theoretical foundation.
Pirolli gives a hypothetical example of solo versus group foraging. Like birds and other animals foraging for food en masse, humans can discover knowledge more quickly and thoroughly by foraging in groups. Figure 9 shows the benefits of joining a cooperative foraging group: in this hypothetical example, the gain nearly triples at the optimal group size. Individuals tend to keep joining the group even after the optimal size is reached, until an equilibrium state is reached at which communication costs within the group slow it down and make it less efficient than the individual forager (see Figure 8). The benefits of cooperative information foraging help explain the success of Web 2.0 sites that let groups of people discuss problems en masse and discover knowledge faster than the lone information forager.
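The gap between the optimal and the equilibrium group size follows from a simple argument from behavioral ecology: a newcomer joins whenever membership beats foraging alone, so groups keep growing past the size that is best for their members. The minimal sketch below uses a made-up per-member gain function. Its shape and the parameter `k` are assumptions, not Pirolli's model.

```python
import math


def per_capita_gain(n, solo_rate=1.0, k=0.25):
    """Hypothetical gain per member in a group of n foragers.

    The factor n models the benefit of pooled discoveries; exp(-k * (n - 1)) models
    communication costs that grow with group size. Both forms are assumptions.
    """
    return solo_rate * n * math.exp(-k * (n - 1))


def optimal_size(max_n=100, **params):
    """Group size that maximizes each member's gain."""
    return max(range(1, max_n + 1), key=lambda n: per_capita_gain(n, **params))


def equilibrium_size(max_n=100, **params):
    """Largest group that a solo forager would still join without doing worse than alone."""
    solo = per_capita_gain(1, **params)
    n = optimal_size(max_n, **params)
    while n < max_n and per_capita_gain(n + 1, **params) >= solo:
        n += 1
    return n


print(optimal_size(), equilibrium_size())
```

With these made-up numbers the best group has 4 members, but newcomers keep joining until the group reaches 10, because an 11th member would leave everyone worse off than a solo forager.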