dc.description.abstract | Weblogs have been widely used to represent the behavior of online
users. However, we found that weblogs record only part of users’ behaviors.
For example, traditional weblogs do not record tab switching and
browser window switching. Moreover, weblogs may record some visits that
do not come from a user’s conscious actions. For instance, web pages resulting
from page redirects and pop-ups are recorded in the browsing
history, but users may not have intended to visit these pages. We found
that, on average, weblogs record only about half of a user’s
page visits, and 5.6% of the visits recorded in the weblog stem from users’
unconscious actions. To collect and analyze the conscious visits, unconscious
visits, and “missing” visits (i.e., the visits that are unrecorded in
the traditional weblog), we created a Google Chrome plugin and recruited
users to install it. We reported the statistics of visits and showed
that the ranking of popular website categories based on the traditional weblog
differs from the rankings obtained by including the missing visits or
excluding the unintentional visits. Therefore, the traditional weblog may be a
biased representation of a user’s online behaviors, and the observations or
conclusions derived from weblog analysis are questionable. Additionally,
we predicted users’ future behaviors based on three types of training data: all the
visits in traditional weblogs, intentional visits in weblogs, and intentional
visits in weblogs plus missing visits. We applied supervised learning
algorithms to make predictions. The experimental results show that using
intentional visits in weblogs, or intentional visits in weblogs plus missing visits,
usually performs better than using all the visits in traditional
weblogs. This result indicates that visits missing from weblogs may contain
additional information, and unintentional visits in weblogs may carry more
noise than information. | en_US |