Abstract

By analyzing the release, comment, and forwarding data of 120,000 microblog messages over one year, this paper finds that the intervals between information releases and between comments follow a power law. An analysis of activity across the 24 hours of the day reveals clear differences between microblogging and website visits, email, instant messaging, and mobile phone use, reflecting how people use fragments of time via mobile Internet technology. The paper shows that user activity has a significant influence on the intervals between information releases and demonstrates a positive correlation between activity and the power exponent. It also shows that user activity is positively influenced by social identity. Simulation results based on the social identity mechanism fit the empirical data well, which indicates that this mechanism is a reasonable way to explain people's behavior in the mobile Internet.

1. Introduction

Traditional studies usually assume that people’s behaviors are random in time and can therefore be described as Poisson processes. However, as data collection methods and data processing capabilities have improved significantly, more and more empirical studies show that many human behaviors deviate from the Poisson distribution. Studies in this field fall mainly into two categories: empirical studies of the statistical features of people’s behaviors in different contexts, especially the Internet, and studies of theoretical models that interpret those behaviors. This paper focuses on the features of people’s behaviors in the mobile Internet and on the corresponding theoretical model.

As for the empirical study of the statistical features of people’s behaviors, Barabási was the first to study the interval distributions of behaviors such as sending emails, waiting for replies, and mail communication [1]. That research reveals that the distribution of intervals between sending an email and receiving a reply is consistent with a power law with exponent 1 and does not agree with the Poisson distribution, owing to the coexistence of long periods of inactivity and short bursts of activity. In the same year, Oliveira and Barabási studied the postal correspondence of Einstein and Darwin and found that it also follows a power law, despite its differences from email [2]. Analyzing time records from an MMORPG server, Grabowski and Kruszewska pointed out that the probability of an individual spending a given time on an activity follows a power law [3]. Gonçalves and Ramasco found that the distribution of intervals between visits of a specific user to the same website also follows a power law, with exponent 1.25 [4]. Vojnović conducted a statistical analysis of one week of mobile search logs supplied by a US mobile service provider and revealed that the distribution of intervals between searches via mobile devices within a day follows a power law [5]. Chmiel et al. pointed out that the distribution of time spent browsing subpages of portal websites follows a power law [6]. Based on an analysis of people’s behaviors, Yu et al. pointed out that the numbers of views and replies in online forums follow a power law, with exponents between 1.6 and 3.5, and that view counts and reply counts follow a nonlinear power-law distribution [7]. Hu and Han studied log data from an online music system and found that the distribution of intervals between consecutively ordered songs has a heavy tail, which becomes more and more irregular as user activity decreases [8].
Jiang et al. analyzed the number of roles in an online role-playing game and found that fluctuations in the number of users follow a non-Gaussian distribution with a peak and a power-law tail [9]. Candia et al. provided cases of heavy-tailed distributions of intervals between consecutive mobile phone calls [10]. Wu et al. analyzed messaging between pairs of users and found that the distribution of intervals between messages is neither purely Poisson nor purely power-law, but a bimodal distribution combining the two [11]. According to research on movie ordering, the interval distribution follows a power law with exponent 2.08, and the exponent is monotonically correlated with the activity of the corresponding filmgoers [12]. Strong positive correlations between individual activity and the power-law exponent of the interval distribution have also been shown in [13–15].

As for models explaining people’s behaviors, Barabási, Vázquez, and others pointed out that the bursty nature of human behavior is a consequence of a decision-based queuing process: when individuals execute tasks based on some perceived priority, the timing of the tasks is heavy tailed, with most tasks executed rapidly and a few experiencing very long waiting times [1]. According to Vázquez et al., the response time distribution of the tasks can in fact drive the interval distribution, and the two distributions should decay with the same exponent [16]. This has also been confirmed by other studies [1, 17]. Blanchard and Hongler proposed an “aging mechanism” that connects the priority of each task with time and assumed the “Earliest Deadline First” principle [18]. Building on Barabási’s research, Deng and other researchers added the task deadline as a constraint and analyzed its influence on the waiting time of the task [19]. Dall’Asta et al. interpreted the process of task fulfillment from the perspective of economic optimization [20]. Vázquez pointed out that people perceive their past activity rate and may increase or reduce their activity accordingly [21]. In other words, memory, an important attribute of human beings, significantly influences the features of human dynamics. Through statistical analysis of view counts and reply counts, Yu et al. concluded that people reply in BBS according to their interests and are hardly influenced by their memories [7]. Earlier replies have no impact on later visitors, and people simply reply as they like.
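The queuing mechanism described above can be illustrated with a small simulation. The following is only a minimal sketch of a priority-list model in the spirit of [1], not code from the cited work: the list length `list_size`, the probability `p` of picking the highest-priority task, and the uniformly random priorities are all assumptions chosen for illustration.

```python
import random

def simulate_priority_queue(steps, list_size=2, p=0.999, seed=0):
    """Return the waiting time (in steps) of every executed task."""
    rng = random.Random(seed)
    # Each task is (priority, arrival_step); priorities are uniform random (assumption).
    tasks = [(rng.random(), 0) for _ in range(list_size)]
    waits = []
    for step in range(1, steps + 1):
        if rng.random() < p:
            # Execute the task with the highest perceived priority.
            idx = max(range(list_size), key=lambda i: tasks[i][0])
        else:
            # Occasionally execute a randomly chosen task instead.
            idx = rng.randrange(list_size)
        waits.append(step - tasks[idx][1])
        # The executed task is replaced by a new task with a fresh priority.
        tasks[idx] = (rng.random(), step)
    return waits

waits = simulate_priority_queue(steps=100_000)
share_fast = sum(w == 1 for w in waits) / len(waits)
print(f"executed after one step: {share_fast:.2%}, longest wait: {max(waits)}")
```

In such a run, most tasks are typically executed right away while a few low-priority tasks wait for a very long time, which is the qualitative picture behind a heavy-tailed waiting time distribution.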

People’s behaviors arise from complex motivations, and not all of them are tasks (browsing and ordering movies, for example), yet such behaviors also follow certain statistical laws. According to Shang and other scholars, people’s interest in new things wanes or even disappears with frequent involvement but may suddenly revive after a long period of indifference. Such changes in interest may cause the heavy-tailed distribution of their behaviors. Shang and others proposed quantifying people’s interest by the probability of occurrence during intervals [22]. Han and others also noticed that people’s interest in a certain activity may change with their feelings and thus proposed a self-adapting human dynamic mechanism [23]. Guo held that blogging becomes less attractive the more people are involved in it [24]. Wu and others established a human dynamic model in which individuals’ behaviors in an online commenting system are influenced by other individuals [25].

Existing studies have shown that the Internet provides a sound data foundation for human dynamics research, and its wide use in people’s daily lives also makes Internet-based human dynamics research significant. However, the research results mentioned above are challenged by the rapid development of the mobile Internet in recent years.

In the context of the mobile Internet, people’s behaviors in fragments of time are hardly “task oriented,” as people act according to what they see and feel. The priority-based task queue model is therefore limited in explaining such behaviors. Dynamic models with memory and interest-driven dynamic models are also limited, because both time and content are fragmented in the mobile Internet. Moreover, most users of the mobile Internet are young people who, on the one hand, try to express their unique personality and, on the other hand, desire a social identity, which is manifested in behaviors like online commenting, collecting, ordering, and forwarding. Therefore, social identity as a driving force might supplement interest-driven human dynamics research.

This paper analyzes microblog data from http://www.sina.com/, one of the largest Chinese-language microblog websites in the world. Empirical analysis shows that the distribution of intervals between information releases on microblog follows a power law and verifies a positive correlation between user activity and the power exponent. Using a rescaling method, our research reveals a universal behavior for users with different activity levels, which is consistent with the results in [10, 13, 15]. The paper assumes that the activity of an individual user is influenced by social identity and quantifies social identity to improve the memory model [21]. The simulation results are consistent with the empirical data.

The remainder of the paper is organized as follows. Section 2 presents the empirical analysis results. Section 3 analyzes the influence of user’s activity on statistical features of people’s behaviors, proposes a social identity mechanism, and compares the simulation-based results with empirical data. Finally, the conclusions of the whole paper are presented in Section 4.

2. Statistical Features of Empirical Data

This research collected microblog messages from http://www.sina.com/. The website organizes messages by topic. Considering that entertainment topics have attracted many users at the current stage, we collected all messages about one entertainment topic from August 20, 2009 to September 3, 2010. During this period, a total of 125,152 messages were released by 175 users; these messages were forwarded 2,260,826 times and triggered 1,786,000 comments.

To analyze the releasing behavior, the releasing intervals of each individual are calculated separately. The intervals of all individuals are then aggregated to obtain the distribution. For example, suppose Table 1 lists the releasing information of all individuals. Each line of the table gives the releasing time of a message and the corresponding user. For user u1, the releasing intervals are (t3–t1), (t4–t3), and so on. For user u2, the releasing intervals are (t5–t2), (t6–t5), and so on. Comment intervals are calculated similarly.
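The per-user interval calculation described above can be sketched in a few lines. This is an illustrative sketch only: the record layout `(user_id, timestamp)` and the integer timestamps are assumptions, chosen to mirror the u1/u2 example (u1 releases at t1, t3, t4 and u2 at t2, t5, t6).

```python
from collections import defaultdict

def release_intervals(records):
    """records: iterable of (user_id, timestamp) pairs, in any order."""
    times_by_user = defaultdict(list)
    for user, t in records:
        times_by_user[user].append(t)
    intervals = []
    for times in times_by_user.values():
        times.sort()
        # Intervals are taken between consecutive messages of the SAME user only.
        intervals.extend(later - earlier for earlier, later in zip(times, times[1:]))
    return intervals

# Hypothetical records: timestamp k stands for t_k in the example above.
records = [("u1", 1), ("u2", 2), ("u1", 3), ("u1", 4), ("u2", 5), ("u2", 6)]
print(release_intervals(records))  # [2, 1, 3, 1]
```

Computing intervals per user before pooling them avoids mixing gaps between messages of different users into the distribution.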

According to the data collected, the interval distributions of both releasing and commenting follow a power law, appearing as approximately straight lines on log-log axes, as indicated in Figure 1.
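One common way to estimate a power-law exponent from a list of intervals is the continuous maximum-likelihood estimator, sketched below. This is a swapped-in technique for illustration; the text does not state how the exponents in Figure 1 were fitted. The cutoff `x_min`, the helper `sample_powerlaw`, and the synthetic data are assumptions.

```python
import math
import random

def powerlaw_mle(intervals, x_min):
    """Continuous MLE of alpha for P(x) ~ x^(-alpha), using only x >= x_min."""
    tail = [x for x in intervals if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum

def sample_powerlaw(n, alpha, x_min, seed=0):
    """Draw n synthetic intervals from a pure power law by inverse transform."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

# Synthetic check: generate data with a known exponent and recover it.
synthetic = sample_powerlaw(20_000, alpha=1.34, x_min=1.0)
alpha_hat = powerlaw_mle(synthetic, x_min=1.0)
print(f"estimated exponent: {alpha_hat:.2f}")
```

On real interval data the choice of `x_min` matters, since a power law usually holds only above some lower cutoff.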

According to Figure 1, the power-law exponents of the message releasing and commenting intervals are almost the same, which is consistent with the results of [16, 17]. An individual’s message releasing behavior is driven to some extent by the commenting behavior of others, so the two distributions decay with the same exponent. However, the exponent of 1.34 found in this paper is higher than that reported for email communication [16] and lower than the exponent reported in [17]. In email communication, spam and irrelevant emails are often mixed with normal emails, which may cause the long tail of the interval distribution. In contrast, on the one hand, microblog messages on http://www.sina.com/ are organized by topic, and communication in microblog communities is multipoint to multipoint, which increases the frequency of communication. On the other hand, microblog is based on mobile Internet technologies that enable individuals to release messages anytime and anywhere. These differences in the mode of communication and in technology may account for the different power-law exponents in different scenarios.

Figure 2 shows the relationship between the number of comments a message receives and the number of times it is forwarded. There are almost no messages in the region where both counts are above 2,500. In over 95% of all cases, both counts are lower than 500, and the two are not obviously correlated. On this basis, messages on microblogs can be divided into two groups: those forwarded frequently but commented on less, which are intended to inform more individuals rather than attract comments, and those commented on frequently but forwarded less, which may arouse extensive discussion rather than spread widely.

As indicated in Figure 3, which analyzes microblog data over the 24 hours of the day, information releases peak between 11 am and 12 pm on both weekends and weekdays. This is quite different from the peak for email, instant messaging, and mobile phone use, which is usually at 10 am. The later peak of microblog activity suggests that most messages on microblogs have nothing to do with work and are just small talk during breaks.

Figure 4 illustrates the releasing time of all messages from the 175 users in one day. Two peak time periods, namely, 11:00–13:00 and 23:00–01:00, again reflect the features of using microblog in fragments of time.
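A 24-hour profile like the ones discussed for Figures 3 and 4 can be obtained by counting messages per hour of day. The sketch below is illustrative only; the timestamps are hypothetical and the function name `hourly_counts` is an assumption.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(timestamps):
    """Return a list of 24 message counts, indexed by hour of day (0-23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]

# Hypothetical release times, not taken from the collected data.
stamps = [
    datetime(2010, 3, 1, 11, 5),
    datetime(2010, 3, 1, 11, 40),
    datetime(2010, 3, 1, 23, 15),
    datetime(2010, 3, 2, 0, 30),
    datetime(2010, 3, 2, 12, 10),
]
counts = hourly_counts(stamps)
peak_hour = max(range(24), key=lambda hour: counts[hour])
print(peak_hour)  # 11
```

Splitting the timestamps by `ts.weekday()` before counting would give separate weekday and weekend profiles.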

3. Social Identity-Based Human Dynamic Model

3.1. Influence of User’s Activity on Statistical Features

The intensity of messaging (activity) has a significant influence on the distribution of intervals between messages: the more active the user is, the larger the power exponent of the interval distribution will be. Suppose the activity of user $i$ is $A_i = n_i / T_i$, where $n_i$ is the total number of messages sent by the user and $T_i$ is the total time spanned by all of the user’s behaviors. In this paper, $A_i$ is the number of messages sent by user $i$ per day. To explore the role of user activity in the online social commenting system, users who sent over 1,000 messages are picked out. The average activity of these users is shown in Figure 5.

These users are arranged in descending order of activity and then divided into 5 groups of equal size. The average activity is then computed for each group.

Figure 6 illustrates the interval distributions of the five groups. According to the figure, the decay exponents depend on the activity of the users, and therefore the interval distributions corresponding to different levels of activity are more representative than a global one. Lower average activity corresponds to a lower power exponent and thus allows longer intervals, which is consistent with the conclusion in [12]. The correlation between the exponents and the corresponding average activities is presented in Figure 7.
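The grouping procedure used in this section (rank users by activity, split them into five equal groups, and average within each group) might be sketched as follows. The user names, message counts, and observation length `days` are hypothetical, and dropping leftover users when the count is not divisible by the number of groups is an assumption of this sketch.

```python
def activity_groups(message_counts, days, n_groups=5):
    """message_counts: dict user -> number of messages; days: observation length in days."""
    # Activity is measured as messages per day.
    activity = {user: n / days for user, n in message_counts.items()}
    ranked = sorted(activity, key=activity.get, reverse=True)
    size = len(ranked) // n_groups  # leftover users are dropped (assumption)
    groups = [ranked[i * size:(i + 1) * size] for i in range(n_groups)]
    means = [sum(activity[u] for u in group) / len(group) for group in groups]
    return groups, means

# Ten hypothetical users observed over 100 days.
counts = {f"u{k}": 1000 + 100 * k for k in range(10)}
groups, means = activity_groups(counts, days=100)
print(groups[0])  # ['u9', 'u8']
print(means)      # [18.5, 16.5, 14.5, 12.5, 10.5]
```

The interval distribution and its exponent can then be computed separately for the users in each group.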

The variation in activity reflects differences in individuals’ behavior patterns. Here we use a rescaling method to describe the universal behavior of the different groups. Instead of the interval $\tau$ itself, we consider the rescaled variable $\tau / \langle \tau \rangle$, where $\langle \tau \rangle$ is the average interval of the respective group of users. As illustrated in Figure 8, the rescaling produces a data collapse of the curves of the five groups. This phenomenon has also been observed in other systems [10, 13, 15].
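The rescaling step can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the two groups below are hypothetical and differ only by a constant factor in their intervals, which is the idealized case in which the rescaled data coincide exactly.

```python
import math

def rescale(intervals):
    """Divide each interval by the group's average interval."""
    mean = sum(intervals) / len(intervals)
    return [t / mean for t in intervals]

# Hypothetical groups: the less active group has intervals ten times longer.
active_group = [1.0, 2.0, 4.0, 9.0]
inactive_group = [10.0 * t for t in active_group]

rescaled_active = rescale(active_group)
rescaled_inactive = rescale(inactive_group)
collapsed = all(math.isclose(a, b) for a, b in zip(rescaled_active, rescaled_inactive))
print(rescaled_active)  # [0.25, 0.5, 1.0, 2.25]
print(collapsed)        # True
```

With real data the collapse is only approximate and is judged by overlaying the rescaled distributions of the groups.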

3.2. Influencing Mechanism of Social Identity on Information Releases Behaviors

To analyze what influences user activity in mobile communities, this paper introduces social identity, which is quantified and used to improve the memory model proposed by Vázquez [21]. Based on intuitive experience of users’ messaging habits, if a user’s messages are recognized by other users, namely, if they are commented on or forwarded, the user tends to send more messages; otherwise, the user may become less interested. In the equation proposed in [21], a parameter controls the degree and type of intuitive perception. In this paper, the quantity used is the probability of sending another message after the user’s previous message is commented on or forwarded; in case of no comment or forwarding, a different probability of sending another message applies. The value of the former approximates the ratio between the number of messages commented on or forwarded and the total number of messages sent by the user. From (3.2), the probability density function of the intervals follows a power law.

In case of comment or forwarding, the social identity mechanism promotes users to send more messages (), and the corresponding power exponent is

If no comment or forwarding occurs, the social identity is discouraging (), and the corresponding power exponent is

3.3. Comparison between Simulation Results and Empirical Data

Two typical users are picked out, respectively, from group 2 and group 4 and are marked as user 2 and user 4. Their information is as shown in Table 2.
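Section 3.2 states that the quantity used to measure social identity approximates the ratio between the number of messages commented on or forwarded and the total number of messages sent by the user. For a single user this ratio could be computed as in the sketch below; the data layout (one `(n_comments, n_forwards)` pair per message), the function name, and the sample values are assumptions for illustration.

```python
def identity_ratio(messages):
    """messages: list of (n_comments, n_forwards) pairs, one per message of a user."""
    if not messages:
        return 0.0
    # A message counts as recognized if it received at least one comment or forward.
    recognized = sum(1 for comments, forwards in messages if comments > 0 or forwards > 0)
    return recognized / len(messages)

# Hypothetical user with four messages, three of which were commented on or forwarded.
sample_messages = [(3, 0), (0, 0), (0, 5), (1, 2)]
print(identity_ratio(sample_messages))  # 0.75
```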

When the social identity mechanism accelerates message sending, formula (3.5) gives the range of the corresponding power exponent. When the social identity mechanism is discouraging, the same formula gives a separate range for the corresponding power exponent.

Figure 9 shows the distribution of intervals between messages sent by user 2 under the social identity mechanism. Figure 9(a) shows the distribution in case of comment or forwarding, and Figure 9(b) shows the distribution in case of no comment or forwarding. In both cases, the power exponents fall within the ranges calculated for the encouraging and discouraging social identity mechanisms, respectively.

Figure 10 shows the distribution of intervals between messages sent by user 4 under the social identity mechanism. Figure 10(a) shows the distribution in case of comment or forwarding, and Figure 10(b) shows the distribution in case of no comment or forwarding. In both cases, the power exponents fall within the ranges calculated for the encouraging and discouraging social identity mechanisms, respectively.

The model-based results fit well with the empirical data, indicating the influence of social identity on the activity of individual users. Namely, the higher the social identity is, the more active the user will be, and vice versa.

4. Conclusions

In recent years, many studies have examined the statistical laws of people’s behavior on the Internet and the theoretical mechanisms behind them, and have found that the timing of many behaviors follows a power-law distribution. This paper analyzes microblog data and reveals the law of information releases in mobile communities.

Analysis of over 120,000 microblog messages from http://www.sina.com/ reveals a power-law interval distribution. In addition, according to the 24-hour behavior analysis, the pattern of microblogging differs from that of email, instant messaging, and mobile phone calls. Microblog based on mobile Internet technology enables users to send messages by mobile phone whenever and wherever they like, rather than only via computer. This paper explores the features of using mobile Internet technology in fragments of time and how this differs from online browsing in terms of time and space. It also points out the significant influence of user activity on the distribution of intervals between information releases and verifies the positive correlation between user activity and the power exponent. More importantly, it finds that social identity has a direct influence on user activity in mobile communities: the higher the social identity is, the more active the user will be, and vice versa. Results from the social identity-based simulation are consistent with the empirical data, indicating that the social identity mechanism is a reasonable way to interpret people’s behaviors in the context of the mobile Internet.

Acknowledgments

This work was supported by the Program for New Century Excellent Talents in University, NSFC (70671010), the Key Project of Chinese Ministry of Education (108011), the Joint Construction Project of Beijing Municipal Commission of Education (XK100130439), and the Fundamental Research Funds for the Central Universities (2009RC1005).