This webpage contains the dataset for our ICWSM'14 paper:
“The tweets they are a-changin': Evolution of Twitter users and behavior”.

Dataset

Below is the data for Figure 1, containing the estimated sampling rate of the Gardenhose dataset over time (the average rate across users with more than 1,000 statuses in a month).
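As a rough illustration of the averaging described above, the Python sketch below computes a monthly sampling-rate estimate. It assumes that a user's rate is the number of their tweets seen in the sample divided by the number they actually posted that month; this definition, the function name, and the sample numbers are assumptions for illustration, not details taken from the paper.

```python
def estimate_sampling_rate(users, min_statuses=1000):
    """Average per-user sampling rate for one month.

    `users` is a list of (observed_in_sample, posted_in_month) pairs.
    Assumption: a user's rate is observed / posted. Only users with more
    than `min_statuses` posted statuses in the month are included.
    """
    rates = [
        observed / posted
        for observed, posted in users
        if posted > min_statuses
    ]
    if not rates:
        return None  # no user is active enough to give an estimate
    return sum(rates) / len(rates)


# Hypothetical month: the third user is below the threshold and is ignored.
example_users = [(150, 1500), (210, 2000), (5, 20)]
example_rate = estimate_sampling_rate(example_users)
```

Restricting the average to very active users keeps per-user ratios from being dominated by noise, since a user with only a handful of tweets may have none or all of them in the sample.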

Below is the data for Figure 2, containing the number of observed users in each month in the Crawl and Gardenhose datasets.

Below is the data for Figure 3, containing the percentage of the entire Twitter user base over time whose accounts are protected, deactivated, suspended, or inactive (for at least a year), based on the UserSample dataset.

Below is the data for Figure 4, containing the median number of tweets per user per month over time, based on the first and last statuses_count observed for each user.
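One way such a per-user rate could be derived from two statuses_count observations is sketched below in Python: divide the change in count by the elapsed time in months, then take the median across users. The function names, the 30-day month, and the sample values are assumptions for illustration, not the procedure used in the paper.

```python
from datetime import datetime
from statistics import median

DAYS_PER_MONTH = 30.0  # assumption: a month is approximated as 30 days


def tweets_per_month(first_seen, first_count, last_seen, last_count):
    """Average tweets per month between two statuses_count observations."""
    elapsed_days = (last_seen - first_seen).total_seconds() / 86400.0
    if elapsed_days <= 0:
        return None  # a single observation gives no rate
    return (last_count - first_count) / (elapsed_days / DAYS_PER_MONTH)


def median_tweets_per_month(observations):
    """Median of the per-user rates, skipping users without a usable rate."""
    rates = [tweets_per_month(*obs) for obs in observations]
    rates = [r for r in rates if r is not None]
    return median(rates) if rates else None


# Hypothetical users: (first_seen, first_count, last_seen, last_count)
example_observations = [
    (datetime(2012, 1, 1), 100, datetime(2012, 3, 1), 160),
    (datetime(2012, 1, 1), 0, datetime(2012, 1, 31), 10),
    (datetime(2012, 1, 1), 50, datetime(2012, 1, 16), 100),
]
example_median = median_tweets_per_month(example_observations)
```

Note that statuses_count can decrease when a user deletes tweets, so a sketch like this may yield negative rates for some users; how to treat those cases is left open here.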

Below is the data for Figure 5, containing the percentage of tweets created in different geographical regions over time. Shown are locations inferred from self-reported user locations (UserSample dataset) and geo-tags (Gardenhose dataset).

Below is the data for Figure 6, containing the percentage of users self-reporting the six most popular languages over time.

Below is the data for Figure 7, containing the percentage of users who have used more than one screen name in each month.

Below is the data for Figure 8, containing the percentage of tweets of different types over time.

Below is the data for Figure 9, containing the median number of friends and followers across all users, and the median ratio of friends to followers as derived from the Gardenhose dataset.

Below is the data for Figure 10, containing the percentage of tweets created at different times that were unavailable when we collected our UserSample dataset.

Below is the data for Figure 11, containing the percentage of tweets with different types of entities, and the average number of entities in such tweets over time.

Below is the data for Figure 12, containing the percentage of tweets created with different sources (i.e., different clients) over time.