Twitter Data

Until I enrolled in this program, Twitter was just like any other social media platform for me. However, that opinion has definitely changed with the digital data class. As a microblogging website, Twitter is one of the most popular communication tools among internet users. In 2014, the estimated number of tweets per day exceeded the 500 million mark, which works out to around 200 billion tweets every year (Twitter, 2014). Even though each tweet is limited to 140 characters, 200 billion tweets per year clearly generate massive amounts of data. This data is filled with information on public behavior, current trends, political discourse, and mundane details of people’s lives. Twitter’s audience varies from regular users to celebrities, company representatives, politicians, and academics (Pak & Paroubek, 2010). Moreover, its audience is made up of users from many countries, and the data is available in many languages (Pak & Paroubek, 2010).


All this data can potentially be used for social science research; Ahmed (2015) states that “Twitter can be used as a source of data for social science research both current and historical in-of-itself, but it can also be used to compliment more traditional data sources such as surveys and interviews”. In traditional social science research, a researcher has to collect, arrange, and analyze data over a period of time, whereas data on Twitter is easily accessible. Traditional social science research methodologies have been tried and tested for decades, which makes them more reliable, and they cannot be completely replaced by social media data. However, the analysis of data from social media has added a new dimension to social science research. To analyze Twitter data, various methods such as sentiment analysis, opinion mining, content analysis, time series analysis, and network analysis can be applied.

Sloan et al. (2013) explain that “sentiment analysis can be used to understand large amounts of text-based data and is a useful means of reducing the complexity of big text data such as archived tweets”. Sentiment analysis and opinion mining are often employed to understand people’s opinions and views (Pak & Paroubek, 2010). People often post tweets to review products or to express their opinions on political events, popular culture, and religion. This data can be analyzed to understand people’s behavior. Pak & Paroubek (2010) also state that this data is efficiently used for marketing, advertising, and social studies. Moreover, Sloan et al. (2013) explain that “sentiment analysis renders the emotive characteristics of large text available to further qualitative inspection and interpretation and enables cross-referencing and correlation with other variables of interest, e.g. geo-location, types of event and gender”. Ahmed (2015) suggests that time series analysis can be used to analyze tweets over a period of time to understand when a peak of tweets may occur. He also explains that network analysis is used to “visualize” the connections between people and to comprehend the structure of the conversation.
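As a rough illustration of two of these methods, here is a minimal Python sketch of word-list sentiment labeling and of finding the day with the most tweets. It is not the approach of any study cited here: the tiny word lists, the function names, and the sample tweets are all made up for illustration, and real research would use far larger, validated lexicons or trained classifiers.

```python
from collections import Counter
from datetime import date
import re

# Hypothetical toy lexicon; real studies use much larger, validated word lists.
POSITIVE = {"love", "great", "happy", "good"}
NEGATIVE = {"hate", "awful", "sad", "bad"}


def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def label(text):
    """Map a tweet to 'positive', 'negative', or 'neutral' by its score."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def peak_day(tweets):
    """Return the (day, count) pair for the day with the most tweets."""
    counts = Counter(day for day, _ in tweets)
    return counts.most_common(1)[0]


# Made-up sample data: (date posted, tweet text).
SAMPLE_TWEETS = [
    (date(2015, 7, 1), "I love this new phone, great camera"),
    (date(2015, 7, 2), "Awful service today, so sad"),
    (date(2015, 7, 2), "The debate starts at 9pm"),
    (date(2015, 7, 2), "Good debate, happy with the result"),
]

if __name__ == "__main__":
    for day, text in SAMPLE_TWEETS:
        print(day, label(text), "|", text)
    print("Peak day:", peak_day(SAMPLE_TWEETS))
```

Counting words from a fixed list is the simplest possible form of sentiment analysis. It misses sarcasm, negation ("not good"), and slang, which is one reason the labels it produces still need the kind of qualitative inspection Sloan et al. (2013) describe.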

There are many instances of sentiment analysis being used in social science research. In 2011, Tumasjan et al. conducted a sentiment analysis of Twitter data to provide a forecast of the 2009 German federal election. They analyzed over 100,000 messages containing a reference to either a political party or a politician. Not surprisingly, they found that Twitter is used extensively for political deliberation and that the mere number of party mentions accurately reflected the election result. Many other studies have also employed sentiment analysis. In 2015, Durahim and Coşkun conducted a study to calculate Gross National Happiness in Turkey by collecting and analyzing twenty thousand tweets posted in 2013 and 2014. Additionally, Choudhury et al. (2012) conducted a study in which they examined users’ emotional states by collecting and analyzing their tweets via emotion hashtags.
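The idea that the sheer number of party mentions can track an election result is easy to sketch. The following Python example is only an illustration of counting mentions and turning them into shares. It does not reproduce the method of Tumasjan et al., and the party names "Alpha" and "Beta", the function name, and the sample tweets are hypothetical.

```python
from collections import Counter
import re


def mention_shares(tweets, parties):
    """Return each party's share of all party mentions.

    A tweet counts at most once per party, even if it names the party
    several times. Matching is case-insensitive and on whole words only.
    """
    patterns = {
        party: re.compile(r"\b" + re.escape(party) + r"\b", re.IGNORECASE)
        for party in parties
    }
    counts = Counter()
    for text in tweets:
        for party, pattern in patterns.items():
            if pattern.search(text):
                counts[party] += 1
    total = sum(counts.values())
    return {party: (counts[party] / total if total else 0.0) for party in parties}


# Made-up sample tweets about two hypothetical parties.
SAMPLE_TWEETS = [
    "Voting Alpha tomorrow",
    "Alpha and Beta both dodged the question",
    "Beta rally was packed",
    "Alpha again this year",
    "Learning the alphabet with my niece",  # must not count as a mention
]

if __name__ == "__main__":
    for party, share in mention_shares(SAMPLE_TWEETS, ["Alpha", "Beta"]).items():
        print(f"{party}: {share:.0%} of mentions")
```

In a real project, these shares would then be compared with the actual vote shares. A mention count alone says nothing about whether a tweet is supportive or critical, which is where sentiment analysis comes in.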

There are several benefits of using Twitter data for research. First of all, according to Ahmed (2015), Twitter attracts more research because of its cultural status. Additionally, because hashtags are often attached to trending topics, relevant tweets are easier to find, making it convenient to gather and sort the data. Ahmed (2015) also suggests that Twitter’s API is more accessible than those of other social media platforms, which makes it attractive to developers creating tools to access the data. I do not find this surprising at all; this week’s lessons on Twitter data mining completely support this statement. However, we cannot ignore the challenges in using Twitter data.
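Before turning to those challenges, the point about hashtags making data easy to sort can be made concrete. This is a minimal Python sketch that groups tweets by the hashtags they contain. It assumes the tweet texts have already been collected and does not call Twitter's API; the function name and the sample tweets are made up for illustration.

```python
import re
from collections import defaultdict

# A hashtag here is "#" followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def group_by_hashtag(tweets):
    """Return a dict mapping each lowercased hashtag to the tweets using it."""
    groups = defaultdict(list)
    for text in tweets:
        # A set avoids listing a tweet twice if it repeats the same hashtag.
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        for tag in tags:
            groups[tag].append(text)
    return dict(groups)


# Made-up sample tweets.
SAMPLE_TWEETS = [
    "#Election2016 debate tonight",
    "Watching the #debate with friends #election2016",
    "Coffee first #mondaymotivation",
    "No hashtags in this one",
]

if __name__ == "__main__":
    for tag, texts in sorted(group_by_hashtag(SAMPLE_TWEETS).items()):
        print(f"#{tag}: {len(texts)} tweet(s)")
```

Lowercasing the tags treats "#Election2016" and "#election2016" as the same topic. Tweets without any hashtag are simply left out, which is itself a reminder that hashtag-based samples only capture part of a conversation.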

Murthy (2012) states that Twitter, and social media generally, have enabled users to become more acquainted with certain everyday aspects of fellow users’ lives. We tweet about where we are, what we do, what we eat, what we like, and more, making some of the personal details of our lives public and accessible to data mining. When collecting and using Twitter data for research, the volume of tweets may make it impossible to ask each Twitter user for consent to collect and analyze their data, which brings the ethics of using Twitter data into question. Ahmed (2015) also notes that Twitter is not representative of the population who are offline and do not have access to social media. In fact, he argues that Twitter is not even representative of all of its users, since not every Twitter user tweets on every topic.

We live in a Twitter generation. Twitter is used by regular college students as well as by presidents and political leaders of many countries. The amount of data available on Twitter for us to analyze is enormous, and its potential is too great to dismiss it as an unviable source. However, it is important to consider whether these methodologies suit the type of question we are hoping to answer and whether our research subjects are protected. Sloan et al. (2013) make a similar suggestion that “the use of proxies, data augmentation, archiving and harvesting need to be informed and develop within an emerging ethical context that is able to balance the digital public sphere with commercial interests, the privacy and protection of individual citizens and the requirements of critical and public social science”.



Ahmed, W. (2015, July 10). Using Twitter as a data source: An overview of current social media research tools. Retrieved from


Ahmed, W. (2015, September 28). Challenges of using Twitter as a data source: An overview of current resources. Retrieved from

Choudhury, M., Counts, S., & Gamon, M. (2012). Not all moods are created equal! Exploring human emotional states in social media. In Proceedings of the 6th ICWSM (pp. 66–73).

Durahim, A. O., & Coşkun, M. (2015). #iamhappybecause: Gross National Happiness through Twitter analysis and big data. Technological Forecasting and Social Change, 99, 92-105. doi:10.1016/j.techfore.2015.06.035


Murthy, D. (2012). Towards a sociological understanding of social media: Theorizing Twitter. Sociology, 46(6), 1059–1073. doi:10.1177/0038038511422553


Pak, A. & Paroubek, P. (2010, January). Twitter as a Corpus for Sentiment Analysis and Opinion Mining. Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 17-23 May 2010. Valletta, Malta. Retrieved from

Park, C. S. (2013). Does Twitter motivate involvement in politics? Tweeting, opinion leadership, and political engagement. Computers in Human Behavior, 29(4), 1641-1648. doi:10.1016/j.chb.2013.01.044

Sloan, L., Morgan, J., Housley, W., Williams, M., Edwards, A., Burnap, P., & Rana, O. (2013). Knowing the Tweeters: Deriving sociologically relevant demographics from Twitter. Sociological Research Online, 18(3), 7. doi:10.5153/sro.3001



One thought on “Twitter Data”

  1. You don’t often think of college students when you think of Twitter. What do college students tweet about? Do different colleges tweet different things? What hashtags are generated by college students? High school students? Age would be an interesting comparison point between hashtags. I also wonder if reciprocal communities develop in hashtags? Can we see if some types of enclosure or gate keeping happens where outsiders are pushed out? Remember, the goal of these blogs is to review the methodologies and to brainstorm possible questions for your project. These are just some of my thinking out loud questions.
