
Scale-free networks

A scale-free network shows a power law distribution where there is a predictable imbalance. The 80/20 Rule is an example of a power law distribution: 20% of the population holds 80% of the wealth (except it doesn’t seem to be these numbers anymore).

In a scale-free social network, a few nodes have a high number of connections and most have a low number of connections. The few nodes become “hubs” in the network (as seen by the degree distribution in the graph on the right). In a random social network, most nodes have close to the average number of connections, so the degree distribution is bell-shaped (as seen in the graph on the left).
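
To make the contrast concrete, here is a minimal sketch that grows one network where newcomers prefer already popular nodes and one network with the same number of ties placed at random, then compares the best-connected node in each. The growth rule (often called preferential attachment), the function names, the network size, and the seed are my own assumptions for illustration; they are not taken from the graphs referenced above.

```python
import random
from collections import Counter


def preferential_attachment_graph(n, m, rng):
    """Grow a graph where each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    degree = Counter()
    targets = []  # every node appears here once per tie it holds
    # Start from a small fully connected core of m + 1 nodes.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            degree[u] += 1
            degree[v] += 1
            targets.extend([u, v])
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))  # degree-biased pick
        for old in chosen:
            degree[new] += 1
            degree[old] += 1
            targets.extend([new, old])
    return degree


def random_graph(n, edge_count, rng):
    """Place edge_count ties between uniformly random distinct pairs."""
    degree = Counter({node: 0 for node in range(n)})
    edges = set()
    while len(edges) < edge_count:
        u, v = rng.sample(range(n), 2)
        edge = (min(u, v), max(u, v))
        if edge not in edges:
            edges.add(edge)
            degree[u] += 1
            degree[v] += 1
    return degree


rng = random.Random(42)  # fixed seed so the run is repeatable
n, m = 2000, 2
hub_degrees = preferential_attachment_graph(n, m, rng)
edge_total = sum(hub_degrees.values()) // 2
flat_degrees = random_graph(n, edge_total, rng)

print("max degree, preferential attachment:", max(hub_degrees.values()))
print("max degree, random graph:", max(flat_degrees.values()))
```

Both networks have the same number of nodes and ties, so any gap between the two maximum degrees comes from how the ties are distributed rather than how many there are.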

Internet evangelists, such as Clay Shirky, have heralded the potential for every consumer to also act as a producer of online content, resulting in a newfound democratic medium. However, power law distributions in online networks present a significant obstacle to this democratic ideal. Shirky would argue that power laws affect large networks, such as the most popular news sites in the public sphere, but may not affect small networks, such as family and friends reading your blog. If you’re hoping to follow Arianna Huffington’s example with your blog, the likelihood of becoming the next Huffington Post is slim because the existing top news blogs have a significant advantage of widespread recognition and reputation. (For example, when you read the previous sentence, you had probably already heard of the Huffington Post.)

Next, think about how many blogs we might read on a regular basis. This number is finite, although the exact number might be higher or lower depending on our responsibilities.  Meanwhile, the number of blogs has grown dramatically. For example, Tumblr reported 357.7 million blogs in July 2017.

Prior posts have discussed the conscious and unconscious influence of our networks on our choices. Therefore, we’re more likely to read blogs that our friends read. Multiply this line of reasoning by millions of individuals and millions of existing blogs. The result is the power law distribution in blogs. A few blogs, such as the Huffington Post and others in the chart below, end up with more power than others because of their high readership.

So, social media networks on the Internet reflect inequalities similar to those in offline networks. Melvin Kranzberg’s first law of technology applies here as a good reminder: “Technology is neither good nor bad; nor is it neutral.” The technology (power of networks in this case) does not have an inherent ethical value because it’s the users of the technology that determine the ethical tone of the application, which can certainly be labeled as good or bad.

Social network analysis for family

While social network analysis has been useful in examining complex relationships between large institutions (such as universities) or ideas (such as predicting disruptive technologies), this week’s discussion focuses on more personal matters. In Connected, Christakis argued our social networks influenced happiness or weight. Lois (2016) identified four types of egocentric networks through cluster analysis to predict when couples became parents. From prior research, the four factors supporting the influence of social networks on fertility include: 1) social pressure, 2) social support, 3) emotional contagion, and 4) social learning. His research question is whether the four types of networks identified through cluster analysis have predictive validity.

Lois examined six years of data (in waves) from a sample spanning three birth cohorts, narrowed down to 3,104 respondents between ages 20-42 that initially had no children from the German Family Panel Study dataset. Out of this sample, 332 respondents (the egocentric nodes in the networks) became pregnant or had children. Each of these individuals responded to questions to generate names about their social network (other nodes) and then interpret their relationships (forming edges in the network).

He found four types of clusters within the network: family-remote (small, homogenous friend network), polarized (large, heterogeneous friends and family network), disintegrated (small network without many connections), and family-centered (strong connections with family). The results show the most significance in social mechanisms (the four factors supporting the influence of social networks mentioned earlier) between the family-remote and family-centered types. However, the author notes a significant limitation as the effect (having children) may be caused by other factors (such as age) than networks. Social network analysis helped reinforce the idea that our network can influence life choices, particularly in the area of starting a family.

In addition, I also would argue that this study applies the most to Germans. For example, the following chart shows the percentages of household types:

In comparison:

The differences may be statistically significant and cause different kinds of clusters to be seen in networks that likely would affect the analysis using information from another country.

In contrast, I chose to look at a second article on the other end of the spectrum. Burholt and Dobbs (2014) studied the support networks for elderly individuals in multigenerational/extended households to identify network typologies. The authors believed that the presence of other family might skew researchers’ ability to use network typologies to estimate levels of wellbeing in elderly individuals, so their primary research question asked whether these multigenerational/extended households resulted in different types of networks. Their secondary question asked whether these typologies had the ability to predict outcomes (wellbeing or loneliness/isolation).

The authors collected a total sample of 590 older individuals (half male/half female) from the Families and Migration: Older People from South Asia project. This project had collected data using eight questions to support classification by the Wenger Support Network Typology: local family-dependent network, locally integrated network (local family and community involvement), local self-contained network (household centered), wider community-focused network (no local family and more community involvement), and private restricted network (no local family and few other connections). The elderly individual’s family and friends became the nodes with the various supportive relationships (from selected questions about chatting or helping with laundry) as the links.

Using cluster analysis, the authors selected a four cluster result as the most clear and interpretable: 28% multigenerational households: older integrated networks, 27% multigenerational households: younger family networks (largest type of household and more family-focused), ~27% family and friends integrated networks, and 18% restricted non-kin networks. When the authors compared the Wenger types with their new types, they found that significantly more individuals fell within their “restricted non-kin networks” (18%) than the private restricted network (4%). As a result, they concluded that their new typologies better identified individuals who might receive formal services because they lack other forms of support. Social network analysis helped identify a larger vulnerable population than expected because the restricted non-kin networks might decline family assistance and be more willing to pay for outside services.



Technologists look for “disruptive technologies” or something that dramatically causes a paradigm shift. For example, smartphones have largely replaced “traditional” cell phones (chart below is older and stops in 2011). Smartphones can be considered “disruptive” because they enabled widespread mobile internet use, social media, and video calls (see second chart below), which were difficult to use or nonexistent on traditional cell phones.

A. Momeni and K. Rost (2015) examined trends to predict potential disruptive technologies in the photovoltaic industry. The research question: can patent-development paths, k-core analysis and topic modeling be used to better predict which technologies might become the next disruptive technologies? Previous methodologies had significant limitations for forecasting technological change.

The authors collected patent data from the European Patent Office (EPO) Worldwide Patent Statistical Database between 1978-2012. They used a keyword search for “photovoltaic” and “solar cell,” then cleaned up the data by selecting all hits from a certain patent classification. They also collected the extended patent family to consider all possible innovations and their citations. The final dataset had 9,328 patents.

From this data, the authors constructed a network of patents (nodes) and citations (edges). They selected the largest connected subnetwork (5,029 nodes) and then traced a patent development path based on the citation directionality, which resulted in 735 highly cited patents. Next, the authors performed k-core analysis to identify three subnetworks in the remaining nodes that corresponded with three different technological developments: thin-film, organic, and crystalline silicon (see below). Also, they analyzed the networks based on subsets of years and found trends in the convergence of technologies.
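
For readers unfamiliar with k-core analysis, the sketch below shows the general idea on a toy network: repeatedly peel away the least connected node, and record how densely connected the surviving group was when each node was removed. This is not the authors' code or data. The six patent labels are hypothetical, and I am assuming citations are treated as undirected ties for simplicity.

```python
def core_numbers(adjacency):
    """Return the core number of every node in an undirected graph.

    A node has core number k if it belongs to the k-core (the largest
    subgraph in which every node has at least k neighbours) but not
    to the (k + 1)-core.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the fewest remaining neighbours.
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# Hypothetical citation ties between six made-up patents.
citations = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D", "E"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A", "B", "F"},
    "F": {"E"},
}

core = core_numbers(citations)
print(core)
```

Here patents A through D all cite one another and end up in the innermost core, while E and F sit in progressively looser shells around them. Grouping nodes by core number is one way tightly knit subnetworks can be separated from the periphery.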

The authors used the results to predict change in the industry, focusing on the most rapidly growing technology based on the most highly cited patents. They also identified “hidden” technologies within each subnetwork that might become the disruptive technology.

While the paper presented a positive outcome, the authors did warn of several limitations. For example, their analysis depended on inventors seeking patents to be included in the sample. They also suggested that their method needs to be applied to other industries for testing.



Habermas and Castells

Jurgen Habermas defined the public sphere as a separate space where informed people debate social and political issues, form public opinion, and influence the state and society. In a democratic society, the public sphere ideally allows for everyone to have access to information and be able to participate equally in discussions. His vision allows for an open public sphere, although the reality might constrain participation for certain segments of society who may not have enough ability or resources. In the past, this became evident in the dominance of the bourgeoisie who came to salons and coffeehouses to discuss societal issues, which largely excluded the working class and sometimes women.

Manuel Castells declared that society has moved from the Industrial Revolution (production of material goods) to the Information Age (knowledge economy). The network society has been enabled by current technologies (such as smartphones and internet). Communication is based on an open structure network, which breaks down some of the traditional social hierarchies and national borders because information flows almost anywhere (China’s state censorship might be a notable exception). Different participants might have different value within the network, such as highly connected individuals.

Castells’s theory operates within the idea of the public sphere by somewhat eliminating time and space. Electronic communication is instantaneous and possible with anyone across the world. It also could be used to communicate with individuals or communities, which could be known or unknown. However, his perception of “timeless time” may seem like digital networks allow for disruption of the flow of linear time, but I would argue that multi-tasking is not new or unique to the digital age. Time even may gain linear importance in terms of “keeping up” with the latest news and trends. For Twitter, a single tweet might get lost among 6,000 tweets a second if a user doesn’t have many followers (network connectedness) or a particular hashtag isn’t trending. In addition, network theory is compatible with traditional local and in-person networks.

The major effect of the network society has been increased participation and access to information. Want a graduate degree? Take online classes. The federal government has piloted public participation in coding through GitHub. Politicians get fewer letters and phone calls from constituents, but more emails and contacts from social media. The flow of digital information has increased from a river to a flood, which may be the greatest downside to the digital revolution. Now someone might be able to search online for health symptoms and get hundreds of possibilities from various websites. Dr. Google will present possibilities ranging from the common cold to deadly illnesses along with suggestions for folk remedies. Which source do you trust: the Mayo Clinic (based on their brick-and-mortar reputation) or the Wellness Mama blogger?

Castells is onto something that others have suggested: the form of the media matters. Habermas appears mostly concerned about the ability of the mass media to inform the public sphere and act as a good intermediary. Marshall McLuhan (infographic below) also argues that the medium fundamentally affects our ability to communicate. For example, the “tribal era” is characterized by an oral tradition of memorization and listening to storytelling, which is limited to a local community. The print era allows for the dissemination of more materials, but actually limits communication to a one-way exchange of ideas (from print to reader). Television expands the capability of the print era in being able to reach an even larger audience with only slightly more interaction than print (such as telephone interviews or arranging live appearances). The digital age finally expands the ability for participation: either one to one, one to many, or many to many. This is the root cause of why the digital age seems so remarkable to Castells.

SNA project proposal

For my first social network analysis project, I selected Neil deGrasse Tyson as a famous scientist on Twitter to create an egocentric network. He follows far fewer entities (43) than those that follow him (9.41 million). I’m curious about the significance of the entities that he has chosen to follow and any interrelationships between them. Other famous scientific figures (and celebrities) seem to follow similar patterns of following few entities while having many followers, so this would suggest that these users obtain limited information from their news feed and primarily use tweets to disseminate information. Other randomly selected famous scientists include Carolyn Porco, Cassini imaging lead, following 342 and 57.6k followers; Michio Kaku, physicist, following 51 and 622k followers; or Richard Dawkins, biologist, following 368 and 2.46 million followers. The first two scientists might intersect with Tyson’s world, but Dawkins is a controversial figure in a different field (biology and evolution). However, Dawkins is included in the 43 as well as Pee Wee Herman (entertainer), while Porco and Kaku have not been followed. However, Tyson did mention Porco in a tweet about Cassini’s end (Sept 19), which had been retweeted 548 times.

The most basic research question is what does an egocentric network of a famous scientist look like? What types of entities does he choose to follow and can they be classified into a few buckets: other scientists, media organizations, government agencies, any unknown individuals, or random famous individuals (Pee Wee Herman)? Would this information support a hypothesis for a larger project that Twitter might be similar to LinkedIn for certain types of users: a professional network rather than friends and family? And the sociological research question: is there a correlation between entities followed and status (such as famous people with a threshold of followers in the millions)? For example, does it indicate greater status for Tyson to mention Porco in a tweet, but she doesn’t have enough status to be followed?

The relationships between the ego and alters may be supplemented through other manual research to discover more information (such as publishing a scientific paper together using a scholarly database, appearing on the same panel at a conference, or a media appearance for a media entity). In addition, other data points may be gathered from a sampling of retweets and mentions for the last week.

I’m hoping to see some clusters based on the buckets of entities. For example, the scientists might be linked to each other through social media ties or academic work. The comedians might be linked together by appearing in the same event with a science theme or linked to the other scientists through a media appearance (such as Stephen Colbert’s show). Charles Kadushin, Graham Wright, Michelle Shain, and Leonard Saxe had an interesting study of social integration for young American Jews. It reminded me that these clusters tend to be based on homophily, creating sub-networks. Also, I’m wondering if Tyson’s network might result in a small world model (like the previously discussed six degrees of separation and Kevin Bacon).

Krebs’s article on “Mapping Networks of Terrorist Cells” makes the point that a network may be both incomplete and dynamic. Mine is incomplete because I can’t simply ask Tyson why he follows certain entities, and dynamic because the network has grown by 1 since last week. In addition, Kadushin (2012) brings up an ethical question about social network research: would Tyson be annoyed at being the focus of this network project or be amused?

Node centrality measures

Robins (2015) lists the different measures of node centrality:

  • Degree centrality shows the node with the most connections (edges). Who is the most connected individual in the crime network?
  • Betweenness is the importance of a node in connecting the network. In other words, betweenness identifies nodes that make bridges to otherwise disconnected nodes. Removing these nodes might break the network or reduce its size.
  • Closeness is based on the sum of distances from one node to all other nodes (the smaller the sum, the more central the node), which would show how quickly information might flow through the network.
  • Eigenvector indicates whether a node is connected to well-connected nodes. In other words, a node might not be well connected, but it’s connected to other well connected nodes (so don’t make them mad anyway).
  • Beta shows the total number of paths to get from one node to the others, which indicates power or influence.
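
As a rough illustration of the first three measures in the list, here is a minimal sketch that computes degree, closeness, and betweenness by hand for a tiny made-up network: two tight groups of three joined by a single go-between, "d". The network and function names are hypothetical, closeness is written as (n - 1) divided by the sum of distances (one common convention, so that higher means more central), and the betweenness routine follows Brandes' algorithm for unweighted graphs. Eigenvector and beta centrality are left out to keep the example short.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest path length (in edges) from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in distance:
                distance[nbr] = distance[node] + 1
                queue.append(nbr)
    return distance


def degree_centrality(adjacency):
    """Number of ties held by each node."""
    return {node: len(nbrs) for node, nbrs in adjacency.items()}


def closeness_centrality(adjacency):
    """(n - 1) divided by the sum of distances; assumes a connected graph."""
    n = len(adjacency)
    return {
        node: (n - 1) / sum(bfs_distances(adjacency, node).values())
        for node in adjacency
    }


def betweenness_centrality(adjacency):
    """For each node, the number of shortest paths between other pairs
    that pass through it (fractional when several shortest paths tie)."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        order = []
        parents = {node: [] for node in adjacency}
        paths = {node: 0 for node in adjacency}
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in adjacency[node]:
                if nbr not in distance:
                    distance[nbr] = distance[node] + 1
                    queue.append(nbr)
                if distance[nbr] == distance[node] + 1:
                    paths[nbr] += paths[node]
                    parents[nbr].append(node)
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):  # farthest nodes first
            for parent in parents[node]:
                share = paths[parent] / paths[node]
                dependency[parent] += share * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted once from each end.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical network: triangles a-b-c and e-f-g, joined through d.
network = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}

degree = degree_centrality(network)
closeness = closeness_centrality(network)
betweenness = betweenness_centrality(network)
for node in sorted(network):
    print(node, degree[node], round(closeness[node], 2), betweenness[node])
```

In this toy network "d" has only two ties, so degree alone would overlook it, yet it scores highest on betweenness because every path between the two groups runs through it. That is the kind of difference that makes the choice of measure matter.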

The diagram illustrates the node centrality measures, except for beta. Researchers may choose a few particularly relevant node centrality measures, such as degree and betweenness used together. Others may not be as relevant for their network and the effect they want to measure, so the choice depends on the situation. For example:

“Examining the network components of a Medicare fraud scheme: the Mirzoyan-Terdjanian organization” by Travis Meyers examined a white-collar transnational crime organization that stole more than $100 million from Medicare. Using social network analysis through archival data, he examined the structure of the organization using degree centrality and betweenness centrality. Meyers deemed closeness centrality not as “pertinent within criminal networks given their unclear and often fuzzy boundaries,” but still helpful for interpreting ties between nodes. He also used brokerage measures “to determine the actors who are in advantageous positions to broker the flow of resources to various sects of the network.” The study hoped to show the effectiveness of social network analysis in uncovering criminal activity and reducing fraud.

John Tawa, Ruqian Ma, and Shinji Katsumoto conducted an experiment using avatars in Second Life in ‘‘All Lives Matter’’: The Cost of Colorblind Racial Attitudes in Diverse Social Networks. The authors attributed colorblind racial attitudes, rather than outgroup prejudice, as directly related to lower levels of closeness centrality and clustering within the network. In other words, colorblind racial attitudes had adverse effects on relationships in their participants. The authors used closeness centrality, betweenness centrality, degree centrality, and clustering coefficient (how much a node’s connections are also connected to each other). The methodology also used a scale for colorblind racial attitudes, physical measures of virtual distance within the game, and participation in chatting.
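
Since the clustering coefficient is the one measure here that is not in Robins’s list, a small sketch may help. It computes the local clustering coefficient: of all the pairs of contacts a node has, what fraction are also in contact with each other? The four-person network is hypothetical and is not drawn from the study.

```python
def local_clustering(adjacency, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    nbrs = list(adjacency[node])
    k = len(nbrs)
    if k < 2:
        return 0.0  # no pairs of neighbours to compare
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adjacency[nbrs[i]]
    )
    return links / (k * (k - 1) / 2)


# Hypothetical chat ties: "ego" talks to three others, two of whom
# also talk to each other.
chats = {
    "ego": {"ann", "bo", "cal"},
    "ann": {"ego", "bo"},
    "bo": {"ego", "ann"},
    "cal": {"ego"},
}

for person in chats:
    print(person, round(local_clustering(chats, person), 2))
```

Of the three possible pairs among ego’s contacts, only ann and bo are linked, so ego’s coefficient is one third. A person whose contacts all know each other would score 1.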

Social capital

Social capital serves as the connecting glue between individuals and groups within a network. It has been characterized as support given to others or participation in activities. Kadushin (2012) describes “social capital” as an amorphous concept because its execution and form have wide variation depending on context. He states: “Social networks have value because they allow access to resources and valued social attributes such as trust, reciprocity, and community values.” For example, an individual might borrow a cup of sugar from their neighbor or an entire neighborhood might participate in a trick or treating event for local children. The ideal situation is that our neighbor will help us today and we might help them with something else later. If everyone around us also acts similarly, then we have a lovely community to live in. The following chart shows the various interrelated aspects of social capital in one model for a successful community:

However, what happens if only a few neighbors refuse to participate in trick or treating? At what point does the network break and the neighborhood lose its sense of community (or civic virtue)? What does the following chart suggest?

The chart compares three examples of community engagement (church attendance, league bowling, and parent-teacher association attendance) with one potentially solitary activity (watching television). This does not actually indicate doom for communities or even a sharp decline in community engagement: the statistic for owning a television (or other digital devices) is different from a detailed usage statistic. A single individual might watch television rather than engaging in the activities (also an assumption). Or perhaps the individual is an avid sports fan that has friends and family visiting frequently to watch events together. I find the chart ultimately misleading because a simple assumption cannot be made that community engagement declined as television ownership increased. Along these lines, I would argue that it is difficult to measure social capital from online participation. An individual might use digital devices to play Solitaire. Alternatively, the individual might be heavily involved in moderating a Dancing with the Stars community forum or enjoy playing a massively multiplayer online game that requires group coordination. This online “neighbor” might be very far away and the social capital generated might require an online form (rather than a cup of sugar).

Social capital seems like a net positive in both local and online communities. However, if individuals or groups decide to be exclusionary, then this could lead to inequity based on the selective exchange of social capital. For example, a church group volunteers to assist homeless individuals in their community. However, if they screen for individuals matching their beliefs to help, then it would be harder to judge this situation as a net positive. Using social network analysis, would it be possible to discover discriminatory practices related to social capital or view gaps resulting in underserved groups? What size of network might this involve? Would a large network make this example easier or more difficult to detect?


Small world theory

Small world theory is based on the idea that two individuals will be connected through a series of intermediaries. In the 1960s, Stanley Milgram tested this theory by giving packages to one person with a particular destination, but they could not mail it directly to the second person. Instead, the first person had to mail it to someone they knew and the next person would do the same. Some packages made it to their destination and others didn’t. However, this experiment became the first quantitative evidence for the small world theory. The “six degrees of separation” (or fewer) also comes from the same idea, but attempts to quantify the number of intermediaries needed within a network. The small world sits in the middle between a more linear network (mostly connections to neighbors) and a random network (see image below) by having both connections to neighbors and random individuals.
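
To see how much a handful of random connections can shrink a neighbor-only network, here is a minimal sketch. It builds a ring where everyone knows only their four nearest neighbors, adds ten random long-range ties, and compares the average number of steps between people before and after. The sizes, the seed, and the choice to add shortcuts rather than rewire existing ties are my own assumptions for illustration.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Each node links to its k nearest neighbours on a ring (k even)."""
    adjacency = {node: set() for node in range(n)}
    for node in range(n):
        for step in range(1, k // 2 + 1):
            other = (node + step) % n
            adjacency[node].add(other)
            adjacency[other].add(node)
    return adjacency


def add_shortcuts(adjacency, count, rng):
    """Return a copy with `count` extra ties between random pairs."""
    copy = {node: set(nbrs) for node, nbrs in adjacency.items()}
    nodes = list(copy)
    added = 0
    while added < count:
        u, v = rng.sample(nodes, 2)
        if v not in copy[u]:  # only count ties that are actually new
            copy[u].add(v)
            copy[v].add(u)
            added += 1
    return copy


def average_path_length(adjacency):
    """Mean shortest-path length over all reachable ordered pairs."""
    total = 0
    pairs = 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in distance:
                    distance[nbr] = distance[node] + 1
                    queue.append(nbr)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs


rng = random.Random(7)  # fixed seed so the run is repeatable
lattice = ring_lattice(200, 4)
small_world = add_shortcuts(lattice, 10, rng)

lattice_avg = average_path_length(lattice)
small_world_avg = average_path_length(small_world)
print(f"neighbors only: {lattice_avg:.1f} steps on average")
print(f"with 10 shortcuts: {small_world_avg:.1f} steps on average")
```

The ring starts with 400 ties, so the ten shortcuts are a small addition, yet they give distant parts of the ring a direct route to each other while the local neighborhoods stay intact.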

Weak ties allow for more random connections within a network. For example, ties between immediate family in an area would likely form a cluster with strong ties. Everyone gets together to celebrate holidays and birthdays. However, a tie with an extended family member would be a weaker tie and bring in additional possibilities due to distance or alternate circumstances. There might be a family reunion at a vacation spot every four years where you meet second cousins or see a well-traveled retired aunt. While strong ties might have significance because they have greater influence over your day to day activities, these weaker connections create more possibility for variability within your egocentric network. Job opportunity is one often cited area where extended networks could be of benefit.

The small world theory makes the “big world” feel more approachable and less lonely. In 2016, Facebook claimed that their network had three and a half degrees of separation! However, Facebook is not necessarily the most altruistic of organizations collecting big data posted by their users and on their users. Some researchers use the term “prosumer” to describe the phenomenon where users offer their work freely online, which allows the collecting website to profit. SNA and big data offer unprecedented opportunities from the abundance of information to understand networks. However, many ethical questions will follow big data, such as whether individuals have given permission for the use of their information or whether giving permission once for the original use applies to all derivative uses for other research.

Finally, Kevin Bacon has been used as a frequent example for small world theory within the network of Hollywood. Most actors have connections through other actors who have costarred in a movie with Bacon. This has become a joke where people will name a star and try to figure out the shortest connection (see image above).

Social network analysis perspective

Keim (2011) describes the social network perspective as a balancing act between traditional sociological theory about individuals and groups. An individual’s social relations form the basis of a society, but groups also influence how an individual behaves through social norms. Social network analysis (SNA) provides an alternate methodology for understanding the structure of relationships in a society. In SNA, Keim states that a collection of individual social relations within the context of a social structure is the unit of study. These relationships often form patterns, which help answer research questions.

For example, Robins (2015) outlines several common types of social network research questions including:

  • How does the network affect outcomes, individually and across the group?
  • How is the network structured and how does this affect individuals/outcomes?
  • How do individuals affect the network? How do some individuals have different outcomes based on their position in the network?

SNA allows for a variety of perspectives: starting from individuals (bottom-up approach) or the group (top-down approach); comparing positions, relations, or outcomes; choosing boundaries or sampling, etc. McPherson, Smith-Lovin, and Cook (2001) studied the tendency of individuals to connect with other people that have similar characteristics (homophily). They observed a very strong structural effect for race/ethnicity and a strong effect for other relational aspects such as sex, age, religion, and education. In addition, the authors briefly discussed patterns in tie dissolution. I found this section to be more thought provoking because this research predates the social media revolution where individuals no longer have to lose touch with friends. Facebook might reflect close family/friends and friends of friends (acquaintances), but also someone that you met once at a conference five years ago and a person you barely knew from your high school class of 200. How do these lingering weak ties influence us or our networks?

SNA is fundamentally different from traditional social science data because it studies the existing pattern of relationships and therefore cannot be a predictive tool. However, Haythornthwaite (1996) illustrates the potential to use the method for “identification, diagnosis, and active modification.”

For example, the above image presents two different graphs based on the relational data of an organizational chart versus a social network analysis of how people get work done. However, descriptive analysis would better provide evidence of the differences between the two charts, which means examining the properties within the network, such as density (proportion of the number of ties to the total possible number of ties), degree (number of edges per node), connectivity (paths that connect nodes), and centrality (importance of certain individuals). These properties form the basis for network theories, such as homophily (individuals within a department work together) or network self-organization (individuals across departments exchange favors to get work done). SNA helps us keep up with the increasing scope of modern networks, such as companies growing internationally, workplaces adopting telework, or meeting new friends online/retaining old friends through social media.
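
Density and degree are simple enough to compute by hand, and a minimal sketch makes the definitions concrete. The five coworkers and their ties below are hypothetical, and I am assuming undirected ties (if Ana works with Ben, Ben works with Ana), so the number of possible ties among n people is n(n - 1)/2.

```python
def density(node_count, tie_count):
    """Share of possible undirected ties that actually exist."""
    possible = node_count * (node_count - 1) / 2
    return tie_count / possible


def degrees(edges):
    """Number of ties attached to each node in an undirected edge list."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts


# Hypothetical "who works with whom" ties among five coworkers.
work_ties = [
    ("ana", "ben"),
    ("ana", "cy"),
    ("ben", "cy"),
    ("cy", "dee"),
    ("dee", "eli"),
]

tie_degrees = degrees(work_ties)
print("density:", density(len(tie_degrees), len(work_ties)))
print("degrees:", tie_degrees)
```

Five of the ten possible ties exist, so the density is one half. Computing the same two numbers for an organizational chart and for the observed working relationships would give a first, rough way to compare the two graphs.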


Social networks shape the choices that we make whether we’re conscious or unconscious of them. Our family is the first influence on our lives, imparting cultural values and morals, but also habits. For example, compare childhood experiences between someone born in the 1980s, 2000s, and now. In the 1980s, “screen time” meant television. In the 2000s, it would have probably been computers. Now, we’re talking about smart phones and tablets. Our personal relationship with technology has changed over time as well as the digital nature of our networks. A “long-distance” relationship used to mean keeping in touch via phone calls and mail. Now, it might mean instantaneous connection via apps (Facebook Messenger, Snapchat, etc.) but also never losing touch with someone because you’ll remain Facebook friends.

Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives argues that the connections in our social network may be the best predictor for certain human factors, such as obesity or happiness. These connections show that the “whole is greater than the sum of its parts.” Networks contain not only one-to-one (mutual) ties (such as 100 people and 99 ties), but also more structural ties across their members (such as 100 people and 450 ties). While we make choices that shape our network and our friends shape us, the authors also argue that this has an extended effect, which leads to unconscious collective influences. For example, weight becomes a social norm that is perpetuated through social ties. The closer the tie, the more direct the effect. The authors showed that spouses had a higher effect than friends of friends. However, extended ties still had an effect on individuals within the network.

The above image is an example of using connections on Facebook (from Christakis’s Human Nature Lab website). While inherent randomness seems to exist across the network, some clusters of overweight men or women appear. These occur more often from direct ties to spouses or siblings. The authors would argue that the existence of these clusters increases the likelihood of individuals throughout the network being overweight because of our distant ties to overweight friends of friends. As a result, the randomness would actually represent the network’s influence from a more distant tie.

The connectedness argument seems to support predictable outcomes based on our social network for a lot of behaviors. However, I believe that the human experience includes some unpredictability. How would unpredictable behavior manifest in a social network? How do new, unconventional ideas emerge? What goes on in the first person that starts a new trend in the network? I would have liked more discussion of the outliers and ways change might occur in large networks.

Also, the connectedness theory seems to contrast with a recent study that found connected teenagers had decreased feelings of well-being. The speculation around this study is that people post their happiest moments online, but not necessarily the ordinary or bad events. For example, a teenager would post about their college acceptances and not necessarily their rejection letters. This incomplete picture of someone else’s life leads to unfavorable comparisons (their life appears happier than mine), resulting in decreased happiness. How does this self-selection affect the connectedness argument if some social ties might be wholly digital? In this case, how would you know if someone is overweight since you’ve never met them and their profile picture is 10 years old (which you also don’t realize)? Would labeling this as a “weak” tie be sufficient? Can you know someone well without knowing details about them such as whether they’re overweight?