Putting it all together with D3 and TopoJSON

It seems I’m at a point where I need to define what my goal is/has been for this course, since I don’t think I have explicitly stated it quite yet.

Looking at my projects thus far, it seems like a crass menagerie of vaguely data visualization-esque achievements. I have about half of a stacked bar chart, a bunch of circles floating in space (albeit a confined space now, thank you very much), and a pseudo-timeline thing that looks like a flaccid scatterplot. Granted these are all works in progress, but they are still incredibly rudimentary, without question.

I would probably find this more disappointing if my goal was to make a timeline, and a bar chart, and a network visualization. But my goal-with-a-capital-“G”, my macro goal, was to become more familiar with a data visualization utility, learn its basics, and use it to apply some design theory to make presentation-worthy documents of my work and other researchers’ work. In that sense, it may also seem like I’m falling short. So this week I decided to make one project, beginning to end, that encompassed all of the design elements I wanted to achieve in those other projects, based on what I’ve learned thus far.

D3 is impressive for a myriad of reasons, but one of the most impressive (in my opinion) is its ability to render complex maps completely client-side. In lieu of hardcoded images or Flash elements, D3, in tandem with another Bostock library, TopoJSON, renders SVG paths using coordinates passed from JSON files. This allows for some interesting interactive and real-time mapping, especially in tandem with an API or other real-time data source. It also allows for more robust styling options that can add more user-friendly ways to present our data.

To test the waters with TopoJSON, I used data from the 2016 American Community Survey investigating gross rent as a percentage of household income. The ACS is a supplementary battery of questions released by the Census Bureau that charts variables about changes within communities. The data itself is incredibly easy to come by through the Census Bureau website and usually comes pre-cleaned, with some user-friendly variables for use cases such as mine. The data can also be exported by census tract, county, state, or as national data depending on your desired level of analysis. For this visualization, I wanted to use counties to illustrate the disparate cost of living not only between states but between urban centers and their surrounding areas.

One of the first design choices I had to make was what kind of color scale to use. Most examples of d3 choropleths, including creator Mike Bostock’s own work, use threshold scales to set breaks in the data set and then map those values to an array of color keys. One problem I found when trying to implement this across several data sets, however, is that threshold scaling is highly influenced by outliers; in many cases where a data set had an extreme high or low value, the color scale became basically indistinguishable for most areas except those extreme cases. To avoid this, I used a sequential scale to represent this data. Along with D3’s color interpolation functions, this meant I could input two hex codes to use as my range’s “min” and “max”, and D3 would interpolate all of the colors in between.

var colors = d3.scaleSequential(d3.interpolate("#F2CC8F","#E07A5F"));
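On its own this only defines the color ramp; the scale also needs a domain spanning the data before it can be used. A minimal sketch of what I mean, assuming rent is the ACS table loaded further down and percentage is its rent-burden column:

colors.domain([0, d3.max(rent, function(d) { return +d.percentage; })]);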

Next, I had to find a JSON file with county coordinates to draw the map itself. Fortunately, the TopoJSON project hosts a repository of pre-built JSON files that made this task incredibly easy. One difficulty I faced when iterating over this data, however, was the sheer size of the files themselves. The continental US alone has over 3,000 counties, so computing all of that information can be time- and resource-consuming in a client-side language like JavaScript. Compounding on this, I was joining data from this large JSON file to my original dataset, which was comparable in size. To account for this, I made use of d3.queue to preload my datasets before implementing any content generation. Since these files load asynchronously, this also avoided any difficulties that could arise from trying to join two rows of data that have not both finished loading. The .await function then calls our function for actually drawing the map itself once both data sets have loaded completely.

d3.queue()
.defer(d3.json, "https://d3js.org/us-10m.v1.json")
.defer(d3.csv, "https://docs.google.com/spreadsheets/d/e/2PACX-1vQK1F0gF62y_UNIsAhThc54HPWZHm-c-gZ1V5HTg5DYDHQ2eIbC3VKoaIJTqWniZnyD_UvfqpNxdBh6/pub?output=csv")
.await(mapLoad);

 

To join the two data sets, I created two empty arrays for county names and the actual rent percentage variable from the ACS data set. Using a .forEach function, I then populated these arrays using the county FIPS number as the entry key, which makes matching this data to its TopoJSON counterpart simpler. This is obviously not the most elegant way of accessing this information, nor is it the easiest in terms of the processing required, but it works.

var data = [];   // rent percentage, keyed by county FIPS code
var names = [];  // county name, keyed by county FIPS code

function mapLoad(error, us, rent) {
  if (error) throw error;

  // Write FIPS data and percentages into the new arrays;
  // this makes accessing percentages easier after feeding in the JSON coordinate data
  rent.forEach(function(d) { data[d.FIPS] = +d.percentage; });
  rent.forEach(function(d) { names[d.FIPS] = d.county; });

The rest of the code works similarly to the D3 prototype I’ve illustrated in past weeks: elements are selected, data is joined to those selections, and new objects are created for every entry in the data set. Some county values are still displaying as undefined, either as a result of missing values in the ACS data or due to FIPS values that do not match up between the two data sets, so I’ll have to take a deeper dive into the JSON data to see how that can be rectified.
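For reference, a minimal sketch of that drawing step in this map’s terms, where svg is the usual top-level selection; it assumes the county features in the us-10m file carry FIPS codes as their id (which is how I match them against the lookup arrays above) and falls back to a neutral grey when the join fails:

svg.append("g")
    .selectAll("path")
    .data(topojson.feature(us, us.objects.counties).features)
    .enter().append("path")
      .attr("d", d3.geoPath())
      .attr("fill", function(d) {
        var pct = data[d.id];                       // look up the ACS percentage by FIPS code
        return pct == null ? "#ccc" : colors(pct);  // grey out counties that didn't join
      });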

One feature I have not been able to get working with this example is the .projection functionality (which actually belongs to D3’s geo path generator rather than TopoJSON itself); this essentially maps the coordinates being drawn onto the SVG so that the size and shape of the map can be adjusted on the fly. This can be used to resize or rotate the projection to look at areas of interest, which could be an interesting implementation for future iterations of this project. I am also not completely satisfied with the sequential scale and may replace it with a more “partitioned” scale in the future to create better visual distinctions between counties. For now, though, I’m just happy that I managed to create my most complete visualization to date!
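As for the projection issue above, the usual wiring (when the TopoJSON still contains raw longitude/latitude coordinates, which I suspect is not the case for the pre-projected us-10m file I’m using) would look something like the sketch below; the scale and translate numbers are placeholders, and width and height are the SVG’s dimensions:

var projection = d3.geoAlbersUsa()
    .scale(1000)
    .translate([width / 2, height / 2]);   // roughly center the map on the SVG canvas

var path = d3.geoPath().projection(projection);   // use path in place of the bare d3.geoPath() call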

 

Building on the Force Directed Network Graph

Getting the force directed graph functional in d3 was a good start, but it clearly needs some tuning up.

 

For starters, the uniform fill color needs to be changed. Network graphs are interesting visualizations, but they mean very little without some differentiation between the nodes. Mapping the degree value to the node size expresses one dimension of this information, but there is a lot more information to be expressed, and color is one of the simplest and most noticeable ways to do so.

Color scales are simple to create in d3; it’s simply a matter of taking one of d3’s built-in scale functions and mapping colors to the scale’s range, like so:

var colorScale = d3.scaleOrdinal()
    .domain(data.ourXvariable)
    .range(["#FFF", "#666", "#000"]);

Colors are expressed as an array of “bins” to be matched to the values in the domain. The colors can then be called when the objects are drawn.
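Applied when the circles are drawn, that might look something like this (node being the circle selection, and type a stand-in for whatever column the scale’s domain was built from):

node.attr("fill", function(d) { return colorScale(d.type); });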

In the case of this network graph, I wanted to use a color scale to represent the annual budget of each node’s parent organization. The animal rights movement is championed by several organizations with budgets that far exceed those of other organizations, and this information might be interesting to compare to their presence at the Animal Rights National Conference (do larger organizations have some clout that bolsters their presence at these events? Does the sheer breadth of smaller organizations obfuscate their participation?).

In this case, I decided to use a quantize scale to partition the Organization Budget values into nine bins. A continuous scale may also be fitting, since budget is a continuous variable; however, if the intention is to visually describe similarity or difference between nodes using color, many different shades of many different colors may obscure larger trends in the data.
Quantize scales are somewhere in between ordinal and linear scales in the d3 library. Whereas ordinal scales create “bins” based on (typically) nominal or categorical variables, quantize scales take a continuous variable and partition it into equal, discrete “bins” bound to a given domain; in this case, from

d3.min(nodes, function(d) { return +d.orgbudget; })

to

d3.max(nodes, function(d) { return +d.orgbudget; })
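A minimal sketch of that scale, assuming each node row carries an orgbudget field and sampling nine steps off an arbitrary two-color ramp:

var budgetColor = d3.scaleQuantize()
    .domain([
      d3.min(nodes, function(d) { return +d.orgbudget; }),
      d3.max(nodes, function(d) { return +d.orgbudget; })
    ])
    .range(d3.quantize(d3.interpolate("#F2CC8F", "#E07A5F"), 9));   // nine evenly spaced color bins

// when drawing the circles:
// .attr("fill", function(d) { return budgetColor(+d.orgbudget); })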

The tooltips were then appended to include the actual Organization Budget of each node, pulled from the row of the nodes dataset while the circles are being drawn.

Another prominent issue with the visualization is that the node x and y positions are not currently bound to the width and height of the SVG canvas. As a result, some nodes fly out of the visual bounds and are not visible to the user. D3 creator Michael Bostock has a proposed solution here, however implementing this in my own code has proven problematic. The gist is that, when the x and y coordinates are pulled on each tick, the nodes are clamped to the range of [radius, canvas width - radius] for the x coordinate, and the range of [radius, canvas height - radius] for the y coordinate.

The reason the code seems to be breaking is that the variable radius, as it appears in Bostock’s code, is a set integer, whereas in my code, radius is a function of the node’s degree centrality value. In trying to call the variable radius, my code looks for a value that isn’t present in the data being built by the force function. As such, I set an arbitrary value of 20 in place of Bostock’s use of radius.
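A minimal sketch of that clamping as it sits in my tick handler, with the 20-pixel stand-in for radius, width and height being the SVG canvas dimensions, and node and link being the circle and line selections:

function ticked() {
  node
      .attr("cx", function(d) { return d.x = Math.max(20, Math.min(width - 20, d.x)); })
      .attr("cy", function(d) { return d.y = Math.max(20, Math.min(height - 20, d.y)); });

  link
      .attr("x1", function(d) { return d.source.x; })
      .attr("y1", function(d) { return d.source.y; })
      .attr("x2", function(d) { return d.target.x; })
      .attr("y2", function(d) { return d.target.y; });
}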

The result is a graph that still fits the bounds of the box and has a bit more visual distinction between the types of nodes. Pretty cool! A key still needs to be implemented to explain what the colors actually mean, of course. Another concern is that the nodes still gravitate towards one another quite arbitrarily, rather than forming the nice subgroups that are emblematic of network graphs. This week I plan to take a deeper dive into the d3.force functions to see how they can be used to replicate such results.

 

Data mapping and iterating in D3: The 3 week Stacked Bar Graph Experiment

One phrase that comes up in most of the Stack Overflow threads I’ve seen (which is a lot; I think Stack Overflow has become my most visited website since starting this course) is “the d3 way of doing things”.

The name of the game in D3.js is array manipulation. Most, if not all, d3 functions follow a similar format: tell d3 what HTML element to look at, tell it what array to use, iterate over said array, and create a new HTML element for each entry.

This is the double-edged sword of using d3. On one hand, it eliminates most of the need to write iterative JavaScript functions to use your dataset. A typical hand-rolled iterator might look something like

var getValues = function(d) {
  var values = [];
  for (var i = 0; i < d.length; i++) {   // walk every row in the array
    values.push(d[i].somevalue);         // pull out the value we care about
  }
  return values;
};

Whereas in d3, the heavy lifting is done within the library, so telling d3 to iterate over some data is as simple as calling .data(yourData) on a selection. This also means that creating classes of objects with the same or similar properties is incredibly simple. For example, for a simple bar chart, appending x positioning or bar height is as simple as

myGraph.selectAll("rect")            // tell d3 to look at all "rectangles"
  .data(myData)                      // tell d3 what dataset to iterate over
  .enter().append("rect")            // put in rectangles if there aren't any
  .attr("height", function(d) { return d.y; })
  .attr("width", function(d) { return d.x; });

With d3 doing the task of iterating over the array and generating bars for each row of your data.

Where this gets hairy, however, is that d3 itself likes creating arrays. And it likes creating arrays from your arrays. And then arrays of those arrays from your arrays.

Enter my tumultuous relationship with the d3.stack function. D3.stack slices an input array based on keys that the user provides and assigns y values so that values can be compared vertically across comparison groups. So, for example, if your data has a categorical variable that you would like to use for analysis, you could instruct d3 to create a subarray of values for each other variable of interest, separated by those categories. So something like

var ageBands = ["Control", "Experiment"];

var stack = d3.stack()
    .keys(ageBands);

var series = stack(myData);

would return two arrays, one for the control group and one for the experimental group, with subarrays for each other variable in the data.

D3.stack underwent some interesting changes in the migration from version 3 to version 4. The version 3 stack layout would output individual rows for each variable, with a y0 value representing the baseline of each bar in the “stack” and a y value representing its height.

0: [
  {x: control, y0: 0, y: 12},
  {x: control, y0: 0, y: 20},
  {x: control, y0: 0, y: 3}
]

1: [
  {x: experiment, y0: 12, y: 12},
  {x: experiment, y0: 20, y: 20},
  {x: experiment, y0: 3, y: 3}
]

In version 4, however, no new named variables are defined. The start and end values for each bar are instead represented by two-element subarrays, like this:

0: [
  [0, 12]
  [0, 20]
  [0, 3]
  key: "control"
]

1: [
  [12, 24]
  [20, 40]
  [3, 6]
  key: "experiment"
]

Additionally, each entry carries yet another array that contains the original data for that row. The nested structure certainly makes for a more parsimonious and readable solution; however, calling this information can become more problematic. Instead of accessing a named value in a given array, the series has to be called via a function and then entries pulled from the subarrays using code such as

.attr("height", function(d){ return d[0][0] - d[0][1]})

The nesting ultimately saves a lot of processing time, but takes a little getting used to, especially when the majority of resources on the internet, including d3’s API, cite examples using the old .stack() format.
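Concretely, with the version 4 output the data join ends up nested: one group per series, then one rectangle per [start, end] pair inside each group. A rough sketch, assuming series is the stack generator’s output and y is a linear scale whose range runs from the chart bottom to the top (x positions and widths are left out since they depend on the band scale):

myGraph.selectAll("g")
    .data(series)
    .enter().append("g")                  // one group per key ("Control", "Experiment")
  .selectAll("rect")
    .data(function(d) { return d; })      // one rect per [start, end] pair
    .enter().append("rect")
    .attr("y", function(d) { return y(d[1]); })
    .attr("height", function(d) { return y(d[0]) - y(d[1]); });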

So after three weeks of tinkering, this is what I’ve ended up with. No bells and whistles (yet), but everything displays where it should! In the javascript you can see the calls for the bar structures, as well as some broken tooltip code (which I am hoping to address today). The data comes from this study conducted by Faunalytics investigating the motivations, dietary habits, and recidivism among current and former vegans and vegetarians. Mean values were calculated using descriptive statistics functions in SPSS 24 and then written into the array manually.

New Goals for the Stacked Bar Chart Project

Tooltips are obviously the biggest missing piece here. The stacks themselves mean very little without the actual percentages. Another option would be to append text in the center of each bar with the percentage displayed; however, I also don’t want too much visual information cluttering the visualization.
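A minimal sketch of the tooltip approach I have in mind, assuming bars is the rectangle selection and using an absolutely positioned div; the displayed percentage is just the height of the hovered segment in data units:

var tooltip = d3.select("body").append("div")
    .attr("class", "tooltip")
    .style("position", "absolute")
    .style("opacity", 0);

bars.on("mouseover", function(d) {
      tooltip.style("opacity", 1)
        .style("left", (d3.event.pageX + 10) + "px")
        .style("top", (d3.event.pageY - 20) + "px")
        .text(Math.round(d[1] - d[0]) + "%");   // segment height in data units
    })
    .on("mouseout", function() { tooltip.style("opacity", 0); });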

Axis labels are another concern, in the same vein. As of right now, the x-axis labels are pulled from the variable names in the original dataset and give very little contextual meaning about what those variables are supposed to represent. This is something I also considered addressing using tooltips, because of the limited space to write full descriptions along the axis.

There are other quality-of-life updates I would like to do with this code, however these are the big ones for right now.

 

Force Directed Network Graphs in D3

Network analysis is a unique theoretical and methodological approach to sociology that I happened to stumble into last semester. At its most basic, network analysis can help us understand networks not just in terms of who is the most connected, but who helps connect disparate parts of a network, who is closest to the most important actors in a network, and who gets shafted in the network. It ticks a lot of boxes for me; it’s quantitative, it generates a lot of hypotheses for future research projects, and most importantly, the visualizations are just really pretty.

I started toying with D3’s .force functionality shortly after the start of the course, with mixed results, chiefly because I hadn’t mastered the art of drawing SVG objects, let alone applying forces to tell them where to situate themselves.

WordPress isn’t always the friendliest when it comes to embedding iframes, but a sample of what I’ve got running can be found on JSFiddle. The code itself is fairly standard; nodes are pulled from a csv list and drawn as circles with varying radii based on their degree values. Links are pulled from another csv list and drawn as lines between the circles. This is a fair start, and this visualization has some basic functions like mouseover tooltips to see who each circle represents. It can be taken a lot further, though.

For starters, the forces drawn here are completely arbitrary. I basically started with a force that would draw everything to the middle of the SVG “canvas”, then created two forces to represent repulsion and attraction, and stopped once I found a combination of values that didn’t look atrocious. It would be cool to crack open the d3-force functions a little further and try to implement some recognized graph layout algorithms, such as Yifan Hu’s popular algorithm.
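For reference, a rough sketch of that setup in d3 v4 terms; the strength and distance numbers are exactly the arbitrary part I’d like to replace with something more principled, and nodes, links, width, height, and the ticked handler all come from the rest of the code (with each node assumed to have an id that the links reference):

var simulation = d3.forceSimulation(nodes)
    .force("center", d3.forceCenter(width / 2, height / 2))   // pull everything toward the middle
    .force("charge", d3.forceManyBody().strength(-50))        // repulsion between nodes
    .force("link", d3.forceLink(links)                        // attraction along the edges
        .id(function(d) { return d.id; })
        .distance(60))
    .on("tick", ticked);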

Secondly, the nodes are entirely indistinguishable from one another. One idea I had been playing around with was adding image URLs for each node’s organization logo to the original csv, and then assigning those URLs to the SVGs when drawn. This presents some issues considering the size discrepancy between nodes. A middle ground could be adding the organization’s logo to the tooltips and creating an ordinal color scale to distinguish different types of organizations.

Thirdly, the tooltips are about as meat-and-potatoes as you can get: a rectangle with some text. The logos would be a nice touch, but it would be interesting to see how d3’s onmouseover and transition functions could be used to create something snappier or more presentable; perhaps a rectangle that draws itself to size out of the cursor’s tip, or a box that wipes horizontally on mouseover to reveal the tooltip.

 

Back to Basics: Starting a stacked bar chart in D3

This week the theme is simplify, simplify, simplify. One of my past professors used to tell me I had a propensity to get “lost in the weeds” with my work; I get so fixated on the minutiae of my projects that I forget to think about the big picture. This is a habit I seem to have carried over into my data visualizations; I see some cool niche functionality and get so fixated that it eats up all of my time. To curb this, I wanted to take it back to basics this week with a stacked bar chart.

The data I’m going to be using comes from a study of current and former vegans and vegetarians conducted by Faunalytics. Specifically, two variables stand out that I would like to visualize: the first is the respondent’s status as either a current or former vegan/vegetarian, and the second is a battery of reasons for deciding to abstain from meat consumption (health, animal rights, environmentalism, advice from peers, etc.). With this study, the original researchers seemed primarily interested in vegan recidivism and why most vegans relapse, but I think these data points present interesting findings in regards to how vegan/vegetarian groups could structure their outreach to get the most people to try vegan/vegetarian dieting.

To visualize responses and how they vary between the main respondent groups, I am going to create a stacked bar chart with each category represented by a different color. D3’s author, Michael Bostock, has a handy sample of such a visualization here. I have three short-term goals I want to address this week:

  • Drawing the chart using data parsed from the original csv dataset
  • Tooltips on hover to show specific response rates for each group on each variable
  • A legend generated using SVG’s, similar to Bostock’s example.

D3 has some interesting animation and onclick/onhover functions that would be cool to apply later down the line, such as those seen in this example. There’s also some functionality to add gradient effects and append images to SVGs; however, for now I’m going to try to leave that Pandora’s box closed.

 

Jumping into the deep end with Javascript

I should preface this post by saying that my grasp of coding languages is tenuous at best. Most of my background in HTML and CSS stems from coding Myspace profile themes for my friends in the mid-2000s, and I went to a high school that thought Visual Basic was still a relevant coding language in 2008. In college I took a course on backend web development, but even that was mostly creating sign-up sheets without any styling and praying that the data saved to a server database.

That being said, my long-term goals for this course may seem a bit lofty. As I read more about interactive visualizations and data visualizations for web publication, it seems like JavaScript is going to be a necessary component to making my information easily updated and visually appealing for users. I spent the first half of the week reading some basic JavaScript documentation and getting my IDE and local server running to troubleshoot some code, and the second half of the week with my head in my hands wondering what I was doing wrong (I hear from my brother, a web developer, that this is pretty much par for the course).

For our first timeline assignment, I tried to make a pseudo-Gantt-chart-style timeline of major events in the Animal Rights Movement. My intention was to split major events into three categories: advances for animals used in experimentation, advances in factory farming, and major milestones for the movement itself.

One quality that I wanted to retain with this visualization is that the data could be easily updated without having to recreate the whole visualization. To achieve this, I used the D3 JavaScript library to parse information from a csv file to use as datapoints. D3 is a widely used and extensive visualization library for JavaScript that can create some truly stunning visuals. Basically, I tried to write a function that would iterate over the rows of the csv and return the data as arrays to be used by the chart.
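In d3 v4 terms, that parsing step looks roughly like the sketch below; the file name, the column names, and the drawTimeline function are all placeholders for whatever the real csv and code use:

d3.csv("timeline.csv", function(d) {
    // row conversion function: runs once per row of the csv
    return {
      event: d.event,
      category: d.category,
      startYear: d.startYear,
      endYear: d.endYear
    };
  }, function(error, events) {
    if (error) throw error;
    drawTimeline(events);   // hypothetical drawing function
  });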

Using start years and end years, I wanted to make something in the vein of this visualization:

Phillipson, G. 2013. Global Volcanic Unrest in the 21st Century: An analysis of the first decade. Journal of Volcanology.

Through this process I made two major discoveries: first, JavaScript does not like it when you try to parse a single year as a date; second, JavaScript really doesn’t like it when you try to create a timeline with intermittent or overlapping start and end dates.
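On the first point, one workaround might be leaning on d3’s own time parsing rather than the native Date constructor; a minimal sketch:

var parseYear = d3.timeParse("%Y");   // "%Y" = a bare four-digit year string

var start = parseYear("1975");        // returns a Date for Jan 1, 1975
var end = parseYear("1985");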

I am still toying with the code and I am hoping to have it operational soon, but I think this is a lesson in taking it slow. I will probably go back to the drawing board and declare all my information as constants, see how that goes before I try messing with data parsing any further, and then slowly pluck away at these longer-term design goals throughout the semester. Similarly, there are some other beautiful visualization packages, like Highcharts, that I would like to play around with before deciding to rally behind D3 completely.

 

Creating the Perfect Portfolio

Admittedly, a portfolio of visual works is not something I have had to consider up to this point in my ongoing will-they-won’t-they relationship with graduate school.

I started my search on the Information is Beautiful Awards website, which ironically gave me more ideas about what I want to avoid in my portfolio than what I would like to adopt. Their showcase appears to be a floated list of blog entries for each visualization; however, the result is much more cluttered than I would like:

One glaring issue I find with this approach is the disparate heights of the thumbnails. In the above example, nearly a third of the screen real estate is dedicated to the middle visualization due to its vertical orientation, which detracts heavily from the other visualizations showcased on the page. Another issue is that the text area at the bottom of each post is a fixed height, which results in some posts seeming “squashed” due to the orientation of the thumbnail. An easy solution to this would be fixed thumbnail sizes for each visualization.

Another issue I have with this example is that the blurb beneath each thumbnail typically explains more about the dataset than the visualization itself. In the case of a data visualization portfolio, I think this section would be better suited as a place to explain what tools or software were used to create the visualization, with more information about the data on a separate webpage. Even if not the software used to produce the visual, this might be a place to explain particular features of the visualization, or particular design decisions that went into making it.

Going outside the realm of data visualization, I think a “portfolio” that does this well is the demo reel for the design and development company Support Class. In their demo reel they provide visual examples of their work as well as itemized lists of the particular tools they used and the considerations they had to make when creating their designs.

One portfolio I found that comes closer to my ideal is Visual Cinnamon’s. Their fixed thumbnails more closely mirror what I would like my portfolio to look like, and the mouseover effect that lists the visualization titles and tools used is a nice touch. The minimal screen space dedicated to the fixed header is also a design choice I am a fan of.

One change that I would like to make, and one I will probably adopt in my own portfolio, is limiting the number of visualizations per row to eliminate clutter. In this case, I would like to see the number of thumbnails limited to two per row, with the size of the thumbnails increased to show only one row per screen.

 

Striking Balance Between Vanity and Utility: Why I am Taking Data Viz

One of the basic tenets of writing and publishing social science research is that your language should be as accessible and approachable as possible; for example, your thesis about disparities in educational achievement would probably not carry much weight if it couldn’t be interpreted by a non-academic audience. However, it seems that the same considerations have not been made for data visualizations interpreting social science data. Although social science research is meant to address complex issues with intricate and interesting minutiae, our visualizations are rarely treated with the same level of depth or care. Fields like Social Network Analysis, in which complex connections between actors become collapsed into an elaborate needlepoint of nodes, strike me as especially guilty.

This network visualization I created last semester comes to mind. The data represents copanelist data from the 2017 Animal Rights National Conference, and although the visualization makes sense within the context of an academic paper, it leaves much to be desired as a standalone visualization. For example, how is a viewer supposed to discern which nodes are important? In a network of this size, how can I add legible node labels in a static image? How can I relay what node size represents without a paragraph of context? Does the physical distance between nodes have a quantitative significance in this model, and if so, how can I illustrate that to my audience? These are adjustments and considerations that I will be carrying through my work this semester.

Admittedly, some of these limitations are a consequence of the software in which this visualization was rendered. However, I do not see that as an inevitability. As social scientists, I think we underestimate the visualization tools available in packages like Python, R, or SPSS, or are too scared to take a deep dive into these languages to push their capabilities. This is something I would like to address in order to streamline the workflow from data analysis to data visualization.

Both of these line graphs were rendered using the same Python libraries. How can I better use these tools to create visually appealing and engaging visualizations?

 

 

This is the point at which I see interactive visuals as a necessity. Using the above example, a Flash-based network visualization that allowed the user to adjust labels or node sizes might be able to address some of these confusions; even better, a visualization that creates force-directed models based on different centrality measures in real time. Another interesting visual tool could be overlaying pictures or logos on individual nodes to address the issue of node labeling. These are the kinds of use-case limitations I see as a consumer of data visualizations that I would like to investigate and address in my own work.

My goal for this course is to strike a balance between interesting visual storytelling and interpretive clarity with my visualizations. Particularly in regards to interactive data visualizations, I want to be able to relay complex information and offer tools to help clients or viewers interpret that information using different metrics they may find most suitable. Most of my data visualizations have been drafted with print or academic writing in mind, and as a result lack the panache and intuitiveness that I would like to see in visualizations meant for web publication.  I would like to take a deep dive into packages like Tableau or Keshif to see how my work could benefit from a more interactive approach to data analysis and presentation.

 

In Bad Company: SNA as a tool for analyzing ‘dark’ networks

The term homophily pops up frequently in conversations regarding network analysis. Homophily, most simply, is a pithier way of saying “birds of a feather flock together”. While last week we investigated some of the more positive aspects that can be shared through networks, such as happiness or a sense of belonging, sometimes the company we keep is not as uplifting.

A great example of SNA’s applicability can be seen in the realm of hacking and other cybercrimes. The proliferation of online networks exemplifies the shift from large, hierarchical crime structures to more disconnected pockets of actors. As such, the question that criminologists face is how to target high-profile actors whose removal could disrupt the actions of these decentralized networks. In 2012, researchers from the University of Montreal set out to examine SNA’s ability to do just that. As the authors note, the sheer volume of manpower and resources involved in uncovering cybercrimes makes investigating every individual actor a wholly impractical endeavor. As such, these researchers wanted to use social network analysis not only to quantify the relationships between cybercriminals, but also to see if the methodology could identify any persons of interest that investigators may have missed.

Using stored chat logs obtained from the hard drives of convicted hackers, one-on-one conversations were used to construct connections between 771 hackers. These connections were used to determine who among these actors would be considered persons of interest (in this case, defined as persons who were contacted by two or more of the convicted hackers from whom the data was obtained). The final network of hackers and POIs comprised 38 actors out of the original sample. For these actors, degree and betweenness centrality measures were calculated to determine who among them was being consulted for their expertise, and who was providing avenues of communication between actors.

In the above figure, red nodes represent convicted hackers, whereas blue nodes represent POIs (Décary-Hétu, D. & Dupont, B. 2012).

The above figure represents their final network. Their findings illustrate that this intimate network of hackers and POIs are all likely to be in contact with one another; however, this finding also implies that the 28 POIs who scored highly on these centrality measures could be just as influential as those who were convicted. This means that although the 10 arrests made were accurately targeted towards prominent actors, and their removal effectively disrupted communication within the network, targeting some of these POIs may have done so even more effectively. As the authors put it, “the nail that sticks out gets hammered down.” What is notable, however, is that SNA measures were able to accurately identify those convicted as high-profile actors with relative ease, as well as flag a myriad of other actors for further investigation. The use of social network analysis could help lighten the monetary and cognitive load that more mainstream methods of online investigation incur.


Just as social network analysis can help characterize the nature of these crime networks, it can also help elucidate the widespread nature of seemingly individualistic crimes. For example, to say that corruption is prominent within the United States Congress is not a contentious point, but how do we determine the magnitude of this corruption? Do congressmen commit acts of corruption in solitude, or is there a more overarching culture of corruption in Congress?

The 109th House was particularly damning, with three congressmen being sentenced for taking bribes from PACs. Though taking money from PACs is obviously not illegal in and of itself, the Federal Bribery Statute clarifies that funding that influences lawmaker voting is seen as a criminal act. Following these indictments, several theoretical explanations arose: some argued that this was a case of a few bad apples who happened to be on the wrong side of the law; others argued that this was a partisan issue, considering all three congressmen were Republicans; the majority of the American public, however, theorized that Congress was a wholly corrupt institution.

To test each of these theories, authors Clayton Peoples and James Sutton set out to illustrate just how much influence shared PAC contributions had on voting behavior across all members of Congress. In their own words, “is there a statistically significant general effect of shared PAC contributors on vote similarity among pairs of lawmakers in the 109th U.S. House, controlling for other factors?” By investigating relationships rather than individual behavior, their analysis would simultaneously address all three of the proposed “theories” about the prevalence and nature of corruption in Congress. In order to quantify these relationships, the researchers turned to SNA.

The authors employed social network analysis in its strictest form. Dyadic ties between congressmen were drawn based on their shared attributes, so that relationships between congressmen, rather than individuals, served as the units of analysis. Relationships such as party affiliation, shared committee membership, and, most importantly, shared PAC contributors were calculated using matrices of dyadic ties. Then, using regression analysis, they controlled for most of these relationships, isolating shared PAC contributions as their independent variable.

Their findings were frightening, to say the least. They illustrated that, across the board, shared PAC contributions played a statistically significant role in the voting habits of lawmakers. An important note is that PAC contributions played a far more significant role than shared committee membership or shared state delegations; the authors use this to distinguish voting that could constitute a “bribe” from votes that would have occurred with the interests of constituents in mind. This finding was consistent across party lines, dispelling the notion that corruption was a purely Republican issue. Through the use of social network analysis, the researchers were able to illustrate that bribery, though typically defined as an individual act, could be part of a larger network of collusion and institutional corruption that should be investigated as such. Given that these actions seemed to be normalized practice in the 109th Congress, we can safely say that this was not a case of a few bad apples; as Décary-Hétu and Dupont put it, it was just the nail that stuck out that got hammered down.

 

 