Putting it all together with D3 and TopoJSON

It seems I’m at a point where I need to define what my goal is/has been for this course, since I don’t think I have explicitly stated it quite yet.

Looking at my projects thus far, it seems like a crass menagerie of vaguely data visualization-esque achievements. I have about half of a stacked bar chart, a bunch of circles floating in space (albeit a confined space now, thank you very much), and a pseudo-timeline thing that looks like a flaccid scatterplot. Granted these are all works in progress, but they are still incredibly rudimentary, without question.

I would probably find this more disappointing if my goal were to make a timeline, and a bar chart, and a network visualization. But my goal-with-a-capital-“G”, my macro goal, was to become more familiar with a data visualization utility, learn its basics, and use it to apply some design theory to make presentation-worthy documents of my work and other researchers’ work. In that sense, it may also seem like I’m falling short. So this week I decided to make one project, beginning to end, that encompassed all of the design elements that I wanted to achieve in those other projects based on what I’ve learned thus far.

D3 is impressive for a myriad of reasons, but one of the most impressive (in my opinion) is its ability to render complex maps completely client-side. In lieu of hardcoding images or Flash elements, D3, in tandem with another Bostock library, TopoJSON, renders SVG paths using coordinates passed from JSON files. This allows for some interesting interactive and real-time mapping, especially when paired with an API or other real-time data source. It also allows for more robust styling options that can make the presentation of our data more user-friendly.

To test the waters with TopoJSON, I used data from the 2016 American Community Survey investigating gross rent as a percentage of household income. The ACS is a supplementary battery of questions released by the Census Bureau that charts variables about changes within communities. The data itself is incredibly easy to come by through the Census Bureau website and usually comes pre-cleaned and with some user-friendly variables for use cases such as mine. The data can also be exported by census tract, county, state, or as national data depending on your desired level of analysis. For this visualization, I wanted to use counties to illustrate the disparate cost of living not only between states but between urban centers and surrounding areas.

One of the first design choices I had to make was choosing what kind of color scale to use. Most examples of d3 choropleths, including creator Mike Bostock’s own work, use threshold scales to set breaks in the data set and then map those values to an array of color keys. One problem I found when trying to implement this across several data sets, however, is that threshold scaling is highly influenced by outliers; in many cases where a data set had an extreme high or low value, the color scale became basically indistinguishable for most areas except those extreme cases. To avoid this, I used a sequential scale to represent this data. Along with D3’s color interpolation functions, this meant I could input two hex codes to use as the “min” and “max” of the range, and D3 would return an RGB value for any point in between.

var colors = d3.scaleSequential(d3.interpolate("#F2CC8F","#E07A5F"));
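To make the idea concrete, here is a minimal plain-JavaScript sketch of what a sequential scale with a two-color interpolator does under the hood. It does not use D3 at all; `hexToRgb`, `interpolateHex`, and `sequentialScale` are hypothetical helper names, and the 10 to 60 percent domain is only an assumed range for illustration.

```javascript
// Convert "#RRGGBB" into [r, g, b] integers.
function hexToRgb(hex) {
  var n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// Return a function t -> "rgb(r, g, b)" that blends linearly
// from color a (t = 0) to color b (t = 1).
function interpolateHex(a, b) {
  var from = hexToRgb(a), to = hexToRgb(b);
  return function (t) {
    var channels = from.map(function (c, i) {
      return Math.round(c + (to[i] - c) * t);
    });
    return "rgb(" + channels.join(", ") + ")";
  };
}

// Map a value in [min, max] onto the interpolator.
// Values outside the domain are clamped to the end colors (a choice made for this sketch).
function sequentialScale(min, max, interpolator) {
  return function (value) {
    var t = (value - min) / (max - min);
    return interpolator(Math.max(0, Math.min(1, t)));
  };
}

var colors = sequentialScale(10, 60, interpolateHex("#F2CC8F", "#E07A5F"));
console.log(colors(10)); // rgb(242, 204, 143)
console.log(colors(35)); // rgb(233, 163, 119)
console.log(colors(60)); // rgb(224, 122, 95)
```

In D3 itself, the same effect should come from setting a `.domain([min, max])` on the sequential scale, since without one the scale expects inputs between 0 and 1.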

Next, I had to find a JSON file with county boundary coordinates to draw the map itself. Fortunately, TopoJSON has a repository of JSON files that made this task incredibly easy. One difficulty I faced when iterating over this data, however, was the sheer size of the files themselves. The continental US alone has over 3,000 counties, so computing all of that information can be time- and resource-intensive in a client-side language like javascript. Compounding this, I was joining data from this large JSON file to my original dataset, which was comparable in size. To account for this, I made use of d3’s .queue function to preload my datasets before implementing any content generation. Since these file requests are asynchronous, this also avoided any difficulties that may arise from trying to join two sets of rows that may not have finished loading at the same time. The .await function then calls our function for actually drawing the map itself once both data sets have loaded completely.

d3.queue()
  .defer(d3.json, "https://d3js.org/us-10m.v1.json")
  .defer(d3.csv, "https://docs.google.com/spreadsheets/d/e/2PACX-1vQK1F0gF62y_UNIsAhThc54HPWZHm-c-gZ1V5HTg5DYDHQ2eIbC3VKoaIJTqWniZnyD_UvfqpNxdBh6/pub?output=csv")
  .await(mapLoad);
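The defer/await pattern itself can be sketched without D3. The block below is a stripped-down stand-in, not D3's actual code: `miniQueue` and `fakeLoad` are hypothetical helpers, and the file names are placeholders. It shows the key guarantee, which is that the final callback only runs once every deferred task has reported back, with results in the order they were deferred.

```javascript
// A stripped-down stand-in for the defer/await pattern (hypothetical helper).
function miniQueue() {
  var tasks = [];
  var queue = {
    // Register a task of the form fn(arg1, ..., callback), where callback(error, result).
    defer: function (fn) {
      tasks.push({ fn: fn, args: Array.prototype.slice.call(arguments, 1) });
      return queue;
    },
    // Start every task; call done(error, result1, result2, ...) once all have finished.
    await: function (done) {
      var results = new Array(tasks.length);
      var remaining = tasks.length;
      var failed = false;
      if (remaining === 0) { done(null); return; }
      tasks.forEach(function (task, i) {
        var callback = function (error, result) {
          if (failed) return;
          if (error) { failed = true; done(error); return; }
          results[i] = result; // keep results in defer order
          remaining -= 1;
          if (remaining === 0) done.apply(null, [null].concat(results));
        };
        task.fn.apply(null, task.args.concat([callback]));
      });
    }
  };
  return queue;
}

// Fake loaders that hold on to their callbacks so they can finish in any order.
var pending = {};
function fakeLoad(name, callback) { pending[name] = callback; }

var drawn = null;
miniQueue()
  .defer(fakeLoad, "counties.json")
  .defer(fakeLoad, "rent.csv")
  .await(function (error, us, rent) { drawn = [us, rent]; });

pending["rent.csv"](null, "rent rows");          // the CSV happens to arrive first
console.log(drawn);                              // null: the map has to wait
pending["counties.json"](null, "county shapes");
console.log(drawn);                              // both results, in defer order
```

The real d3.queue does more than this (it can limit concurrency and abort tasks, for example), so treat this only as an illustration of why the drawing function never sees half-loaded data.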

 

To join the two data sets, I created two empty arrays for county names and the actual rent percentage variable from the ACS data set. Using a .forEach function, I then populated these arrays using the county FIPS number as the entry key to make matching this data to its TopoJSON counterpart simpler. This is obviously not the most elegant way of accessing this information, nor is it the easiest in terms of the processing required, but it works.

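// Lookup tables keyed by county FIPS code
var data = {}, names = {};

function mapLoad(error, us, rent) {
  if (error) throw error;

  // Write FIPS data and percentages into the lookup tables
  // Makes accessing percentages easier after feeding in the JSON coordinate data
  rent.forEach(function(d) {
    data[d.FIPS] = +d.percentage;
    names[d.FIPS] = d.county;
  });
  // ...the map-drawing code continues here (omitted in this excerpt)
}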

The rest of the code works similarly to the D3 prototype I’ve illustrated in past weeks; elements are called by a function, data is attached to those objects, and then new objects are created for every iteration of that data set. Some county values are still displaying as undefined, either as a result of missing values in the ACS data or due to FIPS values that do not match up between the two data sets, so I’ll have to take a deeper dive into the JSON data to see how that can be rectified.
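One common cause of mismatched FIPS values is a lost leading zero (a spreadsheet turning "01001" into 1001), although I have not confirmed that this is what is happening here. The sketch below is plain JavaScript under that assumption: `normalizeFips`, `buildLookup`, and `findUnmatched` are hypothetical helpers, and the rows are made-up placeholders shaped like the parsed CSV.

```javascript
// Pad a FIPS code to five characters so that 1001 and "01001" compare equal.
function normalizeFips(value) {
  var s = String(value).trim();
  while (s.length < 5) s = "0" + s;
  return s;
}

// Build a lookup from normalized FIPS code to rent percentage.
function buildLookup(rows) {
  var lookup = {};
  rows.forEach(function (d) {
    lookup[normalizeFips(d.FIPS)] = +d.percentage;
  });
  return lookup;
}

// List the county ids that have no matching row (these would draw as undefined).
function findUnmatched(countyIds, lookup) {
  return countyIds.filter(function (id) {
    return !(normalizeFips(id) in lookup);
  });
}

// Made-up rows for illustration only; parsed CSV values arrive as strings.
var rows = [
  { FIPS: "1001", county: "Example County A", percentage: "28.5" },
  { FIPS: "36061", county: "Example County B", percentage: "31.2" }
];
var lookup = buildLookup(rows);
console.log(lookup["01001"]);                                    // 28.5
console.log(findUnmatched(["01001", "36061", "46102"], lookup)); // [ '46102' ]
```

Running something like `findUnmatched` over the county ids from the JSON file would at least show whether the undefined counties are missing from the ACS data or just keyed differently.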

One feature that I have not been able to get working with this example is projection (which, strictly speaking, belongs to D3’s geo module rather than to TopoJSON itself); a projection controls how the map’s coordinates are fitted onto the SVG, so that the size and shape of the map can be adjusted on the fly. This can be used to resize or rotate the map to look at areas of interest, which could be an interesting implementation for future iterations of this project. I am also not completely satisfied with the sequential scale and may replace it with a more “partitioned” scale in the future to create better visual distinctions between counties. For now, though, I’m just happy that I managed to create my most complete visualization to date!
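Short of a true projection, one way to zoom in on an area of interest is to scale and translate the already-drawn paths so that a bounding box fills the canvas. The sketch below is plain JavaScript and not the projection API; `fitBounds` and `toTransform` are hypothetical helpers, the bounding box is made up, and the 960 by 600 canvas is an assumption. In D3, a bounding box in this [[x0, y0], [x1, y1]] shape can be obtained from a path generator's `bounds` method.

```javascript
// Compute the scale and translate that fit a bounding box [[x0, y0], [x1, y1]]
// inside a width x height canvas, centered. margin < 1 leaves some padding.
function fitBounds(bounds, width, height, margin) {
  var dx = bounds[1][0] - bounds[0][0];
  var dy = bounds[1][1] - bounds[0][1];
  var cx = (bounds[0][0] + bounds[1][0]) / 2;
  var cy = (bounds[0][1] + bounds[1][1]) / 2;
  var scale = (margin || 1) / Math.max(dx / width, dy / height);
  return {
    scale: scale,
    translate: [width / 2 - scale * cx, height / 2 - scale * cy]
  };
}

// Turn the result into an SVG transform string for the group holding the map paths.
function toTransform(fit) {
  return "translate(" + fit.translate.join(",") + ") scale(" + fit.scale + ")";
}

// Made-up bounding box for some area of interest on a 960 x 600 canvas.
var fit = fitBounds([[100, 100], [340, 200]], 960, 600);
console.log(fit.scale);        // 4
console.log(toTransform(fit)); // translate(-400,-300) scale(4)
```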

 

Building on the Force Directed Network Graph

Getting the force directed graph functional in d3 was a good start, but it clearly needs some tuning up.

 

For starters, the uniform fill color needs to be changed. Network graphs are interesting visualizations but mean very little without some differentiation between the nodes. Mapping the degree value to the node size expresses one dimension of this information, but there is a lot more information to be expressed, and color is one of the simplest and most noticeable ways to do so.

Color scales are simple to create in d3; it’s simply a matter of taking one of d3’s built in scale functions and mapping colors to the scale’s range, like so:

var colorScale = d3.scaleOrdinal()
  .domain(data.map(function(d) { return d.ourXvariable; }))
  .range(["#FFF", "#666", "#000"]);

Colors are expressed as an array of “bins” to be matched to the values in the domain. The colors can then be called when the objects are drawn.
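As a rough illustration of the matching described above, here is a plain-JavaScript stand-in for an ordinal color scale. It is a sketch, not D3's implementation: `uniqueValues` and `ordinalScale` are hypothetical helpers, and the organization types are made-up categories. One difference worth flagging is that this version returns undefined for a value it has never seen, whereas D3's ordinal scale handles unknown values its own way.

```javascript
// Collect the distinct values of one column, in order of first appearance.
function uniqueValues(rows, key) {
  var seen = [];
  rows.forEach(function (row) {
    if (seen.indexOf(row[key]) === -1) seen.push(row[key]);
  });
  return seen;
}

// Match the i-th category in the domain to the i-th color in the range.
function ordinalScale(domain, range) {
  return function (value) {
    var i = domain.indexOf(value);
    if (i === -1) return undefined;  // unknown category: no color assigned
    return range[i % range.length];  // reuse colors if categories outnumber them
  };
}

// Made-up nodes for illustration only.
var nodes = [
  { name: "Org A", type: "advocacy" },
  { name: "Org B", type: "sanctuary" },
  { name: "Org C", type: "advocacy" },
  { name: "Org D", type: "legal" }
];
var colorScale = ordinalScale(uniqueValues(nodes, "type"), ["#FFF", "#666", "#000"]);
console.log(colorScale("advocacy"));  // #FFF
console.log(colorScale("legal"));     // #000
```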

In the case of this network graph, I wanted to use a color scale to represent the annual budget of each node’s parent organization. The animal rights movement is championed by several organizations with budgets that far exceed those of other organizations, and this information might be interesting to compare to their presence at the animal rights national conference (Do larger organizations have some clout that bolsters their presence at these events? Does the sheer breadth of smaller organizations obfuscate their participation?).

In this case, I decided to use a quantized scale to partition the Organization Budget values into 9 bins. A continuous scale may also be fitting, since Budget is a continuous variable; however, if the intention is to visually describe similarity or difference between nodes using color, many different shades of many different colors may obfuscate larger trends in the data.
Quantize scales sit somewhere between ordinal and linear scales in the d3 library. Whereas ordinal scales create “bins” based on (typically) nominal or categorical variables, quantize scales take a continuous variable and partition it into equal, discrete “bins” bound to a given domain; in this case

d3.min(nodes, function(d) { return +d.orgbudget; })

to

d3.max(nodes, function(d) { return +d.orgbudget; })
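The binning itself is simple arithmetic, sketched here in plain JavaScript rather than with D3's quantize scale. `extent` and `quantizeScale` are hypothetical helpers, the budgets are made-up numbers, and the range is just the bin indices 0 through 8 (a real chart would put nine colors there instead).

```javascript
// Find the [min, max] of a numeric column (values are coerced from strings).
function extent(rows, key) {
  var values = rows.map(function (row) { return +row[key]; });
  return [Math.min.apply(null, values), Math.max.apply(null, values)];
}

// Split [min, max] into range.length equal-width bins and return the
// range entry for the bin that a value falls into.
function quantizeScale(min, max, range) {
  var n = range.length;
  return function (value) {
    if (max === min) return range[0]; // degenerate domain: everything in one bin
    var i = Math.floor((value - min) * n / (max - min));
    return range[Math.max(0, Math.min(n - 1, i))]; // clamp so max lands in the last bin
  };
}

// Made-up budgets for illustration only.
var nodes = [
  { name: "Org A", orgbudget: "50000" },
  { name: "Org B", orgbudget: "200000" },
  { name: "Org C", orgbudget: "950000" }
];
var domain = extent(nodes, "orgbudget");
var budgetBin = quantizeScale(domain[0], domain[1], [0, 1, 2, 3, 4, 5, 6, 7, 8]);
console.log(budgetBin(50000));   // 0
console.log(budgetBin(200000));  // 1
console.log(budgetBin(950000));  // 8
```

Note how one very large budget stretches the bins so that most smaller organizations land in the first bin or two, which is the same outlier problem that came up with the map.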

The tooltips were then appended to include the actual Organization Budget of each node, pulled from the row of the nodes dataset while the circles are being drawn.

Another prominent issue with the visualization is that the node x and y positioning is not currently bound to the width and height of the svg canvas. As a result, some nodes fly out of the visual bounds and are not visible to the user. D3 creator Michael Bostock has a proposed solution here; however, implementing it in my own code has proven problematic. The gist is that, when the x and y coordinates are pulled, the library is told to keep the nodes within the range [radius, canvaswidth - radius] for the x coordinate and the range [radius, canvasheight - radius] for the y coordinate.

The reason the code seems to be breaking is that the variable radius, as it appears in Bostock’s code, is a set integer, whereas in my code, radius is a function of the node’s degree centrality value. In trying to call the variable radius, my code is trying to look for a value that isn’t present in the current data which is being built by the force function. As such, I set an arbitrary value of 20 in place of Bostock’s use of radius.
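One possible way around the fixed value of 20 would be to compute the radius from each node as it is being positioned, instead of looking for a single shared radius variable. The sketch below is plain JavaScript and has not been tried against my actual graph: `nodeRadius` and `clampToCanvas` are hypothetical helpers, the radius formula is an assumption, and it presumes each node carries a `degree` value.

```javascript
// Radius as a function of degree (this exact formula is an assumption for illustration).
function nodeRadius(d) {
  return 4 + 2 * Math.sqrt(d.degree);
}

// Keep a node's center inside [r, width - r] and [r, height - r],
// using that node's own radius rather than one fixed number.
function clampToCanvas(d, width, height) {
  var r = nodeRadius(d);
  d.x = Math.max(r, Math.min(width - r, d.x));
  d.y = Math.max(r, Math.min(height - r, d.y));
  return d;
}

// Made-up node that has drifted off a 960 x 600 canvas.
var stray = clampToCanvas({ degree: 16, x: -50, y: 700 }, 960, 600);
console.log(stray.x, stray.y); // 12 588
```

The idea would be to call something like `clampToCanvas` on every node inside the simulation's tick handler, before the circle positions are written to the SVG.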

The result is a graph that still fits the bounds of the box and has a bit more of a visual distinction between the types of nodes. Pretty cool! A key still needs to be implemented to explain what the colors actually mean, of course. Another concern is that the nodes still gravitate towards one another quite arbitrarily, rather than the nice subgroups that are emblematic of network graphs. This week I plan to take a deeper dive into the d3.force function to see how these forces can be used to replicate such results.

 

Force Directed Network Graphs in D3

Network analysis is a unique theoretical and methodological approach to sociology that I happened to stumble into last semester. At its most basic, network analysis can help us understand networks not just in terms of who is the most connected, but who helps connect disparate parts of a network, who is closest to the most important actors in a network, and who gets shafted in the network. It ticks a lot of boxes for me; it’s quantitative, it generates a lot of hypotheses for future research projects, and most importantly, the visualizations are just really pretty.

I started toying with D3’s .force functionality shortly after the start of the course, with mixed results; chiefly because I hadn’t mastered the art of drawing SVG objects, let alone applying forces telling them where to situate themselves.

WordPress isn’t always the friendliest when it comes to embedding iframes, but a sample of what I’ve got running can be found on JSFiddle. The code itself is fairly standard; nodes are pulled from a csv list and drawn as circles with varying radii based on their degree values. Links are pulled from another csv list and drawn as lines between the circles. This is a fair start, and this visualization has some basic functions like mouseover tooltips to see who each circle represents. It can be taken a lot further, though.
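If the degree values were not already a column in the node list, they could be counted from the link list. This is a minimal plain-JavaScript sketch; `computeDegrees` and `radiusFor` are hypothetical helpers, the `id`, `source`, and `target` column names are assumptions, and the nodes and links are made up.

```javascript
// Count how many links touch each node.
function computeDegrees(nodes, links) {
  var degree = {};
  nodes.forEach(function (n) { degree[n.id] = 0; });
  links.forEach(function (l) {
    degree[l.source] = (degree[l.source] || 0) + 1;
    degree[l.target] = (degree[l.target] || 0) + 1;
  });
  return degree;
}

// Square-root sizing so that circle area, not radius, grows with degree.
function radiusFor(degree) {
  return 3 + 2 * Math.sqrt(degree);
}

// Made-up rows shaped like the two parsed csv lists.
var nodes = [{ id: "A" }, { id: "B" }, { id: "C" }, { id: "D" }];
var links = [
  { source: "A", target: "B" },
  { source: "A", target: "C" },
  { source: "A", target: "D" },
  { source: "B", target: "C" }
];
var degree = computeDegrees(nodes, links);
console.log(degree.A, degree.B, degree.D); // 3 2 1
console.log(radiusFor(4));                 // 7
```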

For starters, the forces drawn here are completely arbitrary. I basically started with a force that would draw everything to the middle of the svg “canvas”, then created two forces to represent repulsion and attraction and stopped once I found a combination of variables that didn’t look atrocious. It would be cool to crack open the d3-force functions a little further and try to program some recognized graphing algorithms such as Yifan Hu’s popular algorithm.

Secondly, nodes are entirely indistinguishable from one another. One idea I had been playing around with was adding image URLs for each node’s organization logo to the original csv, and then assigning those URLs to the SVG elements when drawn. This presents some issues considering the size discrepancy between nodes. This method could still be implemented by adding the organization’s logo to the tooltips and creating an ordinal color scale to distinguish different types of organizations.

Thirdly, the tooltips are about as meat and potatoes as you can get: a rectangle with some text. The logos would be a nice touch, but it would be interesting to see how d3’s onmouseover and transition functions can be used to create something snappier or more presentable; perhaps a rectangle that draws itself to size out of the cursor’s tip, or a box that wipes horizontally on mouseover to reveal the tooltip.

 

Back to Basics: Starting a stacked bar chart in D3

This week the theme is simplify, simplify, simplify. One of my past professors used to tell me I had a propensity to get “lost in the weeds” with my work; I get so fixated on the minutiae of my projects that I forget to think about the big picture. This is a habit I seem to have carried over into my data visualizations; I see some cool niche functionality and get so fixated that it eats up all of my time. To combat this, I wanted to take it back to basics this week with a stacked bar chart.

The data I’m going to be using comes from a study of current and former vegans and vegetarians conducted by Faunalytics. Specifically, two variables stand out that I would like to visualize; the first is the respondent’s status as either a current or former vegan/vegetarian, and the second is a battery of reasons for deciding to abstain from meat consumption (health, animal rights, environmentalism, advice from peers, etc.). The original researchers seemed primarily interested in vegan recidivism and why most vegans relapse, but I think these data points present interesting findings with regard to how vegan/vegetarian groups could structure their outreach to get the most people to try vegan/vegetarian dieting.

To visualize responses and how they vary between the main respondent groups, I am going to create a stacked bar chart with each category represented by a different color. D3’s author, Michael Bostock, has a handy sample of such a visualization here. I have three short-term goals I want to address this week:

  • Drawing the chart using data parsed from the original csv dataset
  • Tooltips on hover to show specific response rates for each group on each variable
  • A legend generated using SVG elements, similar to Bostock’s example.
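The core of a stacked bar chart is working out where each colored segment starts and ends within its bar. Here is a minimal plain-JavaScript sketch of that step; `stackRows` is a hypothetical helper, the column names are assumptions, and the numbers are placeholders rather than the actual Faunalytics figures.

```javascript
// For each group (row), stack the category values and record
// where each segment starts (y0) and ends (y1).
function stackRows(rows, keys) {
  return rows.map(function (row) {
    var running = 0;
    var segments = keys.map(function (key) {
      var y0 = running;
      running += +row[key];
      return { key: key, y0: y0, y1: running };
    });
    return { group: row.group, total: running, segments: segments };
  });
}

// Placeholder rows shaped like a parsed csv (values arrive as strings).
var rows = [
  { group: "current", health: "40", animals: "35", environment: "25" },
  { group: "former", health: "55", animals: "20", environment: "10" }
];
var stacked = stackRows(rows, ["health", "animals", "environment"]);
console.log(stacked[0].segments[1]); // { key: 'animals', y0: 40, y1: 75 }
console.log(stacked[1].total);       // 85
```

Each segment then becomes one rectangle: its height comes from y1 minus y0, its offset from y0, and its color from the key. The same y0/y1 pairs are what a hover tooltip would read to report the response rate for a segment.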

D3 has some interesting animation and onclick/onhover functions that would be cool to apply later down the line, such as those seen in this example. There’s also some functionality to add gradient effects and append images to SVG elements; for now, however, I’m going to try to leave that Pandora’s box closed.

 

Jumping into the deep end with Javascript

I should preface this post by saying that my grasp of coding languages is tenuous at best. Most of my background in HTML and CSS stems from coding Myspace profile themes for my friends in the mid-2000s, and I went to a high school that thought Visual Basic was still a relevant coding language in 2008. In college I took a course on backend web development, but even that was mostly creating sign-up sheets without any styling and praying that the data read to a server database.

That being said, my long-term goals for this course may seem a bit lofty. As I read more about interactive visualizations or data visualizations for web publication, it seems like javascript is going to be a necessary component in making my information easy to update and visually appealing for users. I spent the first half of the week reading some basic javascript documentation and getting my IDE and local server running to troubleshoot some code, and then the second half of the week with my head in my hands wondering what I was doing wrong (I hear from my brother, a web developer, that this is pretty much par for the course).

For our first timeline assignment, I tried to make a pseudo-Gantt chart style timeline of major events in the Animal Rights Movement. My intention was to split major events into three categories; advances for animals used in experimentation, advances in factory farming, and major milestones for the movement itself.

One quality that I wanted to retain with this visualization is that the data could be easily updated without having to recreate the whole visualization. To achieve this, I used the D3 javascript library to parse information from a csv file to use as datapoints.  D3 is a widely used and extensive visualization library for javascript that can create some truly stunning visuals. Basically, I tried to write a function that would iterate over the rows of the csv and return the data as arrays to be used by the chart.
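The row-to-array step described above can be sketched in plain JavaScript. This is a naive illustration rather than what D3's csv loader does: `parseSimpleCsv` and `column` are hypothetical helpers, the parser assumes no quoted fields or commas inside values, and the column names and events are placeholders.

```javascript
// Naive CSV parser: first line is the header, no quoted fields supported.
function parseSimpleCsv(text) {
  var lines = text.trim().split(/\r?\n/);
  var header = lines[0].split(",");
  return lines.slice(1).map(function (line) {
    var cells = line.split(",");
    var row = {};
    header.forEach(function (name, i) {
      row[name.trim()] = (cells[i] || "").trim();
    });
    return row;
  });
}

// Pull one column out of the rows as an array, optionally converting each value.
function column(rows, name, convert) {
  return rows.map(function (row) {
    return convert ? convert(row[name]) : row[name];
  });
}

// Placeholder file contents for illustration only.
var text = "event,start,end\nExample event A,1975,1980\nExample event B,1990,1990";
var rows = parseSimpleCsv(text);
console.log(column(rows, "event"));         // [ 'Example event A', 'Example event B' ]
console.log(column(rows, "start", Number)); // [ 1975, 1990 ]
```

Because the chart only ever reads from these arrays, adding a row to the csv is enough to add an event to the timeline.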

Using start years and end years, I wanted to make something in the vein of this visualization:

Phillipson, G. 2013. Global Volcanic Unrest in the 21st Century: An analysis of the first decade. Journal of Volcanology.

Through this process I made two major discoveries: first, javascript does not like it when you try to parse a single year as a date; second, javascript really doesn’t like it when you try to create a timeline with intermittent or overlapping start and end dates.
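Both problems have fairly small workarounds, sketched here in plain JavaScript. One common cause of the first is that a bare number passed to `Date` is read as milliseconds rather than as a year, though I cannot say for certain that this is what tripped up my own code. `yearToDate` and `assignLanes` are hypothetical helpers, and the events are placeholders.

```javascript
// new Date(1975) means "1975 milliseconds after 1 Jan 1970", not the year 1975.
// Building the date from parts avoids that (note: years 0 to 99 are read as 1900 to 1999).
function yearToDate(year) {
  return new Date(Number(year), 0, 1); // local midnight on 1 January of that year
}

// Greedy lane assignment: sort by start year and put each event in the first
// lane whose previous event has already ended, so overlapping bars never collide.
function assignLanes(events) {
  var laneEnds = []; // end year of the last event placed in each lane
  return events
    .slice()
    .sort(function (a, b) { return a.start - b.start; })
    .map(function (e) {
      var lane = 0;
      while (lane < laneEnds.length && laneEnds[lane] > e.start) lane += 1;
      laneEnds[lane] = e.end;
      return { name: e.name, start: e.start, end: e.end, lane: lane };
    });
}

console.log(new Date(1975).getUTCFullYear());   // 1970
console.log(yearToDate("1975").getFullYear());  // 1975

// Placeholder events for illustration only.
var lanes = assignLanes([
  { name: "Example A", start: 1970, end: 1980 },
  { name: "Example B", start: 1975, end: 1985 },
  { name: "Example C", start: 1980, end: 1990 }
]);
console.log(lanes.map(function (e) { return e.lane; })); // [ 0, 1, 0 ]
```

Each lane then becomes one row of the Gantt-style chart, so events that overlap in time are simply drawn on different rows.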

I am still toying with the code and I am hoping to have it operational soon; however, I think this is a lesson in taking it slow. I will probably go back to the drawing board and declare all my information as constants and see how that goes before I try messing with data parsing any further, and then slowly pluck away at these more long-term design goals throughout the semester. Similarly, there are some other beautiful visualization packages like Highcharts that I would like to play around with before deciding to rally behind D3 completely.

 

Striking Balance Between Vanity and Utility: Why I am Taking Data Viz

One of the basic tenets of writing and publishing social science research is that your language should be as accessible and approachable as possible; for example, your thesis about disparities in educational achievement would probably not carry much weight if it couldn’t be interpreted by a non-academic audience. However, it seems that the same considerations have not been made for data visualizations interpreting social science data. Although social science research is meant to address complex issues with intricate and interesting minutiae, our visualizations are rarely treated with the same level of depth or care. Fields like Social Network Analysis, in which complex connections between actors become collapsed into an elaborate needlepoint of nodes, strike me as especially guilty.

This network visualization I created that semester comes to mind. This data represents copanelist data from the 2017 Animal Rights National Conference, and although the visualization makes sense within the context of an academic paper, it leaves much to be desired as a standalone visualization. For example, how is a viewer supposed to discern which nodes are important? In a network of this size, how can I add legible node labels in a static image? How can I relay what node size represents without a paragraph of context? Does the physical distance between nodes have a quantitative significance in this model, and if so how can I illustrate that to my audience? These are adjustments and considerations that I will be carrying through my work this semester.

Admittedly, some of these limitations are a consequence of the software in which this visualization was rendered. However, I do not see that as an inevitability. As social scientists, I think we underestimate the visualization tools available in packages like Python, R, or SPSS, or are too scared to take a deep dive into these languages to press their capabilities. This is something I would like to address in order to streamline the workflow from data analysis to data visualization.

Both of these line graphs were rendered using the same Python libraries. How can I better use these tools to create visually appealing and engaging visualizations?

 

 

This is the point at which I see interactive visuals as a necessity. Using the above example, a Flash-based network visualization that allowed the user to adjust labels or node sizes may be able to address some of these confusions; even better, a visualization that creates force-directed models based on different centrality measures in real time. Another interesting visual tool could be overlaying pictures or logos over individual nodes to address the issue of node labeling. These are the kind of use case limitations I see as a consumer of data visualizations that I would like to investigate and address in my own work.

My goal for this course is to strike a balance between interesting visual storytelling and interpretive clarity with my visualizations. Particularly with regard to interactive data visualizations, I want to be able to relay complex information and offer tools to help clients or viewers interpret that information using whichever metrics they find most suitable. Most of my data visualizations have been drafted with print or academic writing in mind, and as a result lack the panache and intuitiveness that I would like to see in visualizations meant for web publication. I would like to take a deep dive into packages like Tableau or Keshif to see how my work could benefit from a more interactive approach to data analysis and presentation.

 