It seems I’m at a point where I need to define what my goal is/has been for this course, since I don’t think I have explicitly stated it quite yet.
Looking at my projects thus far, it seems like a crass menagerie of vaguely data visualization-esque achievements. I have about half of a stacked bar chart, a bunch of circles floating in space (albeit a confined space now, thank you very much), and a pseudo-timeline thing that looks like a flaccid scatterplot. Granted, these are all works in progress, but they are still incredibly rudimentary, without question.
I would probably find this more disappointing if my goal were to make a timeline, and a bar chart, and a network visualization. But my goal-with-a-capital-“G”, my macro goal, was to become more familiar with a data visualization utility, learn its basics, and use it to apply some design theory to make presentation-worthy documents of my work and other researchers’ work. In that sense, it may also seem like I’m falling short. So this week I decided to make one project, beginning to end, that encompassed all of the design elements that I wanted to achieve in those other projects based on what I’ve learned thus far.
D3 is impressive for a myriad of reasons, but one of the most impressive (in my opinion) is its ability to render complex maps completely client-side. In lieu of hardcoding images or Flash elements, D3, together with another Bostock library, TopoJSON, renders SVG paths using coordinates passed from JSON files. This allows for some interesting interactive and real-time mapping, especially in tandem with an API or other real-time data source. It also allows for more robust styling options that can add more user-friendly ways to present our data.
To test the waters with TopoJSON, I used data from the 2016 American Community Survey investigating gross rent as a percentage of household income. The ACS is a supplementary battery of questions released by the Census Bureau that charts variables about changes within communities. The data itself is incredibly easy to come by through the Census Bureau website and usually comes pre-cleaned and with some user-friendly variables for use cases such as mine. The data can also be exported by census tract, county, state, or as national data depending on your desired level of analysis. For this visualization, I wanted to use counties to illustrate the disparate cost of living not only between states but between urban centers and surrounding areas.
One of the first design choices I had to make was in choosing what kind of color scale to use. Most examples of D3 choropleths, including creator Mike Bostock’s own work, use threshold scales to set breaks in the data set and then map those values to an array of color keys. One problem I found when trying to implement this across several data sets, however, is that threshold scaling is highly influenced by outliers; in many cases where a data set had an extreme high or low value, the color scale became basically indistinguishable for most areas except those extreme cases. To avoid this, I used a sequential scale to represent this data. Along with D3’s color interpolation functions, this meant I could input two hex codes to use as my range’s “min” and “max”, and D3 would return an RGB value for any point in between.
var colors = d3.scaleSequential(d3.interpolate("#F2CC8F","#E07A5F"));
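To make the idea concrete, here is a plain-JavaScript sketch of what a sequential color scale does conceptually: it normalizes a value within a domain and then blends two hex colors channel by channel. This is a stand-in for illustration, not D3's implementation. The helper names `hexToRgb`, `blendHex`, and `sequentialColor` are hypothetical, the clamping is a choice made for this sketch, and the 10 to 50 percent domain is an assumed range rather than the real extent of the ACS data.

```javascript
// Parse "#RRGGBB" into [r, g, b] integers.
function hexToRgb(hex) {
  var n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// Blend two hex colors channel by channel; t = 0 gives `from`, t = 1 gives `to`.
function blendHex(from, to, t) {
  var a = hexToRgb(from);
  var b = hexToRgb(to);
  var mixed = a.map(function (channel, i) {
    return Math.round(channel + (b[i] - channel) * t);
  });
  return "rgb(" + mixed.join(", ") + ")";
}

// Build a scale function: normalize within [min, max], clamp to [0, 1], then blend.
function sequentialColor(min, max, from, to) {
  return function (value) {
    var t = (value - min) / (max - min);
    t = Math.max(0, Math.min(1, t)); // clamping is a choice made for this sketch
    return blendHex(from, to, t);
  };
}

// Assumed domain: 10% to 50% of income spent on rent (illustrative only).
var rentColor = sequentialColor(10, 50, "#F2CC8F", "#E07A5F");
console.log(rentColor(10)); // rgb(242, 204, 143)
console.log(rentColor(30)); // rgb(233, 163, 119)
console.log(rentColor(50)); // rgb(224, 122, 95)
```

With D3 itself, the analogous step is giving the scale a domain, since a sequential scale's default domain is [0, 1] and raw percentages would otherwise fall outside it.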
Next, I had to find a JSON file with county boundary coordinates to draw the map itself. Fortunately, TopoJSON has a repository of JSON files that made this task incredibly easy. One difficulty I faced when iterating over this data, however, was the sheer size of the files themselves. The continental US alone has over 3,000 counties, so computing all of that information can be time- and resource-consuming in a client-side language like JavaScript. Compounding this, I was joining data from this large JSON file to my original dataset, which was comparable in size. To account for this, I made use of d3.queue to load both datasets before generating any content. Because these files are requested asynchronously, this also avoids the difficulties that could arise from trying to join two data sets that have not both finished loading. The .await function then calls our function for actually drawing the map itself once both data sets have loaded completely.
d3.queue()
    .defer(d3.json, "https://d3js.org/us-10m.v1.json")
    .defer(d3.csv, "https://docs.google.com/spreadsheets/d/e/2PACX-1vQK1F0gF62y_UNIsAhThc54HPWZHm-c-gZ1V5HTg5DYDHQ2eIbC3VKoaIJTqWniZnyD_UvfqpNxdBh6/pub?output=csv")
    .await(mapLoad);
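The pattern here (start several loads, then run one callback only once all of them have finished) can be sketched without D3. The `awaitAll` helper and the two fake loaders below are hypothetical stand-ins, not the d3-queue source; they only mirror the `(error, first, second)` callback shape that `mapLoad` expects, and the loader payloads are made up.

```javascript
// Run every task, then call `done(error, result1, result2, ...)` exactly once.
// Each task receives a Node-style callback: callback(error, value).
function awaitAll(tasks, done) {
  var results = new Array(tasks.length);
  var remaining = tasks.length;
  var finished = false;

  if (remaining === 0) {
    done(null);
    return;
  }

  tasks.forEach(function (task, i) {
    task(function (error, value) {
      if (finished) return;            // ignore callbacks after completion or failure
      if (error) {
        finished = true;
        done(error);
        return;
      }
      results[i] = value;              // keep results in task order
      remaining -= 1;
      if (remaining === 0) {
        finished = true;
        done.apply(null, [null].concat(results));
      }
    });
  });
}

// Hypothetical loaders standing in for d3.json and d3.csv (made-up payloads).
function loadShapes(callback) { callback(null, { type: "Topology" }); }
function loadRent(callback) { callback(null, [{ FIPS: "01001", percentage: "28.4" }]); }

var loaded = null;
awaitAll([loadShapes, loadRent], function (error, us, rent) {
  if (error) throw error;
  loaded = { us: us, rent: rent };     // both data sets are available here
});
console.log(loaded.rent.length); // 1
```

Newer D3 releases (version 5 onward) return promises from d3.json and d3.csv, so `Promise.all` plays the same role there.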
To join the two data sets, I created two empty arrays for county names and the actual rent percentage variable from the ACS data set. Using a .forEach function, I then populated these arrays using the county FIPS number as the entry key to make matching this data to its TopoJSON counterpart simpler. This is obviously not the most elegant way of accessing this information, nor is it the easiest in terms of the processing required, but it works.
function mapLoad(error, us, rent) {
    // Write FIPS data and percentages into new arrays.
    // Makes accessing percentages easier after feeding in the JSON coordinate data.
    rent.forEach(function(d){ data[d.FIPS] = +d.percentage; });
    rent.forEach(function(d){ names[d.FIPS] = d.county; });
The rest of the code works similarly to the D3 prototype I’ve illustrated in past weeks; elements are called by a function, data is attached to those objects, and then new objects are created for every iteration of that data set. Some county values are still displaying as undefined, either as a result of missing values in the ACS data or due to FIPS values that do not match up between the two data sets, so I’ll have to take a deeper dive into the JSON data to see how that can be rectified.
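One common cause of this kind of mismatch is a FIPS code that loses its leading zero somewhere along the way (for example "1001" instead of "01001"); another is an empty cell, which the unary `+` quietly turns into 0 rather than a missing value. Whether either applies to this data is an assumption, so treat the following as a diagnostic sketch. `normalizeFips`, `parsePercentage`, `buildLookup`, and `findUnmatched` are hypothetical helpers, and the sample rows and percentages are made up.

```javascript
// Pad a county FIPS code to five characters so "1001" and "01001" match.
function normalizeFips(code) {
  return String(code).trim().padStart(5, "0");
}

// Convert a cell to a number, treating an empty cell as missing (NaN) instead of 0.
function parsePercentage(text) {
  var trimmed = String(text).trim();
  return trimmed === "" ? NaN : +trimmed;
}

// Build one lookup keyed by normalized FIPS, holding both fields per county.
function buildLookup(rows) {
  var lookup = {};
  rows.forEach(function (d) {
    lookup[normalizeFips(d.FIPS)] = {
      county: d.county,
      percentage: parsePercentage(d.percentage)
    };
  });
  return lookup;
}

// List the map feature ids that have no usable row in the lookup.
function findUnmatched(featureIds, lookup) {
  return featureIds.filter(function (id) {
    var entry = lookup[normalizeFips(id)];
    return !entry || isNaN(entry.percentage);
  });
}

// Made-up sample rows and feature ids, for illustration only.
var sampleRows = [
  { FIPS: "1001", county: "Autauga County, Alabama", percentage: "28.4" },
  { FIPS: "06037", county: "Los Angeles County, California", percentage: "" },
  { FIPS: "36061", county: "New York County, New York", percentage: "30.1" }
];
var sampleIds = ["01001", "06037", "36061", "46102"];

var rentLookup = buildLookup(sampleRows);
console.log(findUnmatched(sampleIds, rentLookup)); // [ '06037', '46102' ]
```

Running something like `findUnmatched` over the ids in the JSON file would at least separate counties that are missing from the ACS export from counties whose value is present but unusable.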
One feature that I have not been able to get working with this example is projection, which belongs to D3’s geographic path generator rather than to TopoJSON itself; it controls how the coordinates are mapped into the SVG, so that the size and shape of the map can be adjusted on the fly. This can be used to resize or rotate the projection to look at areas of interest, which could be an interesting implementation for future iterations of this project. I am also not completely satisfied with the sequential scale and may replace it with a more “partitioned” scale in the future to create better visual distinctions between counties. For now, though, I’m just happy that I managed to create my most complete visualization to date!
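One candidate for a more partitioned scale is to place the breaks at quantiles, so that each color bin holds roughly the same number of counties no matter how extreme the highest or lowest value is. The sketch below is plain JavaScript rather than a D3 scale; `quantileBreaks` and `binIndex` are hypothetical helpers, the cut-point rule is the simplest one available (other quantile definitions exist), and the sample values are invented.

```javascript
// Return the (bins - 1) cut points that split the values into roughly equal-count bins.
// Each cut is the sorted element at index floor(i * n / bins).
function quantileBreaks(values, bins) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var breaks = [];
  for (var i = 1; i < bins; i++) {
    var position = Math.floor((i * sorted.length) / bins);
    breaks.push(sorted[position]);
  }
  return breaks;
}

// Index of the bin a value falls into: the number of breaks that are <= value.
function binIndex(value, breaks) {
  var index = 0;
  while (index < breaks.length && value >= breaks[index]) index += 1;
  return index;
}

// Invented rent percentages with one extreme outlier (95).
var sampleRents = [18, 22, 25, 27, 29, 30, 31, 33, 36, 95];
var rentBreaks = quantileBreaks(sampleRents, 5);
var rentBins = sampleRents.map(function (v) { return binIndex(v, rentBreaks); });

console.log(rentBreaks); // [ 25, 29, 31, 36 ]
console.log(rentBins);   // [ 0, 0, 1, 1, 2, 2, 3, 3, 4, 4 ]
```

Each bin index can then be used to pick one entry from an array of colors. The outlier lands in the top bin without stretching the others. D3 also ships quantile and quantize scales that may cover this directly.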