(Note: The gists have been turned into links to avoid too many round trips to github.com, which is sometimes slow.)
As a citizen of Toronto, the past few weeks (and months) have been interesting to say the least. Our mayor, Rob Ford, has made headlines for his various lewd remarks, football follies, and drunken stupors. With an election coming up in 2014, I was curious about how the rest of Toronto felt about our mayor.
Collect data from Twitter, making sure we only use tweets from people in Toronto. Plot those tweets on an electoral map of Toronto's wards, colour coding the wards based on how they voted in the 2010 Mayoral Election.
Alright, here's the final visualization of Toronto's opinion of Rob Ford. Take a look and then come back here and find out more about how we accomplished this.
For those still with me, let's go through, step by step, how this visualization was accomplished.
This is a given - if we want to visualize data, first we need to get some! To do so, we used Repustate to create our rules and monitor Twitter for new data. You can monitor social media data using Repustate's web interface, but this is a project for hackers, so let's use the Social API to do this. Here's the code (you first have to get an API key):
Alright, so while Repustate gets our data, we can get the geo visualization aspect of this started. We'll check back in on the data - for now, let's delve into the world of geographic information systems and analytics.
What's a shapefile? Good question! Paraphrasing Wikipedia, a shapefile is a file format that contains vector information describing the layout and/or location of geographic features, like cities, states, provinces, rivers, lakes, mountains, or even neighbourhoods. Thanks to a recent movement to open up data to the public, it's not *too* hard to get your hands on shapefiles anymore. To get the shapefiles for the City of Toronto, we went to the City's OpenData website, which has quite a bit of cool data to play with. The shapefile we need, the one that shows Toronto divided into its electoral wards, is right here. Download it!
So we have our shapefile, but it's in a vector format that's not exactly ready to be imported into a web page. The next step is to convert the data in the shapefile into a web-friendly, JSON-based format called GeoJSON. GeoJSON is just normal JSON, but it has a spec that defines what a valid GeoJSON object must contain. Luckily, there's some open source software that will do this for us: the Geospatial Data Abstraction Library, or GDAL. GDAL has *tons* of cool stuff in it (take a look when you have time), but for now we're just going to use its ogr2ogr command-line tool, which takes a shapefile in and spits out GeoJSON. Armed with GDAL and our shapefile, let's convert the data into GeoJSON:
This new file, toronto.json, is a GeoJSON file that contains the co-ordinates for drawing the polygons representing the 44 wards of Toronto. Let's take a look at that one-liner, just so you know what's going on here:
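To make the pieces of that command concrete, here's a minimal Python sketch that builds and runs an equivalent ogr2ogr call. The shapefile name `wards.shp` is hypothetical (use whatever the City's download is called), and `-t_srs EPSG:4326` is our own assumption: it reprojects the wards to plain longitude/latitude, which is what d3 expects, in case the source shapefile uses a projected coordinate system. `-f GeoJSON` picks the output driver, and note that ogr2ogr takes the destination before the source.

```python
import shlex
import subprocess


def ogr2ogr_command(shapefile, output="toronto.json", target_srs="EPSG:4326"):
    """Build the ogr2ogr argument list (destination comes before source)."""
    return [
        "ogr2ogr",
        "-f", "GeoJSON",        # output driver: write GeoJSON
        "-t_srs", target_srs,   # assumption: reproject to WGS84 lon/lat
        output,                 # destination file
        shapefile,              # source .shp file (hypothetical name)
    ]


def convert_to_geojson(shapefile, output="toronto.json"):
    """Run the conversion; raises CalledProcessError if ogr2ogr fails."""
    cmd = ogr2ogr_command(shapefile, output)
    print(shlex.join(cmd))      # echo the equivalent shell one-liner
    subprocess.run(cmd, check=True)
```

Calling `convert_to_geojson("wards.shp")` needs GDAL's command-line tools on your PATH; it prints `ogr2ogr -f GeoJSON -t_srs EPSG:4326 toronto.json wards.shp` before running it.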
Alright, GeoJSON file done.
Assuming enough time has passed, Repustate will have found some data for us and we can retrieve it via the Social API:
The data comes down as a CSV. Storing the data in a format you can query and retrieve later is left as an exercise for the reader; let's just assume we've stored ours. Included in this data dump are the latitude and longitude of each tweet(er). How do we know if this tweet(er) is in Toronto? There are a few ways, actually.
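If you're working in Python, a minimal sketch of loading that CSV might look like this. The column names `latitude` and `longitude` are assumptions, so check the header of your actual export; rows without usable coordinates are skipped, since we couldn't place them on the map anyway.

```python
import csv
import io


def load_geotagged_tweets(csv_file):
    """Return rows that carry usable coordinates, with floats under 'lat'/'lon'."""
    tweets = []
    for row in csv.DictReader(csv_file):
        try:
            lat = float(row["latitude"])   # hypothetical column name
            lon = float(row["longitude"])  # hypothetical column name
        except (KeyError, TypeError, ValueError):
            continue  # missing column, short row, or blank coordinates
        tweets.append({**row, "lat": lat, "lon": lon})
    return tweets


# Usage with an inline sample; in practice pass open("export.csv", newline="")
sample = io.StringIO("text,latitude,longitude\nexample tweet,43.6532,-79.3832\nno location,,\n")
tweets = load_geotagged_tweets(sample)
print(len(tweets))  # 1
```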
We could use PostGIS, an unbelievably amazing extension for PostgreSQL that allows you to do all sorts of interesting queries with geographic data. If we went this route, we could load the original shapefile into a PostGIS-enabled database, then, for each tweet, use the ST_Contains function to check which tweet(er)s are in Toronto. Another way is to iterate over your dataset and, for each tweet, do a quick point-in-polygon test in the language of your choice. The polygon data can be retrieved from the GeoJSON file we created. Either way, it's not too hard to determine which data originated from Toronto, given lat/long co-ordinates.
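Here's a sketch of the second approach in Python, using the classic ray-casting test: cast a horizontal ray from the point and count how many polygon edges it crosses; an odd count means the point is inside. It treats longitude/latitude as flat x/y, which is a reasonable approximation at city scale, and handles GeoJSON Polygon and MultiPolygon geometries, including holes. The function names are our own, not part of any library.

```python
def point_in_ring(x, y, ring):
    """Ray casting: toggle 'inside' each time a ray from (x, y) toward +x crosses an edge."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        if (yi > y) != (yj > y):  # edge straddles the ray's horizontal line
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(x, y, rings):
    """GeoJSON polygon: first ring is the exterior, any others are holes."""
    if not point_in_ring(x, y, rings[0]):
        return False
    return not any(point_in_ring(x, y, hole) for hole in rings[1:])


def find_ward(lon, lat, feature_collection):
    """Return the first feature containing (lon, lat), or None if outside all of them."""
    for feature in feature_collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            polygons = [geometry["coordinates"]]
        elif geometry["type"] == "MultiPolygon":
            polygons = geometry["coordinates"]
        else:
            continue
        if any(point_in_polygon(lon, lat, rings) for rings in polygons):
            return feature
    return None
```

Load the wards with `json.load` on toronto.json; a tweet is in Toronto if `find_ward(tweet["lon"], tweet["lat"], wards)` returns a feature rather than `None`, and the returned feature's properties tell you which ward.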
The data for the map comes from the GeoJSON file we created earlier. We're going to tell d3 to load it and for each ward, we're going to draw a polygon (the "path" node in SVG) by following the path co-ordinates in the GeoJSON file. Here's the code:
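In d3, a geo path generator builds the path strings for you, but to show roughly what ends up in them, here's a small Python equivalent: each GeoJSON ring becomes an `M` (move to) command, a series of `L` (line to) commands and a closing `Z`. The `project` argument is any function mapping longitude/latitude to screen x/y; the function name and the two-decimal rounding are our own choices.

```python
def _fmt(point):
    """Format an (x, y) pair for SVG path data."""
    return f"{point[0]:.2f},{point[1]:.2f}"


def geometry_to_path(geometry, project):
    """Turn a GeoJSON Polygon/MultiPolygon into an SVG path 'd' string."""
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError(f"unsupported geometry type: {geometry['type']}")
    parts = []
    for polygon in polygons:
        for ring in polygon:
            # GeoJSON rings repeat the first position at the end; 'Z' closes it instead
            if len(ring) > 1 and ring[0] == ring[-1]:
                ring = ring[:-1]
            points = [project(pos[0], pos[1]) for pos in ring]
            if not points:
                continue
            first, *rest = points
            parts.append("M" + _fmt(first) + "".join("L" + _fmt(p) for p in rest) + "Z")
    return "".join(parts)
```

Passing `project=lambda lon, lat: (lon, lat)` would draw raw degrees, which is why a real projection plus scaling is needed before the wards show up sensibly on screen.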
Path elements have an attribute, "d", which describes how the path should be drawn. You can read more about that here. So for each ward, we create a path node and set its "d" attribute to the ward's boundary: the lat/long co-ordinates from the GeoJSON file, projected into screen x/y positions, which when connected form the boundaries of the ward. You'll see we also have custom fill colouring based on the 2010 Mayoral election; that's just an extra touch we added. You can play around with the thickness of the bordering lines. We're also adding a mouseover event to show a little tooltip about the ward in question when the user hovers over it.

Time to add the data points. We're going to separate them by date, so we can add a slider later to show how the data changes over time. We're also including sentiment information that Repustate automatically calculates for each piece of social data. Here's an example of how one complete data point would look:
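As a hedged sketch of the separate-by-date step, here's one way to bucket points per day in Python before writing them out as JSON for d3. The field names `created_at`, `lat`, `lon` and `sentiment` are hypothetical stand-ins for whatever your stored records use, and timestamps are assumed to be ISO 8601 strings, so the first ten characters are the calendar date.

```python
import json
from collections import defaultdict


def group_points_by_date(points):
    """Bucket points by calendar day so a slider can step through them."""
    by_date = defaultdict(list)
    for p in points:
        day = p["created_at"][:10]  # assumes ISO 8601, e.g. "2013-11-05T14:03:00"
        by_date[day].append({
            "lat": p["lat"],
            "lon": p["lon"],
            "sentiment": p["sentiment"],  # hypothetical field for Repustate's score
        })
    # ISO dates sort chronologically as plain strings
    return dict(sorted(by_date.items()))


def write_points_json(points, path):
    """Write the grouped points to a JSON file (path is up to you)."""
    with open(path, "w") as f:
        json.dump(group_points_by_date(points), f)
```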
Now to plot these on our map, we again load a JSON file and for each point, we bind a "circle" object, positioned at the correct lat/long co-ordinates. Here's the code:
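A rough Python analogue of that d3 step, producing one SVG `<circle>` per point: `project` is any function mapping longitude/latitude to screen co-ordinates, the field names are the same hypothetical ones as before, and the green/red/grey colour scale keyed on the sign of the sentiment score is purely illustrative, not necessarily what the final visualization uses.

```python
def sentiment_colour(score):
    """Illustrative colour scale: green positive, red negative, grey neutral."""
    if score > 0:
        return "green"
    if score < 0:
        return "red"
    return "grey"


def points_to_circles(points, project, radius=3):
    """Emit one SVG <circle> element per point at its projected screen position."""
    circles = []
    for p in points:
        x, y = project(p["lon"], p["lat"])  # longitude first, as in GeoJSON
        circles.append(
            f'<circle cx="{x:.2f}" cy="{y:.2f}" r="{radius}" '
            f'fill="{sentiment_colour(p["sentiment"])}"/>'
        )
    return circles
```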
At this point, if you tried to render the data on screen, you wouldn't see a thing, just a blank white SVG. That's because you need to deal with the concept of map projections. Think about a map of the world: there's no "true" way to render it on a flat, 2D surface, like your laptop screen or iPad. This has given rise to projections, which are ways of placing curved data onto a flat plane. There are many ways to do this, and you can see a list of available projections here; for our example, we used the Albers projection. The last thing we have to do is scale and transform our SVG, and we're done. Here's the code for projecting, scaling and transforming our map:
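For the curious, here's what an Albers (equal-area conic) projection computes, as a small Python sketch of the standard spherical formulas. The centre (-79.4, 43.7) and standard parallels 43° and 44° used below are our own illustrative choices for Toronto, not the values used in the map. The output is in unit-sphere units with y pointing up, so it still needs scaling, translating and a y-flip before it's usable as SVG co-ordinates.

```python
import math


def make_albers(lon0, lat0, parallel1, parallel2):
    """Return a function projecting (lon, lat) in degrees to unit-sphere Albers (x, y)."""
    phi1, phi2 = math.radians(parallel1), math.radians(parallel2)
    n = (math.sin(phi1) + math.sin(phi2)) / 2          # cone constant
    c = math.cos(phi1) ** 2 + 2 * n * math.sin(phi1)
    rho0 = math.sqrt(c - 2 * n * math.sin(math.radians(lat0))) / n

    def project(lon, lat):
        theta = n * math.radians(lon - lon0)           # angle around the cone's apex
        rho = math.sqrt(c - 2 * n * math.sin(math.radians(lat))) / n
        return rho * math.sin(theta), rho0 - rho * math.cos(theta)

    return project


# Illustrative parameters for Toronto (our assumption)
project = make_albers(-79.4, 43.7, 43.0, 44.0)
```

The projection's centre maps to (0, 0); points east of it get positive x and points north of it get positive y.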
Those values are not random; they came about from some trial and error. I'm not sure how to do this properly or in an algorithmic way.
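One way to replace the trial and error is to fit the projected bounding box to the SVG: project every vertex, take the min/max x and y, pick the largest uniform scale that fits both dimensions, and centre the result. Here's a minimal Python sketch of that idea (the function name and 5% padding default are our own). Later releases of d3's geo module offer `projection.fitSize([width, height], object)`, which sets the scale and translate along these lines for you.

```python
import math


def fit_to_viewport(points, width, height, padding=0.05):
    """Return a function mapping projected (x, y) into a width x height SVG box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    dx, dy = max_x - min_x, max_y - min_y
    # Largest uniform scale that fits both dimensions (inf if a dimension is flat)
    scale = min(width / dx if dx else math.inf, height / dy if dy else math.inf)
    if math.isinf(scale):
        scale = 1.0  # all points identical: nothing to scale
    scale *= 1 - 2 * padding
    cx, cy = (min_x + max_x) / 2, (min_y + max_y) / 2

    def transform(x, y):
        # Centre the bounding box; flip y because SVG's y axis points down
        return width / 2 + (x - cx) * scale, height / 2 - (y - cy) * scale

    return transform
```

For example, collect `project(lon, lat)` for every vertex in toronto.json, pass those points with your SVG's width and height, then run each projected ward vertex and tweet through the returned `transform`.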