Geolocation remains an opt-in feature on Twitter, which has rendered it relatively unpopular with users -- most statistics suggest between 1% and 2% of users actually choose to provide their exact coordinates. This enormous limitation therefore makes such data largely ungeneralizable to Twitter users more broadly (see, for example, here and here) but that has not diminished the appeal of the idea of getting to understand which users are tweeting what from where.
My graduate class in the Emerging Media Studies program at Boston University recently took up this task while fully understanding that what we were producing was, at best, only a fraction of all activity on any given topic. Since I have not been able to find a decent tutorial on how to do this elsewhere, I thought I'd post mine here. Please feel free to comment or email me with feedback.
First, we need some geolocative data. For this example, I'll use a sample dataset (in .csv format, opened in Excel) of 1,791 tweets about immigration that I've collected and exported from the BU-TCAT. If you would like to access the TCAT and about 140 million tweets on a variety of topics, it is free and you can get a username and password in about 30 seconds here. I will be happy to add custom search terms upon request if you don't find something that suits your interests. Just let me know.
Back to the tutorial:
To begin, we will create our nodes file for importing into Gephi. To do so, start by deleting all columns from the data file except:
‘from_user_name’, ‘text’, ‘lang’, ‘to_user_name’, ‘location’, ‘lat’, and ‘lng’
This next step is important, because you will break your nodes file if it is done improperly. Here we have to rename
‘from_user_name’ to ‘id’, then copy that column and name the copy ‘label’
Save this file with a name of your choice. Here, we will save it as ‘imm_nodes.csv’
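For anyone who would rather script this step than edit the spreadsheet by hand, the column trimming and renaming above can be sketched in Python with the built-in csv module. This is only a sketch under assumptions: the function names `build_nodes` and `write_nodes` and the input filename `tcat_export.csv` are hypothetical, and it assumes the export has a header row with the column names listed above.

```python
import csv

# Columns the tutorial keeps; everything else in the export is dropped.
KEEP = ["from_user_name", "text", "lang", "to_user_name", "location", "lat", "lng"]


def build_nodes(rows):
    """Return (fieldnames, node_rows) with 'id' and 'label' columns added."""
    others = [c for c in KEEP if c != "from_user_name"]
    fieldnames = ["id", "label"] + others
    nodes = []
    for row in rows:
        node = {c: row.get(c, "") for c in others}
        node["id"] = row["from_user_name"]     # 'from_user_name' renamed to 'id'
        node["label"] = row["from_user_name"]  # copy of that column, named 'label'
        nodes.append(node)
    return fieldnames, nodes


def write_nodes(in_path, out_path):
    """Read the exported .csv and write the trimmed nodes file."""
    with open(in_path, newline="", encoding="utf-8") as src:
        fieldnames, nodes = build_nodes(csv.DictReader(src))
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(nodes)


# Example call (the input filename is a placeholder for your own export):
# write_nodes("tcat_export.csv", "imm_nodes.csv")
```

The result should match what you would get by hand: one row per tweet, with ‘id’ and ‘label’ both holding the user name.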
At this point, we have finished making our data suitable to import into Gephi to visualize our geolocative tweets. That is good. Before going further, make sure you have installed both the 'Map of Countries' and 'GeoLayout' plugins in order to see where in the world your tweets are actually coming from. Both are free and available under the Tools --> Plugins --> Available Plugins menu of Gephi.
Once installed, reopen Gephi as necessary and start by going to
File --> New Project
Then run the 'Map of Countries' using 'Layout' in the Overview tab of Gephi. Once done, you should see an empty general map of the world, like this:
Now, click on the Data Laboratory tab in Gephi and follow these steps:
Import Spreadsheet --> imm_nodes.csv (import as ‘Nodes table’)
Leave ‘Force nodes to be created as new ones’ checked. Once you have imported your nodes, go back to the Overview tab in Gephi. You should see a box of nodes more or less hovering over the Atlantic Ocean, Africa, or Europe. Not to worry.
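Because the next step places nodes purely by their ‘lat’ and ‘lng’ values, it can be worth checking outside Gephi that every row actually has usable coordinates. The following is a minimal sketch, not part of Gephi or TCAT. The helper names are hypothetical, and the latitude and longitude ranges are the standard -90 to 90 and -180 to 180.

```python
def has_valid_coords(row):
    """True if 'lat' and 'lng' parse as numbers within the usual ranges."""
    try:
        lat = float(row["lat"])
        lng = float(row["lng"])
    except (KeyError, TypeError, ValueError):
        # Missing column, empty cell, or non-numeric text.
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0


def split_by_coords(rows):
    """Separate rows into (usable, unusable) lists by coordinate validity."""
    good, bad = [], []
    for row in rows:
        (good if has_valid_coords(row) else bad).append(row)
    return good, bad
```

Passing the rows from `csv.DictReader` over ‘imm_nodes.csv’ to `split_by_coords` gives a list of rows to inspect or drop before importing.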
Run the 'Geo Layout' spatialization using the layout menu. Be sure here to set
'Latitude' as 'lat' and 'Longitude' as 'lng'
Once run, all the nodes should have a proper geolocative home, as below.
Of course, at this point, it is clear a few things are missing, namely edges and color. While we can deal easily enough with color, we will have to save the adding of edges for the next tutorial.
To add color, in this case by language of tweets, in the Overview tab of Gephi, go to
Filters --> Attributes --> Partition --> background_map (Node)
Select 'null' and Filter. This will allow the nodes to have color added without adding color to the nodes of the background map. To add color to the nodes, in the Overview tab of Gephi, go to Partition in the upper left of your screen (not the one under the Filters menu), make sure 'Nodes' is highlighted, and select
Partition --> Refresh --> lang
Once you click Apply, you can see the language that users (nodes) identified in their profile, and this gives some sense of not only where but in which language users are tweeting about immigration around the world.
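As a rough cross-check on the partition colors, the same per-language tally can be computed outside Gephi. This is a small sketch using Python's standard library. The function name `language_counts` is hypothetical, and it assumes each row is a dict with a ‘lang’ key, as produced by `csv.DictReader` on the nodes file.

```python
from collections import Counter


def language_counts(rows):
    """Count rows per 'lang' value, most common first; blanks become 'unknown'."""
    counts = Counter((row.get("lang") or "unknown") for row in rows)
    return counts.most_common()
```

The counts should line up with the sizes of the colored groups in the Partition panel, give or take any rows Gephi skipped on import.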
Go back and turn off the background map Filter and you should see something like this.
That is it -- welcome to the wonderful world of geolocation :) Future posts will address adding edges, as well as making graphs dynamic and interactive for the web.
Let me know about any questions or issues @jgroshek or jgroshek '@' bu.edu
Hope it is helpful! Thanks!