I had a very interesting discussion yesterday with a friend of mine who stopped using his Foursquare app, mainly because of "privacy" concerns. He said that while he didn't mind getting all those discounts and special deals, he did not feel comfortable sharing his location with friends and acquaintances on Facebook and Foursquare, or with Big Brother. I suggested that he could opt out of sharing particular "check-ins" with others simply by toggling a switch in the Foursquare app. I also pointed out that the reason he gets all those special offers and discounts is that venue owners and marketers expect him to share his visits to those venues, thus spreading the word. We then talked about how, if he wanted to go "off the grid" completely, he would have to get rid of his cell phone, stop using credit cards, and switch from monthly MetroCards to single-ride passes (paid for in cash). This last point made me think about what kind of data is collected on MetroCards and how the MTA, city planners and marketers could possibly use it. So I started looking into that, and I can already share some of my findings so far:
- Most of the data, including raw swipes and fare data for each turnstile, is available on the MTA developers' site.
- This fascinating report by the hacker magazine 2600 on attempts to decode MetroCard data is certainly worth reading!
- These awesome interactive graphs by The Wall Street Journal dissect different aspects of MetroCard usage at the Census block level, and there is an even more interesting blog post by the guy who actually analyzed the data and created these graphs.
I just downloaded some of the turnstile usage data and will try to play with it later this week. Considering the available data points, these are the potential insights that could be extracted from the data:
- by looking at the turnstile data, one could calculate the average number of riders entering and leaving each station every 4 hours; it would also be interesting to see whether these numbers differ across days of the week (e.g., whether more riders come through on Mondays);
- by looking at the turnstile data, one could see how usage of different card types differs across weekdays and stations (e.g., whether usage of a certain card type peaks on a particular day of the week, or which stations have the highest number of student or disability cards);
- by combining swipe and fare data with Census block data, one could see how usage correlates with neighborhood characteristics (e.g., Albert Sun looked at the correlation between median household income and usage of unlimited 30-day MetroCards).
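As a starting point for the first idea, here is a minimal Python sketch. It assumes a simplified CSV layout in which each turnstile reports cumulative entry and exit counters roughly every 4 hours; the column names and sample rows are made up for illustration and are not the actual MTA file format. Riders per window are the differences between consecutive readings of the same turnstile. These are summed up to the station level and then averaged by day of the week.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample in a simplified layout: one row per turnstile reading,
# with cumulative entry/exit counters recorded roughly every 4 hours.
# Column names are assumptions, not the MTA's actual schema.
SAMPLE = """turnstile,station,date,time,entries,exits
A002-01,59 ST,2011-05-02,16:00:00,1000,500
A002-01,59 ST,2011-05-02,20:00:00,1100,540
A002-01,59 ST,2011-05-03,00:00:00,1130,600
A002-01,59 ST,2011-05-03,04:00:00,1135,610
A002-02,59 ST,2011-05-02,16:00:00,50,20
A002-02,59 ST,2011-05-02,20:00:00,90,30
A002-02,59 ST,2011-05-03,00:00:00,3,2
"""

MAX_DELTA = 10_000  # assumed cap for discarding implausible jumps


def load_readings(text):
    """Parse CSV text into a dict: turnstile -> time-sorted list of readings."""
    readings = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        ts = datetime.strptime(f"{row['date']} {row['time']}", "%Y-%m-%d %H:%M:%S")
        readings[row["turnstile"]].append(
            (ts, row["station"], int(row["entries"]), int(row["exits"])))
    for rows in readings.values():
        rows.sort()
    return readings


def station_windows(readings):
    """Sum per-turnstile counter differences into (station, window start) totals."""
    windows = defaultdict(lambda: [0, 0])
    for rows in readings.values():
        for (t0, station, e0, x0), (_, _, e1, x1) in zip(rows, rows[1:]):
            d_in, d_out = e1 - e0, x1 - x0
            # A negative difference usually means the counter was reset.
            if 0 <= d_in <= MAX_DELTA and 0 <= d_out <= MAX_DELTA:
                windows[(station, t0)][0] += d_in
                windows[(station, t0)][1] += d_out
    return windows


def average_by_weekday(windows):
    """Mean entries and exits per window for each (station, weekday) pair."""
    sums = defaultdict(lambda: [0, 0, 0])  # -> [entries, exits, window count]
    for (station, start), (d_in, d_out) in windows.items():
        acc = sums[(station, start.strftime("%A"))]
        acc[0] += d_in
        acc[1] += d_out
        acc[2] += 1
    return {key: (e / n, x / n) for key, (e, x, n) in sums.items()}


if __name__ == "__main__":
    averages = average_by_weekday(station_windows(load_readings(SAMPLE)))
    for (station, day), (avg_in, avg_out) in sorted(averages.items()):
        print(f"{station} {day}: {avg_in:.1f} in, {avg_out:.1f} out per window")
```

Each window is attributed to the day on which it starts, so the 20:00 to midnight window counts toward Monday. Counter resets show up as negative differences (like the last A002-02 reading above), and the sketch simply drops those windows; the `MAX_DELTA` cap is an arbitrary guess that would need tuning against the real data.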
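For the last idea, the core statistic is a correlation coefficient between a neighborhood characteristic and a usage measure. The minimal sketch below computes Pearson's r by hand after joining stations to Census blocks. All inputs are hypothetical and invented for illustration: a per-station share of swipes made with 30-day unlimited cards, a station-to-block lookup, and median household income per block. It is not Albert Sun's data or method.

```python
import math

# Hypothetical inputs, invented for illustration only.
unlimited_share = {"Station A": 0.42, "Station B": 0.35, "Station C": 0.28}
station_block = {"Station A": "block-1", "Station B": "block-2", "Station C": "block-3"}
median_income = {"block-1": 85000, "block-2": 60000, "block-3": 40000}


def joined_pairs(shares, blocks, incomes):
    """Join station usage to block income; skip stations without a match."""
    pairs = []
    for station, share in shares.items():
        block = blocks.get(station)
        if block in incomes:
            pairs.append((incomes[block], share))
    return pairs


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    pairs = joined_pairs(unlimited_share, station_block, median_income)
    incomes = [income for income, _ in pairs]
    shares = [share for _, share in pairs]
    print(f"r = {pearson_r(incomes, shares):.3f}")
```

A value near +1 would suggest that higher-income blocks use unlimited cards more heavily, but a correlation by itself says nothing about causation, and a real analysis would need far more than three blocks.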