As Chris points out, most data that academics use are high quality. Researchers spend a lot of time working on the sampling frame to ensure that respondents are randomly selected and take great pains to ensure that the responses are accurate.
Dan O’Brien and Chris Winship have done pioneering work with non-academic data that are noisy but very large in scale and free (the city of Boston already collects them incidentally). They use these data to try to understand ecometrics and engagement at the level of Boston neighborhoods, or even down to Census block groups.
O’Brien and Winship’s efforts fall somewhere between traditional academic research and even more frequent but noisier data like Twitter feeds or cellphone logs (from which one can also extract interesting data on friendships or other features of social capital; see this earlier blog post of mine and a recent paper by David Lazer on communication patterns in the wake of the Boston Marathon bombing).
O’Brien and Winship use the dataset from the public CRM (“Constituent Relationship Management”) hotline, which individuals in Boston use to report problems to the government: e.g., requesting a snow plow, or reporting a pothole, graffiti, or a broken streetlight. There were 365,000 calls in 16 months from 2010 through 2011, and users could also report problems using a smartphone app called CitizensConnect.
A cool video displaying a year’s worth of “calls” to the CRM hotline by type of call is here:
They first defined CRM categories that matched physical disorder of neighborhoods (selecting 34 of 178 possible CRM categories), like trash issues (an improper or overflowing dumpster), graffiti, illegal occupancy, or mice infestation. They then sorted these 34 categories by which items clustered together and found 5 factors (housing issues, uncivil use of space, big building complaints, graffiti, and trash). They then validated these data by controlling for the engagement of neighborhood residents and their concern for the public space.
To determine engagement they measured things like the percentage of an area's residents who were registered with CRM or the percentage who reported a common event (like requesting a snow plow during a heavy storm).
They could then calibrate an adjusted measure (the raw volume of “calls” on CRM adjusted for higher or lower levels of engagement) and compare this with actual audits by neighborhood of real problems (street garbage or street lighting outages). From this comparison, they have now constructed a model to predict the level of “broken windows” in an area.
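To make the idea of an engagement-adjusted measure concrete, here is a minimal sketch. It is not O’Brien and Winship's actual model: the simple ratio adjustment (calls per 1,000 residents divided by an engagement rate), the function names, and all of the tract numbers are assumptions made up for illustration. It only shows the general logic of deflating raw call volume by engagement and then checking the result against audit scores.

```python
from math import sqrt

def adjusted_disorder(calls, population, engagement_rate):
    """Calls per 1,000 residents, deflated by the area's engagement rate.

    Assumed adjustment for illustration only; the source does not give a formula.
    """
    per_capita = 1000 * calls / population
    return per_capita / engagement_rate

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical tracts: (calls, population, engagement rate, audit score)
tracts = {
    "A": (120, 4000, 0.10, 3.0),
    "B": (120, 4000, 0.30, 1.0),  # same raw calls as A, but more engaged residents
    "C": (300, 5000, 0.20, 3.1),
    "D": (60, 3000, 0.05, 4.2),   # few calls, but very low engagement
}

adjusted = {name: adjusted_disorder(c, p, e) for name, (c, p, e, _) in tracts.items()}
audits = [row[3] for row in tracts.values()]

# How well does the adjusted measure track the (made-up) audit scores?
r = pearson(list(adjusted.values()), audits)
print(adjusted)
print(round(r, 3))
```

In this toy data, tracts A and B have identical raw call volumes, but B's higher engagement pulls its adjusted score down, which is the kind of correction the adjustment is meant to capture.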
They have found that the CRM data are reliable and that they can use them to estimate local physical disorder at the Census tract level every 2 months.
In the future, they want to combine these CRM data with lots of other data that they’ve added (building permits, Census data, 311 calls, etc.) to get a better sense of which factors predict which outcomes, how neighborhoods are changing, what problems or issues are emerging, etc. For example, they can see houses or locations that are key places for private disputes (since these turn out to be a downstream trigger for crime, more so than broken windows).
Some interesting takeaways or tidbits from their talk: the CRM model assumed that citizens would rove around the city reporting problems via CRM. It turns out that almost no one did that other than Mayor Menino (who sometimes faked an Italian accent in reporting them, and reported over 1000 incidents in a given year!). 76% of users reported only one problem. Of those reporting multiple problems, most did not specialize in the type of issue reported, but they did stay close to home: 87% of users reporting 2+ times were confined to two blocks.
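Statistics like these can be computed directly from a log of reports. The sketch below is only an illustration: the `(user, block)` record layout, the function name `reporter_stats`, and the sample data are hypothetical, and it reads "confined to two blocks" as "reported from at most two distinct blocks," which may not match the researchers' definition.

```python
from collections import defaultdict

def reporter_stats(reports, max_blocks=2):
    """Summarize how concentrated reporting is.

    reports: iterable of (user_id, block_id) pairs, one per report.
    Returns the share of users with exactly one report, and the share of
    repeat reporters whose reports span at most `max_blocks` distinct blocks.
    """
    blocks_by_user = defaultdict(set)
    count_by_user = defaultdict(int)
    for user, block in reports:
        blocks_by_user[user].add(block)
        count_by_user[user] += 1

    users = list(count_by_user)
    if not users:
        return {"share_single": 0.0, "share_repeat_local": 0.0}

    single = [u for u in users if count_by_user[u] == 1]
    repeat = [u for u in users if count_by_user[u] >= 2]
    local = [u for u in repeat if len(blocks_by_user[u]) <= max_blocks]

    return {
        "share_single": len(single) / len(users),
        "share_repeat_local": len(local) / len(repeat) if repeat else 0.0,
    }

# Made-up sample: three one-time reporters, one local repeat reporter (u4),
# and one "rover" who reports from three different blocks (u5).
sample = [
    ("u1", "b1"), ("u2", "b1"), ("u3", "b2"),
    ("u4", "b3"), ("u4", "b3"),
    ("u5", "b1"), ("u5", "b2"), ("u5", "b4"),
]
stats = reporter_stats(sample)
print(stats)
```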
Because of this hyper-local CRM reporting, they tried an experiment: leafletting homes in an area with two versions of the flyer (pictured above). They found that flyers that asked people to report potholes to help improve Boston had no impact on the response rate, but flyers that asked people to report potholes to improve ___ (filled in with their neighborhood name, like Dudley Square or Eagle Hill) did.
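The talk did not describe how the flyer results were analyzed, but one standard way to compare response rates between two groups is a two-proportion z-test. The sketch below uses that swapped-in technique with entirely made-up counts; the function name and the numbers are assumptions, not results from the experiment.

```python
from math import sqrt, erfc

def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Uses the pooled proportion under the null hypothesis of equal rates.
    Returns (z, p_value).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Hypothetical counts: households that filed a report after each flyer
city_reports, city_homes = 20, 1000                   # "help improve Boston" flyer
neighborhood_reports, neighborhood_homes = 45, 1000   # "help improve <neighborhood>" flyer

z, p_value = two_proportion_ztest(
    city_reports, city_homes, neighborhood_reports, neighborhood_homes
)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A positive `z` here means the neighborhood-framed flyer had the higher response rate in this invented data; with real data one would also want a no-flyer control group to judge whether either flyer moved the rate at all.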
Renters were only a quarter as likely to report issues as homeowners (directionally similar to the general national data on civic engagement although, I believe, a far stronger effect). Their measure of engagement (what they called “custodianship”) has the advantage that it tracks people’s actual behavior rather than their recollections of civic behavior (which could be overstated if people think of themselves as good citizens, or under-reported if people don’t remember the times they were involved), and it is a far better measure than people’s attitudes. [They define custodianship as "Any action that seeks to counteract the physical deterioration or degradation of the public space, either by addressing extant issues, or preventing future ones."]
[They have not yet done the work to see if their measure of "custodianship" tracks and is strongly related to other measures of civic engagement, like localized voting rates in local elections, or the percentage of residents who fill out Census forms.]
In CRM, people reporting problems get emails when the problems are fixed, and apparently the response is very neutral and technocratic and not based at all on whether the area supported Menino or was wealthier. It is unclear whether that will be equally true under Mayor Walsh, given that much more of his political support comes from the eastern part of Boston, except for East Boston. [WBUR picture here.] They have not done studies on whether different messages cause people to have higher collective efficacy (a sense that they can make a change) or report city problems more in the future, but they are doing some studies on the former.
They have also introduced a smartphone app, StreetCred (a bit like FourSquare), that enables people to check in when they attend a community meeting or report a community problem, and to build a higher score by doing certain civic actions, especially by reporting problems further than 1 block from their residence (to encourage people to take civic stewardship of their city more broadly). They have not yet been able to measure its effectiveness. [See "Designing Citizen Relationship Management Systems to Cultivate Good Civic Habits" by Jesse Baldwin-Philippi and Eric Gordon (Engagement Game Lab, Emerson College) on ways to make programs like CRM spur greater civic engagement.]
Read a working paper “Ecometrics in the Age of Big Data” by O’Brien, Rob Sampson, and Winship validating that the CRM data could be used in this way.
[Also, there is a nice geographic visualization of money in politics in Boston showing organizational and individual influence on candidates over 2012 called "MoneyBombs" here.]