Fascinating talk by David Lazer at the Kennedy School of Government. He talked about how new computational power, together with the new digital traces of our comings and goings, communications, and so on, is creating (and will increasingly create) vast data troves, even on a minute-by-minute basis, that can be mined to understand the structure of social networks at an individual or collective level (organization, town, etc.).
The new data overcome some earlier limitations: 1) they are much larger in scale than anything one could capture with surveys, with millions of observations rather than thousands; 2) they are dynamic, so one can see how networks evolve over time; 3) they avoid the response problems of surveys (no reliance on memory recall, and much reduced bias from who chooses to respond). These new data let us study underexamined problems and properties of networks and get a better handle on inferential conclusions.
He highlighted 4 examples of such computational studies: 1) call log analysis; 2) instrumentation of human behavior; 3) natural language processing; and 4) virtual worlds.
Call Log Analysis: this study is summarized in a Proceedings of the National Academy of Sciences paper with 4 international co-authors. They analyzed 9 months of call log data from a cellphone provider in a medium-sized European country, covering 7 million users and thus roughly 49 trillion possible directed dyadic relationships. [Picture of the network here.] Analysis of these data showed that the network exhibits scale-free, power-law properties. It did not show “6 degrees of separation”; some nodes in the network were actually 13 degrees of separation apart. Nor did the network quite fit the “strength of weak ties” picture; it looked more like dirt roads (infrequent communication) connecting the hubs, with superhighways (very frequent communication) connecting within the local clusters. Much as a road system constructed this way would not facilitate the quick dissemination of materials across the network, the structure of the social network was suboptimal for information dissemination. They found in the cellphone data that it was actually the moderate-frequency ties (rather than the weak ties) that most enabled information to spread, since the weak ties were too infrequent to get the information out. [Lazer admitted that the cellphone log data say very little about the content of these relationships. A handyman might look like a hub of the network because he uses a cellphone for his business and everyone with problems contacts him; and the data don’t differentiate a short actual call from a wrong number, etc.]
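The “degrees of separation” finding is, in essence, shortest-path length on the call graph. As a minimal sketch (the toy edge list and function name are mine, not the study’s), a breadth-first search over an undirected view of the call log gives the hop count between two users:

```python
from collections import deque, defaultdict

def degrees_of_separation(edges, source, target):
    """Breadth-first search: number of hops between two users
    in an undirected view of the call graph; None if unreachable."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no path between the two users

# Toy call log: a chain of users plus one shortcut tie
calls = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 4)]
print(degrees_of_separation(calls, 1, 5))  # 2 hops: 1 -> 4 -> 5
```

On the real data, running this over all dyads is what surfaces outliers like the 13-degree pairs Lazer mentioned.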
A second study, “Revealing Social Relationships Using Contextual Proximity and Communication Data” (Lazer, with Nathan Eagle and Sandy Pentland), monitored about 100 MIT students over 9 months using call log data, locational proximity (Bluetooth monitors that detected which other subjects of the study they were near, and when), and self-reports on proximity, friends, and satisfaction. They found a substantial recency effect (subjects overweight whom they have been near in the last 5 minutes rather than over a longer-term basis), which suggests why always-on background monitors produce more reliable data. Subjects remembered proximity with reciprocal non-friends with 99.5% accuracy, but for reciprocal friends were only 35% accurate at reporting non-proximity (one’s mind infers that you must have been near the friend even if you weren’t). Interestingly, the researchers were able to predict reciprocal non-friendships and reciprocal friendships, using just phone call log data and proximity, with 95% accuracy. [They used features like the frequency of phone calls, the proximity of A and B at home, at work, outside of work, on Saturday night, etc.] And in some ways the call log and proximity data better captured the nuances of our social ties. For example, there is a strong literature relating social ties to life satisfaction, and, interestingly, friendships inferred from the call log and proximity data predicted life satisfaction better than self-reported friendship data did. [David Lazer has also very recently used sociometers, devices developed by the MIT Media Lab and hung like a badge around one’s neck, that track things like the proximity of A and B; whether A or B is speaking, and in what tone and modulation; whether A is facing B; and the movements of A and B (standing, sitting, walking, etc.). Kennedy School of Government students wore these sociometers during the Spring public policy exercise, and the data will be analyzed with Nancy Katz to determine the effectiveness of teams.]
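The kind of dyad-level features the researchers describe (call frequency, proximity at home, at work, elsewhere) can be tallied from raw logs before any classifier is fit. Here is a minimal sketch of that feature-extraction step; the names, context labels, and toy records are illustrative, not from the study:

```python
from collections import defaultdict

def dyad_features(calls, proximity_events):
    """Tally per-dyad counts that could feed a friendship classifier.
    calls: (a, b) call records.
    proximity_events: (a, b, context) sightings, with context drawn
    from illustrative labels like "home", "work", "elsewhere"."""
    feats = defaultdict(lambda: {"calls": 0, "home": 0, "work": 0, "elsewhere": 0})
    for a, b in calls:
        feats[frozenset((a, b))]["calls"] += 1   # frozenset: dyads are unordered
    for a, b, context in proximity_events:
        feats[frozenset((a, b))][context] += 1
    return feats

calls = [("ann", "bob"), ("ann", "bob"), ("bob", "cat")]
prox = [("ann", "bob", "home"), ("ann", "bob", "elsewhere"), ("bob", "cat", "work")]
feats = dyad_features(calls, prox)
```

A classifier would then be trained on these per-dyad counts against the self-reported friendship labels.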
A third study was on natural language analysis. Lazer was involved in a study analyzing the content of Congressional representatives’ webpages. [Some of this project is summarized here.] These pages all try to do a similar thing: strategically communicate to constituents the representative’s positions on the issues the representative thinks it is advantageous to emphasize. For the moment, the team has analyzed the presence of certain words or phrases, finding via natural language processing that the phrases on a House Member’s website predict his or her party affiliation (more uses of “terror” in late 2001 was the best predictor of Republican affiliation, and more uses of “Iraq” in 2006 was the best predictor of Democratic affiliation). In the future, they would like to trace the dynamic evolution of these words, how they disseminate, and what the social mapping of language looks like (which words are most closely affiliated with which other words).
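In spirit, the phrase-based prediction reduces to tallying partisan-signal phrases on a page and comparing the counts; the real study would learn its features and weights from labeled pages rather than hand-pick them. A toy sketch, with the two example words taken from the talk and everything else illustrative:

```python
def predict_party(page_text, rep_phrases, dem_phrases):
    """Tally hits for each party's signal phrases; pick the larger tally."""
    text = page_text.lower()
    rep_score = sum(text.count(p) for p in rep_phrases)
    dem_score = sum(text.count(p) for p in dem_phrases)
    if rep_score == dem_score:
        return "unknown"
    return "R" if rep_score > dem_score else "D"

# Per the talk: "terror" best predicted Republicans in late 2001,
# "Iraq" best predicted Democrats in 2006.
rep_phrases = ["terror"]
dem_phrases = ["iraq"]
```

For example, `predict_party("The war on terror must continue.", rep_phrases, dem_phrases)` would return `"R"` under these illustrative lists.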
The final project Lazer highlighted involves natural experiments. In his Connecting to Congress project with Kevin Esterling, Curt Ziniel, and Michael Neblo, they conducted 20 deliberative on-line sessions between Congressional members and their constituents. Subjects completed pre-test and post-test surveys, plus follow-up surveys after the election to see how they voted, and demographics were gathered for all on-line participants. The researchers could manipulate the features of the on-line discussion. They are still analyzing these data, but they did find that constituents participated most when they knew a lot about politics generally but not about the specific topic of the on-line discussion. And they found that participating in a session had a big impact on constituents’ favorable view of their representative and on their level of general political participation.
In summary, Lazer predicted that these technologies (and others like them) will produce a leap of orders of magnitude in what’s known about human social behavior, and that the use of such data is likely to grow in an increasingly digitized environment, with increasing computational power to analyze such enormous datasets. Lazer thinks that academic social scientists have lived in a Flatland and are just emerging to see new dimensions beyond the squares and triangles we have observed all our lives. We don’t yet know what the new paradigms will be or how to use the new dimensions effectively. But these technologies may permit us to observe properties like the evolution of social networks, to dynamically observe what predicts the spread of avian flu or a cold, or to see in real time whether an intervention or policy changes social interaction in the way envisioned or in an unexpected way.
Finally, Lazer cautioned that there are some clear obstacles: 1) overcoming academic silos: social scientists and computer scientists are not used to collaborating, but these data will require cross-silo collaboration; 2) we will need new infrastructures to gather and analyze these data; 3) there are substantial human-subjects and privacy issues: these data are most interesting the more one knows about the demographics of social network members and the content of their communications, but the more one knows about these, the less possible it is to protect the anonymity of people within the networks; 4) much of the data is held by governments or private companies, so much remains to be worked out about whether these data will be shared, and under what conditions that don’t violate privacy or give up corporations’ competitively prized information. Lazer thinks this will require paradigm shifts, but we don’t yet know what those paradigms will be.
One question from the audience: if we could effectively predict friendship using such information, could we predict things like power or influence in a network?
I said it reminded me of the early days of “artificial intelligence”: there was much promise of what machines might accomplish, but also a sense of how crude the instruments were relative to the nuances of human thinking. Similarly here, while the network data is stunningly large, it is also blunt and simplistic; depending on the data, you may not be able to tell whether two people are actually talking, what the content or emotional level of the exchange is, what the body language was, and so on. These things may change and improve over time.
Nevertheless, I think the new data is quite interesting for understanding dynamics of social networks that have always been studied in static form (a one-time snapshot of a social network). For example, how do hubs form? Do people become hubs because of power or extroversion? Are hubs more likely to initiate new ties, or are those ties more likely to be initiated by others who want to be friends with the “hub”? Do hubs do more than others to strengthen weak ties? Or take another example: many have commented that dyads tend to close into triads; in other words, if A knows B and A knows C, then B and C are more likely to become friends over time. Dynamic network data might help explain how this happens (is it proximity or shared interests? does A tend to close the loop, or do B and C? what factors distinguish the conditions under which dyads are likely to close into triads?). There are lots of other similarly interesting questions.
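With two time-stamped snapshots of a network, the dyad-to-triad question becomes directly checkable: list the open triads at time 1 (A tied to both B and C, with B and C untied) and see which of those B-C ties exist at time 2. A minimal sketch with toy data (the snapshots and function name are mine):

```python
from itertools import combinations
from collections import defaultdict

def closed_triads(early_edges, later_edges):
    """For each open triad at time 1 (A knows B and C, but B and C
    are not tied), report whether the B-C tie exists at time 2."""
    adj = defaultdict(set)
    for a, b in early_edges:
        adj[a].add(b)
        adj[b].add(a)
    early = {frozenset(e) for e in early_edges}
    later = {frozenset(e) for e in later_edges}
    results = []
    for a, neighbors in adj.items():
        for b, c in combinations(sorted(neighbors), 2):
            if frozenset((b, c)) not in early:   # open triad centered at a
                results.append(((b, c), frozenset((b, c)) in later))
    return results

t1 = [("A", "B"), ("A", "C")]                 # open triad at time 1
t2 = [("A", "B"), ("A", "C"), ("B", "C")]     # B-C tie appears at time 2
print(closed_triads(t1, t2))  # [(('B', 'C'), True)]
```

Joining each open triad with covariates (proximity, shared interests, who initiated) is what would let the dynamic data answer which conditions drive closure.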
Moreover, such data could be valuable at the individual or organizational level for consciously strengthening a network, similar to what some call netweaving. One could imagine that analyzing the structure of an organization’s networks could be really valuable for seeing that there need to be more links between cluster A and cluster B (where clusters might be defined by race, office location, functional group within the organization, age, or educational background). An organization might then consciously try certain interventions to graft these ties through the structuring of work groups or social events or office location or… and then monitor how effective this was at building and sustaining links and increasing flows of information across these sociologic silos.
If you knew the races/ethnicities of people in the network, it would be interesting to understand whether building bridging links across race (or across other dimensions) helps increase the number of bridges between two clusters. In other words, assume A1 is in cluster A (largely composed of people like A) and forms, or is encouraged to form, a tie with B1 in cluster B (largely composed of people like B). Does this make it increasingly likely that others in cluster A will form ties with others in cluster B, and under what conditions?
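Monitoring such an intervention amounts to counting within-cluster versus cross-cluster ties before and after, given a labeling of each person by cluster. A minimal sketch, with illustrative node and cluster names:

```python
from collections import Counter

def bridge_count(edges, cluster_of):
    """Count ties within and between clusters, given a node -> cluster
    labeling (race, office location, functional group, etc.).
    Within-cluster ties get a singleton key like frozenset({"A"});
    cross-cluster bridges get a pair key like frozenset({"A", "B"})."""
    tally = Counter()
    for a, b in edges:
        tally[frozenset((cluster_of[a], cluster_of[b]))] += 1
    return tally

clusters = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
ties = [("A1", "A2"), ("A1", "B1"), ("B1", "B2")]
counts = bridge_count(ties, clusters)
print(counts[frozenset(("A", "B"))])  # cross-cluster bridges: 1
```

Re-running the count after an intervention (a mixed work group, a social event) and comparing the cross-cluster tally would show whether the grafted A1-B1 tie seeded further bridges.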
Anyway, you get a flavor of the types of interesting questions raised by this talk.