Category Archives: social digital traces

Companies using social capital data for betting on people’s lives

Flickr photo by idletype

The Wall Street Journal recently noted how insurance companies (Aviva PLC, Prudential Financial, AIG) use data mining to bet on whom to insure and at what rates. Much of the info gleaned from online purchases and other digital traces is lifestyle-related: is the insurance applicant an athlete? A TV addict? A hunter?

But some of the information is social capital-related:

Increasingly, some gather online information, including from social-networking sites. Acxiom Corp., one of the biggest data firms, says it acquires a limited amount of “public” information from social-networking sites, helping “our clients to identify active social-media users, their favorite networks, how socially active they are versus the norm, and on what kind of fan pages they participate.”

For insurers and data-sellers alike, the new techniques could open up a regulatory can of worms. The information sold by marketing-database firms is lightly regulated. But using it in the life-insurance application process would “raise questions” about whether the data would be subject to the federal Fair Credit Reporting Act, says Rebecca Kuehn of the Federal Trade Commission’s division of privacy and identity protection. The law’s provisions kick in when “adverse action” is taken against a person, such as a decision to deny insurance or increase rates. The law requires that people be notified of any adverse action and be allowed to dispute the accuracy or completeness of data, according to the FTC.

The article also notes that Celent, an insurance consulting division of Marsh & McLennan, indicates that such online social-network data could be mined for policing fraud and in making pricing decisions: “A life insurer might want to scrutinize an applicant who reports no family history of cancer, but indicates online an affinity with a cancer-research group, says Mike Fitzgerald, a Celent senior analyst.  ‘Whether people actually realize it or not, they are significantly increasing their personal transparency,’ he says. ‘It’s all public, and it’s electronically mineable.’  ”

We’ve written earlier about other life insurers using social capital data in making insurance decisions, but in those cases, the individual was being asked directly about his social and civic involvement.  [See also this blog post about social capital and healthcare.]

We applaud the life insurers for coming to the late realization that social capital data is strongly related to health, but strongly believe they should be more transparent about what they are doing. Then the practice wouldn’t raise privacy concerns, and it would have the added benefit of making the insured better aware of the positive health impact of being more involved civically and socially, which might actually induce those who are less engaged to become more so.

See earlier blog post on loss of digital privacy and digital traces left online.

Read “Insurers Test Data Profiles to Identify Risky Clients” (Wall St. Journal, 11/17/2010, by Leslie Scism and Mark Maremont)

Honest signals: our hidden, influential patterns of communication

(photo by shadowplay)


Interesting lunchtime talk by Alex (Sandy) Pentland about honest signals, sponsored by the Program on Networked Governance at Harvard’s Kennedy School.

Sandy’s theory is that 50,000-100,000 years ago, humans lacked language yet still managed to communicate with each other through “honest signals” (ancient primate signaling mechanisms which developed biologically to communicate our intentions, our trustworthiness, our suitability as a collaborator, whether we were bluffing, etc.). When language was introduced, it didn’t overwrite or eliminate these honest signals but evolved to be synergistic with them. While we focus much more on language, these signals are measurable (Sandy’s group developed machines to read them) and are often equal or better predictors of various behaviors than language. Sandy’s research aims to shine a light on this powerful channel that we know less about.

Sandy notes that such data from electronic ID badges (sociometers) and specially programmed smart phones can give us a “god’s-eye” view of how the people in organizations interact, and let us observe the “rhythms of interaction for everyone in a city”.

What are such behaviors?

Sandy’s group at the MIT Media Lab focuses on 4 of them, although there are probably others (laughter, yawning, etc.).

  1. INTEREST, shown by activity. An autonomic response. In children, for example, this is evinced by jumping up and down; in dogs, by barking or tail-wagging.
  2. ATTENTION, shown by influence. Evidence of thalamic attention. Sandy observes that people actively following a conversation break in faster than they could with normal attention spans, showing that they are processing the discussion as it goes along and predicting the right time to break in.
  3. EMPATHY, as shown by mimicry. This is driven by mirror neurons, observable even in infants as young as 3 hours old, who can imitate a mother sticking out her tongue. People who evince higher levels of mimicry are seen as more empathic and more trustworthy. For example, researchers had computerized agents try to sell an unpopular policy to students; when the computerized agent mimicked the body movements of the experimental subject with a 4-second delay, it was 20% more successful in selling the policy, and the subject was unaware that he/she was being mimicked.
  4. EXPERTISE, as shown by consistency. This is a function of cerebellar motor control. We assume that people who can do things more smoothly are more expert because of the number of actions that must be simultaneously coordinated.

What do these honest signals predict?
These are only some of the examples:
- Computers attentive to these honest signals (and ignoring the content) could predict from entrepreneurs’ pitches which business plans would be judged successful by business school students.
- Effective sales pitches: listening only to the first few seconds of a telephone sales pitch (tone, timing, etc., not the language), the computer could predict with 80% accuracy which calls would be successful.
- Success in speed dating: monitoring the female’s signals predicted 35% of the variation in which couples exchanged phone numbers, significantly more than any other factor researchers could find. Interestingly, the men’s signals were not predictive, but men must somehow have picked up on the women’s signals subconsciously, because in almost all cases the men didn’t ask for phone numbers where the interest wasn’t reciprocated by the women.
- They also found that honest signals predicted depression, who was likely to be successful in negotiating a pay raise or in job interviews, who was bluffing at poker, etc.
- Successful individual-level traits: the most successful people on these “honest signals” were high in activity, high in influence (others were more likely to mirror their communication styles than they were to mirror others’), high in “variable prosody” (their pitch varied and they sounded open to ideas), and high in body-language dominance (i.e., they were more likely to directly face another person, and others were more likely not to face them square on). They were often far more successful in these “honest signals” than they were aware of.
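To make the measurement idea concrete, here is a minimal sketch of how signals like activity, consistency, and mimicry might be computed from a raw interaction trace. The feature definitions, thresholds, and data layout are illustrative assumptions of mine, not Pentland’s actual sociometer algorithms:

```python
import statistics

def signal_features(my_energy, partner_energy, threshold=0.5):
    """Toy approximations of three honest-signal measures from
    per-second speech-energy traces of two conversation partners.
    Definitions here are illustrative, not Pentland's instruments."""
    n = len(my_energy)
    # INTEREST: fraction of time spent actively speaking/moving
    activity = sum(1 for e in my_energy if e > threshold) / n
    # EXPERTISE: consistency, i.e. low variability in delivery
    consistency = 1.0 / (1.0 + statistics.pstdev(my_energy))
    # EMPATHY: mimicry, how often my on/off state matches my
    # partner's state one time-step earlier
    matches = sum(1 for prev, cur in zip(partner_energy, my_energy[1:])
                  if (prev > threshold) == (cur > threshold))
    mimicry = matches / (n - 1)
    return {"activity": activity, "consistency": consistency, "mimicry": mimicry}

# A speaker who perfectly echoes the partner's on/off pattern one step later
partner = [0.9, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1]
me = [0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9]
feats = signal_features(me, partner)
```

A real sociometer pipeline would of course work from audio and accelerometer streams rather than a hand-made energy series.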

Organizational effectiveness

Sandy notes that, unlike with an MRI, one can hook up an entire organization to these sociometers and absorb data micro-second by micro-second, and the results are highly predictive. But there is a challenge: the people who exhibit these highly successful individual traits are useful to organizations and usually occupy “connector” roles, with star-shaped patterns of communication in which ideas flow through these individuals. While this speeds up the decision-making process, it actually impairs the brainstorming process. Sandy’s group is experimenting with devices to see if making participants aware of the dynamics of a team can influence their behavior in a positive manner. They have shown in some experiments (Japanese-American teams designing Rube-Goldberg-type projects, and distance teams) that it can. The challenge will be to see if a group’s behavior can be made more connected at the brainstorming phase and more “star-shaped” at the decision-making stage.

Sandy noted that they have been able to extract many properties of the social networks using smart phones: from a combination of where people are (GPS), when, and communication flows (who they talk to and when). He noted some interesting experiments to observe the flow of nurses in a nursing ward, or the flow of taxis in San Francisco, or communication (e-mail and face-to-face) between departments in a German bank. They are now at the stage of trying to get whole dormitories or parts of the city of Boston using these smart phones to try to track social networks and patterns in these data. (I’ve written about digital traces before.)

How could these flows of people be used?

Traffic: one could monitor, for example, delivery vans coursing through the road networks and, by observing flows slower than typical, spot emerging traffic problems.
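A hedged sketch of that idea: flag road segments whose current van speeds fall well below historical norms, measured in historical standard deviations. The data layout, segment names, and cutoff are invented for illustration:

```python
import statistics

def congested_segments(history, current, z_cutoff=2.0):
    """Flag segments whose current mean van speed is far below the
    historical mean, measured in historical standard deviations."""
    flagged = []
    for seg, speeds in history.items():
        mu = statistics.mean(speeds)
        sigma = statistics.pstdev(speeds)
        if sigma == 0:
            continue  # no historical variation to compare against
        z = (mu - current[seg]) / sigma
        if z > z_cutoff:
            flagged.append(seg)
    return flagged

# Hypothetical per-segment speed histories (mph) and current readings
history = {"I-90": [55, 60, 58, 57], "Storrow": [30, 32, 31, 29]}
current = {"I-90": 25, "Storrow": 30}
```

A production system would also handle time-of-day seasonality, which this sketch ignores.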

Urban tribes: Sandy noted that by monitoring flows of taxis, you can distill separate patterns of interconnected places. In other words, people who live in this neighborhood work in this area, go to these restaurants, go to these nightclubs. (You are not actually monitoring individual people but patterns of association. This is equivalent to Netflix telling you that people who like “The Firm” also like “Michael Clayton”.) Or one can even find sub-patterns in a neighborhood: e.g., locations from which people are regularly returning from nightclubs at 3 or 4 AM.
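This kind of pattern-distilling can be sketched with a tiny clustering routine. Below, a hand-rolled k-means groups people by their visit-frequency vectors into “tribes”; the data, the two features, and the choice of k are all illustrative assumptions, not the actual taxi-flow analysis:

```python
def kmeans(points, k=2, iters=20):
    """Tiny k-means: group visit-pattern vectors into k 'tribes'."""
    centroids = points[:k]  # naive init: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        # recompute centroids; keep old one if a cluster emptied out
        centroids = [mean(c) or centroids[i] for i, c in enumerate(clusters)]
    return clusters

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    if not pts:
        return None
    return tuple(sum(xs) / len(pts) for xs in zip(*pts))

# Each person: (fraction of visits to nightclubs, fraction of early-office visits)
visits = [(0.8, 0.1), (0.9, 0.0), (0.1, 0.9), (0.0, 0.8)]
tribes = kmeans(visits)
```

On this toy data the routine separates the night owls from the early birds; real urban-tribe mining would use far richer location features.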

You can then use these patterns to “find people like me”: based on your own patterns (where you work, where you live, etc.), the system could tell you where many people in your neighborhood shop, go to dinner, or hear music.

Lending: one major bank told Sandy that credit scores are not very good (except at the high end) in predicting repayment rates on loans. Banks would love to use behavioral information (who is at nightclubs late at night, who goes to work early) to predict repayment rates.

Health insurance: similarly one could imagine rates tied to activity levels (who was jogging or getting enough sleep or…)

Germs: they want to use these devices to watch the spread of germs through social networks.

Privacy issues

The above examples of health insurance and lending make one understand why there are clear privacy implications. Do we want banks or health insurers knowing what we are doing (going to nightclubs) to set our rates? Will this be used to impose behavioral bases for “red lining”, where people in certain areas (like the old red lined areas) don’t get loans because of some behavior of theirs that is correlated with low repayment rates? Does it make any difference if these people can supposedly change their behavior?
Sandy thinks we should move from a model in which the company owns the personal data (sharing it with no one, or sharing it unless the individual marked it confidential) to one in which the person owns the data and can decide how it gets used and whether the owner gets compensated for such use.
There are also clear issues about how the decision is framed. Does the individual truly understand why certain marginal information is so useful to a bank or insurer? And there may be negative externalities for everyone, even those who don’t choose to share their information with these companies.

Sandy’s research also raises questions about what happens when you start incentivizing people in companies based on these behaviors, or start teaching people about these hidden “honest signals”. Do people learn how to display these honest signals and dupe people who are less aware of them (e.g., mimicking others to increase sales or to do better in negotiations)? If so, do people start focusing on these behaviors (like mimicry) and consciously teach themselves not to be swayed by them? Do companies find that people who pretend to be connectors (to get a pay raise) are actually less valuable than those who do it naturally (and are unaware they are doing it)?

See previews of Sandy’s book Honest Signals here.

Buy Alex (Sandy) Pentland’s Honest Signals here.

See interesting related story in NYT, “You’re Leaving a Digital Trail. Should You Care?” (John Markoff, 11/30/08), mentioning Alex Pentland’s work among others and discussing the SF taxi example.

Surveiling ourselves

There’s an interesting article in the Utne Reader describing how citizens unwittingly reveal lots of information about themselves, in Invading Our Own Privacy. Tell-all blogs, digital surveillance, online profiling: Who needs Big Brother? (May/June 2007,  David Schimke)

The article points out that “On February 22, ClickZ.com reported that Fox Interactive Media, a division of Rupert Murdoch’s News Corp., which owns MySpace, had hired a high-tech ad firm to mine user profiles, blog posts, and bulletins to ‘allow for highly refined audience segmentation and contextual microtargeting . . . which might put it in more direct competition with the likes of Yahoo, AOL, and MSN.'”

The article also mentions a Chronicle of Higher Education (Jan. 12, 2007) piece that notes that “two professors at Drake University’s law school, worried that their students’ casual approach to digital correspondence could hinder their careers, started a class stressing online discretion. The lesson, according to one student, is simple: ‘If you are not comfortable with shouting your comments from a street corner, you probably shouldn’t convey them via electronic print.'”

Finally, the article also refers to a New York article “Say Anything” (2/21/07) on the digital exhibitionism of youth today, willing to reveal lots of personal information about themselves on blogs, through e-mails, etc.

Life in the Network (II)

This is a postscript to the May 7th post about David Lazer’s quite interesting talk on using digital traces to uncover human behavior. [Read that post first.] The presentation by David Lazer mentioned in the earlier post is available here. Videos of the presentation are available in two parts: part 1 and part 2.

There is also an interesting post by Ben Waber on the Kennedy School of Government Complexity blog on the instrumentation of human behavior: trying to discern things like friendship from people’s proximity and call logs.

Also, The Economist in its April 28, 2007 edition has a special report on telecoms. One story, “The Hidden Revolution” (p. 58), highlights a patented American Express technique that enables the use of RFID chips to track the flow of people in public places from the RFID tags in their clothing and carried products. The Economist notes that American Express has “agreed not to use it without disclosing the fact, after pressure from privacy advocates.” And already, the article notes, “Prisons in America are experimenting with bracelets that have wireless chips embedded in them to keep track of inmates….Guards are also tagged, so prisoners may feel safer from abuse.” The article concludes that the new wireless communication will be virtually invisible to humans and that the only sure bet is that how it is used will surprise us.

Life In The Network: Possibilities of dynamically monitoring social networks

Fascinating talk by David Lazer at the Kennedy School of Government. He talked about how new computational power and the new digital traces of our comings and goings, communications, etc., are creating vast data troves, even on a minute-by-minute basis, that can be mined to understand the structure of social networks at an individual or collective level (organization, town, etc.).

The new data overcome some earlier limitations: 1) they are much larger in scale than anything one could capture with survey data (millions of observations rather than thousands); 2) they are dynamic, so one can see how networks evolve over time; 3) they overcome response problems with surveys (no memory-recall problems and much-reduced bias from who responds to surveys). These new data enable us to study understudied problems and properties of networks and get a better handle on inferential conclusions.

 He highlighted 4 examples of such computational studies: 1) call log analysis; 2) instrumentation of human behavior; 3) natural language processing; and 4) virtual worlds.

Call Log Analysis: this study is summarized in a National Academy of Sciences Proceedings paper with 4 international co-authors. They analyzed cellphone call logs over a 9-month period from a medium-sized European country’s cellphone provider, with 7 million users and 49 trillion directed dyadic relationships. [Picture of the network here.] Analysis showed that the network exhibited scale-free, power-law properties. It did not show “6 degrees of separation”; some nodes in the network were actually 13 degrees of separation apart. And the network did not quite show the strength of weak ties; it looked more as if dirt roads (infrequent communication) connected the hubs while superhighways (very frequent communication) connected nodes within the local clusters. Much as a road system constructed in this way would not facilitate the quick dissemination of materials across the network, the structure of the social network was suboptimal for information dissemination. They found in the cellphone data that it was actually the moderate-frequency ties (rather than weak ties) that most enabled information to be shared, since the weak ties were too infrequent to get the information out. [Lazer admitted that the call log data said very little about the content of these relationships. A handyman might look like a hub of the network because he used a cellphone for his business and everyone with problems contacted him; nor do the data differentiate a short actual call from a wrong number, etc.]
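The basic mechanics of this kind of analysis, building a network from call records and measuring degrees of separation, can be sketched on a toy graph (nothing like the 7-million-user dataset, and using invented names):

```python
from collections import defaultdict, deque

def build_graph(call_log):
    """Undirected social graph from (caller, callee) records."""
    g = defaultdict(set)
    for a, b in call_log:
        g[a].add(b)
        g[b].add(a)
    return g

def degrees_of_separation(g, src, dst):
    """BFS shortest-path length ('degrees of separation'); None if unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nbr in g[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, d + 1))
    return None

calls = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "eve")]
g = build_graph(calls)
```

The same skeleton, with a degree histogram over it, is how one would check for the power-law degree distribution the study reports, though at the study's scale one would need far more efficient storage than a dict of sets.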

A second study, “Revealing Social Relationships Using Contextual Proximity and Communication Data” (Lazer, with Nathan Eagle and Sandy Pentland), monitored about 100 MIT students over 9 months using call log data, locational proximity (Bluetooth monitors that detected which other subjects of the study they were near and when), and self-reports on proximity, friends, and satisfaction. They found a substantial recency effect (subjects overweight whom they have been near in the last 5 minutes rather than over a longer-term basis), which suggests why background, always-on monitors produce more reliable data. The subjects remembered proximity with reciprocal non-friends with 99.5% accuracy, but with reciprocal friends they were only 35% accurate at reporting non-proximity (since one’s mind infers that you must have been proximate to a friend even if you weren’t). Interestingly, the researchers were able to predict reciprocal non-friendships and reciprocal friendships (using just phone call log data and proximity) with 95% accuracy. [They used elements like the frequency of phone calls and the proximity of A and B at home, at work, outside of work, on Saturday night, etc.] And in some ways the call log and proximity data better captured the nuanced elements of our social ties. For example, there is a strong literature showing the relation between social ties and life satisfaction; interestingly, friendships inferred from the call log and proximity data better predicted life satisfaction than self-reported friendship data did. [David Lazer has also very recently used sociometers, devices developed by the MIT Media Lab and hung like a badge around one’s neck, that track things like the proximity of A and B; whether A or B is speaking, and in what tone and modulation; whether A is facing B; and the movements of A and B (standing, sitting, walking, etc.). Kennedy School of Government students wore these sociometers during the Spring public policy exercise, and the data will be analyzed with Nancy Katz to determine the effectiveness of teams.]
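As an illustration of the prediction step, here is a toy scoring rule over features like those listed in brackets above. The weights, cutoff, and feature names are my own invented stand-ins; the actual study fit a statistical model to the data rather than hand-setting weights:

```python
def friendship_score(f):
    """Weighted score over behavioral features. Weights are invented
    for illustration, not taken from the Eagle-Pentland-Lazer paper."""
    return (0.4 * f["calls_per_week"] / 10   # frequency of phone calls
            + 0.3 * f["prox_at_home"]        # fraction of time proximate at home
            + 0.2 * f["prox_sat_night"]      # proximate on Saturday nights
            + 0.1 * f["prox_outside_work"])  # proximate outside of work

def predict_friends(features, cutoff=0.5):
    return friendship_score(features) >= cutoff

close = {"calls_per_week": 8, "prox_at_home": 0.6,
         "prox_sat_night": 0.7, "prox_outside_work": 0.5}
stranger = {"calls_per_week": 0, "prox_at_home": 0.0,
            "prox_sat_night": 0.0, "prox_outside_work": 0.1}
```

In the real study, weights learned from the data are what push accuracy to the reported 95%; the point of the sketch is only that a handful of behavioral features carries a surprising amount of relational signal.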

A third study involved natural language analysis. Lazer was involved in a study analyzing the content of Congressional representatives’ webpages. [Some of this project is summarized here.] These pages all try to do a similar thing: strategically communicate to constituents the representative’s positions on the issues the representative thinks it is advantageous to emphasize. For the moment, the researchers have merely analyzed the presence of certain words or phrases, finding via natural language processing that certain phrases on a House Member’s website predict his/her party affiliation (more uses of “terror” in late 2001 was the best predictor of Republican affiliation, and more uses of “Iraq” in 2006 was the best predictor of Democratic affiliation). In the future, they would like to use this to determine the dynamic evolution of these words, how they disseminate, and what the social mapping is: which words are most closely affiliated with which other words.
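A stripped-down sketch of that word-presence idea: classify a page by counting party-indicative terms. The two signal words come from the finding above, but the scoring rule itself is a toy stand-in for real natural language processing:

```python
def party_signal(text, rep_words=("terror",), dem_words=("iraq",)):
    """Classify a page by relative frequency of party-indicative words.
    The two default signal words come from Lazer's finding; the scoring
    rule is a toy stand-in for an actual NLP classifier."""
    words = text.lower().split()
    rep = sum(words.count(w) for w in rep_words)
    dem = sum(words.count(w) for w in dem_words)
    if rep == dem:
        return "unknown"
    return "R" if rep > dem else "D"

page = "Our war on terror continues and terror will not win"
```

A real classifier would weigh many phrases at once and handle punctuation and stemming, but even this crude counting captures the study's core observation that word choice alone is politically informative.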

The final project Lazer highlighted is natural experiments. In his Connecting To Congress project with Kevin Esterling, Curt Ziniel, and Michael Neblo, they conducted 20 deliberative on-line sessions between Congressional members and their constituents. Subjects took pre-test and post-test surveys and follow-up surveys after the election to see how they voted, and demographics were gathered for all on-line participants. The researchers could manipulate the features of this on-line discussion. They are still analyzing the data, but they did find that constituents participated most when they generally knew a lot about politics but not the specific topic of the on-line discussion. And they found that participation in a session had a big impact on a favorable view of the representative and on the participant’s level of general political participation.

In summary, Lazer predicted that these technologies (and others like them) will produce a quantum leap, of orders of magnitude, in what’s known about human social behavior. The use of these data is likely to increase in an increasingly digitized environment, with increasing computational power to analyze such enormous datasets. Lazer thinks that social-science academics have lived in a Flatland and are just emerging to see new dimensions beyond the squares and triangles we have observed all our lives. We don’t yet know what the new paradigms will be or how to use the new dimension effectively. But these technologies may permit us to observe properties like the evolution of social networks, to dynamically observe what predicts the spread of avian flu or a cold, or to see in real time how an intervention or policy changes social interaction in the way envisioned or in an unexpected way.

Finally, Lazer cautioned that there are some clear obstacles: 1) overcoming academic silos (social scientists, natural scientists, and computer scientists are not used to collaborating, but these data will require cross-silo collaboration); 2) we will need new infrastructures to gather and analyze these data; 3) there are substantial human-subjects and privacy issues (these data are most interesting the more one knows about the demographics of social network members and the content of their communications, but the more one knows about these, the less possible it is to protect the anonymity of people within these networks); 4) much of the data is held by governments or private companies, so much remains to be worked out about whether these data will be shared, and under what conditions that don’t violate privacy or give up corporations’ competitively prized information. Lazer thinks this will require shifted paradigms, but we don’t yet know what those paradigms will be.

One question raised: if we could effectively predict friendship using such information, could we also predict things like power or influence in a network?

I said it reminded me of the early days of “artificial intelligence”: there was much promise of what machines might be able to accomplish, but also a sense of how crude the instruments were relative to the nuances of human thinking. Similarly here, while the network data is stunningly large, it is also blunt and simplistic, so depending on the data you may not be able to tell whether two people are talking, or what the content or emotional level of the exchange is, or the body language, or…. These things may change and improve over time.

Nevertheless, I think the new data are quite interesting for understanding some of the dynamics of social networks that have previously been studied only in static form (like a one-time snapshot of a social network). For example, how do hubs form? Do people become hubs because of power or extroversion? Are hubs more likely to initiate new ties with new friends, or are those ties most likely to be initiated by others who want to be friends with the “hub”? Do hubs do more to strengthen weak ties than others do? Many have also commented that dyads tend to close into triads: if A knows B and A knows C, B and C are more likely to become friends over time. Dynamic networks might help explain how this happens (is it proximity or shared interests? does A tend to close the loop, or do B and C? what factors distinguish the conditions under which dyads are likely to close into triads?). There are lots of other similarly interesting questions.
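The dyad-to-triad question can be made concrete: from two network snapshots, list the open triads at time 1 and check which have closed by time 2. A minimal sketch on an invented three-person network:

```python
from itertools import combinations

def open_triads(edges):
    """Pairs (B, C) that share a neighbor A but are not yet tied."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    edge_set = {frozenset(e) for e in edges}
    out = set()
    for a, friends in nbrs.items():
        for b, c in combinations(sorted(friends), 2):
            if frozenset((b, c)) not in edge_set:
                out.add(frozenset((b, c)))
    return out

def newly_closed(edges_t1, edges_t2):
    """Open triads at time 1 that became actual ties by time 2."""
    later = {frozenset(e) for e in edges_t2}
    return {t for t in open_triads(edges_t1) if t in later}

t1 = [("A", "B"), ("A", "C")]
t2 = [("A", "B"), ("A", "C"), ("B", "C")]
```

With timestamps and covariates (proximity, shared interests) attached to each edge, counting which open triads close, and under what conditions, is exactly the kind of dynamic question these new data make answerable.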

Moreover, such data could be valuable at the individual or organizational level to see if one could consciously strengthen a network, similar to what some call netweaving. One could imagine that analyzing the structure of an organization’s networks could be really valuable for understanding that there need to be more links between cluster A and cluster B (where clusters might be defined by race, office location, functional group within an organization, age, or educational background). An organization might consciously try certain interventions to graft these ties through the structuring of work groups, social events, office location, etc., and could then monitor how effective this was at building and sustaining links and increasing flows of information across these sociological silos.

If you knew the races/ethnicities of people in the network, it would be interesting to understand whether building bridging links across race (or across other dimensions) helps increase the number of bridges between two clusters. In other words, assume A1 in cluster A (largely composed of people like A) forms a tie, or is encouraged to form a tie, with B1 in cluster B (largely composed of people like B). Does this make it increasingly likely that others in cluster A will form ties with others in cluster B, and under what conditions?

Anyway, you get a flavor of the types of interesting questions raised by this talk.