Category Archives: google

Dramatic growth in social capital scholarship

Google Labs' beta “Books Ngram Viewer” lets you chart the mention of various words over time; the attached graph comes from it. It’s quite fascinating and thought-provoking.

One interesting comparison is the rise over time in the discussion of “human capital” vs. “social capital” as depicted in the following chart from 1900-2008 (blue is “human capital”, red is “social capital”). [The data comes from the 5.2 million books that Google has digitized as part of their book project.]

Basically it took “social capital” about a quarter of the time to become as dominant a concept in academia as “human capital”. [Scholarly attention to social capital from 1993-2003, 10 years, advanced it to the point that it took “human capital” 40 years to achieve, from 1963-2003.]

If you want to see a better image of the graph, click here.

Google Labs Books Ngram Viewer

Internet showing you what they think you want, not what you need (UPDATED)

Flickr photo by antoonsfoobar

I recently saw an interesting TED talk by Eli Pariser on the next wave of cyberbalkanization.  [Read his fascinating new book “The Filter Bubble” here.]

Background: Marshall Van Alstyne predicted 15 years earlier that users would self-segregate on the net and choose to expose themselves to ever narrower communities of interest.

We’re now onto “The Daily Me” 2.0. Some news sites originally let users click on their interests, so a user could limit his/her news to, say, sports and entertainment. Cass Sunstein and Nicholas Negroponte predicted that this would lead to stronger news blinders and expose us to less and less common information, what they called “The Daily Me”.

Well, it turns out that users actually choose to subject themselves to more diversity in opinions and networks on the net than people predicted.

But the latest onslaught, what Eli Pariser calls “The Filter Bubble”, is more insidious. More and more sites (Facebook, Google Search, Yahoo News, Huffington Post, the Washington Post) now automatically tailor your stream of search results, Facebook feed, and news feed based on your past clicks, where you are sitting, what type of computer you use, what web browser you use, etc.

Unlike in the past, this is not “opt in” cyberbalkanization but automatic. And since it happens behind the scenes, you can’t know what you’re not seeing. Your search for Tunisia on Google might not even tell you about the political uprising if you haven’t expressed interest in politics in the past. Eric Schmidt of Google said “It will be very hard for people to watch or consume something that has not in some sense been tailored for them.”

Pariser notes that we all have internal battles between our aspirational selves (who want greater diversity) and our current selves (who often want something easy to consume). In most of our lives, or in our Netflix queues, we continually play out these battles, and sometimes our aspirational selves win out. These filter bubbles edit out our aspirational selves when what we need is a mix of vegetables and dessert. Pariser believes that the algorithmic gatekeepers need to show us not only junk food but also things that are challenging, important and uncomfortable, and that present competing points of view. We need Internet ethics in the way that journalistic ethics were introduced in 1915, with transparency, a sense of civic responsibility and room for user control.

It’s an interesting talk and I clearly agree with Pariser that gatekeepers should be more transparent and allow user input to tweak our ratio of dessert to vegetables, to use his analogy. But I think Pariser, in forecasting the degree of our Filter Bubble, misses the fact that there are other ways of finding out about news articles. Take Twitter retweets. Even if my friends are not that diverse (and many of us will choose to “follow” people we don’t agree with), as long as one of the people I’m following has diverse views in his/her circle of followers and retweets their interesting posts, I get exposed to them. Ditto with e-mail alerts from friends about interesting articles, or social searches using Google. We live in far more of a social world, where information leads come from many sources other than Google searches or Yahoo News. So let’s work on the automatic filters, but the sky is not falling just yet.

See “The Filter Bubble.” (Feb. 2011 TED talk)

Behind MIT’s DARPA Weather Balloon challenge win

MIT Red Balloon Challenge Team

As many of you know, the MIT Red Balloon Challenge Team (part of the MIT Media Lab) won the DARPA Red Weather Balloon Challenge, a race to locate red weather balloons. Their system was a reverse Ponzi scheme: whoever found a balloon got $2000, and each person farther back along the invite chain that led to the finder got a progressively lower payout; the surplus was donated to charity. (Because the payoffs were cut in half with every additional degree of separation from the balloon finder, there is no way that MIT could owe more than $4000 per balloon, even if the invite chains back to MIT were very long, and MIT assumed that many of the chains would be short.)
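The arithmetic behind that $4000 ceiling can be checked with a small sketch. It assumes only what is described above: $2000 to the finder and half as much at each step back along the invite chain. The function names are hypothetical, and treating the charity surplus as the $4000 ceiling minus what is paid out is my assumption, not a documented detail of the MIT scheme.

```python
# A minimal sketch of the halving payout rule described above.
# Function names are hypothetical; the surplus formula is an assumption.

FINDER_REWARD = 2000.0      # paid to the person who spots the balloon
PER_BALLOON_CAP = 4000.0    # ceiling implied by halving: 2000 * (1 + 1/2 + 1/4 + ...)


def chain_payouts(chain_length, finder_reward=FINDER_REWARD):
    """Payouts starting with the finder, then each inviter farther back."""
    return [finder_reward / (2 ** depth) for depth in range(chain_length)]


def total_owed(chain_length, finder_reward=FINDER_REWARD):
    """Total paid out for one balloon along a chain of the given length."""
    return sum(chain_payouts(chain_length, finder_reward))


def charity_surplus(chain_length):
    """Assumed surplus: whatever is left under the per-balloon ceiling."""
    return PER_BALLOON_CAP - total_owed(chain_length)


if __name__ == "__main__":
    for length in (1, 3, 10):
        print(length, chain_payouts(length), total_owed(length), charity_surplus(length))
```

However long the chain, the total stays below $4000 because each added link contributes half of the previous payout, so shorter chains simply leave more for charity.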

MIT team members reported that they sent out 2 million SMS messages as one of their strategies, but that was a complete bust as far as finding the balloons. Twitter and Facebook, on the other hand, were far more effective. They plan to distill their findings on effective viral communication and share them at an appropriate venue.

Their victory was a victory of human connections (“social capital”) over number crunching. A Google Team was racing them using number crunching and image recognition techniques (e.g., crawling the web in real time for images of red balloons) and had spotted 9 of the 10 balloons when the MIT Team found all 10. The MIT Team noted that the balloon finders were using Google Maps to determine the coordinates of their balloon sightings (to report to the MIT team), and Google could have captured that information and used it for their own proprietary team but didn’t.

See a blog post about the basic architecture of the MIT reward structure. DARPA’s network challenge obviously has implications for how to effectively and rapidly spread information in the event of an attack, although clearly the task here (spotting a red balloon) is far easier than other possible challenges that are less observable to the naked eye (infectious diseases or biological attacks) or events whose cause is less clear (a plane crashing, for instance).

DARPA noted how the challenge explored “how broad-scope problems can be tackled using social networking tools. The Challenge explores basic research issues such as mobilization, collaboration, and trust in diverse social networking constructs and could serve to fuel innovation across a wide spectrum of applications.”

Read more here.

Social networking becoming more invisible but more ubiquitous?

The Economist notes that while social networking efforts haven’t found profitable financial models, there is evidence that they are migrating to more of a common model that is less proprietary and more in the background, like air.

“Historically, online media tend to start this way. The early services, such as CompuServe, Prodigy or AOL, began as ‘walled gardens’ before they opened up to become websites. The early e-mail services could send messages only within their own walls (rather as Facebook’s messaging does today). Instant-messaging, too, started closed, but is gradually opening up. In social networking, this evolution is just beginning. Parts of the industry are collaborating in a ‘data portability workgroup’ to let people move their friend lists and other information around the web. Others are pushing OpenID, a plan to create a single, federated sign-on system that people can use across many sites.

“The opening of social networks may now accelerate thanks to that older next big thing, web-mail. As a technology, mail has come to seem rather old-fashioned. But Google, Yahoo!, Microsoft and other firms are now discovering that they may already have the ideal infrastructure for social networking in the form of the address books, in-boxes and calendars of their users. ‘E-mail in the wider sense is the most important social network,’ says David Ascher, who manages Thunderbird, a cutting-edge open-source e-mail application, for the Mozilla Foundation, which also oversees the popular Firefox web browser.

“That is because the extended in-box contains invaluable and dynamically updated information about human connections. On Facebook, a social graph notoriously deteriorates after the initial thrill of finding old friends from school wears off. By contrast, an e-mail account has access to the entire address book and can infer information from the frequency and intensity of contact as it occurs. Joe gets e-mails from Jack and Jane, but opens only Jane’s; Joe has Jane in his calendar tomorrow, and is instant-messaging with her right now; Joe tagged Jack ‘work only’ in his address book. Perhaps Joe’s party photos should be visible to Jane, but not Jack.

“This kind of social intelligence can be applied across many services on the open web. Better yet, if there is no pressure to make a business out of it, it can remain intimate and discreet. Facebook has an economic incentive to publish ever more data about its users, says Mr Ascher, whereas Thunderbird, which is an open-source project, can let users minimize what they share. Social networking may end up being everywhere, and yet nowhere.”

View full Economist story here.

Are social networks replacing search engines?

A comment here on Search Engine Journal suggests that social bookmarking sites like reddit, delicious, and StumbleUpon may replace Google as the search engines of the future.

The author hints at the advantage of these social bookmarks as incorporating human intelligence, but ignores the fact that Google is already powered by links that incorporate human intelligence as well. The fact that Google ranks sites by (among other things) the number of external websites linking to them is already a social form of bookmarking or search. The sites that other people find powerful, influential or authentic get linked to and hence are listed higher in the Google rankings.
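To make the point concrete, here is a toy sketch of ranking sites purely by how many distinct external sites link to them. It is only an illustration of the idea in the paragraph above, not Google's actual algorithm, which weighs many other signals; the function name and the example.com-style site names are made up.

```python
# Toy illustration only: rank sites by the number of distinct external
# sites linking to them. Not Google's real ranking; all names are made up.

def rank_by_inbound_links(links):
    """links maps each site to the list of sites it links out to."""
    inbound = {site: set() for site in links}
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a site linking to itself is not an external vote
                inbound.setdefault(target, set()).add(source)
    # Most inbound links first; ties broken alphabetically for stable output.
    return sorted(inbound, key=lambda site: (-len(inbound[site]), site))


if __name__ == "__main__":
    web = {
        "a.example": ["c.example"],
        "b.example": ["c.example", "a.example"],
        "c.example": ["a.example", "c.example"],
        "d.example": ["c.example"],
    }
    print(rank_by_inbound_links(web))
```

Each link acts like a bookmark someone chose to publish, which is why counting them already captures a crude form of collective human judgment.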

In Ian Ayres’s very interesting read, Super Crunchers, he discusses Google’s beta search efforts as a way of using personalized information about searchers.

“Tera mining of customer records, airline prices, and inventories is peanuts compared to Google’s goal of organizing all the world’s information. … Google has developed a Personalized Search feature that uses your past search history to further refine what you really have in mind. If Bill Gates and Martha Stewart both Google ‘blackberry,’ Gates is more likely to see web pages about the email device at the top of his results list, while Stewart is more likely to see web pages about the fruit. Google is pushing this personalized data mining into almost every one of its features. Its new web accelerator dramatically speeds up access to the Internet–not by some breakthrough in hardware or software technology–but by predicting what you are going to want to read next. Google’s web accelerator is continually pre-picking web pages from the net. So while you’re reading the first page of an article, it’s already downloading pages two and three. And even before you fire up your browser tomorrow morning, simple data mining helps Google predict what sites you’re going to want to look at (hint: it’s probably the same sites you look at most days).”

I’ve long been interested in how websites can use network knowledge (the wisdom captured within their user base). One site found a way to do this by distributing the ability to praise or ding members' posts (without giving anyone veto power); Wikipedia does this by distributing editorial input; Craigslist does this by giving users the power to flag postings as spam. And I’ve separately written about “viral popularity” as a way of using social networks to spread interest in media of various sorts.