Thursday, June 18, 2009

Ahmadinejad Goes Down!

Like much of the world, I have been following the presidential election in Iran and its aftermath with great excitement. The election was crudely stolen by the incumbent Ahmadinejad after a surprisingly open campaign, but the people of Iran have bravely taken to the streets in support of Mousavi -- the real winner. It is too early to tell who will prevail in this bare-knuckle power struggle, but you can probably guess who I am rooting for.

The sentiment polarity graph tells an interesting story. The ebbs and flows of the campaign are reflected before the vote, particularly Ahmadinejad's widely-panned debate performance on June 4 and the increasing sense that Mousavi could win. The election on June 12 drew enormous turnout, followed too quickly by the announcement of a landslide Ahmadinejad victory. But within 24 hours, Mousavi's claim of fraud gains credence, and Ahmadinejad's sentiment (at least) goes down.

Thursday, June 11, 2009

SBIR Award for General Sentiment

General Sentiment, the startup company which licensed Lydia technology from Stony Brook, has just received a $100,000 Small Business Innovation Research (SBIR) Phase I grant from the National Science Foundation (NSF) entitled "Identifying and Interpreting Trends through News/Blog Analysis".

Special thanks go to Barack Obama, as this award was funded under the American Recovery and Reinvestment Act of 2009 (ARRA).

Lydia at the Hadoop Summit!

My student Mikhail Bautin just presented his work on the Lydia processing architecture to over 700 people at the 2009 Hadoop Summit in Santa Clara, CA. He found it to be a great conference (better, he says, than the more academic venues I've sent him to before). There is enormous energy in the Hadoop world today as it becomes the primary system for web-type parallel processing and cloud computing in general.

Hadoop is a distributed processing system inspired by Google's MapReduce paradigm. Computations proceed in rounds of mapping (sending data tuples to particular machines based on identification keys) and reducing (crunching these tuples down to a particular result). Such problems arise frequently in Lydia. For example, we can imagine mapping all the sentences in our news corpus, keyed by the names of the entities within them, so we can then use reduce to count the number of occurrences of each entity and the other entities it is juxtaposed with. Hadoop manages all the messy stuff of parallel processing, like load balancing and distributed data structures.
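To make the entity-counting example concrete, here is a minimal sketch of the map and reduce steps in plain Python, run in a single process rather than on Hadoop. It is not Lydia's actual code: the function names (map_sentence, shuffle, reduce_entity, run_job) are made up for illustration, and it assumes each sentence has already been reduced to a list of the entity names found in it.

```python
from collections import Counter, defaultdict


def map_sentence(entities):
    """Map step: emit one (entity, co-occurring entities) pair per entity.

    `entities` is assumed to be the list of entity names already
    extracted from a single sentence (extraction itself is not shown).
    """
    unique = sorted(set(entities))
    for entity in unique:
        yield entity, [other for other in unique if other != entity]


def shuffle(pairs):
    """Group emitted values by key, standing in for the work the
    framework does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_entity(values):
    """Reduce step: count the sentences mentioning the entity and
    tally the other entities it is juxtaposed with."""
    juxtaposed = Counter()
    for others in values:
        juxtaposed.update(others)
    return len(values), dict(juxtaposed)


def run_job(sentences):
    """Run map, shuffle, and reduce over a small in-memory corpus."""
    pairs = (pair for sentence in sentences for pair in map_sentence(sentence))
    return {entity: reduce_entity(values)
            for entity, values in shuffle(pairs).items()}
```

Calling run_job on a list of per-sentence entity lists returns, for each entity, the number of sentences it appears in and a tally of its neighbors. On a real cluster the map and reduce functions would run on many machines at once, with the framework handling the grouping by key.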

It is hard to overstate the difference that Hadoop has made to the Lydia project, efforts which are now rapidly bearing fruit. Expect to hear me soon report on results from enormous blog repositories we have spidered for years yet have never previously been able to analyze. Further, we now regularly do large-scale analysis of our analysis using Hadoop, for example in studying trends across all entities across nationalities or ethnic groups.

It is equally hard to overstate the efforts Mikhail has made getting us there with our system. I can ask nothing more of my other students except that they try to "be like Mike".