Friday, May 15, 2009

British News Processing

I gave a demo today to Sean Carey, a professor of political science at the University of Sheffield. He is interested in using our news analysis in conjunction with polling data from the British Election Study, to better understand how voters make their decisions, and why. They are gearing up for the next national election, likely to take place in spring 2010.

Since their study revolves around why British voters vote the way they do, they are only concerned with data from British newspapers. The source set tab of TextMap Access makes it easy to create a source set from the dailies depository consisting of all newspapers from the United Kingdom. Once this source set is named and registered, it appears as a new depository, ready for use in the frequency and sentiment tabs.

One interesting discovery while experimenting with it was the fraction of references to a local entity like "Gordon Brown" that came from British sources. The answer proved to be about 60%, which is quite impressive considering that less than 10% of our total spidered sources are from the United Kingdom. But it makes sense: he is exactly who the local readership is interested in.
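The comparison above boils down to two ratios: the share of an entity's references that come from a source set, and the share of all sources that belong to that set. Here is a minimal Python sketch, assuming each reference can be attributed to a named source with a known country. It is not TextMap code: the `Source` record, the country codes, and the helper names are hypothetical, and the numbers used with it are toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    country: str  # hypothetical country code, e.g. "UK"


def source_set(sources, country):
    """Build a source set: the names of all sources from one country."""
    return {s.name for s in sources if s.country == country}


def reference_share(mentions, subset):
    """Fraction of an entity's references that come from sources in `subset`.

    `mentions` holds one source name per reference to the entity.
    """
    if not mentions:
        return 0.0
    return sum(1 for m in mentions if m in subset) / len(mentions)


def overrepresentation(ref_share, source_share):
    """How many times more often the subset mentions the entity than its size alone predicts."""
    return ref_share / source_share


# Toy data: 1 UK paper among 10 sources, 6 of 10 references from the UK paper.
sources = [Source("guardian", "UK")] + [Source(f"us_daily_{i}", "US") for i in range(9)]
uk = source_set(sources, "UK")
mentions = ["guardian"] * 6 + ["us_daily_0"] * 4
share = reference_share(mentions, uk)
ratio = overrepresentation(share, len(uk) / len(sources))
```

With these toy counts the UK set supplies 60% of the references while making up 10% of the sources, so it mentions the entity roughly six times more often than its size alone would suggest.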


Particularly amusing was looking at the entities juxtaposed with Gordon Brown at different time scales. Tony Blair, whom he served faithfully as Chancellor of the Exchequer, proves to be his strongest association over the full dailies depository (left column). The past year (center column) more strongly reflects his activities as Prime Minister, including interactions with world leaders (Obama, Sarkozy, Merkel). The strongest associations over the past month (right column) reflect recent activities. We were a bit puzzled by the strong association with Carol Ann Duffy, but a little reading revealed that Brown had just appointed her the first female Poet Laureate.
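The shift in associations across time scales can be illustrated by counting co-occurrences within different time windows. The Python sketch below ranks entities by raw co-occurrence counts in articles that also mention a target entity. This is an assumption for illustration only; the post does not describe the association score TextMap actually uses, and the function name and article data are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def juxtapositions(articles, target, end, days=None, k=3):
    """Rank entities by the number of articles that mention them together with `target`.

    `articles` is an iterable of (date, set_of_entities) pairs. Only articles in the
    window (end - days, end] are counted; days=None means the full archive up to `end`.
    """
    start = None if days is None else end - timedelta(days=days)
    counts = Counter()
    for day, entities in articles:
        if day > end or (start is not None and day <= start):
            continue  # outside the time window
        if target in entities:
            counts.update(e for e in entities if e != target)
    return [entity for entity, _ in counts.most_common(k)]


# Hypothetical toy archive, loosely echoing the three columns described above.
GB = "Gordon Brown"
articles = [
    (date(2004, 1, 1), {GB, "Tony Blair"}),
    (date(2005, 6, 1), {GB, "Tony Blair"}),
    (date(2006, 3, 1), {GB, "Tony Blair"}),
    (date(2007, 1, 1), {GB, "Tony Blair"}),
    (date(2008, 9, 1), {GB, "Barack Obama"}),
    (date(2009, 2, 1), {GB, "Barack Obama", "Angela Merkel"}),
    (date(2009, 4, 1), {GB, "Barack Obama", "Nicolas Sarkozy"}),
    (date(2009, 5, 1), {GB, "Carol Ann Duffy"}),
    (date(2009, 5, 2), {GB, "Carol Ann Duffy"}),
]
end = date(2009, 5, 15)
full_archive = juxtapositions(articles, GB, end, k=1)
past_year = juxtapositions(articles, GB, end, days=365, k=1)
past_month = juxtapositions(articles, GB, end, days=30, k=1)
```

On this toy archive the top association moves from Tony Blair (full archive) to Barack Obama (past year) to Carol Ann Duffy (past month). Raw counts favor frequently mentioned entities, so a real system would likely normalize them.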

One minor complication of British news processing is that spelling and word usage differ slightly from those in the United States. The lexical resources we employ include both British and American spellings, so I expect our NLP performance to be quite similar on British texts.
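One simple way to give a lexicon both spellings is to copy each American entry under its British variant, so either form scores the same. The Python sketch below assumes a polarity lexicon keyed by American spelling. The word lists, polarity scores, and helper names are hypothetical toy examples, not the resources we actually use.

```python
# Hypothetical toy polarity lexicon keyed by American spelling.
US_LEXICON = {"favorable": 1, "honor": 1, "criticize": -1, "scandal": -1}

# Hypothetical British -> American variant table (a real one would be far larger).
UK_TO_US = {"favourable": "favorable", "honour": "honor", "criticise": "criticize"}


def build_lexicon(us_lexicon, variants):
    """Return a lexicon accepting both spellings; each British variant copies its US entry's score."""
    lexicon = dict(us_lexicon)
    for uk, us in variants.items():
        if us in us_lexicon:
            lexicon[uk] = us_lexicon[us]
    return lexicon


def polarity(text, lexicon):
    """Sum the polarity of lexicon words in `text` (naive whitespace tokenization)."""
    return sum(lexicon.get(tok.strip(".,;:!?\"'").lower(), 0) for tok in text.split())


LEXICON = build_lexicon(US_LEXICON, UK_TO_US)
```

With the merged lexicon, "A favourable report" and "A favorable report" receive the same score, so sentiment computed on British and American texts stays comparable.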
