Open source is chock-full of high-quality libraries for solving common text-processing problems, such as sentiment analysis, topic identification, and automatic content labelling.
The factor most strongly correlated with a film’s profitability is the average gross revenue of the director’s previous films: directors whose earlier films grossed more tend to make more profitable films later. Popular stars are correlated with increased revenue but not with profitability. In other words, big stars draw crowds but don’t guarantee a profit, presumably because they cost a lot to hire in the first place.
Startup Crystal claims it can help you write better e-mails by mining recipients’ online data for clues to their personality.
Recently, data management companies have started using statistical analysis to make educated guesses about user identity. They can hypothesize, based on certain limited pieces of information, that a given smartphone user is probably the same person as a given desktop user. The data they collect and analyze includes the Wi-Fi networks a person uses, the websites she regularly visits, time-of-day patterns, and geographical cues.
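To make the idea of statistical device matching concrete, here is a minimal sketch in Python. It is not any company's actual method: it assumes each device is reduced to sets of observed Wi-Fi networks, visited sites, and active hours, and it scores a pair of devices by averaging the Jaccard overlap of those sets. The profile fields, the sample values, and the 0.4 threshold are all made up for illustration, and geographical cues are left out for brevity.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical signal names; real systems would use many more, with learned weights.
SIGNALS = ("wifi", "sites", "active_hours")


def match_score(device_a, device_b):
    """Average the per-signal Jaccard similarity between two device profiles."""
    return sum(jaccard(device_a[s], device_b[s]) for s in SIGNALS) / len(SIGNALS)


# Made-up profiles, for illustration only.
phone = {
    "wifi": {"home-net", "cafe-net"},
    "sites": {"news.example", "mail.example"},
    "active_hours": {7, 8, 22},
}
desktop = {
    "wifi": {"home-net"},
    "sites": {"news.example", "mail.example", "shop.example"},
    "active_hours": {8, 21, 22},
}
stranger = {
    "wifi": {"office-net"},
    "sites": {"video.example"},
    "active_hours": {13, 14},
}

THRESHOLD = 0.4  # assumed cut-off for "probably the same person"

is_probable_match = match_score(phone, desktop) >= THRESHOLD
is_stranger_match = match_score(phone, stranger) >= THRESHOLD
```

With these sample profiles, the phone and desktop share enough networks, sites, and active hours to clear the threshold, while the unrelated device shares nothing and scores zero. The result is only a hypothesis about identity, which is why a threshold stands in for a yes-or-no answer.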
Researchers need to be wary of serious pitfalls that arise when working with huge social media data sets.
We’re all influenced by the weather, but psychologists have struggled to gather convincing data that demonstrates the link. So researchers are turning to Twitter instead.
Google is approaching hospitals and universities with a new pitch. Have genomes? Store them with us.