A few weeks back, while preparing our presentation for agileNCR 2013, Sanchit and I started working on an interesting problem to solve using MapReduce.
We thought of applying MapReduce to find the trends on Twitter.
A tweet on Twitter can contain hashtags (#helloTwitter), and the hashtag used the most times in tweets globally is said to be the top trend. More details can be found here.
This data is huge and keeps growing, so processing it in a traditional manner would not be practical.
Hence we would need Hadoop to help us solve this problem.
Twitter uses Cassandra to store the data in key-value format. Let's assume for simplicity that the key-value pair for tweet data looks something like this: <twitterHandle, Tweet>.
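
To make the idea concrete, here is a minimal sketch in plain Python that simulates the map and reduce steps in memory over a few <twitterHandle, Tweet> pairs. It is not Hadoop code and not our actual implementation. The sample tweets, the function names `map_tweet` and `reduce_counts`, the hashtag regex, and the choice to lowercase hashtags are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import chain

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#\w+")

def map_tweet(handle, tweet):
    """Map step: emit a (hashtag, 1) pair for every hashtag in one tweet.

    The handle (the key) is not needed for counting trends, so it is ignored.
    Hashtags are lowercased so #Hadoop and #hadoop count as the same tag
    (an assumption, not something Twitter necessarily does).
    """
    return [(tag.lower(), 1) for tag in HASHTAG.findall(tweet)]

def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each hashtag."""
    counts = Counter()
    for tag, n in pairs:
        counts[tag] += n
    return counts

# Hypothetical sample data in the <twitterHandle, Tweet> shape.
tweets = [
    ("@alice", "Learning #Hadoop and #MapReduce today"),
    ("@bob", "#hadoop makes big data fun"),
    ("@carol", "No tags here"),
]

# Run the map step over every tweet, then reduce all emitted pairs.
pairs = chain.from_iterable(map_tweet(h, t) for h, t in tweets)
trends = reduce_counts(pairs)

# The hashtag with the highest count is the top trend.
top = trends.most_common(1)
print(top)
```

On a real cluster the map calls would run in parallel across many machines and the framework would group the pairs by hashtag before the reduce step, but the shape of the computation stays the same.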