Imagine you have a jar of multicolored candies, and you need to learn something from them, perhaps the count of blue candies relative to red and yellow ones. You could empty the jar onto a plate, sift through them and tally up your answer. If the jar held only a few hundred candies, this process would take only a few minutes.
Now imagine you have four plates and four helpers. You pour out about one-fourth of the candies onto each plate. Everybody sifts through their set and arrives at an answer that they share with the others to arrive at a total. Much faster, no?
That is what Hadoop does for data. Hadoop is an open-source software framework for running applications on large clusters of commodity hardware. Hadoop delivers enormous processing power – the ability to handle virtually limitless concurrent tasks and jobs – making it a remarkably low-cost complement to a traditional enterprise data infrastructure.