It is hard for the average person to fathom the volume of data they deal with.
The notion of “a lot of data” shifts with Moore’s law and is highly subjective: from a stack of punch cards to a rack of hard drives.
A gigabyte used to mean “a lot of data”.
What a person can grasp is their own perception of how long it takes to process data.
Thus, while working with Hadoop-based technologies, I couldn’t help noticing how long it takes to process even small samples compared to relational databases.
That is the small (but annoying) price to pay for the overwhelming speed of processing “a lot of data”.
That is why it is critical to run Pig in local mode when going through tutorials:
pig -x local
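
In local mode Pig reads from the local filesystem and runs in a single JVM, with no HDFS or MapReduce cluster involved, so a tutorial-sized script returns in seconds instead of waiting on job setup. As a sketch (the input file name here is an assumption), a classic word count in the grunt shell looks like this:

```
-- started with: pig -x local
-- 'input.txt' is a hypothetical file on the local filesystem
lines   = LOAD 'input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group, COUNT(words);
DUMP counts;
```

The same script runs unchanged on a cluster later; only the `-x` flag (and the input path) needs to change.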