MapReduce in terms of SQL


I found a great explanation of MapReduce in terms of SQL in a blog post about Hadoop by Chris Stucchio:

SQL-ish pseudocode:

SELECT G(...) FROM table GROUP BY F(...)

The only thing you are permitted to touch is F(k,v) and G(k,v), except of course for performance optimizations (usually not the fun kind!) at intermediate steps. Everything else is fixed.
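
To make the analogy concrete, here is a minimal Python sketch of the same idea, not how Hadoop actually implements it. The function name `map_reduce`, the sample `records`, and the exact signatures are assumptions made for illustration: `F(k, v)` returns the group key for a record (the GROUP BY part), and `G(k, values)` collapses the list of values collected under that key (the SELECT part). Real MapReduce lets the map step emit zero or more pairs per record; this sketch simplifies to exactly one group key per record.

```python
from collections import defaultdict


def map_reduce(records, F, G):
    """Group (key, value) records by F(key, value), then reduce each group with G.

    This is the fixed part of the pipeline; callers only supply F and G.
    """
    groups = defaultdict(list)
    for k, v in records:
        # "GROUP BY F(...)": F decides which group this record belongs to.
        groups[F(k, v)].append(v)
    # "SELECT G(...)": G turns each group into a single output value.
    return {group_key: G(group_key, values) for group_key, values in groups.items()}


# Hypothetical input: (document id, word) pairs.
records = [(1, "map"), (1, "reduce"), (2, "map"), (3, "sql"), (3, "map")]

# Word count: group by the word itself, then count the members of each group.
word_counts = map_reduce(
    records,
    F=lambda k, v: v,
    G=lambda k, values: len(values),
)
print(word_counts)  # {'map': 3, 'reduce': 1, 'sql': 1}
```

Swapping in a different F and G gives a different job without touching the grouping logic. For example, `F=lambda k, v: k` with `G=lambda k, values: len(values)` counts words per document instead.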