MapReduce in terms of SQL


I found a great explanation of MapReduce in terms of SQL in a blog post about Hadoop by Chris Stucchio:

SQL-ish pseudocode:

SELECT G(...) FROM table GROUP BY F(...)

The only thing you are permitted to touch is F(k,v) and G(k,v), except of course for performance optimizations (usually not the fun kind!) at intermediate steps. Everything else is fixed.
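To make the analogy concrete, here is a minimal in-memory sketch of that query shape in Python. It is not how Hadoop implements anything: the names `map_reduce`, `f`, and `g` are made up for illustration, and it assumes each record yields exactly one group key, whereas a real map step may emit zero or more pairs per record.

```python
from collections import defaultdict


def map_reduce(records, f, g):
    """Emulate SELECT G(...) FROM table GROUP BY F(...) over (k, v) pairs.

    f(k, v) returns the group key for one record (the "map" side).
    g(key, members) collapses one group to a single value (the "reduce" side).
    Both names are hypothetical, chosen to mirror the pseudocode.
    """
    groups = defaultdict(list)
    for k, v in records:
        # GROUP BY F(...): bucket each record under its computed key.
        groups[f(k, v)].append((k, v))
    # SELECT G(...): apply the reducer once per group.
    return {key: g(key, members) for key, members in groups.items()}


# Toy data (an assumption for this sketch): count words by their length.
records = [(1, "apple"), (2, "kiwi"), (3, "pear"), (4, "grape")]
result = map_reduce(
    records,
    f=lambda k, v: len(v),                 # group key: word length
    g=lambda key, members: len(members),   # reduce: size of the group
)
print(result)
```

The grouping loop and the final dictionary comprehension stand in for the fixed part of the framework; only the two lambdas passed as `f` and `g` change from job to job.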

Re: Data munging


A short while ago I read a curious blog post titled "Data munging in Perl 6 vs Perl 5". I liked how the individual Perl 6 snippets looked, one for each data manipulation. I understood that the purpose of the exercise was to highlight this particular part of the language. And yet, in the end I couldn't shake the thought that this was not the right way to solve the kind of problem that the toy example stood for. I have come to suspect that complex dictionary manipulation is mostly an antipattern that appears in scripts as they evolve into complex programs over time.
