I’m the map!

(originally posted 2009-11-24)

If there’s a place you wanna go, I’m the one you need to know…

My boss tells me I have “an unhealthy fascination with map”. That may be true, but it’s only because map is a perfect example of what makes Perl Perl. Let’s take a quick stroll through some of the ins and outs of this wondrous little function, shall we?

The Shortest Distance Between Two Points

Here’s the setup. You have a list. You want to make another list with just as many elements, where every element of the new list is created by some sort of operation on the elements of the first list. Huh, you say? Okay, for a concrete example, let’s say you have a list of words, and you want a list of the lengths of those words. Your first instinct may be a loop:

my @lengths;
foreach my $word (@words) {
    push @lengths, length($word);
}

You may even think a fancy statement modifier would do the trick nicely:

push @lengths, length($_) foreach @words;

But what you really want is map:

my @lengths = map {length($_)} @words;

or alternatively:

my @lengths = map(length($_), @words);

It stands head and shoulders above the loop implementation for several reasons:

  1. Its purpose is very well-defined. You know instantly from looking at it that you are trying to create a list with exactly the same number of elements as @words. The loop is a more general-purpose mechanism, so its purpose takes more work to discern.
  2. @lengths gets its value by way of assignment rather than by side effects. This is a clear win for the language purists, but for the normal folks it’s still a Good Thing (TM) because it’s clear where the value came from. If you’re pushing here, it is certainly reasonable to suspect that you may be pushing more items on somewhere else. If you populate it all at once and never change it again, it’s very clean and clear to the reader.
  3. It may be more efficient, because Perl knows it is assembling a list and can handle that directly rather than going through a series of push calls. Measure before you rely on this, though; the difference is often small.
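To see the two styles side by side, here is a minimal runnable sketch. The sample words and the variable names @lengths_loop and @lengths_map are made up for illustration; both versions should produce the same list.

```perl
use strict;
use warnings;

# Hypothetical sample data; any list of strings would do.
my @words = qw(apple fig banana);

# Loop version: the result is built up by repeated push (side effects).
my @lengths_loop;
foreach my $word (@words) {
    push @lengths_loop, length($word);
}

# map version: the result is assigned exactly once.
my @lengths_map = map { length($_) } @words;

print "@lengths_loop\n";   # prints "5 3 6"
print "@lengths_map\n";    # prints "5 3 6"
```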

Building on this, we can use map to build hashes:

my %word_length = map { $_ => length($_) } @words;

Note how in the case above, map returns two list elements for each element of @words (remember that the “fat comma” operator => is still basically just a comma, so the expression evaluates to a two-element list). This sort of approach can also be used to create longer lists, e.g.:

my @repeated_words = map { ($_) x length($_) } @words;
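Here is a small sketch of both tricks with some made-up sample words, so you can see that the output list need not be the same size as the input list.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @words = qw(to be or not);

# Two output elements (key, value) per input element.
my %word_length = map { $_ => length($_) } @words;

# length($_) output elements per input element.
my @repeated_words = map { ($_) x length($_) } @words;

# Printing is a side effect, so a plain loop is the natural fit here.
for my $word (sort keys %word_length) {
    print "$word => $word_length{$word}\n";
}
# prints:
#   be => 2
#   not => 3
#   or => 2
#   to => 2

print "@repeated_words\n";   # prints "to to be be or or not not not"
```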

You can embed one map inside another, but remember that $_ in the innermost expression will refer to something different from $_ in the outer expression:

@matrix = map {my $x = $_; map {[$x, $_]} @y_values} @x_values;
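A runnable sketch of the nested form, with made-up values for @x_values and @y_values. Copying the outer $_ into $x is what keeps it visible inside the inner map.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @x_values = (1, 2);
my @y_values = qw(a b c);

# The outer $_ is copied into $x, because the inner map rebinds $_.
my @pairs = map { my $x = $_; map { [$x, $_] } @y_values } @x_values;

# Each element of @pairs is a two-element array reference.
print join(" ", map { sprintf "[%s,%s]", @$_ } @pairs), "\n";
# prints "[1,a] [1,b] [1,c] [2,a] [2,b] [2,c]"
```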

You can’t get there from here

So when may it be desirable to opt for a loop when map could be used to accomplish the same task? Here are a few general guidelines:

  • If the code block or expression is too complicated, using map will only be detrimental to the purpose of clarity and concision. A loop may be more verbose, but it can win big in terms of making the code comprehensible and therefore maintainable. And branching is more straightforward in a flow-control structure than in a declarative-style function like map.
  • If you want to perform an operation on each element of a list, but don’t need to create a list of results of these operations, map is a counter-intuitive choice.
  • If you’re worried about action-at-a-distance (see below) a foreach loop gives you an easy way to specify a loop variable so you don’t get caught in the trap of using the default variable $_.
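As a rough illustration of a case where a loop reads better, here is a sketch that tallies words by size. The data, the %count hash and the four-character cutoff are all invented for the example. No result list is wanted, there is a branch, and the named loop variable keeps $_ out of the picture.

```perl
use strict;
use warnings;

# Hypothetical sample data and categories.
my @words = qw(apple fig banana kiwi);
my %count = (short => 0, long => 0);

# We only want the side effect (updating %count), not a list of results.
foreach my $word (@words) {
    if (length($word) <= 4) {
        $count{short}++;
    }
    else {
        $count{long}++;
    }
}

print "short=$count{short} long=$count{long}\n";   # prints "short=2 long=2"
```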

Here be Sea Serpents!

map, its kid brother grep, and under certain circumstances for/foreach, have an interesting–er–feature, if you will. You see, the $_ that you use in your expression or code block is actually an alias for the element of the list that is being operated on. So if you change the value of $_, you are changing the original element itself (just like operating on @_ within a subroutine) and can therefore create all manner of side effects, intended or–more likely–otherwise. This is what is referred to in the Perl world as “action-at-a-distance”. I recently had my first run-in with action-at-a-distance bugs as I was building an object-oriented system (a la Moose) which at some point in the hierarchy wraps some older code that I was much less familiar with. Now it seems perfectly reasonable to have a collection of objects and want to do something like get the sum of the values of a certain attribute. Something like:

sum map { $_->get_population } @cities

Now if you like being lazy as any great programmer should, you might not actually construct those population attributes until they’re needed, and Moose makes such deferred evaluation a snap. But what if, somewhere in the process of constructing the attribute, some code along the way decides to use $_ for its own nefarious purposes? This is exactly what I ran into. Somewhere someone decided to assign a value to $_ (which is not necessarily a bad thing, as there are occasions where this is quite convenient) but $_ being a global variable, it impacted the map statement I had written, through many layers of code. My array, which originally contained blessed object references, all of a sudden had some stray strings, and sometimes undefs, in it. A most unwelcome turn of events.
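The failure can be reproduced in a few lines. This is only a sketch of the kind of thing that happened, not the actual code: legacy_population is a hypothetical stand-in for the older code, and plain hash references stand in for the Moose objects.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical stand-in for legacy code that clobbers the global $_.
sub legacy_population {
    my ($city) = @_;
    $_ = "scratch";              # assigns to $_ without localizing it
    return $city->{population};
}

# Plain hash refs standing in for city objects.
my @cities = (
    { name => 'Springfield', population => 10 },
    { name => 'Shelbyville', population => 20 },
);

# Inside the block, $_ is an alias for each element of @cities, so the
# assignment in legacy_population overwrites the array elements.
my $total = sum map { legacy_population($_) } @cities;

print "$total\n";    # prints "30"
print "@cities\n";   # prints "scratch scratch"
```

The sum comes out right, which is what makes this kind of bug so sneaky: the damage only shows up the next time something touches @cities.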

What to do? Well, there are a few precautions that can and should be taken to prevent such things from happening. First of all, any time $_ is used, it should be localized or even lexicalized in scope. Constructs which generate $_ for you typically automatically limit the scope to the construct itself (in the case of map, the map statement itself) as a byproduct of aliasing, i.e.:

$_ = 'a';
@b = 1..3;
for (@b) {++$_}

In the outer scope, $_ still has the value “a”, but incrementing $_ in the loop’s scope has incremented all the values of the array @b so now @b is 2..4. Nutty, huh? But that value of “a” could be problematic to some other scope, so to be a good citizen, I should preface that first assignment with local or (if you have Perl 5.10 through 5.22; lexical $_ was removed in 5.24) my. If the code I had been calling had taken such precautions, I wouldn’t have had so much trouble.
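Here is a runnable sketch of both halves of that: the aliasing in the loop, and a subroutine that behaves itself by localizing $_ before using it. The name polite_length is made up for the example.

```perl
use strict;
use warnings;

$_ = 'a';
my @b = 1 .. 3;
for (@b) { ++$_ }        # $_ aliases each element of @b in turn

print "$_\n";            # prints "a" (the outer $_ is untouched)
print "@b\n";            # prints "2 3 4"

# A hypothetical well-behaved helper: it localizes $_ before using it,
# so the caller's $_ is restored when the sub returns.
sub polite_length {
    local $_ = shift;
    s/\s+//g;            # operates on the localized $_ only
    return length $_;
}

my $len = polite_length("a b c");
print "$len\n";          # prints "3"
print "$_\n";            # prints "a"
```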

However, there are precautions that can be taken at the other end as well. And in the end, even though I went back and sanitized the underlying code to have good scoping hygiene, I still applied the following so that my code was more robust even if someone went back and did the same thing over again at the lower level. Well how can we prevent the value of the alias from changing? If we are sure that our expression/code-block doesn’t change the value of $_ within its lexical scope, we could (again, if you have Perl 5.10) lexicalize $_. However, if you have no such guarantee, or you don’t have 5.10, then you probably need to localize $_, like so:

@a = 100..200;
@b = map { local $_ = $_; s/^/1/; $_ } @a;   # end the block with $_, since s/// returns the substitution count

This leaves @a intact but leaves @b equivalent to 1100..1200. If you want a simple way to do this more often, you can roll your own “safe” version of map:

sub safe_map (&@) {
    my $code = shift;
    map {local $_ = $_; $code->()} @_;
}

Now you can just drop that in where you usually use map (except if you are using the EXPR form, which won’t work with the prototype) and you can ensure your mapped lists won’t get corrupted. There are ways the output list could still get corrupted by poorly-scoped subroutines, but that’s a much easier problem to recognize and deal with.
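A quick usage sketch, with a made-up three-element list. Note that the sub has to be defined before the call so that the prototype is in effect, and that the block has to end with $_ because s/// returns the number of substitutions rather than the changed string.

```perl
use strict;
use warnings;

# Defined before use so the (&@) prototype applies to the call below.
sub safe_map (&@) {
    my $code = shift;
    return map { local $_ = $_; $code->() } @_;
}

# Hypothetical sample data.
my @a = 100 .. 102;

# The block modifies $_, but only the localized copy.
my @b = safe_map { s/^/1/; $_ } @a;

print "@a\n";   # prints "100 101 102"
print "@b\n";   # prints "1100 1101 1102"
```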

Conclusion

Hopefully this is enough to make you want to use map but want to do so carefully. For with great power comes great responsibility. It is a beautiful tool, but in the wrong hands it can unleash a force so terrible, few have lived to tell the tale. So forge on, intrepid adventurers. Excelsior!
