The Knowin' Samoan

Scope Creep

(initially published 2012-05-16)

I have a love-hate relationship with Perl’s scoping model. It has quite obviously grown over time so it is kind of twisted and knotted like a tree. However, all those twists and knots make it easy to climb if you know what you’re doing. So let’s review the different types of scope:

Package Scope – Package scope is actually really just global scope with the caveat that unqualified identifiers refer to the current package setting. This is the default scope, though naked package-scoped variables are fraught with peril, so I will repeat the admonition to use strict so you don’t end up hanging yourself from the Perl tree with a noose made of your own code. Subroutine declarations (at least “nonymous” ones) are almost always package/globally scoped. Perl’s magic variables (such as $_) are generally package-scoped as well.
Dynamic Scope – Using the local keyword, we can restrict the declarations of variables, typeglobs, etc. to only be valid within the confines of our current block and everything we call from within that block. The declarations essentially have global/package scope until we exit the current block. This means we can override global declarations for the duration of the block, or we can materialize our own pseudo-global subroutines and make them disappear just as quickly when we’re done with them. People try to tell me Perl’s local is deprecated. It’s not. It’s just abused and misunderstood.
Lexical Scope – my and our modify declarations so that they are only meaningful within the block within which the declaration occurs. This includes sub-blocks, but it does not include calls to code that is defined outside of the lexical block. The primary distinction between my and our is that my declarations disappear completely when the block is done, but our declarations can be re-accessed in an entirely different scope (or the next time we re-enter this scope) by another use of our. This creates “hidden” global variables that can be explicitly revealed within any given lexical scope. There are more peculiarities of our that the Perl docs can certainly elucidate more than I have patience to.
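The “pseudo-global subroutine” trick mentioned under Dynamic Scope can be sketched roughly like this. The names greet and call_greet are made up for illustration; the point is only that a local on the typeglob swaps the sub out for the duration of the block and for everything called from it:

```perl
use strict;
use warnings;

sub greet      { return "hello" }
sub call_greet { return greet() }    # looks up greet() at call time

my @seen;
push @seen, call_greet();            # the original greet
{
    no warnings 'redefine';
    # Dynamically scoped override: visible to call_greet() too
    local *greet = sub { return "talofa" };
    push @seen, call_greet();        # the temporary greet
}
push @seen, call_greet();            # the original is back once the block exits
```

After this runs, @seen holds "hello", "talofa", "hello".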

Here are some examples that hopefully will make some of the differences clear:

$foo = 0; # AKA $main::foo

sub a { print "$foo\n"; }
a(); # prints "0"
package bar;
$foo = 1; # AKA $bar::foo
main::a(); # still prints "0"
sub a { print "$foo\n"; }
a(); # prints "1"
{
    local $foo = 2;
    a(); # prints "2"
    local $main::foo = 3;
    main::a(); # prints "3"
}
a(); # prints "1" again
main::a(); # prints "0" again

my $foo = 4; # lexically scoped, not the same as $bar::foo
a(); # prints "1"

package baz;
print "$foo\n"; # prints "4"

{
    my $foo = 5;
    sub a {
        print "$foo\n";
    }
    a(); # prints "5"
}
print "$foo\n"; # prints "4" again
a(); # prints "5" again. Aha! A closure!

{
    our $foo = 6;
    package moo;
    print "$foo\n"; # still prints "6" b/c "our" is lexical
    our $foo = 7; # Lexical but associated with the moo package
}

{
    package moo;
    print "$foo\n"; # prints "4" still
}

{
    our $foo;
    print "$foo\n"; # prints "6"
    package moo;
    our $foo;
    print "$foo\n"; # prints "7"
}

Please note, the above won’t work in a use strict environment, because of the use of unqualified global variables. It’s a little like driving without a seatbelt. You really can if you need to, but in almost all circumstances there’s just no need to take the risk.
Globals and locals are forever bound to a package scope. If you switch packages, the same names refer to entirely different entities.
Lexical identifiers, declared with my and our, are good throughout the entire block in which they were declared. our identifiers, however, are also associated with a package, and you can get back to the same entity in a different scope by using our again.
Lexical variables can be “closed” as in the last sub a above. In this case, the subroutine is actually defined in the global package scope despite the enclosing anonymous block, and maintains a reference to the lexically defined $foo even after the scope in which that $foo was originally defined is gone. We’ll get back to closures more in another post.
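For those who would rather keep the seatbelt on, here is a minimal sketch of the same ideas that runs under use strict. The sub names (show, with_local, with_my) and the $next_id counter are my own inventions for illustration, not part of the listing above:

```perl
use strict;
use warnings;

our $foo = 0;                  # $main::foo, declared so strict is satisfied
sub show { return $foo }       # always reads the package variable

sub with_local {
    local $foo = 2;            # dynamic: seen by everything called from here
    return show();
}

sub with_my {
    my $foo = 3;               # lexical: invisible to show()
    return show();
}

my @out = (show(), with_local(), with_my(), show());   # 0, 2, 0, 0

# A closure: $n outlives its block because the anonymous sub still refers to it
my $next_id;
{
    my $n = 0;
    $next_id = sub { return ++$n };
}
my @ids = ($next_id->(), $next_id->());                # 1, 2
```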

Now, you may ask yourself, why is any of this a good thing? This post is unfortunately short on some of the sexy techniques I have been advertising, but scoping is foundational to some truly mindblowing stunts which are soon to come. So study up. This will be on the final!

What playing Go has taught me about Lean Software Development

(originally published 2012-05-16)

I just recently read Lean Software Development: An Agile Toolkit by Mary and Tom Poppendieck. Very insightful and replete with wisdom derived from experience. What just struck me though was how many of the principles relate to my ongoing study and practice of the ancient Asian strategy game of Go. Maybe there’s some shared Japanese philosophy behind it all that I just happened upon. I will assume for this discussion that you have actually played Go, so if you haven’t, go out now and learn before continuing. But before I go into the overlap with Go, here’s some quick background. Here are the 7 Lean principles a la the Poppendiecks:

Eliminate Waste – Build only what you need to, YAGNI
Amplify Learning – Make sure that you are continually learning what to do along the way, because you most likely won’t know everything you need to know up front
Decide as Late as Possible – Only make irreversible decisions at the last “responsible” moment, when you know as much as you can
Deliver as Fast as Possible – Turn around small units of work that deliver visible value, enabling you to get better feedback and keep moving in the right direction
Empower the Team – Enable the people closest to the work to make decisions based on the knowledge they gain from their proximity
Build Integrity In – Make sure the parts work together as a cohesive whole
See the Whole – Don’t allow optimization of the parts to subvert the overall system

There, I saved you the trouble of reading the book. Not. This back-of-the-envelope summary hardly does the book justice, and the book only scratches the surface of these topics. All I want to do here is reflect upon some relevant aspects of Go strategy that go hand-in-hand with these Lean principles. Having learned them in one field, I am all the more certain of the underlying truths and can leverage them in another. So without further ado, here is how I would explain the Lean principles in Go terms:

Eliminate Waste – Go is all about the efficient use of stones. If you play an unnecessary stone in a close game, it will haunt you. Much time is needlessly spent reinforcing groups that don’t need it, leaving other areas open to attack and foregoing opportunities to go on the offensive. There are many subtleties to the process of deciding when tenuki is appropriate, but simply remembering that it is always an option is half the battle.
Amplify Learning – Go strategy needs to be responsive, particularly early on. You cannot set out a course and expect to be able to play it through to the end without correction along the way. Every move that your opponent makes may be occasion for a complete reevaluation of the board position. Even when they move exactly how you expect them to, there may be things that you see now that the move has been made that you didn’t see earlier.
Decide as Late as Possible – Many of the best strategies in Go involve moves that will have positive impact regardless of the opponent’s response. Joseki are built around trees of decisions that generally have more than one acceptable outcome, at least locally. Furthermore, playing some situations out too early may lead to some things getting set in stone (pun partially intended) that limit your options severely later on down the road. Leaving ambiguity in at early stages doesn’t always feel comforting, but living with the tension of unresolved battles is an essential skill.
Deliver as Fast as Possible – Being able to create a viable shape quickly enables you to move on to whatever is most important on the board. It is tragic to get stuck shoring up a dumpling for fear of losing it all, while your opponent has happily moved on to the next big point. Light shapes are quick to erect and can be sacrificed as necessary in service of the larger cause without too much heartache.
Empower the Team – This one is a bit of a stretch, as your team is a bunch of stones. But it is relevant in that stones are much less effective when they are overconcentrated. Too much backup can turn a once-useful move into dead weight. The stone’s purpose is undermined by redundancy.
Build Integrity In & See the Whole – I’m combining these two, because at least for the purpose of this discussion, there is so much overlap between the two. Basically, they are a reminder to make sure that your tactics work together with your strategy. Picking your joseki to supplement the overall board position, keeping an eye on ladder breakers, and the like are things that are at least apparent to us middling-ranked players. Over-committing to one area of the board can throw away all the gains made elsewhere. I could go on and on about this one, and perhaps I will some day.

The application of these lessons to software development I will leave as an exercise to the reader, but the key metaphor is that your “opponent” in software development is the ever-changing set of requirements that you have to satisfy. Go is helpful in that the “projects” are short and it’s easier to ingrain the lessons through rapid repetition, so it is useful for training your brain to think certain ways. Then the only thing is maintaining those ways of thinking in a different context. I don’t have any tricks up my sleeve for doing that, except perhaps spending a few hours writing on your blog about how the two relate to each other. Not that I recommend you try it, as I hear it’s already been done. But I’m sure you can come up with something.

So now that you know the deal, Go out there and Develop something. Leanly.

I’m the map!

(originally posted 2009-11-24)

If there’s a place you wanna go, I’m the one you need to know…

My boss tells me I have “an unhealthy fascination with map”. That may be true, but it’s only because map is a perfect example of what makes Perl Perl. Let’s take a quick stroll through some of the ins and outs of this wondrous little function, shall we?

The Shortest Distance Between Two Points

Here’s the setup. You have a list. You want to make another list with just as many elements, where every element of the new list is created by some sort of operation on the elements of the first list. Huh, you say? Okay, for a concrete example, let’s say you have a list of words, and you want a list of the lengths of those words. Your first instinct may be a loop:

my @lengths;
foreach my $word (@words) {
    push @lengths, length($word);
}

You may even think a fancy statement modifier would do the trick nicely:

push @lengths, length($_) foreach @words;

But what you really want is map:

my @lengths = map {length($_)} @words;

or alternatively:

my @lengths = map(length($_), @words);

It stands head and shoulders above the loop implementation for several reasons:

Its purpose is very well-defined. You know instantly from looking at it that you are trying to create a list with exactly the same number of elements as @words. The loop is a more general-purpose mechanism, so its purpose takes more work to discern.
@lengths gets its value by way of assignment rather than by side effects. This is a clear win for the language purists, but for the normal folks it’s still a Good Thing (TM) because it’s clear where the value came from. If you’re pushing here, it is certainly reasonable to suspect that you may be pushing more items on somewhere else. If you populate it all at once and never change it again, it’s very clean and clear to the reader.
It’s more efficient because Perl can optimize performance for the specific task of list assembly, which it cannot do for a loop, even if the loop is simply trying to do the same thing.

Building on this, we can use map to build hashes:

my %word_length = map { $_ => length($_) } @words;

Note how in the case above, map returns two list elements for each element of @words (remember that the “fat comma” operator => is still basically just a comma, so the expression evaluates to a two-element list). This sort of approach can also be used to create longer lists, e.g.:

my @repeated_words = map { ($_) x length($_) } @words;

You can embed one map inside another, but remember that $_ in the innermost expression will refer to something different from $_ in the outer expression:

@matrix = map {my $x = $_; map {[$x, $_]} @y_values} @x_values;
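To see what the snippets above actually produce, here is a small sketch run against made-up sample data (the contents of @words, @x_values and @y_values are my assumptions, as is the @pairs array used to flatten the matrix for inspection):

```perl
use strict;
use warnings;

my @words = qw(a bb ccc);      # sample data, not from the original examples

my @lengths        = map { length($_) } @words;          # one result per word
my %word_length    = map { $_ => length($_) } @words;    # two results per word
my @repeated_words = map { ($_) x length($_) } @words;   # a varying number per word

my @x_values = (1, 2);
my @y_values = (10, 20);
# Save the outer $_ in $x before the inner map rebinds $_
my @matrix = map { my $x = $_; map { [$x, $_] } @y_values } @x_values;
my @pairs  = map { join(':', @$_) } @matrix;
```

Here @lengths comes out as (1, 2, 3), @repeated_words as (a, bb, bb, ccc, ccc, ccc), and @pairs as (1:10, 1:20, 2:10, 2:20).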

You can’t get there from here

So when may it be desirable to opt for a loop when map could be used to accomplish the same task? Here are a few general guidelines:

If the code block or expression is too complicated, using map will only be detrimental to the purpose of clarity and concision. A loop may be more verbose, but it can win big in terms of making the code comprehensible and therefore maintainable. And branching is more straightforward in a flow-control structure than in a declarative-style function like map.
If you want to perform an operation on each element of a list, but don’t need to create a list of results of these operations, map is a counter-intuitive choice.
If you’re worried about action-at-a-distance (see below) a foreach loop gives you an easy way to specify a loop variable so you don’t get caught in the trap of using the default variable $_.

Here be Sea Serpents!

map, its kid brother grep, and under certain circumstances for/foreach, have an interesting–er–feature, if you will. You see, the $_ that you use in your expression or code block is actually an alias for the element of the list that is being operated on. So if you change the value of $_, you are operating on the original element itself (just like operating on @_ within a subroutine) and can therefore create all manner of side effects, intended or–more likely–otherwise. This is what is referred to in the Perl world as “action-at-a-distance”. I recently had my first run-in with action-at-a-distance bugs as I was building an object-oriented system (a la Moose) which at some point in the hierarchy wraps some older code that I was much less familiar with. Now it seems perfectly reasonable to have a collection of objects and want to do something like get the sum of the values of a certain attribute. Something like:

sum map { $_->get_population } @cities

Now if you like being lazy as any great programmer should, you might not actually construct those population attributes until they’re needed, and Moose makes such deferred evaluation a snap. But what if somewhere in the process of constructing the attribute some code along the way decides to use $_ for its own nefarious purposes? This is exactly what I ran into. Somewhere someone decided to assign a value to $_ (which is not necessarily a bad thing, as there are occasions where this is quite convenient) but $_ being a global variable, it impacted the map statement I had written, through many layers of code. My array, which originally contained blessed object references, all of a sudden had some stray strings, and sometimes undefs, in it. A most unwelcome turn of events.
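A stripped-down sketch of that failure mode, with plain strings instead of Moose objects. The legacy_lookup sub is a hypothetical stand-in for the older code, and the city names are just sample data:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical legacy helper that assigns to the global $_ without localizing it
sub legacy_lookup {
    my ($name) = @_;
    $_ = lc $name;             # clobbers whatever $_ is currently aliased to
    return length $_;
}

my @cities = ('Apia', 'Pago Pago');
my $total  = sum map { legacy_lookup($_) } @cities;

# The sum is fine, but @cities has been rewritten behind our back:
# it now holds ('apia', 'pago pago')
```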

What to do? Well, there are a few precautions that can and should be taken to prevent such things from happening. First of all, any time $_ is used, it should be localized or even lexicalized in scope. Constructs which generate $_ for you typically automatically limit the scope to the construct itself (in the case of map, the map statement itself) as a byproduct of aliasing, i.e.:

$_ = 'a';
@b = 1..3;
for (@b) {++$_}

In the outer scope, $_ still has the value “a”, but incrementing $_ in the loop’s scope has incremented all the values of the array @b so now @b is 2..4. Nutty, huh? But that value of “a” could be problematic to some other scope, so to be a good citizen, I should preface that first assignment with local or (if you have Perl 5.10) my. (Lexical $_ was later marked experimental in Perl 5.18 and removed in 5.24, so local is the one that has lasted.) If the code I had been calling had taken such precautions, I wouldn’t have had so much trouble.

However, there are precautions that can be taken at the other end as well. And in the end, even though I went back and sanitized the underlying code to have good scoping hygiene, I still applied the following so that my code was more robust even if someone went back and did the same thing over again at the lower level. Well how can we prevent the value of the alias from changing? If we are sure that our expression/code-block doesn’t change the value of $_ within its lexical scope, we could (again, if you have Perl 5.10) lexicalize $_. However, if you have no such guarantee, or you don’t have 5.10, then you probably need to localize $_, like so:

@a = 100..200;
@b = map { local $_ = $_; s/^/1/; $_ } @a; # s/// returns a count, so hand back $_ explicitly

This leaves @a intact but leaves @b equivalent to 1100..1200. If you want a simple way to do this more often, you can roll your own “safe” version of map:

sub safe_map (&@) {
    my $code = shift;
    map {local $_ = $_; $code->()} @_
}

Now you can just drop that in where you usually use map (except if you are using the EXPR form, which won’t work with the prototype) and you can ensure your mapped lists won’t get corrupted. There are ways the output list could still get corrupted by poorly-scoped subroutines, but that’s a much easier problem to recognize and deal with.
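Here is a quick side-by-side sketch of map and safe_map when the block calls something badly behaved. The clobber sub is a hypothetical offender I made up for the demonstration, and safe_map is repeated so the snippet stands on its own:

```perl
use strict;
use warnings;

sub safe_map (&@) {
    my $code = shift;
    return map { local $_ = $_; $code->() } @_;
}

# Hypothetical offender: assigns to the global $_ without localizing
sub clobber { $_ = 'oops'; return 42 }

my @plain = (1, 2, 3);
my @safe  = (1, 2, 3);

my @r1 = map      { clobber() } @plain;   # @plain is overwritten via the alias
my @r2 = safe_map { clobber() } @safe;    # @safe survives; only the local copy is hit
```

Both calls return three 42s, but afterwards @plain holds three copies of 'oops' while @safe is still (1, 2, 3).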

Conclusion

Hopefully this is enough to make you want to use map but want to do so carefully. For with great power comes great responsibility. It is a beautiful tool, but in the wrong hands it can unleash a force so terrible, few have lived to tell the tale. So forge on, intrepid adventurers. Excelsior!

A slice of hash

(originally posted 2009-10-20)

I used to like Perl. The past few months using it as my main language, I grew to really like it a lot. What had me fall head over heels in love, what inspired me to start this new blog series: hash slices. Anyone who’s been around Perl at all knows hashes (i.e. associative arrays, to use the behavioral description rather than the implementation description). They are a core feature that set Perl apart fairly early on as a force to be reckoned with, and they have been emulated numerous times due to their simplicity and power. But addressing multiple elements of an associative array requires some sort of loop, right? Not so in Perl. List assignments are a fairly well-known aspect of Perl. But with a hash slice, you can treat a set of hash elements exactly as you would a list.

You’ve probably come across array slices before, though you may not have recognized them as such:

@foo[1..5]
@bar[2,4,6]

These can be used pretty much anywhere a regular list could, even as an lvalue. A hash slice produces much the same animal, though it comes of different parentage. The key syntax for referring to a hash slice is analogous:

@hash{@list_of_keys}

@list_of_keys could also be replaced by a literal list, a function returning a list, a de-referenced array reference, etc. In other words, you tell the parser that you want a list of hash elements and just supply a list of keys. That’s interesting, but it gets insanely beautiful when you do things like assigning to a hash slice:

# Assign sequential values to a bunch of hash elements
@my_hash{qw(a b c d)} = 1..4;
# Copy all elements of hash_b whose key has a 'g' in it into hash_a, leaving other elements of hash_a intact
@hash_a{@tmp} = @hash_b{@tmp = sort grep {/g/} keys %hash_b};
# Assign the value 1 to all elements of the hash referenced by $hashref, whose keys are in the array true_keys
@{$hashref}{@true_keys} = (1) x scalar(@true_keys);

To accomplish these same things you would have had to do things like:

@my_arr = qw(a b c d);
foreach (0..$#my_arr) {
    $my_hash{$my_arr[$_]} = $_ + 1;
}
foreach (grep {/g/} sort keys %hash_b) {
    $hash_a{$_} = $hash_b{$_};
}
foreach (@true_keys) {
    $hashref->{$_} = 1;
}

Or even more unwieldy:

$my_hash{a} = 1;
$my_hash{b} = 2;
$my_hash{c} = 3;
$my_hash{d} = 4;
...

The minute I discovered hash slices, I was able to refactor away dozens of lines of code, and make things in most circumstances easier to understand and harder to break. Once you become familiar with the syntax, it can be much easier to discern the intention of a single line of code than a loop, or god forbid a series of assignments.
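If you want something to paste into a file and poke at, here is a small strict-clean sketch of the same moves: reading a slice, copying a subset, slicing through a reference, and deleting a slice. The hash names and contents are invented for the example:

```perl
use strict;
use warnings;

my %color = (red => 1, green => 2, blue => 3, grey => 4);   # sample data

# Read several values at once
my ($red, $blue) = @color{qw(red blue)};

# Copy every element whose key contains a 'g', leaving %subset otherwise intact
my %subset = (pink => 9);
my @g_keys = sort grep { /g/ } keys %color;
@subset{@g_keys} = @color{@g_keys};

# Slice through a hash reference, then delete a slice
my $ref = \%color;
@{$ref}{qw(red blue)} = (10, 30);
delete @color{qw(green grey)};
my @left = sort keys %color;
```

At the end, %subset holds green, grey and pink, and %color is down to blue => 30 and red => 10.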

So let’s get slicin’ and dicin’ people!

Round-trip integration testing

(originally posted 2012-04-03)

Any system worth its salt is going to need some pretty intricate testing to verify it does all the wonderful things it’s designed to do. But hand-crafting test cases is tedious, brittle, and prone to masking your presuppositions about what will work and what might break. Auto-generating data takes care of a lot of the grunt-work, but how do you figure out what to look for at the end if you’re not explicitly defining the input and the expected output?

Well, one obvious approach is to programmatically obtain the output based on the input. But then you’re basically repeating the implementation of the code being tested. At best, the two implementations were arrived at independently and there’s something useful in the test. But if the requirements change over time–and they will–the test and code will both have to change similarly, and what’s the likelihood that you can maintain clean-room separation between the two during this evolution? You’re doomed to coupling the two together.

That’s why I shoot for what I’m calling “round trip” integration tests. The key idea is for the test to perform the operation in reverse, so you can put your test data through both and make sure it looks the same when it comes back. Depending on the operations in question, it may make sense to start with an expected output, from that derive the input needed to produce it, then verify that it does in fact do so. Conversely you may start with input, perform the operation, then transform it back to compare with the original input. If the operation is in any way destructive, then the latter option probably isn’t available to you. So let’s look at a tiny, somewhat contrived, example of the former.
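Before that, and just for contrast, the other direction (start with input, run the operation, transform back, compare) might look something like this when the operation is not destructive. The to_line and from_line subs are hypothetical stand-ins for whatever pair of operations you are testing:

```perl
use strict;
use warnings;

# Hypothetical operations under test: serialize a record, then parse it back
sub to_line   { return join ',', @_ }
sub from_line { return split /,/, $_[0], -1 }

# Auto-generated input, no hand-crafted expectations
my @input = map { int(rand(1000)) } 1 .. 10;
my @back  = from_line(to_line(@input));

# The round trip passes if we get back exactly what we put in
my $round_trip_ok = (@back == @input) && ("@back" eq "@input");
```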

Let’s say I wanted to test a grouping function like Ruby’s Enumerable#group_by, but in Perl. I can generate some sample output:

sub r100 { int(rand(100)) + 1 }
sub sequence {
  my ($length) = @_;
  my %key;
  my @sequence;
  while (@sequence < $length) {
    my $num = r100;
    next if $key{$num};
    push @sequence, $num;
    $key{$num} = 1;
  }
  return @sequence;
}
my %grouped = map {$_ => [sequence(r100)]} sequence(r100);

Now we need to produce some input that should produce such a thing. Some type of randomness is nice, but I’d still like to see the entries for every given key in the same relative order:

my @input;
while (my @keys = keys %grouped) {
  my $key = $keys[rand(@keys)];
  push @input, [$key, shift @{$grouped{$key}}];
  delete $grouped{$key} unless @{$grouped{$key}};
}

So let’s write some code to perform the collation:

sub group {
  my %grouped;
  foreach my $item (@_) {
    my ($key, $value) = @$item;
    push @{$grouped{$key}}, $value;
  }
  return %grouped;
}

Now all we have left is to string it all together and compare. Your average test-suite will do this comparison for you, but let’s just brute force it here:

my %result = group @input;
die 'Key match failure' unless %result ~~ %grouped;
foreach my $key (keys %grouped) {
  die 'Grouped value failure' unless @{$result{$key}} ~~ @{$grouped{$key}};
}

Stringing it all together, I found there was a flaw somewhere in the round trip as the ‘Key match failure’ got tripped. Turned out that the transformation from %grouped to @input was destructive, so I needed to operate on a copy. A shallow copy made it past the key test, but died on the ‘Grouped value failure’. So here it is all strung together, with a deep copy, error free:

use strict;

sub r100 { int(rand(100)) + 1 }
sub sequence {
  my ($length) = @_;
  my %key;
  my @sequence;
  while (@sequence < $length) {
    my $num = r100;
    next if $key{$num};
    push @sequence, $num;
    $key{$num} = 1;
  }
  return @sequence;
}
my %grouped = map {$_ => [sequence(r100)]} sequence(r100);

my %tmp;
@tmp{keys %grouped} = map {[@$_]} values %grouped;
my @input;
while (my @keys = keys %tmp) {
  my $key = $keys[rand(@keys)];
  push @input, [$key, shift @{$tmp{$key}}];
  delete $tmp{$key} unless @{$tmp{$key}};
}

sub group {
  my %grouped;
  foreach my $item (@_) {
    my ($key, $value) = @$item;
    push @{$grouped{$key}}, $value;
  }
  return %grouped;
}

my %result = group @input;

die 'Key match failure' unless %result ~~ %grouped;
foreach my $key (keys %grouped) {
  die 'Grouped value failure' unless @{$result{$key}} ~~ @{$grouped{$key}};
}
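One caveat on the comparison step: the ~~ smartmatch operator has been flagged as experimental since Perl 5.18, so on newer Perls you may prefer to spell the check out (or let your test suite do it). A minimal sketch of a hand-rolled comparison, assuming numeric values as in the listing above; same_groups is a name I made up:

```perl
use strict;
use warnings;

# Hypothetical helper: true when two hashes of array refs have the same keys
# and each key holds the same numbers in the same order
sub same_groups {
    my ($got, $want) = @_;
    return 0 unless keys %$got == keys %$want;
    foreach my $key (keys %$want) {
        return 0 unless exists $got->{$key};
        my ($g, $w) = ($got->{$key}, $want->{$key});
        return 0 unless @$g == @$w;
        foreach my $i (0 .. $#$w) {
            return 0 unless $g->[$i] == $w->[$i];
        }
    }
    return 1;
}

my %want = (1 => [10, 20], 2 => [30]);
my %same = (2 => [30], 1 => [10, 20]);
my %diff = (1 => [20, 10], 2 => [30]);      # same values, wrong order

my $ok  = same_groups(\%same, \%want);      # true
my $bad = same_groups(\%diff, \%want);      # false
```

In the round-trip test this would replace the two die checks with a single die 'Round trip failure' unless same_groups(\%result, \%grouped).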

Your patience will be rewarded…

Retooling this site because I haven’t touched it in a while, and what with the Jeopardy! uproar I have to present a better face to the world. So hold on to your horses and I’ll have all my old content up soonish, as well as some brand new stuff.