error handling and closures in Perl

Error handling

I’ve never been totally happy with how I do error handling, in any language.
I’ve always favoured calling a function and having it return a simple scalar value that indicates whether an error happened. Perhaps this dates back to my C programming days. So for many years I used a simple convention in my Perl programming. If the return value was 0, everything was ok. If it was a positive integer, typically 1, it indicated a non-fatal error and you took appropriate action. The positive value might also be the number of errors found, for example while parsing a configuration file. A negative return value indicated a fatal error: for example, you called the function with inappropriate arguments and you should go fix your program. The negative value would typically identify the argument, i.e., if the 3rd argument was incorrect, the function would return -3.

This leaves the issue of how to return the error message(s).
The other convention I used was to pass in a reference to a variable that would receive the error message. I would also pass in a reference to the output I was expecting back, whether an array, an associative array (hash) or some complex data structure. So, a typical calling sequence might look like:

my @input = ( "this", "and", "that" ) ;
my @output = () ;
my $error = "" ;
my $ret = some_function( \@input, \@output, \$error ) ;
if ( $ret < 0 ) {
   print STDERR "Fatal error: $error\n" ;
   exit(1) ;
} elsif ( $ret ) {
   print STDERR "Error: $error\n" ;
   # fix your problem...
}
for my $arg ( @output ) {
    print "OUTPUT: $arg\n" ;
}
exit(0) ;

The function would sanity check its arguments. For example, for the 2nd argument above (the output), it would check that it was a reference to an array, and if it was not, the function would do:

# ref() returns "" for a non-reference
if (( ref( $output_ref ) eq "" ) or
    ( ref( $output_ref ) ne "ARRAY" )) {
    ${$errs_ref} = "arg #2 (output) is not an ARRAY ref" ;
    return(-2) ;
}

A complete sample function, which just copies the input array to the output array is:

sub some_function {
    my $input_ref   = shift ;
    my $output_ref  = shift ;
    my $errs_ref    = shift ;

    # check the error argument first: without it, no message can be returned
    if (( ref( $errs_ref ) eq "" ) or
        ( ref( $errs_ref ) ne "SCALAR" )) {
        return(-3) ;
    }
    if (( ref( $input_ref ) eq "" ) or
        ( ref( $input_ref ) ne "ARRAY" )) {
        ${$errs_ref} = "arg #1 (input) is not an ARRAY ref" ;
        return(-1) ;
    }
    if (( ref( $output_ref ) eq "" ) or
        ( ref( $output_ref ) ne "ARRAY" )) {
        ${$errs_ref} = "arg #2 (output) is not an ARRAY ref" ;
        return(-2) ;
    }

    foreach my $arg ( @{$input_ref} ) {
        push( @{$output_ref}, $arg ) ;
    }
    return(0) ;
}

This leaves a problem: what if the caller screwed up the error argument itself?
Then you can’t use it to return the error. So I’d do something goofy like prepare the potential error message before calling the function:

my $error = "Error: Arg 3 is not a SCALAR reference" ;
my $ret = some_function( \@input, \@output, \$error ) ;

and then the function would simply do the argument checking and return the proper error value. When the caller received the error code (-3), it would already have the error message pre-populated. The catch is that you have to determine error status from the return value, not from whether the error message is non-empty.
I’ve never been totally happy with this: it was mostly ok, but with a wart.
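The calling sequence above only exercises the 0 and negative cases. Here is a rough sketch of the positive "number of errors" case as well, using a made-up check_config() function (not from any real library). As a variation on the convention, it assumes the messages are collected through an array reference instead of a scalar reference, so several can come back at once.

```perl
use strict ;
use warnings ;

# Hypothetical checker: every line must look like "key = value".
# Returns 0 if all is ok, N > 0 if N lines are bad, and a negative
# argument number if the function itself was called incorrectly.
sub check_config {
    my $lines_ref = shift ;
    my $errs_ref  = shift ;

    # Nowhere to put a message, so only the code goes back.
    return(-2) if ( ref( $errs_ref ) ne "ARRAY" ) ;

    if ( ref( $lines_ref ) ne "ARRAY" ) {
        push( @{$errs_ref}, "arg #1 (lines) is not an ARRAY ref" ) ;
        return(-1) ;
    }

    my $num_errs = 0 ;
    my $line_num = 0 ;
    foreach my $line ( @{$lines_ref} ) {
        $line_num++ ;
        next if ( $line =~ /^\s*\w+\s*=\s*\S/ ) ;
        push( @{$errs_ref}, "line $line_num: expected 'key = value'" ) ;
        $num_errs++ ;
    }
    return( $num_errs ) ;
}

my @config = ( "name = rj", "bogus line", "port = 80", "= 5" ) ;
my @errors = () ;
my $ret    = check_config( \@config, \@errors ) ;

if ( $ret < 0 ) {
    print STDERR "Fatal error: bad argument #", -$ret, "\n" ;
} elsif ( $ret ) {
    print STDERR "$ret error(s):\n" ;
    print STDERR "  $_\n" for ( @errors ) ;
} else {
    print "config is ok\n" ;
}
```

With the sample input, lines 2 and 4 fail the check, so the function returns 2 and both messages are reported together.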

Adding more arguments

As time went on, I’d find that I wanted to add more functionality to a function, which often required more inputs. The way I’d typically deal with this was to add a single new options associative array (hash) as the last argument. This way I would not break the existing API and I could continue to add more options. This was something I was also not totally happy with. Perhaps a better solution, at least for numerous scalar inputs, is to pass them all in a single associative array, which also lets the function assign defaults to the pre-defined options.
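A minimal sketch of the options-hash idea with defaults might look like the following. The function join_words() and its option names (separator, uppercase) are invented for illustration, and the return-code convention is left out to keep it short.

```perl
use strict ;
use warnings ;

# Hypothetical function: the scalar inputs arrive in one options hash.
sub join_words {
    my $words_ref = shift ;
    my $opts_ref  = shift || {} ;    # options are optional

    # defaults come first, caller-supplied options override them
    my %opts = (
        separator => " ",
        uppercase => 0,
        %{$opts_ref},
    ) ;

    my @words = @{$words_ref} ;
    @words = map { uc( $_ ) } @words if ( $opts{uppercase} ) ;
    return( join( $opts{separator}, @words ) ) ;
}

my @input = ( "this", "and", "that" ) ;

# prints "this and that"
print join_words( \@input ), "\n" ;

# prints "THIS-AND-THAT"
print join_words( \@input, { separator => "-", uppercase => 1 } ), "\n" ;
```

New options can be added later by giving them a default in %opts, and existing callers keep working unchanged.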

What about an errno?

I considered something like C’s errno: if the function returned an error indication, the caller would check some globally available value to see what the problem was. But unlike errno, which identifies a specific predefined error, I’d also want a globally available error message that I can set to whatever I want. And there are times I want an array of error messages available, not just a scalar. For example, if I create a config file and have several errors in it, I don’t want to find them out one at a time, correcting each error as it is reported. I think it would also be useful at times to have separate error codes available as well as the error messages themselves. But the biggest problem I have with this is polluting the global namespace with these error codes and messages. What to call them?

Using closures

To use the above idea of globally available error messages and error codes without polluting the namespace, assuming I am using libraries as opposed to modules, I considered closures. I tend to prefer using libraries (.pl files) as opposed to modules (.pm files) because I think they are simpler, and I can dynamically decide whether to load them, or which version to load, depending on the environment and other factors. If you don’t know what a closure is, I won’t explain them (much) here, and instead direct you to an explanation here and an example here.
But basically, you want to hide a variable, so here is an example of a simple set of debug routines, with a dprint() function that only prints the supplied message if an internal debug_flag value is set. You define a lexical variable (using my), put it and the routines needing access to it in a block, and make that block a BEGIN block so the variable is initialized at the beginning:

package Debug ;

use strict ;
use warnings ;

our $VERSION = '1.0';

BEGIN {
    # hide flag inside a closure. 

    my $debug_flag = 0 ;

    sub debug {
        return( $debug_flag ) ;
    }

    sub set_debug {
        my $old_value = $debug_flag ;
        $debug_flag = shift ;
        return( $old_value ) ;
    }

    sub dprint {
        my $msg = shift ;

        if ( $debug_flag ) {
            print STDERR "$msg\n" ;
        }
        return(0) ;
    }
}
1;

The $debug_flag is not available to anything other than the functions debug(), set_debug() and dprint(). Yes, I realize that is not an issue when using Object Oriented Programming and encapsulation. But for small simple programs, I prefer procedural programming.
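To see the hiding in action, here is a stripped-down sketch of the same pattern. The Counter package and its function names are made up for this example. It assumes everything lives in one file, so the package is switched back to main before the calls.

```perl
use strict ;
use warnings ;

package Counter ;    # hypothetical package, for illustration only

BEGIN {
    my $count = 0 ;  # lexical: only the two subs below can see it

    sub next_id {
        return( ++$count ) ;
    }

    sub reset_ids {
        my $old_value = $count ;
        $count = 0 ;
        return( $old_value ) ;
    }
}

package main ;

print Counter::next_id(), "\n" ;     # prints 1
print Counter::next_id(), "\n" ;     # prints 2
print Counter::reset_ids(), "\n" ;   # prints 2, the value before the reset
print Counter::next_id(), "\n" ;     # prints 1 again

# The lexical is not a package variable, so $Counter::count names
# a different (and never assigned) variable.
{
    no warnings 'once' ;
    print defined( $Counter::count ) ? "visible\n" : "hidden\n" ;   # prints hidden
}
```

The only way to read or change the counter from outside the block is through next_id() and reset_ids(), which is the same guarantee the Debug routines give for $debug_flag.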

Carrying this idea forward to error handling, you can hide a couple of arrays in an Error library:

package Error ;

use strict ;
use warnings ;

our $VERSION = '1.0';

BEGIN {
    # hide variables inside a closure.  These variables cannot 
    # be accessed by anyone except the routines defined here.

    my $error_status  = 0 ;   # number of errors recorded so far
    my @error_strs    = () ;  # error messages
    my @error_codes   = () ;  # error codes, parallel to @error_strs
    ...

In this case, I have parallel arrays, @error_strs and @error_codes. So if you accessed the 3rd error message, you’d want the 3rd error_code. An example of the calling sequence could be:

Debug::set_debug(1) ;  # turn on debugging

# check to see if error occurred
my $ret = some_func() ;

if ( $ret ) {
    my @errors = Error::get_errors() ;
    my @codes  = Error::get_error_codes() ;
    for ( my $i = 0 ; $i < @errors ; $i++ ) {
        # print, not printf: a '%' in a message must not be treated as a format
        print STDERR "$errors[$i] ($codes[$i])\n" ;
    }
} else {
    print "all is ok\n" ;
}

# let's say we corrected our errors...
Error::set_error() ;    # zero out errors

if ( Error::is_error() == 0 ) {
    print STDERR "All is ok now\n" ;
} else {
    print STDERR "We have errors!\n" ;
}

I’m still not sure what I think about the above strategy for handling errors, but the point of this blog entry was to show closures and how they might be useful. Granted, the namespace pollution is not as bad as I suggested, since I did create separate packages (Debug and Error). But without the packages, and without something like the closure above, these variables would have polluted your global namespace.

The actual Error.pl library that the above code calls is below:

package Error ;

use strict ;
use warnings ;

our $VERSION = '1.0';

BEGIN {
    # hide variables inside a closure.  These variables cannot 
    # be accessed by anyone except the routines defined here.

    my $error_status  = 0 ;   # number of errors recorded so far
    my @error_strs    = () ;  # error messages
    my @error_codes   = () ;  # error codes, parallel to @error_strs

    # return number of error messages waiting
    # So 0 means there are no errors

    sub is_error {
        return( $error_status ) ;   # number of errors
    }

    # return all error messages.  Do not affect stack

    sub get_errors {
        return( @error_strs ) ;
    }

    # return all error codes.  Do not affect stack
    # Probably most folks won't use codes, but it is there
    # if they want it.  It is a parallel array to the messages.

    sub get_error_codes {
        return( @error_codes ) ;
    }

    # zero out errors if no arguments given.
    # Otherwise, record an error message and an optional error code.
    # It's up to the program how to interpret the
    # program-supplied code.
    #   set_error( "error1" ) ;     # error - no code
    #   set_error( "error2", error_code2 ) ;
    #   set_error() ;               # clear out errors

    sub set_error {
        my $num_args = @_ ;

        if ( $num_args == 0 ) {
            @error_strs =  () ;      # zero out messages
            @error_codes = () ;      # zero out codes
            $error_status = 0 ;      # reset
            return(0) ;
        }

        # See if only 1 argument supplied.
        # If so, treat it as the message, and set code to 0.

        my $e_code = 0 ;
        my $e_str ;
        if ( $num_args == 1 ) {
            $e_str = shift ;
        } else {
            $e_str  = shift ;
            $e_code = shift ;
            # ignore anything else
        }

        push @error_codes, $e_code ;
        push @error_strs, $e_str ;
        $error_status++ ;
        return( $error_status ) ;
    }
}

1;
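The calling sequence earlier only shows the consumer side. As a rough sketch of the producer side, here is how a function like some_func() might record its errors. To keep the block self-contained it uses a trimmed, hypothetical stand-in package (MiniError) in place of the library above, and the check_ports() function and the error code 22 are invented for the example.

```perl
use strict ;
use warnings ;

package MiniError ;   # trimmed, hypothetical stand-in for the library above

BEGIN {
    my @error_strs  = () ;
    my @error_codes = () ;

    sub is_error        { return( scalar( @error_strs ) ) ; }
    sub get_errors      { return( @error_strs ) ; }
    sub get_error_codes { return( @error_codes ) ; }

    sub set_error {
        if ( @_ == 0 ) {             # no arguments: clear everything
            @error_strs  = () ;
            @error_codes = () ;
            return(0) ;
        }
        my ( $e_str, $e_code ) = @_ ;
        push( @error_strs,  $e_str ) ;
        push( @error_codes, defined( $e_code ) ? $e_code : 0 ) ;
        return( scalar( @error_strs ) ) ;
    }
}

package main ;

# Hypothetical producer: every port must be numeric.
# Returns the number of errors recorded, so 0 means ok.
sub check_ports {
    foreach my $port ( @_ ) {
        next if ( $port =~ /^\d+$/ ) ;
        MiniError::set_error( "bad port '$port'", 22 ) ;   # 22 is arbitrary
    }
    return( MiniError::is_error() ) ;
}

my $ret = check_ports( 80, "http", 443, "x1" ) ;

if ( $ret ) {
    my @errors = MiniError::get_errors() ;
    my @codes  = MiniError::get_error_codes() ;
    for ( my $i = 0 ; $i < @errors ; $i++ ) {
        print STDERR "$errors[$i] ($codes[$i])\n" ;
    }
}

MiniError::set_error() ;    # zero out errors before the next call
```

Note that the producer never passes an error reference around. It just calls set_error() as many times as it needs, and the caller collects everything afterwards.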

About

RJ is a freelance consultant living in Toronto specializing in software development and systems administration on Unix/Linux systems.
