Chapter 18

References in Perl 5

by Kamran Husain



Today's lesson describes the use of Perl references and the concept of pointers. It also shows you how to use references to create complex data structures, pass pointers around, and work with subroutines.

Introduction to References

A reference is simply a pointer to something, such as a Perl variable, array, hash (also known as an associative array), or even a subroutine. The concept of a reference is probably familiar to Pascal or C programmers. A reference is simply an address to a value. How you use that value is up to you as the programmer and what the language lets you get away with. In Perl, you can refer to a pointer as a reference; in fact, you can use the terms pointer and reference interchangeably without any loss of meaning.

References are useful in creating complex data structures in Perl. In fact, you cannot really define any complicated structures in Perl without using references.

The two types of references in Perl 5 are hard and symbolic. A symbolic reference contains the name of a variable. Symbolic references are useful for creating variable names and addressing them at runtime. Basically, a symbolic reference is like the name of a file or a soft link on a UNIX system. Hard references are more like hard links in the file system (that is, merely another path to the same underlying item).
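As a minimal sketch of the distinction (the variable names here are made up for illustration), the following script reaches the same value through a hard reference and through a symbolic one. It assumes a package variable, because symbolic references look names up in the symbol table and cannot see variables declared with my.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A package (global) variable; symbolic references cannot find 'my' variables.
our $flavor = "vanilla";

my $hard     = \$flavor;   # hard reference: the address of $flavor
my $symbolic = "flavor";   # symbolic reference: only the NAME of the variable

my ($via_hard, $via_symbolic);
{
    no strict 'refs';            # symbolic de-referencing is an error under strict
    $via_hard     = $$hard;      # follows the address
    $via_symbolic = $$symbolic;  # looks up $main::flavor by name
}

print "hard: $via_hard, symbolic: $via_symbolic\n";
```

Both lookups print vanilla. The no strict 'refs' block is there because most modern code runs under use strict, which rejects symbolic references unless you ask for them.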

Perl 4 permits only symbolic references, which are difficult to use. For example, in Perl 4, you have to use names to index to an associative array called _main{} of symbol names for a package. Perl 5 now lets you have hard references to data.

Hard references keep track of reference counts. When the reference count becomes zero, Perl automatically frees the item referred to. If that item happens to be a Perl object, the object is destructed and its memory is freed back to the memory pool. Perl 5 supports object-oriented programming directly, and packages and modules make it much easier to use objects in Perl.

Hard references are easy to use in Perl as long as you use them as scalars. To use hard references as anything but scalars, you have to explicitly de-reference the variable and tell it how you want it to behave. If this sounds confusing, don't worry; references are covered on Day 19, "Object-Oriented Programming in Perl," to help make this concept clearer.

Using References

In today's lesson, a scalar value refers to a variable such as $pointer. The variable $pointer contains one data item; whether the item is a number, string, or an address is determined by how you use it.

Any scalar can hold a hard reference, and because arrays and hashes do contain scalars, it follows that you can now easily build complex data structures of different combinations of arrays of arrays, arrays of hashes, hashes of functions, and so on. As long as you understand that you are working only with scalars, you should be able to navigate through the most complex structures with proper dereferencing.

Let's cover some of the basics first before we get too deep into the chapter.

To use the value of $pointer as the pointer to an array, you reference the items in the array as @$pointer. This notation of "@$pointer" roughly translates to "take the address in $pointer and then use it as an array." Similarly for hashes, you would use %$pointer to refer to the entire hash that $pointer points to.
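The two notations can be sketched as follows. The arrays, hashes, and variable names are invented for this example and do not come from the chapter's listings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @colors  = ('red', 'green', 'blue');
my %size_of = (small => 1, large => 10);

my $array_ptr = \@colors;    # reference to the whole array
my $hash_ptr  = \%size_of;   # reference to the whole hash

my $count = scalar @$array_ptr;       # use the reference as an array
my @keys  = sort keys %$hash_ptr;     # use the reference as a hash

print "colors: @$array_ptr\n";
print "count: $count, keys: @keys\n";
```

The script prints the three colors, then a count of 3 and the keys large and small in sorted order.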

Because there are several ways to construct references, you can have references to just about anything, such as arrays, scalar variables, subroutines, file handles, and (yes, to the delight of C programmers) even other references. Perl gives you the power to write code complicated enough to hang yourself.
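A reference to a reference can be sketched as shown here. The built-in ref() function reports what kind of thing a reference points to; the variable names are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $value      = 42;
my $ptr        = \$value;   # reference to a scalar
my $ptr_to_ptr = \$ptr;     # reference to a reference

my $kind_one = ref($ptr);          # "SCALAR"
my $kind_two = ref($ptr_to_ptr);   # "REF"
my $fetched  = $$$ptr_to_ptr;      # de-reference twice to reach 42

print "$kind_one $kind_two $fetched\n";
```

Each extra $ peels off one level of indirection, much as each extra * does in C.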

Now look at some of the ways that you can create and use references in Perl.

Using the Backslash Operator

Using the backslash operator is analogous to using the ampersand (&) operator in C to pass the address of a variable. Usually, you use the backslash operator to create a second, new reference to a variable. The following code shows how to create a reference to a scalar variable:


$variable = 22;
$pointer = \$variable;
$ice = "jello";
$iceptr = \$ice;

$pointer points to the location that contains the value of $variable. The pointer $iceptr points to "jello". Even if the original variable $variable goes out of scope, you can still access the value from the $pointer reference. There is a hard reference at work here, so you will have to get rid of both $pointer and $variable for the space in which 22 is allocated to be freed back to the memory pool.

In the preceding code, the variable $pointer contains the address of $variable, not the value itself. To get the value, you have to de-reference $pointer by writing it with two dollar signs, as $$pointer. The following sample script shows how this works:


#!/usr/bin/perl

$value = 10;
$pointer = \$value;
printf "\n Pointer Address $pointer of  $value \n";
printf "\n What Pointer *($pointer) points to $$pointer\n";

The $value in the script is set to 10. The $pointer is set to point to the address of $value. The two printf statements show how the value of the variable is referenced. If you run the script shown, you see something very close to the following output:


Pointer Address SCALAR(0x806c520) of  10



What Pointer *(SCALAR(0x806c520)) points to 10

The address in the output from your script will probably be different from what's shown. However, you can see that $pointer gave the address and $$pointer gave the value of the scalar that $pointer points to.

Pay attention to how the address is shown in the pointer variable. The word SCALAR is followed by a long hexadecimal number. The word SCALAR tells you that the address points to a scalar variable. The number following SCALAR is the address where the actual value of the scalar variable is kept.

NOTE
A pointer is an address. The data at that address is referred to by a pointer. If the pointer happens to point to an invalid address, you can get bad data. Generally, Perl will simply return an undefined value (undef), but you should not rely on this, and should program to initialize all your pointers to refer to valid data items.

References and Arrays

Perhaps the most important point you must remember about Perl is that all Perl @ARRAYs and %HASHes are always one-dimensional. As such, the arrays and hashes hold scalar values only and do not directly contain other arrays or complex data structures. A member of an array is either a number, a string, or a reference.

You can use the backslash operator on arrays and hashes just as you would for scalar variables. You would use something like Listing 18.1 for arrays.


Listing 18.1. Using the backslash operator on arrays.

1  #!/usr/bin/perl
2  #
3  # Using Array references
4  #
5  $pointer = \@ARGV;
6  printf "\n Pointer Address of ARGV = $pointer\n";
7  $i = scalar(@$pointer);
8  printf "\n Number of arguments : $i \n";
9  $i = 0;
10 foreach (@$pointer) {
11     printf "$i : $$pointer[$i++]; \n";
12 }



$ test 1 2 3 4



 Pointer Address of ARGV = ARRAY(0x806c378)



 Number of arguments : 4

0 : 1;

1 : 2;

2 : 3;

3 : 4;

Examine the lines that pertain to references in the Perl script shown, which prints the contents of the input argument array @ARGV. Line 5 is where the reference $pointer is set to point to the array @ARGV. Line 6 simply prints the address of ARGV. You probably will never have to use the address of ARGV, but had you been using another array, this is a quick way to get to the address of the array.

NOTE
In this chapter, the terms pointer and reference are used interchangeably.

The $pointer holds the address of an array. In Listing 18.1, the array happened to be @ARGV. A pointer to an array should sound familiar to C programmers, who can think of it roughly as a pointer to the first element of a one-dimensional array, although a Perl reference refers to the array as a whole and does not support pointer arithmetic.

Line 7 calls the function scalar() (not to be confused with the type of variable scalar) to get the count of the number of elements in an array. The parameter passed in could be @ARGV, but with the pointer $pointer, you must specify the type of parameter that is expected by the scalar() function. Therefore, you specify the type of parameter as an array by using @$pointer.

The type of $pointer in this case is a pointer to the array whose number of elements you must return from the scalar() function. The call to the function has @$pointer as the passed parameter. The $pointer gives the address of the array, and the @ sign de-references that address so that the array itself is passed.

Line 10 contains the same reference to the array that line 7 contains. Line 11 lists all the elements of the array using the $$pointer[$i] item. How do you interpret this? The $pointer points to the array. The program gets the item at index $i in that array ($$pointer[$i++]) and returns it as a scalar. Because the postfix autoincrement operator returns the old value of $i before incrementing it, $i is incremented only after it has been used as the index.
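If you would rather not increment the index inside a string, one alternative sketch is to loop over the index values directly. The array contents and the @lines variable are assumptions made for this example; $#$pointer gives the last valid index of the array that $pointer refers to.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @args    = ('alpha', 'beta', 'gamma');
my $pointer = \@args;

my @lines;
for my $i (0 .. $#$pointer) {
    # $pointer->[$i] is the same element as $$pointer[$i]
    push @lines, "$i : $pointer->[$i]";
}

print "$_\n" for @lines;
```

This prints one line per element, pairing each index with its value, and leaves no doubt about when the index changes.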

You can also use the backslash operator with associative arrays. The idea is the same: you are substituting the $pointer for all references to the name of the associative array. The number following the word ARRAY in the pointer address of ARGV in the previous example is the address of ARGV. The address itself won't do you any good, because most programs do not need this information, but just realize that references to arrays and scalars are displayed with the type that they happen to be pointing to.

For pointers to functions, the address is printed with the word CODE, and for a hash, it is printed as HASH. See Listing 18.2 for an example of how to print out an address to a hash.


Listing 18.2. Using references to a hash.

#!/usr/bin/perl
1  #
2  # Using Associative Array references
3  #
4  %month = (
5      '01', 'Jan',
6      '02', 'Feb',
7      '03', 'Mar',
8      '04', 'Apr',
9      '05', 'May',
10     '06', 'Jun',
11     '07', 'Jul',
12     '08', 'Aug',
13     '09', 'Sep',
14     '10', 'Oct',
15     '11', 'Nov',
16     '12', 'Dec',
17     );
18
19 $pointer = \%month;
20
21 printf "\n Address of hash = $pointer\n ";
22
23 #
24 # The following lines would be used to print out the
25 # contents of the associative array if %month was used.
26 #
27 # foreach $i (sort keys %month) {
28 # printf "\n $i $month{$i} ";
29 # }
30
31 #
32 # The reference to the associative array via $pointer
33 #
34 foreach $i (sort keys %$pointer) {
35     printf "$i is $$pointer{$i} \n";
36 }



$ mth



 Address of hash = HASH(0x806c52c)



 01 is Jan

 02 is Feb

 03 is Mar

 04 is Apr

 05 is May

 06 is Jun

 07 is Jul

 08 is Aug

 09 is Sep

 10 is Oct

 11 is Nov

 12 is Dec

The reference to the associative array is made with the code in line 19, $pointer = \%month;. As with ordinary arrays, the references to the elements of the array are made with the $$pointer{$index} construct. Of course, because the array is really a hash, the $index is the key into the hash and not a number. See lines 34 and 35 to see how elements in the array are being referenced.

You don't have to construct associative arrays using the comma operator. You can use the => operator instead. In the later Perl module and sample code in this chapter, you will see the => operator, which is the same as the comma operator. Using => makes the code a bit easier to read. See Listing 18.3 for a sample usage of the => operator.


Listing 18.3. Using the => operator.

1  #!/usr/bin/perl
2  #
3  # Using the => operator with hash references
4  #
5  %weekday = (
6      '01' => 'Mon',
7      '02' => 'Tue',
8      '03' => 'Wed',
9      '04' => 'Thu',
10     '05' => 'Fri',
11     '06' => 'Sat',
12     '07' => 'Sun',
13     );
14 $pointer = \%weekday;
15 $i = '05';
16 printf "\n ================== start test ================= \n";
17 #
18 # These next three pairs of lines should show an output
19 #
20     printf '$$pointer{$i} is ';
21     printf "$$pointer{$i} \n";
22     printf '${$pointer}{$i} is ';
23     printf "${$pointer}{$i} \n";
24     printf '$pointer->{$i} is ';
25
26     printf "$pointer->{$i}\n";
27 #
28 # These next two pairs of lines should not show anything
29 #
30     printf '${$pointer{$i}} is ';
31     printf "${$pointer{$i}} \n";
32     printf '${$pointer->{$i}} is ';
33     printf "${$pointer->{$i}}";
34 printf "\n ================== end of test ================= \n";



================== start test =================


$$pointer{$i} is Fri

${$pointer}{$i} is Fri

$pointer->{$i} is Fri

${$pointer{$i}} is

${$pointer->{$i}} is

 ================== end of test =================

As you can see, the first three lines provided the expected output. The first reference is used in the same way as references to regular arrays. The second line uses ${$pointer} and then indexes using {$i}, and the leftmost $ de-references (gets) the value at the location reached after the indexing. The third line uses the -> operator to do the same thing. See lines 20 through 26.

NOTE
When in doubt, print it out. Always use print statements in Perl to print out the values of suspect code. This way you can be sure of how Perl is interpreting your code. Print statements are a cheap tool to use for learning how the Perl interpreter works.

Then, two lines of the output didn't work as expected. In the fourth line, ${$pointer{$i}} looks up the key $i in a hash named %pointer, which does not exist, and then tries to de-reference the result. Because there is no such hash element, nothing is printed. In the fifth line, ${$pointer->{$i}} fetches the string Fri and then tries to use that string as a reference; it does not lead to a variable that holds a value, so again nothing is printed. See lines 30 through 33.

Multidimensional Arrays

You create an array through the statement @array = list. You use square brackets to create a reference to an anonymous array, which can in turn contain references to other anonymous arrays. Consider the following statement, which sets the parameters for a three-dimensional drawing program:


$line = ['solid', 'black', ['1','2','3'] , ['4', '5', '6']];

The preceding statement constructs an array of four elements. The array is referred to by the scalar $line. The first two elements are scalars, indicating the type and color of the line to draw. The next two elements are references to anonymous arrays and contain the starting and ending points of the line.

To get to the elements of the inner array elements, you can use the following multidimensional syntax:
$arrayReference->[$index]                        single-dimensional array
$arrayReference->[$index1][$index2]              two-dimensional array
$arrayReference->[$index1][$index2][$index3]     three-dimensional array

You can create as complex a structure as your sanity, design practices, and computer memory allow. Be kind to the person who might have to manage your code: please keep it as simple as possible. On the other hand, if you are just trying to impress someone with your coding ability, Perl gives you a lot of opportunity to mystify yourself and improve your social life.

TIP
When you have more than three dimensions for any array, consider using a different data structure to simplify the code.

Let's see how creating arrays within arrays works in practice. See Listing 18.4 to see how to print out the information pointed at by the $line reference.


Listing 18.4. Using multi-dimensional array references.

#!/usr/bin/perl
#
# Using Multi-dimensional Array references
#
$line = ['solid', 'black', ['1','2','3'] , ['4', '5', '6']];
print "\$line->[0] = $line->[0] \n";
print "\$line->[1] = $line->[1] \n";
print "\$line->[2][0] = $line->[2][0] \n";
print "\$line->[2][1] = $line->[2][1] \n";
print "\$line->[2][2] = $line->[2][2] \n";
print "\$line->[3][0] = $line->[3][0] \n";
print "\$line->[3][1] = $line->[3][1] \n";
print "\$line->[3][2] = $line->[3][2] \n";
print "\n"; # The obligatory output beautifier.



$line->[0] = solid

$line->[1] = black

$line->[2][0] = 1

$line->[2][1] = 2

$line->[2][2] = 3

$line->[3][0] = 4

$line->[3][1] = 5

$line->[3][2] = 6

What about the third dimension for an array? Look at a modified version of the same program but add a new twist to the list just created. See Listing 18.5.


Listing 18.5. Using multi-dimensional array references again.

#!/usr/bin/perl
#
# Using Multi-dimensional Array references again
#
$line = ['solid', 'black', ['1','2','3', ['4', '5', '6']]];
print "\$line->[0] = $line->[0] \n";
print "\$line->[1] = $line->[1] \n";
print "\$line->[2][0] = $line->[2][0] \n";
print "\$line->[2][1] = $line->[2][1] \n";
print "\$line->[2][2] = $line->[2][2] \n";
print "\$line->[2][3][0] = $line->[2][3][0] \n";
print "\$line->[2][3][1] = $line->[2][3][1] \n";
print "\$line->[2][3][2] = $line->[2][3][2] \n";
print "\n";


The output of this listing is not shown.

In this example of an array that's three deep, you must use a reference such as $line->[2][3][0]. For a C programmer, this is akin to the statement Array_pointer[2][3][0], where the pointer is pointing to what's declared as an array with three indices.

Can you see how easy it is to set up complex structures of arrays within arrays? The examples shown thus far have used only hard-coded numbers as the indices. There is nothing preventing you from using variables instead.

As with array constructors, you can mix and match hashes and arrays to create as complex a structure as you want.

Let's see how hashes and arrays can be combined. Listing 18.6 uses the point numbers and coordinates to define a cube.


Listing 18.6. Defining a cube.

#!/usr/bin/perl
#
# Using Multi-dimensional Array and Hash references
#
%cube = (
    '0', ['0', '0', '0'],
    '1', ['0', '0', '1'],
    '2', ['0', '1', '0'],
    '3', ['0', '1', '1'],
    '4', ['1', '0', '0'],
    '5', ['1', '0', '1'],
    '6', ['1', '1', '0'],
    '7', ['1', '1', '1']
    );
$pointer = \%cube;
print "\n Da Cube \n";
foreach $i (sort keys %$pointer) {
    $list = $$pointer{$i};
    $x = $list->[0];
    $y = $list->[1];
    $z = $list->[2];
    printf " Point $i =  $x,$y,$z \n";
}


The output of this listing is not shown.

In Listing 18.6, %cube contains point numbers and coordinates in a hash. Each coordinate itself is an array of three numbers. The $list variable is used to get a reference to each coordinate definition with the following statement:


$list = $$pointer{$i};

After you get the list, you can de-reference it to get to each element in the list with statements such as the following:


$x = $list->[0];
$y = $list->[1];
$z = $list->[2];

The same result (assigning values to $x, $y, and $z) could be achieved with a single list assignment:


($x,$y,$z) = @$list;

This works because you are de-referencing what $list points to and using it as an array, which in turn is assigned to the list ($x,$y,$z).

When you're working with hashes or arrays, de-referencing by -> is similar to de-referencing by $. When you are accessing individual array elements, you are often faced with writing statements such as the following:


$$names[0] = "Kamran";

$names->[0] = "Kamran";

Both lines are equivalent. The leading $ that de-references $names in the first line has been replaced with the -> operator in the second line. In the case of hashes, the two statements that do the same type of referencing are listed as shown in the following code:


$$lastnames{"Kamran"} = "Husain";

$lastnames->{"Kamran"} = "Husain";

Array references are created automatically when they are first referenced on the left side of an assignment. Using a reference such as $array[$i] creates an array into which you can index with $i. Scalars and even multidimensional arrays are created the same way. The following statement creates the contours array if it did not already exist:


$contours[$x][$y][$z] = &xlate($mouseX,$mouseY);

Arrays in Perl can be created and grown on demand. Referencing them for the first time creates the array. Referencing them again at different indices creates the referenced elements for you.
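This on-demand creation can be sketched as follows. The indices and the stored value are arbitrary choices for the example; the point is that a single assignment builds every intermediate array it needs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @contours;                      # starts out empty

# One assignment creates the outer array and both inner arrays.
$contours[1][2][3] = 'peak';

my $rows  = scalar @contours;             # outer array now has indices 0 and 1
my $kind  = ref $contours[1];             # the inner element is an array reference
my $depth = scalar @{ $contours[1][2] };  # innermost array has indices 0 through 3

print "rows=$rows kind=$kind depth=$depth\n";
```

The slots that were skipped over, such as $contours[0], exist but hold undefined values.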

References to Subroutines

In the same way you reference individual items such as arrays and scalar variables, you can also point to subroutines. This is similar to pointing to a function in C. To construct such a reference, you use the following type of statement:


$pointer_to_sub = sub { ... declaration of sub ... } ;

Notice the use of the semicolon at the end of the sub declaration. The subroutine pointed to by $pointer_to_sub points to the same function reference even if this statement is placed in a loop. This feature of Perl enables you to declare anonymous sub() functions in a loop without worrying about whether you are chewing up memory by declaring the same function over and over.

To call a subroutine by reference, you must use the following type of reference:


&$pointer_to_sub( parameters );

This code works because you are de-referencing the $pointer_to_sub and using it with the ampersand (&) as a pointer to a function. The parameters portion might or might not be empty depending on how your function is defined.
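As a small sketch, here is an anonymous subroutine called both with the ampersand form described above and with the arrow form, which does the same job. The subroutine and variable names are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# An anonymous subroutine; note the semicolon that ends the assignment.
my $add = sub {
    my ($x, $y) = @_;
    return $x + $y;
};

my $with_ampersand = &$add(2, 3);    # de-reference with &
my $with_arrow     = $add->(2, 3);   # equivalent arrow syntax

print "$with_ampersand $with_arrow\n";
```

Both calls return 5. The arrow form is often easier to read when the reference is stored inside an array or hash.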

The code within a sub is simply a declaration. The code within the sub is not executed immediately, however; it is compiled once and then run each time the subroutine is called. Consider Listing 18.7.


Listing 18.7. References to subroutines.

#!/usr/bin/perl
sub print_coor {
    my ($x,$y,$z) = @_;
    print "$x $y $z \n";
    return $x;
}
$k = 1;
$j = 2;
$m = 3;
$this  = print_coor($k,$j,$m);
$that  = print_coor(4,5,6);



$ test

1 2 3

4 5 6

This output reflects that $x, $y, and $z are assigned each time print_coor is called, not when it is declared. In Listing 18.7, $this and $that hold the values returned by the two calls (the first argument of each call), and the arguments were passed at run time.
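Listing 18.7 calls print_coor by name. As a sketch of calling the same kind of subroutine through a reference instead, you can take a reference to a named subroutine with \&. The name $coor_ref is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub print_coor {
    my ($x, $y, $z) = @_;
    print "$x $y $z\n";
    return $x;
}

my $coor_ref = \&print_coor;       # reference to a named subroutine

my $this = &$coor_ref(1, 2, 3);    # call through the reference
my $that = $coor_ref->(4, 5, 6);   # same call, arrow syntax
```

Here $this receives 1 and $that receives 4, because print_coor returns its first argument.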

Using Subroutine Templates

Subroutines are not limited to returning data types only; they can also return references to other subroutines. The returned subroutines run in the context of the calling routine but are set up in the original call that created them. This behavior is due to the way closure is handled in Perl. Closure means that if you define a function in one context, it runs in that particular context where it was first defined. (See a book on object-oriented programming to get more information on closure.)

For an example of how closure works, Listing 18.8 shows code that you could use to set up different types of error messages. Such subroutines are useful in creating templates of all error messages.


Listing 18.8. Using closures.

#!/usr/bin/perl

sub errorMsg {
    my $lvl = shift;
    #
    # define the subroutine to run when called.
    #
    return sub {
        my $msg = shift;  # Define the error type now.
        print "Err Level $lvl:$msg\n";  # print later.
    };
}

$severe = errorMsg("Severe");
$fatal  = errorMsg("Fatal");
$annoy  = errorMsg("Annoying");

&$severe("Divide by zero");
&$fatal("Did you forget to use a semi-colon?");
&$annoy("Uninitialized variable in use");

The subroutine errorMsg declared here uses a local variable called $lvl. After this declaration, errorMsg uses $lvl in the subroutine it returns to the caller. The value of $lvl is therefore set in the context when the subroutine errorMsg is first called, even though the keyword my is used. The three calls that follow set up three different $lvl variable values, each in their own context:


$severe  = errorMsg("Severe");

$fatal   = errorMsg("Fatal");

$annoy   = errorMsg("Annoying");

When the subroutine, errorMsg, returns, the value of $lvl is retained for each context in which $lvl was declared. The $msg value from the referenced call is used, but the value of $lvl remains what was first set in the actual creation of the function.
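Another way to see that each returned subroutine keeps its own copy of the captured variable is a counter generator. This is a sketch with invented names (make_counter, $from_one, $from_ten), not code from the chapter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub make_counter {
    my $count = shift;             # private to this call of make_counter
    return sub {
        return $count++;           # the closure remembers and updates $count
    };
}

my $from_one = make_counter(1);
my $from_ten = make_counter(10);

# Each closure advances its own $count, independently of the other.
my @seen = ($from_one->(), $from_one->(), $from_ten->(), $from_one->());

print "@seen\n";
```

The script prints 1 2 10 3: the calls through $from_ten do not disturb the count held by $from_one.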

Sounds confusing? It is. This is primarily the reason you do not see such code in most Perl programs.

Using Subroutines to Work with Multiple Arrays

Using arrays is great for collecting relevant information in one place. Now let's see how we can work with multiple arrays through subroutines. You pass one or more arrays into Perl subroutines by reference. However, you have to keep in mind a few subtle things about using the @_ symbol when you process these arrays in the subroutine. Look at Listing 18.9, which is an example of a subroutine that expects a list of names and a list of phone numbers.


Listing 18.9. Passing multiple arrays.

#!/usr/bin/perl
@names = ('mickey', 'goofy', 'daffy');
@phones = (5551234, 5554321, 666);
$i = 0;
sub listem {
    my (@a,@b) = @_;
    foreach (@a) {
        print "a[$i] = ". $a[$i] . " " . "\tb[$i] = " . $b[$i] ."\n";
        $i++;
    }
}
&listem(@names, @phones);



a[0] = mickey     b[0] =

a[1] = goofy      b[1] =

a[2] = daffy      b[2] =

a[3] = 5551234    b[3] =

a[4] = 5554321    b[4] =

a[5] = 666        b[5] =

Whoa! What happened to the @b array, and why does the rest of @a contain the elements of @phones? This result occurs because the array @_ of parameters in a subroutine is one (I repeat, only one) long list of parameters. If you pass in fifty arrays, the @_ is one array of all the elements of the fifty arrays concatenated together.

In the subroutine in Listing 18.9, the assignment my (@a, @b) = @_ gets loosely interpreted by your Perl interpreter as, "Let's see, @a is an array, so assign one array from @_ to @a and then assign everything else to @b." Never mind that the @_ is itself an array and will therefore get assigned to @a, leaving nothing to assign to @b.

To illustrate this point, let's change the script to how it appears in Listing 18.10.


Listing 18.10. Passing a scalar and an array.

#!/usr/bin/perl
@names = ('mickey', 'goofy', 'daffy');
@phones = (5551234, 5554321, 666);
$i = 0;
sub listem {
    my ($a,@b) = @_;
    print " \$a is " . $a . "\n";
    foreach (@b) {
        print "b[$i] = $b[$i] \n";
        $i++;
    }
    # --------------------------------------------------
    # Actually, you could write the for loop as
    # foreach (@b) {
    #   print $_ . "\n" ;
    #   }
    # This is your secret answer to Quiz question 18.4.
    # ----------------------------------------------------
}

&listem(@names, @phones);



$ testArray



 $a is mickey

b[0] = goofy

b[1] = daffy

b[2] = 5551234

b[3] = 5554321

b[4] = 666

Do you see how $a was assigned the first value and then @b was assigned the rest of the values? In order to get around this @_ interpretation feature and pass arrays into subroutines, you have to pass arrays in by reference, which you do by modifying the script to look like the following:


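#!/usr/bin/perl

@names = ('mickey', 'goofy', 'daffy');
@phones = (5551234, 5554321, 666);
$i = 0;
sub listem {
    my ($a,$b) = @_;
    foreach (@$a) {
        print "a[$i] = " . $$a[$i] . " " . "\tb[$i] = " . $$b[$i] ."\n";
        $i++;
    }
}

&listem(\@names, \@phones);

The following major changes were necessary to bring the original script to this point: the arrays are passed by reference, using the backslash operator in the call to listem; the subroutine receives the two references in the scalars $a and $b instead of in arrays; and each element is reached by de-referencing, as in $$a[$i] and $$b[$i].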

The following output matches what we expected:


$ testArray2

a[0] = mickey     b[0] = 5551234

a[1] = goofy      b[1] = 5554321

a[2] = daffy      b[2] = 666

DO pass by reference whenever possible.
DO pass arrays by reference when you are passing more than one array to a subroutine.
DON'T use (@variable)=@_ in a subroutine unless you want to concatenate all the passed parameters into one long array.
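As a sketch of the second DO, the following subroutine takes two array references, keeps the arrays separate, and hands back a reference to a hash that pairs them up. The names pair_up, %phone_for, and $book are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Takes two array references and returns a reference to a hash
# that maps each name to the phone number at the same index.
sub pair_up {
    my ($names_ref, $phones_ref) = @_;
    my %phone_for;
    for my $i (0 .. $#$names_ref) {
        $phone_for{ $names_ref->[$i] } = $phones_ref->[$i];
    }
    return \%phone_for;
}

my @names  = ('mickey', 'goofy', 'daffy');
my @phones = (5551234, 5554321, 666);

my $book = pair_up(\@names, \@phones);
print "$_ => $book->{$_}\n" for sort keys %$book;
```

Because only two scalars travel through @_, the subroutine always knows where one array ends and the other begins.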

Pass By Value or By Reference?

When used in a subroutine argument list, scalar variables are always passed by reference. You do not have a choice here. As a result, you can modify the values of these variables in the caller if you really want to. To access these variables, you can use the @_ array and index each individual element in it using $_[$index], where $index counts from zero up.

Arrays and hashes are different beasts altogether. You can either pass them as references once or pass references to each element in the array. For long arrays, the choice should be fairly obvious: pass the reference to the array only. In either case, you can use the references to modify what you want in the original array.
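The difference between working on $_[0] directly and working on a copy can be sketched like this. The subroutine names and the $price variable are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Works on the caller's variable, because $_[0] is an alias for it.
sub double_in_place {
    $_[0] *= 2;
    return;
}

# Works on a private copy, so the caller's variable is untouched.
sub double_copy {
    my ($n) = @_;
    $n *= 2;
    return $n;
}

my $price = 21;

my $doubled    = double_copy($price);
my $after_copy = $price;              # still 21

double_in_place($price);              # $price is now 42

print "copy returned $doubled, original was $after_copy, now $price\n";
```

Copying @_ into my variables at the top of a subroutine is the usual way to avoid changing the caller's data by accident.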

The @_ mechanism concatenates all the input arrays in a subroutine into one long array. This feature is nice if you do want to process the incoming arrays as one long array. Usually, you want to keep the arrays separate when you process them in a subroutine, and passing by reference is the best way to do that. Hold that thought: Don't use globals.

In short, pass by reference and respect the value of any global variable unless there is a strong compelling reason not to.

References to File Handles

Sometimes, you have to write the same output to different output files. For example, an application programmer might want the output to go to the screen in one instance, the printer in another, and a file in another-or even all three at the same time. Rather than make separate statements for each handle, it would be nice to write something like the following:


spitOut(\*STDOUT);
spitOut(\*LPHANDLE);
spitOut(\*LOGHANDLE);

Notice that the file handle reference is sent with the \*FILEHANDLE syntax because you refer to the symbol table in the current package. In the subroutine that handles the output to the file handle, you would have code that looks something like the following:


sub spitOut {

    my $fh = shift;

    print $fh "Gee Wilbur, I like this lettuce\n";

}
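A variation on this idea, sketched under a few assumptions: the subroutine is renamed spit_out and takes the text as a second argument, and one of the handles is a lexical file handle opened on an in-memory string (which requires a Perl built with PerlIO, as modern versions are) so that the result can be inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Writes $text to whatever file handle it is given.
sub spit_out {
    my ($fh, $text) = @_;
    print {$fh} $text;     # braces make it clear that $fh is the handle
}

# A lexical file handle that writes into a string instead of a file.
my $buffer = '';
open(my $mem, '>', \$buffer) or die "Cannot open in-memory file: $!";
spit_out($mem, "to a lexical handle\n");
close $mem;

# A reference to a typeglob, as in the chapter's example.
spit_out(\*STDOUT, "to a typeglob reference\n");
```

After the script runs, $buffer holds the first message and the second message has gone to the screen. Lexical handles such as $mem can be passed around like any other scalar, so they need no backslash or asterisk.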

What Does the *variable Operator Do?

In UNIX (and other operating systems), the asterisk is a sort of wildcard operator. In Perl, you can refer to other variables and so on by using the asterisk operator:


*iceCream;

When used in this manner, the asterisk is also known as a typeglob. The asterisk at the beginning of a term can be thought of as a wildcard match for all the mangled names generated internally by Perl.

You can use a typeglob in the same way you use a reference because the de-reference syntax always indicates the kind of reference you want. ${*iceCream} and ${\$iceCream} both indicate the same scalar variable. Basically, *iceCream refers to the entry in the internal %main:: hash of all symbol names for the main package. *kamran really translates to $main::{'kamran'} if you are in the main package context. If you are in another package, that package's %packageName:: hash is used.

When evaluated, a typeglob produces a scalar value that represents all the objects of that name. This includes file handles, format specifiers, and subroutines.
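One common use of a typeglob is to make one name an alias for another. The sketch below assumes package variables declared with our, and the name dessert is invented for the example; assigning one glob to another makes every kind of variable with that name shared.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $iceCream = "pistachio";
our @iceCream = ('cone', 'cup');
our ($dessert, @dessert);

# Make every "dessert" variable an alias for the matching "iceCream" one.
*dessert = *iceCream;

my $scalar_alias = $dessert;       # reads $iceCream
my $array_alias  = "@dessert";     # reads @iceCream

$dessert = "mint";                 # writes through to $iceCream

print "scalar: $scalar_alias, array: $array_alias\n";
print "original is now: $iceCream\n";
print "via the glob: ", ${*iceCream}, "\n";
```

The last two lines both print mint, because $dessert and $iceCream are now two names for the same scalar.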

Using Symbolic References… Again

Using curly braces around variable names makes constructing strings easier:


$road = ($w) ? "free" : "high";

print "${road}way";

The preceding line prints highway or freeway depending on the value of $w. This syntax will be familiar to you if you write make files or shell scripts. In fact, you can use this ${variable} construct outside of double quotes, as in the following example:


print ${road};

print ${road} . "way";

print ${ road } . "way";

You can also use reserved words in the ${ } brackets. Check out the following lines:


$if = "road";

print "\n ${if} way \n";

Using reserved words for anything other than their intended purpose, however, is playing with fire. Be imaginative and make up your own variables. Inside the braces, a lone word is taken as a variable name; if you want it interpreted as a reserved word instead, you have to remember to force that by adding anything that makes it more than a bare word. It's generally not a good idea to use a variable called ${while}, because it is confusing to read.

When you work with hashes, remember that hash keys are always strings. If you use a reference as a key, as in the following line, the reference is converted to a string:


$clients { \$credit } = "despicable" ;

The \$credit reference is stored as a string such as SCALAR(0x...), and that string cannot be dereferenced later to get back to the original variable. If you need the reference itself, keep it in a variable and store it as the value in the hash as well, using a two-step procedure such as this:


$chist = \@credit;

$x{ $chist } = $chist;
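A short sketch shows what happens to a reference that is used as a hash key. The contents of @credit are invented for the example, and the reference is stored as the hash value as well so that it can still be followed afterward.

```perl
#!/usr/bin/perl
# Sketch: a reference used as a hash key is turned into a string.
# The numbers in @credit are illustrative only.

my @credit = (300, 450);
my $chist  = \@credit;

my %x;
$x{$chist} = $chist;    # key: string form; value: the real reference

my ($key) = keys %x;
my $key_is_ref   = ref($key)     ? 1 : 0;   # 0, because keys are plain strings
my $value_is_ref = ref($x{$key}) ? 1 : 0;   # 1, the value is still a reference

print "key looks like: $key\n";              # ARRAY(0x...) with some address
print "first item via the value: $x{$key}[0]\n";
```

Only the value can be dereferenced; the key is just text that happens to contain the address.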

Declaring Variables with Curly Braces

The preceding section brings up an interesting point about curly braces for a use other than indexing into hashes. In Perl, curly braces are usually reserved for delimiting blocks of code. Assume you were returning the passed list by sorting it in reverse order. The passed list is in @_ of the called subroutine, so the following two subroutine definitions are equivalent:


sub backward {

    { reverse sort @_ ; }

    };





sub backward {

    reverse sort @_ ;

    };

When preceded by the @ symbol inside a double-quoted string, curly braces enable you to set up small blocks of evaluated code.


#!/usr/bin/perl





sub average {

    ($a,$b,$c) = @_;

        $x = $a + $b + $c;

        $x2 = $a*$a + $b*$b + $c*$c;

    return ($x/3, $x2/3 ); }



$x = 1;

$y = 34;

$z = 47;



print "The midpt is @{[&average($x,$y,$z)]} \n";

This script prints 27.3333333333333 and 1122. In the last line of code with the @{} in the double-quoted string, the contents of the @{} are evaluated as a block of code. The block creates a reference to an anonymous array that contains the results of the call to the subroutine average($x,$y,$z). The array is constructed because of the brackets around the call. As a result, the [] construct returns a reference to an array, which in turn is dereferenced by @{} into a list whose elements are inserted, separated by spaces, into the double-quoted string.
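The same @{[ ]} idiom works for any expression that returns a list. The following is a minimal sketch; the subroutine square_and_cube is a made-up helper used only to have something to call.

```perl
#!/usr/bin/perl
# Sketch: interpolate the result of a subroutine call into a string.
# square_and_cube() is a hypothetical helper for this example.

sub square_and_cube {
    my ($n) = @_;
    return ($n * $n, $n * $n * $n);
}

# [ ... ] builds an anonymous array from the returned list,
# and @{ ... } dereferences it inside the double-quoted string.
my $line = "Powers of 3: @{[ square_and_cube(3) ]}";

print "$line\n";    # Powers of 3: 9 27
```

Without the idiom, you would have to store the result in a named array first and interpolate that array.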

More on Hard Versus Symbolic References

By now, you should be able to see the difference between hard and symbolic references. Let's look at some of the minor details of the two types of references and how they are handled in Perl.

When you use a symbolic reference to a variable that does not exist, Perl creates the variable for you and uses it. For a variable that already exists, its value is taken as the name of another variable. This substitution is a powerful feature of Perl because you can construct variable names from the values of other variables.

Consider the following example:


1 $lang = "java";

2 $java = "coffee";

3 print "${lang}\n";

4 print "hot${lang}\n";

5 print "$$lang \n";

Look at line 5. The $$lang is first reduced to $java. Then recognizing that $java can also be re-parsed, the value of $java ("coffee") is used.

In other words, the value of the scalar $lang is taken to be the name of another variable, and the variable with that name is used. The following is the output from this example:


java

hotjava

coffee

The difference between a plain variable access ($lang) and a symbolic reference ($$lang) is how the variable name is derived. With $lang, you are referring to a variable's value directly: either the variable exists in the symbol table for the package you are in, or it does not. With a symbolic reference, you are using another level of indirection by constructing or deriving a symbol name from the value of an existing variable. A hard reference, created with the backslash operator as in \$java, also adds a level of indirection, but it holds the address of the item itself rather than its name.
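The two kinds of indirection can be put side by side in a short sketch. The variable names $name and $ref are chosen for this example only; $java is declared with our so that it has a symbol table entry a name can be looked up in.

```perl
#!/usr/bin/perl
# Sketch: reach $java through a symbolic reference (by name)
# and through a hard reference (by address).

our $java = "coffee";    # package variable, so it can be found by name

my $name = "java";       # holds the NAME of a variable
my $ref  = \$java;       # holds a hard reference to the variable

my $via_symbolic = $$name;   # looks up $main::java using the string "java"
my $via_hard     = $$ref;    # follows the reference directly

print "$via_symbolic $via_hard\n";    # coffee coffee
```

The dereferencing syntax is identical in both cases; what differs is whether the scalar being dereferenced holds a string or a reference.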

To force only hard references in a program and protect yourself from accidentally creating symbolic references, you can use the pragma called strict. With its 'refs' option, Perl stops with a runtime error whenever a string is used as a reference. To use it, place the following statement at the top of your Perl script:


use strict 'refs';

From this point on, only hard references are allowed for the rest of the script. You can also place this use strict ... statement within curly braces to limit the checking to the code block within the braces. For example, in the following code, the checking would be limited to the code in the subroutine java():


sub java {

   use strict "refs"; 

   #

   # type  checking here. 

}

...

# no type checking here.

To turn off the strict type checking at any time within a code block, use this statement:


no strict 'refs';
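The effect of switching the check on and off can be observed by trapping the error with eval. This is a sketch under simple assumptions: the variables $java and $lang are reused from the earlier example, and the block form of eval is used only to keep the script running after the error.

```perl
#!/usr/bin/perl
# Sketch: the same symbolic dereference with and without strict 'refs'.

our $java = "coffee";
my  $lang = "java";

# With strict 'refs' in effect, using a string as a reference is fatal.
my $ok = eval {
    use strict 'refs';
    my $value = $$lang;      # dies here at runtime
    1;
};
my $error = $@;              # message mentions "strict refs"

# With the check turned off in this block, the lookup by name works.
my $allowed;
{
    no strict 'refs';
    $allowed = $$lang;       # "coffee"
}

print "strict block failed: ", ($ok ? "no" : "yes"), "\n";
print "relaxed block got: $allowed\n";
```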

One last point: Symbolic references cannot be used on variables declared with the my construct because these variables are not kept in any symbol table. Variables declared with the my construct are valid only for the block in which they are created. Variables declared with the local keyword are package variables whose values are saved and later restored, so they stay in the symbol table and are visible to the enclosed code blocks and to any subroutines called from them.
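The following sketch illustrates why a symbolic reference cannot see a my variable. The names $drink and $snack are invented for the example.

```perl
#!/usr/bin/perl
# Sketch: a symbolic reference finds package variables only.
# $drink and $snack are illustrative names.

our $drink = "package tea";        # lives in the symbol table
my  $snack = "lexical biscuit";    # lives only in this lexical scope

no strict 'refs';

my $found_drink = ${"drink"};      # finds $main::drink
my $found_snack = ${"snack"};      # looks for $main::snack, which was never set

print "drink: $found_drink\n";
print "snack: ", (defined $found_snack ? $found_snack : "(undefined)"), "\n";
```

The second lookup does not fail; it quietly refers to a different, empty package variable, which is the kind of surprise use strict 'refs' is meant to prevent.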

For More Information

In addition to consulting the obvious documents such as the Perl man pages, look at the Perl source code for more information. The 't/op' directory in the Perl source tree has some regression test routines that should definitely get you thinking. A lot of documents and references are available at the Web sites www.perl.com and www.metronet.com.

Summary

The two types of references in Perl 5 are hard and symbolic. Hard references work like hard links in UNIX file systems. You can have more than one hard reference to the same item; Perl keeps a reference count for you. This reference count is incremented or decremented as references to the item are created or destroyed. When the count goes to zero, the item being referred to is destroyed. Symbolic references, which use the ${} or $$ construct with a variable name held in a string, are useful in providing multiple stages of references to objects.

You can have references to scalars, arrays, hashes, subroutines, and even other references. References themselves are scalars and have to be dereferenced in the appropriate context before being used. Use $$pointer for a scalar, @$pointer for an array, %$pointer for a hash, &$pointer for a subroutine, and so on for dereferencing.
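The dereferencing forms listed above can be collected in one short sketch. All the variable names and values here are made up for illustration.

```perl
#!/usr/bin/perl
# Sketch: one reference of each kind, and how to dereference it.

my $scalar_ref = \"Perl";
my $array_ref  = [1, 2, 3];
my $hash_ref   = { lang => "Perl" };
my $code_ref   = sub { return "called with @_" };
my $ref_ref    = \$array_ref;          # a reference to a reference

my $s      = $$scalar_ref;             # "Perl"
my @a      = @$array_ref;              # (1, 2, 3)
my %h      = %$hash_ref;               # (lang => "Perl")
my $r      = &$code_ref("x");          # "called with x"
my $second = $$ref_ref->[1];           # 2, reached through two levels

print "$s | @a | $h{lang} | $r | $second\n";
```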

Multidimensional arrays are possible using references in arrays and hashes.
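As a reminder of what such a structure looks like, here is a minimal sketch of a two-dimensional array and a hash of hashes. The data is invented for the example.

```perl
#!/usr/bin/perl
# Sketch: nested structures are arrays and hashes of references.

# Each row is an anonymous array; @grid holds references to the rows.
my @grid = (
    [1, 2, 3],
    [4, 5, 6],
);

my $cell    = $grid[1][2];              # row 1, column 2
my $row_len = scalar @{ $grid[0] };     # number of columns in row 0

# A hash whose values are references to other hashes.
my %stock = (
    fruit => { apples => 10, pears => 4 },
);
my $pears = $stock{fruit}{pears};

print "$cell $row_len $pears\n";        # 6 3 4
```

Between adjacent subscripts the arrow is optional, so $grid[1][2] is shorthand for $grid[1]->[2].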

Parameters are passed into a subroutine through references. The @_ array is really all the passed parameters concatenated in one long array. To send separate arrays, use the references to the individual items.
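The difference between passing arrays directly and passing references to them can be seen in a small sketch. The subroutine names flattened_count and sizes are hypothetical and exist only for this illustration.

```perl
#!/usr/bin/perl
# Sketch: arrays passed directly are flattened into @_;
# references keep them separate.

sub flattened_count {
    return scalar @_;                  # all elements arrive in one list
}

sub sizes {
    my ($first, $second) = @_;         # two array references
    return (scalar @$first, scalar @$second);
}

my @one = (1, 2, 3);
my @two = (4, 5);

my $flat     = flattened_count(@one, @two);   # 5, the boundary is lost
my @separate = sizes(\@one, \@two);           # (3, 2)

print "$flat @separate\n";                    # 5 3 2
```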

Tomorrow's lesson covers Perl objects and references to objects. We have deliberately not covered Perl objects in this chapter because they require some knowledge of references. References are used to create and refer to objects, constructors, and packages.

Q&A

Q:How do I know what type of address a pointer is pointing to?
A:The address printed out with the print statement on a reference has a qualifier word in front of it. For example, a reference to a hash has the word HASH followed by an address value, an array has the word ARRAY, and so on.
Q:How are multidimensional arrays possible using Perl?
A:Arrays and hashes in Perl can hold scalars only, but a reference is itself a scalar. A reference to an array refers to the array as a whole. Arrays can therefore contain references to other arrays, hashes, and so on. The way to create multidimensional arrays in Perl is by using arrays of references to other arrays.
Q:What's the best way to pass more than one array into a subroutine?
A:Pass references to the arrays, using \@arrayname for each array passed, as in the following call:
mysub(\@one, \@two);
Within the subroutine, take each reference off one at a time.
my ($a, $b) = @_;
Now use @$a and @$b to get to the arrays passed into the subroutines.
Q:Why is *moo more efficient to use than $main::{'moo'}? Is there a difference in usage?
A:Both *moo and $main::{'moo'} mean the same symbol table entry (as long as you are in the main package). *moo is more efficient because the entry is looked up once at compile time, whereas $main::{'moo'} is a hash lookup that is evaluated at runtime each time the statement is run.
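The first answer above, about the qualifier word in front of a printed address, can be checked with the built-in ref function, which returns just that word. The sample values below are made up for the sketch.

```perl
#!/usr/bin/perl
# Sketch: ask each reference what kind of item it points to.

my @samples = (\"text", [1, 2], { a => 1 }, sub { 1 }, \\"text");

# ref() returns the qualifier word without the address.
my @kinds = map { ref($_) } @samples;

# Printing a reference shows the same word followed by an address.
my $printed = "$samples[2]";

print "@kinds\n";        # SCALAR ARRAY HASH CODE REF
print "$printed\n";      # HASH(0x...) with some address
```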

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try to understand the quiz and exercise answers before you go on to tomorrow's lesson.

Quiz

  1. Given that $pointer is a pointer to a hash, what's wrong with the following line of code?
    $x= ${$pointer->{$i}};
  2. Why is $b not being set in the following line of code? What do you have to do to make it okay?
    sub xxx {
    my ($a, $b) = @_;
    }
  3. What's the difference between these two lines of code?
    printf "$i : $$pointer[$i++]; ";
    printf " and $i : $pointer->[$i++]; \n";
  4. What do the following lines of code print out?
    $HelpHelpHelp = \\\"Help";
    print $$$$HelpHelpHelp;
  5. What's the use of the ${variable} construct? How could the following three lines of code be rewritten?
    $name = ${$scalarref};
    draw(@{$coordinates}, $display);
    ${$months}[0] = "March";

Exercises

  1. Write a Perl script to print out address types of different variables and complex structures.
  2. Write a Perl code fragment that constructs an array of pointers to functions. How would you use it?
    Strong Hint:
    $foo = sub { print "foo\n"; };
    $bar = sub { print "bar\n"; };
    $yuk = sub { print "yuk\n"; };
    $huh = sub { print "huh\n"; };
    @list = ($foo, $bar, $yuk, $huh);
  3. Explain the difference between hard and symbolic references.
  4. Write a Perl subroutine that takes two arrays as arguments and returns the reverse-sorted copy of each array.
  5. Modify the following script to print the value of $this and $that. Are they the same? If not, why not?
    #!/usr/bin/perl
    sub print_coor{
    my ($x,$y,$z) = @_;
    print "$x $y $z \n";
    return $x;};
    $k = 1;
    $j = 2;
    $m = 4;
    $this = print_coor($k,$j,$m);
    $that = print_coor(4,5,6);