Chapter 12

Working with the File System



Today's lesson teaches you how to manipulate your machine's file system using some of Perl's built-in library functions. Today, you learn about the following:

Many of the functions described in today's lesson use features of the UNIX operating system. If you are using Perl on a machine that is not running UNIX, some of these functions might not be defined or might behave differently.
Check the documentation supplied with your version of Perl for details on which functions are supported or emulated on your machine.

File Input and Output Functions

The following sections describe the built-in library functions that read information from files and write information to files. These library functions perform the following tasks:

Basic Input and Output Functions

Some of the input and output functions supplied by Perl have been discussed in earlier chapters. These are open, close, print, printf, write, select, and eof.

The following sections briefly describe these functions again, along with some features of these functions that have not been discussed previously.

The open Function

The open function enables a Perl program to access a file. It associates a special file variable with each accessed file. The following is an example:


open (MYVAR, "/u/jqpublic/file");

Here, open requests access to the file /u/jqpublic/file, and it associates the file variable MYVAR with this file after it is open. open returns a nonzero value if the open succeeds, and zero if the open fails.

By default, open opens a file for reading only. To open a file for writing, put a > character in front of the filename, as follows:


open (MYVAR, ">/u/jqpublic/file");

To append information to an existing file, put two > characters in front of the filename, as follows:


open (MYVAR, ">>/u/jqpublic/file");

To treat the open file as a command to which to pipe data, put a pipe (|) character in front of the filename, as follows:


open (MAIL, "|mail dave");

(For more information, refer to Day 6, "Reading from and Writing to Files.")
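The file modes described above can be tried out together. The following is a minimal sketch rather than one of this chapter's listings; the scratch file name /tmp/open_demo.$$ and the file variable DEMO are made up for illustration, and unlink (which deletes a file) is used only to clean up afterward.

```perl
use strict;
use warnings;

# Hypothetical scratch file; $$ is the ID of the current process.
my $file = "/tmp/open_demo.$$";

# ">" creates the file (or empties an existing one) for writing.
open (DEMO, ">$file") || die ("Can't open $file for writing: $!");
print DEMO ("first line\n");
close (DEMO);

# ">>" keeps the existing contents and appends to them.
open (DEMO, ">>$file") || die ("Can't open $file for appending: $!");
print DEMO ("second line\n");
close (DEMO);

# No prefix opens the file for reading only.
open (DEMO, $file) || die ("Can't open $file for reading: $!");
my @lines = <DEMO>;
close (DEMO);

print (@lines);
unlink ($file);    # remove the scratch file
```

Checking the return value of open with || die, as shown here, is a good habit: if the open fails, every later read or write on the file variable fails too.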

Piping Input Using open

The open function enables you to open files in several other ways not previously discussed. For example, to treat the open file as a command that is piping data to this program, put a | character after the filename. For example:


open (CAT, "cat file*|");

This call to open executes the command cat file*. This command joins (concatenates) the contents of all files whose names start with file and writes the result to its standard output. Because of the trailing | character, this output is piped to your program, which can read it as an input file using the file variable CAT. For example, the following statement reads the first line of the piped input:


$input = <CAT>;
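Here is a small, self-contained sketch of the same idea. It assumes only that the echo command is available, which is true on virtually every UNIX system; the file variable CMD and the words being echoed are arbitrary choices for this illustration.

```perl
use strict;
use warnings;

# The trailing | tells open to run the command and pipe its output to us.
open (CMD, "echo one two three|") || die ("Can't start echo: $!");

my $input = <CMD>;      # read one line of the command's output
close (CMD);

chomp ($input);         # remove the trailing newline
my @words = split (/ /, $input);
print ("The command printed ", scalar (@words), " words\n");
```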

Listing 12.1 is another example of a program that uses piped input. This program uses the output from the w command to list the users who are currently logged on to the machine.


Listing 12.1. A program that receives input from a piped command.

1:  #!/usr/local/bin/perl
2:  
3:  open (WOUT, "w|");
4:  $time = <WOUT>;
5:  $time =~ s/^ *//;
6:  $time =~ s/ .*//;
7:  <WOUT>;   # skip headings line
8:  @users = <WOUT>;
9:  close (WOUT);
10: foreach $user (@users) {
11:         $user =~ s/ .*//;
12: }
13: print ("Current time:  $time");
14: print ("Users logged on:\n");
15: $prevuser = "";
16: foreach $user (sort @users) {
17:         if ($user ne $prevuser) {
18:                 print ("\t$user");
19:                 $prevuser = $user;
20:         }
21: }


$ program12_1
Current time: 4:25pm
Users logged on:
        dave
        kilroy
        root
        zarquon
$

The w command lists the current time, the machine load, and the users logged onto the machine. It also lists the job time and the currently executing command for each user.

Here is sample output for the w command:


  4:25pm  up 1 day,  6:37,  6 users,  load average: 0.79, 0.36, 0.28
User     tty       login@  idle   JCPU   PCPU what
dave     ttyp0     2:26pm           27      3 w
kilroy   ttyp1     9:01am  2:27   1:04     11 -csh
kilroy   ttyp2     9:02am    43   1:46     27 rn
root     ttyp3     4:22pm     2               -csh
zarquon  ttyp4     1:26pm     4     43     16 cc myprog.c
kilroy   ttyp5     9:03am         2:14     48 /usr/games/hack

This Perl program takes the output from the w command and massages it to retrieve only the information needed: the current time and the users who are currently logged on.

Line 3 starts the w command. The call to open specifies that the output from w is to be treated as input to this program, and that the file variable WOUT is to be used to access this input.

Line 4 reads the first line of the input piped from WOUT. This is the line read:


4:25pm  up 1 day,  6:37,  6 users,  load average: 0.79, 0.36, 0.28

The following two lines extract the current time from this line. First, line 5 removes the leading spaces. Then, line 6 removes everything after the first word, except for the trailing newline character. This leaves the time, 4:25pm, along with the trailing newline, stored in $time.

Line 7 reads the second line from WOUT. Because this line contains no useful information, there is no need to assign it to any scalar variable.

Line 8 reads the rest of the output from w to the array variable @users. After this output has been read, line 9 closes WOUT, which terminates the process that is running the w command.

Each element of the list stored in @users contains one line of user information. Because this program needs only the first word of each line, lines 10-12 get rid of everything else (except, again, for the trailing newline character). After this loop is complete, the array in @users contains a list of users logged on.

Line 13 prints the current time, as stored in $time. Note that print does not need to specify a trailing newline character, because $time contains one.

Lines 16-21 sort the list of users in @users and print them. Because a user can be logged on more than once, $prevuser stores the last user name printed. The value stored in $user is printed only if it differs from the value stored in $prevuser.

Redirecting One File to Another

Many UNIX shells enable you to direct both the standard output file and the standard error file to the same output file. For example, in the Bourne shell sh, the command


$ foo >file1 2>&1

runs the command foo and stores the output from the standard output file and the standard error file in file1.

Listing 12.2 shows how you can do this in Perl.


Listing 12.2. A program that redirects the standard output and standard error files.

1:  #!/usr/local/bin/perl
2:  
3:  open (STDOUT, ">file1") || die ("open STDOUT failed");
4:  open (STDERR, ">&STDOUT") || die ("open STDERR failed");
5:  print STDOUT ("line 1\n");
6:  print STDERR ("line 2\n");
7:  close (STDOUT);
8:  close (STDERR);


This program produces no output.

The following are the contents of the output file file1:


line 2

line 1

As you can see, these lines aren't in the order intended. To understand what is happening, let's examine this program in more detail.

Line 3 redirects the standard output file. To do this, it opens the output file file1 and associates it with the file variable STDOUT; this closes the standard output file.

Line 4 redirects the standard error file. The argument >&STDOUT tells the Perl interpreter to use the file already opened and associated with STDOUT. This means that the file variable STDERR refers to the same file as STDOUT.

Lines 5 and 6 write to STDOUT and STDERR, respectively. Because these file variables refer to the same file, both lines are written to file1. Unfortunately, they are written in the wrong order. What has happened?

The problem arises because of how output is buffered. When you use print (or any other function) to write to a file such as the standard output file, the output is not written to the file right away. Instead, it is copied to a special internal storage area called a buffer. (You can think of a buffer as a giant character string or as an array of characters.) Subsequent output operations continue writing to the buffer until it is full; when the buffer is full, the entire buffer is written out. Copying to a buffer and then writing out the entire buffer takes much less time than writing individual lines of text. (This is because, on most machines, input-output operations are slower than memory-access operations.)

When a program ends, any non-empty buffers are written out. However, the system maintains separate buffers for STDERR and STDOUT, and it writes out the buffer for STDERR first. This means that line 2, which is stored in the STDERR buffer, appears before line 1, which is stored in the STDOUT buffer.

To get around this problem, you can tell the Perl interpreter not to use a buffer for a particular file. To do this, do the following:

  1. Select the file using the select function.
  2. Assign 1 to the system variable $|.

The system variable $| indicates whether a particular file is to be buffered (in other words, whether it should use a buffer or not). If $| is assigned a nonzero value, no buffer is used. As with $~ and $^, assigning to $| affects the current default file, which is the file last specified in a call to select (or STDOUT, if select has not been called).

Listing 12.3 shows how you can use $| to ensure that your output lines appear in the correct order.


Listing 12.3. A program that redirects the standard output and standard error files and turns off buffering.

1:  #!/usr/local/bin/perl
2:  
3:  open (STDOUT, ">file1") || die ("open STDOUT failed");
4:  open (STDERR, ">&STDOUT") || die ("open STDERR failed");
5:  $| = 1;
6:  select (STDERR);
7:  $| = 1;
8:  print STDOUT ("line 1\n");
9:  print STDERR ("line 2\n");
10: close (STDOUT);
11: close (STDERR);


This program produces no output.

The contents of the output file file1 are now the following:


line 1

line 2

Line 5 sets $| to 1, which tells the Perl interpreter that the current default file does not need to be buffered. Because select has not yet been called, the current default file is STDOUT, which means that line 5 turns off buffering for the standard output file (which has been redirected to file1).

Line 6 sets the current default file to STDERR, and line 7 once again sets $| to 1. This turns off buffering for the standard error file (which has also been redirected to file1).

Because buffering has been turned off for both STDERR and STDOUT, lines 8 and 9 write to file1 right away. This means that the output lines appear in file1 in the order in which they are printed.
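Note that a call to select stays in effect until the next call to select, so after turning off buffering for one file you may want to restore the previous default file. The sketch below relies on the fact that select returns the file that was the default before the call; the variable name $previous is just an illustrative choice.

```perl
use strict;
use warnings;

# select returns the file that was the default before this call.
my $previous = select (STDERR);   # STDERR is now the current default file
$| = 1;                           # turn off buffering for STDERR
select ($previous);               # restore the earlier default file

# This call to print names no file, so it uses the restored default.
print ("This line goes to the original default file\n");
```

This way, calls to print, printf, and write that do not name a file continue to behave as they did before buffering was changed.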

Specifying Read and Write Access

To open a file for both read and write access, specify +> before the filename, as follows:


open (READWRITE, "+>file1");

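This opens the file named file1 for both reading and writing. Be aware that the +> prefix first empties the file (creating it if it does not exist), so any existing contents are lost. To overwrite portions of an existing file without losing the rest of it, use the +< prefix described in the note that follows.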

Opening a file for reading and writing works best in conjunction with the library functions seek and tell, which enable you to skip to the middle of a file. (For more information on seek and tell, refer to the section called "Skipping and Rereading Data," later in today's lesson.)

NOTE
You also can use +< as the prefix to specify both reading and writing. Unlike +>, the +< prefix does not empty the file: the file must already exist, and its contents are preserved. For example:
open (READWRITE, "+<file1");
The prefix <, by itself, specifies that the file is to be opened for reading. This means that the following two statements are identical:
open (READONLY, "<read");
open (READONLY, "read");
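To see how read and write access lets you overwrite part of a file, consider the following sketch. It is an illustration under stated assumptions, not one of this chapter's listings: the scratch file name and the file variable RW are invented, and the seek function it uses is described later in today's lesson.

```perl
use strict;
use warnings;

my $file = "/tmp/rw_demo.$$";     # hypothetical scratch file

# Create a file to work with.
open (RW, ">$file") || die ("Can't create $file: $!");
print RW ("Hello, world!\n");
close (RW);

# "+<" opens the existing file for reading and writing without emptying it.
open (RW, "+<$file") || die ("Can't open $file: $!");
seek (RW, 7, 0);                  # skip to byte 7, counting from the start
print RW ("there");               # overwrite "world" with five new bytes
seek (RW, 0, 0);                  # go back to the start before reading
my $line = <RW>;
close (RW);

print ($line);                    # prints "Hello, there!"
unlink ($file);                   # remove the scratch file
```

Because the replacement text has the same length as the text it overwrites, the rest of the line is left untouched.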

The close Function

The library function close was discussed on Day 6, "Reading from and Writing to Files." It closes a file opened by open, as follows:


close (MYFILE);

Here, MYFILE is the file variable (passed to open) that is associated with the open file.

NOTE
If you use close to close a pipe, the program will wait for the piped program to terminate. For example:
open (MYPIPE, "cat file*|");
close (MYPIPE);
When close is called, the program suspends execution until the command cat file* has terminated.
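Because close waits for the piped command to finish, it is also the point at which the command's exit status becomes available. The sketch below assumes the echo command exists on your machine, and it uses the system variable $?, which is not otherwise covered in this part of the lesson; after a pipe is closed, $? holds the status of the command, with the exit code in the upper eight bits.

```perl
use strict;
use warnings;

open (MYPIPE, "echo done|") || die ("Can't start echo: $!");
my @output = <MYPIPE>;        # read everything the command printed
close (MYPIPE);               # waits for echo to terminate

# After closing a pipe, $? >> 8 is the command's exit code.
my $exit_status = $? >> 8;
print ("Command printed: $output[0]");
print ("Exit status: $exit_status\n");
```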

The print, printf, and write Functions

The print, printf, and write functions have also been covered in previous chapters, but I'll briefly recap them here.

The print function is the simplest function. It writes to the file specified, or to the current default file if no file is specified. For example:


print ("Hello, there!\n");

print OUTFILE ("Hello, there!\n");

The first statement writes to the current default file (which is STDOUT unless select has been called). The second statement writes to the file specified by OUTFILE.

The printf function formats a string and sends it to either the file specified or the current default file. For example, the statement


printf OUTFILE ("You owe me %8.2f", $owing);

takes the value stored in $owing and substitutes it for %8.2f in the specified string. %8.2f is an example of a field specifier; it indicates that the value stored in $owing is to be treated as a floating-point number and printed with two digits after the decimal point, in a field at least eight characters wide.
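To check what a field specifier produces without writing to a file, you can use the related function sprintf, which returns the formatted string instead of printing it. This is only a small sketch; the amount 1234.5 is an arbitrary example value.

```perl
use strict;
use warnings;

my $owing = 1234.5;

# sprintf formats exactly as printf does, but returns the result.
my $text = sprintf ("You owe me %8.2f", $owing);

# "1234.50" is seven characters, so one space of padding fills the field.
print ("$text\n");            # prints "You owe me  1234.50"
```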

The write function uses a print format to send formatted output to the file that is specified or to the current default file. For example:


select (OUTFILE);

$~ = "MYFORMAT";

write;

This call to write uses the print format MYFORMAT to send output to the file OUTFILE.

For more information on printf or write, refer to Day 11, "Formatting Your Output."

The select Function

The select function also is covered on Day 11. This function is passed a file variable, which becomes the new current default file. For example:


select (MYFILE);

In this case, MYFILE is now the current default file, which means that calls to print, write, and printf write to MYFILE unless a file variable is explicitly specified.

The eof Function

The library function eof checks whether the last input file read has been exhausted. If all of the input has been read, eof returns a nonzero value. If there is input remaining, eof returns zero.

The eof function was first introduced on Day 6. You might have noticed that, on that day, the examples that use eof use it without parentheses. This is because the behavior of eof is a little tricky if you are using it in conjunction with the <> operator; in this case, eof and eof() behave differently.

Listing 12.4 shows how eof interacts with <>. It prints the contents of one or more input files whose names are supplied on the command line. A line of dashes is printed after each input file is completed.

To run this program yourself, create two files named file1 and file2. Put the following in file1:


This is a line from the first file.

Here is the last line of the first file.

Then, put the following in file2:


This is a line from the second and last file.

Here is the last line of the last file.

Finally, specify file1 and file2 on the command line when you run this program. For example, if you have called this program program12_4, run it as follows:


$ program12_4 file1 file2

This will give you the output shown in the input-output example.


Listing 12.4. A program that uses eof and <> together.

1:  #!/usr/local/bin/perl
2:  
3:  while ($line = <>) {
4:          print ($line);
5:          if (eof) {
6:                  print ("-- end of current file --\n");
7:          }
8:  }


$ program12_4 file1 file2
This is a line from the first file.
Here is the last line of the first file.
-- end of current file --
This is a line from the second and last file.
Here is the last line of the last file.
-- end of current file --
$

The <> operator in line 3 tells the program to read the next line of input from the input files supplied on the command line. Line 4 then prints the line.

Line 5 calls eof without parentheses. This is the form of eof that you are familiar with. It returns true if the current input file has been completely read.

When you test for end-of-file, use either eof or eof() but not both.

Compare the program in Listing 12.4 with Listing 12.5, which uses eof() instead of eof.


Listing 12.5. A program that uses eof() and <> together.

1:  #!/usr/local/bin/perl
2:  
3:  while ($line = <>) {
4:          print ($line);
5:          if (eof()) {
6:                  print ("-- end of output --\n");
7:          }
8:  }


$ program12_5 file1 file2
This is a line from the first file.
Here is the last line of the first file.
This is a line from the second and last file.
Here is the last line of the last file.
-- end of output --
$

Line 5 of this program calls eof with parentheses. Calls to eof with parentheses only return true when all of the files have been read. If the program is at the end of the first input file, eof() returns false because there is still input to be read.

NOTE
If you like, you can use eof with a particular file. For example:
if (eof(MYFILE)) {
        # do end-of-file stuff
}
Here, the conditional expression returns true if all of MYFILE has been read.
Also, note that the distinction between eof and eof() is only meaningful when you are using the <> operator. If you are just reading from a single file, it doesn't matter whether you supply parentheses or not. For example:
while ($line = <STDIN>) {
        # stuff goes here
        if (eof) { # you can also use eof() here
                # more stuff here
        }
}

Indirect File Variables

When you call any of the functions described so far in today's lesson, you can indicate which file to use by specifying a file variable. However, these functions also enable you to supply a scalar variable in place of a file variable; when you do, the Perl interpreter treats the value stored in the scalar variable as the name of the file variable. For example, consider the following:


$filename = "MYFILENAME";

open ($filename, ">file1");

This call to open takes the value stored in $filename, MYFILENAME, and uses it as the file-variable name. This means that the file variable MYFILENAME is now associated with the output file file1.

Listing 12.6 is an example of a program that stores a file-variable name in a scalar variable and passes the scalar variable to Perl input and output functions.


Listing 12.6. A program that uses a scalar variable to store a file variable name.

1:  #!/usr/local/bin/perl
2:  
3:  &open_file("INFILE", "", "file1");
4:  &open_file("OUTFILE", ">", "file2");
5:  while ($line = &read_from_file("INFILE")) {
6:          &print_to_file("OUTFILE", $line);
7:  }
8:  
9:  sub open_file {
10:         local ($filevar, $filemode, $filename) = @_;
11: 
12:         open ($filevar, $filemode . $filename) ||
13:                 die ("Can't open $filename");
14: }
15: sub read_from_file {
16:         local ($filevar) = @_;
17: 
18:         <$filevar>;
19: }
20: sub print_to_file {
21:         local ($filevar, $line) = @_;
22: 
23:         print $filevar ($line);
24: }


This program produces no output.

This program is just a fancy way of copying the contents of file1 to file2. Line 3 opens the input file, file1, for reading by calling the subroutine open_file. This subroutine is passed the name of the file variable to use, which is INFILE.

Line 4 uses the same subroutine, open_file, to open the output file, file2, for writing. The file variable OUTFILE is used in this open operation.

Line 5 calls read_from_file to read a line of input and passes it the file variable name INFILE. Line 18 substitutes the value of $filevar, INFILE, into <$filevar>, yielding the result <INFILE>; then, it reads a line from this input file. Because this line-reading operation is the last expression evaluated in the subroutine, the line read is returned by the subroutine and assigned to $line.

Line 6 then passes OUTFILE and the input line just read to the subroutine print_to_file.

NOTE
All of the functions you've seen so far in this chapter (open, close, print, printf, write, select, and eof) enable you to use a scalar variable in place of a file variable.
The functions open, close, write, select, and eof also enable you to use an expression in place of a file variable. The value of the expression must be a character string that can be used as a file variable.

Skipping and Rereading Data

In the programs you've seen so far, input files have always been read in order, starting with the first line of input and continuing on to the end. Perl provides two special functions, seek and tell, which enable you to skip forward or backward in a file so that you can skip or reread data.

The seek Function

The seek function moves backward or forward in a file.

The syntax for the seek function is


seek (filevar, distance, relative_to);

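As you can see, seek requires three arguments: filevar, the file variable representing the file in which to skip; distance, an integer giving the number of bytes to skip; and relative_to, which is 0, 1, or 2 and indicates the point from which the skip is measured.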

If relative_to is 0, the number of bytes to skip is relative to the beginning of the file. If relative_to is 1, the skip is relative to the current position in the file (the current position is the location of the next line to be read). If relative_to is 2, the skip is relative to the end of the file.

For example, to skip back to the beginning of the file MYFILE, use the following:


seek(MYFILE, 0, 0);

The following statement skips forward 80 bytes:


seek(MYFILE, 80, 1);

The following statement skips backward 80 bytes:


seek(MYFILE, -80, 1);

And the following statement skips to the end of the file (which is useful when the file has been opened for reading and writing):


seek(MYFILE, 0, 2);

The seek function returns true (nonzero) if the skip was successful, and 0 if it failed. It is often used in conjunction with the tell function, described in the next section.

The tell Function

The tell function returns the distance, in bytes, between the beginning of the file and the current position of the file (the location of the next line to be read).

The syntax for the tell function is


tell (filevar);

filevar, which is required, represents the file whose current position is needed.

For example, the following statement retrieves the current position of the file MYFILE:


$offset = tell (MYFILE);

NOTE
tell and seek accept an expression in place of a file variable, provided the value of the expression is the name of a file variable.

You can use tell and seek to skip to a particular position in a file. For example, Listing 12.7 uses these functions to print pairs of lines twice each. (This is, of course, not the fastest way to do this.)


Listing 12.7. A program that demonstrates seek and tell.

1:  #!/usr/local/bin/perl
2:  
3:  @array = ("This", "is", "a", "test");
4:  open (TEMPFILE, ">file1");
5:  foreach $element (@array) {
6:          print TEMPFILE ("$element\n");
7:  }
8:  close (TEMPFILE);
9:  open (TEMPFILE, "file1");
10: while (1) {
11:         $skipback = tell(TEMPFILE);
12:         $line = <TEMPFILE>;
13:         last if ($line eq "");
14:         print ($line);
15:         $line = <TEMPFILE>;  # assume the second line exists
16:         print ($line);
17:         seek (TEMPFILE, $skipback, 0);
18:         $line = <TEMPFILE>;
19:         print ($line);
20:         $line = <TEMPFILE>;
21:         print ($line);
22: }


$ program12_7
This
is
This
is
a
test
a
test
$

Lines 3-8 of this program create a temporary file named file1 consisting of four lines: This, is, a, and test. Line 9 opens this temporary file for reading.

Lines 10-22 loop through the test file. Line 11 calls tell to obtain the current position of the file before reading the pair of lines. Lines 12-16 read the lines and print them (first testing whether the end of the file has been reached).

Line 17 then calls seek, which positions the file at the point returned by tell in line 11. This means that the pair of lines read by lines 12 and 15 are read again by lines 18 and 20. Therefore, lines 19 and 21 print a second copy of the input lines.
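A more practical use of the same technique is to remember where each line of a file starts, so that you can later jump straight to any line. The following is a rough sketch of that idea, not a listing from this book; the scratch file name and the array @offsets are invented for the example.

```perl
use strict;
use warnings;

my $file = "/tmp/seek_demo.$$";   # hypothetical scratch file

open (TEMPFILE, ">$file") || die ("Can't create $file: $!");
foreach my $word ("This", "is", "a", "test") {
        print TEMPFILE ("$word\n");
}
close (TEMPFILE);

# First pass: record the position at which each line starts.
open (TEMPFILE, $file) || die ("Can't open $file: $!");
my @offsets = ();
while (1) {
        my $position = tell (TEMPFILE);
        my $line = <TEMPFILE>;
        last if (!defined ($line));
        push (@offsets, $position);
}

# Jump directly to the third line (element 2) and read it again.
seek (TEMPFILE, $offsets[2], 0) || die ("seek failed: $!");
my $third = <TEMPFILE>;
close (TEMPFILE);
unlink ($file);

print ("Line 3 starts at byte $offsets[2]: $third");
```

Here the lines start at bytes 0, 5, 8, and 10, because each line occupies its characters plus one newline.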

NOTE
You cannot use seek and tell if the file variable actually refers to a pipe. For example, if you open a pipe using the statement
open (MYPIPE, "cat file*|");
then the following statement makes no sense:
$illegal = tell (MYPIPE);

System Read and Write Functions

In Perl, the easiest way to read input from a file is to use the <filevar> operator, where filevar is the file variable representing the file to read. Perl also provides two other functions that read from an input file: read, which reads an arbitrary number of bytes, and sysread, which calls the UNIX read function.

Perl also enables you to write output using the built-in function syswrite, which calls the UNIX write function.

These functions are described in the following sections.

The read Function

The read function is designed to be equivalent to the UNIX function fread. It enables you to read an arbitrary number of characters (bytes) into a scalar variable.

The syntax for the read function is


read (filevar, result, length, skipval);

Here, filevar is the file variable representing the file to read, result is the scalar variable (or array variable element) into which the bytes are to be stored, and length is the number of bytes to read.

skipval is an optional argument that specifies the number of bytes to skip in result before storing the bytes that are read. (It is an offset into the scalar variable, not into the file.)

For example:


read (MYFILE, $scalar, 80);

This call to read tries to read 80 bytes from the file represented by the file variable MYFILE, storing the resulting character string in $scalar. It returns the number of bytes actually read, which can be fewer than 80 if the file is nearly exhausted; if MYFILE is at end-of-file, it returns 0. (read returns the undefined value if an error occurs.)

You can use read to append to an existing scalar variable by specifying a fourth argument, which indicates the number of bytes to skip in the scalar variable.


read (MYFILE, $scalar, 40, 80);

This call to read reads another 40 bytes from MYFILE. When copying these bytes into $scalar, read first skips the first 80 bytes already stored there.
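The following sketch shows the return value and the fourth argument at work on a ten-byte file. It is an illustration only; the scratch file name and its contents are made up, and the byte counts are kept small so that the results are easy to follow.

```perl
use strict;
use warnings;

my $file = "/tmp/read_demo.$$";   # hypothetical scratch file

open (MYFILE, ">$file") || die ("Can't create $file: $!");
print MYFILE ("abcdefghij");      # ten bytes, no newline
close (MYFILE);

open (MYFILE, $file) || die ("Can't open $file: $!");
my $scalar = "";

my $count = read (MYFILE, $scalar, 4);      # $scalar is now "abcd"
$count = read (MYFILE, $scalar, 3, 4);      # skip 4 bytes: now "abcdefg"

# Ask for 80 bytes, but only 3 remain, so read returns 3.
$count = read (MYFILE, $scalar, 80, length ($scalar));

# The file is exhausted, so this call returns 0.
my $at_end = read (MYFILE, $scalar, 80, length ($scalar));
close (MYFILE);
unlink ($file);

print ("$scalar ($count bytes in the last successful read)\n");
```

Passing length ($scalar) as the fourth argument is a convenient way to append without keeping track of the byte count yourself.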

The sysread and syswrite Functions

If you want to read data as quickly as possible, you can call sysread instead of read.

The syntax for the sysread function is


sysread (filevar, result, length, skipval);

These arguments are the same as for read.

For example:


sysread (MYFILE, $scalar, 80);

sysread (MYFILE, $scalar, 40, 80);

sysread is equivalent to the UNIX function read.

To write as quickly as possible, call the syswrite function, which is equivalent to the UNIX function write.

The syntax of the syswrite function is


syswrite (filevar, data, length, skipval);

Here, filevar is the file to write to, data is the place where the data is located, length is the number of bytes to write, and skipval is the number of bytes to skip in data before writing.

For instance, the following call writes the first 80 bytes of $scalar to the file specified by MYFILE:


syswrite (MYFILE, $scalar, 80);

Similarly, the following statement skips the first 80 bytes stored in $scalar, and then writes the next 40 bytes to the file specified by MYFILE:


syswrite (MYFILE, $scalar, 40, 80);

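Don't use sysread and syswrite unless you know what you are doing. These functions bypass the buffering used by read, print, and the <filevar> operator, so mixing the two kinds of calls on the same file variable can give confusing results. For more information on these functions, refer to the UNIX system manual pages for the read and write functions.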
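If you do need these functions, a typical pattern is a copy loop that reads a block with sysread and writes it out with syswrite. The sketch below is one possible way to write such a loop, not the only one; the file names, the file variables IN and OUT, and the block size of 256 bytes are arbitrary assumptions. Because syswrite can write fewer bytes than requested, the inner loop uses the fourth argument to continue where the previous write stopped.

```perl
use strict;
use warnings;

my $source = "/tmp/sys_src.$$";   # hypothetical scratch files
my $copy   = "/tmp/sys_copy.$$";

open (OUT, ">$source") || die ("Can't create $source: $!");
print OUT ("x" x 1000);           # 1000 bytes of test data
close (OUT);

open (IN, $source) || die ("Can't open $source: $!");
open (OUT, ">$copy") || die ("Can't open $copy: $!");

my $buffer = "";
my $total = 0;
while (1) {
        my $count = sysread (IN, $buffer, 256);
        die ("sysread failed: $!") if (!defined ($count));
        last if ($count == 0);    # 0 means end-of-file
        my $written = 0;
        while ($written < $count) {
                # Skip the bytes already written and write the rest.
                my $n = syswrite (OUT, $buffer, $count - $written, $written);
                die ("syswrite failed: $!") if (!defined ($n));
                $written += $n;
        }
        $total += $count;
}
close (IN);
close (OUT);

my $copy_size = -s $copy;         # size of the copy, in bytes
unlink ($source, $copy);
print ("Copied $total bytes\n");
```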

Reading Characters Using getc

Perl provides one other built-in function, getc, which reads a single character of input from a file.

The syntax for calls to the getc function is


char = getc (infile);

infile is the file from which to read, and char is the character returned.

For example:


$singlechar = getc(INFILE);

This statement reads a character from the file represented by INFILE and stores it (as a character string) in the scalar variable $singlechar.
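As a simple illustration, the following sketch uses getc to count how often each character appears in a file. The scratch file name, its contents, and the associative array %count are assumptions made for this example. In Perl 5, getc returns the undefined value at end-of-file, which is what ends the loop.

```perl
use strict;
use warnings;

my $file = "/tmp/getc_demo.$$";   # hypothetical scratch file

open (OUTFILE, ">$file") || die ("Can't create $file: $!");
print OUTFILE ("banana\n");
close (OUTFILE);

open (INFILE, $file) || die ("Can't open $file: $!");
my %count = ();
my $total = 0;
while (1) {
        my $singlechar = getc (INFILE);
        last if (!defined ($singlechar));   # end of file reached
        $count{$singlechar}++;
        $total++;
}
close (INFILE);
unlink ($file);

print ("Read $total characters; 'a' appeared $count{'a'} times\n");
```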

The getc function is useful for "hot key" applications. These applications accept and process input one character at a time rather than one line at a time. Listing 12.8 is an example of such a program. It reads one character at a time and checks whether the character is alphanumeric. If it is, it writes out the next higher letter or number. For example, when you enter a, the program prints out b, and so on. In this example, the alphabetic letters a through z and the digits 0 through 9 are typed in.


Listing 12.8. A program that demonstrates the use of getc.

1:  #!/usr/local/bin/perl
2:  
3:  &start_hot_keys;
4:  while (1) {
5:          $char = getc(STDIN);
6:          last if ($char eq "\\");
7:          $char =~ tr/a-zA-Z0-9/b-zaB-ZA1-90/;
8:          print ($char);
9:  }
10: &end_hot_keys;
11: print ("\n");
12: 
13: sub start_hot_keys {
14:         system ("stty cbreak");
15:         system ("stty -echo");
16: }
17: 
18: sub end_hot_keys {
19:         system ("stty -cbreak");
20:         system ("stty echo");
21: }


$ program12_8
bcdefghijklmnopqrstuvwxyza1234567890
$

The subroutine start_hot_keys modifies the runtime environment to support hot-key input. To do this, it uses two calls to the built-in function system, which simply takes its argument and executes it. The command stty cbreak tells the system to process input one character at a time, and the command stty -echo tells the system not to display characters typed at the keyboard.

NOTE
Some machines might not support hot keys or might use different commands to establish the hot-key environment. If you are on a machine that uses different commands to establish the environment, you still can run this program; just change the stty commands to whatever works on your machine.

The loop in lines 4-9 reads and writes one character per loop iteration. Line 5 starts off by reading a character from the standard input file using getc.

Line 6 tests whether the character read is a backslash. If it is, the loop terminates. If the character is not a backslash, the program continues with line 7. This line translates all alphanumeric characters to the next-highest letter or number; for example, it translates g to h, E to F, and 7 to 8. The characters z, Z, and 9 are translated to a, A, and 0, respectively.

Line 8 prints out the translated character. Because the characters you type at the keyboard are not displayed, the program makes it look like your keyboard is malfunctioning. (It's quite disorienting!)

The subroutine end_hot_keys restores the normal working environment by undoing the system calls that are performed by start_hot_keys.

If you are using hot keys, make sure that when you clean up you call stty -cbreak before calling stty echo. If you call stty echo first, your terminal might wind up not printing newline characters properly.

Reading a Binary File Using binmode

If your machine distinguishes between text files and binary files (files that contain unprintable characters), your Perl program can tell the system that a particular file is a binary file. To do this, call the built-in function binmode.

The syntax for calling the binmode function is


binmode (filevar);

filevar is a file variable.

binmode expects a file variable (or an expression whose value is the name of a file variable). It must be called after the file is opened, but before the file is read.

The following is an example of a call to binmode:


binmode (MYFILE);

NOTE
Normally, you won't need to use this function unless you are running in a DOS-like environment.
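The sketch below shows where the call to binmode belongs when you write and then reread a few raw bytes. It is an illustration only: the scratch file name is invented, and the pack and unpack functions, which convert between numbers and raw bytes, are not covered in this part of the lesson. On UNIX, the calls to binmode change nothing, but they make the program behave the same way on machines that treat text files specially.

```perl
use strict;
use warnings;

my $file = "/tmp/binmode_demo.$$";   # hypothetical scratch file

open (MYFILE, ">$file") || die ("Can't create $file: $!");
binmode (MYFILE);                    # after open, before any output
print MYFILE (pack ("C4", 0, 13, 10, 255));   # four raw bytes
close (MYFILE);

open (MYFILE, $file) || die ("Can't open $file: $!");
binmode (MYFILE);                    # after open, before any input
my $data = "";
my $count = read (MYFILE, $data, 4);
close (MYFILE);
unlink ($file);

my @bytes = unpack ("C4", $data);    # convert back to numbers
print ("Read $count bytes: @bytes\n");
```

The bytes 13 and 10 are the carriage-return and line-feed characters, which are exactly the values that text-mode translation would otherwise be likely to alter.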

Directory-Manipulation Functions

The input and output functions that you have seen earlier read and write data to files. Perl also provides a group of functions that enable you to manipulate UNIX directories. Functions exist that enable you to create, read, open, close, delete, and skip around in directories. The following sections describe these functions.

The mkdir Function

To create a new directory, call the function mkdir.

The syntax for the mkdir function is


mkdir (dirname, permissions);

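mkdir requires two arguments: dirname, the name of the directory to create (either a character string or an expression whose value is a directory name), and permissions, an octal number specifying the access permissions for the new directory.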

For example, to create a directory named /u/jqpublic/newdir, you can use the following statement:


mkdir ("/u/jqpublic/newdir", 0777);

To create a subdirectory of the current working directory, just specify the new directory name, as follows:


mkdir ("newdir", 0777);

If the current working directory is /u/janedoe/mydir, this creates a subdirectory named /u/janedoe/mydir/newdir.

The permissions value of 0777 in both these examples grants read, write, and execute permissions to everybody. Table 12.1 lists each possible access permission and the octal number associated with it.

Table 12.1. Access permissions for the mkdir function.

Value   Permission
4000    Set user ID on execution
2000    Set group ID on execution
1000    Sticky bit (see the UNIX chmod manual page)
0400    Read permission for file owner
0200    Write permission for file owner
0100    Execute permission for file owner
0040    Read permission for owner's group
0020    Write permission for owner's group
0010    Execute permission for owner's group
0004    Read permission for world
0002    Write permission for world
0001    Execute permission for world

You can combine access permissions by adding (or doing a logical OR operation on) the appropriate octal values in the table. For example, to grant read, write, and execute permission to the owner but only read permission to everybody else, specify 0744 as the permission value.
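
As a quick check of the arithmetic, the following sketch builds the value 0744 from the individual entries in Table 12.1. The variable names are made up for this illustration.

```perl
use strict;
use warnings;

# Owner: read + write + execute; group and world: read only.
my $owner = 0400 | 0200 | 0100;   # 0700
my $group = 0040;
my $world = 0004;

my $permissions = $owner | $group | $world;

# The %o format prints a number in octal.
printf ("permissions = 0%o\n", $permissions);   # prints: permissions = 0744
```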

NOTE
All of the permission values shown here are in octal notation, because a leading zero is specified. If you like, you can use decimal or hexadecimal here, but it won't be as easy to read.
Also note that the permission value set here is affected by the current value of umask. See the description of the umask function later today for more information.

mkdir returns true (nonzero) if the directory is successfully created. It returns false (0) if the directory is not.
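
The return value is worth checking, as the following sketch suggests. It assumes a scratch directory created with File::Temp so that nothing real is touched; the names $base and $newdir are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work inside a scratch directory that is removed when the program ends.
my $base   = tempdir (CLEANUP => 1);
my $newdir = "$base/newdir";

# First call: the directory does not exist yet, so mkdir succeeds.
my $first = mkdir ($newdir, 0777);
print ("first call succeeded\n") if ($first);

# Second call: the directory already exists, so mkdir returns false
# and $! holds the reason reported by the system.
my $second = mkdir ($newdir, 0777);
print ("second call failed: $!\n") unless ($second);
```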

The chdir Function

To set a directory to be the current working directory, use the function chdir.

The syntax for the chdir function is


chdir (dirname);

dirname is the name of the new current working directory.

chdir returns true if the current directory is set properly, false if an error occurs.

For example, to set the current working directory to /u/jqpublic/newdir, use the following statement:


chdir ("/u/jqpublic/newdir");

NOTE
As with mkdir, the directory name passed to chdir can be either a character string or an expression whose value is a directory name. For example, the following sets the current directory to be /u/jqpublic/newdir:
$dir = "/u/jqpublic/";
chdir ($dir . "newdir");

The opendir Function

You can have your program examine a list of the files contained in a directory. To do this, the first step is to call the built-in function opendir.

The syntax for the opendir function is


opendir (dirvar, dirname);

dirvar is the name the program is to use to represent the directory, also known as a directory variable, and dirname is the name of the directory to open (which can be a character string or the value of an expression).

opendir returns true if the open operation is successful, and it returns false otherwise.

For example, to open the directory named /u/janedoe/mydir, you can use the following statement:


opendir (DIR, "/u/janedoe/mydir");

This associates the directory variable DIR with the opened directory.

NOTE
If you like, you can use the same name as both a directory variable and a file variable:
opendir (MYNAME, "/u/jqpublic/dir");
open (MYNAME, "/u/jqpublic/dir/file");
The Perl interpreter can always tell from context whether a name is being used as a directory variable or as a file variable. (However, there is no real reason to do so. Your programs will be easier to read if you use different names to represent files and directories.)

The closedir Function

To close an opened directory, call the closedir function.

The syntax for the closedir function is


closedir (mydir);

closedir expects one argument: the directory variable associated with the directory to be closed.

The readdir Function

After opendir has opened a directory, you can access the name of each file or subdirectory stored in the directory by calling the function readdir.

The syntax for the readdir function is


readdir (mydir);

Like closedir, readdir is passed the directory variable that is associated with the open directory.

If the value returned from readdir is assigned to a scalar variable, readdir returns the name of the first file or subdirectory stored in the directory. For example:


$filename = readdir(MYDIR);

A single name is also returned if the return value from readdir is assigned to an element of an array variable or of an associative array. For example:

$filearray[3] = readdir(MYDIR);
$filearray{"foo"} = readdir(MYDIR);

If readdir is called again, it returns the next name in the directory; subsequent calls return other names, continuing until the directory is exhausted. Listing 12.9 uses readdir to list the files and subdirectories in a directory.


Listing 12.9. A program that lists the files and subdirectories in a directory.

1:  #!/usr/local/bin/perl
2:  
3:  opendir(HOMEDIR, "/u/jqpublic") ||
4:          die ("Unable to open directory");
5:  while ($filename = readdir(HOMEDIR)) {
6:          print ("$filename\n");
7:  }
8:  closedir(HOMEDIR);

$ program12_9
.
..
.cshrc
.Xresources
.xsession
test
bin
letter
file1
$

Line 3 opens the directory /u/jqpublic, which is the home directory for user jqpublic. The opendir function associates the directory variable HOMEDIR with /u/jqpublic.

Lines 5-7 read the name of each file in the directory in turn. Line 6 prints each filename as it is read in.

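Note that, on a UNIX system, the list of names includes two special files: . (one period), which represents the directory itself, and .. (two periods), which represents its parent directory.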

As you can see, readdir reads the names in the order in which they appear in the directory.

Listing 12.10 shows how you can display the names in alphabetical order.


Listing 12.10. A program that lists the files and subdirectories in a directory in alphabetical order.

1:  #!/usr/local/bin/perl
2:  
3:  opendir(HOMEDIR, "/u/jqpublic") ||
4:          die ("Unable to open directory");
5:  @files = readdir(HOMEDIR);
6:  closedir(HOMEDIR);
7:  foreach $file (sort @files) {
8:          print ("$file\n");
9:  }

$ program12_10
.
..
.Xresources
.cshrc
.xsession
bin
file1
letter
test
$

The readdir function behaves differently when its return value is assigned to an array; in this case, the entire list of files and subdirectories in the directory is assigned to the array variable @files by line 5.

After the entire list is stored, sort can be called to sort the list into alphabetical order. The foreach loop in lines 7-9 then prints the sorted list one name at a time.
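
If you do not want the special files . and .. in the list, one possibility is to filter them out with grep before sorting. The sketch below assumes a scratch directory populated with three made-up names so that it can run anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a scratch directory containing three empty files.
my $dir = tempdir (CLEANUP => 1);
foreach my $name ("letter", "bin", ".cshrc") {
    open (my $fh, ">", "$dir/$name") || die ("Unable to create $name: $!");
    close ($fh);
}

opendir (HOMEDIR, $dir) || die ("Unable to open directory: $!");
# Keep every name except the two special entries.
my @files = grep { $_ ne "." && $_ ne ".." } readdir (HOMEDIR);
closedir (HOMEDIR);

foreach my $file (sort @files) {
    print ("$file\n");
}
```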

The telldir and seekdir Functions

As you've seen, the library functions tell and seek enable you to skip backward and forward in a file. Similarly, the library functions telldir and seekdir enable you to skip backward and forward in the list of files stored in a directory.

To use telldir, pass it the directory variable defined by opendir. telldir returns the current directory location (where you are in the list of files).

The syntax for the telldir function is


location = telldir (mydir);

Here, mydir is the directory variable corresponding to the directory whose file list you are examining, and location is assigned the current directory location.

To skip to the directory location returned by telldir, call seekdir.

The syntax for the seekdir function is


seekdir(mydir, location);

This call to seekdir sets the current directory location to the location specified by location.

seekdir works only with directory locations returned by telldir.
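
The following sketch shows one way telldir and seekdir might be used together: remember a location, read past it, and then return to it. The scratch directory and all variable names are assumptions made for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a scratch directory containing three empty files.
my $dir = tempdir (CLEANUP => 1);
foreach my $name ("file1", "file2", "file3") {
    open (my $fh, ">", "$dir/$name") || die ("Unable to create $name: $!");
    close ($fh);
}

opendir (MYDIR, $dir) || die ("Unable to open directory: $!");
my $skipped  = readdir (MYDIR);    # read one name
my $location = telldir (MYDIR);    # remember where we are now
my $next     = readdir (MYDIR);    # the name stored at that location
my $later    = readdir (MYDIR);    # read past it

seekdir (MYDIR, $location);        # go back to the remembered location
my $again = readdir (MYDIR);       # reads the same name as $next
closedir (MYDIR);

print ("read '$next' twice\n") if ($next eq $again);
```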

The rewinddir Function

Although being able to skip anywhere you like in a directory list is useful, the most common skipping operation in directory lists is rewinding the directory list, or starting over again. Because of this, Perl provides a special function, rewinddir, that handles the rewind operation.

The syntax for the rewinddir function is


rewinddir (mydir);

rewinddir sets the current directory location to the beginning of the list of files, which lets you read the entire list of files again. As with the other directory functions, mydir is the directory variable defined by opendir.
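
As a small illustration, the sketch below reads a directory to the end, confirms that nothing is left, and then uses rewinddir to read it again. The scratch directory and the array names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a scratch directory containing two empty files.
my $dir = tempdir (CLEANUP => 1);
foreach my $name ("file1", "file2") {
    open (my $fh, ">", "$dir/$name") || die ("Unable to create $name: $!");
    close ($fh);
}

opendir (MYDIR, $dir) || die ("Unable to open directory: $!");
my @first = readdir (MYDIR);     # the entire list, including . and ..
my @none  = readdir (MYDIR);     # the list is exhausted, so this is empty
rewinddir (MYDIR);               # start over at the beginning
my @second = readdir (MYDIR);    # the entire list again
closedir (MYDIR);

print (scalar (@first), " names on each pass\n");
```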

The rmdir Function

The final directory function supplied by Perl is rmdir, which deletes an empty directory.

The syntax for calling the rmdir function is


rmdir (dirname);

rmdir returns true (nonzero) if the directory dirname is deleted successfully, and false if the directory is not empty or cannot be deleted.
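
Because rmdir refuses to delete a directory that still contains files, a program usually has to remove the contents first. The following sketch, which assumes a scratch directory and hypothetical names, shows both outcomes.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base   = tempdir (CLEANUP => 1);
my $olddir = "$base/olddir";

# Create a directory that contains one file.
mkdir ($olddir, 0777) || die ("Unable to create directory: $!");
open (my $fh, ">", "$olddir/file1") || die ("Unable to create file: $!");
close ($fh);

# The directory is not empty, so this call returns false.
my $first = rmdir ($olddir);
print ("first rmdir failed: $!\n") unless ($first);

# Delete the file, then try again; this time rmdir succeeds.
unlink ("$olddir/file1");
my $second = rmdir ($olddir);
print ("second rmdir succeeded\n") if ($second);
```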

File-Attribute Functions

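Perl provides several library functions that modify the attributes or behavior of files. These functions can be divided into the following groups: file-relocation functions, link and symbolic link functions, file-permission functions, and miscellaneous attribute functions.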

These groups of functions are described in the following sections.

File-Relocation Functions

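Perl provides the following file-relocation functions: rename, which changes the name of a file or moves it to another directory, and unlink, which deletes a file.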

The rename Function

The built-in function rename changes the name of a file.

The syntax for the rename function is


rename (oldname, newname);

oldname is the old filename, and newname is the new filename.

The rename function returns true if the rename succeeds, and false if an error occurs.

For example, to change a file named name1 to name2, use the following:


rename ("name1", "name2");

You can use the value stored in a scalar variable as an argument to rename, or any variable or expression whose value is a character string, as follows:


rename ($oldname, &get_new_name);

You can also use rename to move a file from one directory to another (provided both directories are in the same file system). For example:


rename ("/u/jqpublic/name1", "/u/janedoe/name2");


NOTE
When rename moves a file, as in
rename ("name1", "name2");
it does not check whether a file named name2 already exists. Any existing name2 is destroyed by the rename operation.
To get around this problem, use the -e file-test operator, which checks whether a named file exists, as follows:
-e "name2" || rename ("name1", "name2");
Here, the || operator ensures that rename is called only when no file named name2 already exists.
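
The same check can be wrapped in a subroutine. The sketch below is one possible version; safe_rename and write_file are hypothetical helper names, not built-in functions, and the files live in a scratch directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Rename a file only if the new name is not already in use.
# Returns 1 if the file was renamed, 0 otherwise.
sub safe_rename {
    my ($oldname, $newname) = @_;
    return 0 if (-e $newname);          # refuse to destroy an existing file
    return rename ($oldname, $newname) ? 1 : 0;
}

# Helper used only to set up this example.
sub write_file {
    my ($path, $text) = @_;
    open (my $fh, ">", $path) || die ("Unable to create $path: $!");
    print $fh $text;
    close ($fh);
}

my $dir = tempdir (CLEANUP => 1);
write_file ("$dir/name1", "one\n");
write_file ("$dir/name2", "two\n");

my $blocked = safe_rename ("$dir/name1", "$dir/name2");   # name2 exists
my $moved   = safe_rename ("$dir/name1", "$dir/name3");   # name3 does not

print ("blocked = $blocked, moved = $moved\n");   # prints: blocked = 0, moved = 1
```

Keep in mind that another program could create the new file between the -e test and the call to rename, so this check reduces the risk rather than eliminating it.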

The unlink Function

To delete a file, use the unlink function.

The syntax for the unlink function is


num = unlink (filelist);

This function takes a list as its argument and deletes all the files named in that list.

unlink returns the number of files actually deleted.

The following is an example of a call to unlink:


@deletelist = ("file1", "file2");

unlink (@deletelist);

The function is called unlink, instead of delete, because what it is actually doing is removing a reference, or link, to the particular file. See the following section for more details on links in Perl.
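
Because unlink returns a count rather than a simple true or false, you can compare the count with the size of the list to find out whether every file was deleted. The sketch below assumes a scratch directory and deliberately includes one name that does not exist.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir (CLEANUP => 1);
foreach my $name ("file1", "file2") {
    open (my $fh, ">", "$dir/$name") || die ("Unable to create $name: $!");
    close ($fh);
}

# The third name does not exist, so only two files can be deleted.
my @deletelist = ("$dir/file1", "$dir/file2", "$dir/nosuchfile");
my $num = unlink (@deletelist);

if ($num < @deletelist) {
    print ("deleted $num of ", scalar (@deletelist), " files\n");
}
```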

Link and Symbolic Link Functions

In the UNIX environment, files can be "contained" in more than one directory at a time. Each directory contains a reference, or link, to the file.

The following sections describe how to create and access links.

NOTE
If a file is referenced by multiple links, unlink removes only one of the links, and the file can still be referenced

The link Function

To create a link to an existing file, use the built-in function link.

The syntax for the link function is


link (file, newlink);

file is the file being linked to, and newlink is the link being created.

link returns true if the link is created, and false if an error occurs.

For example:


link ("/u/jqpublic/file", "/u/janedoe/newfile");

After link has been called, the file /u/jqpublic/file also can be thought of as the file /u/janedoe/newfile. If unlink is called using /u/jqpublic/file, as in


unlink ("/u/jqpublic/file");

you can still reference the file by specifying the name /u/janedoe/newfile.
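
The following sketch walks through the same sequence in a scratch directory, assuming the file system supports hard links. The link count is taken from the fourth element of the list returned by stat; all variable names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir     = tempdir (CLEANUP => 1);
my $file    = "$dir/file";
my $newfile = "$dir/newfile";

open (my $out, ">", $file) || die ("Unable to create $file: $!");
print $out "hello\n";
close ($out);

# Create a second name for the same file.
link ($file, $newfile) || die ("Unable to create link: $!");
my $links = (stat ($file))[3];      # number of hard links: now 2

# Remove the original name; the data is still reachable by the new name.
unlink ($file);
open (my $in, "<", $newfile) || die ("Unable to open $newfile: $!");
my $contents = <$in>;
close ($in);

print ("links = $links, contents = $contents");
```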

The symlink Function

The link created by the link function is called a hard link, which means that it actually references the file itself. Many operating systems also support symbolic links, which are references to the filename, not to the file itself.

To create a symbolic link, use the function symlink.

The syntax for the symlink function is


symlink (file, newlink);

file is the file being linked to, and newlink is the link being created.

symlink, like link, returns true if the link is created, and false if an error occurs.

The following is an example of symlink:


symlink("/u/jqpublic/file", "/u/janedoe/newfile");

Here, /u/janedoe/newfile is symbolically linked to /u/jqpublic/file. Now, when the following statement is executed, the file is actually deleted:


unlink ("/u/jqpublic/file");

/u/janedoe/newfile now references nothing at all. (In this case, /u/janedoe/newfile is an example of an unresolved symbolic link.) When /u/jqpublic/file is created again, you will be able to access the new file using /u/janedoe/newfile.
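
To see an unresolved symbolic link in action, you might try something like the following sketch. It assumes a system that supports symbolic links and uses a scratch directory; the variable names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir     = tempdir (CLEANUP => 1);
my $file    = "$dir/file";
my $newfile = "$dir/newfile";

# Helper used only in this example: create an empty file.
sub create_file {
    my ($path) = @_;
    open (my $fh, ">", $path) || die ("Unable to create $path: $!");
    close ($fh);
}

create_file ($file);
symlink ($file, $newfile) || die ("Unable to create symbolic link: $!");

# Delete the real file; the link remains but points at nothing.
unlink ($file);
my $is_link  = (-l $newfile) ? 1 : 0;   # 1: the link itself still exists
my $resolves = (-e $newfile) ? 1 : 0;   # 0: the file it names does not
my $target   = readlink ($newfile);     # the name stored in the link

# Create the file again; the link now resolves once more.
create_file ($file);
my $resolves_again = (-e $newfile) ? 1 : 0;   # 1

print ("is_link = $is_link, resolves = $resolves, ",
       "resolves_again = $resolves_again\n");
```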

The readlink Function

If a filename, such as /u/janedoe/newfile, is actually a symbolic link to another filename, the function readlink returns the filename to which it is linked.

The syntax for the readlink function is


filename = readlink (linkname);

linkname is the symbolic link, and filename is the equivalent filename.

readlink returns the undefined value (which behaves like an empty string) if the filename is not a symbolic link. (In particular, readlink fails if the filename is actually a hard link.)

For example:


$linkname = readlink("/u/janedoe/newfile");

# $linkname now contains "/u/jqpublic/file"

Listing 12.11 is an example of a program that prints all the symbolic links in a particular directory.


Listing 12.11. A program that prints symbolic links.

1:  #!/usr/local/bin/perl
2:  
3:  $dir = "/u/janedoe";
4:  opendir(MYDIR, $dir) || die ("Unable to open directory");
5:  while ($name = readdir(MYDIR)) {
6:          if (-l $dir . "/" . $name) {
7:                  print ("$name is linked to ");
8:                  print (readlink($dir . "/". $name) . "\n");
9:          }
10: }
11: closedir(MYDIR);

$ program12_11
newfile is linked to /u/jqpublic/file
$

This program uses opendir and readdir to examine each file in the directory in turn. Line 6 uses the -l file-test operator to determine whether the filename is actually a symbolic link. If the filename is a symbolic link, the following expression becomes true, and the program executes the calls to print in lines 7 and 8:


-l $dir . "/" . $name

Line 8 calls readlink, passing it the directory name and the filename stored in $name. Because readlink is called only if the expression in line 6 is true, $name is always a symbolic link.

File-Permission Functions

As you've seen, the built-in function mkdir requires you to specify the access permissions for the directory you are creating. These permissions indicate, for example, whether particular users are allowed to read files from the directory or write into the directory.

In the UNIX environment, each individual file has its own set of access permissions. The set of possible permissions is the same as for directories. (Refer to Table 12.1 in the section titled "The mkdir Function" earlier in today's lesson for a complete list of the possible permissions.)

In Perl, three functions are defined that deal with access permissions.

The chmod Function

To change the access permissions for a list of files, call the chmod function.

The syntax for the chmod function is


chmod (permissions, filelist);

permissions is the set of access permissions you want to give, and is a standard UNIX file permissions mask. (For example, setting permissions to 0777 gives read, write, and execute permission to everybody. See the section called "The mkdir Function" for a description of the set of permissions.) filelist is the list of files whose permissions you want to change.

The chmod function returns the number of files whose permissions were successfully set.

The following is an example of a call to chmod:


@filelist = ("file1", "file2");

chmod (0777, @filelist);

In this example, the files file1 and file2 are assigned global read, write, and execute permissions.

NOTE
You cannot change access permissions using chmod unless you have permission to do so. On most UNIX systems, you must be the owner of a file (or the superuser) before you can change its permissions.
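
One way to confirm that chmod did what you expected is to read the permissions back with stat, as in the following sketch. The files are created in a scratch directory, and the mask 07777 is used to discard the file-type bits that stat returns along with the permissions.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir (CLEANUP => 1);
my @filelist = ("$dir/file1", "$dir/file2");
foreach my $path (@filelist) {
    open (my $fh, ">", $path) || die ("Unable to create $path: $!");
    close ($fh);
}

# Owner may read and write; everybody else may only read.
my $changed = chmod (0644, @filelist);

# Element 2 of the stat list is the file mode.
my $mode = (stat ($filelist[0]))[2] & 07777;
printf ("%d files changed, mode is now %04o\n", $changed, $mode);
```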

The chown Function

Normally, the owner of a file is the person who created it. To change the owner of a file, use the function chown.

The syntax for the chown function is


chown (userid, groupid, filelist);

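The chown function requires three arguments: userid, the numeric user ID of the new owner of the files; groupid, the numeric ID of the group to be assigned to the files (or -1 if the existing group is to be left unchanged); and filelist, the list of files whose ownership is to be changed.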

The chown function returns the number of files changed.

The following is an example of a call to chown:


@filelist = ("file1", "file2");

chown (17, -1, @filelist);

NOTE
On most UNIX systems, you can retrieve a user ID or group ID from the /etc/passwd file. You can use the Perl function getpwnam to retrieve information from this file. For more information on getpwnam, refer to Day 15, "System Functions."
Also, the superuser (system administrator) is usually the only user allowed to change the owner of a file.

The umask Function

As you've seen, you can change the access permissions for a file using chmod. To specify the access permissions that are to be withheld from files and directories when they are created, use the umask function.

The syntax for calls to umask is


oldmaskval = umask (maskval);

maskval is the new umask value, and umask returns the previous (superseded) umask value in oldmaskval. Each umask value is a file creation mask, and is used to set the default permissions for files and directories. (See the umask manual page for more details on file creation masks.)

For example, the following statement disables group and world write permissions for newly created files and directories:


$oldperms = umask(0022);

NOTE
You can determine the current umask value by passing no arguments to umask, as follows:
$currperms = umask();
This statement assigns the current umask value to $currperms.
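
The effect of the mask is easiest to see with mkdir. In the following sketch, which assumes a scratch directory and hypothetical variable names, a directory is requested with permissions 0777, but the bits set in the umask value 0022 are removed from the result.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir (CLEANUP => 1);

# Withhold write permission from group and world; save the old mask.
my $oldmask = umask (0022);

mkdir ("$dir/newdir", 0777) || die ("Unable to create directory: $!");
my $mode = (stat ("$dir/newdir"))[2] & 0777;

# Put the original mask back.
umask ($oldmask);

printf ("requested 0777, got %04o\n", $mode);   # prints: requested 0777, got 0755
```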

Permission File-Test Operators

Some file-test operators in Perl are designed to test for various permissions. Table 12.2 lists these file-test operators; in each case, filename is the name of the file being tested.

Table 12.2. File-test operators that test for permissions.

Operator   Description
-g         Does filename have its set group ID bit set?
-k         Does filename have its "sticky bit" set?
-r         Is filename a readable file?
-u         Does filename have its set user ID bit set?
-w         Is filename a writable file?
-x         Is filename an executable file?
-R         Is filename readable by the real user ID?
-W         Is filename writable by the real user ID?
-X         Is filename executable by the real user ID?

In this case, the real user ID is the user ID specified at login, as opposed to the effective user ID, which is the user ID under which you are currently running. (On some machines, a command such as /usr/local/etc/suid enables you to change your effective user ID.)

(See Day 6 for more information on how to use file-test operators.)

Miscellaneous Attribute Functions

The following sections describe other Perl functions that manipulate files.

The truncate Function

The truncate function enables you to reduce the size of a specified file to a particular length.

The syntax for the truncate function is


truncate (filename, length);

filename is the name of the file to reduce, and length is the new length of the file.

For example, the statement


truncate ("/u/jqpublic/longfile", 5000);

reduces the size of /u/jqpublic/longfile to 5000 bytes in length. (If the file is already smaller than 5000 bytes, what happens depends on your system, so do not rely on truncate in that case.)

NOTE
You can use a file variable in place of the filename:
truncate (MYFILE, 5000);
The file variable must refer to a file opened for writing by the open function.
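
The following sketch shortens a ten-byte scratch file to four bytes and then uses the -s file-test operator to confirm the new size. The file and variable names are assumptions made for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file containing exactly ten bytes.
my ($out, $path) = tempfile (UNLINK => 1);
print $out "0123456789";
close ($out) || die ("Unable to close $path: $!");

# Keep only the first four bytes.
truncate ($path, 4) || die ("Unable to truncate $path: $!");
my $size = -s $path;

open (my $in, "<", $path) || die ("Unable to open $path: $!");
my $contents = <$in>;
close ($in);

print ("size = $size, contents = $contents\n");   # prints: size = 4, contents = 0123
```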

The stat Function

The stat function retrieves information about a particular file when given its name or a file variable representing the file.

The syntax for the stat function is


stat (file);

Here, file is either a filename or a file variable.

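stat returns a list containing the following elements, in this order:

The device on which the file resides
The internal reference number (inode) for the file
The file mode (file type and access permissions)
The number of hard links to the file
The numeric user ID of the file's owner
The numeric group ID of the file's owner
The device type, if the file is a device
The size of the file, in bytes
The time the file was last accessed
The time the file was last modified
The time the file's inode (status) was last changed
The preferred block size for input and output operations on the file
The number of blocks allocated to the file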

Some of the items returned by stat can be obtained using file test operators. Table 12.3 lists these items.

Table 12.3. File-test operators that check information returned by stat.

Operator   Description
-b         Is filename a mountable disk (block device)?
-c         Is filename an I/O device (character device)?
-s         Is filename a non-empty file?
-t         Does filename represent a terminal?
-A         How long since filename was last accessed?
-C         How long since filename's inode was last changed?
-M         How long since filename was last modified?
-S         Is filename a socket?

For more information on stat or the information it returns, see the UNIX manual page for the stat system call on your machine.
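
In practice you rarely need the whole list. The following sketch picks out the mode, size, and modification time by position, assuming the usual thirteen-element layout in which the mode is element 2, the size is element 7, and the modification time is element 9 (counting from zero). The scratch file and variable names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file containing five bytes.
my ($out, $path) = tempfile (UNLINK => 1);
print $out "hello";
close ($out) || die ("Unable to close $path: $!");

my @info = stat ($path);
die ("Unable to stat $path: $!") unless (@info);

# Pick out individual elements by position.
my $perms = $info[2] & 07777;      # permissions, without the file-type bits
my ($size, $mtime) = @info[7, 9];  # size in bytes, modification time

printf ("permissions %04o, %d bytes, modified at %d\n",
        $perms, $size, $mtime);
```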

The lstat Function

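The lstat function returns the same kind of information as stat. The difference is that if the name being passed as an argument is a symbolic link, lstat returns information about the link itself rather than about the file to which it is linked.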

The syntax for lstat is the same as that for stat.


lstat (file);

file is either a filename or a file variable.

The time Function

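The access and modification times returned by stat are integers representing the number of elapsed seconds from January 1, 1970, to the time the file was accessed or modified. (The -A and -M file-test operators are different: they return the age of the file in days, measured back from the time the program started running.)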

To obtain the number of elapsed seconds from January 1, 1970, to the present time, call the built-in function time.

The syntax for calls to the time function is


currtime = time();

currtime is the returned elapsed-seconds value.

The gmtime and localtime Functions

The value returned by time can be converted to either Greenwich Mean Time or your computer's local time.

To convert to Greenwich Mean Time, call the gmtime function. To convert to local time, call the localtime function.

The syntax for the gmtime and localtime functions is identical:


timelist = gmtime (timeval);

timelist = localtime (timeval);

Both functions accept the time value returned by time or stat.

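Both functions return a list consisting of the following nine elements:

The number of seconds
The number of minutes
The hour of the day (0 through 23)
The day of the month
The month (0 for January through 11 for December)
The year, counted from 1900
The day of the week (0 for Sunday through 6 for Saturday)
The day of the year (0 for January 1)
A flag indicating whether daylight saving time is in effect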

For more information on the list returned by gmtime or localtime, refer to the UNIX manual pages for the system functions with the same names.
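
The following sketch turns the list returned by gmtime into a printable date. It passes the value 0 so that the result is the same on every machine; note the adjustments needed because the month is counted from 0 and the year from 1900. The variable names are made up for this illustration.

```perl
use strict;
use warnings;

# Time value 0 is the starting point from which seconds are counted.
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) =
    gmtime (0);

# The month runs from 0 to 11 and the year is counted from 1900.
my $date = sprintf ("%04d-%02d-%02d %02d:%02d:%02d",
                    $year + 1900, $mon + 1, $mday, $hour, $min, $sec);

my @daynames = ("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday");

print ("$date was a $daynames[$wday]\n");
# prints: 1970-01-01 00:00:00 was a Thursday
```

To format the current local time instead, pass the value returned by time to localtime.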

The utime Function

The time values returned by stat and time can be used to set the access and modification times of other files. To do this, use the utime function.

The syntax for the utime function is


utime (acctime, modtime, filelist);

acctime is the new access time, modtime is the new modification time, and filelist is the list of files.

utime returns the number of files whose access and modification times have been successfully changed.

The following is an example of a call to utime:


($acctime, $modtime) = (stat ("file1"))[8, 9];
@filelist = ("file2", "file3");
utime ($acctime, $modtime, @filelist);

Here, the files file2 and file3 have their access and modification times changed to those of file1.
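
If you want to verify the result, you can read the times back with stat. The sketch below sets two scratch files to arbitrary, made-up time values (expressed in elapsed seconds) and then checks one of them; elements 8 and 9 of the stat list are the access and modification times.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir (CLEANUP => 1);
my @filelist = ("$dir/file2", "$dir/file3");
foreach my $path (@filelist) {
    open (my $fh, ">", $path) || die ("Unable to create $path: $!");
    close ($fh);
}

# Arbitrary time values chosen for this example.
my $acctime = 1_000_000_000;
my $modtime = 1_000_000_060;

my $num = utime ($acctime, $modtime, @filelist);

# Read the times back from the first file.
my ($atime, $mtime) = (stat ($filelist[0]))[8, 9];
print ("$num files changed; mtime is now $mtime\n");
```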

The fileno Function

The fileno function returns the internal UNIX file descriptor associated with a particular file variable.

The syntax for the fileno function is


filedesc = fileno (filevar);

Here, filevar is the file variable whose descriptor is to be retrieved.

The file descriptor returned by fileno is used in various UNIX system calls; these calls can be accessed using the syscall function (as described on Day 15).

The flock and fcntl Functions

The flock and fcntl functions call the UNIX system calls of the same name.

The syntax for the flock and fcntl functions is


fcntl (filevar, fcntlrtn, value);

flock (filevar, flockop);

Here, filevar is a file variable representing an open file. fcntlrtn is a fcntl function as defined in the UNIX fcntl manual page, and value is the value passed to the function, if appropriate. Similarly, flockop is a file-locking operation, as defined in the UNIX flock manual page.

For more information on these functions, refer to the manual pages or to a book about UNIX. (You won't really be able to use these functions effectively unless you know a fair bit about how your operating system works.)
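
As a starting point, the following sketch shows a common pattern for flock: lock a file before appending to it, and unlock it afterward. It assumes a system on which flock is supported and takes the operation names LOCK_EX and LOCK_UN from the standard Fcntl module; the scratch file and the file variable LOGFILE are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Create an empty scratch file to stand in for a shared log file.
my ($tmp, $path) = tempfile (UNLINK => 1);
close ($tmp);

open (LOGFILE, ">>", $path) || die ("Unable to open $path: $!");

# Wait until no other program holds the lock, then take it.
my $locked = flock (LOGFILE, LOCK_EX);
die ("Unable to lock $path: $!") unless ($locked);

print LOGFILE "one entry\n";

# Release the lock and close the file.
flock (LOGFILE, LOCK_UN);
close (LOGFILE);

print ("wrote ", -s $path, " bytes\n");   # prints: wrote 10 bytes
```

Locks obtained with flock are advisory: they have an effect only on other programs that also call flock on the same file.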

Using DBM Files

Many systems on which Perl is available support files that are created using the Data Base Management (DBM) library. Perl enables you to use an associative array to access a particular DBM file.

The following sections describe how to access DBM files from Perl programs using the dbmopen and dbmclose functions. If you are running Perl 5, these functions have been superseded by the tie and untie functions; see Day 19, "Object-Oriented Programming in Perl," for more details.

For more information on DBM, refer to your system's appropriate manual pages.

The dbmopen Function

To associate an associative array with a DBM file, use the dbmopen function.

The syntax for the dbmopen function is


dbmopen (array, dbmfilename, permissions);

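This function requires three arguments: array, the associative array to use; dbmfilename, the name of the DBM file to open (without any .dir or .pag suffix); and permissions, the access permissions to assign to the DBM file if it has to be created.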

After the DBM file has been opened, the subscripts for the associative array represent the DBM file keys, and the values of the array represent the values associated with the keys.

Calling dbmopen destroys any existing values in the associative array.

The dbmclose Function

To close a DBM file opened by dbmopen, use dbmclose.

The syntax for the dbmclose function is


dbmclose (array);

Here, array is the associative array specified in the call to dbmopen.
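
The following sketch stores two values in a DBM file, closes it, and then opens it again to show that the values were saved. It assumes that your copy of Perl was built with at least one DBM library; the file name fruitdb and the array names are made up for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir (CLEANUP => 1);
my $dbname = "$dir/fruitdb";

# Store two key-value pairs in the DBM file.
my %fruit;
dbmopen (%fruit, $dbname, 0666) || die ("Unable to open DBM file: $!");
$fruit{"apple"}  = "red";
$fruit{"banana"} = "yellow";
dbmclose (%fruit);

# Open the same DBM file with a different array and read a value back.
my %again;
dbmopen (%again, $dbname, 0666) || die ("Unable to reopen DBM file: $!");
my $color = $again{"banana"};
my $count = scalar (keys (%again));
dbmclose (%again);

print ("banana is $color; $count keys stored\n");
# prints: banana is yellow; 2 keys stored
```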

Summary

Today, you learned how to open a pipe that directs input to the program, how to open a file for both reading and writing, and how to associate multiple file variables with a single file. You also learned how to test for the end of a particular input file or for the end of the last input file.

You also learned how to skip backward and forward in files and how to read single characters from a file using getc. You can use getc to build hot-key applications, which act as soon as they read a single character from the keyboard.

Perl provides several functions for manipulating directories. They enable you to create, open, read, close, delete, and skip around in directories. Other Perl functions enable you to move a file from one directory to another, create hard and symbolic links from one location to another, and delete a hard link (or a file).

You learned about the Perl functions that enable you to change the file owner or file permissions, truncate a file, retrieve file information, set file access and modification times, retrieve the file descriptor, and call the flock and fcntl system commands.

Finally, Perl provides an interface to the DBM library that enables you to associate DBM files with associative arrays.

Q&A

Q:How can I determine whether a particular Perl function that manipulates the UNIX file system is defined on my machine?
A:A Perl function that manipulates the UNIX file system normally has the same name as the UNIX command or C library function that performs the same task. If the UNIX command or C library function is defined, the Perl function is usually defined as well.
To check whether a UNIX command or C library function is defined, enter the command man name, where name is the name of the Perl library function for which you are checking.
Q:Why does a list of files in a directory appear in unsorted order?
A:The list appears in the order in which the files are stored in the directory. This varies, depending on the machine; usually, however, newer files appear at the end of the list.
Q:Which is better to use: the file-test operators or the built-in function stat?
A:Whenever possible, use the file-test operators. They are easier to use and are often more efficient.
Q:Why are both read and sysread defined, when they are so similar?
A:read, like the UNIX function fread, uses the standard UNIX input-output (I/O) environment. sysread and syswrite, on the other hand, bypass the standard I/O environment and perform low-level system calls.
Q:Why are eof and eof() different?
A:The short answer is: Just because. The long answer is that an empty list as an argument (as in eof()) refers to the list of files on the command line, as does the <> in
while ($line = <>) ...
eof, on the other hand, refers only to the file currently being read.

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to give you experience in using what you've learned. Try to understand the quiz and exercise answers before you go on to tomorrow's lesson.

Quiz

  1. What do these functions do?
    a.    tell
    b.    mkdir
    c.    link
    d.    unlink
    e.    truncate
  2. What is the difference between stat and lstat?
  3. What is the difference between tell and telldir?
  4. How are the following files being opened?
    a.    open (MYFILE, "<file1");
    b.    open (MYFILE, "file2|");
    c.    open (MYFILE, "+>file3");
    d.    open (MYFILE, ">&STDOUT");
  5. What permissions are granted by the following values?
    a.    0666
    b.    0777
    c.    0700
    d.    0644

Exercises

  1. Write a program that reads the directory /u/jqpublic and prints out all file and directory names that start with a period. Ignore the special files . (one period) and .. (two periods).
  2. Write a program that lists all the files (not the subdirectories) in the directory /u/jqpublic and then lists the contents of any subdirectories, their subdirectories, and so on. (Hint: Use a recursive subroutine.)
  3. Write a program that uses readdir and rewinddir to read a directory named /u/jqpublic and print a sorted list of the files and directories in alphabetical order. Ignore all names beginning with a period. (Of course, this is not the most efficient way to do this.)
  4. Write a program that uses hot keys and does the following:
  5. Write a program that reads the directory /u/jqpublic and grants global execute permissions for all files ending in .pl. Take away all other permissions, except user read, for every other file in the directory. Skip over all subdirectories.
  6. BUG BUSTER: What is wrong with the following program?
    #!/usr/local/bin/perl

    while ($line = <>) {
    print ($line);
    if (eof()) {
    print ("-- end of current file --\n");
    }
    }