Text files containing data are very common. Reading such a file usually requires knowing how many lines of text it contains. Under UNIX and Windows, there is no special text file type, and it is not possible to tell how many lines a file contains from its basic file attributes. Rather, lines are encoded using a special character or characters at the end of each line:
- UNIX operating systems use an ASCII linefeed (LF) character at the end of each line.
- Older Macintosh systems (prior to the UNIX-based Mac OS X) use a carriage return (CR).
- Microsoft Windows uses a two character CR/LF sequence.
The only way to determine the number of lines of text contained within a file is to open it and count lines while reading and skipping over them until the end of the file is encountered. Since files are often copied from one type of system to another without going through the proper line termination conversion, portable software needs to be able to recognize any of these terminations, regardless of the system being used. FILE_LINES performs this operation in an efficient and portable manner, handling all three of the line termination conventions listed above.
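To illustrate what recognizing all three terminations involves, the following is a minimal sketch in IDL. It is not the FILE_LINES implementation: the function name count_lines_any_eol is hypothetical, the whole file is read into memory at once, and it assumes that a final line with no terminator still counts as a line.

```idl
FUNCTION count_lines_any_eol, filename
  ; Hypothetical helper for illustration; not how FILE_LINES is implemented.
  info = FILE_INFO(filename)
  IF info.size EQ 0 THEN RETURN, 0LL
  ; Read the raw bytes of the file in one unformatted read
  data = BYTARR(info.size)
  OPENR, unit, filename, /GET_LUN
  READU, unit, data
  FREE_LUN, unit
  is_lf = data EQ 10B          ; ASCII linefeed
  is_cr = data EQ 13B          ; ASCII carriage return
  ; Mark each byte whose successor is LF; the last byte has no successor
  next_is_lf = SHIFT(is_lf, -1)
  next_is_lf[info.size - 1] = 0B
  ; Every LF ends a line (UNIX and Windows). A CR ends a line only when
  ; it is not followed by LF (older Macintosh), so CR/LF is counted once.
  count = TOTAL(is_lf, /INTEGER) + TOTAL(is_cr AND ~next_is_lf, /INTEGER)
  ; Assumption: an unterminated final line is still counted
  last = data[info.size - 1]
  IF (last NE 10B) AND (last NE 13B) THEN count = count + 1
  RETURN, count
END
```

Because the sketch holds the entire file in memory, it is only practical for files of modest size.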
This routine works by opening the file and reading the data contained within. It is therefore only suitable for regular disk files, and only when access to that file is fast enough to justify reading it more than once. For other types of files, other approaches are necessary, such as:
- Reading the file once, using an adaptive (expandable) data structure, counting the number of lines as they are input, and growing the data structure as necessary.
- Building a header into your file format that includes the necessary information, or somehow embedding the number of lines into the file data.
- Maintaining file information in a separate file associated with each file.
- Using a self describing data format that avoids these issues.
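The first approach above, reading the file once into an expandable structure, might be sketched as follows. The function name read_text_once and the initial capacity of 1024 are assumptions chosen for illustration; the array is doubled whenever it fills so that the number of reallocations stays small.

```idl
FUNCTION read_text_once, filename
  ; Hypothetical helper: read every line in a single pass over the file.
  OPENR, unit, filename, /GET_LUN
  capacity = 1024LL            ; assumed initial guess at the line count
  lines = STRARR(capacity)
  count = 0LL
  str = ''
  WHILE ~EOF(unit) DO BEGIN
    READF, unit, str
    IF count EQ capacity THEN BEGIN
      ; The array is full: double its size before storing the new line
      lines = [TEMPORARY(lines), STRARR(capacity)]
      capacity = capacity * 2
    ENDIF
    lines[count] = str
    count = count + 1
  ENDWHILE
  FREE_LUN, unit
  ; Return a scalar empty string for an empty file
  IF count EQ 0 THEN RETURN, ''
  ; Otherwise trim the unused slots at the end of the array
  RETURN, lines[0:count-1]
END
```

The number of lines read is then available as N_ELEMENTS of the result, except for an empty file, where the result is a single empty string.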
This routine assumes that the specified file or files contain only lines of text. It is unable to correctly count lines in files that contain binary data, or which do not use the standard line termination characters. Results are undefined for such files.
Note that FILE_LINES is equivalent to the following IDL code:
FUNCTION file_lines, filename
  OPENR, unit, filename, /GET_LUN
  str = ''
  count = 0ll
  WHILE ~ EOF(unit) DO BEGIN
    READF, unit, str
    count = count + 1
  ENDWHILE
  FREE_LUN, unit
  RETURN, count
END
The primary advantage of FILE_LINES over the IDL version shown here is efficiency. FILE_LINES avoids the overhead of the WHILE loop and does not need to create an IDL string for each line of the file.
Read the contents of the text file mydata.dat into a string array.
nlines = FILE_LINES('mydata.dat')
sarr = STRARR(nlines)
OPENR, unit, 'mydata.dat', /GET_LUN
READF, unit, sarr
FREE_LUN, unit
Returns the number of lines of text contained within the specified file or files. If an array of file names is specified via the Path parameter, the return value is an array with the same number of elements as Path, with each element containing the number of lines in the corresponding file.
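As a short sketch of the array form, the following passes three file names and prints the count for each. The file names are hypothetical and are assumed to exist as regular text files.

```idl
; Hypothetical file names, used only for illustration
files = ['run1.dat', 'run2.dat', 'run3.dat']
; counts has one element per element of files
counts = FILE_LINES(files)
FOR i = 0, N_ELEMENTS(files) - 1 DO $
  PRINT, files[i], counts[i]
```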
Path is a scalar string or string array containing the names of the text files for which the number of lines is desired.
If the COMPRESS keyword is set, FILE_LINES assumes that the files specified in Path contain data compressed in the standard GZIP format, and decompresses the data in order to count the number of lines. See the description of the COMPRESS keyword to the OPENR/OPENU/OPENW procedure for additional information.
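For a GZIP-compressed file, the same COMPRESS keyword would be given both to FILE_LINES and to OPENR when reading the contents. The file name mydata.dat.gz below is hypothetical, and the sketch assumes the file contains at least one line.

```idl
; Count the lines in a GZIP-compressed text file (hypothetical name)
nlines = FILE_LINES('mydata.dat.gz', /COMPRESS)
sarr = STRARR(nlines)
; Open with /COMPRESS so the data is decompressed as it is read
OPENR, unit, 'mydata.dat.gz', /GET_LUN, /COMPRESS
READF, unit, sarr
FREE_LUN, unit
```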
If the NOEXPAND_PATH keyword is set, FILE_LINES uses Path exactly as specified, without expanding any wildcard characters or environment variable names included in the path. See FILE_SEARCH for details on path expansion.
Added COMPRESS keyword