File reading/writing utility

Overview

This module provides functions for file input/output. These are all wrapper functions built on existing functions in other Python modules and classes. Functions are provided to save a two-dimensional array to a text file, load selected columns of data from a text file, load a column header line, reduce strings to legal filename characters, and recursively match filename patterns (the last one taken from the Python Cookbook).

See the module's __main__ section for examples of use.

This package was partly developed to provide additional material in support of students and readers of the book Electro-Optical System Analysis and Design: A Radiometry Perspective, Cornelius J. Willers, ISBN 9780819495693, SPIE Monograph Volume PM236, SPIE Press, 2013. http://spie.org/x648.html?product_id=2021423&origin_id=x646

Module functions

latex_escape(str_)[source]

Escape special LaTeX characters in a string.

Parameters:

str_ (string) – input string potentially containing LaTeX special characters.

Returns:

string with special characters replaced by LaTeX escape sequences.

Return type:

(string)

Raises:

No exception is raised.
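The kind of substitution described above can be sketched as follows. This is a minimal illustration and not the module's implementation: the name latex_escape_sketch and the exact character table are assumptions.

```python
# Hypothetical sketch; the character table below is an assumption and may
# differ from the table used by latex_escape.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape_sketch(text):
    """Replace each LaTeX special character with an escape sequence."""
    # Mapping one character at a time avoids re-escaping the braces and
    # backslashes introduced by earlier replacements.
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


print(latex_escape_sketch("50% of $x_1"))  # 50\% of \$x\_1
```

Escaping in a single pass matters here: a chain of str.replace calls would escape the braces that the backslash and tilde replacements introduce.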

saveHeaderArrayTextFile(filename, dataArray, header=None, comment=None, delimiter=None)[source]

Save a numpy array to a file, with optional header lines.

This function saves a two-dimensional array to a text file, with an optional user-defined header.

Parameters:
  • filename – name of the output ASCII flatfile.

  • dataArray – a two-dimensional array.

  • header – the optional header.

  • comment – the symbol used to comment out lines, default value is None.

  • delimiter – delimiter used to separate columns, default is whitespace.

Returns:

Nothing.

Raises:

No exception is raised.

loadColumnTextFile(filename, loadCol=[1], comment=None, normalize=0, skiprows=0, delimiter=None, abscissaScale=1, ordinateScale=1, abscissaOut=None, returnAbscissa=False)[source]

Load selected column data from a text file, processing as specified.

This function loads column data from a text file, then scales and interpolates the data according to user specification. Column 0 (the first column) has special significance: it is taken as the abscissa (x-values) of the data set, while the remaining columns are any number of ordinate (y-value) vectors. The user passes a list of columns to be read (default is [1]); only these columns are read, processed and returned. The user may also pass an abscissa vector (abscissaOut) onto which the input data is interpolated; the interpolated data is subsequently amplitude scaled or normalised.

Note: use only a single separator (e.g. one space) between columns, and watch out for a leading space at the start of a line.

Parameters:
  • filename – name of the input ASCII flatfile.

  • loadCol – list of the M = len(loadCol) column(s) to be loaded as ordinates, default value is [1] (column 1).

  • comment (string) – the symbol used to comment out lines, default value is None.

  • normalize (integer) – flag to indicate if data must be normalized.

  • skiprows (integer) – the number of rows to be skipped at the start of the file (e.g. headers).

  • delimiter (string) – the delimiter used to separate columns, default is whitespace.

  • abscissaScale – scale by which the abscissa (column 0) must be multiplied.

  • ordinateScale – scale by which the ordinates (columns > 0) must be multiplied.

  • abscissaOut – abscissa vector onto which the output variables are interpolated.

  • returnAbscissa – return the abscissa vector as the second item in the return tuple.

Returns:

  • ordinatesOut (np.array[N,M]) – the interpolated, processed array of M columns by N rows.

  • abscissaOut (np.array[N,M]) – the abscissa where the ordinates are interpolated (second item of the return tuple, see returnAbscissa).

Raises:

No exception is raised.
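The processing order described for this function can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's code: the names interp_linear and load_columns_sketch are hypothetical, interpolation is assumed to be piecewise linear with clamping at the ends, and normalisation is assumed to divide by the column maximum.

```python
import bisect
import io


def interp_linear(x_new, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing.

    Values outside the range of xs are clamped to the end points.
    """
    out = []
    for x in x_new:
        if x <= xs[0]:
            out.append(ys[0])
        elif x >= xs[-1]:
            out.append(ys[-1])
        else:
            i = bisect.bisect_right(xs, x) - 1  # xs[i] <= x < xs[i + 1]
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            out.append(ys[i] + t * (ys[i + 1] - ys[i]))
    return out


def load_columns_sketch(stream, load_col=(1,), abscissa_scale=1.0,
                        ordinate_scale=1.0, abscissa_out=None,
                        normalize=False):
    """Return one list per requested column, processed as described above."""
    rows = [[float(v) for v in line.split()]
            for line in stream if line.strip()]
    xs = [row[0] * abscissa_scale for row in rows]  # column 0 is the abscissa
    columns = []
    for col in load_col:
        ys = [row[col] for row in rows]
        if abscissa_out is not None:
            ys = interp_linear(abscissa_out, xs, ys)
        ys = [y * ordinate_scale for y in ys]
        if normalize:
            peak = max(ys)  # assumed convention: scale so the maximum is 1
            ys = [y / peak for y in ys]
        columns.append(ys)
    return columns


data = "0 0 10\n1 2 20\n2 4 30\n"
print(load_columns_sketch(io.StringIO(data), load_col=[1],
                          abscissa_out=[0.5, 1.5]))  # [[1.0, 3.0]]
```

The sketch returns plain lists, one per column, whereas the documented function returns a numpy array with the columns side by side.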

loadHeaderTextFile(filename, loadCol=[1], comment=None)[source]

Load column header data from the first line of a text file.

Headers must be delimited by commas. The function loadColumnTextFile provides more comprehensive capabilities.

Parameters:
  • filename – the name of the input ASCII flatfile.

  • loadCol – list of numbers, the column headers to be loaded, default value is [1] (column 1).

  • comment – the symbol used to comment out lines.

Returns:

a list with selected column header entries

Return type:

[string]

Raises:

No exception is raised.
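A minimal sketch of the header selection, reading from any text stream. The name load_header_sketch is hypothetical, and stripping a leading comment symbol and surrounding whitespace are assumptions about the behaviour, not statements about the module.

```python
import io


def load_header_sketch(stream, load_col=(1,), comment=None):
    """Return the selected comma-delimited headers from the first line."""
    first_line = stream.readline().strip()
    # Assumption: a leading comment symbol is removed before splitting.
    if comment and first_line.startswith(comment):
        first_line = first_line[len(comment):]
    headers = [field.strip() for field in first_line.split(",")]
    return [headers[col] for col in load_col]


text = "% wavelength, radiance, transmittance\n1.0, 2.0, 0.5\n"
print(load_header_sketch(io.StringIO(text), load_col=[0, 2], comment="%"))
# ['wavelength', 'transmittance']
```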

cleanFilename(sourcestring, removestring=' %:/,.\\[]<>*?')[source]

Clean a string by removing selected characters.

Creates a legal and ‘clean’ source string from a string by removing some clutter and characters not allowed in filenames. A default set is given but the user can override the default string.

Parameters:
  • sourcestring – the string to be cleaned.

  • removestring – remove all these characters from the string (optional).

Returns:

A cleaned-up string.

Return type:

(string)

Raises:

No exception is raised.
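The character removal can be sketched in one line. The name clean_filename_sketch is hypothetical; the default removestring is copied from the signature above, and any additional clean-up the real function performs is not reproduced.

```python
def clean_filename_sketch(sourcestring, removestring=" %:/,.\\[]<>*?"):
    """Drop every character of sourcestring that occurs in removestring."""
    return "".join(ch for ch in sourcestring if ch not in removestring)


print(clean_filename_sketch("my file: v1.2?.txt"))        # myfilev12txt
print(clean_filename_sketch("a-b_c", removestring="-_"))  # abc
```

Note that the default set removes the period as well, so a file extension must be appended after cleaning.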

listFiles(root, patterns='*', recurse=1, return_folders=0, useRegex=False)[source]

List files/directories matching a specific pattern.

Returns a list of file paths in a file system, searching a directory structure along the specified path, looking for files that match the glob pattern. If specified, the search will continue into sub-directories. The function supports a local or network reachable filesystem, but not URLs.

Parameters:
  • root – directory root from where the search must take place.

  • patterns – glob/regex pattern for filename matching. Multiple patterns may be present, separated by semicolons (;).

  • recurse – flag to indicate if subdirectories must also be searched (optional).

  • return_folders – flag to indicate if folder names must also be returned (optional).

  • useRegex – flag to indicate if patterns are regular expression strings (optional).

Returns:

A list with matching file/directory names

Raises:

No exception is raised.
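The search described above can be approximated with os.walk and fnmatch. This is a sketch, not the Cookbook recipe or the module's code: the name list_files_sketch is hypothetical, regular expressions are assumed to match the whole filename, and the result is sorted for reproducibility.

```python
import fnmatch
import os
import re
import tempfile


def list_files_sketch(root, patterns="*", recurse=True,
                      return_folders=False, use_regex=False):
    """Return sorted paths under root whose names match any pattern."""
    pattern_list = patterns.split(";")
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        names = filenames + (dirnames if return_folders else [])
        for name in names:
            for pattern in pattern_list:
                if use_regex:
                    hit = re.fullmatch(pattern, name) is not None
                else:
                    hit = fnmatch.fnmatch(name, pattern)
                if hit:
                    matches.append(os.path.join(dirpath, name))
                    break  # do not list a name twice
        if not recurse:
            break  # os.walk yields the top directory first
    return sorted(matches)


# Demonstration on a throw-away directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.txt", "b.py", os.path.join("sub", "c.txt")):
        open(os.path.join(tmp, rel), "w").close()
    found = [os.path.relpath(p, tmp)
             for p in list_files_sketch(tmp, "*.txt;*.py")]
    top_only = [os.path.relpath(p, tmp)
                for p in list_files_sketch(tmp, "*.txt", recurse=False)]
print(found, top_only)
```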

execOnFiles(cmdline, root, patterns='*', recurse=1, return_folders=0, useRegex=False, printTask=False)[source]

Execute a program on a list of files/directories meeting specific requirements.

Seek files recursively and then execute a program on those files. The program is defined as a command line string as would be typed on a terminal, except that a token ‘{0}’ is given in place of the filename. During execution the token is replaced with the filename found in the recursive search:

task = cmdline.format(filename)

Example: cmdline = ‘bmpp -l eps.object {0}’

Parameters:
  • cmdline – string that defines the program to be executed.

  • root – directory root from where the search must take place.

  • patterns – glob/regex pattern for filename matching.

  • recurse – flag to indicate if subdirectories must also be searched (optional).

  • return_folders – flag to indicate if folder names must also be returned (optional).

  • useRegex – flag to indicate if patterns are regular expression strings (optional).

  • printTask – flag to indicate if the command line must be printed (optional).

Returns:

A list with matching file/directory names

Raises:

No exception is raised.

QueryDelete(recurse, tdir, patn, instr='')[source]

Delete files matching a pattern, after prompting the user for confirmation.

Parameters:
  • recurse – flag to indicate if the search must be recursive (0=no, 1=yes).

  • tdir – directory to search.

  • patn – file pattern to match.

  • instr – pre-supplied answer (‘y’ to skip the prompt).

Returns:

Nothing.

Raises:

No exception is raised.

readRawFrames(fname, rows, cols, vartype, loadFrames=[])[source]

Load multi-frame two-dimensional arrays from a raw data file of known data type.

The file must consist of multiple frames, all with the same number of rows and columns. Frames of different data types can be read according to the user specification.

Parameters:
  • fname – filename.

  • rows – number of rows in each frame.

  • cols – number of columns in each frame.

  • vartype – numpy data type of data to be read: int8, int16, int32, int64, uint8, uint16, uint32, uint64, float16, float32 or float64.

  • loadFrames – optional list of frames to load, zero-based; an empty list (the default) loads all frames.

Returns:

  • frames (int) – number of frames in the returned data set, 0 if an error occurred.

  • rawShaped (np.ndarray) – vartype numpy array of dimensions (frames, rows, cols), None if an error occurred.

Raises:

Exception is raised if an IOError occurs.
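The frame layout described above (equal-sized frames stored back to back) can be illustrated with the standard array module. This is a sketch under stated assumptions: the name read_raw_frames_sketch is hypothetical, array type codes stand in for numpy data types, native byte order is assumed, and frames are returned as nested lists instead of a numpy array.

```python
import array
import os
import tempfile


def read_raw_frames_sketch(fname, rows, cols, typecode="H", load_frames=()):
    """Return (frame_count, frames); frames[k][r][c] indexes one value.

    typecode follows the array module ('H' is an unsigned 16-bit integer on
    common platforms). Returns (0, None) if the file cannot be read.
    """
    values = array.array(typecode)
    try:
        with open(fname, "rb") as stream:
            values.frombytes(stream.read())
    except (OSError, ValueError):
        return 0, None
    per_frame = rows * cols
    total = len(values) // per_frame
    # An empty selection means all frames.
    wanted = list(load_frames) if load_frames else range(total)
    frames = []
    for k in wanted:
        flat = values[k * per_frame:(k + 1) * per_frame]
        frames.append([list(flat[r * cols:(r + 1) * cols])
                       for r in range(rows)])
    return len(frames), frames


# Round trip: write three 2x2 frames, then read back frames 0 and 2.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frames.raw")
    with open(path, "wb") as stream:
        array.array("H", range(12)).tofile(stream)
    count, frames = read_raw_frames_sketch(path, 2, 2, "H", load_frames=[0, 2])
print(count, frames)  # 2 [[[0, 1], [2, 3]], [[8, 9], [10, 11]]]
```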

writeRawFrames(fname, img, vartype, writeFrames=[])[source]

Write selected multiple 2D frames from a 3D array to a raw data file.

The array must be two-dimensional or three-dimensional. Frames increase over the first dimension of the 3D array.

Parameters:
  • fname – filename.

  • img (np.array(:,:,:) or np.array(:,:)) – array to be written to disk.

  • vartype – numpy data type of data to be written: int8, int16, int32, int64, uint8, uint16, uint32, uint64, float16, float32 or float64.

  • writeFrames – optional list of frames to write, zero-based; an empty list (the default) writes all frames.

Returns:

empty if successful, fail message otherwise

Return type:

message (string)

Raises:

No exception is raised.

rawFrameToImageFile(image, filename)[source]

Write a single raw image frame to an image file.

The file type is determined by the extension (e.g. png or jpg). The image is normalised to [0, 255] prior to writing.

Parameters:
  • image – two-dimensional array representing an image.

  • filename – name of file to be written to, with extension.

Returns:

Nothing

Raises:

No exception is raised.
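The normalisation step can be sketched as a min-max scaling. This is an assumption about how the [0, 255] range is reached, not a description of the module's code; the name normalise_to_bytes is hypothetical and the image is a nested list instead of a numpy array.

```python
def normalise_to_bytes(image):
    """Min-max scale a 2-D list of numbers to integers in [0, 255]."""
    flat = [value for row in image for value in row]
    low, high = min(flat), max(flat)
    if high == low:
        # A constant image has no contrast; map it to all zeros.
        return [[0 for _ in row] for row in image]
    return [[round((value - low) * 255 / (high - low)) for value in row]
            for row in image]


print(normalise_to_bytes([[10, 12], [14, 20]]))  # [[0, 51], [102, 255]]
```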

arrayToLaTex(filename, arr, header=None, leftCol=None, formatstring='%10.4e', filemode='w')[source]

Write a numpy array to latex table format in an output file.

The table can contain only the array data (no top header or left column side-header), or you can add either or both of the top row or side column headers. Leave ‘header’ or ‘leftCol’ as None if you don’t want these.

The output format of the array data can be specified, i.e. scientific notation or fixed decimal point.

Parameters:
  • filename – text writing output path and filename.

  • arr – array with table data.

  • header – column header in final latex format (optional).

  • leftCol – left column each row, in final latex format (optional).

  • formatstring – output format precision for array data (see np.savetxt) (optional).

  • filemode – file open mode: ‘w’ (default, new file) or ‘a’ (append).

Returns:

None, writes a file to disk

Raises:

No exception is raised.
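How the header, left column and format string combine can be sketched as below. The exact LaTeX emitted by arrayToLaTex is not shown in this documentation, so the tabular layout here is an assumption; the name array_to_latex_sketch is hypothetical and the function returns the lines instead of writing a file.

```python
def array_to_latex_sketch(arr, header=None, left_col=None,
                          formatstring="%10.4e"):
    """Return the lines of a LaTeX tabular built from a 2-D list."""
    ncols = len(arr[0]) + (1 if left_col is not None else 0)
    lines = [r"\begin{tabular}{" + "c" * ncols + "}"]
    if header is not None:
        lines.append(header + r" \\ \hline")  # header is already LaTeX
    for i, row in enumerate(arr):
        cells = [formatstring % value for value in row]
        if left_col is not None:
            cells.insert(0, left_col[i])
        lines.append(" & ".join(cells) + r" \\")
    lines.append(r"\end{tabular}")
    return lines


table = array_to_latex_sketch([[1.0, 2.5], [3.0, 4.0]],
                              header="run & a & b",
                              left_col=["r1", "r2"],
                              formatstring="%.2f")
print("\n".join(table))
```

With the default format string '%10.4e' the value 1.0 is rendered as 1.0000e+00; a fixed-point format such as '%.2f' gives 1.00.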

epsLaTexFigure(filename, epsname, caption, scale=None, vscale=None, filemode='a', strPost='')[source]

Write the LaTeX code to include an eps graphic as a latex figure.

By default (filemode='a') the text is appended to an existing file; use filemode='w' to start a new file.

Parameters:
  • filename – text writing output path and filename.

  • epsname – filename/path to eps file (relative to where the LaTeX document is built).

  • caption – figure caption.

  • scale – figure scale to textwidth [0..1].

  • vscale – figure scale to textheight [0..1].

  • filemode – file open mode (a=append, w=new file) (optional).

  • strPost – string to write to file after latex figure block (optional).

Returns:

None, writes a file to disk

Raises:

No exception is raised.

read2DLookupTable(filename)[source]

Read a 2D lookup table and extract the data.

The table has the following format:

line 1: xlabel ylabel title
line 2: 0 (vector of y (col) abscissa)
lines 3 and following: (element of x (row) abscissa), followed by table data.

The file format can be depicted as follows:

x-name y-name ordinates-name
0 y1 y2 y3 y4
x1 v11 v12 v13 v14
x2 v21 v22 v23 v24
x3 v31 v32 v33 v34

Parameters:

filename – input path and filename

Returns:

  • xVec (np.array[N]) – x abscissae

  • yVec (np.array[M]) – y abscissae

  • data (np.array[N,M]) – data corresponding to x, y

  • xlabel (string) – x abscissa label

  • ylabel (string) – y abscissa label

  • title (string) – dataset title

Raises:

No exception is raised.
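A parser for the table layout shown above can be sketched as follows. The name read_2d_lookup_sketch is hypothetical, the labels on the first line are assumed to contain no spaces, and the sketch returns plain lists instead of numpy arrays.

```python
import io


def read_2d_lookup_sketch(stream):
    """Parse the layout above.

    Returns (x_vec, y_vec, data, xlabel, ylabel, title).
    """
    lines = [line.split() for line in stream if line.strip()]
    xlabel, ylabel, title = lines[0][:3]      # assumes labels without spaces
    y_vec = [float(v) for v in lines[1][1:]]  # skip the leading 0 placeholder
    x_vec, data = [], []
    for fields in lines[2:]:
        x_vec.append(float(fields[0]))        # first entry is the x abscissa
        data.append([float(v) for v in fields[1:]])
    return x_vec, y_vec, data, xlabel, ylabel, title


text = """range altitude transmittance
0 1 2 3
10 0.9 0.8 0.7
20 0.6 0.5 0.4
"""
x_vec, y_vec, data, xlabel, ylabel, title = read_2d_lookup_sketch(
    io.StringIO(text))
print(x_vec, y_vec, data[1], title)
# [10.0, 20.0] [1.0, 2.0, 3.0] [0.6, 0.5, 0.4] transmittance
```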

downloadFileUrl(url, saveFilename=None, proxy=None)[source]

Download a file, given a URL.

The URL is used to download a file, to the saveFilename specified. If no saveFilename is given, the basename of the URL is used. Before downloading, first test to see if the file already exists.

Parameters:
  • url – the url to be accessed.

  • saveFilename – path to where the file must be saved (optional).

  • proxy – path to proxy server (optional). The proxy string is something like this: proxy = {'https': r'https://username:password@proxyname:portnumber'}

Returns:

Filename saved, or None if failed.

Return type:

(string)

Raises:

Exceptions are handled internally and signaled by return value.
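The download logic can be sketched with urllib from the standard library. This is not the module's implementation: the name download_file_sketch is hypothetical, the existence test is assumed to refer to the local target file, and the 30 second timeout is an arbitrary choice. The demonstration uses a file:// URL so that it runs without network access.

```python
import os
import pathlib
import tempfile
import urllib.error
import urllib.parse
import urllib.request


def download_file_sketch(url, save_filename=None, proxy=None):
    """Download url; return the saved filename, or None on failure."""
    if save_filename is None:
        # Default: the basename of the URL path.
        save_filename = os.path.basename(urllib.parse.urlparse(url).path)
    if os.path.exists(save_filename):
        return save_filename  # assumed: an existing local file is reused
    # proxy is a dict such as {'https': 'https://user:pw@host:port'}.
    handlers = [urllib.request.ProxyHandler(proxy)] if proxy else []
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=30) as response, \
                open(save_filename, "wb") as out:
            out.write(response.read())
    except (urllib.error.URLError, OSError, ValueError):
        return None  # failure is signalled by the return value
    return save_filename


with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp, "source.bin")
    source.write_bytes(b"payload")
    target = os.path.join(tmp, "copy.bin")
    saved = download_file_sketch(source.as_uri(), target)
    copied = pathlib.Path(target).read_bytes()
    missing = download_file_sketch(pathlib.Path(tmp, "missing.bin").as_uri(),
                                   os.path.join(tmp, "none.bin"))
print(os.path.basename(saved), copied, missing)
```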

unzipGZipfile(zipfilename, saveFilename=None)[source]

Unzip a file that was compressed using the gzip format.

The zipfilename is used to open a file, saving to the saveFilename specified. If no saveFilename is given, the basename of the zipfilename is used, but with the file extension removed.

Parameters:
  • zipfilename – the zipfilename to be decompressed.

  • saveFilename – to where the file must be saved (optional).

Returns:

Filename saved, or None if failed.

Return type:

(string)

Raises:

Exceptions are handled internally and signaled by return value.
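A minimal sketch of the decompression using the standard gzip module. The name gunzip_sketch is hypothetical; the default output name follows the rule stated above (basename with the last extension removed).

```python
import gzip
import os
import shutil
import tempfile


def gunzip_sketch(zipfilename, save_filename=None):
    """Decompress a gzip file; return the output filename, or None."""
    if save_filename is None:
        # Default: basename of the input with its last extension removed.
        save_filename = os.path.splitext(os.path.basename(zipfilename))[0]
    try:
        with gzip.open(zipfilename, "rb") as src, \
                open(save_filename, "wb") as dst:
            shutil.copyfileobj(src, dst)
    except (OSError, EOFError):
        return None  # failure is signalled by the return value
    return save_filename


with tempfile.TemporaryDirectory() as tmp:
    packed = os.path.join(tmp, "data.txt.gz")
    with gzip.open(packed, "wb") as stream:
        stream.write(b"1 2 3\n")
    saved = gunzip_sketch(packed, os.path.join(tmp, "data.txt"))
    with open(saved, "rb") as stream:
        content = stream.read()
print(content)  # b'1 2 3\n'
```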

untarTarfile(tarfilename, saveDirname=None)[source]

Untar a tar archive, saving all files to the specified directory.

The tarfilename is used to open a file, extracting to the saveDirname specified. If no saveDirname is given, the local directory ‘.’ is used.

Parameters:
  • tarfilename – the name of the tar archive.

  • saveDirname – to where the files must be extracted.

Returns:

list of filenames saved, or None if failed.

Return type:

([string])

Raises:

Exceptions are handled internally and signaled by return value.
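The extraction can be sketched with the standard tarfile module. The name untar_sketch is hypothetical, and the use of an extraction filter (where the running Python version provides one) is a safety choice of this sketch, not something the documentation states about the module.

```python
import os
import tarfile
import tempfile


def untar_sketch(tarfilename, save_dirname="."):
    """Extract a tar archive; return the member names, or None on failure."""
    # Newer Python versions offer extraction filters that reject unsafe paths.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    try:
        with tarfile.open(tarfilename, "r") as archive:
            names = archive.getnames()
            archive.extractall(save_dirname, **kwargs)
    except (tarfile.TarError, OSError):
        return None  # failure is signalled by the return value
    return names


with tempfile.TemporaryDirectory() as tmp:
    member = os.path.join(tmp, "notes.txt")
    with open(member, "w") as stream:
        stream.write("hello\n")
    archive_path = os.path.join(tmp, "bundle.tar")
    with tarfile.open(archive_path, "w") as archive:
        archive.add(member, arcname="notes.txt")
    out_dir = os.path.join(tmp, "out")
    os.makedirs(out_dir)
    extracted = untar_sketch(archive_path, out_dir)
    exists = os.path.exists(os.path.join(out_dir, "notes.txt"))
print(extracted, exists)  # ['notes.txt'] True
```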

downloadUntar(tgzFilename, url, destinationDir=None, tarFilename=None, proxy=None)[source]

Download and untar a compressed tar archive, saving all files to the specified directory.

The tarfilename is used to open the tar file, extracting to the destinationDir specified. If no destinationDir is given, the local directory ‘.’ is used. Before downloading, a check is done to determine if the file was already downloaded and exists in the local file system.

Parameters:
  • tgzFilename – the name of the tar archive file.

  • url – url where to look for the file (not including the filename).

  • destinationDir – to where the files must be extracted (optional).

  • tarFilename – downloaded tar filename (optional).

  • proxy – path to proxy server (optional). The proxy string is something like this: proxy = {'https': r'https://username:password@proxyname:portnumber'}

Returns:

list of filenames saved, or None if failed.

Return type:

([string])

Raises:

Exceptions are handled internally and signaled by return value.

mergeDFS(df1, df2, leftPre=None, rightPre=None, bounds_error=False, mergeOn=None)[source]

Merge two pandas DataFrames on a common column, returning the merged DataFrame.

By default the merging takes place on columns named ‘time’ or ‘Time’, but the merge column name can be specified in mergeOn.

Parameters:
  • df1 – first dataframe to be merged.

  • df2 – second dataframe to be merged.

  • leftPre – prefix to prepend to df1 column names, except time.

  • rightPre – prefix to prepend to df2 column names, except time.

  • bounds_error – passed through to the interpolation function.

  • mergeOn – if not merging on time, use this column name instead.

Returns:

A Pandas DataFrame with the merged data

Return type:

(pd.DataFrame)

Raises:

No exception is raised.