cardinal_pythonlib.file_io


Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).

This file is part of cardinal_pythonlib.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Support functions for file I/O.

cardinal_pythonlib.file_io.add_line_if_absent(filename: str, line: str) → None

Adds a line (at the end) if it’s not already in the file somewhere.

Parameters:
  • filename – filename to modify (in place)

  • line – line to append (which must not have a newline in)
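
The behaviour described above can be sketched with the standard library alone. This is an illustration rather than the library's own code: the name add_line_if_absent_sketch is hypothetical, and comparing against whole lines exactly (without stripping whitespace) is an assumption.

```python
import os
import tempfile


def add_line_if_absent_sketch(filename: str, line: str) -> None:
    """Append ``line`` to ``filename`` unless an existing line equals it."""
    assert "\n" not in line, "line must not contain a newline"
    with open(filename, "r") as f:
        content = f.read()
    if line in content.splitlines():
        return  # already present; leave the file untouched
    with open(filename, "a") as f:
        # Make sure the appended text starts on a line of its own.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")


# Usage: the second call changes nothing because the line is already there.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.txt")
    with open(path, "w") as f:
        f.write("alpha\nbeta\n")
    add_line_if_absent_sketch(path, "gamma")
    add_line_if_absent_sketch(path, "gamma")
    with open(path) as f:
        result_lines = f.read().splitlines()
```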

cardinal_pythonlib.file_io.convert_line_endings(filename: str, to_unix: bool = False, to_windows: bool = False) → None

Converts a file (in place) from UNIX to Windows line endings, or the reverse.

Parameters:
  • filename – filename to modify (in place)

  • to_unix – convert Windows (CR LF) to UNIX (LF)

  • to_windows – convert UNIX (LF) to Windows (CR LF)
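
A minimal sketch of in-place line-ending conversion follows. The function name is hypothetical, and two details are assumptions rather than documented behaviour: that exactly one of the two flags must be set, and that existing CR LF pairs are normalised first so they are not doubled.

```python
import os
import tempfile


def convert_line_endings_sketch(filename: str, to_unix: bool = False,
                                to_windows: bool = False) -> None:
    """Rewrite ``filename`` in place with a single line-ending style."""
    if to_unix == to_windows:
        # Assumption: the two flags are mutually exclusive.
        raise ValueError("Specify exactly one of to_unix or to_windows")
    with open(filename, "rb") as f:
        data = f.read()
    # Normalise to LF first so that existing CR LF pairs are not doubled.
    data = data.replace(b"\r\n", b"\n")
    if to_windows:
        data = data.replace(b"\n", b"\r\n")
    with open(filename, "wb") as f:
        f.write(data)


# Usage: convert a file with mixed endings one way, then back again.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mixed.txt")
    with open(path, "wb") as f:
        f.write(b"one\r\ntwo\nthree\r\n")
    convert_line_endings_sketch(path, to_windows=True)
    with open(path, "rb") as f:
        windows_bytes = f.read()
    convert_line_endings_sketch(path, to_unix=True)
    with open(path, "rb") as f:
        unix_bytes = f.read()
```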

cardinal_pythonlib.file_io.gen_files_from_zipfiles(zipfilenames_or_files: Iterable[str | BinaryIO], filespec: str, on_disk: bool = False) → Generator[BinaryIO, None, None]
Parameters:
  • zipfilenames_or_files – iterable of filenames or BinaryIO file-like objects, giving the .zip files

  • filespec – filespec to filter the “inner” files against

  • on_disk – if True, extracts inner files to disk and yields file-like objects that access disk files (and are therefore seekable); if False, extracts them in memory and yields file-like objects to those memory files (which will not be seekable; e.g. https://stackoverflow.com/questions/12821961/)

Yields:

file-like object for each inner file matching filespec; may be in memory or on disk, as per on_disk
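
The in-memory case (on_disk=False) might look roughly like the sketch below, built on the standard zipfile module. The name gen_inner_files_sketch is hypothetical, and treating filespec as a shell-style pattern matched with fnmatch is an assumption; the on_disk=True branch is not shown.

```python
import fnmatch
import io
import zipfile
from typing import BinaryIO, Generator, Iterable, Union


def gen_inner_files_sketch(
    zipfilenames_or_files: Iterable[Union[str, BinaryIO]], filespec: str
) -> Generator[BinaryIO, None, None]:
    """Yield an open binary file object for each matching inner file."""
    for source in zipfilenames_or_files:
        with zipfile.ZipFile(source) as zf:
            for inner_name in zf.namelist():
                # Assumption: filespec is a shell-style pattern such as "*.csv".
                if fnmatch.fnmatch(inner_name, filespec):
                    with zf.open(inner_name) as inner_file:
                        yield inner_file


# Usage: build a small .zip in memory and read back only the CSV member.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("a.csv", "x,y\n1,2\n")
    zf.writestr("notes.txt", "ignore me")
buffer.seek(0)
contents = [f.read() for f in gen_inner_files_sketch([buffer], "*.csv")]
```

Each yielded object is only valid until the generator advances, so callers should finish reading one inner file before requesting the next.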

cardinal_pythonlib.file_io.gen_lines_from_binary_files(files: Iterable[BinaryIO], encoding: str = 'utf8') → Generator[str, None, None]

Generates lines from binary files. Strips out newlines.

Parameters:
  • files – iterable of BinaryIO file-like objects

  • encoding – encoding to use

Yields:

each line of all the files

cardinal_pythonlib.file_io.gen_lines_from_textfiles(files: Iterable[TextIO]) → Generator[str, None, None]

Generates lines from file-like objects.

Parameters:

files – iterable of TextIO objects

Yields:

each line of all the files

cardinal_pythonlib.file_io.gen_lines_without_comments(filename: str, comment_at_start_only: bool = False) → Generator[str, None, None]

As for gen_noncomment_lines(), but using a filename.

cardinal_pythonlib.file_io.gen_lower(x: Iterable[str]) → Generator[str, None, None]
Parameters:

x – iterable of strings

Yields:

each string in lower case

cardinal_pythonlib.file_io.gen_noncomment_lines(file: TextIO, comment_at_start_only: bool = False) → Generator[str, None, None]

From an open file, yields each line in turn, left- and right-stripping the lines and (by default) removing everything on a line after the first #.

Also removes blank lines.

Parameters:
  • file – The input file-like object.

  • comment_at_start_only – Only detect comments when the # is the first non-whitespace character of a line? (The default is False, meaning that comments are also allowed at the end of lines. NOTE that this does not cope well with quoted # symbols.)
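
The two comment modes, and the caveat about quoted # symbols, can be illustrated with a small stand-alone sketch. The function name is hypothetical and the code is an approximation of the described behaviour, not the library's implementation.

```python
import io
from typing import Generator, TextIO


def gen_noncomment_lines_sketch(
    file: TextIO, comment_at_start_only: bool = False
) -> Generator[str, None, None]:
    """Yield stripped, non-blank lines with # comments removed."""
    for line in file:
        line = line.strip()
        if comment_at_start_only:
            if line.startswith("#"):
                continue  # whole-line comment
        else:
            # Drop everything from the first # onwards, even inside quotes.
            line = line.split("#", 1)[0].rstrip()
        if line:
            yield line


config = io.StringIO("# header\n\nname = x  # trailing\ncolour = '#fff'\n")
default_result = list(gen_noncomment_lines_sketch(config))
config.seek(0)
start_only_result = list(
    gen_noncomment_lines_sketch(config, comment_at_start_only=True)
)
```

In this sketch the default mode truncates the last line to "colour = '", which is the quoted-# problem noted above; with comment_at_start_only=True the line survives intact, but so does the trailing comment on the "name" line.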

cardinal_pythonlib.file_io.gen_part_from_iterables(iterables: Iterable[Any], part_index: int) → Generator[Any, None, None]

Yields the nth part of each thing in iterables.

Parameters:
  • iterables – iterable of anything

  • part_index – part index

Yields:

item[part_index] for item in iterable

cardinal_pythonlib.file_io.gen_part_from_line(lines: Iterable[str], part_index: int, splitter: str | None = None) → Generator[str, None, None]

Splits lines with splitter and yields a specified part by index.

Parameters:
  • lines – iterable of strings

  • part_index – index of part to yield

  • splitter – string to split the lines on

Yields:

the specified part for each line
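
As a rough illustration of splitting and indexing, the sketch below assumes that splitter is handed straight to str.split(), so that None means "split on runs of whitespace". The function name is hypothetical.

```python
from typing import Generator, Iterable, Optional


def gen_part_from_line_sketch(
    lines: Iterable[str], part_index: int, splitter: Optional[str] = None
) -> Generator[str, None, None]:
    """Split each line and yield the part at ``part_index``."""
    for line in lines:
        # Assumption: splitter=None falls back to whitespace splitting.
        yield line.split(splitter)[part_index]


names = list(gen_part_from_line_sketch(["alice 30 london", "bob 25 paris"], 0))
cities = list(gen_part_from_line_sketch(["alice,30,london"], 2, splitter=","))
```

A line with too few parts raises IndexError in this sketch; whether the library guards against that is not stated here.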

cardinal_pythonlib.file_io.gen_rows_from_csv_binfiles(csv_files: Iterable[BinaryIO], encoding: str = 'utf8', skip_header: bool = False, **csv_reader_kwargs) → Generator[Iterable[str], None, None]

Iterate through binary file-like objects that are CSV files in a specified encoding. Yield each row.

Parameters:
  • csv_files – iterable of BinaryIO objects

  • encoding – encoding to use

  • skip_header – skip the header (first) row of each file?

  • csv_reader_kwargs – arguments to pass to csv.reader()

Yields:

rows from the files
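
One way to read CSV rows from binary file-like objects is to wrap each in io.TextIOWrapper and hand it to csv.reader(), as sketched below. The function name is hypothetical and the wrapping approach is an assumption about how such a generator could be built, not a statement of how the library does it.

```python
import csv
import io
from typing import BinaryIO, Generator, Iterable, List


def gen_rows_from_csv_binfiles_sketch(
    csv_files: Iterable[BinaryIO],
    encoding: str = "utf8",
    skip_header: bool = False,
    **csv_reader_kwargs
) -> Generator[List[str], None, None]:
    """Decode each binary CSV file and yield its rows as lists of strings."""
    for binary_file in csv_files:
        # newline="" lets the csv module handle embedded newlines itself.
        text_file = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
        reader = csv.reader(text_file, **csv_reader_kwargs)
        if skip_header:
            next(reader, None)  # discard the first row, if there is one
        for row in reader:
            yield row


files = [io.BytesIO(b"id;name\n1;Ada\n"), io.BytesIO(b"id;name\n2;Bob\n")]
rows = list(
    gen_rows_from_csv_binfiles_sketch(files, skip_header=True, delimiter=";")
)
```

Here delimiter=";" travels through **csv_reader_kwargs to csv.reader(), and the header row of each file is skipped separately.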

cardinal_pythonlib.file_io.gen_textfiles_from_filenames(filenames: Iterable[str]) → Generator[TextIO, None, None]

Generates file-like objects from a list of filenames.

Parameters:

filenames – iterable of filenames

Yields:

each file as a TextIO object

cardinal_pythonlib.file_io.get_lines_without_comments(filename: str) → List[str]

See gen_lines_without_comments(); returns results as a list.

cardinal_pythonlib.file_io.is_line_in_file(filename: str, line: str) → bool

Detects whether a line is present within a file.

Parameters:
  • filename – file to check

  • line – line to search for (as an exact match)

cardinal_pythonlib.file_io.remove_gzip_timestamp(filename: str, gunzip_executable: str = 'gunzip', gzip_executable: str = 'gzip', gzip_args: List[str] | None = None) → None

Uses external gunzip/gzip tools to remove a gzip timestamp. Necessary for Lintian.

cardinal_pythonlib.file_io.replace_in_file(filename: str, text_from: str, text_to: str, backup_filename: str | None = None) → None

Replaces text in a file.

Parameters:
  • filename – filename to process (modifying it in place)

  • text_from – original text to replace

  • text_to – replacement text

  • backup_filename – backup filename to write to, if modifications made

cardinal_pythonlib.file_io.replace_multiple_in_file(filename: str, replacements: List[Tuple[str, str]], backup_filename: str | None = None) → None

Replaces multiple from/to string pairs within a single file.

Parameters:
  • filename – filename to process (modifying it in place)

  • replacements – list of (from_text, to_text) tuples

  • backup_filename – backup filename to write to, if modifications made
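
The sketch below shows one plausible reading of multi-pair replacement with an optional backup. The function name is hypothetical, and two points are assumptions: that pairs are applied sequentially in list order, and that the backup holds the original text and is written only when something changed.

```python
import os
import tempfile
from typing import List, Optional, Tuple


def replace_multiple_in_file_sketch(
    filename: str,
    replacements: List[Tuple[str, str]],
    backup_filename: Optional[str] = None,
) -> None:
    """Apply each (from_text, to_text) pair in order, rewriting in place."""
    with open(filename) as f:
        original = f.read()
    modified = original
    for text_from, text_to in replacements:
        modified = modified.replace(text_from, text_to)
    if modified == original:
        return  # nothing changed, so no backup and no rewrite
    if backup_filename:
        with open(backup_filename, "w") as f:
            f.write(original)
    with open(filename, "w") as f:
        f.write(modified)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "pets.txt")
    backup = os.path.join(tmpdir, "pets.bak")
    with open(path, "w") as f:
        f.write("cat dog\n")
    replace_multiple_in_file_sketch(
        path, [("cat", "dog"), ("dog", "bird")], backup
    )
    with open(path) as f:
        new_text = f.read()
    with open(backup) as f:
        backup_text = f.read()
```

Under the sequential assumption, order matters: "cat" first becomes "dog", and the second pair then turns every "dog" into "bird".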

cardinal_pythonlib.file_io.smart_open(filename: str, mode: str = 'Ur', buffering: int = -1, encoding: str = None, errors: str = None, newline: str = None, closefd: bool = True) → IO

Context manager (for use with a with statement) that opens a filename and provides an IO object. If the filename is '-', however, then sys.stdin is used for reading and sys.stdout is used for writing.
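
The '-' convention can be sketched with contextlib as follows. The name smart_open_sketch is hypothetical and only filename and mode are modelled. The sketch defaults to plain 'r' rather than the 'Ur' shown in the signature, because the built-in open() rejects the 'U' flag from Python 3.11 onwards. Treating any mode containing 'w' or 'a' as "writing" is an assumption.

```python
import contextlib
import os
import sys
import tempfile
from typing import IO, Iterator


@contextlib.contextmanager
def smart_open_sketch(filename: str, mode: str = "r") -> Iterator[IO]:
    """Open ``filename``, or use a standard stream when it is '-'."""
    if filename == "-":
        # Hand back a standard stream and leave it open afterwards.
        yield sys.stdout if ("w" in mode or "a" in mode) else sys.stdin
    else:
        with open(filename, mode) as f:
            yield f


# Usage: the same calling code works for a real file or for '-'.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "out.txt")
    with smart_open_sketch(path, "w") as f:
        f.write("hello\n")
    with smart_open_sketch(path) as f:
        written = f.read()
```

Not closing the standard streams on exit is the key design point: a command-line tool can accept '-' for input or output without breaking later reads or prints.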

cardinal_pythonlib.file_io.webify_file(srcfilename: str, destfilename: str) → None

Rewrites a file from srcfilename to destfilename, HTML-escaping it in the process.
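
A minimal sketch of HTML-escaping a file line by line is shown below. The function name is hypothetical, and both the use of html.escape() and the UTF-8 encoding are assumptions; the library may escape differently.

```python
import html
import os
import tempfile


def webify_file_sketch(srcfilename: str, destfilename: str) -> None:
    """Copy a text file, HTML-escaping each line on the way."""
    with open(srcfilename, "r", encoding="utf-8") as infile, \
            open(destfilename, "w", encoding="utf-8") as outfile:
        for line in infile:
            # html.escape() converts &, < and > (and quotes by default).
            outfile.write(html.escape(line))


with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "code.txt")
    dest = os.path.join(tmpdir, "code.html.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("if a < b & c:\n")
    webify_file_sketch(src, dest)
    with open(dest, encoding="utf-8") as f:
        escaped = f.read()
```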

cardinal_pythonlib.file_io.write_gzipped_text(basefilename: str, text: str) → None

Writes text to a file compressed with gzip (a .gz file). The filename is used directly for the “inner” file and the extension .gz is appended to the “outer” (zipped) file’s name.

This function exists primarily because Lintian wants non-timestamped gzip files, or it complains:
  • https://lintian.debian.org/tags/package-contains-timestamped-gzip.html

  • See https://stackoverflow.com/questions/25728472/python-gzip-omit-the-original-filename-and-timestamp
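
A timestamp-free gzip file can be produced with the standard gzip module by passing mtime=0 to gzip.GzipFile, as in the sketch below. The function name is hypothetical, and the UTF-8 encoding is an assumption; this is not claimed to be the library's implementation.

```python
import gzip
import os
import tempfile


def write_gzipped_text_sketch(basefilename: str, text: str) -> None:
    """Write ``text`` to ``basefilename + '.gz'`` with a zeroed timestamp."""
    zipfilename = basefilename + ".gz"
    with open(zipfilename, "wb") as raw:
        # filename= sets the "inner" name stored in the gzip header;
        # mtime=0 zeroes the header's modification-time field.
        with gzip.GzipFile(filename=basefilename, mode="wb",
                           fileobj=raw, mtime=0) as gz:
            gz.write(text.encode("utf-8"))  # assumption: UTF-8 encoding


with tempfile.TemporaryDirectory() as tmpdir:
    base = os.path.join(tmpdir, "changelog")
    write_gzipped_text_sketch(base, "hello\n")
    with open(base + ".gz", "rb") as f:
        header = f.read(8)
    with gzip.open(base + ".gz", "rt", encoding="utf-8") as f:
        round_trip = f.read()
```

Bytes 4 to 7 of a gzip header hold the modification time, so they should all be zero in the file written here.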

cardinal_pythonlib.file_io.write_text(filename: str, text: str) → None

Writes text to a file.

cardinal_pythonlib.file_io.writeline_nl(fileobj: TextIO, line: str) → None

Writes a line plus a terminating newline to the file.

cardinal_pythonlib.file_io.writelines_nl(fileobj: TextIO, lines: Iterable[str]) → None

Writes lines, plus terminating newline characters, to the file.

(Since fileobj.writelines() doesn’t add newlines… https://stackoverflow.com/questions/13730107/writelines-writes-lines-without-newline-just-fills-the-file)
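
The difference from plain writelines() is easy to see with an in-memory file. The helper name writelines_nl_sketch is hypothetical and is only an approximation of the documented behaviour.

```python
import io
from typing import Iterable, TextIO


def writelines_nl_sketch(fileobj: TextIO, lines: Iterable[str]) -> None:
    """Write each line followed by a newline character."""
    fileobj.writelines(line + "\n" for line in lines)


plain_buffer = io.StringIO()
plain_buffer.writelines(["a", "b"])  # built-in behaviour: no separators
plain = plain_buffer.getvalue()

nl_buffer = io.StringIO()
writelines_nl_sketch(nl_buffer, ["a", "b"])
with_newlines = nl_buffer.getvalue()
```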