cardinal_pythonlib.file_io
Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).
This file is part of cardinal_pythonlib.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Support functions for file I/O.
-
cardinal_pythonlib.file_io.add_line_if_absent(filename: str, line: str) → None
Adds a line (at the end) if it’s not already in the file somewhere.
Parameters:
- filename – filename to modify (in place)
- line – line to append (which must not contain a newline)
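The behaviour described above can be sketched with the standard library alone. This is an illustration, not the library's implementation: the name add_line_if_absent_sketch, the UTF-8 encoding and the assumption that the existing file already ends with a newline are all choices made for the example.

```python
import os
import tempfile
from typing import List


def add_line_if_absent_sketch(filename: str, line: str) -> None:
    """Append ``line`` to ``filename`` unless an identical line is present."""
    if "\n" in line:
        raise ValueError("line must not contain a newline")
    with open(filename, "r", encoding="utf-8") as f:
        existing = [x.rstrip("\n") for x in f]
    if line in existing:
        return  # already present, so the file is left untouched
    with open(filename, "a", encoding="utf-8") as f:
        f.write(line + "\n")


def demo_add_line() -> List[str]:
    """Hypothetical demo: one new line is appended, one duplicate is ignored."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "example.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("alpha\nbeta\n")
        add_line_if_absent_sketch(path, "gamma")  # appended
        add_line_if_absent_sketch(path, "alpha")  # no change
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()
```

Calling the helper repeatedly with the same line is harmless, which is what makes this pattern useful for idempotent edits to configuration files.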
-
cardinal_pythonlib.file_io.convert_line_endings(filename: str, to_unix: bool = False, to_windows: bool = False) → None
Converts a file (in place) from UNIX to Windows line endings, or the reverse.
Parameters:
- filename – filename to modify (in place)
- to_unix – convert Windows (CR LF) to UNIX (LF)
- to_windows – convert UNIX (LF) to Windows (CR LF)
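The conversion itself is a byte-level substitution. The sketch below works on a bytes object rather than on a file, and is not the library's code; the function name and the decision to raise ValueError unless exactly one flag is set are assumptions made for illustration.

```python
def convert_line_endings_sketch(
    data: bytes, to_unix: bool = False, to_windows: bool = False
) -> bytes:
    """Return ``data`` with its line endings converted one way or the other."""
    if to_unix == to_windows:
        # Assumption: the caller must pick exactly one direction.
        raise ValueError("Specify exactly one of to_unix / to_windows")
    if to_unix:
        return data.replace(b"\r\n", b"\n")
    # Normalise to LF first so existing CR LF pairs do not become CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```

Working in binary mode matters here: opening the file in text mode would let Python translate newlines itself and hide the bytes being converted.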
-
cardinal_pythonlib.file_io.gen_files_from_zipfiles(zipfilenames_or_files: Iterable[Union[str, BinaryIO]], filespec: str, on_disk: bool = False) → Generator[BinaryIO, None, None]
Parameters:
- zipfilenames_or_files – iterable of filenames or BinaryIO file-like objects, giving the .zip files
- filespec – filespec to filter the “inner” files against
- on_disk – if True, extracts inner files to disk and yields file-like objects that access disk files (and are therefore seekable); if False, extracts them in memory and yields file-like objects to those memory files (which will not be seekable; e.g. https://stackoverflow.com/questions/12821961/)
Yields: file-like object for each inner file matching filespec; may be in memory or on disk, as per on_disk
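A rough sketch of the in-memory case (on_disk=False) using only the standard zipfile and fnmatch modules follows. It is an assumption that the filespec is a shell-style wildcard matched with fnmatch; the names gen_inner_files_sketch and make_demo_zip are hypothetical and exist only for this example.

```python
import fnmatch
import io
import zipfile
from typing import BinaryIO, Generator, Iterable, Union


def gen_inner_files_sketch(
    zipfilenames_or_files: Iterable[Union[str, BinaryIO]],
    filespec: str,
) -> Generator[BinaryIO, None, None]:
    """Yield a binary file-like object for each inner file matching filespec."""
    for zip_source in zipfilenames_or_files:
        with zipfile.ZipFile(zip_source) as zf:
            for inner_name in zf.namelist():
                if fnmatch.fnmatch(inner_name, filespec):
                    # ZipFile.open() decompresses on the fly, without
                    # writing the inner file to disk.
                    with zf.open(inner_name) as inner_file:
                        yield inner_file


def make_demo_zip() -> io.BytesIO:
    """Build a small in-memory .zip archive for demonstration purposes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("notes.txt", "hello")
        zf.writestr("data.csv", "x,y")
    buffer.seek(0)
    return buffer
```

Each yielded object is only valid while the generator is paused on it, so the caller should read it before asking for the next one.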
-
cardinal_pythonlib.file_io.gen_lines_from_binary_files(files: Iterable[BinaryIO], encoding: str = 'utf8') → Generator[str, None, None]
Generates lines from binary files. Strips out newlines.
Parameters:
- files – iterable of BinaryIO file-like objects
- encoding – encoding to use
Yields: each line of all the files
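As a minimal sketch of the idea, each binary line can be decoded and have its trailing newline removed. The function name is hypothetical, and stripping only the CR and LF characters (rather than all surrounding whitespace) is an assumption; the library's exact stripping behaviour may differ.

```python
import io
from typing import BinaryIO, Generator, Iterable


def gen_lines_from_binary_sketch(
    files: Iterable[BinaryIO], encoding: str = "utf8"
) -> Generator[str, None, None]:
    """Decode every line of every binary file, dropping the line ending."""
    for f in files:
        for raw_line in f:
            yield raw_line.decode(encoding).rstrip("\r\n")


def demo_binary_lines() -> list:
    """Hypothetical demo using in-memory binary files."""
    files = [
        io.BytesIO(b"one\ntwo\r\n"),
        io.BytesIO("caf\u00e9\n".encode("utf8")),
    ]
    return list(gen_lines_from_binary_sketch(files))
```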
-
cardinal_pythonlib.file_io.gen_lines_from_textfiles(files: Iterable[TextIO]) → Generator[str, None, None]
Generates lines from file-like objects.
Parameters: files – iterable of TextIO objects
Yields: each line of all the files
-
cardinal_pythonlib.file_io.gen_lines_without_comments(filename: str, comment_at_start_only: bool = False) → Generator[str, None, None]
As for gen_noncomment_lines(), but using a filename.
-
cardinal_pythonlib.file_io.gen_lower(x: Iterable[str]) → Generator[str, None, None]
Parameters: x – iterable of strings
Yields: each string in lower case
-
cardinal_pythonlib.file_io.gen_noncomment_lines(file: TextIO, comment_at_start_only: bool = False) → Generator[str, None, None]
From an open file, yields each line in turn, left- and right-stripping the lines and (by default) removing the first # on a line and everything after it. Also removes blank lines.
Parameters:
- file – the input file-like object
- comment_at_start_only – only detect comments when the # is the first non-whitespace character of a line? (The default is False, meaning that comments are also allowed at the end of lines. NOTE that this does not cope well with quoted # symbols.)
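The two comment modes, and the caveat about quoted # symbols, are easiest to see side by side. The following is a sketch of the documented behaviour written from the description above, not a copy of the library code; the function name is hypothetical.

```python
import io
from typing import Generator, Iterable


def gen_noncomment_lines_sketch(
    lines: Iterable[str], comment_at_start_only: bool = False
) -> Generator[str, None, None]:
    """Yield stripped, non-blank lines with # comments removed."""
    for line in lines:
        line = line.strip()
        if comment_at_start_only:
            if line.startswith("#"):
                continue  # whole-line comment
        else:
            # Cut at the first '#', wherever it is (even inside quotes).
            line = line.split("#", 1)[0].rstrip()
        if line:
            yield line


SAMPLE = "# header\n\n  a = 1  # trailing\nb = '#x'\n"


def demo_noncomment(comment_at_start_only: bool) -> list:
    """Hypothetical demo over an in-memory text file."""
    return list(
        gen_noncomment_lines_sketch(io.StringIO(SAMPLE), comment_at_start_only)
    )
```

With the default setting, the quoted # in the last sample line is treated as the start of a comment and the line is truncated; with comment_at_start_only=True it survives intact, but the trailing comment on the previous line is kept too.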
-
cardinal_pythonlib.file_io.gen_part_from_iterables(iterables: Iterable[Any], part_index: int) → Generator[Any, None, None]
Yields the nth part of each thing in iterables.
Parameters:
- iterables – iterable of indexable items (e.g. lists or tuples)
- part_index – index of the part to yield from each item
Yields: item[part_index] for item in iterables
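In other words, this picks one "column" out of a stream of rows. A minimal sketch of that idea, under a hypothetical name, is:

```python
from typing import Any, Generator, Iterable


def gen_part_sketch(
    iterables: Iterable[Any], part_index: int
) -> Generator[Any, None, None]:
    """Yield item[part_index] for each item in the input."""
    for item in iterables:
        yield item[part_index]


# Example rows (made up for illustration): (name, score) pairs.
ROWS = [("alice", 3), ("bob", 5)]
```

For instance, list(gen_part_sketch(ROWS, 0)) gives the names and list(gen_part_sketch(ROWS, 1)) gives the scores.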
-
cardinal_pythonlib.file_io.gen_part_from_line(lines: Iterable[str], part_index: int, splitter: str = None) → Generator[str, None, None]
Splits lines with splitter and yields a specified part by index.
Parameters:
- lines – iterable of strings
- part_index – index of part to yield
- splitter – string to split the lines on
Yields: the specified part for each line
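A short sketch of the same idea follows. It assumes that the splitter is handed straight to str.split(), so that the default of None splits on runs of whitespace; that detail and the function name are assumptions rather than documented behaviour.

```python
from typing import Generator, Iterable, Optional


def gen_part_from_line_sketch(
    lines: Iterable[str], part_index: int, splitter: Optional[str] = None
) -> Generator[str, None, None]:
    """Split each line and yield the part at part_index."""
    for line in lines:
        # str.split(None) splits on any run of whitespace.
        yield line.split(splitter)[part_index]
```

A line with too few parts would raise IndexError in this sketch, so it suits input whose shape is already known.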
-
cardinal_pythonlib.file_io.gen_rows_from_csv_binfiles(csv_files: Iterable[BinaryIO], encoding: str = 'utf8', skip_header: bool = False, **csv_reader_kwargs) → Generator[Iterable[str], None, None]
Iterate through binary file-like objects that are CSV files in a specified encoding. Yield each row.
Parameters:
- csv_files – iterable of BinaryIO objects
- encoding – encoding to use
- skip_header – skip the header (first) row of each file?
- csv_reader_kwargs – arguments to pass to csv.reader()
Yields: rows from the files
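One way to get CSV rows out of binary streams is to wrap each stream in io.TextIOWrapper and hand it to csv.reader(). The sketch below takes that approach; it is an illustration under a hypothetical name and may not match how the library decodes its input.

```python
import csv
import io
from typing import BinaryIO, Generator, Iterable, List


def gen_rows_from_csv_sketch(
    csv_files: Iterable[BinaryIO],
    encoding: str = "utf8",
    skip_header: bool = False,
    **csv_reader_kwargs
) -> Generator[List[str], None, None]:
    """Yield every row of every CSV file, optionally skipping header rows."""
    for binary_file in csv_files:
        # newline="" lets the csv module handle line endings itself.
        text_file = io.TextIOWrapper(binary_file, encoding=encoding, newline="")
        reader = csv.reader(text_file, **csv_reader_kwargs)
        if skip_header:
            next(reader, None)  # discard the first row, if there is one
        for row in reader:
            yield row


def demo_csv_rows() -> list:
    """Hypothetical demo: two in-memory files with identical headers."""
    files = [io.BytesIO(b"h1,h2\n1,2\n"), io.BytesIO(b"h1,h2\n3,4\n")]
    return list(gen_rows_from_csv_sketch(files, skip_header=True))
```

Extra keyword arguments such as delimiter=";" pass through unchanged to csv.reader().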
-
cardinal_pythonlib.file_io.gen_textfiles_from_filenames(filenames: Iterable[str]) → Generator[TextIO, None, None]
Generates file-like objects from a list of filenames.
Parameters: filenames – iterable of filenames
Yields: each file as a TextIO object
-
cardinal_pythonlib.file_io.get_lines_without_comments(filename: str) → List[str]
See gen_lines_without_comments(); returns results as a list.
-
cardinal_pythonlib.file_io.is_line_in_file(filename: str, line: str) → bool
Detects whether a line is present within a file.
Parameters:
- filename – file to check
- line – line to search for (as an exact match)
-
cardinal_pythonlib.file_io.remove_gzip_timestamp(filename: str, gunzip_executable: str = 'gunzip', gzip_executable: str = 'gzip', gzip_args: List[str] = None) → None
Uses external gunzip/gzip tools to remove a gzip timestamp. Necessary for Lintian.
-
cardinal_pythonlib.file_io.replace_in_file(filename: str, text_from: str, text_to: str, backup_filename: str = None) → None
Replaces text in a file.
Parameters:
- filename – filename to process (modifying it in place)
- text_from – original text to replace
- text_to – replacement text
- backup_filename – backup filename to write to, if modifications made
-
cardinal_pythonlib.file_io.replace_multiple_in_file(filename: str, replacements: List[Tuple[str, str]], backup_filename: str = None) → None
Replaces multiple from/to string pairs within a single file.
Parameters:
- filename – filename to process (modifying it in place)
- replacements – list of (from_text, to_text) tuples
- backup_filename – backup filename to write to, if modifications made
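The sketch below illustrates the general pattern of read, replace, optionally back up, then write. It is not the library's implementation. In particular, it assumes that the pairs are applied in list order to the whole file contents and that the backup holds the original text; both points, and the function names, are assumptions for the example.

```python
import os
import tempfile
from typing import List, Optional, Tuple


def replace_multiple_sketch(
    filename: str,
    replacements: List[Tuple[str, str]],
    backup_filename: Optional[str] = None,
) -> None:
    """Apply each (from_text, to_text) pair in order; rewrite only if changed."""
    with open(filename, encoding="utf-8") as f:
        original = f.read()
    modified = original
    for text_from, text_to in replacements:
        modified = modified.replace(text_from, text_to)
    if modified == original:
        return  # nothing to do, so no backup is written either
    if backup_filename:
        with open(backup_filename, "w", encoding="utf-8") as f:
            f.write(original)
    with open(filename, "w", encoding="utf-8") as f:
        f.write(modified)


def demo_replace() -> Tuple[str, str]:
    """Hypothetical demo returning (new contents, backup contents)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "pets.txt")
        backup = os.path.join(tmpdir, "pets.txt.bak")
        with open(path, "w", encoding="utf-8") as f:
            f.write("cat dog\n")
        replace_multiple_sketch(path, [("cat", "dog"), ("dog", "bird")], backup)
        with open(path, encoding="utf-8") as f:
            new_text = f.read()
        with open(backup, encoding="utf-8") as f:
            old_text = f.read()
        return new_text, old_text
```

Because the pairs are applied sequentially in this sketch, a later pair can rewrite the output of an earlier one: here "cat" first becomes "dog" and then every "dog" becomes "bird".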
-
cardinal_pythonlib.file_io.smart_open(filename: str, mode: str = 'Ur', buffering: int = -1, encoding: str = None, errors: str = None, newline: str = None, closefd: bool = True) → IO
Context manager (for use with with) that opens a filename and provides an IO object. If the filename is '-', however, then sys.stdin is used for reading and sys.stdout is used for writing.
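The '-' convention can be sketched with contextlib.contextmanager. This is an illustrative reconstruction, not the library's code: the name smart_open_sketch and the way the mode string is inspected are assumptions. The sketch defaults to mode 'r' because the built-in open() no longer accepts the legacy 'U' flag from Python 3.11 onwards.

```python
import contextlib
import io
import sys
from typing import IO, Iterator


@contextlib.contextmanager
def smart_open_sketch(
    filename: str, mode: str = "r", **open_kwargs
) -> Iterator[IO]:
    """Open ``filename``, or use stdin/stdout if the filename is '-'."""
    if filename == "-":
        writing = any(flag in mode for flag in "wax")
        # Hand back the standard stream and leave it open afterwards.
        yield sys.stdout if writing else sys.stdin
    else:
        with open(filename, mode, **open_kwargs) as f:
            yield f


def demo_write_to_stdout() -> str:
    """Hypothetical demo: capture what is written via the '-' filename."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        with smart_open_sketch("-", "w") as out:
            out.write("hello\n")
    return captured.getvalue()
```

This lets a command-line tool treat "write to a file" and "write to standard output" identically, without closing the standard stream when the with block ends.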
-
cardinal_pythonlib.file_io.webify_file(srcfilename: str, destfilename: str) → None
Rewrites a file from srcfilename to destfilename, HTML-escaping it in the process.
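A minimal sketch of HTML-escaping one file into another follows, assuming that the standard library's html.escape() is an acceptable stand-in for whatever escaping routine the library uses. The function names and the UTF-8 encoding are likewise assumptions.

```python
import html
import os
import tempfile


def webify_file_sketch(srcfilename: str, destfilename: str) -> None:
    """Copy src to dest line by line, escaping HTML special characters."""
    with open(srcfilename, encoding="utf-8") as src, \
            open(destfilename, "w", encoding="utf-8") as dest:
        for line in src:
            dest.write(html.escape(line))


def demo_webify() -> str:
    """Hypothetical demo returning the escaped output."""
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "in.txt")
        dest = os.path.join(tmpdir, "out.html")
        with open(src, "w", encoding="utf-8") as f:
            f.write("if a < b & b > c:\n")
        webify_file_sketch(src, dest)
        with open(dest, encoding="utf-8") as f:
            return f.read()
```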
-
cardinal_pythonlib.file_io.write_gzipped_text(basefilename: str, text: str) → None
Writes text to a file compressed with gzip (a .gz file). The filename is used directly for the “inner” file and the extension .gz is appended to the “outer” (zipped) file’s name.
This function exists primarily because Lintian wants non-timestamped gzip files, or it complains:
- https://lintian.debian.org/tags/package-contains-timestamped-gzip.html
- See https://stackoverflow.com/questions/25728472/python-gzip-omit-the-original-filename-and-timestamp
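To show what "non-timestamped" means in practice, the sketch below builds gzip data in memory with gzip.GzipFile and mtime=0, which writes a zero timestamp into the gzip header. It returns bytes instead of writing a .gz file, and the function name and the UTF-8 encoding are assumptions; the library's own approach may differ.

```python
import gzip
import io


def gzip_text_without_timestamp(
    inner_filename: str, text: str, encoding: str = "utf-8"
) -> bytes:
    """Compress ``text`` as gzip data whose header timestamp is zero."""
    buffer = io.BytesIO()
    with gzip.GzipFile(
        filename=inner_filename,  # recorded as the "inner" file name
        mode="wb",
        fileobj=buffer,
        mtime=0,  # zero MTIME field instead of the current time
    ) as gz:
        gz.write(text.encode(encoding))
    return buffer.getvalue()
```

Bytes 4 to 7 of a gzip stream hold the modification time, so with mtime=0 they are all zero and compressing the same text twice gives byte-identical output.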
-
cardinal_pythonlib.file_io.write_text(filename: str, text: str) → None
Writes text to a file.
-
cardinal_pythonlib.file_io.writeline_nl(fileobj: TextIO, line: str) → None
Writes a line plus a terminating newline to the file.
-
cardinal_pythonlib.file_io.writelines_nl(fileobj: TextIO, lines: Iterable[str]) → None
Writes lines, plus terminating newline characters, to the file.
(Since fileobj.writelines() doesn’t add newlines… https://stackoverflow.com/questions/13730107/writelines-writes-lines-without-newline-just-fills-the-file)
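The difference from the built-in writelines() is easy to demonstrate with in-memory text files. The helper below is a sketch of the described behaviour under a hypothetical name, not the library's code.

```python
import io
from typing import Iterable, TextIO, Tuple


def writelines_nl_sketch(fileobj: TextIO, lines: Iterable[str]) -> None:
    """Write each line followed by a newline character."""
    for line in lines:
        fileobj.write(line + "\n")


def demo_writelines() -> Tuple[str, str]:
    """Return (output of plain writelines, output of the sketch)."""
    plain = io.StringIO()
    plain.writelines(["a", "b"])  # the built-in adds no separators
    with_newlines = io.StringIO()
    writelines_nl_sketch(with_newlines, ["a", "b"])
    return plain.getvalue(), with_newlines.getvalue()
```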