cardinal_pythonlib.text


Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).

This file is part of cardinal_pythonlib.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Simple text-processing functions.

cardinal_pythonlib.text.escape_newlines(s: str) → str

Escapes CR, LF, and backslashes.

Its counterpart is unescape_newlines().

s.encode("string_escape") (Python 2 only) and s.encode("unicode_escape") are alternatives, but they also alter quotes (specifically, they backslash-escape single quotes).
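As an illustration only, the behaviour described above can be sketched with plain str.replace() calls. The name sketch_escape_newlines is hypothetical, and this is not claimed to be the library's implementation. The point it shows is ordering: backslashes are escaped first, so that the backslashes introduced for CR and LF are not doubled afterwards.

```python
def sketch_escape_newlines(s: str) -> str:
    """Hypothetical sketch: escape backslash, LF and CR (not library code)."""
    s = s.replace("\\", "\\\\")  # backslash first: \ becomes \\
    s = s.replace("\n", "\\n")   # LF becomes backslash + "n"
    s = s.replace("\r", "\\r")   # CR becomes backslash + "r"
    return s


escaped = sketch_escape_newlines("one\ntwo\\three\r")
print(escaped)  # the result fits on one line; quotes are left untouched
```

Because only three characters are rewritten, a string such as "it's" passes through this sketch unchanged.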

cardinal_pythonlib.text.escape_tabs_newlines(s: str) → str

Escapes CR, LF, tab, and backslashes.

Its counterpart is unescape_tabs_newlines().

cardinal_pythonlib.text.get_unicode_category_strings() → Dict[str, str]

Returns a dictionary mapping each Unicode category name (e.g. “ASCII”) to a string containing all the characters in that category.

The result is large (~5 MB), so don’t call it unnecessarily and don’t store it as a module-level variable.

NB ‘Alphabetic’ has length 118240; ‘Latin_Alphabetic’ only 1022.
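To give a feel for why such a mapping is large and why it is better built on demand, here is a rough sketch using only the standard library. It is an assumption-laden stand-in: the names sketch_category_strings and sketch_characters are hypothetical, and it groups characters by the two-letter general categories from unicodedata.category() (e.g. "Lu"), not by names such as “ASCII” or “Alphabetic”. It is not claimed to be how the library builds its dictionary.

```python
import sys
import unicodedata
from collections import defaultdict
from typing import Dict


def sketch_category_strings() -> Dict[str, str]:
    """Hypothetical stand-in: map each general category to its characters."""
    buckets = defaultdict(list)
    for codepoint in range(sys.maxunicode + 1):  # every code point
        ch = chr(codepoint)
        buckets[unicodedata.category(ch)].append(ch)
    return {cat: "".join(chars) for cat, chars in buckets.items()}


def sketch_characters(category: str) -> str:
    """Build the mapping on demand and keep only the string requested."""
    # The big dictionary is a local temporary, not a module-level variable.
    return sketch_category_strings()[category]  # KeyError if unknown


uppercase = sketch_characters("Lu")  # "Lu" = uppercase letters
```

Only the single string that is needed outlives the call here; the full dictionary is discarded as soon as sketch_characters() returns.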

cardinal_pythonlib.text.get_unicode_characters(category: str) → str

Parameters: category – a Unicode category, e.g. “ASCII”
Returns: a string containing those characters
Return type: str
Raises: KeyError if the category is bad

cardinal_pythonlib.text.unescape_newlines(s: str) → str

Reverses escape_newlines().

cardinal_pythonlib.text.unescape_tabs_newlines(s: str) → str

Reverses escape_tabs_newlines().

See also https://stackoverflow.com/questions/4020539.
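Reversing this kind of escaping is slightly subtler than applying it. A sketch, with hypothetical names and no claim to match the library's code: rather than chaining str.replace() calls, scan the string once and interpret each character that follows a backslash. A chained replace would misread an escaped backslash followed by the letter "n" as an escaped LF.

```python
# Hypothetical names throughout; a sketch, not the library's code.
_UNESCAPE = {"n": "\n", "r": "\r", "t": "\t"}


def sketch_escape_tabs_newlines(s: str) -> str:
    """Escape backslash first, then LF, CR and tab."""
    s = s.replace("\\", "\\\\")
    for raw, letter in (("\n", "n"), ("\r", "r"), ("\t", "t")):
        s = s.replace(raw, "\\" + letter)
    return s


def sketch_unescape_tabs_newlines(s: str) -> str:
    """Undo the escaping in a single left-to-right scan."""
    out = []
    in_escape = False
    for c in s:
        if in_escape:
            # Character after a backslash: n/r/t map to LF/CR/tab; anything
            # else (including a second backslash) is kept literally.
            out.append(_UNESCAPE.get(c, c))
            in_escape = False
        elif c == "\\":
            in_escape = True
        else:
            out.append(c)
    return "".join(out)


original = "col1\tcol2\r\nC:\\new\\table"
round_trip = sketch_unescape_tabs_newlines(sketch_escape_tabs_newlines(original))
assert round_trip == original
```

The sample string deliberately contains a literal backslash followed by "n" and by "t", which is the case a naive chained replace would get wrong.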