cardinal_pythonlib.pdf
Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).
This file is part of cardinal_pythonlib.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Support functions to generate (and serve) PDFs.
- class cardinal_pythonlib.pdf.PdfPlan(is_html: bool = False, html: str | None = None, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, is_filename: bool = False, filename: str | None = None)
Class to describe a PDF on disk or the information required to create the PDF from HTML.
- Parameters:
is_html – use HTML mode?
html – for HTML mode, the main HTML
header_html – for HTML mode, an optional page header (in HTML)
footer_html – for HTML mode, an optional page footer (in HTML)
wkhtmltopdf_filename – filename of the wkhtmltopdf executable
wkhtmltopdf_options – options for wkhtmltopdf
is_filename – use file mode?
filename – for file mode, the filename of the existing PDF on disk
Use either is_html or is_filename, not both.
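The rule that a plan uses either HTML mode or file mode, never both, can be illustrated with a hypothetical dataclass. PlanSketch is not part of cardinal_pythonlib, and whether PdfPlan itself validates its arguments this way is an assumption; the sketch raises ValueError simply to make the rule explicit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanSketch:
    """Hypothetical, cut-down stand-in for PdfPlan (not the library class)."""

    is_html: bool = False
    html: Optional[str] = None
    is_filename: bool = False
    filename: Optional[str] = None

    def __post_init__(self) -> None:
        # Exactly one mode must be chosen: both True or both False is an error.
        if self.is_html == self.is_filename:
            raise ValueError("Set exactly one of is_html or is_filename")
        # Each mode needs its own payload (an assumed extra check).
        if self.is_html and self.html is None:
            raise ValueError("HTML mode requires html")
        if self.is_filename and not self.filename:
            raise ValueError("File mode requires filename")
```

For example, PlanSketch(is_html=True, html="<p>Hello</p>") is accepted, whereas setting both is_html and is_filename raises ValueError.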
- cardinal_pythonlib.pdf.append_memory_pdf_to_writer(input_pdf: bytes, writer: PdfWriter, start_recto: bool = True) → None
Appends a PDF (as bytes in memory) to a pypdf writer.
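The start_recto parameter is not described here. Assuming it means "begin the appended PDF on a right-hand (odd-numbered) page", as for double-sided printing, the padding rule can be sketched with plain lists standing in for pypdf page objects. append_pages and the BLANK marker are hypothetical names, not part of the library.

```python
from typing import List

BLANK = "<blank>"  # stands in for a blank page object


def append_pages(output: List[str], new_pages: List[str],
                 start_recto: bool = True) -> None:
    """Append new_pages to output, optionally starting on a recto page."""
    if not new_pages:
        return  # nothing to append, so no padding is needed either
    # Page 1 is a recto (right-hand) page, so odd-numbered pages are rectos.
    # If output holds an odd number of pages, the next page would be a verso;
    # one blank page pushes the new document onto the next recto.
    if start_recto and len(output) % 2 != 0:
        output.append(BLANK)
    output.extend(new_pages)
```

In pypdf terms, the padding step would presumably correspond to PdfWriter.add_blank_page(), and the append step to PdfWriter.add_page() for each page of a PdfReader opened on the input bytes.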
- cardinal_pythonlib.pdf.append_pdf(input_pdf: bytes, output_writer: PdfWriter)
Appends a PDF to a pypdf writer (legacy interface).
- cardinal_pythonlib.pdf.assert_processor_available(processor: str) → None
Assert that a specific PDF processor is available.
- Parameters:
processor – a PDF processor type from Processors
- Raises:
AssertionError – if processor is not a valid processor type
RuntimeError – if the requested processor is unavailable
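The two failure modes listed above (an invalid processor name versus a valid but unavailable one) can be sketched as follows. This is a hypothetical stand-in, not the library's code: the set of processor names is an assumption (only "pdfkit" appears in the documented defaults), and "available" is approximated as "its Python package is importable".

```python
import importlib.util
from typing import AbstractSet

# Assumed processor names; only "pdfkit" appears in the documented defaults.
KNOWN_PROCESSORS = frozenset({"pdfkit", "weasyprint", "xhtml2pdf"})


def check_processor(processor: str,
                    known: AbstractSet[str] = KNOWN_PROCESSORS) -> None:
    """Hypothetical stand-in for assert_processor_available()."""
    # An unknown processor name is a programming error, hence AssertionError
    # (note that assert statements are skipped under "python -O").
    assert processor in known, f"Unknown PDF processor: {processor!r}"
    # A recognized processor is usable only if its package can be imported.
    if importlib.util.find_spec(processor) is None:
        raise RuntimeError(f"PDF processor {processor!r} is not installed")
```

For pdfkit, a fuller check would presumably also confirm that the wkhtmltopdf executable can be found, for example with shutil.which("wkhtmltopdf").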
- cardinal_pythonlib.pdf.get_concatenated_pdf_from_disk(filenames: Iterable[str], start_recto: bool = True) → bytes
Concatenates PDFs from disk and returns them as an in-memory binary PDF.
- cardinal_pythonlib.pdf.get_concatenated_pdf_in_memory(pdf_plans: Iterable[PdfPlan], start_recto: bool = True) → bytes
Concatenates PDFs and returns them as an in-memory binary PDF.
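Before concatenating, each PdfPlan presumably has to be turned into PDF bytes: HTML-mode plans rendered, file-mode plans read from disk. A minimal sketch of that per-plan step follows. PlanSketch and plans_to_pdf_bytes are hypothetical names, the renderer is passed in as a function so the sketch runs without wkhtmltopdf, and the start_recto padding is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class PlanSketch:
    """Hypothetical stand-in for PdfPlan (header/footer fields omitted)."""

    is_html: bool = False
    html: Optional[str] = None
    is_filename: bool = False
    filename: Optional[str] = None


def plans_to_pdf_bytes(plans: Iterable[PlanSketch],
                       render_html: Callable[[str], bytes]) -> List[bytes]:
    """Render HTML-mode plans and read file-mode plans, in order."""
    results: List[bytes] = []
    for plan in plans:
        if plan.is_html:
            # In the real library, rendering would go through wkhtmltopdf etc.
            results.append(render_html(plan.html))
        elif plan.is_filename:
            # File mode: the PDF already exists on disk.
            with open(plan.filename, "rb") as f:
                results.append(f.read())
        else:
            raise ValueError("Plan has neither HTML mode nor file mode set")
    return results
```

Each resulting bytes object could then be appended to a writer in turn, as append_memory_pdf_to_writer() does.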
- cardinal_pythonlib.pdf.get_default_fix_pdfkit_encoding_bug() → bool
Should we be trying to fix a pdfkit encoding bug, by default?
- Returns:
should we? Yes if we have the specific buggy version of pdfkit.
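A minimal sketch of both halves of this workaround, assuming the "specific buggy version" is detected by comparing the installed version string with 0.5.0 (the example version given for the fix_pdfkit_encoding_bug parameter of make_pdf_from_html()). default_fix_encoding_bug and executable_arg are hypothetical names, and the library's own check may differ.

```python
from importlib import metadata
from typing import Union

# Assumption: the version named as an example in the parameter docs.
BUGGY_PDFKIT_VERSION = "0.5.0"


def default_fix_encoding_bug(package: str = "pdfkit") -> bool:
    """True if the installed package is the assumed buggy version."""
    try:
        return metadata.version(package) == BUGGY_PDFKIT_VERSION
    except metadata.PackageNotFoundError:
        return False  # not installed, so there is nothing to work around


def executable_arg(wkhtmltopdf_filename: str,
                   fix_bug: bool) -> Union[str, bytes]:
    """Apply the workaround: pass the executable path as UTF-8 bytes."""
    return wkhtmltopdf_filename.encode("utf-8") if fix_bug else wkhtmltopdf_filename
```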
- cardinal_pythonlib.pdf.get_pdf_from_html(html: str, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, file_encoding: str = 'utf-8', debug_options: bool = False, debug_content: bool = False, debug_wkhtmltopdf_args: bool = True, fix_pdfkit_encoding_bug: bool | None = None, processor: str = 'pdfkit') → bytes
Takes HTML and returns a PDF.
See the arguments to make_pdf_from_html() (except on_disk).
- Returns:
the PDF binary as a bytes object
- cardinal_pythonlib.pdf.make_pdf_from_html(on_disk: bool, html: str, output_path: str | None = None, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, file_encoding: str = 'utf-8', debug_options: bool = False, debug_content: bool = False, debug_wkhtmltopdf_args: bool = True, fix_pdfkit_encoding_bug: bool | None = None, processor: str = 'pdfkit') → bytes | bool
Takes HTML and either returns a PDF in memory or makes one on disk.
For preference, uses wkhtmltopdf (with pdfkit), because it is:
  - faster than xhtml2pdf
  - free of the table bugs found in WeasyPrint
  - however, it doesn't support CSS Paged Media, so we have the header_html and footer_html options to allow you to pass appropriate HTML content to serve as the header/footer (rather than passing it within the main HTML).
- Parameters:
on_disk – make file on disk (rather than returning it in memory)?
html – main HTML
output_path – if on_disk, the output filename
header_html – optional page header, as HTML
footer_html – optional page footer, as HTML
wkhtmltopdf_filename – filename of the wkhtmltopdf executable
wkhtmltopdf_options – options for wkhtmltopdf
file_encoding – encoding to use when writing the header/footer to disk
debug_options – log the wkhtmltopdf config/options passed to pdfkit?
debug_content – log the main/header/footer HTML?
debug_wkhtmltopdf_args – log the final command-line arguments that pdfkit will use when it calls wkhtmltopdf?
fix_pdfkit_encoding_bug – attempt to work around a bug in, e.g., pdfkit==0.5.0 by encoding wkhtmltopdf_filename to UTF-8 before passing it to pdfkit? If you pass None here, a default value is used, from get_default_fix_pdfkit_encoding_bug().
processor – a PDF processor type from Processors
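- Returns:
if on_disk is False, the PDF binary as a bytes object; if on_disk is True, a bool indicating success
- Raises:
AssertionError – if processor is not a valid processor type
RuntimeError – if the requested processor is unavailable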
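wkhtmltopdf takes page headers and footers as separate HTML documents (via its --header-html and --footer-html options), which is presumably why file_encoding governs writing the header/footer to disk. A sketch of that step follows, assuming the options reach pdfkit as a dictionary whose keys are wkhtmltopdf option names without the leading dashes (e.g. {"page-size": "A4"}). add_header_footer_options is a hypothetical helper, not the library's code.

```python
import os
import tempfile
from typing import Any, Dict, List, Optional, Tuple


def add_header_footer_options(
    options: Optional[Dict[str, Any]],
    header_html: Optional[str] = None,
    footer_html: Optional[str] = None,
    file_encoding: str = "utf-8",
) -> Tuple[Dict[str, Any], List[str]]:
    """Write header/footer HTML to temporary files and point options at them.

    Returns the new options dict and the temporary files created, which the
    caller should delete once the PDF has been generated.
    """
    opts = dict(options or {})  # copy, so the caller's dict is not modified
    created: List[str] = []
    for key, content in (("header-html", header_html),
                         ("footer-html", footer_html)):
        if content is None:
            continue
        # Give the file an .html suffix so it is treated as an HTML document.
        fd, path = tempfile.mkstemp(suffix=".html")
        with os.fdopen(fd, "w", encoding=file_encoding) as f:
            f.write(content)
        opts[key] = path
        created.append(path)
    return opts, created
```

The returned dictionary would then be handed to pdfkit as its options, with the temporary files removed afterwards.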
- cardinal_pythonlib.pdf.make_pdf_on_disk_from_html(html: str, output_path: str, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, file_encoding: str = 'utf-8', debug_options: bool = False, debug_content: bool = False, debug_wkhtmltopdf_args: bool = True, fix_pdfkit_encoding_bug: bool | None = None, processor: str = 'pdfkit') → bool
Takes HTML and writes a PDF to the file specified by output_path.
See the arguments to make_pdf_from_html() (except on_disk).
- Returns:
success?
- cardinal_pythonlib.pdf.pdf_from_html(html: str, header_html: str | None = None, footer_html: str | None = None, wkhtmltopdf_filename: str | None = None, wkhtmltopdf_options: Dict[str, Any] | None = None, file_encoding: str = 'utf-8', debug_options: bool = False, debug_content: bool = False, fix_pdfkit_encoding_bug: bool = True, processor: str = 'pdfkit') → bytes
Older function name for get_pdf_from_html() (q.v.).
- cardinal_pythonlib.pdf.pdf_from_writer(writer: PdfWriter) → bytes
Extracts a PDF (as binary data) from a pypdf writer object.
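Extracting bytes from a writer needs only the writer's write() method, which in pypdf's PdfWriter accepts a binary stream. Assuming pdf_from_writer() follows that pattern, here is a duck-typed sketch that runs without pypdf (writer_to_bytes is a hypothetical name):

```python
import io
from typing import Any


def writer_to_bytes(writer: Any) -> bytes:
    """Serialize any writer exposing write(stream) into in-memory bytes."""
    buffer = io.BytesIO()
    writer.write(buffer)  # the writer emits the whole PDF into the buffer
    return buffer.getvalue()
```

Passing a pypdf PdfWriter should work the same way, since the writer's own write() method produces the PDF bytes.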