cardinal_pythonlib.sqlalchemy.merge_db

Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).

This file is part of cardinal_pythonlib.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Function “merge_db” to merge two databases via SQLAlchemy.

Notes:

Note in passing: there is no common base class for SQLAlchemy ORM instances (it’s not DeclarativeMeta). For example, in CamCOPS:

> Phq9.__bases__
(<class 'camcops_server.cc_modules.cc_task.TaskHasPatientMixin'>,
 <class 'camcops_server.cc_modules.cc_task.Task'>,
 <class 'sqlalchemy.ext.declarative.api.Base'>)

… and that last Base isn’t a permanent class, just a newly named thing; see sqlalchemy.ext.declarative.api.declarative_base().

Again, with the CamCOPS classes:

> issubclass(Phq9, Base)
True

> issubclass(Base, DeclarativeMeta)
False

> Base.__bases__
(<class 'object'>,)

So the best type hints we have are:

class: Type
instance: object

class cardinal_pythonlib.sqlalchemy.merge_db.TableDependency(parent_table_id: TableIdentity | None = None, child_table_id: TableIdentity | None = None, parent_table: Table | None = None, child_table: Table | None = None, parent_tablename: str | None = None, child_tablename: str | None = None, metadata: MetaData | None = None)[source]

Stores a table dependency for use in functions such as sqlalchemy.schema.sort_tables(), which requires a tuple of two Table objects, in the order (parent, child), where child depends on parent (e.g. a field like child.parent_id refers to parent.id).

The parent and child tables can be specified by name, Table object, or our TableIdentity descriptor class.

property child_table: Table: Returns the child table as a Table.

property child_tablename: str: Returns the child table’s string name.

property parent_table: Table: Returns the parent table as a Table.

property parent_tablename: str: Returns the parent table’s string name.

set_metadata(metadata: MetaData) → None[source]: Sets the metadata for the parent and child tables.

set_metadata_if_none(metadata: MetaData) → None[source]: Sets the metadata for the parent and child tables, unless they were set already.

sqla_tuple() → Tuple[Table, Table][source]: Returns the tuple (parent_table, child_table), both as Table objects.

class cardinal_pythonlib.sqlalchemy.merge_db.TableDependencyClassification(table: Table, children: List[Table] | None = None, parents: List[Table] | None = None)[source]

Class to describe/classify a table in terms of its dependencies.

Parameters:

table¶ – the table in question
children¶ – its children (things that depend on it)
parents¶ – its parents (things that it depends on)

property child_names: List[str]: Returns the names of this table’s children.

property circular_description: str: Description of the circular chain.

property description: str: Short description.

property is_child: bool: Is this table a child?

property is_parent: bool: Is this table a parent?

property parent_names: List[str]: Returns the names of this table’s parents.

set_circular(circular: bool, chain: List[Table] | None = None) → None[source]

Mark this table as circular (or not).

Parameters:

circular¶ – is it circular?
chain¶ – if it’s circular, this should be the list of tables participating in the circular chain

property standalone: bool: Is this table standalone (neither a child nor a parent)?

property tablename: str: Returns the table’s name.

class cardinal_pythonlib.sqlalchemy.merge_db.TranslationContext(oldobj: object, newobj: object, objmap: Dict[object, object], table: Table, tablename: str, src_session: Session, dst_session: Session, src_engine: Engine, dst_engine: Engine, src_table_names: List[str], missing_src_columns: List[str] | None = None, info: Dict[str, Any] | None = None)[source]

Information-passing object for user callbacks from merge_db().

Parameters:

oldobj¶ – The old SQLAlchemy ORM object from the source session.
newobj¶ –
The framework’s go at building a new SQLAlchemy ORM object, which will be inserted into the destination session.

The sequence is:
1. newobj is created
2. a TranslationContext is created, referring to newobj
3. The translate_fn parameter to merge_db() will be called with the TranslationContext as its parameter
  - the user-suppled translate_fn() function can, at this point, modify the newobj attribute
  - if the user function sets the newobj attribute to None, this object will be skipped
4. If the TranslationContext’s newobj member is not None, the new object is inserted into the destination session.
objmap¶ – A dictionary mapping old to new objects, for objects in tables other than standalone tables.
table¶ – SQLAlchemy Table object from the metadata. (Not necessarily bound to any session, but will reflect the structure of the destination, not necessarily the source, since the merge operation assumes that the metadata describes the destination.)
tablename¶ – Table name that corresponds to table.
src_session¶ – The SQLAlchemy Session object for the source.
dst_session¶ – The SQLAlchemy Session object for the destination.
src_engine¶ – The SQLAlchemy Engine object for the source.
dst_engine¶ – The SQLAlchemy Engine object for the destination.
missing_src_columns¶ – Names of columns known to be present in the destination but absent from the source.
info¶ – Extra dictionary for additional user-specified information.

It is possible that oldobj and newobj are the SAME OBJECT.

cardinal_pythonlib.sqlalchemy.merge_db.classify_tables_by_dependency_type(metadata: MetaData, extra_dependencies: List[TableDependency] | None = None, sort: bool = True) → List[TableDependencyClassification][source]

Inspects a metadata object (optionally adding other specified dependencies) and returns a list of objects describing their dependencies.

Parameters:

metadata¶ – the MetaData to inspect
extra_dependencies¶ – additional dependencies
sort¶ – sort the results by table name?

Returns:

list of TableDependencyClassification objects, one for each table

cardinal_pythonlib.sqlalchemy.merge_db.get_all_dependencies(metadata: MetaData, extra_dependencies: List[TableDependency] | None = None, sort: bool = True) → List[TableDependency][source]

Describes how the tables found in the metadata depend on each other. (If table B contains a foreign key to table A, for example, then B depends on A.)

Parameters:

metadata¶ – the metadata to inspect
extra_dependencies¶ – additional table dependencies to specify manually
sort¶ – sort into alphabetical order of (parent, child) table names?

Returns:

a list of TableDependency objects

See sort_tables_and_constraints() for method.

cardinal_pythonlib.sqlalchemy.merge_db.merge_db(base_class: Type, src_engine: Engine, dst_session: Session, allow_missing_src_tables: bool = True, allow_missing_src_columns: bool = True, translate_fn: Callable[[TranslationContext], None] | None = None, skip_tables: List[TableIdentity] | None = None, only_tables: List[TableIdentity] | None = None, tables_to_keep_pks_for: List[TableIdentity] | None = None, extra_table_dependencies: List[TableDependency] | None = None, dummy_run: bool = False, info_only: bool = False, report_every: int = 1000, flush_per_table: bool = True, flush_per_record: bool = False, commit_with_flush: bool = False, commit_at_end: bool = True, prevent_eager_load: bool = True, trcon_info: Dict[str, Any] | None = None) → None[source]

Copies an entire database as far as it is described by metadata and base_class, from SQLAlchemy ORM session src_session to dst_session, and in the process:

creates new primary keys at the destination, or raises an error if it doesn’t know how (typically something like: Field 'name' doesn't have a default value)
maintains relationships, or raises an error if it doesn’t know how

Basic method:

Examines the metadata for the SQLAlchemy ORM base class you provide.
Assumes that the tables exist (in the destination).
For each table/ORM class found in the metadata:
- Queries (via the ORM) from the source.
- For each ORM instance retrieved:
  - Writes information to the destination SQLAlchemy session.
  - If that ORM object has relationships, process them too.

If a table is missing in the source, then that’s OK if and only if allow_missing_src_tables is set. (Similarly with columns and allow_missing_src_columns; we ask the ORM to perform a partial load, of a subset of attributes only.)

Parameters:

base_class¶ – your ORM base class, e.g. from Base = declarative_base()
src_engine¶ – SQLALchemy Engine for the source database
dst_session¶ – SQLAlchemy Session for the destination database
allow_missing_src_tables¶ – proceed if tables are missing from the source (allowing you to import from older, incomplete databases)
allow_missing_src_columns¶ – proceed if columns are missing from the source (allowing you to import from older, incomplete databases)

translate_fn¶ –

optional function called with each instance, so you can modify instances in the pipeline. Signature:

def my_translate_fn(trcon: TranslationContext) -> None:
    # We can modify trcon.newobj, or replace it (including
    # setting trcon.newobj = None to omit this object).
    pass

skip_tables¶ – tables to skip (specified as a list of TableIdentity)
only_tables¶ – tables to restrict the processor to (specified as a list of TableIdentity)
tables_to_keep_pks_for¶ – tables for which PKs are guaranteed to be safe to insert into the destination database, without modification (specified as a list of TableIdentity)
extra_table_dependencies¶ – optional list of TableDependency objects (q.v.)
dummy_run¶ – don’t alter the destination database
info_only¶ – show info, then stop
report_every¶ – provide a progress report every n records
flush_per_table¶ – flush the session after every table (reasonable)
flush_per_record¶ – flush the session after every instance (AVOID this if tables may refer to themselves)
commit_with_flush¶ – COMMIT with each flush?
commit_at_end¶ – COMMIT when finished?
prevent_eager_load¶ – disable any eager loading (use lazy loading instead)
trcon_info¶ – additional dictionary passed to TranslationContext.info (see TranslationContext)