cardinal_pythonlib.athena_ohdsi


Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).

This file is part of cardinal_pythonlib.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Functions to assist with the Athena OHDSI vocabularies.

See https://athena.ohdsi.org/.

class cardinal_pythonlib.athena_ohdsi.AthenaConceptRelationshipRow(concept_id_1: str, concept_id_2: str, relationship_id: str, valid_start_date: str, valid_end_date: str, invalid_reason: str)[source]

Simple information-holding class for CONCEPT_RELATIONSHIP.csv file from https://athena.ohdsi.org/ vocabulary download.

Argument order is important.

Parameters:
  • concept_id_1 – Athena concept ID #1

  • concept_id_2 – Athena concept ID #2

  • relationship_id – e.g. “Is a”, “Has legal category”

  • valid_start_date – date in YYYYMMDD format

  • valid_end_date – date in YYYYMMDD format

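  • invalid_reason – reason the row is no longer valid; in the OMOP vocabulary tables this is “D” (deleted) or “U” (replaced with an update), and blank while the row is valid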
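Argument order matters because a row can be built positionally from one split line of the tab-separated file. The following is a minimal sketch of that idea only: RelationshipRow is a hypothetical stand-in with the same field order (it is not the library class), and the sample line is made up for illustration.

```python
import csv
import io
from typing import NamedTuple


class RelationshipRow(NamedTuple):
    """Hypothetical stand-in; fields are in the same order as the class above."""
    concept_id_1: str
    concept_id_2: str
    relationship_id: str
    valid_start_date: str
    valid_end_date: str
    invalid_reason: str


# Made-up sample: a header line plus one data line, tab-separated.
sample = (
    "concept_id_1\tconcept_id_2\trelationship_id\t"
    "valid_start_date\tvalid_end_date\tinvalid_reason\n"
    "1001\t2002\tIs a\t19700101\t20991231\t\n"
)

reader = csv.reader(io.StringIO(sample), delimiter="\t")
next(reader)  # skip the header line
# Each list of fields is unpacked positionally, so column order must match.
rows = [RelationshipRow(*fields) for fields in reader]
```

Every field stays a string, which matches the str annotations in the signature above.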

class cardinal_pythonlib.athena_ohdsi.AthenaConceptRow(concept_id: str, concept_name: str, domain_id: str, vocabulary_id: str, concept_class_id: str, standard_concept: str, concept_code: str, valid_start_date: str, valid_end_date: str, invalid_reason: str)[source]

Simple information-holding class for CONCEPT.csv file from https://athena.ohdsi.org/ vocabulary download.

Argument order is important.

Parameters:
  • concept_id – Athena concept ID

  • concept_name – Concept name in the originating system

  • domain_id – e.g. “Observation”, “Condition”

  • vocabulary_id – e.g. “SNOMED”, “ICD10CM”

  • concept_class_id – e.g. “Substance”, “3-char nonbill code”

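  • standard_concept – standard-concept flag; in the OMOP vocabulary tables this is “S” (standard concept), “C” (classification concept), or blank (non-standard)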

  • concept_code – concept code in the vocabulary (e.g. SNOMED-CT concept code like “3578611000001105” if vocabulary_id is “SNOMED”; ICD-10 code like “F32.2” if vocabulary_id is “ICD10CM”; etc.)

  • valid_start_date – date in YYYYMMDD format

  • valid_end_date – date in YYYYMMDD format

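  • invalid_reason – reason the concept is no longer valid; in the OMOP vocabulary tables this is “D” (deleted) or “U” (replaced with an update), and blank while the concept is valid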
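The date fields are plain strings in YYYYMMDD format, so they need parsing before any date arithmetic. A minimal sketch follows; parse_yyyymmdd and is_valid_on are hypothetical helper names, not part of the library, and treating the validity range as inclusive at both ends is an assumption.

```python
from datetime import date, datetime


def parse_yyyymmdd(text: str) -> date:
    """Convert a string such as '20991231' into a datetime.date."""
    return datetime.strptime(text, "%Y%m%d").date()


def is_valid_on(valid_start_date: str, valid_end_date: str, when: date) -> bool:
    """
    Check whether 'when' lies within the validity range.
    Assumption: both the start and the end date are inclusive.
    """
    return parse_yyyymmdd(valid_start_date) <= when <= parse_yyyymmdd(valid_end_date)
```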

snomed_concept() → SnomedConcept[source]

Assuming this Athena concept reflects a SnomedConcept, returns it.

(Fails an assertion if it isn’t.)

class cardinal_pythonlib.athena_ohdsi.AthenaRelationshipId[source]

Constant-holding class for Athena relationship IDs that we care about. To show all (there are lots!):

awk 'BEGIN {FS="\t"}; {print $3}' CONCEPT_RELATIONSHIP.csv | sort -u

class cardinal_pythonlib.athena_ohdsi.AthenaVocabularyId[source]

Constant-holding class for Athena vocabulary IDs that we care about.
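The awk one-liner shown for relationship IDs has a direct Python analogue that works for vocabulary IDs too. The sketch below assumes the download has a header line naming its columns (for example vocabulary_id in CONCEPT.csv); distinct_column_values is a hypothetical helper, not a library function.

```python
import csv
from typing import List


def distinct_column_values(tsv_filename: str, column: str,
                           encoding: str = "utf-8") -> List[str]:
    """
    Return the sorted distinct values of one named column in a
    tab-separated file. Assumes the first line is a header row.
    """
    with open(tsv_filename, newline="", encoding=encoding) as f:
        # QUOTE_NONE: treat quote characters in names as ordinary text.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return sorted({row[column] for row in reader})
```

For example, distinct_column_values("CONCEPT.csv", "vocabulary_id") would list the vocabularies present in a download, and the same call with "relationship_id" on CONCEPT_RELATIONSHIP.csv mirrors the awk command.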

cardinal_pythonlib.athena_ohdsi.get_athena_concept_relationships(tsv_filename: str = '', cached_concept_relationships: Iterable[AthenaConceptRelationshipRow] | None = None, concept_id_1_values: Collection[int] | None = None, concept_id_2_values: Collection[int] | None = None, relationship_id_values: Collection[str] | None = None, not_concept_id_1_values: Collection[int] | None = None, not_concept_id_2_values: Collection[int] | None = None, not_relationship_id_values: Collection[str] | None = None, encoding: str = 'utf-8') → List[AthenaConceptRelationshipRow][source]

From the Athena CONCEPT_RELATIONSHIP.csv tab-separated value file, return a list of relationships matching the restriction criteria.

Parameters:
  • tsv_filename – filename

  • cached_concept_relationships – already-loaded AthenaConceptRelationshipRow objects to filter, as an alternative to reading tsv_filename

  • concept_id_1_values – permissible concept_id_1 values, or None or an empty list for all

  • concept_id_2_values – permissible concept_id_2 values, or None or an empty list for all

  • relationship_id_values – permissible relationship_id values, or None or an empty list for all

  • not_concept_id_1_values – impermissible concept_id_1 values, or None or an empty list for none

  • not_concept_id_2_values – impermissible concept_id_2 values, or None or an empty list for none

  • not_relationship_id_values – impermissible relationship_id values, or None or an empty list for none

  • encoding – encoding for input files

Returns:

list of AthenaConceptRelationshipRow objects matching the criteria

Return type:

list
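The permissible and impermissible parameters follow one convention: None or an empty collection means "no restriction". A minimal sketch of that convention for a single field is shown here. passes_filters is a hypothetical name, this is not the library's implementation, and combining the two checks with a logical AND is an assumption.

```python
from typing import Collection, Optional


def passes_filters(value,
                   allowed: Optional[Collection] = None,
                   disallowed: Optional[Collection] = None) -> bool:
    """
    Apply one permissible/impermissible pair to one value.
    None or an empty collection imposes no restriction.
    """
    if allowed and value not in allowed:
        return False  # a whitelist is given and the value is not on it
    if disallowed and value in disallowed:
        return False  # a blacklist is given and the value is on it
    return True
```

Under this sketch, a row would be kept only if every field passes its own pair of checks.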

cardinal_pythonlib.athena_ohdsi.get_athena_concepts(tsv_filename: str = '', cached_concepts: Iterable[AthenaConceptRow] | None = None, vocabulary_ids: Collection[str] | None = None, concept_codes: Collection[str] | None = None, concept_ids: Collection[int] | None = None, not_vocabulary_ids: Collection[str] | None = None, not_concept_codes: Collection[str] | None = None, not_concept_ids: Collection[int] | None = None, encoding: str = 'utf-8') → List[AthenaConceptRow][source]

From the Athena CONCEPT.csv tab-separated value file, return a list of concepts matching the restriction criteria.

Parameters:
  • tsv_filename – filename

  • cached_concepts – already-loaded AthenaConceptRow objects to filter, as an alternative to reading tsv_filename

  • vocabulary_ids – permissible vocabulary_id values, or None or an empty list for all

  • concept_codes – permissible concept_code values, or None or an empty list for all

  • concept_ids – permissible concept_id values, or None or an empty list for all

  • not_vocabulary_ids – impermissible vocabulary_id values, or None or an empty list for none

  • not_concept_codes – impermissible concept_code values, or None or an empty list for none

  • not_concept_ids – impermissible concept_id values, or None or an empty list for none

  • encoding – encoding for input files

Returns:

list of AthenaConceptRow objects matching the criteria

Return type:

list

Test and timing code:

import logging
import timeit
logging.basicConfig(level=logging.DEBUG)

from cardinal_pythonlib.athena_ohdsi import (
    get_athena_concepts,
    get_athena_concept_relationships,
)

concept_filename = "CONCEPT.csv"
cr_filename = "CONCEPT_RELATIONSHIP.csv"
testcode = "175898006"
testid = 46067884

concept_testcode = '''
get_athena_concepts(concept_filename, concept_codes=[testcode])
'''
cr_testcode = '''
get_athena_concept_relationships(cr_filename, concept_id_1_values=[testid])
'''

timeit.timeit(cr_testcode, number=1, globals=globals())
# Initial method: 33.6 s (for 9.9m rows on a Windows laptop).
# Chain of generators: 21.5 s. Better.

timeit.timeit(concept_testcode, number=1, globals=globals())
# After speedup: 3.9 s for 1.1m rows.
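The timing comments above mention a chain of generators. As a rough illustration of that general technique (not the library's actual code), the sketch below reads a tab-separated file lazily and stacks one generator per active filter, so only matching rows are ever collected. The function names are hypothetical, the column positions follow the AthenaConceptRow argument order, and a header line is assumed.

```python
import csv
from typing import Collection, Iterable, Iterator, List, Optional

VOCABULARY_ID_COL = 3  # 4th field in the AthenaConceptRow argument order
CONCEPT_CODE_COL = 6   # 7th field in the AthenaConceptRow argument order


def read_rows(tsv_filename: str, encoding: str = "utf-8") -> Iterator[List[str]]:
    """Lazily yield each data line as a list of strings."""
    with open(tsv_filename, newline="", encoding=encoding) as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader, None)  # skip the header line (assumed to be present)
        yield from reader


def keep_if_in(rows: Iterable[List[str]], index: int,
               allowed: Optional[Collection[str]]) -> Iterable[List[str]]:
    """Add a filtering stage only if a restriction was actually given."""
    if not allowed:
        return rows  # None or empty: no extra stage, no per-row cost
    allowed_set = set(allowed)  # set membership tests are fast
    return (row for row in rows if row[index] in allowed_set)


def matching_concept_rows(tsv_filename: str,
                          vocabulary_ids: Optional[Collection[str]] = None,
                          concept_codes: Optional[Collection[str]] = None
                          ) -> List[List[str]]:
    """Chain the stages, then consume them in a single pass over the file."""
    rows: Iterable[List[str]] = read_rows(tsv_filename)
    rows = keep_if_in(rows, VOCABULARY_ID_COL, vocabulary_ids)
    rows = keep_if_in(rows, CONCEPT_CODE_COL, concept_codes)
    return list(rows)
```

Skipping the stages whose restriction is empty is one plausible reason such a chain can beat a single loop that tests every condition on every row.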