cardinal_pythonlib.athena_ohdsi
Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).
This file is part of cardinal_pythonlib.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Functions to assist with the Athena OHDSI vocabularies.
See https://athena.ohdsi.org/.
- class cardinal_pythonlib.athena_ohdsi.AthenaConceptRelationshipRow(concept_id_1: str, concept_id_2: str, relationship_id: str, valid_start_date: str, valid_end_date: str, invalid_reason: str)
Simple information-holding class for the CONCEPT_RELATIONSHIP.csv file from the https://athena.ohdsi.org/ vocabulary download. Argument order is important.
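Because argument order matters, a row can be built by splitting a tab-separated line and unpacking the fields positionally. The following is a minimal sketch of that idea. It uses a stand-in named tuple, `RelationshipRow` (a hypothetical name, not the library class), with the same field order as the signature above, and an invented sample line.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the documented field order of
# AthenaConceptRelationshipRow; it is not the library class itself.
RelationshipRow = namedtuple(
    "RelationshipRow",
    [
        "concept_id_1",
        "concept_id_2",
        "relationship_id",
        "valid_start_date",
        "valid_end_date",
        "invalid_reason",
    ],
)

# Illustrative tab-separated line (values invented for this example).
line = "1001\t2002\tMaps to\t19700101\t20991231\t"

# Positional unpacking is only correct because the column order of the
# line matches the constructor's argument order.
row = RelationshipRow(*line.split("\t"))
print(row.relationship_id)
```

If the columns were in a different order, the same call would silently put values in the wrong fields, which is presumably why the order is flagged as important.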
- class cardinal_pythonlib.athena_ohdsi.AthenaConceptRow(concept_id: str, concept_name: str, domain_id: str, vocabulary_id: str, concept_class_id: str, standard_concept: str, concept_code: str, valid_start_date: str, valid_end_date: str, invalid_reason: str)
Simple information-holding class for the CONCEPT.csv file from the https://athena.ohdsi.org/ vocabulary download. Argument order is important.
- Parameters:
concept_id – Athena concept ID
concept_name – Concept name in the originating system
domain_id – e.g. “Observation”, “Condition”
vocabulary_id – e.g. “SNOMED”, “ICD10CM”
concept_class_id – e.g. “Substance”, “3-char nonbill code”
standard_concept – ?; e.g. “S”
concept_code – concept code in the vocabulary (e.g. SNOMED-CT concept code like “3578611000001105” if vocabulary_id is “SNOMED”; ICD-10 code like “F32.2” if vocabulary_id is “ICD10CM”; etc.)
valid_start_date – date in YYYYMMDD format
valid_end_date – date in YYYYMMDD format
invalid_reason – ? (but one can guess)
- snomed_concept() → SnomedConcept
Assuming this Athena concept reflects a SnomedConcept, returns it. (Asserts if it isn’t.)
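The valid_start_date and valid_end_date fields are held as strings in YYYYMMDD format. As a small sketch, they could be converted to date objects with the standard library as shown below. The helper name `parse_yyyymmdd` and the sample values are hypothetical and not part of this module.

```python
from datetime import date, datetime


def parse_yyyymmdd(text: str) -> date:
    """Hypothetical helper: convert a YYYYMMDD string to a date."""
    return datetime.strptime(text, "%Y%m%d").date()


# Invented sample values in the documented format.
valid_start = parse_yyyymmdd("20020131")
valid_end = parse_yyyymmdd("20991231")

# Date objects compare chronologically, so a validity check is simple.
is_valid_on_day = valid_start <= date(2020, 6, 1) <= valid_end
print(valid_start, valid_end, is_valid_on_day)
```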
- class cardinal_pythonlib.athena_ohdsi.AthenaRelationshipId
Constant-holding class for Athena relationship IDs that we care about. To show all (there are lots!):
awk 'BEGIN {FS="\t"}; {print $3}' CONCEPT_RELATIONSHIP.csv | sort -u
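For readers without awk, roughly the same listing can be sketched in Python using only the standard library. The in-memory sample below stands in for CONCEPT_RELATIONSHIP.csv and its rows are invented for illustration. Unlike the awk command, this sketch skips the first line on the assumption that it is a header row.

```python
import csv
import io

# Illustrative stand-in for CONCEPT_RELATIONSHIP.csv (invented rows).
sample = io.StringIO(
    "concept_id_1\tconcept_id_2\trelationship_id\t"
    "valid_start_date\tvalid_end_date\tinvalid_reason\n"
    "1001\t2002\tMaps to\t19700101\t20991231\t\n"
    "2002\t1001\tMapped from\t19700101\t20991231\t\n"
    "1001\t3003\tIs a\t19700101\t20991231\t\n"
    "1002\t2002\tMaps to\t19700101\t20991231\t\n"
)

reader = csv.reader(sample, delimiter="\t")
next(reader)  # skip the header row (assumed present)

# Third column (index 2) is relationship_id; a set removes duplicates,
# and sorted() gives the equivalent of "sort -u".
relationship_ids = sorted({row[2] for row in reader})
print(relationship_ids)  # ['Is a', 'Mapped from', 'Maps to']
```

For a real download, `sample` would be replaced by an opened file, e.g. `open("CONCEPT_RELATIONSHIP.csv", encoding="utf-8", newline="")`.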
- class cardinal_pythonlib.athena_ohdsi.AthenaVocabularyId
Constant-holding class for Athena vocabulary IDs that we care about.
- cardinal_pythonlib.athena_ohdsi.get_athena_concept_relationships(tsv_filename: str = '', cached_concept_relationships: Iterable[AthenaConceptRelationshipRow] | None = None, concept_id_1_values: Collection[int] | None = None, concept_id_2_values: Collection[int] | None = None, relationship_id_values: Collection[str] | None = None, not_concept_id_1_values: Collection[int] | None = None, not_concept_id_2_values: Collection[int] | None = None, not_relationship_id_values: Collection[str] | None = None, encoding: str = 'utf-8') → List[AthenaConceptRelationshipRow]
From the Athena CONCEPT_RELATIONSHIP.csv tab-separated value file, return a list of relationships matching the restriction criteria.
- Parameters:
tsv_filename – filename
cached_concept_relationships – alternative to tsv_filename
concept_id_1_values – permissible concept_id_1 values, or None or an empty list for all
concept_id_2_values – permissible concept_id_2 values, or None or an empty list for all
relationship_id_values – permissible relationship_id values, or None or an empty list for all
not_concept_id_1_values – impermissible concept_id_1 values, or None or an empty list for none
not_concept_id_2_values – impermissible concept_id_2 values, or None or an empty list for none
not_relationship_id_values – impermissible relationship_id values, or None or an empty list for none
encoding – encoding for input files
- Returns:
list of AthenaConceptRelationshipRow objects
- Return type:
list
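The permissible/impermissible semantics described above (None or an empty collection means "no restriction") can be illustrated with a small stand-alone predicate. This is a sketch of the documented behaviour only, not the library's implementation; the helper name `passes` is hypothetical. It uses relationship_id strings, since the source does not show how integer ID filters are compared against the string fields of the row class.

```python
from typing import Collection, Optional


def passes(
    value: str,
    permitted: Optional[Collection[str]] = None,
    forbidden: Optional[Collection[str]] = None,
) -> bool:
    """Hypothetical helper illustrating the documented filter semantics."""
    # A non-empty "permissible" collection restricts results to its members;
    # None or an empty collection permits everything.
    if permitted and value not in permitted:
        return False
    # A non-empty "impermissible" collection excludes its members;
    # None or an empty collection excludes nothing.
    if forbidden and value in forbidden:
        return False
    return True


relationship_ids = ["Maps to", "Is a", "Subsumes"]

only_maps_to = [r for r in relationship_ids if passes(r, permitted=["Maps to"])]
all_but_is_a = [r for r in relationship_ids if passes(r, forbidden=["Is a"])]
unfiltered = [r for r in relationship_ids if passes(r, permitted=[], forbidden=None)]

print(only_maps_to, all_but_is_a, unfiltered)
```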
- cardinal_pythonlib.athena_ohdsi.get_athena_concepts(tsv_filename: str = '', cached_concepts: Iterable[AthenaConceptRow] | None = None, vocabulary_ids: Collection[str] | None = None, concept_codes: Collection[str] | None = None, concept_ids: Collection[int] | None = None, not_vocabulary_ids: Collection[str] | None = None, not_concept_codes: Collection[str] | None = None, not_concept_ids: Collection[int] | None = None, encoding: str = 'utf-8') → List[AthenaConceptRow]
From the Athena CONCEPT.csv tab-separated value file, return a list of concepts matching the restriction criteria.
- Parameters:
tsv_filename – filename
cached_concepts – alternative to tsv_filename
vocabulary_ids – permissible vocabulary_id values, or None or an empty list for all
concept_codes – permissible concept_code values, or None or an empty list for all
concept_ids – permissible concept_id values, or None or an empty list for all
not_vocabulary_ids – impermissible vocabulary_id values, or None or an empty list for none
not_concept_codes – impermissible concept_code values, or None or an empty list for none
not_concept_ids – impermissible concept_id values, or None or an empty list for none
encoding – encoding for input files
- Returns:
list of AthenaConceptRow objects
- Return type:
list
Test and timing code:
import logging
import timeit
logging.basicConfig(level=logging.DEBUG)

from cardinal_pythonlib.athena_ohdsi import (
    get_athena_concepts,
    get_athena_concept_relationships,
)

concept_filename = "CONCEPT.csv"
cr_filename = "CONCEPT_RELATIONSHIP.csv"
testcode = "175898006"
testid = 46067884

concept_testcode = '''
get_athena_concepts(concept_filename, concept_codes=[testcode])
'''
cr_testcode = '''
get_athena_concept_relationships(cr_filename, concept_id_1_values=[testid])
'''

timeit.timeit(cr_testcode, number=1, globals=globals())
# Initial method: 33.6 s (for 9.9m rows on a Windows laptop).
# Chain of generators: 21.5 s. Better.

timeit.timeit(concept_testcode, number=1, globals=globals())
# After speedup: 3.9 s for 1.1m rows.
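The timing notes mention a "chain of generators" without showing it. As a rough illustration of that general technique (not the library's actual code), the sketch below streams rows lazily through successive filter stages so that no intermediate list is built. The function names, column constants, and sample rows are all assumptions made for this example; the column positions follow the field order documented for AthenaConceptRow, and a header row is assumed.

```python
import csv
import io
from typing import Collection, Iterable, Iterator, List

# Column positions, following the documented AthenaConceptRow field order.
VOCABULARY_ID_COL = 3
CONCEPT_CODE_COL = 6


def gen_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each data row as a list of fields, skipping the header."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # header row (assumed present)
    yield from reader


def gen_matching(
    rows: Iterable[List[str]], column: int, permitted: Collection[str]
) -> Iterator[List[str]]:
    """Lazily yield rows whose value in `column` is in `permitted`."""
    permitted_set = set(permitted)  # set membership tests are fast
    return (row for row in rows if row[column] in permitted_set)


# Invented sample standing in for CONCEPT.csv.
sample = io.StringIO(
    "concept_id\tconcept_name\tdomain_id\tvocabulary_id\tconcept_class_id\t"
    "standard_concept\tconcept_code\tvalid_start_date\tvalid_end_date\t"
    "invalid_reason\n"
    "1\tExample A\tCondition\tSNOMED\tClinical Finding\tS\tA1\t19700101\t20991231\t\n"
    "2\tExample B\tCondition\tICD10CM\t3-char nonbill code\t\tB2\t19700101\t20991231\t\n"
    "3\tExample C\tObservation\tSNOMED\tSubstance\tS\tC3\t19700101\t20991231\t\n"
)

# Each stage wraps the previous one; nothing is read until list() pulls rows.
rows = gen_rows(sample)
snomed_rows = gen_matching(rows, VOCABULARY_ID_COL, ["SNOMED"])
wanted_rows = gen_matching(snomed_rows, CONCEPT_CODE_COL, ["C3"])
result = list(wanted_rows)
print(result)
```

Each row passes through every stage once and is discarded as soon as a stage rejects it, which keeps memory use flat even for files with millions of rows.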