cardinal_pythonlib.athena_ohdsi
Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).
This file is part of cardinal_pythonlib.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Functions to assist with the Athena OHDSI vocabularies.
See https://athena.ohdsi.org/.
class cardinal_pythonlib.athena_ohdsi.AthenaConceptRelationshipRow(concept_id_1: str, concept_id_2: str, relationship_id: str, valid_start_date: str, valid_end_date: str, invalid_reason: str)
Simple information-holding class for the CONCEPT_RELATIONSHIP.csv file from the https://athena.ohdsi.org/ vocabulary download. Argument order is important.
Parameters:
- concept_id_1 – Athena concept ID #1
- concept_id_2 – Athena concept ID #2
- relationship_id – e.g. “Is a”, “Has legal category”
- valid_start_date – date in YYYYMMDD format
- valid_end_date – date in YYYYMMDD format
- invalid_reason – reason the relationship was invalidated, if any (by OMOP CDM convention, e.g. “D” for deleted; empty if still valid)
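The constructor takes its fields positionally, so a row of the tab-separated file can be unpacked directly into it. This is presumably why argument order matters. The following is a minimal sketch of that idea. ConceptRelationshipRow is a hypothetical NamedTuple stand-in, not the library's class, and the two-line extract uses made-up IDs:

```python
import csv
import io
from typing import NamedTuple


# Hypothetical stand-in mirroring the documented constructor signature of
# AthenaConceptRelationshipRow; the real class may differ internally.
class ConceptRelationshipRow(NamedTuple):
    concept_id_1: str
    concept_id_2: str
    relationship_id: str
    valid_start_date: str
    valid_end_date: str
    invalid_reason: str


# Toy extract in CONCEPT_RELATIONSHIP.csv column order (made-up IDs).
# The real file is tab-separated despite its .csv extension.
TOY_TSV = (
    "concept_id_1\tconcept_id_2\trelationship_id\t"
    "valid_start_date\tvalid_end_date\tinvalid_reason\n"
    "101\t202\tIs a\t19700101\t20991231\t\n"
)

reader = csv.reader(io.StringIO(TOY_TSV), delimiter="\t",
                    quoting=csv.QUOTE_NONE)
next(reader)  # skip the header row
# Positional unpacking only works because the field order matches
# the file's column order.
rows = [ConceptRelationshipRow(*fields) for fields in reader]
print(rows[0].relationship_id)  # Is a
```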
class cardinal_pythonlib.athena_ohdsi.AthenaConceptRow(concept_id: str, concept_name: str, domain_id: str, vocabulary_id: str, concept_class_id: str, standard_concept: str, concept_code: str, valid_start_date: str, valid_end_date: str, invalid_reason: str)
Simple information-holding class for the CONCEPT.csv file from the https://athena.ohdsi.org/ vocabulary download. Argument order is important.
Parameters:
- concept_id – Athena concept ID
- concept_name – Concept name in the originating system
- domain_id – e.g. “Observation”, “Condition”
- vocabulary_id – e.g. “SNOMED”, “ICD10CM”
- concept_class_id – e.g. “Substance”, “3-char nonbill code”
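- standard_concept – standard-concept flag; e.g. “S” (standard) or “C” (classification), empty for non-standard concepts (OMOP CDM convention)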
- concept_code – concept code in the vocabulary (e.g. a SNOMED-CT concept code like “3578611000001105” if vocabulary_id is “SNOMED”; an ICD-10 code like “F32.2” if vocabulary_id is “ICD10CM”; etc.)
- valid_start_date – date in YYYYMMDD format
- valid_end_date – date in YYYYMMDD format
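- invalid_reason – reason the concept was invalidated, if any (by OMOP CDM convention, e.g. “D” for deleted or “U” for replaced by an update; empty if still valid)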
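The valid_start_date and valid_end_date fields are plain YYYYMMDD strings. The following is a minimal sketch that converts them with the standard library and checks whether a row is valid on a given day. parse_athena_date and is_valid_on are hypothetical helpers, and treating both bounds as inclusive is an assumption:

```python
from datetime import date, datetime


def parse_athena_date(yyyymmdd: str) -> date:
    """Convert an Athena YYYYMMDD string such as "20991231" to a date."""
    return datetime.strptime(yyyymmdd, "%Y%m%d").date()


def is_valid_on(valid_start_date: str, valid_end_date: str,
                when: date) -> bool:
    """True if `when` lies within the validity window (assumed inclusive)."""
    return (parse_athena_date(valid_start_date)
            <= when
            <= parse_athena_date(valid_end_date))


print(parse_athena_date("20991231"))  # 2099-12-31
print(is_valid_on("19700101", "20991231", date(2022, 6, 1)))  # True
```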
class cardinal_pythonlib.athena_ohdsi.AthenaRelationshipId
Constant-holding class for Athena relationship IDs that we care about. To show all of them (there are lots!), run:
awk 'BEGIN {FS="\t"}; {print $3}' CONCEPT_RELATIONSHIP.csv | sort -u
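A rough Python equivalent of the awk pipeline is sketched below, using the csv module. unique_relationship_ids is a hypothetical helper. csv.QUOTE_NONE is used on the assumption that the files are unquoted, so any quote characters inside text fields are kept literally rather than parsed:

```python
import csv
import io
from typing import Iterable, List


def unique_relationship_ids(lines: Iterable[str]) -> List[str]:
    """Sorted distinct values of the third tab-separated column
    (relationship_id), skipping the header row."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header row
    return sorted({fields[2] for fields in reader if len(fields) > 2})


# Toy input with made-up IDs. For the real file, pass an open file object,
# e.g. open("CONCEPT_RELATIONSHIP.csv", encoding="utf-8", newline="").
toy = io.StringIO(
    "concept_id_1\tconcept_id_2\trelationship_id\n"
    "1\t2\tIs a\n"
    "3\t4\tMaps to\n"
    "5\t6\tIs a\n"
)
print(unique_relationship_ids(toy))  # ['Is a', 'Maps to']
```

Unlike the awk pipeline, this skips the header line. sorted() also orders by code point, whereas sort may follow locale collation.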
class cardinal_pythonlib.athena_ohdsi.AthenaVocabularyId
Constant-holding class for Athena vocabulary IDs that we care about.
cardinal_pythonlib.athena_ohdsi.get_athena_concept_relationships(tsv_filename: str = '', cached_concept_relationships: Iterable[cardinal_pythonlib.athena_ohdsi.AthenaConceptRelationshipRow] = None, concept_id_1_values: Collection[int] = None, concept_id_2_values: Collection[int] = None, relationship_id_values: Collection[str] = None, not_concept_id_1_values: Collection[int] = None, not_concept_id_2_values: Collection[int] = None, not_relationship_id_values: Collection[str] = None, encoding: str = 'utf-8') → List[cardinal_pythonlib.athena_ohdsi.AthenaConceptRelationshipRow]
From the Athena CONCEPT_RELATIONSHIP.csv file (tab-separated, despite its extension), return a list of relationships matching the restriction criteria.
Parameters:
- tsv_filename – filename of the CONCEPT_RELATIONSHIP.csv file
- cached_concept_relationships – previously loaded rows to filter, as an alternative to reading tsv_filename
- concept_id_1_values – permissible concept_id_1 values, or None or an empty list for all
- concept_id_2_values – permissible concept_id_2 values, or None or an empty list for all
- relationship_id_values – permissible relationship_id values, or None or an empty list for all
- not_concept_id_1_values – impermissible concept_id_1 values, or None or an empty list for none
- not_concept_id_2_values – impermissible concept_id_2 values, or None or an empty list for none
- not_relationship_id_values – impermissible relationship_id values, or None or an empty list for none
- encoding – encoding of the input file
Returns: list of AthenaConceptRelationshipRow objects
Return type: list
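The filter semantics above are:
- an empty or None "permissible" collection allows everything;
- an empty or None "impermissible" collection excludes nothing.

The following is a minimal sketch of those semantics as a chain of generators, one stage per restriction. The timing notes mention this approach, but this is not the library's code. Rel and filter_relationships are hypothetical, and only a subset of the parameters is shown:

```python
from typing import Collection, Iterable, Iterator, NamedTuple, Optional


class Rel(NamedTuple):
    """Hypothetical stand-in with the first three relationship fields."""
    concept_id_1: str
    concept_id_2: str
    relationship_id: str


def filter_relationships(
        rows: Iterable[Rel],
        concept_id_1_values: Optional[Collection[int]] = None,
        relationship_id_values: Optional[Collection[str]] = None,
        not_relationship_id_values: Optional[Collection[str]] = None,
) -> Iterator[Rel]:
    """Lazily apply each restriction as a separate generator stage."""
    gen = iter(rows)
    if concept_id_1_values:
        wanted_ids = set(concept_id_1_values)
        # IDs are stored as strings in the rows, but the filters take ints.
        gen = (r for r in gen if int(r.concept_id_1) in wanted_ids)
    if relationship_id_values:
        wanted_rels = set(relationship_id_values)
        gen = (r for r in gen if r.relationship_id in wanted_rels)
    if not_relationship_id_values:
        unwanted_rels = set(not_relationship_id_values)
        gen = (r for r in gen if r.relationship_id not in unwanted_rels)
    return gen


rows = [Rel("1", "2", "Is a"), Rel("1", "3", "Maps to"), Rel("4", "5", "Is a")]
print(list(filter_relationships(rows, concept_id_1_values=[1],
                                not_relationship_id_values=["Maps to"])))
# [Rel(concept_id_1='1', concept_id_2='2', relationship_id='Is a')]
```

Each stage binds its own set under a distinct name, so later stages cannot change what earlier stages test against. Rows are only read once, when the final generator is consumed.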
cardinal_pythonlib.athena_ohdsi.get_athena_concepts(tsv_filename: str = '', cached_concepts: Iterable[cardinal_pythonlib.athena_ohdsi.AthenaConceptRow] = None, vocabulary_ids: Collection[str] = None, concept_codes: Collection[str] = None, concept_ids: Collection[int] = None, not_vocabulary_ids: Collection[str] = None, not_concept_codes: Collection[str] = None, not_concept_ids: Collection[int] = None, encoding: str = 'utf-8') → List[cardinal_pythonlib.athena_ohdsi.AthenaConceptRow]
From the Athena CONCEPT.csv file (tab-separated, despite its extension), return a list of concepts matching the restriction criteria.
Parameters:
- tsv_filename – filename of the CONCEPT.csv file
- cached_concepts – previously loaded rows to filter, as an alternative to reading tsv_filename
- vocabulary_ids – permissible vocabulary_id values, or None or an empty list for all
- concept_codes – permissible concept_code values, or None or an empty list for all
- concept_ids – permissible concept_id values, or None or an empty list for all
- not_vocabulary_ids – impermissible vocabulary_id values, or None or an empty list for none
- not_concept_codes – impermissible concept_code values, or None or an empty list for none
- not_concept_ids – impermissible concept_id values, or None or an empty list for none
- encoding – encoding of the input file
Returns: list of AthenaConceptRow objects
Return type: list
Test and timing code:
import logging
import timeit

logging.basicConfig(level=logging.DEBUG)

from cardinal_pythonlib.athena_ohdsi import (
    get_athena_concepts,
    get_athena_concept_relationships,
)

concept_filename = "CONCEPT.csv"
cr_filename = "CONCEPT_RELATIONSHIP.csv"
testcode = "175898006"
testid = 46067884

concept_testcode = '''
get_athena_concepts(concept_filename, concept_codes=[testcode])
'''
cr_testcode = '''
get_athena_concept_relationships(cr_filename, concept_id_1_values=[testid])
'''

timeit.timeit(cr_testcode, number=1, globals=globals())
# Initial method: 33.6 s (for 9.9m rows on a Windows laptop).
# Chain of generators: 21.5 s. Better.

timeit.timeit(concept_testcode, number=1, globals=globals())
# After speedup: 3.9 s for 1.1m rows.
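The two functions can be combined to translate a code from one vocabulary to another. First look up the source code in CONCEPT.csv, then follow relationships in CONCEPT_RELATIONSHIP.csv. The following is a minimal in-memory sketch of that two-step lookup. It assumes "Maps to" as the relationship, which is a standard OMOP relationship ID for mapping to standard concepts. Concept, Rel and map_code are hypothetical, and all data are toy values:

```python
from typing import Dict, List, NamedTuple


class Concept(NamedTuple):
    """Hypothetical stand-in holding a subset of the CONCEPT.csv fields."""
    concept_id: str
    concept_name: str
    vocabulary_id: str
    concept_code: str


class Rel(NamedTuple):
    """Hypothetical stand-in holding a subset of the relationship fields."""
    concept_id_1: str
    concept_id_2: str
    relationship_id: str


def map_code(code: str, vocabulary_id: str,
             concepts: List[Concept], rels: List[Rel],
             relationship_id: str = "Maps to") -> List[Concept]:
    """Find concepts with `code` in `vocabulary_id`, then follow
    `relationship_id` links from them to the target concepts."""
    by_id: Dict[str, Concept] = {c.concept_id: c for c in concepts}
    source_ids = {c.concept_id for c in concepts
                  if c.vocabulary_id == vocabulary_id
                  and c.concept_code == code}
    return [by_id[r.concept_id_2] for r in rels
            if r.concept_id_1 in source_ids
            and r.relationship_id == relationship_id
            and r.concept_id_2 in by_id]


# Toy, made-up data (IDs and names are not real Athena content).
concepts = [
    Concept("1", "Toy ICD-10 concept", "ICD10CM", "F32.2"),
    Concept("2", "Toy SNOMED concept", "SNOMED", "toy-code"),
]
rels = [Rel("1", "2", "Maps to")]
print([c.concept_name for c in map_code("F32.2", "ICD10CM", concepts, rels)])
# ['Toy SNOMED concept']
```

With the real library, the same two steps would presumably use the documented parameters:
1. Call get_athena_concepts(concept_filename, vocabulary_ids=["ICD10CM"], concept_codes=["F32.2"]).
2. Pass the resulting concept IDs, converted to int, as concept_id_1_values to get_athena_concept_relationships(cr_filename, ..., relationship_id_values=["Maps to"]).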