cardinal_pythonlib.psychiatry.treatment_resistant_depression

Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).

This file is part of cardinal_pythonlib.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Helper functions for algorithmic definitions of treatment-resistant depression.

Performance notes:

200 test patients; baseline about 7.65-8.57 seconds (25 Hz).
From the approach at https://stackoverflow.com/questions/19237878/ to the one at https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas.
Change from parallel to single-threading: down to 4.38 s (!).
Avoid a couple of slices: down to 3.85 s for 200 patients.
Add test patient E; up to 4.63 s for 250 patients (54 Hz).
On a live set (different test computer), single-threaded: 901.9 s for 4154 patients (4.6 Hz).
One pointless indexing call removed: 863.2 s for 4154 patients (4.8 Hz).
Loop boundary tweak: 3.95 s for 300 test patients (76 Hz).
From iloc to iat: 3.79 s (79 Hz).
These two are very helpful:
- https://stackoverflow.com/questions/28757389/loc-vs-iloc-vs-ix-vs-at-vs-iat
- https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-39e811c81a0c
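As a minimal illustration of the iloc-to-iat change (not the library's actual code; the DataFrame contents here are invented for the example), both accessors return the same scalar by integer position, but iat is restricted to single-cell access and so skips some of the more general indexer's overhead:

```python
import pandas as pd

# Hypothetical data; the column names follow the documented defaults.
tp = pd.DataFrame({
    "drug": ["citalopram", "citalopram", "mirtazapine"],
    "date": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-03-01"]),
})

row = 2
col = tp.columns.get_loc("drug")  # integer position of the "drug" column

via_iloc = tp.iloc[row, col]  # general positional indexer
via_iat = tp.iat[row, col]    # scalar-only positional accessor

print(via_iloc, via_iat)
```

Any gain from this swap only matters inside a tight loop that reads one cell at a time, which is presumably the situation the timings above reflect.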
Switching from tp.loc[conditions] to tp[conditions] didn’t make much difference, but the code is a bit cleaner.
Anyway, we should profile (see the PROFILE flag). That shows the main time is spent in my algorithmic code, not in DataFrame operations.
Not creating unnecessary results DataFrame objects shaved things down from 5.7 to 3.9 s in the profiler.
Still slower in parallel. Time is spent in thread locking.
Adjust a loop condition: 3.9 to 3.6 s.
Profiler off: 2.38 s for 300 patients, or 126 Hz. Let’s call that a day; we’ve achieved a 5-fold speedup.
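The kind of profiling described in these notes can be sketched with the standard library's cProfile. How the module's PROFILE flag is actually wired up is not shown here, so the function below (process_patients) is a hypothetical stand-in for the per-patient work, not the library's code:

```python
import cProfile
import io
import pstats


def process_patients(n_patients: int) -> int:
    """Hypothetical stand-in for the per-patient algorithmic work."""
    total = 0
    for _ in range(n_patients):
        total += sum(i * i for i in range(1000))
    return total


profiler = cProfile.Profile()
profiler.enable()
result = process_patients(300)
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # show only the five most expensive entries
report = stream.getvalue()
print(report)
```

Sorting by cumulative time makes it easy to see whether the cost sits in one's own functions or in library calls beneath them. As the notes observe, the profiler itself adds overhead, so final timings are best taken with it switched off.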

cardinal_pythonlib.psychiatry.treatment_resistant_depression.timedelta_days(days: int) → timedelta64: Convert a duration in days to a NumPy timedelta64 object.
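A conversion of this kind can be sketched as follows. This is an illustrative equivalent under the assumption that day resolution ("D") is wanted, not the library's source; the name timedelta_days_sketch and the example dates are hypothetical:

```python
import numpy as np


def timedelta_days_sketch(days: int) -> np.timedelta64:
    """Illustrative only: wrap a whole number of days as a timedelta64."""
    return np.timedelta64(days, "D")


# Example: add a 28-day course length to a first-mention date.
course_length = timedelta_days_sketch(28)
first_mention = np.datetime64("2018-01-01")
course_end = first_mention + course_length
print(course_end)
```

Working in timedelta64 units lets date arithmetic and comparisons operate directly on NumPy and pandas datetime values.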

cardinal_pythonlib.psychiatry.treatment_resistant_depression.two_antidepressant_episodes(patient_drug_date_df: DataFrame, patient_colname: str = 'patient_id', drug_colname: str = 'drug', date_colname: str = 'date', course_length_days: int = 28, expect_response_by_days: int = 56, symptom_assessment_time_days: int = 180, n_threads: int = 1, first_episode_only: bool = True) → DataFrame

Takes a pandas DataFrame, patient_drug_date_df (or, via reticulate, an R data.frame or data.table). This should contain dated present-tense references to antidepressant drugs (only).

Returns a set of result rows as a DataFrame.
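For orientation, an input of the expected shape might be built like this. The patients, drugs, and dates are invented, and storing the dates as pandas datetime values is an assumption rather than a documented requirement; only the column names come from the signature's defaults:

```python
import pandas as pd

# Hypothetical example data, using the documented default column names.
patient_drug_date_df = pd.DataFrame({
    "patient_id": ["P1", "P1", "P1", "P2"],
    "drug": ["citalopram", "citalopram", "mirtazapine", "sertraline"],
    "date": pd.to_datetime(
        ["2018-01-01", "2018-02-05", "2018-04-10", "2018-03-01"]
    ),
})

# With cardinal_pythonlib installed, the documented call would be:
#
#     from cardinal_pythonlib.psychiatry.treatment_resistant_depression import (
#         two_antidepressant_episodes,
#     )
#     results = two_antidepressant_episodes(patient_drug_date_df)

print(patient_drug_date_df)
```

Each row is one dated mention of one antidepressant for one patient. If the source data use other column names, they can be passed via patient_colname, drug_colname, and date_colname.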

cardinal_pythonlib.psychiatry.treatment_resistant_depression.two_antidepressant_episodes_single_patient(patient_id: str, patient_drug_date_df: DataFrame, patient_colname: str = 'patient_id', drug_colname: str = 'drug', date_colname: str = 'date', course_length_days: int = 28, expect_response_by_days: int = 56, symptom_assessment_time_days: int = 180, first_episode_only: bool = True) → DataFrame | None

Processes a single patient for two_antidepressant_episodes() (q.v.).

Implements the key algorithm.