cardinal_pythonlib.psychiatry.treatment_resistant_depression
Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).
This file is part of cardinal_pythonlib.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Helper functions for algorithmic definitions of treatment-resistant depression.
Performance notes:
200 test patients; baseline about 7.65-8.57 seconds (25 Hz, i.e. about 25 patients per second).
From https://stackoverflow.com/questions/19237878/ to https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas
Change from parallel to single-threading: down to 4.38 s (!).
Avoid a couple of slices: down to 3.85 s for 200 patients.
Add test patient E; up to 4.63 s for 250 patients (54 Hz).
On a live set (different test computer), single-threaded: 901.9 s for 4154 patients (4.6 Hz).
One pointless indexing call removed: 863.2 s for 4154 patients (4.8 Hz).
Loop boundary tweak: 3.95 s for 300 test patients (76 Hz).
From iloc to iat: 3.79 s (79 Hz).
These two are very helpful:
Switching from tp.loc[conditions] to tp[conditions] didn’t make much difference, but the code is a bit cleaner.
Anyway, we should profile (see the PROFILE flag). That shows the main time is spent in my algorithmic code, not in DataFrame operations.
Not creating unnecessary results DataFrame objects shaved things down from 5.7 to 3.9 s in the profiler.
Still slower in parallel. Time is spent in thread locking.
Adjust a loop condition: 3.9 to 3.6 s.
Profiler off: 2.38 s for 300 patients, or 126 Hz. Let’s call that a day; we’ve achieved a 5-fold speedup.
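The "Hz" figures in these notes are consistent with patients processed per second. The sketch below recomputes them from the quoted patient counts and timings. The helper name `throughput_hz` is hypothetical, and the baseline time is assumed to be the midpoint of the quoted 7.65-8.57 s range.

```python
def throughput_hz(n_patients: int, seconds: float) -> float:
    """Return patients processed per second (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return n_patients / seconds


# (patients, seconds) pairs quoted in the performance notes.
# Assumption: the baseline uses the midpoint of the 7.65-8.57 s range.
timings = {
    "baseline": (200, (7.65 + 8.57) / 2),
    "add test patient E": (250, 4.63),
    "live set, single-threaded": (4154, 901.9),
    "one indexing call removed": (4154, 863.2),
    "loop boundary tweak": (300, 3.95),
    "iloc to iat": (300, 3.79),
    "profiler off": (300, 2.38),
}

rates = {label: throughput_hz(n, s) for label, (n, s) in timings.items()}
for label, rate in rates.items():
    print(f"{label}: {rate:.1f} Hz")

# Ratio of the final throughput to the baseline throughput.
speedup = rates["profiler off"] / rates["baseline"]
print(f"overall speedup: about {speedup:.1f}-fold")
```

Comparing rates rather than raw times matters here because the test set grew from 200 to 300 patients during tuning.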
- cardinal_pythonlib.psychiatry.treatment_resistant_depression.timedelta_days(days: int) → timedelta64
Convert a duration in days to a NumPy timedelta64 object.
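A minimal sketch of such a conversion, assuming only that NumPy is installed. The function `days_to_timedelta64` is a hypothetical stand-in and not the library's own code; in particular, rejecting fractional day counts is an assumption made for this example.

```python
import numpy as np


def days_to_timedelta64(days: int) -> np.timedelta64:
    """Hypothetical stand-in for timedelta_days(); not the library's code."""
    int_days = int(days)
    if int_days != days:
        # Assumption: fractional day counts are treated as an error.
        raise ValueError(f"Fractional days are not supported: {days!r}")
    return np.timedelta64(int_days, "D")


# A timedelta64 can be added directly to a NumPy datetime64 date.
course_length = days_to_timedelta64(28)
start = np.datetime64("2020-01-01")
end = start + course_length
print(end)  # 2020-01-29
```

A timedelta64 value is convenient here because it works with the datetime64 values that pandas date columns hold.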
- cardinal_pythonlib.psychiatry.treatment_resistant_depression.two_antidepressant_episodes(patient_drug_date_df: DataFrame, patient_colname: str = 'patient_id', drug_colname: str = 'drug', date_colname: str = 'date', course_length_days: int = 28, expect_response_by_days: int = 56, symptom_assessment_time_days: int = 180, n_threads: int = 1, first_episode_only: bool = True) → DataFrame
Takes a pandas DataFrame, patient_drug_date_df (or, via reticulate, an R data.frame or data.table). This should contain dated present-tense references to antidepressant drugs (only). Returns a set of result rows as a DataFrame.
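As an illustration of the expected input shape, the sketch below builds a small DataFrame using the default column names from the signature and passes it to the function with the documented defaults. The patient IDs, drug names, and dates are invented for the example, and the wrapper `find_episodes` is hypothetical; the call is skipped if cardinal_pythonlib is not installed.

```python
import pandas as pd

# Invented example data: one row per dated mention of an antidepressant.
patient_drug_date_df = pd.DataFrame(
    {
        "patient_id": ["P1", "P1", "P1", "P2"],
        "drug": ["citalopram", "citalopram", "mirtazapine", "sertraline"],
        "date": pd.to_datetime(
            ["2020-01-01", "2020-02-15", "2020-04-01", "2020-03-10"]
        ),
    }
)


def find_episodes(df: pd.DataFrame):
    """Hypothetical wrapper: call the library if available, else return None."""
    try:
        from cardinal_pythonlib.psychiatry.treatment_resistant_depression import (
            two_antidepressant_episodes,
        )
    except ImportError:
        return None
    # Keyword arguments and defaults are taken from the documented signature.
    return two_antidepressant_episodes(
        df,
        patient_colname="patient_id",
        drug_colname="drug",
        date_colname="date",
        course_length_days=28,
        expect_response_by_days=56,
        symptom_assessment_time_days=180,
        n_threads=1,
        first_episode_only=True,
    )


results = find_episodes(patient_drug_date_df)
print(results)
```

The example leaves n_threads at 1, in line with the performance notes, which found the parallel version slower.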
- cardinal_pythonlib.psychiatry.treatment_resistant_depression.two_antidepressant_episodes_single_patient(patient_id: str, patient_drug_date_df: DataFrame, patient_colname: str = 'patient_id', drug_colname: str = 'drug', date_colname: str = 'date', course_length_days: int = 28, expect_response_by_days: int = 56, symptom_assessment_time_days: int = 180, first_episode_only: bool = True) → DataFrame | None
Processes a single patient for two_antidepressant_episodes() (q.v.). Implements the key algorithm.
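The signatures suggest a pattern in which a per-patient function returns either a DataFrame or None, and a multi-patient function gathers the results. The sketch below shows one way that pattern could look. It is not the library's implementation: `process_each_patient` and `patients_with_two_drugs` are hypothetical, and the per-patient rule used here (at least two distinct drugs) is a placeholder rather than the real algorithm.

```python
from typing import Callable, Optional

import pandas as pd

PerPatientFn = Callable[[str, pd.DataFrame], Optional[pd.DataFrame]]


def process_each_patient(
    df: pd.DataFrame,
    per_patient_fn: PerPatientFn,
    patient_colname: str = "patient_id",
) -> pd.DataFrame:
    """Hypothetical driver: run per_patient_fn once per patient, single-threaded."""
    frames = []
    for patient_id in sorted(df[patient_colname].unique()):
        result = per_patient_fn(patient_id, df)
        if result is not None:  # None means "no result rows for this patient"
            frames.append(result)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


def patients_with_two_drugs(
    patient_id: str, df: pd.DataFrame
) -> Optional[pd.DataFrame]:
    """Placeholder rule (not the real algorithm): needs two or more distinct drugs."""
    rows = df[df["patient_id"] == patient_id]
    n_drugs = rows["drug"].nunique()
    if n_drugs < 2:
        return None
    return pd.DataFrame({"patient_id": [patient_id], "n_distinct_drugs": [n_drugs]})


# Invented example data.
demo_df = pd.DataFrame(
    {
        "patient_id": ["P1", "P1", "P2"],
        "drug": ["citalopram", "mirtazapine", "sertraline"],
        "date": pd.to_datetime(["2020-01-01", "2020-04-01", "2020-03-10"]),
    }
)
summary = process_each_patient(demo_df, patients_with_two_drugs)
print(summary)
```

Returning None for patients without results, instead of an empty DataFrame, avoids building objects that are then discarded. The performance notes report a saving from not creating unnecessary results DataFrame objects.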