cardinal_pythonlib.psychiatry.timeline

Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).

This file is part of cardinal_pythonlib.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Timeline calculations, written primarily for a lithium/renal function project (April 2019). This code is in DRAFT.

Usage from R:

# -------------------------------------------------------------------------
# Load libraries
# -------------------------------------------------------------------------

RUN_ONCE_ONLY <- '
    library(devtools)
    devtools::install_github("rstudio/reticulate")  # get latest version
'
library(data.table)
library(reticulate)

# -------------------------------------------------------------------------
# Set up reticulate
# -------------------------------------------------------------------------

VENV <- "~/dev/venvs/cardinal_pythonlib"  # or your preferred virtualenv
PYTHON_EXECUTABLE <- ifelse(
    .Platform$OS.type == "windows",
    file.path(VENV, "Scripts", "python.exe"),  # Windows
    file.path(VENV, "bin", "python")  # Linux, macOS
)
reticulate::use_python(PYTHON_EXECUTABLE, required=TRUE)

# -------------------------------------------------------------------------
# Import Python modules
# -------------------------------------------------------------------------

cpl_version <- reticulate::import("cardinal_pythonlib.version")
cpl_version$assert_version_eq("1.0.50")
cpl_logs <- reticulate::import("cardinal_pythonlib.logs")
cpl_logs$main_only_quicksetup_rootlogger()

cpl_timeline <- reticulate::import("cardinal_pythonlib.psychiatry.timeline")

# -------------------------------------------------------------------------
# Do something
# -------------------------------------------------------------------------

testdata_drug_events <- data.table(
    patient_id=c(
        rep("Alice", 3),
        rep("Bob", 3)
    ),
    drug_event_datetime=as.Date(c(
        # Alice
        "2018-01-05",
        "2018-01-20",
        "2018-04-01",
        # Bob
        "2018-06-05",
        "2018-08-20",
        "2018-10-01"
    ))
)
testdata_query_times <- data.table(
    patient_id=c(
        rep("Alice", 3),
        rep("Bob", 3)
    ),
    start=as.Date(c(
        # Alice
        rep("2017-01-01", 3),
        # Bob
        rep("2015-01-01", 3)
    )),
    when=as.Date(c(
        # Alice
        "2018-01-01",
        "2018-01-10",
        "2018-02-01",
        # Bob
        "2018-01-01",
        "2018-09-10",
        "2019-02-01"
    ))
)
testresult <- data.table(cpl_timeline$cumulative_time_on_drug(
    drug_events_df=testdata_drug_events,
    event_lasts_for_quantity=3,
    event_lasts_for_units="days",
    query_times_df=testdata_query_times,
    patient_colname="patient_id",
    event_datetime_colname="drug_event_datetime",
    start_colname="start",
    when_colname="when",
    debug=TRUE
))
print(testresult)

The result should be:

> print(testdata_drug_events)

   patient_id drug_event_datetime
     Alice          2018-01-05
     Alice          2018-01-20
     Alice          2018-04-01
       Bob          2018-06-05
       Bob          2018-08-20
       Bob          2018-10-01

> print(testdata_query_times)

   patient_id      start       when
     Alice 2017-01-01 2018-01-01
     Alice 2017-01-01 2018-01-10
     Alice 2017-01-01 2018-02-01
       Bob 2015-01-01 2018-01-01
       Bob 2015-01-01 2018-09-10
       Bob 2015-01-01 2019-02-01

> print(testresult)

   patient_id      start          t before_days during_days after_days
     Alice 2017-01-01 2018-01-01         365           0          0
     Alice 2017-01-01 2018-01-10         369           3          2
     Alice 2017-01-01 2018-02-01         369           6         21
       Bob 2015-01-01 2018-01-01        1096           0          0
       Bob 2015-01-01 2018-09-10        1251           6         91
       Bob 2015-01-01 2019-02-01        1251           9        232
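The following is a minimal sketch, not the library's implementation, of one way to arrive at the before_days, during_days and after_days values above. The definitions are inferred from the example numbers rather than taken from the source. "before" is the time from start to the first drug onset, or to t if there has been no onset yet. "during" is the time on drug up to t. "after" is the remaining time since first onset that was not spent on drug.

The sketch also assumes that each event covers the half-open interval [event, event + 3 days) and that the intervals are already amalgamated, so none overlap. The name before_during_after is hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Half-open interval [start, end) of presumed drug exposure.
Interval = Tuple[date, date]


def before_during_after(intervals: List[Interval], start: date,
                        when: date) -> Tuple[int, int, int]:
    """Hypothetical helper: (before_days, during_days, after_days) at `when`.

    Assumes `intervals` are non-overlapping (already amalgamated).
    """
    # Clip each exposure interval to the query window [start, when),
    # discarding any that fall entirely outside it.
    clipped = [(max(s, start), min(e, when)) for s, e in sorted(intervals)]
    clipped = [(s, e) for s, e in clipped if s < e]
    total = (when - start).days
    if not clipped:
        # No exposure yet: the whole window counts as "before".
        return total, 0, 0
    before = (clipped[0][0] - start).days  # start -> first onset
    during = sum((e - s).days for s, e in clipped)  # time on drug
    after = total - before - during  # since first onset, off drug
    return before, during, after


# Alice's events from the example, each presumed to last 3 days.
alice = [(d, d + timedelta(days=3))
         for d in (date(2018, 1, 5), date(2018, 1, 20), date(2018, 4, 1))]
for t in (date(2018, 1, 1), date(2018, 1, 10), date(2018, 2, 1)):
    print(t, before_during_after(alice, date(2017, 1, 1), t))
# prints (365, 0, 0), (369, 3, 2) and (369, 6, 21) for the three dates
```

Under these definitions, before + during + after always equals the number of days from start to t. This makes a quick consistency check on rows of the expected output.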

Be aware that a reticulate bug (seen with version 1.11.1) can corrupt dates passed from R to Python:

# PROBLEM on 2018-04-05, with reticulate 1.11.1:
# - the R data.table is fine
# - all the dates become the same date when seen by Python (the value
#   of the first row in each date column)
# - when used without R, the Python code is fine
# - therefore, a problem with reticulate converting data for Python
# - same with data.frame() as with data.table()
# - same with as.Date.POSIXct() and as.Date.POSIXlt() as with as.Date()

# Further test:

cpl_rfunc <- reticulate::import("cardinal_pythonlib.psychiatry.rfunc")
cat(cpl_rfunc$get_python_repr(testdata_drug_events))
cat(cpl_rfunc$get_python_repr_of_type(testdata_drug_events))
print(testdata_drug_events)
print(reticulate::r_to_py(testdata_drug_events))

# Minimum reproducible example:

library(reticulate)
testdata_drug_events <- data.frame(
    patient_id=c(
        rep("Alice", 3),
        rep("Bob", 3)
    ),
    drug_event_datetime=as.Date(c(
        # Alice
        "2018-01-05",
        "2018-01-20",
        "2018-04-01",
        # Bob
        "2018-06-05",
        "2018-08-20",
        "2018-10-01"
    ))
)
print(testdata_drug_events)
print(reticulate::r_to_py(testdata_drug_events))

# The R data is:
#
#       patient_id drug_event_datetime
#     1      Alice          2018-01-05
#     2      Alice          2018-01-20
#     3      Alice          2018-04-01
#     4        Bob          2018-06-05
#     5        Bob          2018-08-20
#     6        Bob          2018-10-01
#
# Output from reticulate::r_to_py() in the buggy version is:
#
#       patient_id drug_event_datetime
#     0      Alice          2018-01-05
#     1      Alice          2018-01-05
#     2      Alice          2018-01-05
#     3        Bob          2018-01-05
#     4        Bob          2018-01-05
#     5        Bob          2018-01-05
#
# Known bug: https://github.com/rstudio/reticulate/issues/454
#
# Fix: run remove.packages("reticulate"), then reinstall from GitHub as
# above, giving reticulate_1.11.1-9000 [see sessionInfo()]. That fixes it.
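When data arrive from R, a cheap check on the Python side can flag the symptom described above: a multi-row date column in which every value is identical. The helper below is a hypothetical sketch and is not part of cardinal_pythonlib. It is only a heuristic, because a column can legitimately contain a single repeated date.

```python
from datetime import date
from typing import Hashable, Sequence


def looks_collapsed(values: Sequence[Hashable]) -> bool:
    """Heuristic: True if a column of more than one row has one distinct value."""
    return len(values) > 1 and len(set(values)) == 1


# As seen by Python with the buggy reticulate: six copies of the first date.
buggy = [date(2018, 1, 5)] * 6
correct = [date(2018, 1, 5), date(2018, 1, 20), date(2018, 4, 1),
           date(2018, 6, 5), date(2018, 8, 20), date(2018, 10, 1)]
print(looks_collapsed(buggy), looks_collapsed(correct))  # True False
```

With a pandas DataFrame, you would apply it to each date column, e.g. `looks_collapsed(list(drug_events_df["drug_event_datetime"]))`.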

cardinal_pythonlib.psychiatry.timeline.cumulative_time_on_drug(drug_events_df: DataFrame, query_times_df: DataFrame, event_lasts_for_timedelta: timedelta | None = None, event_lasts_for_quantity: float | None = None, event_lasts_for_units: str | None = None, patient_colname: str = 'patient_id', event_datetime_colname: str = 'drug_event_datetime', start_colname: str = 'start', when_colname: str = 'when', include_timedelta_in_output: bool = False, debug: bool = False) → DataFrame

Parameters:

drug_events_df – pandas DataFrame containing the event data, with columns named according to patient_colname and event_datetime_colname
event_lasts_for_timedelta – when an event occurs, how long is it assumed to last for? For example, if a prescription of lithium occurs on 2001-01-01, how long is the patient presumed to be taking lithium as a consequence (e.g. 1 day? 28 days? 6 months?)
event_lasts_for_quantity – an alternative to event_lasts_for_timedelta, useful particularly if you are calling from R to Python via reticulate (which doesn’t convert R as.difftime() values to Python datetime.timedelta): a number, used together with event_lasts_for_units (q.v.).
event_lasts_for_units – the units for event_lasts_for_quantity (q.v.), if used; e.g. "days". The string value must be the name of an argument to the Python datetime.timedelta constructor.
query_times_df – times to query for, with columns named according to patient_colname, start_colname, and when_colname
patient_colname – name of the column in drug_events_df and query_times_df containing the patient ID
event_datetime_colname – name of the column in drug_events_df containing the date/time of each event
start_colname – name of the column in query_times_df containing the date/time representing the overall start time for the relevant patient (from which cumulative times are calculated)
when_colname – name of the column in query_times_df containing date/time values at which to query
include_timedelta_in_output – include datetime.timedelta values in the output? The default is False, as these aren’t supported by R/reticulate.
debug – print debugging information to the log?

Returns:

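DataFrame with one row per query time; in the example above, its columns are patient_id, start, t (the query time), before_days, during_days and after_days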
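Because event_lasts_for_units must name a datetime.timedelta constructor argument, a quantity and unit pair can be turned into a duration with keyword unpacking. The sketch below illustrates this. It also assumes, without confirmation from the source, that an explicit event_lasts_for_timedelta takes precedence over the quantity and units. The name resolve_event_duration is hypothetical.

```python
from datetime import timedelta
from typing import Optional


def resolve_event_duration(
        event_lasts_for_timedelta: Optional[timedelta] = None,
        event_lasts_for_quantity: Optional[float] = None,
        event_lasts_for_units: Optional[str] = None) -> timedelta:
    """Hypothetical helper: work out how long each drug event lasts."""
    # Assumption: an explicit timedelta wins if one is given.
    if event_lasts_for_timedelta is not None:
        return event_lasts_for_timedelta
    if event_lasts_for_quantity is None or event_lasts_for_units is None:
        raise ValueError("Specify event_lasts_for_timedelta, or both "
                         "event_lasts_for_quantity and event_lasts_for_units")
    # e.g. quantity=3, units="days" -> timedelta(days=3)
    return timedelta(**{event_lasts_for_units: event_lasts_for_quantity})


print(resolve_event_duration(event_lasts_for_quantity=3,
                             event_lasts_for_units="days"))  # 3 days, 0:00:00
```

R passes numbers as doubles, which timedelta accepts (timedelta(days=3.0) equals timedelta(days=3)). The valid keyword names are days, seconds, microseconds, milliseconds, minutes, hours and weeks. There is no months or years argument, so a duration such as "6 months" has to be approximated, for example as 26 weeks; an unknown name raises TypeError.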

cardinal_pythonlib.psychiatry.timeline.drug_timelines(drug_events_df: DataFrame, event_lasts_for: timedelta, patient_colname: str = 'patient_id', event_datetime_colname: str = 'drug_event_datetime') → Dict[Any, IntervalList]

Takes a set of drug event start times (one or more per patient), plus a fixed time that each event is presumed to last for, and returns an IntervalList for each patient representing the set of events (which may overlap, in which case they will be amalgamated).

Parameters:

drug_events_df – pandas DataFrame containing the event data
event_lasts_for – when an event occurs, how long is it assumed to last for? For example, if a prescription of lithium occurs on 2001-01-01, how long is the patient presumed to be taking lithium as a consequence (e.g. 1 day? 28 days? 6 months?)
patient_colname – name of the column in drug_events_df containing the patient ID
event_datetime_colname – name of the column in drug_events_df containing the date/time of each event

Returns:

mapping each patient ID to an IntervalList object indicating the amalgamated intervals from that patient's events

Return type:

dict
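The amalgamation that drug_timelines describes can be sketched with the standard library alone. This is an illustration rather than the library's code, and it differs in three ways. It takes (patient_id, date/time) pairs instead of a DataFrame. It returns plain lists of half-open (start, end) tuples instead of IntervalList objects. It also assumes that intervals which merely touch are merged, as well as those that overlap. The name drug_timelines_sketch is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Hashable, Iterable, List, Tuple

Interval = Tuple[date, date]  # half-open [start, end)


def drug_timelines_sketch(
        events: Iterable[Tuple[Hashable, date]],
        event_lasts_for: timedelta) -> Dict[Hashable, List[Interval]]:
    """Hypothetical stand-in: per-patient amalgamated exposure intervals."""
    # One interval per event: [event, event + event_lasts_for).
    per_patient: Dict[Hashable, List[Interval]] = defaultdict(list)
    for patient_id, when in events:
        per_patient[patient_id].append((when, when + event_lasts_for))
    result: Dict[Hashable, List[Interval]] = {}
    for patient_id, intervals in per_patient.items():
        merged: List[Interval] = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlaps (or touches) the previous interval: extend it.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        result[patient_id] = merged
    return result


timelines = drug_timelines_sketch(
    [("Alice", date(2018, 1, 5)), ("Alice", date(2018, 1, 6)),
     ("Alice", date(2018, 1, 20)), ("Bob", date(2018, 6, 5))],
    timedelta(days=3))
print(timelines["Alice"])  # Alice's first two events merge into one interval
```

Sorting each patient's intervals by start time first means a single pass is enough to merge them.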