cardinal_pythonlib.psychiatry.timeline
Original code copyright (C) 2009-2022 Rudolf Cardinal (rudolf@pobox.com).
This file is part of cardinal_pythonlib.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Timeline calculations. Primarily for a lithium/renal function project, Apr 2019. Code is in DRAFT.
Usage from R:
# -------------------------------------------------------------------------
# Load libraries
# -------------------------------------------------------------------------
RUN_ONCE_ONLY <- '
library(devtools)
devtools::install_github("rstudio/reticulate") # get latest version
'
library(data.table)
library(reticulate)
# -------------------------------------------------------------------------
# Set up reticulate
# -------------------------------------------------------------------------
VENV <- "~/dev/venvs/cardinal_pythonlib" # or your preferred virtualenv
PYTHON_EXECUTABLE <- ifelse(
    .Platform$OS.type == "windows",
    file.path(VENV, "Scripts", "python.exe"),  # Windows
    file.path(VENV, "bin", "python")  # Linux
)
reticulate::use_python(PYTHON_EXECUTABLE, required=TRUE)
# -------------------------------------------------------------------------
# Import Python modules
# -------------------------------------------------------------------------
cpl_version <- reticulate::import("cardinal_pythonlib.version")
cpl_version$assert_version_eq("1.0.50")
cpl_logs <- reticulate::import("cardinal_pythonlib.logs")
cpl_logs$main_only_quicksetup_rootlogger()
cpl_timeline <- reticulate::import("cardinal_pythonlib.psychiatry.timeline")
# -------------------------------------------------------------------------
# Do something
# -------------------------------------------------------------------------
testdata_drug_events <- data.table(
    patient_id=c(
        rep("Alice", 3),
        rep("Bob", 3)
    ),
    drug_event_datetime=as.Date(c(
        # Alice
        "2018-01-05",
        "2018-01-20",
        "2018-04-01",
        # Bob
        "2018-06-05",
        "2018-08-20",
        "2018-10-01"
    ))
)
testdata_query_times <- data.table(
    patient_id=c(
        rep("Alice", 3),
        rep("Bob", 3)
    ),
    start=as.Date(c(
        # Alice
        rep("2017-01-01", 3),
        # Bob
        rep("2015-01-01", 3)
    )),
    when=as.Date(c(
        # Alice
        "2018-01-01",
        "2018-01-10",
        "2018-02-01",
        # Bob
        "2018-01-01",
        "2018-09-10",
        "2019-02-01"
    ))
)
testresult <- data.table(cpl_timeline$cumulative_time_on_drug(
    drug_events_df=testdata_drug_events,
    event_lasts_for_quantity=3,
    event_lasts_for_units="days",
    query_times_df=testdata_query_times,
    patient_colname="patient_id",
    event_datetime_colname="drug_event_datetime",
    start_colname="start",
    when_colname="when",
    debug=TRUE
))
print(testresult)
The result should be:
> print(testdata_drug_events)
   patient_id drug_event_datetime
1:      Alice          2018-01-05
2:      Alice          2018-01-20
3:      Alice          2018-04-01
4:        Bob          2018-06-05
5:        Bob          2018-08-20
6:        Bob          2018-10-01
> print(testdata_query_times)
   patient_id      start       when
1:      Alice 2017-01-01 2018-01-01
2:      Alice 2017-01-01 2018-01-10
3:      Alice 2017-01-01 2018-02-01
4:        Bob 2015-01-01 2018-01-01
5:        Bob 2015-01-01 2018-09-10
6:        Bob 2015-01-01 2019-02-01
> print(testresult)
   patient_id      start          t before_days during_days after_days
1:      Alice 2017-01-01 2018-01-01         365           0          0
2:      Alice 2017-01-01 2018-01-10         369           3          2
3:      Alice 2017-01-01 2018-02-01         369           6         21
4:        Bob 2015-01-01 2018-01-01        1096           0          0
5:        Bob 2015-01-01 2018-09-10        1251           6         91
6:        Bob 2015-01-01 2019-02-01        1251           9        232
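As a cross-check on the figures in the result table, the following pure-Python sketch recomputes before_days, during_days and after_days by classifying each calendar day. It is not the library's implementation: the function name before_during_after_days is hypothetical, and the sketch assumes whole-day resolution, a half-open query window [start, when), and that "after" means any off-drug day on or after the first event.

```python
from datetime import date, timedelta


def before_during_after_days(event_dates, lasts_for_days, start, when):
    """Hypothetical helper (not part of cardinal_pythonlib).

    Classifies every day in [start, when) as before the first event,
    during an event, or after (off-drug time following the first event).
    """
    # Every calendar day covered by some event; overlaps collapse in the set.
    on_drug = {
        event + timedelta(days=offset)
        for event in event_dates
        for offset in range(lasts_for_days)
    }
    first_event = min(event_dates, default=None)
    before = during = after = 0
    day = start
    while day < when:
        if day in on_drug:
            during += 1
        elif first_event is None or day < first_event:
            before += 1
        else:
            after += 1
        day += timedelta(days=1)
    return before, during, after


alice_events = [date(2018, 1, 5), date(2018, 1, 20), date(2018, 4, 1)]
# Row 2 of the result table: prints (369, 3, 2)
print(before_during_after_days(
    alice_events, 3, date(2017, 1, 1), date(2018, 1, 10)
))
```

Counting day by day is slow for long windows, but it makes the meaning of the three columns easy to verify by hand.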
However, there is a reticulate bug that can cause problems by corrupting dates passed from R to Python:
# PROBLEM on 2018-04-05, with reticulate 1.11.1:
# - the R data.table is fine
# - all the dates become the same date when it's seen by Python (the value
# of the first row in each date column)
# - when used without R, the Python code is fine
# - therefore, a problem with reticulate converting data for Python
# - same with data.frame() as with data.table()
# - same with as.Date.POSIXct() and as.Date.POSIXlt() as with as.Date()
# Further test:
cpl_rfunc <- reticulate::import("cardinal_pythonlib.psychiatry.rfunc")
cat(cpl_rfunc$get_python_repr(testdata_drug_events))
cat(cpl_rfunc$get_python_repr_of_type(testdata_drug_events))
print(testdata_drug_events)
print(reticulate::r_to_py(testdata_drug_events))
# Minimum reproducible example:
library(reticulate)
testdata_drug_events <- data.frame(
    patient_id=c(
        rep("Alice", 3),
        rep("Bob", 3)
    ),
    drug_event_datetime=as.Date(c(
        # Alice
        "2018-01-05",
        "2018-01-20",
        "2018-04-01",
        # Bob
        "2018-06-05",
        "2018-08-20",
        "2018-10-01"
    ))
)
print(testdata_drug_events)
print(reticulate::r_to_py(testdata_drug_events))
# The R data is:
#
#   patient_id drug_event_datetime
# 1      Alice          2018-01-05
# 2      Alice          2018-01-20
# 3      Alice          2018-04-01
# 4        Bob          2018-06-05
# 5        Bob          2018-08-20
# 6        Bob          2018-10-01
#
# Output from reticulate::r_to_py() in the buggy version is:
#
#   patient_id drug_event_datetime
# 0      Alice          2018-01-05
# 1      Alice          2018-01-05
# 2      Alice          2018-01-05
# 3        Bob          2018-01-05
# 4        Bob          2018-01-05
# 5        Bob          2018-01-05
#
# Known bug: https://github.com/rstudio/reticulate/issues/454
#
# Use remove.packages() then reinstall from github as above, giving
# reticulate_1.11.1-9000 [see sessionInfo()]...
# ... yes, that fixes it.
- cardinal_pythonlib.psychiatry.timeline.cumulative_time_on_drug(drug_events_df: DataFrame, query_times_df: DataFrame, event_lasts_for_timedelta: timedelta | None = None, event_lasts_for_quantity: float | None = None, event_lasts_for_units: str | None = None, patient_colname: str = 'patient_id', event_datetime_colname: str = 'drug_event_datetime', start_colname: str = 'start', when_colname: str = 'when', include_timedelta_in_output: bool = False, debug: bool = False) → DataFrame
- Parameters:
  drug_events_df – pandas DataFrame containing the event data, with columns named according to patient_colname and event_datetime_colname
  event_lasts_for_timedelta – when an event occurs, how long is it assumed to last for? For example, if a prescription of lithium occurs on 2001-01-01, how long is the patient presumed to be taking lithium as a consequence (e.g. 1 day? 28 days? 6 months?)
  event_lasts_for_quantity – as an alternative to event_lasts_for_timedelta, particularly if you are calling from R to Python via reticulate (which doesn’t convert R as.difftime() to Python datetime.timedelta), you can specify event_lasts_for_quantity, a number, and event_lasts_for_units (q.v.).
  event_lasts_for_units – specify the units for event_lasts_for_quantity (q.v.), if used; e.g. "days". The string value must be the name of an argument to the Python datetime.timedelta constructor.
  query_times_df – times to query for, with columns named according to patient_colname, start_colname, and when_colname
  patient_colname – name of the column in drug_events_df and query_times_df containing the patient ID
  event_datetime_colname – name of the column in drug_events_df containing the date/time of each event
  start_colname – name of the column in query_times_df containing the date/time representing the overall start time for the relevant patient (from which cumulative times are calculated)
  when_colname – name of the column in query_times_df containing date/time values at which to query
  include_timedelta_in_output – include datetime.timedelta values in the output? The default is False as this isn’t supported by R/reticulate.
  debug – print debugging information to the log?
- Returns:
  DataFrame with the requested data
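The event_lasts_for_units parameter is described as the name of an argument to the datetime.timedelta constructor, so a quantity/units pair can be turned into a timedelta by keyword expansion. The sketch below illustrates that idea only; the helper name timedelta_from_quantity_and_units is hypothetical and the validation step is an assumption, not necessarily what the library does internally.

```python
from datetime import timedelta

# Keyword arguments accepted by the datetime.timedelta constructor.
VALID_TIMEDELTA_UNITS = {
    "days", "seconds", "microseconds", "milliseconds",
    "minutes", "hours", "weeks",
}


def timedelta_from_quantity_and_units(quantity, units):
    """Hypothetical helper: build a timedelta from e.g. (3, "days")."""
    if units not in VALID_TIMEDELTA_UNITS:
        raise ValueError(f"Unsupported units for timedelta: {units!r}")
    # Equivalent to timedelta(days=3) when units == "days", and so on.
    return timedelta(**{units: quantity})


print(timedelta_from_quantity_and_units(3, "days"))  # prints "3 days, 0:00:00"
```

Note that calendar units such as "months" or "years" are not timedelta arguments, so a duration like "6 months" has to be expressed in days or weeks.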
- cardinal_pythonlib.psychiatry.timeline.drug_timelines(drug_events_df: DataFrame, event_lasts_for: timedelta, patient_colname: str = 'patient_id', event_datetime_colname: str = 'drug_event_datetime') → Dict[Any, IntervalList]
Takes a set of drug event start times (one or more per patient), plus a fixed time that each event is presumed to last for, and returns an IntervalList for each patient representing the set of events (which may overlap, in which case they will be amalgamated).
- Parameters:
  drug_events_df – pandas DataFrame containing the event data
  event_lasts_for – when an event occurs, how long is it assumed to last for? For example, if a prescription of lithium occurs on 2001-01-01, how long is the patient presumed to be taking lithium as a consequence (e.g. 1 day? 28 days? 6 months?)
  patient_colname – name of the column in drug_events_df containing the patient ID
  event_datetime_colname – name of the column in drug_events_df containing the date/time of each event
- Returns:
  mapping patient ID to an IntervalList object indicating the amalgamated intervals from the events
- Return type:
  dict
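To illustrate what "amalgamated" means for overlapping events, here is a minimal standard-library sketch that groups events by patient and merges overlapping intervals. It stands in plain lists of (start, end) tuples for IntervalList and takes (patient_id, event_datetime) pairs rather than a DataFrame; the function name drug_timelines_sketch is hypothetical, and merging intervals that merely touch is an assumption of this sketch.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def drug_timelines_sketch(events, event_lasts_for):
    """Hypothetical stand-in for drug_timelines().

    events: iterable of (patient_id, event_datetime) pairs.
    Returns a dict mapping patient ID to a sorted list of merged
    (start, end) tuples.
    """
    raw = defaultdict(list)
    for patient_id, event_start in events:
        raw[patient_id].append((event_start, event_start + event_lasts_for))

    timelines = {}
    for patient_id, intervals in raw.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlaps (or touches) the previous interval: extend it.
                prev_start, prev_end = merged[-1]
                merged[-1] = (prev_start, max(prev_end, end))
            else:
                merged.append((start, end))
        timelines[patient_id] = merged
    return timelines


events = [
    ("Alice", datetime(2018, 1, 5)),
    ("Alice", datetime(2018, 1, 6)),  # overlaps the previous event
    ("Alice", datetime(2018, 4, 1)),
    ("Bob", datetime(2018, 6, 5)),
]
timelines = drug_timelines_sketch(events, timedelta(days=3))
for patient_id, intervals in timelines.items():
    print(patient_id, intervals)
```

With a three-day duration, Alice's first two events collapse into a single interval from 2018-01-05 to 2018-01-09, while her April event stays separate.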