Getting started with JupyterHealthClient#

First, you’ll want to create a JupyterHealthClient. In a managed deployment, credentials are typically loaded from the $JHE_TOKEN and $JHE_URL environment variables.

from jupyterhealth_client import Code, JupyterHealthClient

# use anonymize=True to allow output in documentation
jh_client = JupyterHealthClient(anonymize=True)
# or jh_client = JupyterHealthClient(url=url, token=token)
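
If the environment variables are not picked up automatically, the credentials can be read explicitly and passed through the url and token arguments shown above. This is a minimal sketch, assuming the variable names $JHE_URL and $JHE_TOKEN from the text. The helper name load_jhe_credentials is hypothetical and not part of jupyterhealth_client:

```python
import os


def load_jhe_credentials(environ=os.environ):
    """Return (url, token) from the environment, failing early if either is missing.

    Hypothetical helper for illustration; not part of jupyterhealth_client.
    """
    missing = [name for name in ("JHE_URL", "JHE_TOKEN") if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return environ["JHE_URL"], environ["JHE_TOKEN"]
```

The returned pair can then be passed along as JupyterHealthClient(url=url, token=token).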

Retrieving information#

Getting the current user#

First, we can see who we are logged in as:

jh_client.get_user()

Getting study information#

We can list all the studies we currently have access to, including the organization each one is associated with.

study_id will be useful for retrieving observations later.

print("All my studies:")
for study in jh_client.list_studies():
    print(f"  - [{study['id']}] {study['name']} org:{study['organization']['name']}")

And we can get a single study by id:

jh_client.get_study(study["id"])

Getting patient information#

We can list patients we have access to with list_patients(), and see which studies they have shared data with using get_patient_consents().

The list endpoints (such as list_studies() and list_patients()) all return generators and should handle pagination automatically when there are many results.
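
To illustrate what automatic pagination means for a generator, here is a generic sketch. It is not the library's implementation: fetch_page, list_all, the page size, and the in-memory FAKE_RESULTS list are stand-ins invented for this example.

```python
from typing import Callable, Iterator

FAKE_RESULTS = [{"id": i} for i in range(1, 8)]  # stand-in for server-side records


def fetch_page(offset: int, limit: int) -> list:
    """Pretend to request one page of results from a server."""
    return FAKE_RESULTS[offset : offset + limit]


def list_all(fetch: Callable[[int, int], list], limit: int = 3) -> Iterator[dict]:
    """Yield records one at a time, requesting the next page only when needed."""
    offset = 0
    while True:
        page = fetch(offset, limit)
        if not page:
            return  # an empty page means there is nothing left
        yield from page
        offset += len(page)


ids = [record["id"] for record in list_all(fetch_page)]
```

With a generator written this way, stopping the loop early means the remaining pages are never requested.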

# show all the patients with study data I have access to:
print("Patients with data I have access to:")

for patient in jh_client.list_patients():
    consents = jh_client.get_patient_consents(patient["id"])
    if not consents["studies"] and not consents["studiesPendingConsent"]:
        continue
    print(
        f"[{patient['id']}] {patient['nameFamily']}, {patient['nameGiven']} ({patient['telecomEmail']})"
    )
    for study in consents["studies"]:
        for scope in study["scopeConsents"]:
            if scope["consented"]:
                # remember which patients have which data for later in the demo
                if scope["code"]["codingCode"] == Code.BLOOD_GLUCOSE.value:
                    cgm_patient_id = patient["id"]
                    cgm_study_id = study["id"]
                if scope["code"]["codingCode"] == Code.BLOOD_PRESSURE.value:
                    bp_patient_id = patient["id"]
                    bp_study_id = study["id"]
                print(f"  - [{study['id']}] {study['name']} ({scope['code']['text']})")
    for study in consents["studiesPendingConsent"]:
        print(f"  - (not consented) [{study['id']}] {study['name']}")

Retrieving Observations#

list_observations_df retrieves all observations into a pandas DataFrame, and list_observations returns them as a generator. Both accept the same filters:

  • study_id - fetch data authorized to a single study

  • patient_id - fetch data for a single patient

  • code - a Code filter to select only a single measurement type (e.g. Code.BLOOD_PRESSURE)

At least one of study_id or patient_id must be specified. code is always optional.

To get all blood pressure data for a single study:

bp_iter = jh_client.list_observations(study_id=bp_study_id, code=Code.BLOOD_PRESSURE)
bp_iter
observation = next(iter(bp_iter))
observation

The interesting data is in valueAttachment, which is a base64-encoded JSON blob. We can extract it:

import base64
import json

json.loads(base64.decodebytes(observation["valueAttachment"]["data"].encode()).decode())
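
To see the decoding step in isolation, the following self-contained sketch encodes a made-up measurement the same way and decodes it again. The payload contents and field names are invented for illustration and are not a real JHE record:

```python
import base64
import json

# made-up payload, for illustration only
payload = {"systolic_blood_pressure": {"value": 120, "unit": "mmHg"}}

# encode: dict -> JSON text -> bytes -> base64 text
encoded = base64.b64encode(json.dumps(payload).encode()).decode()

# decode: base64 text -> bytes -> JSON text -> dict
decoded = json.loads(base64.b64decode(encoded).decode())
```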

Or we can use tidy_observation to turn the nested structure of an Observation into one more suitable for DataFrames.

tidy_observation takes nested fields and turns them into a single flat dictionary, so

{"a": {"b": 5}}

becomes

{"a_b": 5}

tidy_observation also understands the structure of the valueAttachment, so it handles the base64/json bit, too:

from jupyterhealth_client import tidy_observation

tidy_observation(observation)
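
The flattening rule described above can be sketched in a few lines. This is a simplified illustration of the idea only, not tidy_observation itself (which also decodes valueAttachment). The function name flatten is hypothetical:

```python
def flatten(nested: dict, prefix: str = "") -> dict:
    """Flatten nested dictionaries, joining keys with underscores."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # recurse into nested dictionaries, carrying the joined key as prefix
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


flat = flatten({"a": {"b": 5}, "c": 1})
```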

Loading observations into a DataFrame#

list_observations_df takes the same arguments as list_observations, but returns a DataFrame instead of a generator. The observations are passed through tidy_observation, so the keys above are the columns of the DataFrame.

To load the same blood pressure data as a DataFrame:

# get all blood pressure data
full_bp = jh_client.list_observations_df(study_id=bp_study_id, code=Code.BLOOD_PRESSURE)
full_bp.columns

The DataFrame preserves all fields recorded by JHE, which is a lot. You can thin this out by selecting only the columns you need.

Generally the most informative columns are:

  • code - the code identifying the data type for the row (useful when no code filter is given; otherwise it always matches the input code)

  • subject_reference - the Patient/$id identifier (useful when you have retrieved data for multiple patients)

  • effective_time_frame_date_time - the effective time of the Observation in UTC. Also available as effective_time_frame_date_time_local if the local time-of-day at the time and place of measurement is useful.

  • *_value columns - the actual measurements, e.g. systolic_blood_pressure_value, blood_glucose_value, etc.

Now we can select those columns and use groupby("subject_reference") to plot each patient separately, in case we have more than one patient.

bp = full_bp[
    [
        "subject_reference",
        "effective_time_frame_date_time",
        "systolic_blood_pressure_value",
        "diastolic_blood_pressure_value",
    ]
]
bp
bp.groupby("subject_reference").plot(
    x="effective_time_frame_date_time",
    y=["systolic_blood_pressure_value", "diastolic_blood_pressure_value"],
    style="o",
)
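
Besides plotting, the same groupby can produce a per-patient summary table. This small sketch uses a synthetic DataFrame with the column names listed above. The values and patient identifiers are made up, and bp_demo stands in for the real bp DataFrame:

```python
import pandas as pd

# synthetic stand-in for the `bp` DataFrame; values are invented
bp_demo = pd.DataFrame(
    {
        "subject_reference": ["Patient/1", "Patient/1", "Patient/2"],
        "effective_time_frame_date_time": pd.to_datetime(
            ["2024-01-01T08:00:00Z", "2024-01-02T08:00:00Z", "2024-01-01T09:00:00Z"]
        ),
        "systolic_blood_pressure_value": [120, 130, 110],
        "diastolic_blood_pressure_value": [80, 84, 70],
    }
)

# mean systolic and diastolic pressure per patient
summary = bp_demo.groupby("subject_reference")[
    ["systolic_blood_pressure_value", "diastolic_blood_pressure_value"]
].mean()
```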

Continuous Glucose Monitor (CGM) data for a single patient#

We can do the same with CGM data. This time, we use patient_id and code to retrieve CGM data for a single patient.

# get all cgm data
full_cgm = jh_client.list_observations_df(
    patient_id=cgm_patient_id, code=Code.BLOOD_GLUCOSE
)
full_cgm.columns

We can transform the data to have the columns expected by cgmquantify and plot it:

import cgmquantify

cgm = full_cgm.loc[:, ["effective_time_frame_date_time_local", "blood_glucose_value"]]
# define columns cgmquantify expects
cgm["Time"] = cgm.effective_time_frame_date_time_local
cgm["Glucose"] = cgm.blood_glucose_value
cgm["Day"] = cgm["Time"].dt.date
cgmquantify.plotglucosebounds(cgm)