{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Getting started with JupyterHealthClient\n", "\n", "First, you'll want to create a `JupyterHealthClient`.\n", "In a managed deployment, credentials are typically loaded from the `$JHE_TOKEN` and `$JHE_URL` environment variables." ] }, { "cell_type": "code", "execution_count": null, "id": "1", "metadata": {}, "outputs": [], "source": [ "from jupyterhealth_client import Code, JupyterHealthClient\n", "\n", "# use anonymize=True to allow output in documentation\n", "jh_client = JupyterHealthClient(anonymize=True)\n", "# or jh_client = JupyterHealthClient(url=url, token=token)" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ "## Retrieving information\n", "\n", "### Getting the current user\n", "\n", "First, we can see who we are logged in as:" ] }, { "cell_type": "code", "execution_count": null, "id": "3", "metadata": {}, "outputs": [], "source": [ "jh_client.get_user()" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "### Getting study information" ] }, { "cell_type": "markdown", "id": "5", "metadata": {}, "source": [ "We can list all the studies we currently have access to,\n", "including the organization they are associated with.\n", "\n", "`study_id` will be useful for retrieving observations later."
] }, { "cell_type": "code", "execution_count": null, "id": "6", "metadata": {}, "outputs": [], "source": [ "print(\"All my studies:\")\n", "for study in jh_client.list_studies():\n", "    print(f\" - [{study['id']}] {study['name']} org:{study['organization']['name']}\")" ] }, { "cell_type": "markdown", "id": "7", "metadata": {}, "source": [ "And we can get a single study by id:" ] }, { "cell_type": "code", "execution_count": null, "id": "8", "metadata": {}, "outputs": [], "source": [ "jh_client.get_study(study[\"id\"])" ] }, { "cell_type": "markdown", "id": "9", "metadata": {}, "source": [ "### Getting patient information\n", "\n", "We can list patients we have access to with `list_patients()`,\n", "and see which studies they have shared data with using `get_patient_consents`.\n", "\n", "`list` endpoints all return _generators_ and should handle pagination automatically when there are a lot of results." ] }, { "cell_type": "code", "execution_count": null, "id": "10", "metadata": {}, "outputs": [], "source": [ "# show all the patients with study data I have access to:\n", "print(\"Patients with data I have access to:\")\n", "\n", "for patient in jh_client.list_patients():\n", "    consents = jh_client.get_patient_consents(patient[\"id\"])\n", "    if not consents[\"studies\"] and not consents[\"studiesPendingConsent\"]:\n", "        continue\n", "    print(\n", "        f\"[{patient['id']}] {patient['nameFamily']}, {patient['nameGiven']} ({patient['telecomEmail']})\"\n", "    )\n", "    for study in consents[\"studies\"]:\n", "        for scope in study[\"scopeConsents\"]:\n", "            if scope[\"consented\"]:\n", "                # remember which patients have which data for later in the demo\n", "                if scope[\"code\"][\"codingCode\"] == Code.BLOOD_GLUCOSE.value:\n", "                    cgm_patient_id = patient[\"id\"]\n", "                    cgm_study_id = study[\"id\"]\n", "                if scope[\"code\"][\"codingCode\"] == Code.BLOOD_PRESSURE.value:\n", "                    bp_patient_id = patient[\"id\"]\n", "                    bp_study_id = study[\"id\"]\n", "                print(f\" - [{study['id']}] {study['name']} ({scope['code']['text']})\")\n", "    for study in consents[\"studiesPendingConsent\"]:\n", "        print(f\" - (not consented) [{study['id']}] {study['name']}\")" ] }, { "cell_type": "markdown", "id": "11", "metadata": {}, "source": [ "## Retrieving Observations\n", "\n", "`list_observations_df` retrieves all observations into a pandas DataFrame.\n", "You can filter by:\n", "\n", "- `study_id` - fetch data authorized to a single study\n", "- `patient_id` - fetch data for a single patient\n", "- `code` - a `Code` filter to select only a single measurement type (e.g. `Code.BLOOD_PRESSURE`)\n", "\n", "At least one of `study_id` or `patient_id` must be specified.\n", "`code` is always optional.\n", "\n", "To get all blood pressure data for a single study:" ] }, { "cell_type": "code", "execution_count": null, "id": "12", "metadata": {}, "outputs": [], "source": [ "bp_iter = jh_client.list_observations(study_id=bp_study_id, code=Code.BLOOD_PRESSURE)\n", "bp_iter" ] }, { "cell_type": "code", "execution_count": null, "id": "13", "metadata": {}, "outputs": [], "source": [ "observation = next(iter(bp_iter))\n", "observation" ] }, { "cell_type": "markdown", "id": "14", "metadata": {}, "source": [ "The interesting data is in `valueAttachment`, which is a base64-encoded JSON blob. 
We can extract it:" ] }, { "cell_type": "code", "execution_count": null, "id": "15", "metadata": {}, "outputs": [], "source": [ "import base64\n", "import json\n", "\n", "json.loads(base64.decodebytes(observation[\"valueAttachment\"][\"data\"].encode()).decode())" ] }, { "cell_type": "markdown", "id": "16", "metadata": {}, "source": [ "Or we can use `tidy_observation` to turn the nested structure of an Observation into one more suitable for DataFrames.\n", "\n", "`tidy_observation` takes nested fields and turns them into a single flat dictionary, so\n", "\n", "```python\n", "{\"a\": {\"b\": 5}}\n", "```\n", "\n", "becomes\n", "\n", "```python\n", "{\"a_b\": 5}\n", "```\n", "\n", "`tidy_observation` also understands the structure of the `valueAttachment`, so it handles the base64/json bit, too:" ] }, { "cell_type": "code", "execution_count": null, "id": "17", "metadata": {}, "outputs": [], "source": [ "from jupyterhealth_client import tidy_observation\n", "\n", "tidy_observation(observation)" ] }, { "cell_type": "markdown", "id": "18", "metadata": {}, "source": [ "### Loading observations into a DataFrame\n", "\n", "`list_observations_df` takes the same arguments as `list_observations`, but returns a DataFrame instead of a generator.\n", "The observations are passed through `tidy_observation`, so the keys above are the columns of the DataFrame.\n", "\n", "The same data:" ] }, { "cell_type": "code", "execution_count": null, "id": "19", "metadata": {}, "outputs": [], "source": [ "# get all blood pressure data\n", "full_bp = jh_client.list_observations_df(study_id=bp_study_id, code=Code.BLOOD_PRESSURE)\n", "full_bp.columns" ] }, { "cell_type": "markdown", "id": "20", "metadata": {}, "source": [ "The data frame preserves all fields recorded by JHE, which is a lot.\n", "You can thin this out by selecting columns to make things more manageable.\n", "\n", "Generally the most informative columns are:\n", "\n", "- `code` - the code identifying the data type for the row (if 
`code` isn't filtered; always matches the input `code`, if given)\n", "- `subject_reference` - the `Patient/$id` identifier (useful when you have retrieved data for multiple patients)\n", "- `effective_time_frame_date_time` - the effective time of the Observation in UTC. Also available as `effective_time_frame_date_time_local` if the local time-of-day at the time and place of measurement is useful.\n", "- `*_value` columns - the actual measurements, e.g. `systolic_blood_pressure_value`, `blood_glucose_value`, etc.\n", "\n", "Now we can use that and `groupby(\"subject_reference\")` in case we have more than one patient." ] }, { "cell_type": "code", "execution_count": null, "id": "21", "metadata": {}, "outputs": [], "source": [ "bp = full_bp[\n", "    [\n", "        \"subject_reference\",\n", "        \"effective_time_frame_date_time\",\n", "        \"systolic_blood_pressure_value\",\n", "        \"diastolic_blood_pressure_value\",\n", "    ]\n", "]\n", "bp" ] }, { "cell_type": "code", "execution_count": null, "id": "22", "metadata": {}, "outputs": [], "source": [ "bp.groupby(\"subject_reference\").plot(\n", "    x=\"effective_time_frame_date_time\",\n", "    y=[\"systolic_blood_pressure_value\", \"diastolic_blood_pressure_value\"],\n", "    style=\"o\",\n", ")" ] }, { "cell_type": "markdown", "id": "23", "metadata": {}, "source": [ "### Continuous Glucose Monitor (CGM) data for a single patient\n", "\n", "We can do the same with CGM data.\n", "This time, we use `patient_id` and `code` to retrieve CGM data for a single patient."
] }, { "cell_type": "code", "execution_count": null, "id": "24", "metadata": {}, "outputs": [], "source": [ "# get all cgm data\n", "full_cgm = jh_client.list_observations_df(\n", "    patient_id=cgm_patient_id, code=Code.BLOOD_GLUCOSE\n", ")\n", "full_cgm.columns" ] }, { "cell_type": "markdown", "id": "25", "metadata": {}, "source": [ "We can transform the data to have the columns expected by `cgmquantify` and plot it:" ] }, { "cell_type": "code", "execution_count": null, "id": "26", "metadata": {}, "outputs": [], "source": [ "import cgmquantify\n", "\n", "cgm = full_cgm.loc[:, [\"effective_time_frame_date_time_local\", \"blood_glucose_value\"]]\n", "# define columns cgmquantify expects\n", "cgm[\"Time\"] = cgm.effective_time_frame_date_time_local\n", "cgm[\"Glucose\"] = cgm.blood_glucose_value\n", "cgm[\"Day\"] = cgm[\"Time\"].dt.date\n", "cgmquantify.plotglucosebounds(cgm)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.10" } }, "nbformat": 4, "nbformat_minor": 5 }