{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Getting started with JupyterHealthClient\n",
    "\n",
    "First, you'll want to create a `JupyterHealthClient`.\n",
    "In a managed deployment, credentials are typically loaded from the `$JHE_TOKEN` and `$JHE_URL` environment variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "from jupyterhealth_client import Code, JupyterHealthClient\n",
    "\n",
    "# use anonymize=True to allow output in documentation\n",
    "jh_client = JupyterHealthClient(anonymize=True)\n",
    "# or jh_client = JupyterHealthClient(url=url, token=token)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "## Retrieving information\n",
    "\n",
    "### Getting the current user\n",
    "\n",
    "First, we can see who we are logged in as:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "jh_client.get_user()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "### Getting study information"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "We can list all the studies I currently have access to,\n",
    "including the organization they are associated with.\n",
    "\n",
    "`study_id` will be useful for retrieving observations .ater."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"All my studies:\")\n",
    "for study in jh_client.list_studies():\n",
    "    print(f\"  - [{study['id']}] {study['name']} org:{study['organization']['name']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7",
   "metadata": {},
   "source": [
    "And we can get a single study by id:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8",
   "metadata": {},
   "outputs": [],
   "source": [
    "jh_client.get_study(study[\"id\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "### Getting patient information\n",
    "\n",
    "We can list patients we have access to with `list_patients()`,\n",
    "and see which studies they have shared data with using `get_patient_consents`.\n",
    "\n",
    "`list` endpoints all return _generators_ and should handle pagination automatically when there are a lot of results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "# show all the patients with study data I have access to:\n",
    "print(\"Patients with data I have access to:\")\n",
    "\n",
    "for patient in jh_client.list_patients():\n",
    "    consents = jh_client.get_patient_consents(patient[\"id\"])\n",
    "    if not consents[\"studies\"] and not consents[\"studiesPendingConsent\"]:\n",
    "        continue\n",
    "    print(\n",
    "        f\"[{patient['id']}] {patient['nameFamily']}, {patient['nameGiven']} ({patient['telecomEmail']})\"\n",
    "    )\n",
    "    for study in consents[\"studies\"]:\n",
    "        for scope in study[\"scopeConsents\"]:\n",
    "            if scope[\"consented\"]:\n",
    "                # remember which patients have which data for later in the demo\n",
    "                if scope[\"code\"][\"codingCode\"] == Code.BLOOD_GLUCOSE.value:\n",
    "                    cgm_patient_id = patient[\"id\"]\n",
    "                    cgm_study_id = study[\"id\"]\n",
    "                if scope[\"code\"][\"codingCode\"] == Code.BLOOD_PRESSURE.value:\n",
    "                    bp_patient_id = patient[\"id\"]\n",
    "                    bp_study_id = study[\"id\"]\n",
    "                print(f\"  - [{study['id']}] {study['name']} ({scope['code']['text']})\")\n",
    "    for study in consents[\"studiesPendingConsent\"]:\n",
    "        print(f\"  - (not consented) [{study['id']}] {study['name']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11",
   "metadata": {},
   "source": [
    "## Retrieving Observations\n",
    "\n",
    "`list_observations_df` retrieves all observations into a pandas \n",
    "You can filter by:\n",
    "\n",
    "- `study_id` - fetch data authorized to a single study\n",
    "- `patient_id` - fetch data for a single patient\n",
    "- `code` - a `Code` filter to select only a single measurement type (e.g. `Code.BLOOD_PRESSURE`)\n",
    "\n",
    "At least one of `study_id` or `patient_id` must be specified.\n",
    "`code` is always optional.\n",
    "\n",
    "To get all blood pressure data for a single study:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": [
    "bp_iter = jh_client.list_observations(study_id=bp_study_id, code=Code.BLOOD_PRESSURE)\n",
    "bp_iter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13",
   "metadata": {},
   "outputs": [],
   "source": [
    "observation = next(iter(bp_iter))\n",
    "observation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14",
   "metadata": {},
   "source": [
    "The interesting data is in `valueAttachment`, which is a base64-encoded JSON blob. We can extract it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15",
   "metadata": {},
   "outputs": [],
   "source": [
    "import base64\n",
    "import json\n",
    "\n",
    "json.loads(base64.decodebytes(observation[\"valueAttachment\"][\"data\"].encode()).decode())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16",
   "metadata": {},
   "source": [
    "Or we can use `tidy_observation` to turn the nested structure of an Observation into one more suitable for DataFrames.\n",
    "\n",
    "`tidy_observation` takes nested fields and turns them into a single flat dictionary, so\n",
    "\n",
    "```python\n",
    "{\"a\": \"b\": 5}}\n",
    "```\n",
    "\n",
    "becomes\n",
    "\n",
    "```python\n",
    "{\"a_b\": 5}\n",
    "```\n",
    "\n",
    "`tidy_observation` also understands the structure of the `valueAttachment`, so it handles the base64/json bit, too:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17",
   "metadata": {},
   "outputs": [],
   "source": [
    "from jupyterhealth_client import tidy_observation\n",
    "\n",
    "tidy_observation(observation)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18",
   "metadata": {},
   "source": [
    "### Loading observations into a DataFarme\n",
    "\n",
    "`list_observations_df` takes the same arguments as `list_observations`, but returns a DataFrame instead of a generator.\n",
    "The observations are passed through` tidy_observation`, so the keys above are the columns of the DataFrame.\n",
    "\n",
    "The same data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19",
   "metadata": {},
   "outputs": [],
   "source": [
    "# get all blood pressure data\n",
    "full_bp = jh_client.list_observations_df(study_id=bp_study_id, code=Code.BLOOD_PRESSURE)\n",
    "full_bp.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20",
   "metadata": {},
   "source": [
    "The data frame preserves all fields recorded by JHE, which is a lot.\n",
    "You can thin this out by selecting columns to make things more manageable.\n",
    "\n",
    "Generally the most informative columns are:\n",
    "\n",
    "- `code` - the code identifying the data type for the row (if `code` isn't filtered; always matches the input `code`, if given)\n",
    "- `subject_reference` - the `Patient/$id` identifier (useful when you have retrieved data for multiple patients)\n",
    "- `effective_time_frame_date_time` - the effective time of the Observation in UTC. Also available as `effective_time_frame_date_time_local` if the local time-of-day at the time and place of measurement is useful.\n",
    "- `*_value` columns - the actual measurements, e.g. `systolic_blood_pressure_value`, `blood_glucose_value`, etc.\n",
    "\n",
    "Now we can use that and `groupby(\"subject_reference\")` in case we have more than one patient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21",
   "metadata": {},
   "outputs": [],
   "source": [
    "bp = full_bp[\n",
    "    [\n",
    "        \"subject_reference\",\n",
    "        \"effective_time_frame_date_time\",\n",
    "        \"systolic_blood_pressure_value\",\n",
    "        \"diastolic_blood_pressure_value\",\n",
    "    ]\n",
    "]\n",
    "bp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22",
   "metadata": {},
   "outputs": [],
   "source": [
    "bp.groupby(\"subject_reference\").plot(\n",
    "    x=\"effective_time_frame_date_time\",\n",
    "    y=[\"systolic_blood_pressure_value\", \"diastolic_blood_pressure_value\"],\n",
    "    style=\"o\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23",
   "metadata": {},
   "source": [
    "### Continuous Glucose Monitor (CGM) data for a single patient\n",
    "\n",
    "We can do the same with CGM data.\n",
    "This time, we use `patient_id` and `code` to retrieve CGM data for a single patient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24",
   "metadata": {},
   "outputs": [],
   "source": [
    "# get all cgm data\n",
    "full_cgm = jh_client.list_observations_df(\n",
    "    patient_id=cgm_patient_id, code=Code.BLOOD_GLUCOSE\n",
    ")\n",
    "full_cgm.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25",
   "metadata": {},
   "source": [
    "We can transform the data to have the columns expected by `cgmquantify` and plot it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "26",
   "metadata": {},
   "outputs": [],
   "source": [
    "import cgmquantify\n",
    "\n",
    "cgm = full_cgm.loc[:, [\"effective_time_frame_date_time_local\", \"blood_glucose_value\"]]\n",
    "# define columns cgmquantify expects\n",
    "cgm[\"Time\"] = cgm.effective_time_frame_date_time_local\n",
    "cgm[\"Glucose\"] = cgm.blood_glucose_value\n",
    "cgm[\"Day\"] = cgm[\"Time\"].dt.date\n",
    "cgmquantify.plotglucosebounds(cgm)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}