- Overview
- Prerequisites
- Steps
- Project Structure
- Apple Health
- Number Crunching
- Sleep Log Data
- Plotting
- The Result
- Future Plans
Overview
I’ve been sleeping poorly, so I started keeping a log of how well I feel I’ve slept. It’s nothing extraordinary — basically just a spreadsheet with the day and a totally non-scientific and subjective 1-5 rating of how well I feel I’ve slept. Better to successfully do something imperfect than get hung up on perfection and never do it!
I also wear an Apple Watch with sleep tracking most nights, so I thought it would be cool to join that data with the export format Apple Health provides and graph it all. I used polars for data crunching and matplotlib for graphing.
Prerequisites
This blog post has instructions near the top on how to export your Apple Health data. You basically get a giant `export.xml` file. The data in this file is specific to you, but the schema of this file is the same for everyone.
This code also assumes you have an external `*.csv` file, representing a "sleep evaluation log", with at least two columns:

- "Day After" (which for me is a string column with dates that look like `July 23, 2025`)
- "Quality" (which for me is a string column with the options `Terrible (1/5)`, `Bad (2/5)`, `Okay (3/5)`, `Good (4/5)`, and `Great (5/5)`)
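For reference, a couple of hypothetical rows in that shape (the values are made up; the column names are what the join code later expects):

```csv
Day After,Quality
"July 23, 2025","Good (4/5)"
"July 24, 2025","Terrible (1/5)"
```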
Take these two files and put them somewhere together.
Steps
Project Structure
Here’s a `pyproject.toml` file with the necessary dependencies —
```toml
[project]
name = "apple-health-export"
version = "0.0.0"
requires-python = ">=3.9"
dependencies = [
    "cashews[diskcache]>=7.4.1",
    "matplotlib>=3.9.4",
    "pandas>=2.3.1",
    "pip>=25.1.1",
    "polars>=1.31.0",
    "pyarrow>=21.0.0",
    # jupyter dependencies, if you're doing this through a notebook
    "ipykernel>=6.30.0",
    "ipython>=8.18.1",
]
```
I recommend installing these with uv (i.e. install `uv` and then do a `uv sync`) and then creating a `project.ipynb` (aka Jupyter Notebook) in the same directory.
Apple Health
We’ll first ingest the `export.xml` file and convert it into a polars DataFrame. We use the `cashews` library as a caching layer because the actual data load here is pretty performance-intensive (takes ~20s for me) and we don’t want to hit this overhead every time we reload the notebook.
```python
import xml.etree.ElementTree as ET

import polars as pl
from cashews import cache

# cashews needs a backend configured before @cache is used; since the
# pyproject pulls in cashews[diskcache], a disk-backed cache fits here
cache.setup("disk://")


@cache(ttl=60 * 60 * 24)
async def _get_data():
    """Parse the XML file and return a list of dictionaries."""
    tree = ET.parse("data/health.xml")
    root = tree.getroot()
    return [x.attrib for x in root.iter("Record")]


record_list = await _get_data()
record_list[:1]
```
This gives us a list of records that look like this —
```python
[{'type': 'HKQuantityTypeIdentifierHeight',
  'sourceName': 'iPhone SE (2021)',
  'sourceVersion': '16.5',
  'unit': 'ft',
  'creationDate': '2023-07-15 10:07:55 -0500',
  'startDate': '2023-07-15 10:07:55 -0500',
  'endDate': '2023-07-15 10:07:55 -0500',
  'value': '5.83333'}]
```
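One thing worth calling out: the bare `await` on `_get_data()` works because Jupyter cells run inside an asyncio event loop. If you’re running this as a plain script instead, you’d need to wrap the call in `asyncio.run(...)`.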
For consistency, we’ll remap all our columns to `snake_case`, and drop the attributes we have no use for in order to reduce noise.
```python
import polars as pl

df = pl.DataFrame(record_list)
df = df.rename(
    {
        "sourceName": "source_name",
        "sourceVersion": "source_version",
        "creationDate": "creation_date",
        "startDate": "start_dt",
        "endDate": "end_dt",
    },
).drop("source_version", "unit")
df
```
Number Crunching
We’ll first do some early data cleaning and filter to just data originating from a watch —
```python
df = (
    df.with_columns(
        pl.col("type")
        .str.replace_all("HKQuantityTypeIdentifier", "")
        .str.replace_all("HKCategoryTypeIdentifier", ""),
    )
    .filter(
        # only stuff originating from one of my apple watches;
        # later code is meant to function well whether or not it has the detailed information
        (pl.col("type") == "SleepAnalysis")
        & (pl.col("source_name").str.contains("Watch"))
    )
    .drop("type", "source_name")  # no more need for these, drop to reduce noise
    .with_columns(
        pl.col("creation_date")
        .str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S %z")
        .dt.date()
        .alias("creation_date"),
        pl.col("start_dt").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S %z"),
        pl.col("end_dt").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S %z"),
        # many metrics logged by apple health are a count, not an actual
        # metric, so defaulting to 1 makes aggregation easier
        pl.col("value").cast(pl.Float64, strict=False).fill_null(1.0),
    )
)
df
```
Then, because each individual sleep “record” isn’t a full night’s sleep, but might instead be just one little “segment” of the night, we do some aggregation to form the fields we’re interested in —
```python
# the records we get are point-in-time snapshots, so we need to basically
# group by and pick the BOUNDS for each, in a way
df = (
    df.group_by("creation_date")
    .agg(
        pl.col("start_dt").min().alias("start_dt"),
        pl.col("end_dt").max().alias("end_dt"),
        pl.col("creation_date").count().alias("data_points"),
    )
    .with_columns((pl.col("end_dt") - pl.col("start_dt")).alias("duration"))
)
df
```
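As an optional sanity check (my own addition, not required for the pipeline), sorting by `data_points` surfaces nights where only a segment or two got logged, which usually means the watch wasn’t actually worn:

```python
# nights with very few segments usually mean the watch was off the wrist
df.sort("data_points").head()
```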
I only had a few months of “sleep log” data, so I felt it was best to truncate to the last ~90 days.
```python
import datetime

df = df.filter(
    pl.col("creation_date") >= (datetime.datetime.now() - datetime.timedelta(days=90))
)
df
```
Sleep Log Data
Now, we’ll join in the sleep log data and do a bit of data cleaning, like remapping the `quality` column I mentioned to the actual numerator of the “score” field I was keeping (e.g. `Good (4/5)` becomes `4`).
```python
import re


def extract_quality_numerator(q):
    """Pull the numerator out of a rating string like "Good (4/5)"."""
    if isinstance(q, str):
        match = re.search(r"(\d+)/\d+", q)
        if match:
            return int(match.group(1))
    return None
```
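A quick spot check of the helper against the log’s rating format (hypothetical inputs, just to demonstrate the behavior):

```python
assert extract_quality_numerator("Good (4/5)") == 4
assert extract_quality_numerator("Great (5/5)") == 5
assert extract_quality_numerator(None) is None  # nulls pass through untouched
```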
```python
# we need creation_date to just be a date so we can productively merge it in
df = df.with_columns(pl.col("creation_date").dt.date().alias("creation_date"))

df = (
    df.join(
        pl.read_csv("data/notion.csv")
        .select("Day After", "Quality")
        .rename({"Day After": "day_after", "Quality": "quality"})
        .with_columns(
            pl.col("day_after").str.strptime(pl.Date, "%B %d, %Y").alias("day_after")
        ),
        left_on="creation_date",
        right_on="day_after",
        how="left",
    )
    .filter(pl.col("quality").is_not_null())
    .with_columns(
        pl.col("quality")
        .map_elements(extract_quality_numerator, return_dtype=pl.Int32)
        .alias("quality"),
    )
    .with_columns((pl.col("duration").dt.total_minutes()).alias("duration"))
    .filter(
        # >= 14hr or <= 2hr == obvious computation mistakes;
        # TODO: properly fix these, I think I have boundary issues
        (pl.col("duration") < (14 * 60)) & (pl.col("duration") > (2 * 60))
    )
)
df
```
Plotting
Most tutorials use matplotlib for this and I wanted to as well, but polars doesn’t have the simplest plotting experience, so I decided to convert over to pandas at this point.
```python
df_pandas = df.to_pandas()
df_pandas.set_index("creation_date", inplace=True)
df_pandas.sort_index(inplace=True)
df_pandas
```
At this point, the most productive plot we have is `creation_date` on the x-axis and two different y-axes for `duration` and `quality`, so that we can cross-reference them and try to identify trends.
```python
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()

# time asleep on the left-hand axis
color = "tab:red"
ax1.set_xlabel("Date")
ax1.set_ylabel("Time Asleep (min)", color=color)
ax1.plot(
    df_pandas.index,
    df_pandas["duration"],
    color=color,
    marker="o",
    label="Time Asleep",
)
ax1.tick_params(axis="y", labelcolor=color)

# quality score on a twinned right-hand axis
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel("Quality Score", color=color)
ax2.plot(
    df_pandas.index,
    df_pandas["quality"],
    color=color,
    marker="x",
    label="Quality Score",
)
ax2.set_ylim(0, 5)
ax2.tick_params(axis="y", labelcolor=color)

plt.setp(ax1.get_xticklabels(), rotation=30, horizontalalignment="right")
fig.tight_layout()
plt.show()
```
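If you’d rather keep the chart as a file than as inline notebook output, matplotlib’s `savefig` works on the same figure (the filename here is just an example):

```python
fig.savefig("sleep.png", dpi=150, bbox_inches="tight")
```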
The Result
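*(Figure: nightly Time Asleep in red on the left axis, plotted against the 1-5 Quality Score in blue on the right axis, over roughly the last 90 days.)*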
Future Plans
This definitely isn’t perfect (I think there are still some minor issues here around things like how I want to consider a sleep window starting at, say, 10pm on a given day part of the “next” day), but it’s pretty solid and a nice place to increment from.
My “sleep log” has quite a few other variables I’ve logged, so the plan at some point is probably to train a rudimentary model off of these to identify top factors, and go from there. My sleep log dataset is probably still too small to form many conclusions, so it’ll be a bit.
I also recently got a watch that more finely tracks stuff like REM cycles and “types” of sleep, so incorporating that into the analysis would be cool.