- Overview
- Prerequisites
- Steps
- Project Structure
- Apple Health
- Number Crunching
- Sleep Log Data
- Plotting
- The Result
- Future Plans
Overview
I’ve been sleeping poorly, so I started keeping a log of how well I feel I’ve slept. It’s nothing extraordinary — basically just a spreadsheet with the day and a totally non-scientific and subjective 1-5 rating of how well I feel I’ve slept. Better to successfully do something imperfect than get hung up on perfection and never do it!
I also wear an Apple Watch with sleep tracking most nights, so I thought it would be cool to join that data with the export format Apple Health provides and graph it all. I used polars for data crunching and matplotlib for graphing.
Prerequisites
This blog post has instructions near the top on how to export your Apple Health data. You basically get a giant `export.xml` file. The data in this file is specific to you, but the schema of this file is the same for everyone.
This code also assumes you have an external `*.csv` file, representing a "sleep evaluation log", with at least two columns:

- "Day After" (which for me is a string column with dates that look like `July 23, 2025`)
- "Quality" (which for me is a string column with the options `Terrible (1/5)`, `Bad (2/5)`, `Okay (3/5)`, `Good (4/5)`, and `Great (5/5)`)
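For reference, a couple of hypothetical rows in that shape (the values are made up; the column names are what the join code later expects):

```csv
Day After,Quality
"July 23, 2025","Good (4/5)"
"July 24, 2025","Terrible (1/5)"
```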
Take these two files and put them somewhere together.
Steps
Project Structure
Here’s a `pyproject.toml` file with the necessary dependencies —
```toml
[project]
name = "apple-health-export"
version = "0.0.0"
requires-python = ">=3.9"
dependencies = [
    "cashews[diskcache]>=7.4.1",
    "matplotlib>=3.9.4",
    "pandas>=2.3.1",
    "pip>=25.1.1",
    "polars>=1.31.0",
    "pyarrow>=21.0.0",
    # jupyter dependencies, if you're doing this through a notebook
    "ipykernel>=6.30.0",
    "ipython>=8.18.1",
]
```
I recommend installing these with uv (i.e. install `uv` and then do a `uv sync`) and then creating a `project.ipynb` (aka Jupyter Notebook) in the same directory.
Apple Health
We’ll first ingest the `export.xml` file and convert it into a polars DataFrame. We use the `cashews` library as a caching layer because the actual data load here is pretty performance-intensive (takes ~20s for me) and we don’t want to hit this overhead every time we reload the notebook.
```python
import xml.etree.ElementTree as ET

import polars as pl
from cashews import cache

# cashews needs a backend configured before @cache is used; since the
# pyproject pulls in cashews[diskcache], a disk-backed cache fits here
cache.setup("disk://")


@cache(ttl=60 * 60 * 24)
async def _get_data():
    """Parse the XML file and return a list of dictionaries."""
    tree = ET.parse("data/health.xml")
    root = tree.getroot()
    return [x.attrib for x in root.iter("Record")]


record_list = await _get_data()
record_list[:1]
```
This gives us a list of records that look like this —
```python
[{'type': 'HKQuantityTypeIdentifierHeight',
  'sourceName': 'iPhone SE (2021)',
  'sourceVersion': '16.5',
  'unit': 'ft',
  'creationDate': '2023-07-15 10:07:55 -0500',
  'startDate': '2023-07-15 10:07:55 -0500',
  'endDate': '2023-07-15 10:07:55 -0500',
  'value': '5.83333'}]
```
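One thing worth calling out: the bare `await` on `_get_data()` works because Jupyter cells run inside an asyncio event loop. If you’re running this as a plain script instead, you’d need to wrap the call in `asyncio.run(...)`.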
For consistency, we’ll remap all our columns to `snake_case`, and drop the attributes we have no use for in order to reduce noise.
```python
import polars as pl

df = pl.DataFrame(record_list)
df = df.rename(
    {
        "sourceName": "source_name",
        "sourceVersion": "source_version",
        "creationDate": "creation_date",
        "startDate": "start_dt",
        "endDate": "end_dt",
    },
).drop("source_version", "unit")
df
```
Number Crunching
We’ll first do some early data cleaning and filter to just data originating from a watch —
```python
df = (
    df.with_columns(
        pl.col("type")
        .str.replace_all("HKQuantityTypeIdentifier", "")
        .str.replace_all("HKCategoryTypeIdentifier", ""),
    )
    .filter(
        # only stuff originating from one of my apple watches;
        # later code is meant to function well whether or not it has the detailed information
        (pl.col("type") == "SleepAnalysis")
        & (pl.col("source_name").str.contains("Watch"))
    )
    .drop("type", "source_name")  # no more need for these, drop to reduce noise
    .with_columns(
        pl.col("creation_date")
        .str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S %z")
        .dt.date()
        .alias("creation_date"),
        pl.col("start_dt").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S %z"),
        pl.col("end_dt").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S %z"),
        # many metrics logged by apple health are a count, not an actual
        # metric, so defaulting to 1 makes aggregation easier
        pl.col("value").cast(pl.Float64, strict=False).fill_null(1.0),
    )
)
df
```
Then, because each individual sleep “record” isn’t a full night’s sleep, but might instead be just one little “segment” of the night, we do some aggregation to form the fields we’re interested in —
```python
# the records we get are point-in-time snapshots, so we need to basically
# group by and pick the BOUNDS for each, in a way
df = (
    df.group_by("creation_date")
    .agg(
        pl.col("start_dt").min().alias("start_dt"),
        pl.col("end_dt").max().alias("end_dt"),
        pl.col("creation_date").count().alias("data_points"),
    )
    .with_columns((pl.col("end_dt") - pl.col("start_dt")).alias("duration"))
)
df
```
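As an optional sanity check (my own addition, not required for the pipeline), sorting by `data_points` surfaces nights where only a segment or two got logged, which usually means the watch wasn’t actually worn:

```python
# nights with very few segments usually mean the watch was off the wrist
df.sort("data_points").head()
```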
I only had a few months of “sleep log” data, so I felt it was best to truncate to the last ~90 days.
```python
import datetime

df = df.filter(
    pl.col("creation_date") >= (datetime.datetime.now() - datetime.timedelta(days=90))
)
df
```
Sleep Log Data
Now, we’ll join in the sleep log data and do a bit of data cleaning, like remapping the `quality` column I mentioned to the actual numerator of the “score” field I was keeping (e.g. `Good (4/5)` becomes `4`).
```python
import re


def extract_quality_numerator(q):
    """Pull the numerator out of a rating string like "Good (4/5)"."""
    if isinstance(q, str):
        match = re.search(r"(\d+)/\d+", q)
        if match:
            return int(match.group(1))
    return None
```
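A quick spot check of the helper against the log’s rating format (hypothetical inputs, just to demonstrate the behavior):

```python
assert extract_quality_numerator("Good (4/5)") == 4
assert extract_quality_numerator("Great (5/5)") == 5
assert extract_quality_numerator(None) is None  # nulls pass through untouched
```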
```python
# we need creation_date to just be a date so we can productively merge it in
df = df.with_columns(pl.col("creation_date").dt.date().alias("creation_date"))

df = (
    df.join(
        pl.read_csv("data/notion.csv")
        .select("Day After", "Quality")
        .rename({"Day After": "day_after", "Quality": "quality"})
        .with_columns(
            pl.col("day_after").str.strptime(pl.Date, "%B %d, %Y").alias("day_after")
        ),
        left_on="creation_date",
        right_on="day_after",
        how="left",
    )
    .filter(pl.col("quality").is_not_null())
    .with_columns(
        pl.col("quality")
        .map_elements(extract_quality_numerator, return_dtype=pl.Int32)
        .alias("quality"),
    )
    .with_columns((pl.col("duration").dt.total_minutes()).alias("duration"))
    .filter(
        # >= 14hr or <= 2hr == obvious computation mistakes;
        # TODO: properly fix these, I think I have boundary issues
        (pl.col("duration") < (14 * 60)) & (pl.col("duration") > (2 * 60))
    )
)
df
```
Plotting
Most tutorials use matplotlib for this and I wanted to as well, but polars doesn’t have the simplest plotting experience, so I decided to convert over to pandas at this point.
```python
df_pandas = df.to_pandas()
df_pandas.set_index("creation_date", inplace=True)
df_pandas.sort_index(inplace=True)
df_pandas
```
At this point, the most productive plot we have is `creation_date` on the x-axis and two different y-axes for `duration` and `quality`, so that we can cross-reference them and try to identify trends.
```python
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()

# time asleep on the left-hand axis
color = "tab:red"
ax1.set_xlabel("Date")
ax1.set_ylabel("Time Asleep (min)", color=color)
ax1.plot(
    df_pandas.index,
    df_pandas["duration"],
    color=color,
    marker="o",
    label="Time Asleep",
)
ax1.tick_params(axis="y", labelcolor=color)

# quality score on a twinned right-hand axis
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel("Quality Score", color=color)
ax2.plot(
    df_pandas.index,
    df_pandas["quality"],
    color=color,
    marker="x",
    label="Quality Score",
)
ax2.set_ylim(0, 5)
ax2.tick_params(axis="y", labelcolor=color)

plt.setp(ax1.get_xticklabels(), rotation=30, horizontalalignment="right")
fig.tight_layout()
plt.show()
```
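If you’d rather keep the chart as a file than as inline notebook output, matplotlib’s `savefig` works on the same figure (the filename here is just an example):

```python
fig.savefig("sleep.png", dpi=150, bbox_inches="tight")
```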
The Result
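*(Figure: nightly Time Asleep in red on the left axis, plotted against the 1-5 Quality Score in blue on the right axis, over roughly the last 90 days.)*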
Future Plans
This definitely isn’t perfect (I think there are still some minor issues here around things like how I want to consider a sleep window starting at, say, 10pm on a given day part of the “next” day), but it’s pretty solid and a nice place to increment from.
My “sleep log” has quite a few other variables I’ve logged, so the plan at some point is probably to train a rudimentary model off of these to identify top factors, and go from there. My sleep log dataset is probably still too small to form many conclusions, so it’ll be a bit.
I also recently got a watch that more finely tracks stuff like REM cycles and “types” of sleep, so incorporating that into the analysis would be cool.