Sleep Guide · Spoke 19

Sleep Trackers (Oura, Whoop, Apple Watch): what they measure and how accurate they are

A ring or a band promises you a sleep score, deep-sleep minutes and a recovery value every night. Which of these values are measured, which are estimated, and how accurate are they compared with polysomnography in the sleep lab? And when does tracking turn from a tool into a stressor?

Shukri Jarmoukli · Physician, Integrative Medicine · ViveCura Berlin
My starting point

People come into my practice with screenshots of their sleep app: "I only got 42 minutes of deep sleep, is that normal?" The honest answer: this value is an algorithmic estimate, not a measurement, and dwelling on it can disturb sleep more than it improves it. Sleep trackers are useful tools when you know what they can and cannot do. Modern devices often capture heart rate and HRV well (de Zambotti 2019). They only estimate the sleep stages, and with limited accuracy against polysomnography (Chinoy 2021). And the pursuit of perfect values has been given a name: orthosomnia (Baron 2017). A sleep tracker can help answer the question "how do I behave?", not the question "how much am I worth?". In this spoke I separate what the devices measure from what they guess, and show how you can use tracking without unsettling yourself.

This spoke is the technical workshop of the sleep cluster. We go through what sensors really measure, how far sleep-stage estimation falls from polysomnography, what HRV during sleep means, when a tracker helps and when it harms (orthosomnia), how Oura, Whoop and Apple Watch differ, what no tracker can do, the KPNI (Clinical Psychoneuroimmunology) lenses on the topic, and three concrete levers for the coming weeks.

The score becomes the nightly judge

It is a recurring pattern, not an isolated case: someone buys a ring or a band to improve their sleep. At first it is exciting. Then curiosity turns into control. Checking the phone becomes the first act in the morning, and sometimes the last at night. A low sleep score colours the whole day, even though the person felt fine before looking at it. This is exactly the phenomenon Baron and colleagues described in 2017 as orthosomnia: people who seek medical advice because of their tracker values, even though their sleep is objectively sufficient. The tracker has hijacked their perception.

The reframe in such cases is not "buy a better device", but "change how you read it": away from the nightly control glance, toward the weekly trend; away from the absolute deep-sleep value, toward the question of what alcohol, late eating or exercise do to resting heart rate and HRV. In some cases the most effective measure is a deliberate tracker break, until sleep belongs to the body again and not to the app.

What a sleep tracker really measures and what it only estimates

The most important sentence up front: a consumer sleep tracker measures some things directly and derives others indirectly. Anyone who knows this boundary reads their data correctly.

Three quantities are usually measured directly. First, movement, via an accelerometer (the classic principle of actigraphy). Second, heart rate, via optical sensors on the skin: in so-called photoplethysmography, green or infrared light captures the fluctuations in blood volume with each pulse. Third, in many devices, skin surface temperature. Heart rate variability (HRV), the variation in the intervals between individual heartbeats, is calculated from the same pulse signal.

By contrast, the division into sleep stages (light sleep, deep sleep, REM), the sleep score and the recovery or readiness values are estimated, that is, derived by an algorithm from these raw data. These derived numbers are models. They can be useful, but they are not direct measurements, and they differ from manufacturer to manufacturer.
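HRV is the one value in the first group that is calculated rather than sensed. The article does not say which HRV metric each manufacturer reports; a widely used time-domain measure is RMSSD, the root mean square of the differences between successive beat-to-beat (RR) intervals. The sketch below assumes clean RR intervals in milliseconds are already available (real devices first have to remove motion artefacts and irregular beats), and the interval values are invented for illustration. It also shows that average heart rate and HRV come from the same intervals.

```python
import math


def mean_heart_rate(rr_intervals_ms: list[float]) -> float:
    """Average heart rate in beats per minute from RR intervals in ms."""
    mean_rr = sum(rr_intervals_ms) / len(rr_intervals_ms)
    return 60_000 / mean_rr  # 60,000 ms per minute


def rmssd(rr_intervals_ms: list[float]) -> float:
    """RMSSD in ms: root mean square of successive RR-interval differences."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Difference between each interval and the one before it
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals (ms) from a calm stretch of sleep
rr = [1000, 1040, 980, 1020, 1000]
print(round(mean_heart_rate(rr), 1))  # 59.5
print(round(rmssd(rr), 1))            # 42.4
```

Two stretches with the same average pulse can have very different beat-to-beat variation, which is why HRV carries information beyond heart rate alone.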

The decisive difference

Polysomnography in the sleep lab measures the sleep stages directly: via brain-wave recordings (EEG), eye movements (EOG) and muscle tension (EMG). A tracker has no electrodes on your head. It infers the probable stage from pulse, HRV and movement. So the rule is: distinguishing sleep from wake works reasonably well, while the stage division is a reasoned estimate, not a measured fact.

Sleep stages: why the estimate falls short against polysomnography

The sleep stages are the point where most misunderstandings arise. A tracker shows you "1 hour 12 minutes of deep sleep" with a precision that suggests a measurement. In reality it is an algorithmic estimate, and its accuracy is limited.

Study · Seven devices against polysomnography [Real-World]

Consumer trackers represent sleep-wake well, stages inconsistently

Real-World Validation · Chinoy and colleagues compared seven consumer sleep trackers directly against polysomnography, the gold standard, in a 2021 study in Sleep, and included a research-grade actigraphy device for comparison. The result in short: the devices represent the sleep-wake state usefully, but their assignment of individual sleep stages is inconsistent, and its reliability varies by device. Put differently, the devices recognise well whether you slept, but much less well which stage you were in. The study is one of the most frequently cited independent comparisons on this topic.

Chinoy ED, Cuellar JA, Huwa KE, et al. Performance of seven consumer sleep-tracking devices compared with polysomnography. Sleep. 2021;44(5):zsaa291. doi:10.1093/sleep/zsaa291 · PMID: 33378539

Study · Oura Ring against polysomnography [Real-World]

Very good heart rate, good sleep detection, weaker stages

Real-World Validation · de Zambotti and colleagues tested the Oura Ring against polysomnography in 2019 in Behavioral Sleep Medicine. Nightly heart rate agreed very well with the ECG. For pure sleep detection (asleep or awake?) the ring reached a high sensitivity of around 96 percent. For the individual stages, agreement was, as expected, weaker: deep sleep tended to be overestimated and light sleep partly underestimated. This pattern is typical of consumer devices and underlines the message: the raw values are strong, while the stage estimation is only a rough guide.

de Zambotti M, Rosas L, Colrain IM, Baker FC. The Sleep of the Ring: Comparison of the OURA Sleep Tracker Against Polysomnography. Behav Sleep Med. 2019;17(2):124-136. doi:10.1080/15402002.2017.1300587 · PMID: 28323455
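To read a figure like "sensitivity of around 96 percent" correctly, it helps to see how such a number is computed epoch by epoch. A minimal sketch, assuming both recordings have already been aligned into 30-second epochs and reduced to two labels ("S" for sleep, "W" for wake); the function name is hypothetical and the hypnogram strings are invented, not data from de Zambotti 2019.

```python
def sleep_wake_agreement(psg: str, device: str) -> tuple[float, float]:
    """Epoch-by-epoch sensitivity and specificity of a device against PSG.

    Each character is one aligned epoch: "S" = sleep, "W" = wake.
    Sensitivity = share of PSG sleep epochs the device also calls sleep.
    Specificity = share of PSG wake epochs the device also calls wake.
    """
    if len(psg) != len(device):
        raise ValueError("both recordings must cover the same epochs")
    pairs = list(zip(psg, device))
    sleep_total = sum(1 for p, _ in pairs if p == "S")
    wake_total = sum(1 for p, _ in pairs if p == "W")
    sleep_hits = sum(1 for p, d in pairs if p == "S" and d == "S")
    wake_hits = sum(1 for p, d in pairs if p == "W" and d == "W")
    sensitivity = sleep_hits / sleep_total if sleep_total else float("nan")
    specificity = wake_hits / wake_total if wake_total else float("nan")
    return sensitivity, specificity


# Invented night of 20 epochs: PSG sees three short wake periods,
# the device misses most of the wake time and calls it sleep.
psg = "WW" + "S" * 9 + "W" + "S" * 6 + "WW"
device = "W" + "S" * 18 + "W"
sens, spec = sleep_wake_agreement(psg, device)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")  # sensitivity 100%, specificity 40%
```

Reporting both numbers matters: a device that labelled every epoch as sleep would reach 100 percent sensitivity but 0 percent specificity, so a high sensitivity alone says little about how well wake periods are detected.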

How good can a pure estimate become? Altini and Kinnunen showed in 2021 in Sensors, using the Oura Ring, that four-stage classification (wake, light sleep, deep sleep, REM) reaches around 79 percent agreement with expert-scored polysomnography when the algorithm combines movement, heart rate and temperature data, whereas movement data alone manage only about 57 percent. The lesson: more sensors and better algorithms improve the estimate considerably, but 79 percent is not 100 percent. So never compare absolute deep-sleep minutes between two devices or with friends; compare only yourself with yourself, over time, with the same device.
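One simple way to read a figure like 79 percent is as epoch-by-epoch agreement: the share of 30-second epochs in which the device assigns the same stage as the expert-scored polysomnogram. The sketch below assumes that definition; studies often also report chance-corrected statistics such as Cohen's kappa, which it leaves out. Function names and stage sequences are hypothetical.

```python
from collections import Counter

EPOCH_SECONDS = 30  # standard scoring epoch length in sleep labs


def epoch_agreement(psg: str, device: str) -> float:
    """Share of epochs where the device's stage label matches PSG."""
    if len(psg) != len(device):
        raise ValueError("both hypnograms must cover the same epochs")
    matches = sum(1 for p, d in zip(psg, device) if p == d)
    return matches / len(psg)


def stage_minutes(hypnogram: str) -> dict[str, float]:
    """Minutes spent in each stage label."""
    return {stage: n * EPOCH_SECONDS / 60 for stage, n in Counter(hypnogram).items()}


# Invented 20-epoch excerpt; W = wake, L = light, D = deep, R = REM
psg = "WW" + "LLLL" + "DDDD" + "LL" + "RRRR" + "LLLL"
device = "WL" + "LLDD" + "DDDD" + "LL" + "RRLL" + "LLLL"
print(f"agreement {epoch_agreement(psg, device):.0%}")  # agreement 75%
print(stage_minutes(psg)["D"], stage_minutes(device)["D"])  # 2.0 3.0
```

In this invented excerpt three out of four epochs agree, yet the device reports 50 percent more deep sleep than the lab. Decent overall agreement and a misleading deep-sleep figure can coexist, which is why absolute deep-sleep minutes make a poor basis for comparison.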

HRV and heart rate: the truly robust values

If the stages are the weak point, heart rate and HRV are the strength of modern trackers. This is precisely where the real benefit lies, one that is often underestimated while everyone stares at deep sleep.

Study · Autonomic nervous system during sleep [Review]

HRV reflects autonomic control across the sleep stages

Review · Tobaldini and colleagues described in 2013 in Frontiers in Physiology how autonomic control of the heart changes across sleep and its stages. In non-REM sleep, and most clearly in deep sleep, the parasympathetic, recovery-promoting branch tends to dominate, recognisable by higher HRV and a lower heart rate. In REM sleep and under strain, the balance shifts toward the sympathetic branch. This is why nightly HRV offers a window onto recovery and strain. An HRV that declines over weeks, or a rising resting heart rate, may point to stress, alcohol, an incipient infection or overload.

Tobaldini E, Nobili L, Strada S, et al. Heart rate variability in normal and pathological sleep. Front Physiol. 2013;4:294. doi:10.3389/fphys.2013.00294 · PMID: 24137133

In practice this means: HRV is the value best suited as a trend marker for recovery, because it is derived from the heart rate, which trackers measure comparatively accurately. The emphasis on trend and on the individual matters. Absolute HRV depends strongly on age, genetics, time of measurement and measurement method. A 25-year-old often has a considerably higher HRV than a 55-year-old, and that says nothing about either person's sleep quality. The only meaningful question is: how does my value today compare with my own average over the past weeks?
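The question "how does my value today compare with my own average?" can be made concrete in a few lines. A minimal sketch, assuming your app lets you export nightly HRV values; the numbers are invented, the function name is hypothetical, and expressing the deviation both as a percentage and in standard deviations of your own history is an illustrative choice, not a manufacturer's method.

```python
from statistics import mean, stdev


def compare_to_baseline(history: list[float], today: float) -> tuple[float, float]:
    """Compare tonight's HRV with the user's own recent nights.

    Returns (percent deviation from the personal mean,
             deviation in units of the personal standard deviation).
    """
    if len(history) < 2:
        raise ValueError("need at least two past nights")
    baseline = mean(history)
    spread = stdev(history)  # sample standard deviation of past nights
    pct = (today - baseline) / baseline * 100
    z = (today - baseline) / spread if spread > 0 else 0.0
    return pct, z


# Hypothetical nightly HRV values (ms); in practice use several weeks of nights
past_nights = [50, 54, 46, 52, 48]
pct, z = compare_to_baseline(past_nights, today=45)
print(f"{pct:+.0f}% vs. own average, {z:+.1f} SD")  # -10% vs. own average, -1.6 SD
```

Because the reference point is your own history, the same calculation works for a 25-year-old and a 55-year-old even though their absolute values differ. Whether a deviation of this size matters remains a judgement call; one low night is still one night.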

Orthosomnia: when tracking worsens sleep

Perhaps the most important section of this spoke. Sleep trackers can have a paradoxical effect: the pursuit of perfect values worsens sleep. This phenomenon has a name.

Study · Coining of orthosomnia

When patients take the Quantified Self too far

Case series and editorial · Baron and colleagues coined the term orthosomnia in 2017 in the Journal of Clinical Sleep Medicine, modelled on orthorexia (a compulsive fixation on healthy eating). They described patients who sought medical advice because of their sleep-tracker data, for example supposedly too little deep sleep, even though their sleep was objectively sufficient. The core problem: preoccupation with the values creates performance pressure and rumination in bed, exactly the tension that makes falling asleep harder and fragments sleep. On top of this, many users trust the estimated tracker values more than their own sense of how they feel. The tracker thus turns from a tool into a stressor.

Baron KG, Abbott S, Jao N, Manalo N, Mullen R. Orthosomnia: Are Some Patients Taking the Quantified Self Too Far? J Clin Sleep Med. 2017;13(2):351-354. doi:10.5664/jcsm.6472 · PMID: 27855740

What does not work

"If the score is bad, I have to work harder on my sleep." That is the orthosomnia trap. Sleep cannot be forced; the more you strain, the more awake you become. A bad score that unsettles you during the day and pressures you at night does more harm than the one average night it describes.

What helps instead: Treat the score as a rough weekly trend, not as a daily grade. If the values stress you more than they help, the most effective measure is often a tracker break. Your subjective feeling in the morning is a fully valid and sometimes more reliable data point than the estimated deep-sleep minutes.

Oura, Whoop and Apple Watch: what distinguishes the devices

The most frequent question is: which is the best sleep tracker? The honest answer: there is no overall winner, because the devices have different form factors and focuses and none reaches the precision of polysomnography. What matters is which device you wear consistently and how you use the data.

Oura Ring

Ring form factor, barely noticeable during sleep, good heart-rate and HRV data (validated by de Zambotti 2019). Also measures skin temperature. Strength: comfort and solid raw values. Weakness, as with all: stage estimation remains an estimate. Subscription model for the full app.

Whoop

Band without a display, focused on strain, recovery and HRV. With no time or values to glance at on the wrist, it may lower the orthosomnia risk. Strength: recovery tracking. Available only with a subscription. The stage limitation applies here too.

Apple Watch

Multifunction smartwatch; sleep is one function among many. Advantage: already on hand if you use the watch. Disadvantage for sleep tracking: the battery needs frequent charging, which has to be timed so the watch can stay on the wrist overnight. A display on the wrist can also encourage control glances.

What they all have in common

Good sleep-wake detection, usable heart rate and HRV, limited stage accuracy and proprietary algorithms that are not directly comparable. None is a medical device for diagnosing sleep disorders.

A word on product tests: such comparisons usually rate wearing comfort, battery life, app quality and data protection, that is, everyday usability. They are useful for the purchasing decision, but say little about the clinical accuracy of the sleep stages, which only a direct comparison against polysomnography reveals.

What no sleep tracker can do: the medical boundary

As useful as trackers are for self-observation, they cannot and must not replace a medical diagnosis. This boundary is not my private opinion, but the position of the relevant professional society.

Position · American Academy of Sleep Medicine [Official document]

Consumer sleep technology is supplementary, not diagnostic

Official document · The American Academy of Sleep Medicine (AASM) set out in a 2018 position statement (Khosla et al., Journal of Clinical Sleep Medicine) how consumer sleep technology should be handled. Key statements: such devices can promote awareness of one's own sleep and prompt a conversation between patient and physician. But they are not sufficiently validated to diagnose or treat sleep disorders, and should not serve as a substitute for a medical evaluation. A conspicuous tracker finding may be the reason to seek a sleep-medicine evaluation, but it is never itself the diagnosis.

Khosla S, Deak MC, Gault D, et al. Consumer Sleep Technology: An American Academy of Sleep Medicine Position Statement. J Clin Sleep Med. 2018;14(5):877-880. doi:10.5664/jcsm.7128 · PMID: 29734997

Concretely this means: with signs of sleep apnea (loud snoring, observed breathing pauses, pronounced daytime sleepiness), with chronic insomnia over weeks, with pronounced restless-legs sensations or with suspicion of an underlying illness, the evaluation belongs in expert hands, not in an app. Some newer devices provide hints of breathing disturbances or oxygen drops; these are screening signals that should lead to a doctor's visit, not finished diagnoses.

The tracker question through the KPNI lenses

In Clinical Psychoneuroimmunology we do not look at an isolated value, but at the system. A sleep tracker provides building blocks for this, if you read it through the right lenses.

Autonomic nervous system

The nightly HRV and the resting heart rate are direct windows onto the balance between the sympathetic and parasympathetic systems (Tobaldini 2013). A persistently high sympathetic tone at night, visible in low HRV and elevated pulse, is a sign of unprocessed strain, not just of poor sleep.
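"Persistently" needs a time window. A minimal sketch, assuming 14 nights of exported HRV and resting heart rate, that flags the combination the three levers describe as a real signal: weekly-average HRV falling while the weekly-average resting pulse rises. The function name, the thresholds (5 percent, 2 bpm) and the data are hypothetical illustration values, not clinical cut-offs.

```python
from statistics import mean


def two_week_signal(hrv: list[float], rhr: list[float],
                    hrv_drop_pct: float = 5.0, rhr_rise_bpm: float = 2.0) -> bool:
    """True if weekly-average HRV fell AND weekly-average resting HR rose.

    hrv and rhr hold 14 nightly values each, oldest first.
    Default thresholds are hypothetical, chosen only for illustration.
    """
    if len(hrv) != 14 or len(rhr) != 14:
        raise ValueError("expected 14 nights of each value")
    hrv_prev, hrv_last = mean(hrv[:7]), mean(hrv[7:])
    rhr_prev, rhr_last = mean(rhr[:7]), mean(rhr[7:])
    hrv_fell = (hrv_prev - hrv_last) / hrv_prev * 100 >= hrv_drop_pct
    rhr_rose = rhr_last - rhr_prev >= rhr_rise_bpm
    return hrv_fell and rhr_rose


# Invented data: week two shows lower HRV (ms) and a higher resting pulse (bpm)
hrv = [52, 50, 55, 49, 51, 53, 50, 47, 45, 48, 44, 46, 45, 43]
rhr = [55, 56, 54, 55, 56, 55, 54, 57, 58, 58, 59, 57, 58, 59]
print(two_week_signal(hrv, rhr))  # True
```

Requiring both shifts at once, over week-long averages, makes the flag much less sensitive to a single bad night and more responsive to a sustained change.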

Behaviour and rhythm

The greatest value of a tracker lies in making behaviour visible: how late eating, alcohol, caffeine or training time shift the nightly pulse and HRV. The tracker thus becomes a tool for rhythm and sleep hygiene, not an end in itself.

Stress and cognition

Orthosomnia (Baron 2017) is a psychoneuroimmunological lesson: a cognitive pattern (the urge to control) produces a physiological stress response (tension in bed) that worsens sleep. The head keeps the body awake. No device helps here; a different approach does.

Individuality over norm

HRV, sleep need and stage proportions are individual and age-dependent. The KPNI stance is consistently personalised: comparison only with yourself over time, never with an app norm or with other people. The trend is the signal, the single value is noise.

Three levers for the coming weeks

If you have a tracker or are getting one, these three habits make the difference between a useful tool and a source of stress.

1

Read trends, ignore single values

Look at weekly averages of resting heart rate and HRV, not the score of a single night. A bad night is normal. An HRV that declines over two weeks alongside a rising resting pulse is a real signal, one that suggests more recovery time or less alcohol.

2

Look in the morning, not at night

Keep the app out of your nighttime routine: wearing the device is fine, checking it is not. No control glance in bed, because that only feeds orthosomnia. Look at the data in the morning over coffee, as a review, not as a nighttime exam result.

3

Couple data to behaviour, not to self-worth

Form hypotheses: did the glass of wine yesterday lower the HRV? Did the earlier dinner improve the resting pulse? The tracker is an experimentation tool for your habits, not a judge of your worth. And if it stresses you, put it away.
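The hypothesis questions in the third lever amount to a simple comparison between tagged and untagged nights. A minimal sketch, assuming you note the behaviour yourself (here, a glass of wine in the evening) and export nightly HRV; the function name and data are hypothetical. The result is an observation about your own nights, not proof of cause: a handful of nights is a small sample, and things that travel with the tag, such as later bedtimes, can explain part of the difference.

```python
from statistics import mean


def tagged_difference(nights: list[tuple[bool, float]]) -> float:
    """Mean HRV on untagged nights minus mean HRV on tagged nights.

    Each entry is (tag, hrv_ms), e.g. tag=True for "wine in the evening".
    A positive result means HRV was lower on tagged nights.
    """
    tagged = [hrv for tag, hrv in nights if tag]
    untagged = [hrv for tag, hrv in nights if not tag]
    if not tagged or not untagged:
        raise ValueError("need nights both with and without the tag")
    return mean(untagged) - mean(tagged)


# Invented week: True marks the nights with wine
nights = [(False, 52), (True, 41), (False, 49), (False, 55),
          (True, 44), (False, 50), (True, 43)]
print(round(tagged_difference(nights), 1))  # 8.8
```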

The actual point

The best sleep meter remains your body

A tracker can make your behaviour visible and ask good questions. But in the morning, your body knows better than any estimated score how rested you are. Use the data as a hint, not as a verdict. With real, persistent sleep problems, the solution is not a better device, but a root-cause evaluation.

Safety note

This article is for information and does not replace a medical examination. A consumer sleep tracker is not a medical device for diagnosis and cannot replace a sleep-medicine evaluation. If you notice signs of sleep apnea (loud, irregular snoring, observed breathing pauses, morning headaches, pronounced daytime sleepiness), suffer from persistent insomnia over several weeks, or if preoccupation with tracker values strongly burdens or frightens you, seek medical advice. Do not rely solely on estimated app values when assessing your health. Changes to medications or therapies always belong in medical hands.

Frequently asked questions about sleep trackers

How accurate are sleep trackers compared with polysomnography?

It depends on what is being measured. Detecting sleep versus wakefulness works well in modern trackers, often with sensitivities above 90 percent. Detecting the individual sleep stages (light sleep, deep sleep, REM) is considerably less accurate. Chinoy 2021 in Sleep compared seven consumer devices with polysomnography and found that the devices represent the sleep-wake state usefully, but the sleep-stage assignment is inconsistent. The core problem: polysomnography measures brain waves (EEG), eye movements and muscle activity directly, while a tracker derives the stages indirectly from movement, heart rate and HRV and estimates them by algorithm. An estimate is not a measurement. In practice this means: use the trends over weeks, not the exact deep-sleep value of a single night.

What does a sleep tracker really measure and what does it only estimate?

Three things are mainly measured directly: movement via an accelerometer, heart rate via optical sensors (photoplethysmography), and, in many devices, skin surface temperature. Heart rate variability (HRV) is calculated from the heart-rate signal. These raw values are often surprisingly reliable: de Zambotti 2019 in Behavioral Sleep Medicine found very good agreement between the Oura Ring's nightly heart rate and the ECG. Estimated, meaning algorithmically derived rather than measured directly, are the sleep stages, the so-called sleep score, and the recovery or readiness values. These derived values are models, not measurements, and they differ from manufacturer to manufacturer. Anyone who understands this reads their tracker correctly: HRV and resting heart rate as a robust trend, the stage percentages as rough orientation.

What is orthosomnia?

Orthosomnia describes the compulsive pursuit of perfect sleep-tracker values, which paradoxically may worsen sleep. The term was coined by Baron and colleagues in 2017 in the Journal of Clinical Sleep Medicine (modelled on orthorexia, a compulsive fixation on healthy eating). The authors described patients who sought medical advice because of poor tracker values, even though they objectively slept enough. The problem: preoccupation with the values creates performance pressure and rumination in bed, exactly the tension that makes falling asleep harder. The tracker turns from a tool into a stressor. Anyone who checks the app at night, gets upset about too little deep sleep, and fights the values during the day is showing the typical orthosomnia pattern. The remedy is not a better tracker, but a more relaxed approach or a tracker break.

Which sleep tracker is the best, Oura, Whoop or Apple Watch?

There is no universally best sleep tracker, because the devices have different strengths and no consumer device reaches the precision of polysomnography. The Oura Ring is comfortable to wear during sleep as a ring and provides good heart-rate and HRV data (de Zambotti 2019). Whoop is a band without a display focused on recovery and strain. The Apple Watch covers sleep alongside many other functions, but needs frequent charging, timed so that it can stay on overnight. What matters is not the brand, but how consistently you wear the device and whether you use the trends sensibly. For everyday self-observation, all three are usable. For a medical diagnosis, for example with suspected sleep apnea, none of them is sufficient; that requires a sleep-medicine evaluation. Consumer product tests usually rate comfort, battery and app, not clinical accuracy.

Can a sleep tracker diagnose sleep apnea or a sleep disorder?

No. A consumer sleep tracker is not a diagnostic device. The American Academy of Sleep Medicine made this clear in its 2018 position statement (Khosla et al. in the Journal of Clinical Sleep Medicine): consumer sleep technology can promote awareness of sleep and prompt a conversation, but it does not replace validated diagnostics and should not be used to diagnose or treat sleep disorders. Some newer devices capture signs of breathing pauses or oxygen drops, but these are screening hints, not diagnoses. If you have signs of sleep apnea (loud snoring, observed breathing pauses, pronounced daytime sleepiness), that belongs in a sleep-medicine evaluation with polygraphy or polysomnography. A conspicuous tracker value may be the reason to go, but it is never the diagnosis.

What does HRV during sleep tell us?

Heart rate variability (HRV) describes the variation in the intervals between individual heartbeats and is a window onto the autonomic nervous system. Tobaldini 2013 in Frontiers in Physiology describes how autonomic control changes across the sleep stages: in deep sleep the parasympathetic, recovery-promoting part tends to dominate, while in REM sleep and under strain the sympathetic part rises. A nightly HRV that declines over weeks or a rising resting heart rate may point to stress, an incipient infection, alcohol in the evening or overload. What matters is the individual trend, not the comparison with other people, because HRV depends strongly on age, genetics and measurement method. As a trend marker for recovery, the nightly HRV is one of the most useful values a tracker provides, precisely because heart rate is measured comparatively accurately.

Why do trackers often show too much deep sleep or unrealistic stages?

Because the stage assignment is estimated. A tracker sees no brain waves; it derives from movement, heart rate and HRV which stage you are probably in. Altini and Kinnunen 2021 in Sensors showed with the Oura Ring that four-stage classification (wake, light sleep, deep sleep, REM) reaches around 79 percent agreement with combined sensors, but only about 57 percent with a movement sensor alone. That is good for a consumer device, but far from perfect. Different manufacturers use different algorithms, which is why two devices can divide the same night differently. The consequence: do not compare the absolute deep-sleep minutes between devices or with friends. The only meaningful comparison is with yourself over time, with the same device and the same algorithm.

How do I use a sleep tracker sensibly without driving myself crazy?

Three principles. First, read trends, not single values. A bad night in the tracker means nothing; an HRV that declines over two weeks alongside a rising resting heart rate is a signal. Second, look in the morning, not at night. Anyone who stares at the app in bed creates exactly the tension that prevents falling asleep (orthosomnia, as described by Baron 2017). Third, couple the data to behaviour, not to self-criticism. Ask: did alcohol lower my HRV yesterday, did an earlier dinner improve my sleep? The tracker is a hypothesis generator for your behaviour, not a judge of your worth. And if the values stress you more than they help, a tracker break is the best intervention. With persistent sleep problems, the solution is not a better tracker, but a sleep-medicine and root-cause evaluation, as the pillar article describes.

More from the cluster "Treating sleep disorders holistically"

Connections to other topics

When sleep is the main topic: Treating sleep disorders holistically

The pillar article explains when sleep problems need a root-cause evaluation and what a holistic treatment path looks like. A tracker is a tool here, not a diagnosis.

When melatonin is on the table: Using melatonin correctly

Many tracker users reach for melatonin when their values are poor. The article covers when that makes sense, what dosage and timing are realistic, and where the limits lie.

Written by

Shukri Jarmoukli

Physician, Integrative Medicine, Clinical Psychoneuroimmunology · ViveCura Berlin, Skalitzer Straße 137 · Focus areas: sleep as a systemic phenomenon; a clear separation of measured raw values (heart rate and HRV, following de Zambotti 2019 and Tobaldini 2013) from algorithmically estimated sleep stages; a realistic appraisal of tracker accuracy against polysomnography, based on Chinoy 2021 in Sleep and Altini and Kinnunen 2021 in Sensors; education about orthosomnia, following Baron 2017 in the Journal of Clinical Sleep Medicine; and respect for the AASM position statement (Khosla 2018). My aim is a sober approach to data: the tracker asks questions about behaviour, while the body and an expert evaluation give the answers.

Sources and further reading

  1. Chinoy ED, Cuellar JA, Huwa KE, et al. Performance of seven consumer sleep-tracking devices compared with polysomnography. Sleep. 2021;44(5):zsaa291. doi:10.1093/sleep/zsaa291 · PMID: 33378539 [Real-World Validation, Human]
  2. de Zambotti M, Rosas L, Colrain IM, Baker FC. The Sleep of the Ring: Comparison of the OURA Sleep Tracker Against Polysomnography. Behav Sleep Med. 2019;17(2):124-136. doi:10.1080/15402002.2017.1300587 · PMID: 28323455 [Real-World Validation, Human]
  3. Altini M, Kinnunen H. The Promise of Sleep: A Multi-Sensor Approach for Accurate Sleep Stage Detection Using the Oura Ring. Sensors (Basel). 2021;21(13):4302. doi:10.3390/s21134302 · PMID: 34201861 [Real-World Validation, Human]
  4. Baron KG, Abbott S, Jao N, Manalo N, Mullen R. Orthosomnia: Are Some Patients Taking the Quantified Self Too Far? J Clin Sleep Med. 2017;13(2):351-354. doi:10.5664/jcsm.6472 · PMID: 27855740 [Case series and editorial, Human]
  5. Khosla S, Deak MC, Gault D, et al. Consumer Sleep Technology: An American Academy of Sleep Medicine Position Statement. J Clin Sleep Med. 2018;14(5):877-880. doi:10.5664/jcsm.7128 · PMID: 29734997 [Official document / position statement]
  6. Tobaldini E, Nobili L, Strada S, Casali KR, Braghiroli A, Montano N. Heart rate variability in normal and pathological sleep. Front Physiol. 2013;4:294. doi:10.3389/fphys.2013.00294 · PMID: 24137133 [Review, Human]
  7. Menghini L, Balducci C, de Zambotti M. Is it Time to Include Wearable Sleep Trackers in the Applied Psychologists' Toolbox? Span J Psychol. 2024;27:e8. doi:10.1017/SJP.2024.8 · PMID: 38410074 [Review, Human]
Note on the evidence base: The central evidence on tracker accuracy against polysomnography comes from Chinoy 2021 in Sleep (comparison of seven consumer devices; good sleep-wake detection, inconsistent stage assignment), de Zambotti 2019 in Behavioral Sleep Medicine (Oura Ring; very good heart rate, high sleep sensitivity, weaker stages) and Altini and Kinnunen 2021 in Sensors (four-stage classification around 79 percent with combined sensors). The physiological basis of HRV during sleep is described in Tobaldini 2013 in Frontiers in Physiology. The term orthosomnia was coined by Baron 2017 in the Journal of Clinical Sleep Medicine. The classification of consumer sleep technology as supplementary rather than diagnostic follows the position statement of the American Academy of Sleep Medicine (Khosla 2018). Limitations: accuracy depends strongly on the device, the algorithm version and the tested population, and can change with software updates; many validations are carried out in healthy adults. Sleep trackers are not medical devices for diagnosis. With suspected sleep apnea, persistent insomnia or burdensome preoccupation with the values, a medical evaluation is required. This article does not replace a medical examination.

Have questions or want to book an appointment?

We'd be happy to advise you personally at our practice.
