
`raw_step` contains **synthetic Fitbit‑style step count data** for two simulated participants (`P01` and `P02`). Each participant's series spans **14 days** at **1‑minute resolution**. It includes **exercise sessions**, **sleep‑related inactivity**, and **Fitbit‑like missingness patterns**, providing a realistic structure for testing algorithms that process wearable activity data.

Usage

raw_step

Format

A tibble with three columns:

id

Character participant ID ("P01", "P02").

step_timestamp

POSIXct timestamp in UTC, at 1‑minute resolution.

count

Whole‑number step count per minute.

Details

Participants

  • "P01" — starts at "2026‑01‑01 07:17:00" (UTC)

  • "P02" — starts at "2026‑01‑02 15:52:00" (UTC)

Each participant has exactly **14 days** of simulated minute‑level data.
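
The 40,320 rows reported in the examples follow directly from this design. As a quick sanity check (a sketch that is not taken from the generation script), the expected size and P01's minute grid can be rebuilt from the documented start time. `is_minute_grid()` is a hypothetical helper name introduced here for illustration:

```r
# Expected size: 14 days x 24 hours x 60 minutes per participant
minutes_per_participant <- 14 * 24 * 60       # 20160
total_rows <- 2 * minutes_per_participant     # 40320, two participants

# Rebuild P01's expected grid from its documented start time
p01_start <- as.POSIXct("2026-01-01 07:17:00", tz = "UTC")
p01_grid <- seq(p01_start, by = "min", length.out = minutes_per_participant)

# Hypothetical helper: TRUE when consecutive timestamps are exactly 60 s apart
is_minute_grid <- function(ts) {
  all(as.numeric(diff(ts), units = "secs") == 60)
}

is_minute_grid(p01_grid)
range(p01_grid)   # first and last minute of P01's 14-day window
```

On the real data, the same helper could be applied to `step_timestamp` within each `id` to confirm that no minutes are skipped.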

Activity patterns

  • Daytime activity: mild random movement with realistic variability.

  • Sleep inactivity: step counts drop to near‑zero between 23:00 and 06:00.

  • Exercise sessions: intermittent bouts with ramp‑up, sustained high counts, and gradual cooldown.

  • Noise: stochastic fluctuation representing device and behavioural variability.
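
To make these patterns concrete, the following is a minimal base R sketch of one simulated day. It is not the code in `data-raw/simulate_raw_step.R`; the Poisson means, the probability of stray steps during sleep, the bout timing (18:00), and the ramp shape are all assumed values chosen for illustration:

```r
set.seed(1)

# One day of minute timestamps in UTC
start <- as.POSIXct("2026-01-01 00:00:00", tz = "UTC")
ts <- seq(start, by = "min", length.out = 1440)
hour <- as.integer(format(ts, "%H", tz = "UTC"))

# Daytime baseline: mild random movement (assumed Poisson mean of 8)
count <- rpois(length(ts), lambda = 8)

# Sleep window 23:00-06:00: mostly zero, with rare stray steps
asleep <- hour >= 23 | hour < 6
count[asleep] <- rbinom(sum(asleep), size = 1, prob = 0.02) *
  rpois(sum(asleep), lambda = 3)

# One exercise bout: 10 min ramp-up, 20 min sustained, 10 min cooldown
profile <- c(seq(20, 110, length.out = 10),
             rep(110, 20),
             seq(110, 20, length.out = 10))
bout_start <- 18 * 60 + 1                      # index of 18:00
idx <- bout_start:(bout_start + length(profile) - 1)
count[idx] <- rpois(length(profile), lambda = profile)

sim_day <- data.frame(id = "SIM", step_timestamp = ts, count = count)
```

Drawing every minute from a Poisson distribution keeps the counts whole and non‑negative while supplying the minute‑to‑minute noise.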

Missingness model (Fitbit‑like)

NA values are introduced into the step counts in two forms:

  • micro‑gaps (1–2 minute sensor losses),

  • longer dropouts (10–60 minutes).

These patterns mimic real‑world wrist‑worn accelerometer behaviour during motion artifacts, device removal, or Bluetooth disconnection.
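
One way such a two‑tier missingness model could be implemented is sketched below. This is an assumption about the general approach, not the package's actual generator; `inject_gaps()` and the gap counts (10 micro‑gaps, 2 dropouts per day) are hypothetical:

```r
set.seed(42)
count <- rpois(1440, lambda = 8)   # one day of stand-in minute counts

# Hypothetical helper: blank out `n_gaps` runs of minutes, each with a
# length drawn from `lengths`, at uniformly random start positions
inject_gaps <- function(x, n_gaps, lengths) {
  for (i in seq_len(n_gaps)) {
    # guard against sample()'s special case for a single number
    len <- if (length(lengths) == 1) lengths else sample(lengths, 1)
    start <- sample.int(length(x) - len + 1, 1)
    x[start:(start + len - 1)] <- NA
  }
  x
}

count <- inject_gaps(count, n_gaps = 10, lengths = 1:2)    # micro-gaps
count <- inject_gaps(count, n_gaps = 2,  lengths = 10:60)  # longer dropouts

mean(is.na(count)) * 100   # percentage of missing minutes
```

Because start positions are random, gaps can overlap or sit next to each other, so an observed run of NA values may be longer than any single injected gap.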

Generation script

This dataset is produced reproducibly by the script:


  data-raw/simulate_raw_step.R

Examples

library(dplyr)  # attaches %>%, which the examples below rely on
data(raw_step, package = "hypometrics")

# Structure
dplyr::glimpse(raw_step)
#> Rows: 40,320
#> Columns: 3
#> $ id             <chr> "P01", "P01", "P01", "P01", "P01", "P01", "P01", "P01",
#> $ step_timestamp <dttm> 2026-01-01 07:17:00, 2026-01-01 07:18:00, 2026-01-01 0…
#> $ count          <dbl> 35, 38, 52, 44, 17, 46, 44, 42, 35, 37, 29, 46, 35, 34,

# First few rows for P01
dplyr::filter(raw_step, id == "P01") %>% head()
#> # A tibble: 6 × 3
#>   id    step_timestamp      count
#>   <chr> <dttm>              <dbl>
#> 1 P01   2026-01-01 07:17:00    35
#> 2 P01   2026-01-01 07:18:00    38
#> 3 P01   2026-01-01 07:19:00    52
#> 4 P01   2026-01-01 07:20:00    44
#> 5 P01   2026-01-01 07:21:00    17
#> 6 P01   2026-01-01 07:22:00    46

# Missingness percentage per participant
raw_step %>%
  dplyr::group_by(id) %>%
  dplyr::summarise(
    pct_missing = mean(is.na(count)) * 100
  )
#> # A tibble: 2 × 2
#>   id    pct_missing
#>   <chr>       <dbl>
#> 1 P01          3.72
#> 2 P02          4.49
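
The percentage alone does not distinguish short gaps from long dropouts. A run‑length summary can separate them; the sketch below uses base R's `rle()`, with `na_run_lengths()` as a hypothetical helper name. The final step runs only when the `hypometrics` package is installed, and no output is shown for it here:

```r
# Hypothetical helper: lengths of consecutive NA runs in a vector
na_run_lengths <- function(x) {
  runs <- rle(is.na(x))
  runs$lengths[runs$values]
}

# Toy vector: one 2-minute gap and one 3-minute gap
na_run_lengths(c(5, NA, NA, 7, 0, NA, NA, NA, 4))
#> [1] 2 3

# Applied per participant (skipped when hypometrics is not installed)
if (requireNamespace("hypometrics", quietly = TRUE)) {
  data(raw_step, package = "hypometrics")
  gaps <- tapply(raw_step$count, raw_step$id, na_run_lengths)
  lapply(gaps, summary)
}
```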