Three problems surfaced during the first 13.3" end-to-end run:
1) LittleFS IntegerDivideByZero on 200 → write /img.bin. Cause: the
~3.5 MB SPIFFS in default_16MB.csv can't fit three 960 KB setup
screens + a 960 KB cached image (~3.84 MB). Switching to a custom
partitions_13e6.csv with 24 MB LittleFS on the 32 MB flash.
2) Yellow wash across the panel on long SPI bursts. Cause: SPI DMA
from a PSRAM-backed scratch buffer hits a cache-coherency window —
the CPU's writes hadn't reached PSRAM yet when DMA read it. Push
each half in 8 KB chunks through an internal-SRAM (DMA-coherent)
scratch, and drop the bus clock to 4 MHz to match the 7.3"
production speed.
3) Bootstrap window (no image yet) was deep-sleeping for 15 s between
polls — each cycle a ~5 s ROM-boot + Wi-Fi reconnect, so the user
waited ~20 s × N retries between scanning the setup QR and seeing
their first photo land. Now normal_operation_impl returns early
during bootstrap and main.cpp's normal_operation loops with a
2 s delay, keeping Wi-Fi up. Once the first image arrives, the
normal scheduled deep sleep takes over.
Also fixes a related bug Matt called out: a transient TLS hiccup
during bootstrap was hitting the 5xx fallback path and painting a
full yellow fill over the green setup QR, leaving the user with
no claim path. Criterion is now "does /img.bin exist?" (panel has
something worth showing with a border) rather than "is currentImgId
set?", so a fresh device with no cached image preserves the setup
screen through transient network errors.
Diagnostic prints in the panel driver + [op] start/code lines in
normal_operation_impl that proved invaluable during bringup; leaving
them in for now. Tests updated for the new bootstrap semantics
(deep sleep no longer arms on bootstrap-cycle 204/404/5xx); 43/43
native tests pass, 7.3" production build stays byte-identical.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes that together let the post-WiFi-setup window be quiet:
1. operation.h 204/404: skip the panel redraw entirely. The panel already
holds the right thing — setup QR if no image has ever been painted
(img_id == -1), or a real photo if img_id >= 0. Redrawing the QR every
15s during the bootstrap claim window put the e-ink into a perpetual
~20s mid-refresh loop and risked ghosting. Tests updated to assert
no redraw on either sub-case.
2. main.cpp WiFi-fail path: drop the epd_fill(RED) + 3s delay + AP
re-redraw sequence (~43s of e-ink work that destroyed the QR mid-flow)
and replace with a single repaint of a new "Connection Failed — try
again" Step 1/2 screen with red accents. gen_screens.py grows a
gen_ap_retry() variant that recolors yellow → red and swaps the
header/QR labels; the result is shipped as ap_bg_retry.bin alongside
ap_bg.bin in LittleFS. epd.h exposes epd_draw_ap_screen_retry().
Bug: the device only woke from deep sleep on a timer; pressing BOOT
during sleep did nothing. The 5-second-hold reset only worked in the
brief awake window during a poll, which made the documented "hold BOOT
to reset" gesture appear broken to the user. Reported live 2026-05-09.
Fix: arm EXT0 wakeup on PIN_BTN_RESET (active-low — BOOT is pulled-up
on the dev board) at every esp_deep_sleep_start. After the press wakes
the chip, setup() runs and the existing check_reset_button() handles
the rest of the 5-second hold and triggers the NVS clear + reprovision.
Mocks: esp_sleep.h gains gpio_num_t typedef + g_ext0_wakeup_pin/level
globals so the native test can assert the call shape.
Test: FW-RESET-WAKE pins the contract — every deep_sleep_start must
arm EXT0 on PIN_BTN_RESET, level 0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A freshly-claimed device on a noon-daily schedule would otherwise sit
dark for up to 24 h after WiFi setup waiting for its first image. The
schedule kicks in only AFTER an image has actually been displayed.
Mechanism: at the bottom of normal_operation_impl, re-read NVS_KEY_IMG_ID
to see whether any successful 200-with-integrity-OK persisted an image
id this cycle (or any prior). If still -1, override sleepMs to
FIRST_IMAGE_POLL_INTERVAL_MS (15 s) — bypassing the schedule and the
clamp range, since SLEEP_CLAMP_MIN_MS is about runaway protection in
steady state and the bootstrap window is naturally bounded by "first
image arrives."
Tests:
- FW-FIRST-IMG-A: 204 with no img_id in NVS → 15s override fires
even when server says 6 hours.
- FW-FIRST-IMG-B: img_id pre-set, 200 cycle → server interval honored
(override doesn't trap the device in 15s forever).
- FW-FIRST-IMG-C: first 200 ever (img_id was -1, now persisted) →
server interval applies starting THIS cycle, no extra 15s nap.
Also patched FW-03 (304 sleep timing) to pre-set img_id so the test
exercises what it claims; 304 in production only happens when the
device already holds the image, so the override would never fire there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the sell-to-friend gap where a buyer's freshly-reset device
would briefly display the seller's photos before the buyer reached
/setup/{mac} to claim. The firmware had no way to tell the server
"I just got reset" — now it does.
Flow:
- WiFi-setup completion (handle_connect in main.cpp) writes
NVS_KEY_JUST_PROVISIONED=1 alongside the SSID/PASS save.
- Every poll while the flag is set sends X-Just-Provisioned: 1.
- Server (DeviceImageController, paired commit on the webApp side)
responds with 204 + X-Interval-Ms when the binding is stale,
forcing the device to its setup-QR fallback. Once the user
re-claims via /setup/{mac}, the binding is fresh, and the server
answers with X-Claimed: 1 alongside whatever response code applies.
- Firmware clears the NVS flag on seeing X-Claimed: 1 — once
cleared, the device is back to normal long-stable polling.
Tests:
- PROV-A: flag set in NVS → header on the request
- PROV-B: no flag → no header (steady state)
- PROV-C: response with X-Claimed: 1 → flag cleared
- PROV-D: response without X-Claimed → flag stays (so the next
poll keeps signaling "not yet acknowledged")
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Distinguish a cold-boot poll (UNDEFINED wakeup cause = power-on, hard
reset, plug-cycle) from a normal timer wake. Encoded as the
X-Boot-Reason request header; server uses it to deliberately bypass
the schedule and rotate. Matches how users actually use the device:
unplug-and-replug as a manual refresh.
Tests: two new native cases asserting the header is "cold" on
UNDEFINED wakeup and "timer" on TIMER wakeup. esp_sleep mock now
exposes a settable wakeup_cause global.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The dev-only cap that forced every-1-min polling regardless of the app's
schedule is removed. The device now sleeps for whatever X-Interval-Ms
the server hands back (driven by rotationIntervalMinutes / wakeTimes),
clamped to [30s, 25h] as a safety net against malformed values.
Renamed FETCH_INTERVAL_MS to FETCH_INTERVAL_MS_FALLBACK — it's now
*only* used when the header is absent (rare; rolling deploy / hand-
crafted response). Added SLEEP_CLAMP_MIN/MAX for the bounds.
Tests FW-09 and FW-10 flipped to lock the new behavior; added FW-10b
covering sub-MIN clamping (battery protection if server sends 1000ms).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three cross-referenced markers — config.h, operation.h, and FW-10 in the
test file — calling out that the FETCH_INTERVAL_MS cap is intentionally
holding the polling rate at 1 minute for dev iteration. Once the firmware
is stable and we want the device to honor the app's per-frame
rotationIntervalMinutes / wakeHour settings, the cap in operation.h
becomes a sanity-clamp (e.g., 30 s ≤ sleep ≤ 25 h) and the no-header
fallback splits into its own constant.
Behavior unchanged — comments only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pairs with the server-side header. After streaming the response body to
LittleFS, hash the file with mbedtls/sha256 (hardware-accelerated on
ESP32-S3) and compare against the server's claim. On mismatch:
- Don't update NVS_KEY_IMG_ID, so the next poll reports the old id and
the server sends 200 again with fresh bytes (natural retry, no extra
HTTP round-trip in this cycle).
- Don't draw — panel keeps whatever was up before, no garbage on the
e-ink.
- Raise NVS_KEY_ERR_BORDER so the next healthy 304 paints a clean
recovery frame with the sync-fail border.
Verification is skipped when the header is absent, so the firmware
stays compatible with any server that hasn't deployed the matching
header yet. mbedtls compiles into a native-test no-op stub (returns
empty hex), so existing native tests don't need a SHA implementation.
Two new tests: FW-17a (mismatch path) and FW-17b (missing header
backward compat). Mock String now has equalsIgnoreCase so the new
comparison compiles in native-test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without this, devices upgrading from the old buggy fill-on-error firmware
get stuck on yellow forever: the new code reads NVS_KEY_ERR_BORDER == 0
(default — the old firmware never wrote that key), so the next 304 sees
no err flag and skips the redraw. NVS img_id matches what the server is
serving, so server says "you're current" indefinitely.
Add NVS_KEY_SCHEMA_V. On boot, if stored version is below
NVS_SCHEMA_VERSION (currently 1), treat errBorder as set for this cycle
and bump schema_v. The next 304 then redraws from LittleFS (the cached
.bin survives flashing) and clears the flag.
Tests: FW-06f locks in the upgrade path (schema_v missing → redraw on
304). FW-06g asserts the migration is one-shot (post-bump → no redraw
on steady-state 304). FW-06d updated to set schema_v explicitly so it
represents the post-migration steady state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously a 5xx / timeout / malformed response fired epd_fill(COLOR_YELLOW),
which writes the yellow nibble across the entire 800×480 framebuffer and
destroys the last good image — exactly what FR38 forbids ("Last image
persists ... yellow border signals state"). The device then got stuck on a
blank yellow screen because the next 304 didn't redraw.
Changes:
- New epd_draw_image_with_border streams the cached .bin row-by-row,
overwrites border-region pixels in the row buffer, and pushes a single
composited framebuffer (same pattern as the existing setup-QR overlay).
- normal_operation_impl else-branch now redraws the cached image with a
yellow border, falling back to epd_fill only when no cache exists
(first-boot error). Sets a new NVS_KEY_ERR_BORDER flag.
- 200 and 304 paths clear NVS_KEY_ERR_BORDER. The 304 branch now
triggers a clean repaint when the err flag is set, so the device
recovers from the stuck-yellow state on the next healthy poll
without waiting for rotation to advance.
- LittleFS read mock now returns invalid File when the file doesn't
exist (matches real LittleFS), so the no-cache fallback path is
actually exercisable in tests.
Tests:
- Replaces the old test_fw06_error_fills_yellow (which locked in the
buggy fill behavior) with FW-06a..e covering: error+cache draws
border (no fill), error+no-cache falls back to fill, 304 after
error repaints clean, steady-state 304 touches nothing (the
regression the user flagged), 200 after error clears the flag.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three bugs fixed:
- NVS img_id now written before epd_init/draw; new draw_needed flag in NVS
survives power-loss mid-refresh so next boot re-draws from LittleFS instead
of showing stale content
- epd_sleep() now only called when display was initialized this cycle,
preventing a 60 s wait_busy() timeout on every 304 poll
- esp_task_wdt_reset() added to wait_busy() loop so the ~20 s 6-color
refresh no longer triggers the task watchdog
Also extracts normal_operation into operation.h template and adds
a native PlatformIO test suite (16 tests) covering the full response matrix.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>