The previous protocol fix (302 → 200 portal HTML) didn't restore iOS
captive-banner reliability under the lock-screen-camera join: the user
joined, accepted the prompt, and got nothing on unlock. We're guessing
without data, so this round adds instrumentation alongside three
high-confidence behavioral fixes, each addressing an individually
plausible cause.
Fixes:
* Force the AP DHCP server to advertise the AP IP as DNS via
esp_netif_dhcps_option(ESP_NETIF_DOMAIN_NAME_SERVER). Arduino-ESP32's
softAP doesn't set this explicitly; if a client arrives with cached
cellular DNS, the captive DNS hijack is bypassed and iOS resolves
captive.apple.com against the real internet, so no captive signal
ever fires.
* WiFi.setSleep(false) so the AP radio doesn't park between beacons
and drop probe packets that arrive during a sleep window.
* Cache-Control: no-store on the portal response, so iOS doesn't carry
a "this SSID was fine last time" determination across forget+rejoin
cycles.
Diagnostics (logged on serial at 115200, in AP mode only):
* Every HTTP request: method, URI, Host, User-Agent. Tells us whether
iOS is reaching us and which CNA path it's hitting.
* WiFi AP events: STA-associated, IP-assigned, STA-disconnected.
Tells us whether the join completed and DHCP succeeded.
Repro: pio device monitor -e waveshare73-v1, forget the network on
the phone, lock + scan + accept, watch the log.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous handler answered Apple/Android/Windows CNA probes with
HTTP 302 redirects to "/". That works in a desktop browser, but iOS —
particularly when joining via the lock-screen camera quick-scan path —
sometimes treats the redirect as "internet works" and never raises the
captive banner. The user has to remember the manual fallback URL on the
e-ink footer to recover.
Switch every probe URL to serve the portal HTML directly with 200 OK.
A 200 response whose body is not Apple's magic Success page is the
canonical "this is a captive network" signal; banner-fire becomes
deterministic on the first probe.
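
A minimal host-side sketch of the before/after probe answer, for illustration only. ProbeResponse, is_probe_path, portal_html, and the two answer_probe_* helpers are hypothetical names, and the form field names are assumed; the firmware's actual WebServer handlers are not shown in this log.

```cpp
#include <string>

// Hypothetical response record; the real handlers call into WebServer instead.
struct ProbeResponse {
    int status;            // HTTP status code
    std::string location;  // redirect target, empty when not redirecting
    std::string body;      // response body
};

// Probe paths named in this log (Apple, Android, Windows).
static const char* const kProbePaths[] = {
    "/hotspot-detect.html",
    "/library/test/success.html",
    "/generate_204",
    "/connecttest.txt",
};

bool is_probe_path(const std::string& path) {
    for (const char* p : kProbePaths) {
        if (path == p) return true;
    }
    return false;
}

// Placeholder portal page (assumed content, not the shipped HTML).
std::string portal_html() {
    return "<html><body><form method=\"POST\" action=\"/connect\">"
           "<input name=\"ssid\"><input name=\"pass\" type=\"password\">"
           "</form></body></html>";
}

// Old behavior: answer the probe with a redirect to "/".
ProbeResponse answer_probe_old() { return {302, "/", ""}; }

// New behavior: answer the probe with the portal itself and 200 OK.
ProbeResponse answer_probe_new() { return {200, "", portal_html()}; }
```

The point of the sketch is only the contrast: the new answer is a 200 whose body is not the Success page the OS expects, which is the captive signal described above.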
While here:
- Register HTTP handlers BEFORE softAP comes up so the very first probe
from a fast-joining device lands on a ready server, not connection-
refused.
- Drop the unconditional 500 ms post-softAP delay; softAPIP is valid
immediately and the gap was just a window for races.
- Add /library/test/success.html (iOS legacy) and /connecttest.txt
(Windows 10+) to the explicit handler list.
- Delete handle_captive (was the 302 redirect path).
Locked-phone caveat: iOS by design will not auto-open the captive
portal UI while the phone is locked — the best we can do is make the
banner notification fire reliably so it's waiting on unlock. This
change accomplishes that.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Match the failure path's latency to the happy path. Before: a wrong
password meant the user stared at the yellow Step 1/2 screen for the
full 30 s WIFI_TIMEOUT_MS before the red retry repaint started — total
~50 s to "Connection Failed" visible. After: WL_CONNECT_FAILED and
WL_NO_SSID_AVAIL bail attempt_wifi() immediately, so the red repaint
starts within a few seconds of the radio giving up — total ~25 s,
matching the happy-path-to-Step-2/2 timing.
Also collapse the duplicate boot-time poll loop in main.cpp onto the
shared attempt_wifi() so the same fast-fail covers boot-with-stored-
creds, not just captive-portal submission.
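
A minimal sketch of the fast-fail loop, assuming a 100 ms poll step and a status callback in place of WiFi.status(). The enum is a local stand-in for the Arduino status names, and attempt_wifi_sketch is a hypothetical simplification rather than the shipped attempt_wifi().

```cpp
#include <cstdint>
#include <functional>

// Local stand-ins for the Arduino wl_status_t names used in the text.
enum WifiStatus { WL_NO_SSID_AVAIL, WL_CONNECTED, WL_CONNECT_FAILED, WL_DISCONNECTED };

constexpr uint32_t WIFI_TIMEOUT_MS = 30000;  // 30 s, per the text
constexpr uint32_t POLL_STEP_MS = 100;       // assumed poll period

// Polls status_at(elapsed) until connected, a terminal failure, or timeout.
// Returns true on connect; elapsed_ms reports how long the loop waited.
bool attempt_wifi_sketch(const std::function<WifiStatus(uint32_t)>& status_at,
                         uint32_t& elapsed_ms) {
    for (elapsed_ms = 0; elapsed_ms < WIFI_TIMEOUT_MS; elapsed_ms += POLL_STEP_MS) {
        const WifiStatus s = status_at(elapsed_ms);
        if (s == WL_CONNECTED) return true;
        // Terminal failures: bail now instead of waiting out the timeout.
        if (s == WL_CONNECT_FAILED || s == WL_NO_SSID_AVAIL) return false;
    }
    return false;  // timed out while still WL_DISCONNECTED (or similar)
}
```

With a radio that reports WL_CONNECT_FAILED after 2 s, the sketch returns at 2 s; only a status that never becomes terminal runs the full 30 s.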
Tests: FW-15a (auth fail) and FW-15b (no SSID) assert millis() never
reaches WIFI_TIMEOUT_MS on those statuses. Existing FW-15 tightened
to use WL_DISCONNECTED so it actually exercises the timeout path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes that together let the post-WiFi-setup window be quiet:
1. operation.h 204/404: skip the panel redraw entirely. The panel already
holds the right thing — setup QR if no image has ever been painted
(img_id == -1), or a real photo if img_id >= 0. Redrawing the QR every
15s during the bootstrap claim window put the e-ink into a perpetual
~20s mid-refresh loop and risked ghosting. Tests updated to assert
no redraw on either sub-case.
2. main.cpp WiFi-fail path: drop the epd_fill(RED) + 3s delay + AP
redraw sequence (~43s of e-ink work that destroyed the QR mid-flow)
and replace with a single repaint of a new "Connection Failed — try
again" Step 1/2 screen with red accents. gen_screens.py grows a
gen_ap_retry() variant that recolors yellow → red and swaps the
header/QR labels; the result is shipped as ap_bg_retry.bin alongside
ap_bg.bin in LittleFS. epd.h exposes epd_draw_ap_screen_retry().
The 15s FIRST_IMAGE_POLL_INTERVAL_MS bootstrap already keeps the QR on
the panel (204 responses don't trigger a redraw) until the user claims
via /setup/{mac} and the server's bootstrap-bypass serves an image. The
hard-coded delay(120000) was just dead time between WiFi save and the
first poll — observed in the field as ~110s of nothing happening after
login.
Also touches operation.h header comments to match the "hold until the
screen flashes" terminology and document the short-press fast-poll
gesture.
Bug: the device only woke from deep sleep on a timer; pressing BOOT
during sleep did nothing. The 5-second-hold reset only worked in the
brief awake window during a poll, which made the documented "hold BOOT
to reset" gesture appear broken to the user. Reported live 2026-05-09.
Fix: arm EXT0 wakeup on PIN_BTN_RESET (active-low — BOOT is pulled-up
on the dev board) at every esp_deep_sleep_start. After the press wakes
the chip, setup() runs and the existing check_reset_button() handles
the rest of the 5-second hold and triggers the NVS clear + reprovision.
Mocks: esp_sleep.h gains gpio_num_t typedef + g_ext0_wakeup_pin/level
globals so the native test can assert the call shape.
Test: FW-RESET-WAKE pins the contract — every deep_sleep_start must
arm EXT0 on PIN_BTN_RESET, level 0.
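
A host-side sketch of the contract, in the spirit of the mock described above. The esp_sleep_* functions here are local stand-ins with the same call shape as the ESP-IDF ones; PIN_BTN_RESET = 0 is an assumption (BOOT is commonly GPIO0), and sleep_for_ms, g_timer_wakeup_us, and g_deep_sleep_calls are hypothetical names.

```cpp
#include <cstdint>

// Stand-ins for the mocked esp_sleep.h surface.
typedef int gpio_num_t;
static gpio_num_t g_ext0_wakeup_pin = -1;
static int g_ext0_wakeup_level = -1;
static uint64_t g_timer_wakeup_us = 0;  // hypothetical extra global
static int g_deep_sleep_calls = 0;      // hypothetical extra global

int esp_sleep_enable_ext0_wakeup(gpio_num_t pin, int level) {
    g_ext0_wakeup_pin = pin;
    g_ext0_wakeup_level = level;
    return 0;  // ESP_OK
}

int esp_sleep_enable_timer_wakeup(uint64_t time_in_us) {
    g_timer_wakeup_us = time_in_us;
    return 0;  // ESP_OK
}

void esp_deep_sleep_start() { ++g_deep_sleep_calls; }

constexpr gpio_num_t PIN_BTN_RESET = 0;  // assumption: BOOT button on GPIO0

// Every path into deep sleep arms both the timer and the button wake.
void sleep_for_ms(uint64_t ms) {
    esp_sleep_enable_timer_wakeup(ms * 1000ULL);
    esp_sleep_enable_ext0_wakeup(PIN_BTN_RESET, 0);  // active-low: wake on level 0
    esp_deep_sleep_start();
}
```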
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A freshly-claimed device on a noon-daily schedule would otherwise sit
dark for up to 24 h after WiFi setup waiting for its first image. The
schedule kicks in only AFTER an image has actually been displayed.
Mechanism: at the bottom of normal_operation_impl, re-read NVS_KEY_IMG_ID
to see whether any successful 200-with-integrity-OK persisted an image
id this cycle (or any prior). If still -1, override sleepMs to
FIRST_IMAGE_POLL_INTERVAL_MS (15 s) — bypassing the schedule and the
clamp range, since SLEEP_CLAMP_MIN_MS is about runaway protection in
steady state and the bootstrap window is naturally bounded by "first
image arrives."
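
The override can be sketched as a pure function, assuming the stored image id and the server interval are already in hand. pick_sleep_ms and IMG_ID_NONE are hypothetical names; the constants use the values stated in this log.

```cpp
#include <algorithm>
#include <cstdint>

constexpr int32_t  IMG_ID_NONE = -1;                       // no image ever persisted
constexpr uint32_t FIRST_IMAGE_POLL_INTERVAL_MS = 15000;   // 15 s bootstrap poll
constexpr uint32_t SLEEP_CLAMP_MIN_MS = 30u * 1000u;       // 30 s
constexpr uint32_t SLEEP_CLAMP_MAX_MS = 25u * 60u * 60u * 1000u;  // 25 h

// stored_img_id is the value re-read from NVS at the end of the cycle.
uint32_t pick_sleep_ms(uint32_t server_interval_ms, int32_t stored_img_id) {
    if (stored_img_id == IMG_ID_NONE) {
        // Bootstrap window: bypass both the schedule and the clamp range.
        return FIRST_IMAGE_POLL_INTERVAL_MS;
    }
    // Steady state: honor the server, within the runaway-protection bounds.
    return std::min(std::max(server_interval_ms, SLEEP_CLAMP_MIN_MS), SLEEP_CLAMP_MAX_MS);
}
```

Because the id is re-read after the cycle's response has been handled, a first successful 200 already counts as "image present" and the server interval applies immediately.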
Tests:
- FW-FIRST-IMG-A: 204 with no img_id in NVS → 15s override fires
even when server says 6 hours.
- FW-FIRST-IMG-B: img_id pre-set, 200 cycle → server interval honored
(override doesn't trap the device in 15s forever).
- FW-FIRST-IMG-C: first 200 ever (img_id was -1, now persisted) →
server interval applies starting THIS cycle, no extra 15s nap.
Also patched FW-03 (304 sleep timing) to pre-set img_id so the test
exercises what it claims; 304 in production only happens when the
device already holds the image, so the override would never fire there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the sell-to-friend gap where a buyer's freshly-reset device
would briefly display the seller's photos before the buyer reached
/setup/{mac} to claim. The firmware had no way to tell the server
"I just got reset" — now it does.
Flow:
- WiFi-setup completion (handle_connect in main.cpp) writes
NVS_KEY_JUST_PROVISIONED=1 alongside the SSID/PASS save.
- Every poll while the flag is set sends X-Just-Provisioned: 1.
- Server (DeviceImageController, paired commit on the webApp side)
responds with 204 + X-Interval-Ms when the binding is stale,
forcing the device to its setup-QR fallback. Once the user
re-claims via /setup/{mac}, the binding is fresh, and the server
answers with X-Claimed: 1 alongside whatever response code applies.
- Firmware clears the NVS flag on seeing X-Claimed: 1 — once
cleared, the device is back to normal long-stable polling.
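
The firmware side of this handshake can be sketched with an in-memory flag in place of NVS. FakeNvs, build_poll_headers, and handle_poll_response are hypothetical names; only the header names and values come from the flow above.

```cpp
#include <map>
#include <string>

using Headers = std::map<std::string, std::string>;

// In-memory stand-in for the NVS_KEY_JUST_PROVISIONED flag.
struct FakeNvs {
    bool just_provisioned = false;
};

// Request side: signal "I was just reset" on every poll while the flag is set.
Headers build_poll_headers(const FakeNvs& nvs) {
    Headers h;
    if (nvs.just_provisioned) h["X-Just-Provisioned"] = "1";
    return h;
}

// Response side: clear the flag only once the server acknowledges the claim.
void handle_poll_response(FakeNvs& nvs, const Headers& response) {
    const auto it = response.find("X-Claimed");
    if (it != response.end() && it->second == "1") {
        nvs.just_provisioned = false;
    }
}
```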
Tests:
- PROV-A: flag set in NVS → header on the request
- PROV-B: no flag → no header (steady state)
- PROV-C: response with X-Claimed: 1 → flag cleared
- PROV-D: response without X-Claimed → flag stays (so the next
poll keeps signaling "not yet acknowledged")
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Distinguish a cold-boot poll (UNDEFINED wakeup cause = power-on, hard
reset, plug-cycle) from a normal timer wake. Encoded as the
X-Boot-Reason request header; server uses it to deliberately bypass
the schedule and rotate. Matches how users actually use the device:
unplug-and-replug as a manual refresh.
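
A tiny sketch of the mapping, assuming only the two wakeup causes this commit distinguishes. WakeupCause is a local stand-in for the ESP-IDF enum and boot_reason_header is a hypothetical name; how other causes are reported is not covered here.

```cpp
#include <string>

// Local stand-in for the two wakeup causes discussed above.
enum WakeupCause { WAKEUP_UNDEFINED, WAKEUP_TIMER };

// Value for the X-Boot-Reason request header.
std::string boot_reason_header(WakeupCause cause) {
    // UNDEFINED means the chip did not wake from deep sleep: power-on or hard reset.
    return cause == WAKEUP_UNDEFINED ? "cold" : "timer";
}
```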
Tests: two new native cases asserting the header is "cold" on
UNDEFINED wakeup and "timer" on TIMER wakeup. esp_sleep mock now
exposes a settable wakeup_cause global.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The dev-only cap that forced every-1-min polling regardless of the app's
schedule is removed. The device now sleeps for whatever X-Interval-Ms
the server hands back (driven by rotationIntervalMinutes / wakeTimes),
clamped to [30s, 25h] as a safety net against malformed values.
Renamed FETCH_INTERVAL_MS to FETCH_INTERVAL_MS_FALLBACK — it's now
*only* used when the header is absent (rare; rolling deploy / hand-
crafted response). Added SLEEP_CLAMP_MIN/MAX for the bounds.
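
A sketch of the header-to-sleep mapping under stated assumptions: the fallback value (15 min) and the treatment of a non-numeric header as absent are guesses, and sleep_ms_from_header is a hypothetical name. An empty string stands in for a missing header.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

constexpr uint32_t SLEEP_CLAMP_MIN_MS = 30u * 1000u;              // 30 s
constexpr uint32_t SLEEP_CLAMP_MAX_MS = 25u * 60u * 60u * 1000u;  // 25 h
constexpr uint32_t FETCH_INTERVAL_MS_FALLBACK = 15u * 60u * 1000u;  // assumed value

uint32_t sleep_ms_from_header(const std::string& x_interval_ms) {
    if (x_interval_ms.empty()) return FETCH_INTERVAL_MS_FALLBACK;  // header absent

    char* end = nullptr;
    const unsigned long long v = std::strtoull(x_interval_ms.c_str(), &end, 10);
    if (end == x_interval_ms.c_str()) {
        // No digits parsed. Treating this like a missing header is an assumption.
        return FETCH_INTERVAL_MS_FALLBACK;
    }
    if (v < SLEEP_CLAMP_MIN_MS) return SLEEP_CLAMP_MIN_MS;  // e.g. 1000 ms
    if (v > SLEEP_CLAMP_MAX_MS) return SLEEP_CLAMP_MAX_MS;
    return static_cast<uint32_t>(v);
}
```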
Tests FW-09 and FW-10 flipped to lock the new behavior; added FW-10b
covering sub-MIN clamping (battery protection if server sends 1000ms).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three cross-referenced markers (in config.h, operation.h, and FW-10
in the test file) calling out that the FETCH_INTERVAL_MS cap intentionally
holds the polling rate at 1 minute for dev iteration. Once the firmware
is stable and we want the device to honor the app's per-frame
rotationIntervalMinutes / wakeHour settings, the cap in operation.h
becomes a sanity-clamp (e.g., 30 s ≤ sleep ≤ 25 h) and the no-header
fallback splits into its own constant.
Behavior unchanged — comments only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reorganizes the tree so adding a new panel is purely additive — drop in a
new src/panels/{vendor}/v{N}/ folder and a new platformio.ini env block,
no surgery to existing files.
Layout:
  src/                         shared across all panels
  src/panels/waveshare73/v1/   V1 driver, version, README
  data/waveshare73-v1/         LittleFS payload at this panel's size
src/config.h still defines the panel-agnostic bits (NVS keys, color
palette, network, sync-fail border) but EPD_WIDTH / EPD_HEIGHT / pin
assignments now come from each env's -D flags. Strict #error guards in
production builds; native tests get the V1 defaults via UNIT_TEST.
build_src_filter per env picks the right driver:
  waveshare73-v1   main         + panels/waveshare73/v1/
  test-display     test_display + panels/waveshare73/v1/
  sim-yellow       sim_border   + panels/waveshare73/v1/
  sim-red          sim_border   + panels/waveshare73/v1/
  native-test      unchanged
When V2 hardware lands, the diff is a new env block, a new
src/panels/waveshare133/v1/epd_driver.cpp, and regenerated screens at
data/waveshare133-v1/. Existing V1 envs stay frozen — re-flashing old
units remains a one-liner.
scripts/gen_screens.py takes --panel to target the correct
data/{panel}/ subfolder; defaults to waveshare73-v1.
29/29 native tests pass. All four hardware envs build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pairs with the server-side header. After streaming the response body to
LittleFS, hash the file with mbedtls/sha256 (hardware-accelerated on
ESP32-S3) and compare against the server's claim. On mismatch:
- Don't update NVS_KEY_IMG_ID, so the next poll reports the old id and
the server sends 200 again with fresh bytes (natural retry, no extra
HTTP round-trip in this cycle).
- Don't draw — panel keeps whatever was up before, no garbage on the
e-ink.
- Raise NVS_KEY_ERR_BORDER so the next healthy 304 paints a clean
recovery frame with the sync-fail border.
Verification is skipped when the header is absent, so the firmware
stays compatible with any server that hasn't deployed the matching
header yet. mbedtls compiles into a native-test no-op stub (returns
empty hex), so existing native tests don't need a SHA implementation.
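
The decision logic, minus the hashing itself, can be sketched as below. The hex digest is taken as an input (computing it with mbedtls is not shown), an empty expected value stands in for the missing header, and Integrity, check_integrity, CycleOutcome, and outcome_for are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class Integrity { Skipped, Match, Mismatch };

// Case-insensitive comparison, like Arduino String::equalsIgnoreCase.
bool equals_ignore_case(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}

Integrity check_integrity(const std::string& expected_hex, const std::string& actual_hex) {
    if (expected_hex.empty()) return Integrity::Skipped;  // header absent: stay compatible
    return equals_ignore_case(expected_hex, actual_hex) ? Integrity::Match
                                                        : Integrity::Mismatch;
}

// What the rest of the cycle does with the result.
struct CycleOutcome {
    bool update_img_id;     // persist the new id in NVS
    bool draw;              // push the file to the panel
    bool raise_err_border;  // set the err-border flag
};

CycleOutcome outcome_for(Integrity result) {
    if (result == Integrity::Mismatch) return {false, false, true};
    return {true, true, false};
}
```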
Two new tests: FW-17a (mismatch path) and FW-17b (missing header
backward compat). Mock String now has equalsIgnoreCase so the new
comparison compiles in native-test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- BORDER_THICKNESS_PX: 16 -> 4. Hardware-tested at 4 px on both yellow
and red; yellow appears slightly thicker due to the irradiation
illusion (perception, not a rendering issue) — not compensating per
color absent an explicit request.
- Add Serial.println at every state transition that touches the
err_border lifecycle: schema migration firing, sync-fail else
branch (with HTTP code, distinguishing border vs full-fill fallback),
304 recovery (with which flags triggered it), and recovery completion
/ abort. Lets us trace why a frame is or isn't showing a border via
pio device monitor without needing to instrument anew each time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two pio envs that build a tiny sketch reading /img.bin from LittleFS and
calling epd_draw_image_with_border with the chosen color. Lets us verify
the actual on-device pixel composition of the sync-fail (yellow) and
no-WiFi (red) borders without standing up a server failure or pulling
the WiFi cable.
Each sim sets NVS err_border=1 before halting, so flashing back to the
normal env afterwards exercises the 304 → clean repaint recovery path
end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without this, devices upgrading from the old buggy fill-on-error firmware
get stuck on yellow forever: the new code reads NVS_KEY_ERR_BORDER == 0
(default — the old firmware never wrote that key), so the next 304 sees
no err flag and skips the redraw. NVS img_id matches what the server is
serving, so server says "you're current" indefinitely.
Add NVS_KEY_SCHEMA_V. On boot, if stored version is below
NVS_SCHEMA_VERSION (currently 1), treat errBorder as set for this cycle
and bump schema_v. The next 304 then redraws from LittleFS (the cached
.bin survives flashing) and clears the flag.
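
A sketch of the one-shot migration using a std::map in place of NVS. The key strings "schema_v" and "err_border" and the helper names are assumptions; a missing key reads as 0, which is how a device coming from the old firmware looks.

```cpp
#include <map>
#include <string>

constexpr int NVS_SCHEMA_VERSION = 1;

using FakeNvs = std::map<std::string, int>;

// Read with default, like a typical NVS getter.
int nvs_get(const FakeNvs& nvs, const std::string& key, int fallback) {
    const auto it = nvs.find(key);
    return it == nvs.end() ? fallback : it->second;
}

// Returns the errBorder value to use for this boot cycle.
bool load_err_border_with_migration(FakeNvs& nvs) {
    bool err_border = nvs_get(nvs, "err_border", 0) != 0;
    if (nvs_get(nvs, "schema_v", 0) < NVS_SCHEMA_VERSION) {
        // Upgrading from old firmware: the panel may be showing a yellow fill,
        // so force one redraw on the next 304, then never again.
        err_border = true;
        nvs["schema_v"] = NVS_SCHEMA_VERSION;
    }
    return err_border;
}
```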
Tests: FW-06f locks in the upgrade path (schema_v missing → redraw on
304). FW-06g asserts the migration is one-shot (post-bump → no redraw
on steady-state 304). FW-06d updated to set schema_v explicitly so it
represents the post-migration steady state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously a 5xx / timeout / malformed response fired epd_fill(COLOR_YELLOW),
which writes the yellow nibble across the entire 800×480 framebuffer and
destroys the last good image — exactly what FR38 forbids ("Last image
persists ... yellow border signals state"). The device then got stuck on a
blank yellow screen because the next 304 didn't redraw.
Changes:
- New epd_draw_image_with_border streams the cached .bin row-by-row,
overwrites border-region pixels in the row buffer, and pushes a single
composited framebuffer (same pattern as the existing setup-QR overlay).
- normal_operation_impl else-branch now redraws the cached image with a
yellow border, falling back to epd_fill only when no cache exists
(first-boot error). Sets a new NVS_KEY_ERR_BORDER flag.
- 200 and 304 paths clear NVS_KEY_ERR_BORDER. The 304 branch now
triggers a clean repaint when the err flag is set, so the device
recovers from the stuck-yellow state on the next healthy poll
without waiting for rotation to advance.
- LittleFS read mock now returns invalid File when the file doesn't
exist (matches real LittleFS), so the no-cache fallback path is
actually exercisable in tests.
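
The per-row compositing step can be sketched as follows, assuming 4-bit pixels packed two per byte with the left pixel in the high nibble. composite_border_row is a hypothetical name and the color code passed in is illustrative; the real driver's palette values are not given in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Overwrites the border-region pixels of one packed row in place.
// row holds width/2 bytes: two 4-bit pixels per byte, left pixel in the high nibble.
void composite_border_row(std::vector<uint8_t>& row, int y, int width, int height,
                          int thickness, uint8_t color) {
    assert(row.size() * 2 >= static_cast<std::size_t>(width));
    // Rows inside the top or bottom band are border across their full width.
    const bool full_row = (y < thickness) || (y >= height - thickness);
    for (int x = 0; x < width; ++x) {
        const bool in_border = full_row || x < thickness || x >= width - thickness;
        if (!in_border) continue;
        uint8_t& byte = row[x / 2];
        if (x % 2 == 0) {
            byte = static_cast<uint8_t>((byte & 0x0F) | ((color & 0x0F) << 4));
        } else {
            byte = static_cast<uint8_t>((byte & 0xF0) | (color & 0x0F));
        }
    }
}
```

The caller would read one row of the cached .bin into the buffer, run this over it, and push the row to the panel, so the image is never held in RAM as a whole.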
Tests:
- Replaces the old test_fw06_error_fills_yellow (which locked in the
buggy fill behavior) with FW-06a..e covering: error+cache draws
border (no fill), error+no-cache falls back to fill, 304 after
error repaints clean, steady-state 304 touches nothing (the
regression the user flagged), 200 after error clears the flag.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three bugs fixed:
- NVS img_id now written before epd_init/draw; new draw_needed flag in NVS
survives power-loss mid-refresh so next boot re-draws from LittleFS instead
of showing stale content
- epd_sleep() now only called when display was initialized this cycle,
preventing a 60 s wait_busy() timeout on every 304 poll
- esp_task_wdt_reset() added to wait_busy() loop so the ~20 s 6-color
refresh no longer triggers the task watchdog
Also extracts normal_operation into operation.h template and adds
a native PlatformIO test suite (16 tests) covering the full response matrix.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace orientation <select> dropdown on setup/configure with the same
visual two-button picker used in the SPA (SVG frame diagrams, ribbon
indicator, active highlight). Hidden input carries the value on submit.
Firmware: normal_operation() now calls show_setup_qr(mac) on 404 instead
of epd_fill(COLOR_RED) — device shows scannable QR with its own MAC when
not yet registered.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Collapse orientation to landscape/portrait (ribbon left = portrait standard)
- Add OrientationPicker component and wire settings sheet in HomeView
- Add password confirmation field to registration form (RepeatedType)
- Build frontend SPA to public/build/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
State machine:
- Boot: check 5s reset-button hold → wipe NVS creds; load saved SSID/pass
- If no creds (or reset): enter_provisioning() — WiFi.softAP + DNS redirect + WebServer
- If creds: attempt_wifi(); on success → normal_operation(); on fail → enter_provisioning()
- normal_operation(): HTTPS GET /api/device/{mac}/image → stream to LittleFS → display;
204 = keep current stored image; 404 = red fill; server error = yellow fill;
deep sleep 15 min between polls
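
The state machine as of this commit can be sketched as two pure functions. BootPath, PanelAction, and both function names are hypothetical, and lumping every other status code into the yellow-fill case is an assumption about what "server error" covers.

```cpp
enum class BootPath { Provisioning, NormalOperation };

// Boot decision: a 5 s reset hold wipes the creds, so it behaves like "no creds".
BootPath decide_boot_path(bool reset_held, bool has_saved_creds, bool wifi_connected) {
    if (reset_held || !has_saved_creds) return BootPath::Provisioning;
    return wifi_connected ? BootPath::NormalOperation : BootPath::Provisioning;
}

enum class PanelAction { DrawNewImage, KeepCurrent, FillRed, FillYellow };

// Response handling for GET /api/device/{mac}/image.
PanelAction action_for_status(int http_status) {
    switch (http_status) {
        case 200: return PanelAction::DrawNewImage;  // stream to LittleFS, then display
        case 204: return PanelAction::KeepCurrent;   // keep current stored image
        case 404: return PanelAction::FillRed;
        default:  return PanelAction::FillYellow;    // server error (assumed catch-all)
    }
}
```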
Provisioning flow:
- AP SSID: "PictureFrame-{last4hex}" broadcast as open network
- QR on e-ink: WIFI:S:PictureFrame-XXXX;T:nopass;; → phone auto-joins AP
- Captive portal: redirect all DNS to 192.168.4.1; serve minimal HTML form
(handles iOS /hotspot-detect.html and Android /generate_204 redirects)
- POST /connect: async — respond immediately, attempt WiFi in loop()
Success: save NVS, show Phase 2 setup QR (green bg) → 2min delay → normal_operation()
Failure: red fill → restart AP
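
A sketch of how the AP name and QR payload fit together, assuming "last4hex" means the last two MAC bytes printed as uppercase hex. ap_ssid and wifi_qr_payload are hypothetical names.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// "PictureFrame-" plus the last two MAC bytes as four hex digits (assumed uppercase).
std::string ap_ssid(const uint8_t mac[6]) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "PictureFrame-%02X%02X",
                  static_cast<unsigned>(mac[4]), static_cast<unsigned>(mac[5]));
    return std::string(buf);
}

// Wi-Fi QR payload for an open network, in the format shown above.
std::string wifi_qr_payload(const std::string& ssid) {
    return "WIFI:S:" + ssid + ";T:nopass;;";
}
```

This simple concatenation is only safe because the generated SSID contains no characters that the WIFI: format requires escaping (such as ';' or ':').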
EPD driver refactor:
- Extracted epd_init/sleep/fill/draw_qr/draw_image_from_file into epd.h + epd.cpp
- epd_draw_qr(): ricmoo/QRCode library; computes modules inline per pixel row
- epd_fill(): solid color in one pass (used for red=no-wifi, yellow=sync-fail)
- epd_draw_image_from_file(): streams LittleFS binary directly to display
Removed: convert_photo.py (pre-rendering moved to server-side Imagick), image.h (PROGMEM array)
Added: config.h, epd.h, epd.cpp; updated platformio.ini (QRCode lib, littlefs fs)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move firmware files from repo root src/ into firmware/ to avoid
collision with Symfony's src/ PHP class directory. Add DDEV
config targeting PHP 8.4 / PostgreSQL 16 / nginx-fpm with
Imagick extension via docker-compose override.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>