Architecture
Six layers, top to bottom. The rule that holds the design together: each layer talks DTOs to the layer below, never raw API dicts.
UI: REPL (prompt_toolkit) | one-shot CLI (argparse)
Dispatch: parse → validate → run → render
Commands: commands/{target,profile,media,network,content,interactions,batch,watch,operational,dossier}.py
Service: facade · history · analytics · exporter · watch
Backends: OSINTBackend ABC · HikerBackend (v0.1) · AiograpiBackend (v0.2)
Models: @dataclass(slots=True) DTOs (Profile, Post, Story, User, Comment, Quota, ...)
Conventions
- Async everywhere. `httpx` (transitive via `hikerapi`), `asyncio` for fan-out, `asyncio.to_thread` for sqlite calls.
- Backend boundary is a hard wall. Raw HikerAPI / aiograpi dicts never leave `backends/`. Mappers in `_hiker_map.py` (and the future `_aiograpi_map.py`) are the only converters.
- Lazy backend imports. `import hikerapi` happens only inside `make_backend("hiker")`. v0.2's `import aiograpi` will be the same. Import errors stay localized.
- Retry / backoff lives in one place. `backends/_retry.py` decorates SDK-method calls inside `HikerBackend`; commands never know retries exist.
- CDN streaming through a single helper. `backends/_cdn.py` is the only code that pulls untrusted bytes off the network. Host allowlist, MIME sniff, byte budget, atomic write: every download passes through it.
- Pagination as `AsyncIterator[T]` + `limit: int | None`. Every collection method is an async generator. Cursor management lives inside the backend; commands consume one item at a time and stop on `limit`.
- Identity by `pk`, not username. Usernames are mutable; `Profile.previous_usernames` accumulates renames. The session caches `username → pk` so a typo fails fast and downstream commands don't re-resolve.
Errors
insto/exceptions.py defines the taxonomy. Every backend error subclasses BackendError:
| Exception | Retryable? | User-visible message via `_format_error` |
|---|---|---|
| `ProfileNotFound` | no | `profile not found: @<user>` |
| `ProfilePrivate` | no | `profile is private: @<user>` |
| `ProfileBlocked` | no | `profile blocked: @<user>` (aiograpi) |
| `ProfileDeleted` | no | `account no longer exists: @<user>` |
| `PostNotFound` / `PostPrivate` | no | similarly direct |
| `AuthInvalid` | no | `auth invalid - refresh your token / re-login` |
| `QuotaExhausted` | no, terminal | `HikerAPI quota exhausted` |
| `RateLimited(retry_after)` | yes | sleeps `retry_after` and retries |
| `Transient` | yes | exponential backoff + jitter |
| `SchemaDrift(endpoint, field)` | no | `schema drift in <endpoint>: missing field "<f>"` |
| `Banned` | no | account-level block (aiograpi) |
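The two retryable rows can be sketched as a decorator. This is an assumption-laden illustration, not the contents of `backends/_retry.py`: the name `with_retry`, its parameters, the attempt cap, and the choice of "full jitter" are all hypothetical.

```python
import asyncio
import functools
import random


class BackendError(Exception): ...


class Transient(BackendError): ...


class RateLimited(BackendError):
    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def with_retry(max_attempts: int = 4, base_delay: float = 0.5, sleep=asyncio.sleep):
    """Retry only RateLimited and Transient; everything else propagates."""

    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return await fn(*args, **kwargs)
                except RateLimited as exc:
                    if attempt == max_attempts:
                        raise
                    await sleep(exc.retry_after)  # honor the server's hint
                except Transient:
                    if attempt == max_attempts:
                        raise
                    # Full jitter: uniform in [0, base_delay * 2**(attempt - 1)].
                    await sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))

        return wrapper

    return decorator
```

Non-retryable errors such as `ProfileNotFound` are not caught here, so they reach the caller on the first attempt. The injectable `sleep` exists only to keep tests fast.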
Commands never catch BackendError themselves. The dispatcher catches everything at the boundary, runs _format_error (which redacts secrets via _redact.redact_secrets), and prints a single line. The same redactor runs in the rotating-file logger so stack traces in ~/.insto/logs/insto.log are also scrubbed.
Sqlite store
All persistent state lives in one DB at ~/.insto/store.db (mode 0600):
_meta schema_version, last_migrated_at
cli_history cmd, target, ts (90-day retention, indexed on ts)
watches user, interval, last_ok, last_error, paused
snapshots target_pk, captured_at, profile_fields_json, last_post_pks_json,
avatar_url_hash, banner_url_hash (30-day retention, max 100/target)
- One `sqlite3.Connection` per session, owned by the facade.
- `asyncio.to_thread` wraps every sync call from async contexts so the event loop never blocks.
- `migrate_to_latest()` runs on startup under `BEGIN IMMEDIATE` so two `insto` processes don't race a schema bump.
- URLs (avatar / banner) are SHA256-hashed before write; diffing checks hash inequality, not the URL.
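A minimal sketch of how these pieces might fit together, assuming a one-column `_meta` table and a made-up `LATEST` version number. `open_store` and `url_hash` are hypothetical names; only `migrate_to_latest` appears in the text, and its real body is not shown there.

```python
import asyncio
import hashlib
import sqlite3

LATEST = 2  # hypothetical schema version


def url_hash(url: str) -> str:
    # Store and compare digests instead of raw URLs.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def migrate_to_latest(conn: sqlite3.Connection) -> int:
    # BEGIN IMMEDIATE takes the write lock up front, so a second process
    # waits (or fails on timeout) instead of racing the version bump.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS _meta (schema_version INTEGER NOT NULL)"
        )
        row = conn.execute("SELECT schema_version FROM _meta").fetchone()
        if row is None:
            conn.execute("INSERT INTO _meta (schema_version) VALUES (?)", (LATEST,))
        elif row[0] < LATEST:
            conn.execute("UPDATE _meta SET schema_version = ?", (LATEST,))
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    return LATEST


async def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None: the explicit BEGIN/COMMIT above are the only
    # transaction control. check_same_thread=False: to_thread runs the
    # call on a worker thread, not the thread that opened the connection.
    conn = sqlite3.connect(path, isolation_level=None, check_same_thread=False)
    await asyncio.to_thread(migrate_to_latest, conn)
    return conn
```

With hashed URLs, a snapshot diff reduces to `old_hash != url_hash(new_url)`.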
Output / export
output/
<user>/
info.json
posts.json
posts/<pk>.<ext>
stories/<pk>.<ext>
highlights/<highlight_pk>/<item_pk>.<ext>
dossier/<iso_ts>/... (one self-contained intel package per /dossier run)
.batch-<sha>.jsonl (per-input-file resume state)
.insto-cdn-budget.lock (per-command 5 GB CDN ceiling)
JSON exports are versioned: every file has {"_schema": "insto.v1", "command": ..., "target": ..., "captured_at": ..., "data": ...}. CSV is flat rows with no envelope. Maltego CSV uses Type, Value, Weight, Properties with Properties JSON-encoded into one column.
mtime of every downloaded media file is set from the source's taken_at so Photos / Finder sort chronologically.
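Setting the timestamp can be done with `os.utime`. The helper name `stamp_mtime` is hypothetical, and this sketch sets the access time to the same value, which the text does not specify.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def stamp_mtime(path: Path, taken_at: datetime) -> None:
    """Set the file's modification time to the media's capture time."""
    ts = taken_at.timestamp()
    os.utime(path, (ts, ts))  # (atime, mtime), both in epoch seconds
```

Passing a timezone-aware `taken_at` (for example `datetime(..., tzinfo=timezone.utc)`) avoids the local-time interpretation that naive datetimes get from `timestamp()`.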
Watch
Session-only in v0.1 (daemon mode is v0.2). /watch <user> <interval> registers an asyncio.Task on the same loop that runs PromptSession.prompt_async(). Each tick is wrapped in asyncio.shield(...) and a single retry; two consecutive failures mark the watch paused. Notifications go through prompt_toolkit.patch_stdout so the user's in-progress input line is not corrupted.
Session limits: max 3 active watches, 5-minute floor on the interval, all watches cancelled cleanly on REPL exit.
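The watch lifecycle might be sketched as below. `WatchRegistry` and its method names are hypothetical, and the sketch reads "two consecutive failures" as two failed ticks in a row, with the next tick acting as the retry; the real implementation may retry within a tick instead.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass

MAX_WATCHES = 3          # session limit from the text
MIN_INTERVAL_S = 5 * 60  # 5-minute floor from the text


@dataclass
class Watch:
    user: str
    interval: float
    failures: int = 0
    paused: bool = False


class WatchRegistry:
    def __init__(
        self,
        tick: Callable[[str], Awaitable[None]],
        min_interval: float = MIN_INTERVAL_S,  # overridable for tests
    ) -> None:
        self._tick = tick
        self._min_interval = min_interval
        self._tasks: dict[str, asyncio.Task] = {}

    def add(self, user: str, interval: float) -> Watch:
        if interval < self._min_interval:
            raise ValueError(f"interval must be >= {self._min_interval}s")
        active = sum(1 for t in self._tasks.values() if not t.done())
        if active >= MAX_WATCHES:
            raise RuntimeError(f"at most {MAX_WATCHES} active watches")
        watch = Watch(user, interval)
        # Requires a running event loop (the REPL's loop, in the text).
        self._tasks[user] = asyncio.create_task(self._run(watch))
        return watch

    async def _run(self, watch: Watch) -> None:
        while not watch.paused:
            await asyncio.sleep(watch.interval)
            try:
                # shield: cancelling the watch does not abort a tick mid-flight.
                await asyncio.shield(self._tick(watch.user))
            except Exception:
                watch.failures += 1
                watch.paused = watch.failures >= 2
            else:
                watch.failures = 0

    async def cancel_all(self) -> None:
        for task in self._tasks.values():
            task.cancel()
        await asyncio.gather(*self._tasks.values(), return_exceptions=True)
        self._tasks.clear()
```

Calling `cancel_all()` on REPL exit and awaiting the gathered tasks is what keeps shutdown free of "task was destroyed but it is pending" warnings.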
Test strategy
- 700+ unit + integration tests, no live API calls in CI.
- Fixtures: one frozen HikerAPI dict per profile-access state (`public`, `private`, `deleted`, `empty`, `schema_drift`).
- `tests/fakes.py:FakeBackend` implements `OSINTBackend` from fixtures with per-method error injection covering every entry of the error taxonomy.
- 3 e2e flows under `tests/e2e/`: subprocess one-shot, prompt_toolkit pty REPL session, `/watch` tick with `patch_stdout` capture.
- Strict mypy + ruff format + ruff lint as CI gates.