Comparison
Anvil vs Maesto
Maesto asks you to write every tap. Anvil writes the taps for you — then ships the findings to Slack, Linear, and GitHub before your CI finishes.
Maesto pioneered the YAML-scripted mobile flow. It's elegant in a tutorial repo and painful on a production backlog: every new flow is another file to maintain, every onboarding change is a schema migration, and every locale multiplies the matrix.
Anvil takes a different approach. Its specs are YAML too, but they describe intent ("onboard a new user and land on the home feed") rather than steps. The autonomous agent decomposes the intent against the live app, recovers from layout drift, and then writes out the step-by-step replay, so the spec stays high-level and the replay stays faithful.
When the App Store rejects your binary, the rejection becomes a permanent spec in Anvil's rejection regression suite, so CI catches the same issue before your next release ships. Maesto has no equivalent. You write the test yourself, from the rejection email, and hope you remember to run it next time.
| Feature | Anvil | Maesto |
|---|---|---|
| Test authoring model | Intent-driven specs + agent | Step-by-step YAML flows |
| AI agent runtime | Yes | No |
| Rejection regression suite (every prior App Store rejection becomes a permanent spec) | Yes | No |
| Adversarial matrix (9-axis pairwise) | 359 specs across device / network / locale / a11y / interrupts | No |
| iOS driver | KID — 82 native verbs | Embedded WebDriver bridge |
| Android driver | KAD — UiAutomator2 + Accessibility | UiAutomator wrapper |
| visionOS / macOS / watchOS | KVD / KMD / KWD — same verb shape | No |
| Real-device orchestration | BYOD + rentable fleet | Local emulators only |
| Perf gating in CI | Baselines + P95 budgets per release | No |
| Customer support | Dedicated Customer Success on every plan | Community forum |
| MCP server (Claude / Cursor / Windsurf) | Yes | No |
| Pricing | Free tier + transparent usage-based pricing | Usage-based |
Why teams switch to Anvil
- Teams migrating from Maesto usually delete 60–80% of their spec files in the first week — the agent subsumes the happy paths.
- The adversarial matrix (9 stressful axes crossed pairwise) catches real-world combinations that scripted tests rarely reach, such as weak network × non-Latin locale × memory pressure × incoming call.
- Drivers are native (Swift, Kotlin, ObjC++), not WebDriver bridges: a real tap, a real accessibility tree, and a real Perfetto trace on every run.
- The rejection suite is the kind of institutional memory most startups only build after their third App Store rejection. Anvil ships it on day one.
See it on your own suite
Free for 100 runs/month. No credit card. No call required to start.