AI Specification-Driven Development

A while ago I created an auth-service backend to give my projects a unified login system. Later, I decided to build a new frontend for it.

With the number of AI tools available today, creating a login screen is not the hard part. The hard part is keeping the project scalable, understandable, and easy to evolve after the first working version exists. I did not want to just start vibe coding and hope the architecture would appear naturally. I wanted to use AI, but I also wanted to stay in the loop.

So this lab became an experiment: can I use specs, plans, ADRs, and reusable Codex skills to guide an AI coding agent toward the system I actually want?

The short answer is yes, but only when the specification work is treated as part of the engineering work, not as documentation written after the code is done.

Starting with the SRS

The first document I created was an SRS, or Software Requirements Specification. The purpose of this file was simple: describe what the frontend should do before asking the AI to implement anything.

The application needed to support:

Sign in with email and password.
Sign up with name, email, password, and password confirmation.
Password recovery and reset flows.
Google and GitHub login options.
Session persistence.
Backend 4xx errors displayed as toasts.
Portuguese and English UI support.
Responsive and accessible authentication screens.

This was the first important lesson: the SRS prevented the AI from treating "login frontend" as a vague prompt. It turned the work into a product boundary.

For example, instead of saying "create a login app", the SRS defined the exact backend headers:

Content-Type: application/json
x-api-key
x-application-id
accept-language

It also defined the environment variables:

VITE_AUTH_API_BASE_URL
VITE_AUTH_API_KEY
VITE_AUTH_APPLICATION_ID

And it described the endpoints the frontend would consume:

| Feature           | Method  | Endpoint                  | Request Body                | Success Response     |
| ---               | ---     | ---                       | ---                         | ---                  |
| Sign in           | `POST`  | `/auth/signin`            | `{ email, password }`       | `{ token, user }`    |
| Sign up           | `POST`  | `/auth/signup`            | `{ name, email, password }` | `{ user }`           |
| Forgot password   | `POST`  | `/auth/forgot-password`   | `{ email }`                 | `{ message }`        |
| Reset password    | `PATCH` | `/auth/reset-password`    | `{ token, new_password }`   | `{ message }`        |
| Request OTP login | `POST`  | `/auth/request-otp-login` | `{ email }`                 | `{ message }`        |
| Verify OTP login  | `POST`  | `/auth/verify-otp-login`  | `{ email, code }`           | `{ token, user }`    |
| Validate token    | `GET`   | `/auth/validate-token`    | none                        | `{ valid: true }`    |
| Refresh token     | `GET`   | `/auth/refresh-token`     | none                        | `{ refreshedToken }` |

That table is not just documentation. It is a way to remove ambiguity before implementation starts.
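One way to carry that contract into code is a small typed module. The sketch below is an illustration under assumptions, not the project's actual source: `AuthConfig`, `AUTH_ENDPOINTS`, `buildAuthHeaders`, and `endpointUrl` are hypothetical names, and the mapping from each environment variable to a header is inferred from the names in the SRS. Only part of the endpoint table is reproduced.

```typescript
// Hypothetical config shape. In a Vite app these values would typically come
// from import.meta.env (VITE_AUTH_API_BASE_URL, VITE_AUTH_API_KEY,
// VITE_AUTH_APPLICATION_ID); they are passed in here so the sketch runs anywhere.
interface AuthConfig {
  baseUrl: string;
  apiKey: string;
  applicationId: string;
}

// A subset of the SRS endpoint table, kept as data so pages never hard-code paths.
const AUTH_ENDPOINTS = {
  signIn: { method: "POST", path: "/auth/signin" },
  signUp: { method: "POST", path: "/auth/signup" },
  forgotPassword: { method: "POST", path: "/auth/forgot-password" },
  resetPassword: { method: "PATCH", path: "/auth/reset-password" },
  validateToken: { method: "GET", path: "/auth/validate-token" },
} as const;

type AuthEndpointName = keyof typeof AUTH_ENDPOINTS;

// Builds the four headers listed in the SRS for a given UI language.
function buildAuthHeaders(config: AuthConfig, language: string): Record<string, string> {
  return {
    "Content-Type": "application/json",
    "x-api-key": config.apiKey,
    "x-application-id": config.applicationId,
    "accept-language": language,
  };
}

// Joins the base URL and the endpoint path, tolerating a trailing slash on the base.
function endpointUrl(config: AuthConfig, name: AuthEndpointName): string {
  const base = config.baseUrl.replace(/\/+$/, "");
  return base + AUTH_ENDPOINTS[name].path;
}
```

Keeping the table as data means a contract change, such as a renamed path, shows up in one place and in code review.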

Turning Architecture into ADRs

After the SRS, I created Architecture Decision Records. These were the first ten:

0001-use-typescript-for-frontend.md
0002-use-feature-based-frontend-architecture.md
0003-use-react-router-and-pages.md
0004-isolate-auth-backend-integration.md
0005-use-react-hook-form-and-zod.md
0006-use-i18next-for-internationalization.md
0007-use-sonner-for-toast-notifications.md
0008-session-management-with-cookie-backed-auth.md
0009-disable-provider-login-until-backend-contract-exists.md
0010-testing-strategy-for-auth-frontend.md

The ADRs helped me define the style of the system before generating the system.

For example, "use feature-based frontend architecture" was not just a preference. It meant the authentication code should be grouped by responsibility instead of becoming a pile of unrelated components. "Isolate auth backend integration" meant API calls should live behind a clear integration layer instead of being scattered across pages.

This matters a lot with AI coding agents. If the architecture is only in your head, the agent has to guess. If the architecture is written down, the agent can follow it, and you can review whether it actually did.
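As a rough illustration of what "isolate auth backend integration" can mean in practice, the sketch below puts every backend call behind one factory. It is a sketch under assumptions: `createAuthApi`, `Transport`, `ApiResult`, and `toToastMessage` are hypothetical names, the `{ message }` error body is assumed, and the generic fallback text is invented for the example.

```typescript
// Minimal response shape so the sketch does not depend on the DOM `Response` type.
interface TransportResponse {
  status: number;
  json(): Promise<unknown>;
}

interface TransportRequest {
  method: "GET" | "POST" | "PATCH";
  path: string;
  body?: unknown;
}

// The only place that knows how bytes reach the backend (fetch, a mock, etc.).
type Transport = (request: TransportRequest) => Promise<TransportResponse>;

type ApiResult<T> = { ok: true; data: T } | { ok: false; toast: string };

const GENERIC_ERROR = "Something went wrong. Please try again.";

// Assumed policy: show the backend message for 4xx, a generic message otherwise.
function toToastMessage(status: number, body: unknown): string {
  const message =
    typeof body === "object" && body !== null
      ? (body as { message?: unknown }).message
      : undefined;
  if (status >= 400 && status < 500 && typeof message === "string") {
    return message;
  }
  return GENERIC_ERROR;
}

function createAuthApi(transport: Transport) {
  async function send<T>(request: TransportRequest): Promise<ApiResult<T>> {
    const response = await transport(request);
    const body = await response.json();
    if (response.status >= 200 && response.status < 300) {
      return { ok: true, data: body as T };
    }
    return { ok: false, toast: toToastMessage(response.status, body) };
  }

  return {
    signIn: (email: string, password: string) =>
      send<{ token: string; user: unknown }>({
        method: "POST",
        path: "/auth/signin",
        body: { email, password },
      }),
    forgotPassword: (email: string) =>
      send<{ message: string }>({
        method: "POST",
        path: "/auth/forgot-password",
        body: { email },
      }),
  };
}
```

Pages would call `api.signIn(...)` and never build a request themselves, which also makes it easy to review whether the agent respected the boundary.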

Design Guidelines Before Screens

The next step was visual direction. I did not want the UI to be invented screen by screen, so I looked at the reference documents frontend projects commonly use for this:

DESIGN_SYSTEM.md
UI_GUIDELINES.md
FRONTEND_STANDARDS.md
STYLE_GUIDE.md
DESIGN_TOKENS.md
BRAND_GUIDELINES.md

For this project I chose UI_GUIDELINES.md.

That document became the bridge between architecture and interface. It described how the screens should feel, how form states should behave, and how the authentication experience should stay consistent between English and Portuguese.

This is where spec-driven development becomes more practical than abstract. The goal is not to write documents forever. The goal is to give the agent enough context that each implementation step has a clear direction.

Specs, Plans, and Staying in the Loop

With the SRS, ADRs, and UI guidelines ready, I could have tried one huge prompt to generate the whole project. I avoided that.

Speed was not the only purpose of this lab. I wanted to improve my architecture habits and learn how to work with AI without giving up control of the system. So I created the application spec by spec, then plan by plan.

My usual flow became:

Write or review the spec.
Ask Codex to ask me questions when context was missing.
Convert the spec into a plan.
Execute the plan.
Review the result.
Adjust the plan or create a reusable skill when the pattern would appear again.

During this process I created skills that captured project-specific behavior:

labs-login-router
labs-component
labs-page

The important part was not the skill names themselves. The important part was turning repeated project decisions into reusable instructions. A component skill, for example, can encode how components should be structured, named, and connected to the existing UI conventions.

That made later prompts smaller and safer because I did not need to explain the same rules every time.

When the AI Made a Mistake

Specs help, but they do not remove the need for review.

At one point I updated the structure to add a routes folder. Codex accidentally placed it under lib/routes, producing a structure like this:

src/
├── assets/
├── components/
│   └── ui/
├── features/
│   └── auth/
│       ├── components/
│       ├── hooks/
│       ├── api.ts
│       └── types.ts
├── hooks/
├── lib/
│   └── routes/
├── pages/
├── App.tsx
└── main.tsx

The lib/routes directory was out of place in this project. Routes belonged near the application routing structure, not inside shared library utilities.

This was a useful mistake. It showed me that spec-driven development is not "write a document and stop thinking." It is closer to creating a stronger feedback loop. The agent can move fast, but I still need to inspect whether the implementation matches the architecture.

I fixed the structure manually and adjusted the plan so the same confusion would not continue.

Discovering a Backend Gap

While building the frontend, I realized the backend was not complete for Google and GitHub provider login. The SRS expected provider login options, but the backend did not yet expose the full OAuth contract.

Instead of patching the frontend around an incomplete backend, I went back to the spec-driven process and created backend work:

docs/specs/session-with-github-and-google.md
docs/plans/session-with-github-and-google.md

After that backend work, I had new routes available:

router.post(
  "/auth/oauth/exchange",
  requireApiKey,
  requireApplicationId,
  AuthController.exchangeOAuthCode.bind(AuthController)
);

router.post(
  "/auth/oauth/:provider/authorize",
  requireApiKey,
  requireApplicationId,
  AuthController.authorizeOAuth.bind(AuthController)
);

router.get(
  "/auth/oauth/:provider/callback",
  AuthController.handleOAuthCallback.bind(AuthController)
);

Then I returned to the frontend and created another spec and plan for that integration:

docs/specs/0009-google-and-github-integration.md
docs/plans/0009-google-and-github-integration.md

This is one of the biggest benefits I felt during the lab: specs made the dependency between frontend and backend visible. The missing provider routes were not just a bug. They were a contract gap.
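On the frontend side, the new routes can be mirrored as typed helpers so that an unsupported provider is rejected before any request is made. This is a minimal sketch with hypothetical names (`OAUTH_PROVIDERS`, `isOAuthProvider`, `oauthAuthorizePath`); it assumes the `:provider` parameter accepts exactly the two providers named in the SRS, and it does not model request or response bodies because the source does not specify them.

```typescript
// Providers named in the SRS; treating these as the only valid `:provider`
// values is an assumption of this sketch.
const OAUTH_PROVIDERS = ["google", "github"] as const;
type OAuthProvider = (typeof OAUTH_PROVIDERS)[number];

// Type guard for untrusted input such as a URL segment or a button attribute.
function isOAuthProvider(value: string): value is OAuthProvider {
  return (OAUTH_PROVIDERS as readonly string[]).indexOf(value) !== -1;
}

// Path of the authorize route registered on the backend router.
function oauthAuthorizePath(provider: OAuthProvider): string {
  return `/auth/oauth/${provider}/authorize`;
}

// Path of the code exchange route registered on the backend router.
const OAUTH_EXCHANGE_PATH = "/auth/oauth/exchange";
```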

Adding Life to the UI

After the main flows were implemented, the app still felt unfinished. It worked, but it needed better visual details.

I added icons for language switching, password visibility, providers, and loading states:

src/components/ui/Icons/BrazilIcon.tsx
src/components/ui/Icons/EyeIcon.tsx
src/components/ui/Icons/EyeSlashIcon.tsx
src/components/ui/Icons/GithubIcon.tsx
src/components/ui/Icons/GoogleIcon.tsx
src/components/ui/Icons/LoadingIcon.tsx
src/components/ui/Icons/USAIcon.tsx

I also refined layout references, added a logout button, and improved small interaction states.

These details matter because authentication is usually the first experience a user has with a product. A login system can be technically correct and still feel careless if buttons do not show loading states, provider buttons look unfinished, or password visibility controls are awkward.

Loading State as a Spec, Not a Patch

One problem I found late in the project was that buttons did not show a loading state while asynchronous calls were running.

The easy fix would be to add loading logic directly wherever I noticed the issue. Instead, I treated it as a behavior pattern and created a dedicated skill:

.codex/skills/labs-integration-backend/SKILL.md

Then I created a spec and plan for the refactor:

docs/specs/0011-loading-state-backend-integration.md
docs/plans/0011-loading-state-backend-integration.md

This may sound heavy for a loading spinner, but the goal was consistency. Loading state is not just an icon. It affects disabled states, double submissions, user feedback, and async error handling.

When the same behavior appears in multiple forms, it deserves a pattern.
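A minimal sketch of such a pattern, assuming a hypothetical `createSubmitGuard` helper rather than the project's actual skill output: it tracks whether a request is in flight, notifies the UI so buttons can be disabled, and ignores a second submission while the first is still running.

```typescript
type LoadingListener = (isLoading: boolean) => void;

// Wraps async submissions so a form has one source of truth for "loading".
function createSubmitGuard(onChange: LoadingListener = () => {}) {
  let inFlight = false;

  return {
    isLoading: (): boolean => inFlight,

    // Runs the task unless another one is already running.
    // Resolves to undefined when the call was ignored as a double submission.
    async run<T>(task: () => Promise<T>): Promise<T | undefined> {
      if (inFlight) return undefined;
      inFlight = true;
      onChange(true);
      try {
        return await task();
      } finally {
        // Always reset, so a failed request does not leave the button stuck.
        inFlight = false;
        onChange(false);
      }
    },
  };
}
```

In a React form, `onChange` could feed a state setter that drives both the spinner and the `disabled` attribute. Errors thrown by the task still reach the caller, so error handling stays with the form.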

Testing Strategy

I left the tests until close to the end of this first version. I trust automated tests, especially for behavior that spans frontend and backend, but I did not want the first test setup to depend on my backend running perfectly.

So I chose Cypress and mocked the backend for the first version.

The test work also received its own spec and plan:

docs/specs/0010-testing-strategy.md
docs/plans/0010-testing-strategy.md

I also created a sub-agent under .agents:

labs-automated-tests.md

The purpose of that sub-agent was to help generate tests as new features were added. This is an area I want to keep improving, because tests are where AI can be useful but also dangerous. A generated test that only checks happy paths can create false confidence.

For this app, the most important test cases were:

Required field validation.
Invalid email validation.
Password confirmation mismatch.
Backend 4xx errors shown through toasts.
Generic behavior for 5xx errors.
Language switching.
OAuth redirect and callback behavior.
Disabled or loading states during requests.
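The first three cases in that list can be pinned down as a pure function before any browser test exists. The project's ADRs choose React Hook Form and Zod for this; the sketch below deliberately uses plain TypeScript instead so it has no dependencies. `SignUpForm`, `validateSignUp`, and the error codes are hypothetical, and the email pattern is a simple approximation, not a full address validator.

```typescript
interface SignUpForm {
  name: string;
  email: string;
  password: string;
  passwordConfirmation: string;
}

// Error values are meant as translation keys, not user-facing text.
type SignUpErrors = Partial<Record<keyof SignUpForm, string>>;

// Deliberately loose: something@something.something with no spaces.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function validateSignUp(form: SignUpForm): SignUpErrors {
  const errors: SignUpErrors = {};
  const fields: (keyof SignUpForm)[] = ["name", "email", "password", "passwordConfirmation"];

  // Required field validation.
  for (const field of fields) {
    if (form[field].trim() === "") {
      errors[field] = "required";
    }
  }

  // Invalid email validation, only when the field is not already flagged.
  if (!errors.email && !EMAIL_PATTERN.test(form.email)) {
    errors.email = "invalid_email";
  }

  // Password confirmation mismatch.
  if (!errors.passwordConfirmation && form.password !== form.passwordConfirmation) {
    errors.passwordConfirmation = "password_mismatch";
  }

  return errors;
}
```

A generated test suite that covers only the empty-errors case would miss every branch above, which is the false confidence described earlier.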

Challenges

The biggest challenge was not writing specs. It was ordering them correctly.

At first I trusted the order Codex generated for the ADRs. That created some cost. For example, docs/adrs/0006-use-i18next-for-internationalization.md came after a lot of implementation had already happened. Internationalization should have influenced the project earlier because it affects labels, validation messages, backend headers, and UI layout.

The same happened with docs/adrs/0010-testing-strategy-for-auth-frontend.md. Testing strategy was last, but it should have been closer to the top. Once I noticed that, I moved testing higher in the spec process.

Another mistake was sometimes asking Codex to execute code directly from a spec. It was not a disaster, but for important features, creating a plan first produced better code and made review easier.

My takeaway is simple: a spec defines intent, but a plan defines execution. For small changes, a prompt can be enough. For architectural work, the plan is worth it.

Future Improvements

There are still improvements I want to make:

Move toward backend-owned HttpOnly cookies for stronger session security.
Create a React Native version to test whether the same specs, plans, and skills transfer well to mobile.
Integrate the frontend with another backend to validate how reusable the authentication flow really is.
Improve automated tests so every new auth feature comes with meaningful coverage instead of only happy-path checks.

The cookie topic is especially important. A frontend can store a token in a regular cookie, but it cannot create a true HttpOnly cookie. For that, the backend must own the session cookie flow. That is a better long-term direction for security.
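To make the difference concrete, here is a sketch of the header value a backend could send when it owns the session. `buildSessionCookie`, the cookie name, and the chosen attributes are assumptions for illustration, not the auth-service's current behavior.

```typescript
interface SessionCookieOptions {
  name: string;
  maxAgeSeconds: number;
}

// Builds the value of a Set-Cookie response header. The HttpOnly attribute is
// only honored when it arrives in a server response; a script writing to
// document.cookie cannot create a cookie with it.
function buildSessionCookie(token: string, options: SessionCookieOptions): string {
  return [
    `${options.name}=${encodeURIComponent(token)}`,
    "HttpOnly", // not readable from JavaScript
    "Secure", // only sent over HTTPS
    "SameSite=Lax",
    "Path=/",
    `Max-Age=${options.maxAgeSeconds}`,
  ].join("; ");
}
```

With a cookie like this, the frontend would stop storing the token itself and would only need to send requests with credentials included.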

Conclusion

Making an AI coding agent follow exactly what you are thinking is the hard part. The more clearly you describe your architecture, contracts, and expected behavior, the easier it becomes for the agent to produce useful code.

Spec-driven development is not a strict rule. I still used small prompts, manual fixes, and direct refinements when they made sense. But the combination of SRS, ADRs, UI guidelines, specs, plans, and skills made the work much more reliable than pure vibe coding.

The biggest benefit was staying in the loop. I could review each plan, execute one piece at a time, test features between implementations, and make focused commits. That gave me speed without giving up ownership of the architecture.
