FitCheck
An AI stylist that helps you decide what to wear
As a PM, I tend to measure my life in metrics, but there was one that refused to move: Wardrobe Utilization — the percentage of clothes I actually wear versus what I own. My closet kept getting bigger, but my “real” outfits didn't. FitCheck is the product I built to change that.
FitCheck uses vision AI to catalog your clothes from selfies, then an LLM stylist to generate outfit combinations scored for color harmony, material synergy, and silhouette balance — reducing morning decision fatigue and improving wardrobe utilization.
Wardrobes That Don't Turn into Outfits
Talking to students and young professionals confirmed a pattern: people spend 5–15 minutes every morning deciding what to wear, and the problem gets worse on work days, before events, and while traveling. Three root causes kept surfacing:
Decision fatigue
Most people default to the same “safe” outfits, leaving a large fraction of their closet effectively unused.
Underused closets
People forget what they own, can't see how pieces go together, and fall back on the same combinations again and again.
Shopping tools, not styling tools
Existing fashion apps mostly recommend what to buy, not how to style what you already own. Users wanted more value from pieces already in their closets.
Core insight: Outfit selection is a combinatorial problem. The challenge is turning a messy wardrobe into a set of confident outfit decisions.
From Closet Mess to Structured Data
Before the AI could style anything, it needed to know what was in the closet. I broke the problem into two systems: the Memory (a digital wardrobe that knows your clothes) and the Brain (an AI stylist that knows what goes together).
Building the digital wardrobe
Asking users to shoot flat-lay photos of every item might be realistic for a power user, but it's a terrible onboarding ask for an MVP. Instead, the system works from one selfie at a time.
Vision pipeline (not just “image classification”)
Rather than asking “which model classifies garments best?”, the better question was “what information do I actually need downstream?” That reframing unlocked a simpler, more robust flow:
- User takes a selfie in an outfit
- The browser converts the image to base64 and sends it to a Supabase edge function
- The edge function calls Cohere's Vision API with a tailored prompt to identify category (top, bottom, outer layer, shoes) and attributes (fabric, pattern, color, warmth score)
- The model returns structured JSON describing each garment
- Items are stored in a `wardrobe_items` table with metadata on formality and style
- Lovable AI generates clean, product-style cutouts so the wardrobe UI stays consistent
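To make the "structured JSON" step concrete, here is a minimal sketch of what the vision output might look like and how to validate it before writing to the `wardrobe_items` table. The interface fields and the `parseGarment` helper are illustrative assumptions, not the actual schema or API response.

```typescript
// Hypothetical shape of the structured JSON the vision step returns.
// Field names are illustrative, not FitCheck's real schema.
interface WardrobeItem {
  category: "top" | "bottom" | "outer" | "shoes";
  fabric: string;
  pattern: string;
  color: string;
  warmthScore: number; // e.g. 1 (linen) .. 10 (down parka)
  formality: number;   // metadata the stylist uses later
}

// Defensive parse: model output can drift, so validate before inserting.
function parseGarment(raw: unknown): WardrobeItem | null {
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  const categories = ["top", "bottom", "outer", "shoes"];
  if (!categories.includes(r.category as string)) return null;
  if (typeof r.warmthScore !== "number") return null;
  return r as unknown as WardrobeItem;
}
```

Rejecting malformed items at this boundary keeps every downstream layer (filtering, generation, scoring) free to trust its inputs.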
What I learned
- Metrics like Wardrobe Utilization don't just show progress; they expose when you're optimizing the wrong side of the equation.
- The right abstraction (“vision pipeline” instead of “perfect classifier”) mattered more than picking the “best” model.
- When I hit technical walls as a non-IC engineer, the breakthrough came from reframing the problem, not learning a new model.
Building the Brain of a Stylist
Once the digital wardrobe worked, the hard part began: teaching the system to think like a stylist, not a random outfit generator.
The combinatorial explosion
Even a modest closet explodes into possibilities. With 15 tops, 10 bottoms, and 8 layering pieces, there are over 1,200 theoretical outfits. Many are unwearable. The system needed a way to explore creatively but with guardrails.
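The arithmetic behind that claim is simple: the three categories multiply, and the space grows further once a layer is optional.

```typescript
// Outfit space for the modest closet described above:
// tops × bottoms × layers, plus combinations with no layer at all.
const tops = 15, bottoms = 10, layers = 8;

const threePiece = tops * bottoms * layers;              // 1,200
const withOptionalLayer = tops * bottoms * (layers + 1); // 1,350
```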
My ride-hailing background helped here. Matching riders and drivers is also a large possibility space: you filter, generate candidates, and score. I reused the same mental model — constrain the space, let the model explore, apply deterministic scoring.
Three-layer architecture
Layer 1 — Constraint layer
This mirrors how humans actually get dressed. You prune mentally by:
- Weather — no shorts when it's 40°F
- Availability — exclude what's in the laundry
- Recency — avoid repeating the same piece on consecutive days
- Occasion — casual, business, athletic, evening, outdoor
These are hard constraints, handled deterministically before any LLM call.
Layer 2 — Generative layer
With the filtered pool (~30 viable items), the goal is exploration without hard-coding taste. Taste is encoded as metadata, not brittle rules.
- The LLM (Gemini 2.5 Flash) sees structured attributes like fabric type, silhouette scores, and visual-weight tags
- Candidate pools are capped (e.g., 12 tops, 10 bottoms, 8 other pieces) to keep prompts token-efficient
- For each request, the LLM proposes outfits within those structured boundaries
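The pool-capping step can be sketched as grouping the filtered items by category and keeping only the strongest candidates in each group. The caps mirror the numbers in the text; the scoring field and category mapping are illustrative assumptions.

```typescript
// Cap candidate pools per category so the prompt stays token-efficient.
// Caps echo the text: 12 tops, 10 bottoms, 8 other pieces.
interface PoolItem { id: string; category: string; score: number }

const CAPS: Record<string, number> = { top: 12, bottom: 10, other: 8 };

function capPools(items: PoolItem[]): PoolItem[] {
  const byCat = new Map<string, PoolItem[]>();
  for (const it of items) {
    // Anything that isn't a top or bottom counts against the "other" cap.
    const key = CAPS[it.category] !== undefined ? it.category : "other";
    const arr = byCat.get(key) ?? [];
    arr.push(it);
    byCat.set(key, arr);
  }
  const out: PoolItem[] = [];
  for (const [cat, arr] of byCat) {
    arr.sort((a, b) => b.score - a.score); // keep the strongest candidates
    out.push(...arr.slice(0, CAPS[cat]));
  }
  return out;
}
```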
Layer 3 — Deterministic validator
This layer takes the LLM's creative suggestions and scores each outfit along three dimensions:
- Color harmony (40%) — penalize harsh clashes, reward complementary and cohesive palettes
- Material synergy (30%) — avoid awkward pairings (e.g., silk top with athletic mesh shorts)
- Silhouette balance (30%) — maintain intentional contrast (fitted top + wide-leg pants) versus unbalanced proportions
The LLM gets room to be creative. The validator enforces coherence.
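The validator's scoring rule reduces to a weighted sum over the three dimensions. A minimal sketch, assuming each sub-score is already normalized to [0, 1] by upstream heuristics:

```typescript
// Deterministic validator: weighted sum of the three style dimensions.
// Sub-scores are assumed normalized to [0, 1]; weights match the text.
interface OutfitScores {
  colorHarmony: number;      // 40%
  materialSynergy: number;   // 30%
  silhouetteBalance: number; // 30%
}

const WEIGHTS = {
  colorHarmony: 0.4,
  materialSynergy: 0.3,
  silhouetteBalance: 0.3,
};

function scoreOutfit(s: OutfitScores): number {
  return (
    s.colorHarmony * WEIGHTS.colorHarmony +
    s.materialSynergy * WEIGHTS.materialSynergy +
    s.silhouetteBalance * WEIGHTS.silhouetteBalance
  );
}
```

Keeping the weights in one object makes them easy to retune later — which matters once save/skip signals start informing the validator.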
Finishing touches
- Deduplication — If two outfits share the same core pieces, only the highest-scoring variant is kept
- Sort & cap — Results are sorted by score and capped at five, so the user sees a tight set of strong options
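Both finishing touches fit in one small post-processing function. The outfit shape below is illustrative; the dedupe key simply treats two outfits with the same core pieces (in any order) as duplicates.

```typescript
// Post-processing: dedupe by core pieces, keep the best variant,
// then sort by score and cap at five results.
interface Outfit { pieceIds: string[]; score: number }

function finalize(outfits: Outfit[], cap = 5): Outfit[] {
  const best = new Map<string, Outfit>();
  for (const o of outfits) {
    // Same core pieces in any order → same key → considered a duplicate.
    const key = [...o.pieceIds].sort().join("|");
    const prev = best.get(key);
    if (!prev || o.score > prev.score) best.set(key, o);
  }
  return [...best.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, cap);
}
```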
Visually, the interface strips backgrounds and presents outfits the way influencers do: clean, curated, and easy to evaluate at a glance.
What's Working and What's Not (Yet)
This is still very much V1, and I'm explicit about the gaps.
Current friction points
- Latency spikes — When calls slow down, the experience shifts from “instant stylist” to “waiting on a spinner.” Better caching, precomputation, and graceful fallbacks are needed.
- Occasion nuance — Mapping real-life contexts to a single “occasion” label is too blunt. Some outfits are technically valid but feel socially off.
- Broken feedback loop — Saves and skips are logged, but those signals don't yet feed into validator weights or occasion scoring in an automated way.
- Sample bias — So far, it's mostly “closet-tested” on a narrow dataset. Performance across different body types, climates, and style identities hasn't been validated.
These are the questions I'm deliberately leaving open for the next iteration.
How I'll Measure Success
To move beyond “it feels good,” I instrumented four core signals:
- Recommendation Acceptance Rate (RAR) — Outfits saved ÷ outfits generated. Target: 60%+.
- Wardrobe Utilization (WU) — Unique items worn this month ÷ total wardrobe size. Baseline: 0.03, goal: 0.12 — a 4× improvement.
- Decision velocity — Time from “app open” to “outfit selected.” Target: dropping from ~15 minutes to under 2 minutes.
- Veto rate — Percentage of high-scoring outfits still rejected. This captures the “truth gap” between what the algorithm thinks works and what actually feels right.
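The first two metrics are simple ratios, which makes them easy to instrument. A sketch with hypothetical helper names, using the baseline and goal figures from the text:

```typescript
// Hypothetical metric helpers; names and guards are illustrative.
function rar(saved: number, generated: number): number {
  return generated === 0 ? 0 : saved / generated; // Recommendation Acceptance Rate
}

function wardrobeUtilization(uniqueWorn: number, totalItems: number): number {
  return totalItems === 0 ? 0 : uniqueWorn / totalItems;
}

// Baseline from the text: 3 of 100 items worn → WU 0.03.
// Goal: 12 of 100 → WU 0.12, a 4× improvement.
```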
That “truth gap” is where the next version lives — deciding how much of my own bias to encode and how much to let the system learn over time.