FitCheck
An AI stylist that helps you decide what to wear
As a PM, I tend to measure my life in metrics, but there was one that refused to move: Wardrobe Utilization — the percentage of clothes I actually wear versus what I own. My closet kept getting bigger, but my “real” outfits didn't. FitCheck is the product I built to change that.
FitCheck uses vision AI to catalog your clothes from selfies, then an LLM stylist to generate outfit combinations scored for color harmony, material synergy, and silhouette balance — reducing morning decision fatigue and improving wardrobe utilization.
Wardrobes That Don't Turn into Outfits
Talking to students and young professionals confirmed a pattern: people spend 5–15 minutes every morning deciding what to wear, and the problem gets worse on work days, before events, and while traveling. Three root causes kept surfacing:
Decision fatigue
Most people default to the same “safe” outfits, leaving a large fraction of their closet effectively unused.
Underused closets
People forget what they own, can't see how pieces go together, and fall back on the same combinations again and again.
Shopping tools, not styling tools
Existing fashion apps mostly recommend what to buy, not how to style what you already own. Users wanted more value from pieces already in their closets.
Core insight: Outfit selection is a combinatorial problem. The challenge is turning a messy wardrobe into a set of confident outfit decisions.
From Closet Mess to Structured Data
Before the AI could style anything, it needed to know what was in the closet. I broke the problem into two systems: the Memory (a digital wardrobe that knows your clothes) and the Brain (an AI stylist that knows what goes together).
Building the digital wardrobe
Asking users to shoot flat-lay photos of every item might be realistic for a power user, but it's a terrible onboarding ask for an MVP. Instead, the system works from one selfie at a time.
Vision pipeline (not just “image classification”)
Rather than asking “which model classifies garments best?”, the better question was “what information do I actually need downstream?” That reframing unlocked a simpler, more robust flow:
- User takes a selfie in an outfit
- The browser converts the image to base64 and sends it to a Supabase edge function
- The edge function calls Cohere's Vision API with a tailored prompt to identify category (top, bottom, outer layer, shoes) and attributes (fabric, pattern, color, warmth score)
- The model returns structured JSON describing each garment
- Items are stored in a `wardrobe_items` table with metadata on formality and style
- Lovable AI generates clean, product-style cutouts so the wardrobe UI stays consistent
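To make the "structured JSON" step concrete, here is a minimal sketch of what the vision output might look like and how to validate it before writing to the `wardrobe_items` table. The interface fields and the `parseGarment` helper are illustrative assumptions, not the actual schema or API response.

```typescript
// Hypothetical shape of the structured JSON the vision step returns.
// Field names are illustrative, not FitCheck's real schema.
interface WardrobeItem {
  category: "top" | "bottom" | "outer" | "shoes";
  fabric: string;
  pattern: string;
  color: string;
  warmthScore: number; // e.g. 1 (linen) .. 10 (down parka)
  formality: number;   // metadata the stylist uses later
}

// Defensive parse: model output can drift, so validate before inserting.
function parseGarment(raw: unknown): WardrobeItem | null {
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  const categories = ["top", "bottom", "outer", "shoes"];
  if (!categories.includes(r.category as string)) return null;
  if (typeof r.warmthScore !== "number") return null;
  return r as unknown as WardrobeItem;
}
```

Rejecting malformed items at this boundary keeps every downstream layer (filtering, generation, scoring) free to trust its inputs.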
What I learned
- Metrics like Wardrobe Utilization don't just show progress; they expose when you're optimizing the wrong side of the equation.
- The right abstraction (“vision pipeline” instead of “perfect classifier”) mattered more than picking the “best” model.
- When I hit technical walls as a non-IC engineer, the breakthrough came from reframing the problem, not learning a new model.
Building the Brain of a Stylist
Once the digital wardrobe worked, the hard part began: teaching the system to think like a stylist, not a random outfit generator.
The combinatorial explosion
Even a modest closet explodes into possibilities. With 15 tops, 10 bottoms, and 8 layering pieces, there are over 1,200 theoretical outfits. Many are unwearable. The system needed a way to explore creatively but with guardrails.
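The arithmetic behind that claim is simple: the three categories multiply, and the space grows further once a layer is optional.

```typescript
// Outfit space for the modest closet described above:
// tops × bottoms × layers, plus combinations with no layer at all.
const tops = 15, bottoms = 10, layers = 8;

const threePiece = tops * bottoms * layers;              // 1,200
const withOptionalLayer = tops * bottoms * (layers + 1); // 1,350
```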
My ride-hailing background helped here. Matching riders and drivers is also a large possibility space: you filter, generate candidates, and score. I reused the same mental model — constrain the space, let the model explore, apply deterministic scoring.
Three-layer architecture
Layer 1 — Constraint layer
This mirrors how humans actually get dressed. You prune mentally by:
- Weather — no shorts when it's 40°F
- Availability — exclude what's in the laundry
- Recency — avoid repeating the same piece on consecutive days
- Occasion — casual, business, athletic, evening, outdoor
These are hard constraints, handled deterministically before any LLM call.
Layer 2 — Generative layer
With the filtered pool (~30 viable items), the goal is exploration without hard-coding taste. Taste is encoded as metadata, not brittle rules.
- The LLM (Gemini 2.5 Flash) sees structured attributes like fabric type, silhouette scores, and visual-weight tags
- Candidate pools are capped (e.g., 12 tops, 10 bottoms, 8 other pieces) to keep prompts token-efficient
- For each request, the LLM proposes outfits within those structured boundaries
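The pool-capping step can be sketched as grouping the filtered items by category and keeping only the strongest candidates in each group. The caps mirror the numbers in the text; the scoring field and category mapping are illustrative assumptions.

```typescript
// Cap candidate pools per category so the prompt stays token-efficient.
// Caps echo the text: 12 tops, 10 bottoms, 8 other pieces.
interface PoolItem { id: string; category: string; score: number }

const CAPS: Record<string, number> = { top: 12, bottom: 10, other: 8 };

function capPools(items: PoolItem[]): PoolItem[] {
  const byCat = new Map<string, PoolItem[]>();
  for (const it of items) {
    // Anything that isn't a top or bottom counts against the "other" cap.
    const key = CAPS[it.category] !== undefined ? it.category : "other";
    const arr = byCat.get(key) ?? [];
    arr.push(it);
    byCat.set(key, arr);
  }
  const out: PoolItem[] = [];
  for (const [cat, arr] of byCat) {
    arr.sort((a, b) => b.score - a.score); // keep the strongest candidates
    out.push(...arr.slice(0, CAPS[cat]));
  }
  return out;
}
```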
Layer 3 — Deterministic validator
This layer takes the LLM's creative suggestions and scores each outfit along three dimensions:
- Color harmony (40%) — penalize harsh clashes, reward complementary and cohesive palettes
- Material synergy (30%) — avoid awkward pairings (e.g., silk top with athletic mesh shorts)
- Silhouette balance (30%) — maintain intentional contrast (fitted top + wide-leg pants) versus unbalanced proportions
The LLM gets room to be creative. The validator enforces coherence.
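The validator's scoring rule reduces to a weighted sum over the three dimensions. A minimal sketch, assuming each sub-score is already normalized to [0, 1] by upstream heuristics:

```typescript
// Deterministic validator: weighted sum of the three style dimensions.
// Sub-scores are assumed normalized to [0, 1]; weights match the text.
interface OutfitScores {
  colorHarmony: number;      // 40%
  materialSynergy: number;   // 30%
  silhouetteBalance: number; // 30%
}

const WEIGHTS = {
  colorHarmony: 0.4,
  materialSynergy: 0.3,
  silhouetteBalance: 0.3,
};

function scoreOutfit(s: OutfitScores): number {
  return (
    s.colorHarmony * WEIGHTS.colorHarmony +
    s.materialSynergy * WEIGHTS.materialSynergy +
    s.silhouetteBalance * WEIGHTS.silhouetteBalance
  );
}
```

Keeping the weights in one object makes them easy to retune later — which matters once save/skip signals start informing the validator.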
Finishing touches
- Deduplication — If two outfits share the same core pieces, only the highest-scoring variant is kept
- Sort & cap — Results are sorted by score and capped at five, so the user sees a tight set of strong options
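Both finishing touches fit in one small post-processing function. The outfit shape below is illustrative; the dedupe key simply treats two outfits with the same core pieces (in any order) as duplicates.

```typescript
// Post-processing: dedupe by core pieces, keep the best variant,
// then sort by score and cap at five results.
interface Outfit { pieceIds: string[]; score: number }

function finalize(outfits: Outfit[], cap = 5): Outfit[] {
  const best = new Map<string, Outfit>();
  for (const o of outfits) {
    // Same core pieces in any order → same key → considered a duplicate.
    const key = [...o.pieceIds].sort().join("|");
    const prev = best.get(key);
    if (!prev || o.score > prev.score) best.set(key, o);
  }
  return [...best.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, cap);
}
```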
Visually, the interface strips backgrounds and presents outfits the way influencers do: clean, curated, and easy to evaluate at a glance.
What's Working and What's Not (Yet)
This is still very much V1, and I'm explicit about the gaps.
Current friction points
- Latency spikes — When calls slow down, the experience shifts from “instant stylist” to “waiting on a spinner.” Better caching, precomputation, and graceful fallbacks are needed.
- Occasion nuance — Mapping real-life contexts to a single “occasion” label is too blunt. Some outfits are technically valid but feel socially off.
- Broken feedback loop — Saves and skips are logged, but those signals don't yet feed into validator weights or occasion scoring in an automated way.
- Sample bias — So far, it's mostly “closet-tested” on a narrow dataset. Performance across different body types, climates, and style identities hasn't been validated.
These are the questions I'm deliberately leaving open for the next iteration.
How I'll Measure Success
To move beyond “it feels good,” I instrumented four core signals:
- Recommendation Acceptance Rate (RAR) — Outfits saved ÷ outfits generated. Target: 60%+.
- Wardrobe Utilization (WU) — Unique items worn this month ÷ total wardrobe size. Baseline: 0.03, goal: 0.12 — a 4× improvement.
- Decision velocity — Time from “app open” to “outfit selected.” Target: dropping from ~15 minutes to under 2 minutes.
- Veto rate — Percentage of high-scoring outfits still rejected. This captures the “truth gap” between what the algorithm thinks works and what actually feels right.
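The first two metrics are simple ratios, which makes them easy to instrument. A sketch with hypothetical helper names, using the baseline and goal figures from the text:

```typescript
// Hypothetical metric helpers; names and guards are illustrative.
function rar(saved: number, generated: number): number {
  return generated === 0 ? 0 : saved / generated; // Recommendation Acceptance Rate
}

function wardrobeUtilization(uniqueWorn: number, totalItems: number): number {
  return totalItems === 0 ? 0 : uniqueWorn / totalItems;
}

// Baseline from the text: 3 of 100 items worn → WU 0.03.
// Goal: 12 of 100 → WU 0.12, a 4× improvement.
```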
That “truth gap” is where the next version lives — deciding how much of my own bias to encode and how much to let the system learn over time.