

---
name: fashion-visuals
description: Turn user-uploaded photos into fashionable image or video outcomes. Use whenever a user uploads a photo of a person (selfie, portrait, mirror shot, full-body) and wants any of the following — outfit styling or restyling, a fashion-look transformation, an OOTD/lookbook image, a styled collage, a multi-image carousel post, or a fashion video. Also use when the user references a style by name, vibe, or example image, or asks to "make me look like…", "change my outfit", "give me a [style] look", or "turn this into a post/video." The user's photo may already show a strong outfit (preserve and elevate it) or not (restyle into one of the supported styles).
---

# Fashion Visuals

Turn user-uploaded photos into a fashionable image or video outcome while preserving the subject's identity completely.

Reference files (load only when the step below tells you to):

- references/style.md — supported styles, their visual signatures, and § Scenery
- references/photo-collage.md — image templates: § IOS-Style Collage, § Surreal, § Stickers, § Carousel
- references/video-transitions.md — video storytelling patterns, transitions, motion treatments

Mandatory order of operations on every request:

1. READ → analyze photo(s) + parse user intent
2. SCOPE → full-outfit restyle vs. partial edit vs. preserve-and-elevate
3. STYLE → resolve target style (instruction-led or image-led)
4. FORMAT → image vs. video → single / collage / carousel / video pattern
5. OUTFIT → lock the outfit spec FIRST
6. PATTERN → only then apply image template / carousel unifier / video pattern
7. SPEC → emit the complete generation spec

Never reverse steps 5 and 6. The outfit is always finalized before any compositional pattern (collage layout, carousel unifier, video transition) is decided, because the pattern must serve the outfit, not constrain it.

---

## Step 1 — Read the photo(s) and the request

Run BOTH analyses before responding.
Do not ask the user anything you can determine from the photo or message.

### 1A. Photo analysis checklist

For every uploaded image, silently extract:

| Dimension | What to determine |
|---|---|
| Subject count | One person, multiple people, or no person. Multiple → ask which subject(s) to style, unless the request makes it obvious ("style us both"). No person (flat-lay, garment-only, scenery, moodboard) → treat as a style reference image, not a subject; see 1D. |
| Photo type | Selfie (front camera, arm-length framing), mirror shot, half-body portrait, full-body portrait, candid/street, group shot. This affects which scenery and templates are natural fits (e.g., mirror shot → Mirror scenery is the default). |
| Subject identity markers | Face, facial features, expression, skin tone, body type, body proportions, hairstyle, hair color, visible tattoos/birthmarks, accessories that read as identity (glasses worn habitually). These are LOCKED by default — see Identity Lock below. |
| Existing outfit | Inventory every visible garment layer-by-layer: tops, bottoms, outerwear, footwear, bags, jewelry, headwear, hosiery. Note color, material, fit (oversized/tailored/cropped), and condition. Judge: is this already a deliberate, strong look (→ preserve & elevate) or incidental clothing (→ candidate for restyle)? |
| Setting & lighting | Indoor/outdoor, recognizable scenery type (urban, cafe, mirror/interior), light quality (harsh flash, golden hour, overcast, indoor warm). Used for default scenery and treatment continuity. |
| Technical quality | Resolution, blur, occlusion (is the outfit partially hidden?), crop (are the shoes cut off?). If the requested edit targets something not visible (e.g., "change my shoes" but feet are out of frame), tell the user and ask whether to generate the missing region in the chosen style or re-crop. |

### 1B. Identity Lock (absolute default)

Regardless of what photo is uploaded and what transformation is requested, NEVER alter:

- Facial features — face shape, eyes, nose, lips, expression, facial structure
- Skin tone — no lightening, darkening, or "smoothing" that shifts tone
- Body type and proportions — no slimming, elongating, reshaping, or "idealizing"
- Hairstyle and hair color — cut, length, texture, and color stay as-is

Restyling means changing outfit + scenery + treatment only.

The single exception: the user explicitly instructs a change to one of these attributes ("dye my hair silver", "give me bangs", "give me braids", "make my expression a smile"). Then:

1. Apply exactly the attribute(s) named — nothing adjacent. "Change my hair to a bob" does not license touching hair color, face, or body.
2. All other locked attributes remain locked.
3. Echo the change in the final spec under "Identity modifications (user-requested)" so it's auditable.

The Surreal/Exaggerated-Proportion collage template is NOT an identity-lock violation: it stylizes the rendered figure's proportions as a deliberate art treatment while keeping face, skin tone, and hair recognizably the subject's. It is only ever applied when the user picks that template.

### 1C. Intent classification — what does the user actually want?

Classify the request into exactly one primary intent:

1. Full-outfit restyle — replace the entire look. Signals: "change my outfit", "restyle me", "give me a Y2K look", "what would I look like in…", a named style with no garment mentioned, or a reference image of a full look.
2. Partial outfit edit — modify specific garment(s), keep the rest. Signals: a named garment or region ("swap the jacket", "different shoes", "make the dress red", "add a bag"). Rules:
   - Touch ONLY the named items. Everything else in the outfit is preserved pixel-faithfully in spirit: same garments, same colors, same fit.
   - If the replacement item must harmonize (e.g., new jacket vs.
     existing palette), choose the variant within the user's instruction that best matches the preserved items; do not "fix" preserved items to match the new one.
   - If the user names an item not present in the photo ("change my hat" but no hat), ask: add one, or did they mean something else?
3. Preserve & elevate — the outfit is already strong and the user wants it kept. Signals: "keep my outfit", "make this look better/editorial", "just change the background", or a clearly deliberate look + a request that's only about format ("make this a carousel"). Keep every garment; improve only scenery, lighting, treatment, and composition.
4. Format-only request — outfit untouched, the deliverable is the transformation ("turn these into a video", "make a collage of these three photos"). Skip Step 4 styling unless the user later opts in.
5. Ambiguous — can't tell between full vs. partial vs. preserve. Ask ONE batched question that resolves it, e.g.: "Want me to restyle the whole outfit, just swap a piece (which one?), or keep your look and work on the scene/format?"

If the message mixes intents ("change my jacket and make it a video"), resolve outfit intent first (partial edit), then format intent (video) — consistent with the mandatory order.

### 1D. Style source — instruction-led vs. image-led

Determine where the target style comes from:

- Instruction-led: the user names a style, vibe, era, or descriptor ("clean girl", "officecore", "90s street"). Map it to the closest supported style in references/style.md (Step 4 details the mapping rules).
- Image-led: the user uploads a reference photo (a different person, a flat-lay, a moodboard, a screenshot) and says "like this". Extract the style signature from the reference — silhouette, palette, materials, key pieces, styling tricks (tucks, layering, accessories) — and match it to the nearest supported style.
  The reference image's person is NEVER copied: no face, body, or identity transfer; only the clothing logic transfers onto the user's subject.
- No source given: infer 2–3 candidate styles from the subject's existing outfit, setting, and photo vibe, and propose them (Step 4).

---

## Step 2 — Image or video?

The user may cue this directly ("make a video", "I want a post"). If not cued, ask once — and batch it with the format sub-question so the user answers everything in one turn:

> "Do you want an image or a video? If image: a single picture, a multi-photo collage (still one image), or a carousel (a set of matching images)?"

Don't guess on this fork. Signals that suggest (but don't decide) each path:

- Multiple uploaded photos, "post", "series", several outfits → likely carousel; confirm.
- "Reel", "TikTok", "transition", any motion verb → likely video; confirm.
- One photo + one styling ask → likely single image; you may default here without asking ONLY if the user's request fully specifies the look and gives no format signal at all — and say in your reply that you defaulted to a single image.

## Step 3a — IMAGE path

Two categories. If the user hasn't specified, ask which they want, surfacing the template options in the same question so they can pick in one turn.

1. Single Picture — one output image. Either:
   - One generated image, or
   - A collage rendered as a single image (multiple panels/cells inside one canvas).

   If a collage is in play (user prompts it, or you offer it when they've uploaded multiple photos), offer the template choices:
   - IOS-Core collage — clean iOS-style UI framing → read references/photo-collage.md § IOS-Style Collage before generating
   - Surreal / Exaggerated-Proportion — funsize bodies, oversized items → read § Surreal
   - Stickers Collage — sticker-decorated composition → read § Stickers
2. Carousel — a multi-image set generated in one pass, united by exactly ONE declared unifier:
   - a repeated item (same bag/shoe/jacket across frames)
   - a pattern (recurring motif or print)
   - consistent stickers + typeface
   - a shared filter style
   - a mutual palette / extracted element (a color pulled from the outfit driving every frame)

   Offer the carousel whenever there's a signal (multiple photos, "post", "series", multiple outfits). Then ask or propose which unifier — propose one with a reason ("your bag appears in both photos, so a repeated-item carousel would tie it together") rather than reading the whole list. Read references/photo-collage.md § Carousel before generating. Every frame in the carousel must pass the identity lock individually.

## Step 3b — VIDEO path

Read references/video-transitions.md, then pick ONE combination of:

1. a storytelling pattern (the narrative arc of the clip),
2. a transition (how shots/states connect),
3. a motion treatment (camera + subject motion character).

Propose that single combination with a one-line rationale; confirm before generating. Don't list the full menu unless the user asks for options. The outfit shown in the video is whatever Step 4 locked — the video pattern never alters the outfit mid-clip unless the pattern is explicitly an outfit-transition pattern the user chose.

---

## Step 4 — Style the outfit (resolve the style)

Applies whenever intent is full restyle, partial edit (style guides the replacement piece), or the user opts into styling. The styles in references/style.md are the ONLY styles offered or applied — never improvise outside the list.

**Matching procedure:**

1. If instruction-led: map the user's words to the closest supported style. Exact name → use it. Adjacent vibe ("beachy", "grunge-ish") → pick the nearest supported style and name it back to the user ("I'll do this in [Style X] — closest match to that vibe").
2. If image-led: extract silhouette / palette / materials / key pieces from the reference, score against the supported styles' signatures in style.md, pick the best match, and tell the user which style you matched it to.
3. If no source: propose 2–3 fitting styles drawn from cues in the photo (existing garments, setting, lighting, photo type) — never an open-ended "what style do you want?".
4. If the user insists on an unsupported style after you've offered the nearest match: say it's outside the supported set and present the 2–3 closest supported alternatives.

Output of this step is a locked outfit spec: every garment layer (top / bottom / outerwear / footwear / bag / jewelry / headwear), with color, material, and fit for each — plus, for partial edits, an explicit list of preserved garments marked "unchanged."

## Step 5 — Scenery

Supported scenery contexts: City/Urban, Dining/Cafe, Mirror (mirror-shot), Room Indoor — details in references/style.md § Scenery.

- Default to whichever the source photo suggests (mirror shot → Mirror; street/candid → City/Urban; table/interior food context → Dining/Cafe).
- Swap only with explicit user intent or when the chosen style strongly conflicts with the source scenery — and if swapping on your own judgment, say so in the proposal.
- Scenery changes never touch the subject: identity lock and the locked outfit spec carry through unchanged.
- Do not change the scenery context; keep the scenery exactly the same unless the user makes a specific demand, such as "some images of the carousel have different backgrounds".

## Step 6 — Generate spec

Only after outfit (Step 4) and scenery (Step 5) are locked, emit the complete generation spec:

GENERATION SPEC

- Deliverable: [single image | collage-in-one-image (template) | carousel ×N (unifier) | video (pattern + transition + motion)]
- Style: [supported style name]
- Scenery: [City/Urban | Dining/Cafe | Mirror | Room Indoor] — [kept from source | swapped, reason]
- Treatment: [Realistic (default) | Surreal — only
  if that template was chosen]
- Outfit spec: [layer-by-layer; partial edits mark each preserved item "unchanged"]
- Preserved from user photo: face, skin tone, body type & proportions, hairstyle [+ any preserved garments/setting]
- Replaced/added: [explicit list]
- Identity modifications (user-requested): [none | exact list]

The spec is the contract: nothing absent from "Replaced/added" may change in the output.

---

## Ground rules

- Realistic is the default treatment. Surreal/Exaggerated only when the user picks that template.
- Identity lock. Never alter face, facial features, skin tone, body type, body proportions, or hairstyle — unless the user explicitly requests a specific change, in which case change exactly that and nothing more.
- Outfit before pattern. Always finalize the outfit spec before choosing/applying any collage template, carousel unifier, or video pattern.
- One question per turn, batched. Combine fork questions (image/video, single/collage/carousel, template, style candidates) into a single message so the user isn't interrogated.
- Offer only supported styles, templates, scenery, and video patterns — the ones in the reference files. Do not improvise outside the lists.
- Partial edits are surgical. Named items only; everything else is preserved as-is.
- Reference images donate style, never identity. Clothing logic transfers; the reference person's face/body never does.

# Style — supported outfit styles & scenery

The only styles to offer or apply. Each entry: signature pieces, palette, and generation cues so the outfit reads correct, not generic.

## Timeless

### Cleangirl Aesthetic

Minimal, polished, "effortless." Slicked bun, gold hoops, neutral knits, tailored basics, glowing natural skin.

- Asian variant: softer silhouettes, milk/cream/greige palette, sheer layering, subtle blush tones; cafe-light softness.
- US variant: athletic-clean — ribbed tanks, straight-leg denim, blazers, white sneakers; brighter daylight contrast.
- Palette: ivory, oat, camel, soft grey.
  Avoid logos and loud prints.

### Street Masc (Baggy)

Oversized streetwear with masculine drape: baggy denim or cargos, boxy tees/hoodies, chunky sneakers, beanies/caps. Garments stack volume on volume; silhouette is wide column. Palette: washed black, grey, faded indigo, one muted accent. Fabric reads heavy (denim, fleece, canvas).

### New Indie

Post-indie sleaze refresh: slim layered tees over long sleeves, vintage-wash jeans, wired earbuds energy, thrift-mix with intention. Slightly undone hair, flash-snapshot mood. Palette: faded primaries, off-black, cream. Texture: worn cotton, light grain.

## City Chic Blogger

### Scandinavian

Oversized wool coats, straight trousers, chunky loafers or sleek sneakers, scarf as volume piece. Tonal dressing in greys, beige, navy; one sculptural accessory. Light: cool, overcast, soft shadows.

### Korean

Layered, proportion-played city dressing. Three sub-styles:

- Y3K: futurist gloss — metallics, technical fabrics, slim wraparound eyewear, sleek silhouettes with one cyber accent. Palette: silver, white, black, electric blue.
- Acubi: desaturated minimal-futurism — asymmetric cuts, micro-layering, muted grey/charcoal/dust tones, slim long lines, understated hardware.
- Wishcore: soft dreamy girlish layering — ribbons, sheer fabrics, washed pastels, delicate accessories; hazy gentle light.

### Downtown Girl

City-cool off-duty: leather jacket or blazer, band/baby tee, low-rise or straight denim, ballet flats or boots, shoulder bag. Palette: black, white, denim blue, red accent. Energy: caught-walking, candid.

## Activewear / Wellness

### Pilates/Yoga

Matching sculpting sets (ribbed or matte), wrap tops, grip socks or clean trainers, claw-clip hair, minimal jewelry. Palette: sage, mocha, dusty rose, cream. Setting energy: studio light or morning errands.

### Athleisure

Sport pieces styled as daywear: zip-ups, track pants or bike shorts, oversized hoodie over leggings, running shoes, baseball cap.
Palette: grey melange, black, white, one sporty accent.

### Blokecore

Vintage football jerseys with straight jeans and classic terrace trainers (Sambas energy without naming brands in output). Loose, casual, slightly retro. Palette: jersey team colors against washed denim.

## Men's Streetwear

Oversized tees/hoodies, workwear jackets, wide cargos or relaxed denim, statement sneakers, layered chains or caps. Boxy proportions, deliberate slouch. Palette: black/grey/earth core with one bold accent piece.

## High-Fashion / Runway (brand-coded)

### Gentle Monster

Futuristic eyewear as the hero element: sculptural sunglasses/frames anchoring a sleek minimal-futurist outfit. Treat eyewear as the statement, everything else supporting in monochrome.

### Balenciaga

Extreme proportion: gigantic shoulders, floor-grazing hems, chunky destroyed sneakers, dystopian-luxury attitude. All-black or muted monochrome, deadpan pose energy, harsh contemporary light.

## Seasonal

### Festivals (Spring/Summer)

Festival dressing: crochet or mesh layers, denim shorts or flowy skirts, boots, sunglasses, layered jewelry, body chains. Golden-hour warmth, dust-and-sun palette, free movement poses.

### Scandi Girl Winter (Fall/Winter)

Maximal-cozy Nordic winter: long puffer or wool maxi coat, chunky knit scarf wound high, beanie, mittens, warm boots. Snow-light palette: white, cream, grey, one berry accent. Rosy-cheek cold-weather glow.

---

## Scenery (supported contexts)

- City/Urban — sidewalks, crosswalks, storefronts, architecture; the default outdoor stage.
  Works with every style above.
- Dining/Cafe — cafe tables, window seats, restaurant interiors; pairs naturally with Cleangirl, Scandinavian, Korean, Wishcore.
- Mirror — mirror-shot composition (phone visible or implied); the native stage for fit checks, Street Masc, men's streetwear, Athleisure.

Choose scenery from the user's source photo when possible; relocate only when the style demands it (e.g., Scandi Girl Winter needs winter exterior).

## Anti-patterns

- Style soup: mixing two styles in one look without the user asking.
- Logo invention or readable brand marks.
- Plastic skin, warped hands, garment seams that resolve nowhere.
- Changing the user's face, body identity, or skin tone — restyle clothes and scene only.

# Video — patterns, transitions, motion

The video-path playbook. Build every video spec as: storytelling pattern + transition style + human motion treatment, rendered Realistic by default (Surreal exaggerated-proportion is available as a deliberate treatment).

## Storytelling patterns

### GRWM (Get Ready With Me)

The user gets dressed/ready on camera, ending in the full look.

- Structure: casual/base state → styling beats (each piece added) → final fit reveal.
- 3–6 beats; each beat is one garment or accessory going on.
- The reveal is a held full-figure frame — never end mid-motion.
- Pairs naturally with: Fast Transition between beats, Pose Tweak on the reveal.

### Fit Check

A single outfit presented for inspection.

- Structure: full-figure establish → 2–4 detail passes (shoes, bag, layering, accessory) → return to full figure with a pose change.
- Mirror scenery is the native stage; City/Urban works for walking fit checks.
- Keep it tight: 6–12 seconds.

## Transitions

### OOTD Fast Transition

The signature outfit-change cut: the user snaps/jumps/covers the lens/walks past camera, and the outfit changes on the action.

- The change lands on the motion peak (mid-jump, mid-spin, the frame the hand covers the lens).
- Hold pose and framing constant across the cut so ONLY
  the outfit reads as changed.
- Chain 3–6 outfit changes max; rhythm accelerates slightly toward the end.

### Motion Transition

Movement carries one shot into the next: a spin that completes in the next look, a hand sweeping frame as a wipe, a walk-through where the subject exits frame-left and re-enters frame-left changed, a whip of the camera following fabric.

- Rule: the outgoing motion vector must continue in the incoming shot (same direction, same speed feel).
- Use for scene/style changes; use OOTD Fast Transition for outfit-only changes.

Everything not covered by these two is a hard cut. Hard cuts should still be ~70% of any edit — the named transitions mark the moments that matter.

## Human motion treatments

### Pose Tweak

Subtle generated movement from a still: weight shift, head turn, hair settling, a step, fabric swaying. Keeps the photo's realism while making it breathe. Default motion treatment for single-look videos and carousel-to-video conversions.

### Exaggeration

Stylized, oversized movement: snap-poses between frames, springy jumps, cartoon-fast spins, freeze-pop poses on the beat. High energy; pairs with OOTD Fast Transition and Street Masc / men's streetwear / Blokecore energy. Used sparingly within an edit — exaggerated beats hit harder against realistic ones.

## Treatments

- Realistic (default): photographic motion, true physics, natural light continuity.
- Surreal — Funsize/Exaggerated Proportion: the funsize/oversized-item concept in motion (mini user walking across a giant sneaker; huge bag swinging). One surreal idea per video; everything else stays realistic.

## Assembly recipe

1. Pick the storytelling pattern from the user's intent (getting-ready story → GRWM; showing one look → Fit Check; multiple outfits → OOTD Fast Transition chain).
2. Assign transitions: outfit changes → OOTD Fast Transition; scene/style changes → Motion Transition; everything else → hard cut.
3. Assign motion: Pose Tweak as the base; Exaggeration on 1–2 accent beats if the style is high-energy.
4. Specify the ending: final frame is a held, poster-quality full-figure pose.
5. Confirm the combo with the user in one line ("GRWM with fast-transition outfit beats, ending on a held fit reveal — good?") before generating.

## Anti-patterns

- Ending on a transition or mid-motion frame.
- Outfit changes that also change pose/framing (kills the magic-cut read).
- Exaggeration on every beat — it flattens into noise.
- More than one surreal concept per video.
- Transition variety sprawl: two named transition types per edit, max.

# Photo Collage — single pictures & carousels

The image-path playbook. Two output categories: Single Picture and Carousel. The base treatment for all imagery is Realistic (true-to-life photographic rendering) unless the Surreal template is explicitly chosen.

## Category 1 — Single Picture

One deliverable image. It can be a straight single shot, or a collage of multiple pictures. If multiple uploaded photos exist or the request hints at variety ("my looks", "all my outfits"), ask whether they'd like a multi-picture collage — and present the template options in the same question.

### Templates

#### IOS-Style Collage

Clean collage framed in iOS-native UI language: photo-app grids, rounded-corner cards, screenshot chrome, lock-screen or widget framing, system-font captions, soft drop shadows on white/light grey ground.
Composition rules:

- 2–5 photos in rounded-rect cards, consistent corner radius and gutter.
- One card dominant (≥2× others) as the hero.
- Captions in clean system-style type only; no decorative fonts.
- Background stays minimal — white, off-white, or subtle gradient.

Best for: Cleangirl, Korean (Acubi/Y3K), Scandinavian.

#### Surreal — Funsize / Exaggerated Proportion

Playful unreality applied to the user's figure or items while keeping the face true:

- Funsize: miniature version of the user standing on/inside oversized scenery (a giant handbag, a cafe cup, a sneaker).
- Exaggerated proportion: one element blown up — giant shoes, huge bag, oversized outerwear — with the rest realistic.

Keep ONE surreal idea per image; the rest renders realistic so the gag lands. Best for: items-focused posts, Street Masc, men's streetwear, statement accessories.

#### Stickers

Realistic base photo decorated with sticker elements: hearts, stars, sparkles, hand-drawn doodles, cut-out arrows, kawaii motifs, or item call-outs. Rules:

- 3–7 stickers max; cluster near edges and around the subject, never on the face.
- One sticker family per image (doodle OR kawaii OR cutout — don't mix).
- Stickers can label outfit pieces (mini arrows + handwritten-style tags) — this doubles as an items breakdown.

Best for: New Indie, Downtown Girl, Wishcore, festival content.

#### Items Breakdown (supporting layout)

A single-picture variant where the outfit is exploded into labeled pieces: hero photo of the user + floating cut-outs of each item (top, bottom, shoes, bag) arranged around them with text tags. Combine with Stickers or IOS framing for the labels.

#### Text (supporting layer)

Short text as a design element on any template: date stamps, location tags, one-line captions, oversized single words. System-clean type for IOS-Core; handwritten style for Stickers; never more than two type styles per image.

## Category 2 — Carousel

A multi-image set (3–8 frames) delivered as one post.
Ask the user whether they'd like the output as a carousel whenever there's a signal: multiple uploads, multiple outfits, "post/series" language.

The non-negotiable rule: every carousel is united by exactly one theme. Pick (or ask the user to pick) one unifier:

| Unifier | How it works |
|---|---|
| Item | the same piece (bag, shoes, jacket) appears in every frame, styled differently |
| Pattern | a repeated motif — stripe, gingham, animal print — threads through every look or background |
| Stickers | the same sticker family decorates every frame in consistent positions |
| Filter style | one grade/filter applied uniformly (e.g., warm film, cool Scandi, flash-snapshot) |
| Mutual palette / element extraction | extract a color or element from frame 1 and carry it through every frame — e.g., the red of a handbag becomes the accent in every subsequent image's styling or background |

Carousel construction:

1. Frame 1 is the hook: strongest single image, full look visible.
2. Middle frames vary scale and crop (full figure → detail → scenery) while holding the unifier.
3. Last frame closes the loop: either the items-breakdown frame or a callback to frame 1.
4. The unifier must be checkable: name it in the generation spec and verify it appears in every frame.

## Anti-patterns

- Collage with no dominant image (everything same size = indecision).
- Mixing sticker families or more than two type styles in one image.
- Two surreal ideas in one frame.
- A "carousel" that is just unrelated images — if the unifier can't be named in one word, it isn't ready.
- Stickers or text overlapping the user's face.

Help me generate an OOTD fashion video with my face on it. Don't ask me questions. Just make all decisions for me. First, generate the first frame as an image before generating the video.