May 2026 · 9 min read

Putting Five Minds in a Candle-Lit Room

What would Baldwin actually say about AI? I built Salon to find out — a room where five generated agents debate your question in real time, remember what was said, and quietly walk closer to whoever challenged them.

Generative Agents · Multi-Agent · MeDo · ERNIE · Research


It started with a question I couldn't stop asking the air, late at night: what would James Baldwin actually say about AI if he were alive to see it? I wanted to put him in a room with Vincent van Gogh, a working therapist, an engineer who builds the things people are afraid of, and a Stoic philosopher who would tell them all to calm down. I wanted to listen.

The closest tool we have for that is reading. The next closest is asking a chatbot to pretend, which always feels thin. I wanted something between a research paper, an oil painting, and a video game — a real evening where minds I admire could argue in front of me, and I could lean in close to one and ask a question, and the others could overhear and answer back.

Stanford's paper, made walkable

The foundation is Stanford's Generative Agents paper: the idea that a small group of LLM-driven NPCs, given memory streams and reflection cycles, can form an emergent social world. The paper showed Smallville, a sandbox town of 25 agents in the style of The Sims. I wanted to put that in a browser tab where five characters debate a real question and you can stand in the room with them.

Salon convenes 3 to 5 minds of your choosing — historical figures, fictional characters, archetypes, or people you knew — around a candle-lit virtual table to debate any question you bring. Each participant gets an oil-painting portrait generated to look like them, a distinct TTS voice chosen to fit their archetype, and a persona conditioned on real quotes pulled from across the web.

Two views of the same evening

There are two ways to experience a salon. The classic view shows an octagonal table with portraits around it and exchange cards sliding in as each agent speaks. The Walk the Salon view is a top-down RPG room: a candle-lit ballroom you can move through as a custom avatar, approach any participant, listen to their inner thoughts hovering above them, and eavesdrop when two of them are whispering to each other near the fire. Click on the central candle and it speaks back — a cryptic line about what the room is becoming.

Classic view: five portraits around the octagonal table, with the chronicle scroll on the right edge.

Walk the Salon: a top-down RPG view of the candle-lit ballroom you can move through, with thought bubbles and whisper-pairs near the fire.

Memory streams and reflection cycles

The hardest layer was the agent architecture. Each NPC accumulates memories of what was said and how they felt about it. Every 8 exchanges or 90 seconds, a reflection cycle fires: each participant reads their twenty most-important memories and updates their view of the topic and of every other participant. The relationship valence drives their movement in the explore view, so an NPC who has been challenged will walk over to the challenger, and one who has grown alienated will quietly step away from the table.
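
To make that trigger concrete, here is a minimal Python sketch of the reflection clock and the top-twenty memory selection. It is not the code MeDo generated. The thresholds (8 exchanges, 90 seconds, 20 memories) come from the description above; the class names, the numeric `importance` score, and the way it is assigned are my assumptions for illustration.

```python
import heapq
from dataclasses import dataclass, field

REFLECT_EVERY_EXCHANGES = 8    # from the post
REFLECT_EVERY_SECONDS = 90.0   # from the post
TOP_MEMORIES = 20              # from the post


@dataclass
class Memory:
    text: str
    importance: float  # hypothetical score; how Salon scores memories is not specified
    feeling: str = ""  # how the agent felt about what was said


@dataclass
class Agent:
    name: str
    memories: list = field(default_factory=list)

    def remember(self, text, importance, feeling=""):
        self.memories.append(Memory(text, importance, feeling))

    def top_memories(self, k=TOP_MEMORIES):
        # The k most important memories, most important first.
        return heapq.nlargest(k, self.memories, key=lambda m: m.importance)


@dataclass
class ReflectionClock:
    exchanges_since: int = 0
    last_reflection_at: float = 0.0  # seconds on whatever clock the caller uses

    def record_exchange(self):
        self.exchanges_since += 1

    def due(self, now):
        # Fire on whichever comes first: 8 exchanges or 90 seconds.
        return (
            self.exchanges_since >= REFLECT_EVERY_EXCHANGES
            or now - self.last_reflection_at >= REFLECT_EVERY_SECONDS
        )

    def reset(self, now):
        self.exchanges_since = 0
        self.last_reflection_at = now
```

In a loop like this, the orchestrator would call `record_exchange()` after each turn, check `due(now)`, hand each agent's `top_memories()` to the model for the reflection prompt, and then call `reset(now)`.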

After a few minutes of simulation, the NPCs are not in a chat log — they are at a table, with views of one another that shifted over time.
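
One way to picture how valence could turn into movement is the Python sketch below. Everything specific in it is an assumption on my part: valence as a number in [-1, 1], the alienation cutoff, the step size, and the rule that an agent walks toward whoever they feel most strongly about. The shipped logic may differ.

```python
import math

ALIENATION_THRESHOLD = -0.5  # hypothetical cutoff on mean valence
STEP = 1.0                   # hypothetical distance moved per tick


def step(pos, target, distance=STEP, away=False):
    """Move from pos toward target, or directly away from it."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return pos
    if away:
        dx, dy = -dx, -dy
    else:
        distance = min(distance, length)  # do not overshoot the target
    return (pos[0] + dx / length * distance, pos[1] + dy / length * distance)


def next_position(pos, valences, positions, table_center):
    """valences: name -> this agent's valence toward that participant."""
    if not valences:
        return pos
    mean = sum(valences.values()) / len(valences)
    if mean < ALIENATION_THRESHOLD:
        # Alienated from the room as a whole: drift away from the table.
        return step(pos, table_center, away=True)
    # Otherwise approach whoever provokes the strongest feeling,
    # for example the participant who just challenged this agent.
    strongest = max(valences, key=lambda name: abs(valences[name]))
    return step(pos, positions[strongest])
```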

Agents reacting to each other: the relationship graph forming in real time.

Built entirely inside MeDo

I built Salon entirely inside MeDo through multi-turn chat. The very first prompt was a paragraph asking for a web app called Salon and what it should feel like. MeDo generated the database schema, the React frontend, every edge function, the orchestration loop, and the UI components.

I worked with MeDo the way you would work with a quick-witted but distracted collaborator. I learned to write prompts that named the invariant (what must always be true), the entity (what is being made), and the boundary (what must not change). The best results came from being specific to a fault — exact hex codes, exact font weights, exact word counts.
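
As a rough illustration of that prompt shape, here is a small Python helper that assembles the three named parts plus the exact values. The labels and the example values in the usage note are hypothetical; MeDo takes free-form chat, so this only shows the structure I kept returning to.

```python
def build_prompt(entity, invariant, boundary, specifics=None):
    """Assemble a change request that names entity, invariant, and boundary."""
    lines = [
        f"Entity: {entity}",
        f"Invariant (must always be true): {invariant}",
        f"Boundary (must not change): {boundary}",
    ]
    if specifics:
        lines.append("Exact values:")
        lines.extend(f"- {name}: {value}" for name, value in specifics.items())
    return "\n".join(lines)
```

A call such as `build_prompt("exchange card", "a card never fades while its audio is playing", "the octagonal table layout", {"font weight": "500", "max words": "60"})` yields a four-part prompt with nothing left for the model to guess.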

The plugins that made it real

ERNIE for persona generation and exchanges, authenticated with a Baidu AI Studio access token after bring-your-own Qianfan auth turned out to be unworkable from my region.
AI Search and Google Scholar to ground each participant in real quotes attributable to them.
Kling Image Generation (Omni endpoint, after Lite failed three turns in a row) for the oil-painting portraits.
LemonFox TTS for distinct voices and Whisper for the speech-to-text mic button.
Google Translation to ship Mandarin, French, and Spanish UI end to end, including chronicle entries and synthesis paragraphs.

End-to-end multilingual: the same salon rendered in Mandarin, French, and Spanish.

What broke first

Audio pacing collided with the exchange cadence on the first build. The orchestrator was firing the next speaker every six seconds regardless of whether the previous person had finished talking, so audio piled up and cards faded mid-sentence. I had to amend the cadence to wait for the current audio plus a one-second pause, with the six-second timer as a floor.
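
The amended cadence reduces to a single rule, sketched here in Python. The six-second floor and one-second pause are from the fix described above; the function and constant names are mine.

```python
MIN_GAP_SECONDS = 6.0            # the original fixed timer, now a floor
PAUSE_AFTER_AUDIO_SECONDS = 1.0  # breathing room after a speaker finishes


def delay_before_next_speaker(audio_duration_seconds):
    """Seconds to wait, measured from the start of the current speaker's audio."""
    return max(MIN_GAP_SECONDS, audio_duration_seconds + PAUSE_AFTER_AUDIO_SECONDS)
```

A three-second line still holds the floor for six seconds, while a twelve-second monologue pushes the next speaker out to thirteen.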

The chroma-key filter to drop the white background on the player sprite needed three separate turns to actually land. MeDo would acknowledge it, claim success, and ship the same opaque rectangle. In the end it required regenerating the source PNG with a pure black background and applying a CSS blend-mode on top — not a filter.

What I learned

Multi-agent systems are a visibility problem before they are an algorithm problem. The first version of Salon ran the orchestrator correctly but looked to a user like a single chatbot taking turns. The relationship web on the replay page and the candle's whisper exist because the underlying agents needed a surface that made them legible.

I also learned to ship the wedge. Halfway through the build I had a list of fifteen elaborate features I wanted, including time-of-evening mood mechanics and mid-salon participant invitation. I cut several of them, twice. The version that shipped is smaller than my original sketch and stronger for it — the features that survived (memory streams, reflection cycles, relationship valence, the candle, the multilingual rendering) are the ones I would point to in a paper.

What's next

I want to push toward what AI Town and Project Sid have been doing: real persistence across sessions, so a salon you convene tonight remembers what happened last week. I want voice cloning on the participant side so people can speak in the actual voice of someone they loved. I want a third view where the conversation is rendered as a generated short film. And I want a public hall — a way for two strangers to share their salons and see how the same minds reasoned differently when given different questions.

Beyond that, I think there is a serious educational version of Salon for the classroom. A high school student asking five economists to debate inflation is one assignment. A medical student asking five clinicians to debate a case is another. The architecture supports both.
