Gradio

Quick-start

Upload or drop an image in the left-hand panel.
Pick a Caption Type and, if you wish, adjust the Caption Length.
(Optional) expand the "Extra Options" accordion and tick any boxes that should influence the caption.
(Optional) open Generation settings to adjust temperature, top-p, or max tokens.
Press Caption. The prompt sent to the model appears in the Prompt box (editable), and the resulting caption streams into the Caption box.

Caption Types

Mode	What it does
Descriptive	Formal, detailed prose description.
Descriptive (Casual)	Similar to Descriptive but with a friendlier, conversational tone.
Straightforward	Objective, no fluff, and more succinct than Descriptive.
Stable Diffusion Prompt	Reverse-engineers a prompt that could have produced the image with an SD/T2I model. ⚠︎ Experimental – can glitch ≈ 3% of the time.
MidJourney	Same idea as above but tuned to MidJourney’s prompt style. ⚠︎ Experimental – can glitch ≈ 3% of the time.
Danbooru tag list	Comma-separated tags strictly following Danbooru conventions (artist:, copyright:, etc.). Lowercase, with underscores in place of spaces. ⚠︎ Experimental – can glitch ≈ 3% of the time.
e621 tag list	Alphabetical, namespaced tags in e621 style – includes species/meta tags when relevant. ⚠︎ Experimental – can glitch ≈ 3% of the time.
Rule34 tag list	Rule34-style alphabetical tag dump; artist/copyright/character prefixes first. ⚠︎ Experimental – can glitch ≈ 3% of the time.
Booru-like tag list	Looser tag list when you want labels but not a specific Booru format. ⚠︎ Experimental – can glitch ≈ 3% of the time.
Art Critic	Paragraph of art-historical commentary: composition, symbolism, style, lighting, movement, etc.
Product Listing	Short marketing copy as if selling the depicted object.
Social Media Post	Catchy caption aimed at platforms like Instagram or BlueSky.

Note on Booru modes: They’re tuned for anime-style / illustration imagery; accuracy drops on real-world photographs or highly abstract artwork.
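
Because the tag-list modes are experimental, it can help to tidy their output before using it elsewhere. The following is a minimal sketch, assuming the caption arrives as a plain comma-separated string; `normalize_tags` is a hypothetical helper for illustration and is not part of JoyCaption.

```python
def normalize_tags(raw):
    """Split a comma-separated tag string and normalize each tag."""
    seen = set()
    tags = []
    for piece in raw.split(","):
        # Lowercase and join words with underscores, e.g. "Long Hair" -> "long_hair".
        tag = "_".join(piece.strip().lower().split())
        # Skip empty entries and tags that were already emitted.
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags
```

Prefixes such as artist: pass through unchanged apart from lowercasing, so the result can be re-joined with ", " for a tagging tool.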

Extra Options

These checkboxes fine-tune what the model should or should not mention: lighting, camera angle, aesthetic rating, profanity, and so on. Toggle them before pressing Caption; the Prompt box updates instantly.
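
One way to picture how the Caption Type, Caption Length, and ticked boxes combine into the text shown in the Prompt box is sketched below. Everything in it is an assumption made for illustration: the template wording, the `TEMPLATES` table, and the `build_prompt` function are hypothetical, and the prompts the app actually generates may differ.

```python
# Hypothetical templates; the wording used by the actual app may differ.
TEMPLATES = {
    "Descriptive": "Write a {length} detailed description for this image.",
    "Straightforward": "Write a {length} straightforward caption for this image.",
}


def build_prompt(caption_type, length="long", extras=()):
    """Assemble a prompt from a caption type, a length, and extra instructions."""
    prompt = TEMPLATES[caption_type].format(length=length)
    # Each ticked box contributes one extra instruction sentence.
    return " ".join([prompt, *extras])
```

For example, `build_prompt("Descriptive", "short", ["Include information about lighting."])` yields a single string that could still be edited by hand before it is sent to the model.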

Generation settings

Temperature – randomness. 0 = deterministic; higher = more variety.
Top-p – nucleus sampling cutoff. Lower values restrict sampling to the most likely tokens (safer); higher values admit less likely tokens (freer).
Max New Tokens – hard stop for the model’s output length.
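
For readers curious how these three settings interact, here is a rough pure-Python sketch of temperature scaling, top-p (nucleus) filtering, and a token cap. It is not the code this space runs; `sample_token`, `generate`, the `next_logits` callback, and the default values are all illustrative assumptions.

```python
import math
import random


def sample_token(logits, temperature=0.6, top_p=0.9, rng=None):
    """Pick one token index from raw logits using temperature and top-p."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Temperature 0 is treated as greedy decoding: always take the top token.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Scale logits by temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in ranked:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample within the nucleus; random.choices renormalizes the weights.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]


def generate(next_logits, max_new_tokens=512, eos_id=None, **sampling):
    """Sample tokens until the end-of-sequence token or the hard cap is hit."""
    tokens = []
    for _ in range(max_new_tokens):
        token = sample_token(next_logits(tokens), **sampling)
        if token == eos_id:
            break
        tokens.append(token)
    return tokens
```

Here `next_logits` stands in for a model call that returns one score per vocabulary entry given the tokens generated so far. Lowering `top_p` shrinks the nucleus, and `max_new_tokens` bounds the loop regardless of what the model would otherwise produce.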

Enjoy experimenting, and feel free to open an issue if you spot any bugs or have feature ideas!

🚨🚨🚨 If the "Help improve JoyCaption" box is checked, the _text_ query you write will be logged and I _might_ use it to help improve JoyCaption. Only the text query is logged, not images, user data, or anything else. I cannot see what images you send, and frankly, I don't want to. But knowing what kinds of instructions and queries users want JoyCaption to handle will help guide me in building JoyCaption's dataset. This dataset will be made public. As always, the model itself is completely public and free to use outside of this space. And, of course, I have neither control over nor access to whatever HuggingFace, which graciously hosts this space, collects.

JoyCaption Beta One
