
JoyCaption Beta One

Image-captioning model  |  build mb3500zp



Quick-start

  1. Upload or drop an image in the left-hand panel.
  2. Pick a Caption Type and, if you wish, adjust the Caption Length.
  3. (Optional) expand the "Extra Options" accordion and tick any boxes that should influence the caption.
  4. (Optional) open Generation settings to adjust temperature, top-p, or max tokens.
  5. Press Caption. The prompt sent to the model appears in the Prompt box (editable), and the resulting caption streams into the Caption box.

Caption Types

Mode – What it does

  • Descriptive – Formal, detailed prose description.
  • Descriptive (Casual) – Similar to Descriptive but with a friendlier, conversational tone.
  • Straightforward – Objective, no fluff, and more succinct than Descriptive.
  • Stable Diffusion Prompt – Reverse-engineers a prompt that could have produced the image in an SD/T2I model.
    ⚠︎ Experimental – can glitch ≈ 3% of the time.
  • MidJourney – Same idea as Stable Diffusion Prompt but tuned to MidJourney’s prompt style.
    ⚠︎ Experimental – can glitch ≈ 3% of the time.
  • Danbooru tag list – Comma-separated tags strictly following Danbooru conventions (artist:, copyright:, etc.). Lower-case with underscores only.
    ⚠︎ Experimental – can glitch ≈ 3% of the time.
  • e621 tag list – Alphabetical, namespaced tags in e621 style; includes species/meta tags when relevant.
    ⚠︎ Experimental – can glitch ≈ 3% of the time.
  • Rule34 tag list – Rule34-style alphabetical tag dump; artist/copyright/character prefixes first.
    ⚠︎ Experimental – can glitch ≈ 3% of the time.
  • Booru-like tag list – Looser tag list when you want labels but not a specific Booru format.
    ⚠︎ Experimental – can glitch ≈ 3% of the time.
  • Art Critic – Paragraph of art-historical commentary: composition, symbolism, style, lighting, movement, etc.
  • Product Listing – Short marketing copy as if selling the depicted object.
  • Social Media Post – Catchy caption aimed at platforms like Instagram or BlueSky.

Note on Booru modes: They’re tuned for anime-style / illustration imagery; accuracy drops on real-world photographs or highly abstract artwork.
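
Because the tag-list modes are experimental, it can help to tidy their output before using it elsewhere. The sketch below is one possible post-processing step, not part of JoyCaption itself: `clean_tags` is a hypothetical helper that lower-cases tags, replaces spaces with underscores, drops empty and duplicate entries, and optionally sorts alphabetically. It leaves prefixes such as artist: untouched.

```python
def clean_tags(caption, sort=False):
    """Normalize a comma-separated tag string: lower-case, underscores, no duplicates."""
    seen = set()
    tags = []
    for raw in caption.split(","):
        # Collapse runs of whitespace into single underscores and lower-case the tag.
        tag = "_".join(raw.split()).lower()
        # Skip empty entries (e.g. from ", ,") and tags already emitted.
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    if sort:
        # Alphabetical order, as the e621 and Rule34 styles expect.
        tags.sort()
    return ", ".join(tags)


if __name__ == "__main__":
    print(clean_tags("1girl, Long Hair, long_hair, , Blue Eyes"))
```

Pass sort=True when you want the alphabetical ordering described for the e621 and Rule34 styles; the default keeps the model's original order.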

Extra Options

These check-boxes fine-tune what the model should or should not mention: lighting, camera angle, aesthetic rating, profanity, etc. Toggle them before hitting Caption; the prompt box will update instantly.
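
To picture how the prompt box could be derived from your selections, here is a rough sketch. Everything in it is an assumption made for illustration: the template wording, the `TEMPLATES` table, the `build_prompt` function, and the idea that each ticked box adds one sentence. The Space's real prompt text is whatever appears in the Prompt box.

```python
# Hypothetical templates; the Space's real prompt wording is not shown in this guide.
TEMPLATES = {
    "Descriptive": "Write a {length}description of this image{limit}.",
    "Straightforward": "Write a {length}straightforward caption for this image{limit}.",
}


def build_prompt(caption_type, length=None, extras=()):
    """Assemble prompt text from a caption type, an optional length, and ticked extras.

    `length` may be None (no constraint), a word such as "short",
    or an int word limit. `extras` holds one sentence per ticked option.
    """
    template = TEMPLATES[caption_type]
    if isinstance(length, int):
        # Numeric length becomes an explicit word limit at the end of the sentence.
        base = template.format(length="", limit=f" in {length} words or less")
    elif length:
        # Descriptive length ("short", "long", ...) goes before the noun.
        base = template.format(length=f"{length} ", limit="")
    else:
        base = template.format(length="", limit="")
    # Each ticked option contributes one extra instruction sentence.
    return " ".join([base, *extras])


if __name__ == "__main__":
    print(build_prompt("Descriptive", 50, ["Include information about lighting."]))
```

Since the Prompt box is editable, you can always override whatever the check-boxes produce before pressing Caption.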

Generation settings

  • Temperature – randomness. 0 = deterministic; higher = more variety.
  • Top-p – nucleus sampling cutoff. Lower = safer, higher = freer.
  • Max New Tokens – hard stop for the model’s output length.
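
If you are curious how these three settings interact, the sketch below shows the general technique in plain Python. It is a minimal illustration of temperature scaling, nucleus (top-p) filtering, and a max-new-tokens cap, not JoyCaption's actual decoding code. The names `sample_token`, `generate`, and `next_logits`, and the default values, are made up for the example.

```python
import math
import random


def sample_token(logits, temperature=0.6, top_p=0.9, rng=None):
    """Pick one token index from `logits` (a list of floats)."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Temperature 0 is treated as greedy decoding: always the most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale logits by temperature, then softmax (subtract the max for stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most-likely tokens whose cumulative
    # probability reaches top_p (the "nucleus").
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Sample among the kept tokens in proportion to their probability.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


def generate(next_logits, max_new_tokens, eos_id, **sampling):
    """Call `next_logits(tokens)` repeatedly; stop at eos_id or the token cap."""
    tokens = []
    for _ in range(max_new_tokens):
        tok = sample_token(next_logits(tokens), **sampling)
        if tok == eos_id:
            break
        tokens.append(tok)
    return tokens
```

A lower top-p shrinks the set of candidate tokens, which is why it reads as "safer", while Max New Tokens simply ends the loop even if the model has not produced an end-of-sequence token.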

Enjoy experimenting, and feel free to open an issue if you spot any bugs or have feature ideas!


🚨🚨🚨 If the "Help improve JoyCaption" box is checked, the _text_ query you write will be logged and I _might_ use it to help improve JoyCaption. Only the text query is logged, not images, user data, etc. I cannot see what images you send, and frankly, I don't want to. But knowing what kinds of instructions and queries users want JoyCaption to handle will help guide me in building JoyCaption's dataset. This dataset will be made public. As always, the model itself is completely public and free to use outside of this space. And, of course, I have neither control over nor access to what HuggingFace, which is graciously hosting this space, collects.