JoyCaption Beta One
Image-captioning model | build mb3500zp
Quick-start
- Upload or drop an image in the left-hand panel.
- Pick a Caption Type and, if you wish, adjust the Caption Length.
- (Optional) expand the "Extra Options" accordion and tick any boxes that should influence the caption.
- (Optional) open Generation settings to adjust temperature, top-p, or max tokens.
- Press Caption. The prompt sent to the model appears in the Prompt box (editable), and the resulting caption streams into the Caption box.
Caption Types
Mode | What it does |
---|---|
Descriptive | Formal, detailed prose description. |
Descriptive (Casual) | Similar to Descriptive but with a friendlier, conversational tone. |
Straightforward | Objective, no fluff, and more succinct than Descriptive. |
Stable Diffusion Prompt | Reverse-engineers a prompt that could have produced the image in an SD/T2I model. ⚠︎ Experimental – can glitch ≈ 3% of the time. |
MidJourney | Same idea as above but tuned to MidJourney’s prompt style. ⚠︎ Experimental – can glitch ≈ 3% of the time. |
Danbooru tag list | Comma-separated tags strictly following Danbooru conventions (artist:, copyright:, etc.). Lower-case underscores only. ⚠︎ Experimental – can glitch ≈ 3% of the time. |
e621 tag list | Alphabetical, namespaced tags in e621 style – includes species/meta tags when relevant. ⚠︎ Experimental – can glitch ≈ 3% of the time. |
Rule34 tag list | Rule34 style alphabetical tag dump; artist/copyright/character prefixes first. ⚠︎ Experimental – can glitch ≈ 3% of the time. |
Booru-like tag list | Looser tag list when you want labels but not a specific Booru format. ⚠︎ Experimental – can glitch ≈ 3% of the time. |
Art Critic | Paragraph of art-historical commentary: composition, symbolism, style, lighting, movement, etc. |
Product Listing | Short marketing copy as if selling the depicted object. |
Social Media Post | Catchy caption aimed at platforms like Instagram or BlueSky. |
Note on Booru modes: They’re tuned for anime-style / illustration imagery; accuracy drops on real-world photographs or highly abstract artwork.
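As a rough illustration of the tag-list formatting described in the table (comma-separated, lower-case, underscores instead of spaces, optionally alphabetical), here is a minimal sketch. The helper `to_booru_tags` is hypothetical and is not part of JoyCaption; it only shows what the formatting conventions look like, and it could serve as a light clean-up step if a tag-list caption glitches.

```python
def to_booru_tags(labels, alphabetical=False):
    """Format free-form labels as a comma-separated, lower-case, underscore-joined tag list.

    Hypothetical helper for illustration only; JoyCaption produces its tag lists itself.
    """
    tags = []
    for label in labels:
        # Lower-case the label and join its words with underscores.
        tag = "_".join(label.strip().lower().split())
        # Skip empty entries and duplicates, keeping first-seen order.
        if tag and tag not in tags:
            tags.append(tag)
    if alphabetical:
        # Some styles in the table (e.g. e621) are described as alphabetical.
        tags.sort()
    return ", ".join(tags)


print(to_booru_tags(["Long Hair", " blue eyes ", "long hair"]))
print(to_booru_tags(["Long Hair", " blue eyes "], alphabetical=True))
```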
Extra Options
These checkboxes fine-tune what the model should or should not mention, such as lighting, camera angle, aesthetic rating, or profanity. Toggle them before pressing Caption; the Prompt box updates instantly.
Generation settings
- Temperature – randomness. 0 = deterministic; higher = more variety.
- Top-p – nucleus sampling cutoff. Lower = safer, higher = freer.
- Max New Tokens – hard stop for the model’s output length.
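To make the first two settings concrete, here is a minimal pure-Python sketch of how temperature and top-p (nucleus) sampling typically choose the next token from a model's raw scores. It is a generic illustration, not JoyCaption's actual decoding code: the function `sample_token` and its default values are assumptions made for the example.

```python
import math
import random


def sample_token(logits, temperature=0.6, top_p=0.9, rng=None):
    """Pick one token index from raw logits using temperature and top-p sampling.

    Illustrative only; the defaults here are assumed, not JoyCaption's.
    """
    rng = rng or random.Random()

    # Temperature 0 is treated as greedy decoding: always take the top-scoring token.
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Divide logits by the temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample among the kept tokens; random.choices normalizes the weights.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [1.0, 3.0, 2.0]
print(sample_token(logits, temperature=0))           # greedy: index of the largest logit
print(sample_token(logits, temperature=1.0, top_p=0.1))  # tiny top_p keeps only the top token
```

With a higher temperature the probabilities flatten, so less likely tokens are picked more often; with a lower top-p fewer candidates survive the cutoff. Max New Tokens would simply bound how many times a step like this is repeated.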
Enjoy experimenting, and feel free to open an issue if you spot any bugs or have feature ideas!
🚨🚨🚨 If the "Help improve JoyCaption" box is checked, the _text_ query you write will be logged and I _might_ use it to help improve JoyCaption. Only the text query is logged; images, user data, and the like are not. I cannot see what images you send, and frankly, I don't want to. But knowing what kinds of instructions and queries users want JoyCaption to handle will help guide me in building JoyCaption's dataset. This dataset will be made public. As always, the model itself is completely public and free to use outside of this space. And, of course, I have neither control over nor access to whatever HuggingFace, which is graciously hosting this space, collects.