What the Flux?

A new AI image model’s taken the world by storm. When Stable Diffusion 3 came out, the response was a little underwhelming; in many ways it was a step back from SDXL, and Midjourney breathed a sigh of relief.

A YouTube transcription of this post is available.

They ain’t feeling so relieved now. I can see Midjourney execs clutching their pearls while Stability AI suits are sweating bullets. Flux.1 (not from Stability AI, the new kids on the block are Black Forest Labs) aims to make AI art more … artistic. I thought I’d kick the tyres on it to see how good it is.

TL;DR: it’s quite good. While it still suffers some of the consistency and direction issues of AI ‘art’ in general, the composition, lighting, and, most importantly, fucking fingers, are much improved. Let’s use an example.

Here’s the cover image I generated for The Fury of the Betrayed. This took about a million different iterations to get to; there’s a lot of inpainting and outpainting going on here. I spent so much time tweaking this image, I think I accidentally invented a new type of art – ‘over-edited digital painting’. Our final image:

Gorgon? Check. Banded armour? You betcha. Lightning, a burning car wreck, and badassery: we’ve got it going on.

Now, let’s compare some popular models. To do this, I interrogated the above image – i.e., had a model reverse-engineer a text prompt from it – then fed the resulting prompt into different models. The big challenge with doing fantasy art with general purpose models is the AI not knowing WTF a gorgon is, or what banded armour looks like. Let’s see how some of the popular AI models handled it.

First up: the OG, SDXL. Spoilers: there’s something wrong with the face (even after using RestoreFormer, facial recognition software probably wouldn’t recognise this as a human). No gorgon. No cars. Not banded armour. So, basically a fail from top to bottom, and I don’t even know WTF that flame-like stuff is supposed to be. Serpent fire? Some shit, I dunno.

I admire the use of blusher and lippy before heading out to kick ass for the Lord.

Ok, what’s next? Let’s try DreamShaper, the model I used through thousands of iterations for the OG image. How does it do when we feed it a complete prompt?

Yeah, nah. We’ve got our cars on fire, sort of, and some kind of monster above the city, but it looks more cat than gorgon. Also, not banded armour.

Her eyes show how dead inside she is.

Juggernaut’s the AI model everyone wants to be seen with at the digital art party. The premise is that it’s amazing for photorealistic images, and since we’ve got a lot of shit going on here that you can’t photograph (like gorgons), how are we doing?

Closer? Maybe? The armour might be banded from a certain point of view. The fire from the car is shithouse, though, and there’s no gorgon (…as expected, but I did want to dream for a second there).

The armour looks like it was designed by the FX artist on Transformers. At least she’s got the right level of pissed-off in her expression.

Since DreamShaper development moved into their XL Turbo model, I thought it would be good to contrast with that. Turbo models are much faster (it’s right there in the name…). The destruction and fire are on point in this one, and the armour is doing something rad as well. No gorgon, but I’m sensing a theme here.

I didn’t ask for a sword, but I got one anyway. All we need is the cast from Big Trouble In Little China to make this complete.

Conclusion: Juggernaut and DreamShaper XL Turbo have the best faces, and I’d say DreamShaper XL Turbo’s composition is the best of them all. So, where does that leave us with Flux.1?

We’re going to run two tests. The first is with flux.schnell, a model built to get results in just a few steps (I used 4 here; DreamShaper XL Turbo runs at 8). The second is flux.dev, which I’ve used 30 steps on, same as the other SDXL models. Behold:

flux.schnell is on your left; if you turn your attention to the right side of the plane, you’ll see flux.dev.

Gorgon? I think the intent’s there. Flaming cars? Yep. Human-like face? For sure. Schnell has done better on the person and armour, but I think Dev has the better lighting. Either way, both models have heavily respected the source prompt, and given us (without cherry-picking or doing 30 different iterations) an image that is quite similar to our prompted idea.

All SDXL variants aside from DreamShaper XL Turbo were run with the Euler Ancestral sampler, at 30 steps. Flux Schnell and Dev used Euler A with 4 and 30 steps, respectively. DSXLT was 8 steps, with the DPM++ SDE Karras sampler (it needs that sampler to work well, but that’s fine – it’s a good sampler). Euler A seems to be the Chosen One™ for Flux.
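If you want those settings in one glanceable place, here they are as a plain Python lookup – just a summary of the runs above, not the actual generation script (your UI of choice will have its own way of setting sampler and steps):

```python
# Sampler and step settings for each run in this comparison,
# collected from the write-up into one lookup table.
RUN_SETTINGS = {
    "SDXL variants":        {"sampler": "Euler Ancestral",  "steps": 30},
    "DreamShaper XL Turbo": {"sampler": "DPM++ SDE Karras", "steps": 8},
    "flux.schnell":         {"sampler": "Euler Ancestral",  "steps": 4},
    "flux.dev":             {"sampler": "Euler Ancestral",  "steps": 30},
}

# Print a quick summary of each run.
for model, cfg in RUN_SETTINGS.items():
    print(f"{model}: {cfg['steps']} steps, {cfg['sampler']}")
```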

I then spent a bit of time getting my head around some of the finer points of Flux.1 and started clean with a new prompt. Here, we want to go with our standard avenging angel, because why not? My hunch was that I’d gone too strong on CFG Scale for Flux; what would happen if we used a lower value? I’ve upped Schnell’s step count to 8 but left Dev at 30. What happens?

Good things, as it turns out. Schnell’s on the left; dev’s on the right.

So, yeah: if I was Midjourney or DALL·E I’d be a little concerned right now. This is some next-level shit; the detail is extraordinary, regardless of Schnell or Dev, and you can get images to S-tier status if you fiddle with CFG values. The faces out of the box could still use some work! But we’re seeing far fewer six-knuckled aliens, weird body proportions, and extra limbs (SD3, I’m looking at you).
