Two years ago I generated the first scene of my first novel in Midjourney as a static image, then used the Runway app to bring Roisin to life; this was the result. The post about it is here, and a later one on keeping a consistent character in still image generation is here.
It was good and somewhat amazing, but this week Midjourney added a powerful new video generation model, and the same image gave this even more stunning result.
The new one has much more consistency and intricate detail; down in the bottom right a silver zip toggle sways as she types. The music was not AI, just added in a YouTube Short. Midjourney has the stated goal of getting to generate worlds, i.e. a metaverse. This seems to be at least as good as Google’s Veo 3, which we just got here. It’s much harder to spot the progress in the LLMs, but this is a good visual indication of the speed and improvement.
Also, the flying penguins get a look in too. It has been tricky to get penguins to fly so far, but Google Veo 3 did this one, the desert island, with a soundtrack generated too. It is hard to see that they are penguins, but they are 🙂
Whilst that looked really good, the ultimate one came from Midjourney video again. Look at this 🙂 (sound added with the YouTube Shorts editor).
It’s been a while since I posted anything, for lots of reasons; that’s an offline conversation. However, this weekend something appeared in my feeds that was just too exciting not to do a quick test with and then share, which I have already done on Twitter, LinkedIn, Mastodon and Facebook, plus a bit of Instagram. So, many channels to look at!
Back in August 2022 I dived into using Midjourney’s then new GenAI for generating images from text. It was a magical moment in tech for me, of which there have been few over my 50+ years of being a tech geek (34 of those professionally). The constant updates to the GenAI and the potential for creating things digitally, in 2D, movies and eventually 3D metaverse content, have been exciting and interesting, but there were a few gaps in the initial creation and control, one of which just got filled.
Midjourney released its consistent character reference approach: point a text prompt at a reference image, in particular of a person, and it then uses that person as a base for what the prompt generates. Normally you ask it to generate an image and try to describe the person in text, or by reference to a well-known person, but accurately describing someone starts to make for very long prompts. Any gamers who have used avatar builders with millions of possible combinations will appreciate that a simple sentence is not going to get these GenAIs to produce the same person in two goes. This matters if you are trying to tell a narrative, such as, oh I don’t know… a sci-fi novel like Reconfigure? I had previously written about the fun of trying to generate a version of Roisin from the book in one scene and passing that to Runway.ml, where she came alive. That was just one scene, and trying any others would not have given me the same representation of Roisin in situ.
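For anyone wanting to try the same thing, the general shape of a character reference prompt is sketched below. The image URL is a placeholder for your own reference image, and the optional --cw (character weight) parameter is, as I understand it, how strongly the reference is applied; check the current Midjourney documentation for exact behaviour:

```
/imagine prompt: Roisin hacking at a terminal in a dark computer room --cref https://example.com/roisin.png --cw 100
```

Lower --cw values tend to keep just the face while letting clothing and style vary; higher values track the reference more closely.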
Initial render in Midjourney some months ago
The image above was the one I had pushed to Runway.ml to see what would happen, and I was very surprised how well it all worked. However, on Friday I pointed the --cref tag in Midjourney at this image and asked for some very basic prompts related to the book. This is what I got:
Another hacking version this time in a computer room not a forest
A more active shot involving a car
A different style, running at the car but similar to the previous one
Looking to buy some Marmite (a key attribute in the story and life of Roisin)
Another marmite shopping version
A more illustrative style in the snow (sleeves are down)
Another snow stylistic approach (notice sleeve up and tattoos)
A more videogame/cell shaded look to the computer use in the bunker.
As you can see, it is very close to being the same person, with the same clothing, in different styles, and these were all very short prompts. With more care, attention and curation these could be made even closer to one another. Obviously a bit of uncanny valley may kick in, but as a way to get a storyboard, source material for a GenAI movie, or a graphic novel, this is a great innovation to assist my own creative process; there is no other way for me to do this without a huge budget from a Netflix series or a larger publisher. When I wrote the books I saw the scenes and pictures vividly in my head. These are not exactly the same, but they are in the spirit of those.
Firstly, yes, it’s been a while since I posted here; I could have done with an AI clone of me carrying on sharing interesting things. I was out mainly due to an annoying post-ailment “thing” (medical term) hitting my facial nerves, which meant that for months any screen time spent trying to read or write was visually and mentally uncomfortable. That was followed by a dose of Covid that knocked me out for a while, but it also seems to have had a potentially positive impact on whatever else was going on. Anyway, I am back. Has anything happened in the past few months, I mean half a year?
I suddenly felt compelled to blog about some of the marvellous work going on with AI image generation of late, with things such as DALL·E and Midjourney becoming available to more people. I have so far only used Midjourney, which is accessed through Discord. In the relevant channel you simply type the command /imagine followed by any descriptive text you fancy, and it will generate four images based on that, often with surprising results. Any of those images can be selected as the root to generate more in that style, or you can upscale (get a big version of) one you favour.
Handy hint: the mobile version of Discord really likes to scroll to new items, and Midjourney works by updating your post in the channel. With so many posts being generated every few seconds, you might think you can leave the app on your post to watch the results, but your post will very quickly zoom up the screen, and constantly scrolling back to try to catch your images being generated is a bit of a pain. However, if you log into midjourney.com with your Discord ID it will present you with your collection and lots of links to get back and do things with them. Discord search doesn’t seem to work on user ID etc., so it took me a while to figure this out.
After a few attempts, and with one eye on the political situation and crisis faced in the UK, I asked Midjourney for “an artist forced to stop painting in order to work in a bank” and got this wonderful set of images.
an artist forced to stop painting in order to work in a bank
I also asked it to generate a protest by the people
a crowd of down trodden people protesting and rising up against the billionaires and corrupt politicians
Rather than continue the doom and gloom, it was time to go a bit more techie 🙂 I asked it to show me directing the metaverse from a control centre.
epredator directing the Metaverse evolution from a hi-tech digital operations center
An obvious course of action was to explore my Reconfigure and Cont3xt novels, and this is where I was very impressed by whatever is going on with this AI. I asked to see “a cyberpunk girl manipulating the physical world using an augmented reality digital twin” and it seems that Roisin (@Axelweight) is in there somewhere 🙂
a cyberpunk girl manipulating the physical world using an augmented reality digital twin
This was worth exploring, so I picked a couple of the images to seed the generation of more like them, which produced these two.
a cyberpunk girl manipulating the physical world using an augmented reality digital twin (generation 2 from bottom left of gen 1)
And this one
a cyberpunk girl manipulating the physical world using an augmented reality digital twin (generation 2 from top right of gen 1)
These were good, but it was the top right image in generation 1 that I made a more detailed version of on its own. It could be the cover of the third book, couldn’t it?
Midjourney generated potential book cover for the third in the Reconfigure trilogy. Of course, some of the more geometric results are more in line with my current cover designs, asking “A world of Fractal Iterations, Quantum Computing and strange side effects opened up. It appeared to offer a programming interface to everything around her.”
A world of Fractal Iterations, Quantum Computing and strange side effects opened up. It appeared to offer a programming interface to everything around her.
In case this all seems a bit too serious, I did ask a slightly weirder question of the AI and got this.
two ducks with a top hat on carrying a salmon
For a happy image, how about this: /imagine a bright green planet full of life
a bright green planet full of life
But back to the surreal and unusual. Predlet 1.0 and I share a joke we “invented” when she was quite young. We were passing the pub in Lock’s Heath, having been to the supermarket, when she looked at the pavement and saw a creature. She said “Dad, look, there’s a worm coming out of the pub”, to which I replied “and he is legless”. And here it is:
a worm legless outside a pub
As this kind of AI generation evolves from static images to moving ones, and then on to virtual worlds and the metaverse, what a wonderfully weird set of worlds we will be able to dream (and nightmare) up 🙂