Summary: Music prompting harder than image prompting | Combining up to 8 different AI tools into a single task workflow | Midjourney launches a style randomization feature
UX Roundup for May 20, 2024. (Midjourney)
Music Prompting Harder than Image Prompting
I have recently been creating songs about UX ideas with generative AI services like Udio and Suno that specialize in music creation. (Examples: Dark Design rockabilly song, Recognition Rather than Recall country song.)
I find it hard to control the musical style of the compositions. In some cases, this is because the current models simply are not up to the task: every time I have asked for an operatic aria, I have gotten something more similar to a Broadway musical. (Asking for a song in the style of Broadway musicals works reasonably well.)
When working within modern genres like rock, country, or jazz, the results often sound very good, and definitely comply with the requested genre. As with other forms of AI, ideation is free with music generation, and a few clicks of a button give you a wide range of variations.
Generative AI for music is like having a series of composers sitting next to each other, cranking out different versions of your song. (Midjourney)
The problem is that user control is hard. (User control and freedom is usability heuristic 3.) Yes, I can ask for more songs, but I have little ability to push the style of the music in my desired direction. Keywords like “upbeat” or “moody” do work, but it’s hard to get, for example, the type of rock’n’roll I like.
In some ways, this observation echoes the general uncertainty of using AI, which we know well from image generation. For example, the above image of a row of composers emerged unexpectedly from a prompt for a single composer sitting at the piano and writing notes on a sheet of music. I lost the (requested) writing but gained a surreal image of many (uncannily similar-looking) composers experimenting with music variations in parallel.
Maybe it’s because I never had any decent music education growing up, but I don’t have an easy command of the necessary keywords to tweak musical styles. I have a much stronger vocabulary for describing art styles for image creation. The articulation barrier exists for both forms of media creation with AI, but it seems much higher and more difficult to surmount for music than for visuals.
One workaround I have found is to switch from the music service to a language model like ChatGPT or Claude and ask it to give me 20 keywords that describe the musical style of an artist who exemplifies the type of music I’m after. Since recognition is indeed easier than recall, as the song says, I can pick out a few terms that describe what I want and use them to prompt the music-generating AI. Example from a recent creation: “Percussive, Twangy, Harmony-rich, Call-and-Response.” These are not terms I would have written on my own, but I can recognize that they describe the song I was envisioning.
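The keyword workaround is a two-step recognition loop: ask a language model for candidate style words, then keep only the ones you recognize as right. Here is a minimal sketch of that loop, assuming you paste the request into ChatGPT or Claude by hand. The helper names, the request wording, and all candidate keywords beyond the four from the example above are hypothetical; no real API is called.

```python
# Hypothetical helpers for the keyword workaround; no real LLM or music API is used.

def keyword_request(artist: str, n: int = 20) -> str:
    """Build the request to paste into a language model such as ChatGPT or Claude."""
    return (
        f"Give me {n} keywords that describe the musical style of {artist}. "
        "List one keyword per line."
    )


def style_prompt(candidates: list[str], picks: list[str]) -> str:
    """Join the keywords you recognized into a style prompt for the music AI.

    Only keywords the language model actually suggested are accepted,
    so this step stays one of recognition rather than recall.
    """
    unknown = [p for p in picks if p not in candidates]
    if unknown:
        raise ValueError(f"not among the suggested keywords: {unknown}")
    return ", ".join(picks)


if __name__ == "__main__":
    print(keyword_request("Buddy Holly"))
    # Hypothetical, shortened model answer; the first four words are the article's example.
    suggested = ["Percussive", "Twangy", "Harmony-rich", "Call-and-Response", "Driving", "Upbeat"]
    print(style_prompt(suggested, ["Percussive", "Twangy", "Harmony-rich", "Call-and-Response"]))
```

The check in style_prompt is a design choice, not a requirement of any tool: it keeps you from drifting back into inventing your own vocabulary, which is exactly the articulation problem the workaround is meant to avoid.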
It helps to have a varied toolkit of AI products at your disposal and to know how to get the best from each. The whole is stronger than the sum of the parts.
It’s easier to control image generation than music composition with the keywords that current generative AI tools require. You can inspect an image and consider what to call each pictorial element or style, whereas music is fleeting and gone as soon as it’s played. This image shows Beethoven composing a symphony, something that’s far beyond the abilities of current music AI. (Leonardo)
Combining Multiple AI Tools in a Workflow
My recent data showed that UX professionals use at least 93 different AI tools. More launch every week.
The beauty of specialized tools is that we can often find one that’s very good at some specific thing we need. The downside is that no one tool does everything perfectly, so we need to combine AI tools and construct multi-tool workflows. Knowing when to use which tool and how to combine tools for optimal effect are two major reasons for UX professionals to get hands-on experience with using AI for their particular work, as opposed to just reading about AI in general and how other people use AI tools for other jobs.
You need a well-stocked toolbelt before you climb up a utility pole to attempt a repair job. Similarly, you should acquire hands-on experience with using a wide range of AI tools for UX tasks before you create a complete AI-fueled UX workflow. (Midjourney)
Two examples.
Ioana Teleanu (Product Designer for AI at Miro in Romania) described how she combines Miro and Uizard into a single workflow for a design project. She used Miro for its strengths, brainstorming and structuring ideas. For example, she created a moodboard in Miro for visual design inspiration. She then exported the Miro ideas to Uizard to rapidly bring UI design concepts to life. (Uizard is a generative UI tool that uses AI to design user interface prototypes.)
Ioana’s idea of exporting a design style from Miro to get concrete designs in Uizard reminded me of the way you can also use style references in many image-generation tools like Midjourney to make new images that (more or less) consistently apply the visual style of an image you upload for reference. (The style reference feature is called “sref” in Midjourney, given its propensity for obscuring feature names. In Midjourney’s web interface, style references are marked by a paperclip on the referenced image, whereas image references are marked with a landscape icon and character references are marked with a person-silhouette icon. Don’t worry, you’ll learn these conventions if you use Midjourney enough, but they are not what I would call high-usability features.)
Ioana Teleanu combined two tools into a single workflow: Miro for moodboards and Uizard for screen designs. (Ideogram)
My own example is my creation of one of the Dark Design songs I mentioned above: the Rockabilly version. For this project, I combined 8 AI tools:
Perplexity: research poetic meter for different song genres. [$200/year = $17/month to subscribe to Perplexity Pro.]
ChatGPT: write lyrics about Dark Design in my chosen meter and rhyming pattern. (I almost always use AABB rhyming for my songs, so I didn’t specifically investigate alternate rhyming approaches for this project, but I would probably turn to Perplexity if I wanted to do so another time.) [$25/month to subscribe to ChatGPT Plus.]
Claude: generate a variety of descriptive keywords for the style of music I wanted for my song. (As described in the previous news item.) [Subscribing through Poe at $200/year = $17/month.]
Udio: compose the music and perform the song with AI-generated singers. [Using the free plan for now.]
Ideogram: create a still image for the music video. I used Ideogram instead of Midjourney since I wanted tight control of the image and it currently offers the best prompt adherence of the main image generation tools. Specifically, I wanted the male singer to wear thick black-rimmed eyeglasses and look a little like Buddy Holly, who was my inspiration for the song. [$7/month basic subscription — I really ought to pay them more.]
Leonardo: use its Universal Upscaler to transform the Ideogram image from 720p to 4K resolution, since that’s the video format I wanted to upload to YouTube. [$24/month “artisan” subscription.]
Runway: make several short animations based on my still image. This was the weakest step in my workflow and I’m not very happy with the resulting animations. Other experimental AI tools generate better video when given a still image, but I currently don’t have access to these tools. [Using the free plan for now.]
YouTube: match the lyrics with the video to create subtitles. Adding captions to videos has always been a major accessibility guideline, but it’s such a pain that most video publishers skip them. I am very pleased with YouTube’s AI-driven caption auto-sync tool, which lowers the barrier for subtitles so that I can add them even to a private passion project like warning against Dark Design. [$14/month YouTube Premium subscription.]
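The Leonardo upscaling step is bigger than the “720p to 4K” label suggests. The arithmetic below is a small sketch, assuming the common 16:9 frame sizes: 720p as 1280×720 and 4K as 4K UHD (3840×2160). The article states only the labels, so the exact pixel dimensions of the Ideogram image are an assumption.

```python
# Common 16:9 frame sizes (width, height) in pixels. Treating "4K" as 4K UHD (2160p)
# is an assumption about the format used for the YouTube upload.
RESOLUTIONS = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}


def upscale_factor(src: str, dst: str) -> tuple[float, float]:
    """Return (per-side scale factor, pixel-count multiplier) for an upscale."""
    src_w, src_h = RESOLUTIONS[src]
    dst_w, dst_h = RESOLUTIONS[dst]
    linear = dst_w / src_w
    if linear != dst_h / src_h:
        raise ValueError("source and target aspect ratios differ")
    return linear, linear ** 2


print(upscale_factor("720p", "4K"))  # (3.0, 9.0)
```

Under these assumptions, each side grows 3×, so the 4K frame has 9× as many pixels: eight of every nine pixels are not in the source image and have to be filled in by the upscaler.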
This is the still image I used as the basis for animating the music video for my Dark Design song. If the male singer reminds you of a famous rock star from the 1950s, that’s deliberate. Here, it’s downsampled for website publication, but the full image in the music video is 4K. (Ideogram + Leonardo)
In total, across these 8 tools, I’m using 2 free plans and 6 paid subscriptions, for a total cost of $104 per month. I also pay $48/month for Midjourney Pro, which I didn’t use in this sample workflow, making my total AI budget $152/month. This is cheap for the full set of AI capabilities which have so many other uses, but I would not pay this much only to make a few songs for fun.
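The budget figures can be checked with a quick tally. This sketch uses the per-month amounts given in the tool list (annual plans already converted to monthly, as in the list) plus the $48/month for Midjourney Pro; the variable names are just for illustration.

```python
# Monthly cost per tool in the Dark Design song workflow, in US dollars,
# as stated in the tool list (0 = free plan).
WORKFLOW_COSTS = {
    "Perplexity": 17,
    "ChatGPT": 25,
    "Claude (via Poe)": 17,
    "Udio": 0,
    "Ideogram": 7,
    "Leonardo": 24,
    "Runway": 0,
    "YouTube": 14,
}
MIDJOURNEY_PRO = 48  # paid, but not used in this workflow

workflow_total = sum(WORKFLOW_COSTS.values())
paid = sum(1 for cost in WORKFLOW_COSTS.values() if cost > 0)
free = len(WORKFLOW_COSTS) - paid
total_budget = workflow_total + MIDJOURNEY_PRO

print(workflow_total, paid, free, total_budget)  # 104 6 2 152
```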
Style Randomization in Midjourney
Midjourney launched a feature for randomizing the style in images a month ago. The code is “--sref random,” which you really just have to know. There is no usability here.
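Because the feature lives entirely in prompt text, the only thing to remember is where the parameter goes: at the end of the prompt. The helper below is a hypothetical illustration (Midjourney has no Python API here; you type or paste the resulting string). The same --sref parameter also accepts the URL of a reference image instead of the word random; the example URL is a placeholder.

```python
# Hypothetical helper that builds Midjourney prompt text; it does not call Midjourney.
def with_style_reference(prompt: str, sref: str = "random") -> str:
    """Append Midjourney's --sref parameter to a text prompt.

    sref="random" asks for a randomized style; an image URL instead asks
    Midjourney to follow the style of that reference image.
    """
    return f"{prompt.strip()} --sref {sref}"


print(with_style_reference(
    "a two-panel comic strip with a female UX designer composing a moodboard "
    "in the left panel and drawing mobile app screens in the right panel"
))
print(with_style_reference("a composer at the piano", "https://example.com/moodboard.png"))
```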
This is useful in the ideation phase if you don’t have a specific style in mind. Here are 4 images I got for a prompt for “a two-panel comic strip with a female UX designer composing a moodboard in the left panel and drawing mobile app screens in the right panel.” (This was to illustrate the case study of a two-tool workflow in the above piece. As you can see, I actually used an illustration from Ideogram because of its better prompt adherence, even though the Midjourney styles were more intriguing.)
Using Midjourney’s new style randomization feature creates wildly different pictures for the same prompt. I could see myself working more with variations of the upper left image: it’s stylistically cool, even though this first version doesn’t actually show the specific workflow steps I wanted to visualize.
Only the two top images complied with my request for a two-panel strip. The bottom two images show a single scene, while using the same idea of having the designer look at two different objects, representing the two different design steps. I like both styles, and I think this visualization of the underlying concept of a two-tool workflow may even be better. But the supposed moodboards don’t look like moodboards, and only the lower left image includes anything recognizable as mobile app UI screens.
I like the style randomization feature, but for more complex scenes, Midjourney still needs to improve its prompt adherence.