
AI, Art and AI Art – Part 4

ai generated portrait of a woman in green with a green, broad-brimmed hat, with flowers

This is the last of four posts dealing with AI and AI Art. It takes a different form to the previous three. In this post, I look only at the output from AI Art apps, without regard to how it works or what issues its use might raise. The post concludes with an overall assessment of AI art and my reactions.

In part 3, I briefly described how I tested the app, and mentioned the problems experienced with the output. This post was intended to expand on that, giving example text prompts and the resulting image. However, the theory and the practice have proved very different.

I’m sure I will be returning to the topic. I will try to come back here to add new links.

Is AI just a party trick?

Testing the AI art app

To recap, my initial aim in testing the AI art app was to push it as far as possible. I was not necessarily trying to generate usable images. The prompts I wrote:

  • brought together named people, who could never have met in real life, and put them in unlikely situations.
  • were sometimes deliberately vague.
  • were written with male and female versions.
  • used a variety of ethnicities and ages.
  • used single characters, and multiple characters interacting with each other.
  • used characters in combination with props and/or animals.
  • used a range of different settings.

I realised I also needed to test the capacity of the AI to generate an image to a precise brief. This is, I believe, the area where AI art is likely to have the most impact. Doing this proved much harder than I expected.

In essence, generating an attractive image with a single character does not require a complex prompt. I suspect this is already being used by self-publishers on sites like Amazon.

Creating more complex images, at least with Imagine AI, is much more difficult. There are ways around the problem, but these require use of special syntax. This takes the writing of the prompt into a form of coding for which documentation is minimal.

Talking to the AI art app

This problem of human-AI communication is not something I’ve given any real thought to, beyond fiddling with the text prompt. This paper addresses one aspect of it. From it, it became clear that the text prompt used in AI art apps, or the query typed into the likes of ChatGPT, is not used in its original form. The initial text (in what is termed Natural Language, or NL) has to be translated into logical form first. Only then can it be turned into actions by the AI, namely the image generation, although that glosses over a huge area of other complex programming.

This is a continuously evolving area of research. As things stand, the models used have difficulty in capturing the meaning of prompt modifiers. This mirrors my own difficulties. The paper is part of the effort to allow the use of Natural Language without the need to employ special syntax or terms.

Research into HCI

The research, described in this paper, points towards six different types of prompt modifier used in the text-to-image art community. These are:

  • subject terms,
  • image prompts,
  • style modifiers,
  • quality boosters,
  • repeating terms, and
  • magic terms.

This taxonomy reflects the community’s practical experience of prompt modifiers.

The paper’s author made extensive use of what he calls ‘grey literature’: materials and research produced by organisations outside the traditional commercial or academic publishing and distribution channels. In the case of AI art, much is available from the companies developing the apps. This guide from Stable Diffusion, and this one, deal with prompt writing.

Both of them take a similar approach to preparing the text prompt. They suggest organising the content into categories, which could be mapped onto the list of prompt modifiers referred to above.
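As an illustration of that category approach, a prompt could be assembled programmatically from the modifier types listed above. This is only a sketch of my own: the function, the parameter names and the comma separator are my assumptions, not the syntax of any particular app.

```python
# Hypothetical sketch: build a text-to-image prompt from the modifier
# categories described above. Category order and separator are assumptions.
def build_prompt(subject, style=None, quality=None, magic=None):
    parts = [subject]                # subject terms: what the image shows
    if style:
        parts.append(style)          # style modifiers, e.g. "art nouveau"
    if quality:
        parts.append(quality)        # quality boosters, e.g. "highly detailed"
    if magic:
        parts.append(magic)          # 'magic' terms the community swears by
    return ", ".join(parts)

prompt = build_prompt(
    subject="portrait of a woman in a green broad-brimmed hat",
    style="art nouveau",
    quality="highly detailed",
)
```

The point is less the code than the discipline it imposes: each fragment of the prompt has a job, which makes systematic testing of variations far easier.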

The text-to-image community

As with any sphere of interest, there seems to be a strong online community. Given the nature of this particular activity, they post their images on social media. Some of this, to be tactful, is best described as ‘specialist’. Actual porn is generally locked down by the AI companies. That doesn’t stop people pushing the boundaries, of course. If you decide to explore the online world of AI art, expect lots of anime in the shape of big-chested young women in flimsy clothing. From the few images I’ve seen which made it past the barrier, the files used for training the app must have included NSFW (Not Safe For Work) images. What we get is not quite Rule 34, but skates close…

What else?

It’s not all improbable girls, though. The Discord server for the Imagine AI art app has a wide range of channels. These include nature, animals, architecture, food, and fashion design as well as the usual SF, horror etc. The range of work posted is quite remarkable in isolation, but in the end quite samey. Posters tend not to share the prompt alongside the image. It isn’t clear therefore if this is a shortcoming in the AI, or a reflection of the comparatively narrow interests of those using the app.

Judging by the public response to AI, it seems unlikely that many artists in other media are using it with serious intent. That too will bias users to a particular mind set. Reading between the lines of the posts on Discord, my guess is that they tend to be young and male. Again, this limited user base will affect the nature of images made.

The output from the AI app

The problems I described above have prevented me from making the sort of systematic evaluation I planned. A step-by-step description of the process isn’t practical: it takes too long. The highest quality model on Imagine is restricted to 100 image generations a day, for example. I hit that barrier while testing one prompt, still without succeeding.

In addition, I did a lot of this work before I decided to write about it, so I only have broad details of the prompts I used. I posted many of those images on Instagram in an account I created specifically for this purpose.

Generic Prompts

I began with some generic situations, adding variations as shown in brackets at the end of each prompt. In some cases, I inserted named people into the scenario. An example:

  • A figure walking down a street (M/F and various ages, physique, ethnicities, hair style/colour, style of dress)

Capturing a likeness

I wanted to see how well the app caught the likeness of well known people. By putting them in impossible, or at least unlikely situations, this would push the app even further. An example:

  • Marilyn Monroe dancing the tango with Patrick Stewart. I also tried Humphrey Bogart, Donald Trump and Winston Churchill.

I discovered a way to blend the likenesses of two people. This enables me to create a composite which can be carried through into several images. Without that, the AI would generate a new face each time. The numbers in the example are the respective weights given to the two people in making the image. If one is much better known than the other, the results may not be predictable, but should still be consistent:

  • (Person A: 0.4) (Person B: 0.6) sitting at a cafe table.
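One plausible reading of how this blending works internally is a weighted average of the identity embeddings for the two people. This is purely my illustration: the embedding values below are toy numbers, and real apps use learned, high-dimensional vectors.

```python
# Hypothetical illustration of likeness blending: a weighted average of
# two embedding vectors, using the weights written in the prompt.
def blend(embedding_a, embedding_b, weight_a, weight_b):
    total = weight_a + weight_b
    return [
        (weight_a * a + weight_b * b) / total
        for a, b in zip(embedding_a, embedding_b)
    ]

person_a = [0.2, 0.8, 0.1]   # toy embedding for Person A
person_b = [0.6, 0.4, 0.9]   # toy embedding for Person B
composite = blend(person_a, person_b, 0.4, 0.6)  # weights from the prompt
```

Because the same composite vector can be reused across generations, the face stays consistent from image to image, which is exactly the behaviour I saw.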

Practical applications

I also wanted to test the possibility of using the app for illustrations such as book covers, magazine graphics etc. Examples:

  • Man in his 50s with close-cropped black hair and a beard, wearing a yellow three-piece suit, standing at a crowded bar
  • Woman in her 50s with dark hair, cut in a bob, wearing a green sweater, sitting alone at a table in a bar.
  • Building, inspired by nautilus shell, art nouveau, Gaudi, Mucha, Sagrada Familia

To really push things, I wrote prompts drawn from texts intended for other purposes. Examples:

  • Lyrics to Dylan’s Visions of Johanna
  • Extracts from the Mars Trilogy by Kim Stanley Robinson
  • T S Eliot’s The Waste Land

I tried using random phrases, drawn from the news and whatever else was around, and finally random lists of unrelated words.

Worked example

This post would become too long if I included examples of everything from the list above, which is already shortened. Instead, I will show examples from a single prompt and some of those as I develop it. The prompt is designed to create the base image for a book cover. The story relates to three young people who become emotionally entangled as a consequence of an SF event. (A novel I’m currently writing)

Initial prompt:

Young man in his 20s, white, cropped brown hair, young woman, in her 20s, mixed race, afro, young woman in her 20s, white, curly red hair

This didn’t work: the output never showed three characters, often only one. If I wasn’t trying to get a specific image, they would be fine as generic illustrations.

Shifting away from photo realism, this one might have been nice, ethnicities apart, but for one significant flaw…

Next version

In order to get three characters, I obviously needed to be more precise. So I held off on the physical details in an attempt to get the basic composition right. After lots of fiddling and tweaking, I ended up with this:

(((Young white man))), 26, facing towards the camera, standing behind (((two young women))), both about 24, facing each other

The brackets are a way to add priority to those elements, with the strength running from 1, written (term), to 3, written (((term))).
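My reading of that syntax, which is not officially documented anywhere I could find, is that the nesting depth of the parentheses sets the emphasis. A small sketch of how an app might parse it:

```python
# Sketch of how nested brackets might map to emphasis weights:
# (term) = strength 1, ((term)) = 2, (((term))) = 3.
# This is my interpretation of the syntax, not official documentation.
import re

def bracket_strength(fragment):
    """Return (inner text, strength) where strength is the wrapping depth, capped at 3."""
    match = re.match(r"^(\(+)(.*?)(\)+)$", fragment)
    if not match:
        return fragment, 0
    depth = min(len(match.group(1)), len(match.group(3)), 3)
    return match.group(2), depth

term, strength = bracket_strength("(((Young white man)))")
# term is "Young white man", strength is 3
```

Whatever the real mechanism, the practical effect was that triple-bracketed elements survived into the image far more reliably than unbracketed ones.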

The image I got wasn’t perfect, but it was workable and certainly the closest so far.

Refining the prompt

My next step was refining the appearance, which proved equally problematic:

(((young white man))), cropped brown hair, in his 20s, facing towards the camera, standing behind (((a young black woman))), ((afro hair)), in her 20s, facing (((a young white woman, curly red hair))), (((the two women are holding hands)))

I got nowhere with this. I usually got images where the man was standing between two black girls. In one a girl was wearing a bikini for some reason. In another she was wearing strange trousers, with one leg shorts, the other full length. I also got one with the composition I wanted, but with three women.

More attempts and tweaks failed. The closest to a usable image was this, using what is called by the app a pop art style. I eventually gave up. If there is a way to generate an image with three distinct figures in it, I have yet to find it.

This section is simply a slideshow of other images generated by the AI in testing. They are in no particular order, but show some of the possibilities, in terms of image quality. If only the image could be better matched to the prompt…

Consumer interest

I relaunched my Etsy shop, to test the market, so far without success. I haven’t put a lot of effort into this, so probably not a fair test. References to sales from the shop are to the previous version. At the time of writing, no AI output has sold. This is the URL:

I also noticed on Etsy, and in adverts on social media, what looks like a developing market in prompts, with offerings of up to 100 in a variety of genres. These are usually offered at a very low price. The differing syntax used by the different apps may be an issue, but I haven’t bought anything to check. I saw, too, a number offering prompts to generate NSFW images. I’m not sure how they bypass the AI companies’ restrictions. Imagine, at least, seems to vet both the prompt and the output.

Overall Conclusions

It’s art, Jim, but not as we know it

In Part 3, I asked ‘Is AI Art, Art?’ It’s clear that many of those in the AI art community consider the answer to be yes. They even raise concerns similar to those of ‘traditional’ artists, about posting on social media, the risk of theft etc. The more I look, the more I think they have a point. The art is not, I believe, in the image alone, but in the entire process. It is not digital painting; it is, in effect, a new art form.

Making the images, getting them to a satisfactory state, is a sort of negotiation with the AI. It requires skill and creative talent. It requires not simple language fluency, but an analytic approach to language which allows the components of the description to be disaggregated in a specific way. Making AI art also requires an artistic eye, to assess the images generated and to judge what is needed to refine them, both in terms of the prompt and the image itself.

The State of the art

As things stand, AI art is far from being a click-and-go product. Paradoxically, it is that imperfection which triggers the creativity. It means users develop an addictive, game-like mindset, puzzling away at finding just the right form of words. In Part 3, I referred to Wittgenstein and his definition of games. This seemed a way into looking at the many forms taken by art. A later definition, by Bernard Suits, is that a game is “the voluntary attempt to overcome unnecessary obstacles.” This could be applied to poetry, for example, with poetic forms like the sonnet.

Writing the prompt is very similar: it needs to fit a prescribed format, with specific patterns of syntax. In this post, I wrote about breaking the creative block by working within self-imposed, arbitrary rules. The imperfect text-to-image system, as it currently stands, is, in effect, the unnecessary obstacle that triggers creative effort.

The future

It seems inevitable that the problems of Human-AI communication will be resolved. AIs will then be able to understand natural language. I don’t know if we will ever get a fully autonomous AI art program. It certainly wouldn’t be high on my To-Do list. We don’t need it. A better AI, able to understand natural language and generate art without the effort it currently takes, would be a mixed blessing. It would, however minimally, offer an opportunity for creativity to people who, for whatever reason, don’t believe themselves to be creative. On the other hand, too many jobs and occupations have already had the creativity stripped from them by automation and computerisation. Stronger AIs are going to accelerate that process.

It’s easy to say, ‘new jobs will be created’, but those jobs usually go to a different set of people. Development of better, but still weak, AIs will be disruptive. With genuine strong AI, all bets are off. We cannot predict what will happen. It is possible that so many jobs will be engineered away by strong AI that we will be grateful for the entertainment value, alone, of deliberately weak AI art apps and games.


AI, Art and AI Art – Part 3

AI generated image, Japanese art style showing two women walking with a tower on the horizon.

This is Part 3 of a series of linked posts considering AI. It looks at AI art from an artist’s perspective. Inevitably it raises the, probably unanswerable, question ‘what is art?’

Part 1 looked at AI in general. Some more specific issues raised by AI art are covered in Part 2. Part 4 will be about my experience with one app, including a gallery of AI-generated images.

Links to each post will be here, once the series is complete.

Is AI Art, Art?

Is the output from an AI art app actually ‘art’? I’m not sure if that is a debate I want to enter, or if there is a definitive answer. Just think of the diversity of forms which are all presented under the banner of Art. What brings together as ‘art’ the cave paintings at Lascaux, Botticelli, Michelangelo, Rembrandt, J M W Turner, David Hockney, Damien Hirst, Alexander Calder, Piet Mondrian, John Martin, Olafur Eliasson, Beryl Cook, Pablo Picasso, Edward Hopper, Carl Andre, Kurt Schwitters and Roy Lichtenstein? Or any other random list of artists?

One way forward is suggested by the idea of family resemblances. When he considered the similar question “what is meant by a game?”, the philosopher Ludwig Wittgenstein used the concept. He argued that the elements of games, such as play, rules, and competition, all fail to adequately define what games are. From this, he concluded that people apply the term game to a range of disparate human activities that bear to one another only what one might call family resemblances. By this he meant that things which could be thought to be connected by one essential common feature may in fact be connected by a series of overlapping similarities, where no one feature is common to all of them. This approach seems as if it would work for the list above. It is possible to trace a thread of connections which eventually encompasses all of them.

Whether such a thread could be extended to include work made by AI is not clear. I don’t intend to pursue it further here, but it may yet surface as a separate blog post.

See also

Thought Experiments

As I wondered how the idea of family resemblance applied to works generated via an AI app, I realised that the act of thinking about something can be as useful as actually reaching a conclusion. Asking open questions without having an answer in mind helps us tease out what things mean, what they involve, and to explore our personal boundaries. This is the approach I’m going to take here, with a series of thought experiments.

Non-human creation

Work by animals has in the past been accepted as art, notably that of Congo, a chimpanzee, and Pockets Warhol, a capuchin monkey. Congo, in particular, seems to have had some sense of composition and colour. He refused to add to paintings he felt were complete. However, it seems that animals cannot own the copyright to their work, at least in the US.

So, is the ability of animals, non-human intelligences, to create comparable with the production of art by computers? If not, what distinguishes one from the other?

One of the criticisms, directed at AI art, is that it lacks human emotion in its creation. That seems to argue against the acceptance of work by Congo or Pockets Warhol as art.

Is it too limited, for other reasons? What about the emotional response which might be experienced by an observer? Is the emotional response to an image comparable to the response we might have to a beautiful view? In the latter case, there is no artist per se.

Alien Art

If human emotion in the creative process is the defining factor in art, can anything created by non-humans, be art in human terms? I don’t believe so. To paraphrase Arthur C Clarke – The rash assertion that man makes art in his own image is ticking like a time bomb at the foundation of the art world. Obviously, if we stick to that view, we also exclude the work made by Congo or Pockets Warhol.

We don’t know whether life exists elsewhere in the universe, let alone intelligent life. But, for the sake of our thought experiment, let’s assume aliens are here on earth and that some of them are, in their terms, artists. For our purposes, let’s also assume that these hypothetical aliens see light in more or less the same range of frequencies as humans.

Going back to Arthur C Clarke, he discussed the potential impact of alien contact on human society in Profiles of the Future, originally published in 1962. Clarke also cites Toynbee’s Study of History. From our own history, we can predict that the response to alien contact would be dramatic. If alien art became known to us, it would inevitably also have an impact.

Such art would, by definition, be beyond our experience. It would be entirely new. We would know little of the cultural context for their art. Nor would we have access to the internal mental dialogue of these alien artists. What drives them is likely to be unknowable. Our relationship with any art they make, can only be an emotional response – how it makes us feel. I suppose it could be argued that we have some common ground with primates, which helps us relate to Congo and Pockets Warhol. Lacking that common ground, would it be possible for humans to respond meaningfully to alien art?

How does your answer sit with arguments about cultural appropriation of art from other human cultures?

Animal or Human?

Closer to reality, suppose we set up a ‘blind viewing’ of work by Congo and work by other human artists.

Would our observer be able to identify which was which? On what basis? Quality or something else? If your answer is based on quality (i.e. good/bad) does that not imply art is only art if it is good? Who decides if it is good?

In case you are wondering, the image on the left is by Joan Mitchell, that on the right is by Congo.

What if Congo were still around and his work was used as a dataset to train an AI? That AI then generates work using that dataset. Would our observer be able to identify which was which? What about a three-way comparison, adding AI work to the Congo/human choice above?

See also this:

Untouched by human hands…

Suppose, in some AI development lab, we link up a random text generator to an AI art app. The app is set up to take the random text and to generate an image from it. Each image is then automatically projected for a defined period of time, before the process is repeated with the next generation. Beyond the setup process, there is no human intervention.

What would an observer see? I suspect that, not knowing what was going on behind the scenes, they would see some remarkable images but also much dross and repetition. Isn’t dross and repetition, though, a characteristic component of almost all human endeavours? What does it mean if an AI does the same?

Are the individual images created in this way ‘art’? Would your view change, once you knew the origin of the image?

Ask yourself – what distinguishes an image generated by a human from one of otherwise comparable quality, generated by an AI? What happens if we compare the human dross with the AI dross?

Take a step back. Is the whole set up from text generator to projection equipment and everything in between, a work of art?

Does that view change if the setup moves from the lab to an art gallery, where it is called an ‘installation’? Why? The same human being conceived the idea. (I can assure you I’m not an AI.)

What would happen if a proportion of the images were randomly drawn from real digital works? Would our observer be able to distinguish the ‘real thing’ from the AI images? On what basis would that distinction be made? What does it mean if they can’t separate them?

Original or reproduction?

Suppose, an AI app generates an image which looks as if it might be a photograph of a physical painting. Perhaps this one.

Suppose, further, that a human artist takes that flat image and paints an exact copy down to the last splash of paint.

How would an observer, ignorant of the order of events, see the painting and the digital image? Would it be unreasonable for them to assume the painting was the original and the digital image the copy? What does that say about the idea of the original? What if the AI image was the product of the random text generator? Does your view change if the painter wrote the original text prompt?

A further twist. Suppose that the digital image file was sent instead to a very sophisticated 3D printer to create a physical object that mimicked in every way the painting made by the artist. Where is the original, then?

For a long post on the difference between an original and a reproduction, go here.

Is AI art any good?

That is a question with several aspects.

Is it good, as art?

That can only be answered at all, if you accept the output as art. On the other hand, I don’t think a definition of art based on artistic quality stands up. All it does is shift the definition elsewhere, without answering the original question.

Is it good, technically?

That is almost as hard to answer. Look at this image. Clearly the horse has far too many legs. Is that enough to say it is technically bad?

So what about this image from Salvador Dali? Mere technical adherence to reality is clearly not enough.

Is it good at doing what it claims to do?

This section is based almost entirely on my experience with one app, but from other reading I believe that experience to be typical.

The apps seem to have little difficulty in handling stylistic aspects, provided obviously that those styles form part of the training data. Generally, if you specify say 1950s comics, that’s pretty much what you get.

Other aspects are much less successful. That isn’t surprising if you consider the complexity of the task. What’s probably more surprising is that it works as often as it does.

AI has a known problem with hands, but I found other problems too. A figure would often have more than the standard quota of limbs, often bending in ways that would require a trip to A&E in real life. Faces were often a mess. Two mouths, distorted noses, oddly placed eyes all appear – even without the Picasso filter! Certain combinations of model and style seemed to work consistently better than others.

Having more than one main figure in an image, or a figure with props such as tables or musical instruments, commonly caused problems. Humans in cars, more often than not, had their heads through the windscreen – or the bonnet. Cars otherwise tended to be driverless.

In a group of figures, it is common for limbs to be missing, or to be shared between figures. A figure sitting in a chair might lose a leg or merge into the furniture. If they are holding an object, it might float, or have two hands on it, with a third one elsewhere.

How close does it get, matching the image to the prompt?

In Imagine AI, the app I have been using, it is possible to set a parameter which balances fidelity to the prompt against the quality of the image. I’m not sure how fidelity and quality are related, possibly through the allocation of processing resources.

I found getting specified attributes, like gender, ethnicity etc applied to the correct figure to be surprisingly difficult. Changes in word order can result in major changes to the generated image. Sometimes even the number of figures was wrong. Where I succeeded, there was no guarantee that this would be retained in further iterations. Generally, figures in the background of a scene tended to be dressed in similar colours to the main character and to be of the same ethnicity.

Getting variations in the physique of figures seems to be simpler for males than females. It seems very easy for depictions of women to become sexualised, compared to the same prompt used for a male figure. This is presumably a function of the training data.

What about the pictorial qualities?

Despite all the caveats, I have been surprised by the quality of the output, even of quasi-photographic images and, once the prompt is right, of certain painting styles. Some styles still seem more likely than others to be successful, especially with faces and hands, or with props like tables. Even so, and probably with some post-processing, much of the output could stand against the work of commercial illustrators and graphic designers, especially at the low-cost end of the market. I have already noticed AI imagery in the cover design of self-published books on Amazon.

It is the mimicry of techniques like impasto which give me the greatest doubts. I suppose it is early in the development of the field, but I saw no sign of anything which tried to use the essential characteristics of digital media in ways analogous, for example, to the use of grain in film photography. I suppose it could be argued that the widespread availability of reproductions has detached things like representations of impasto from their origins. In addition, digital imagery has been around for a limited period of time compared to traditional photography.

Impact on artists and art practice

As I said in Part 2:

For the future, much depends on the direction of development. Will these apps move towards autonomy, capable of autonomous generation of images on the basis of a written brief from a client? Or will they move towards becoming a tool for use by artists and designers, supporting and extending work in ‘traditional’ media? They are not mutually exclusive, so in the end the response from the market will be decisive.

I’m not sure that I would welcome a fully autonomous art AI. It wouldn’t do anything that humans can’t already do perfectly well. I can however see value in an AI enhanced graphic tool, which would have the capacity to speed up work in areas like advertising, film and TV.

Advertising and graphic design

In situations like this, where a quick turnaround is required, I can envisage an AI generating a selection of outline layouts, based on a written brief from a client. These could be refined by, say, selecting an element and describing the changes needed. A figure could be moved, its pose altered, clothing changed etc. Once the key elements were agreed and fixed in position, the AI could then make further refinements until the finished artwork was generated.

Obviously this process could be managed by non-artists, but would be very much enhanced if used under the direction of an artist, working as part of a team. If the changes were made during a discussion, via a touch screen and verbal instruction, the position of the artist in the team would be enhanced.

Print on Demand

Print on demand services are common. Artists upload their files to a website, which then handles the production and shipping of products made using the uploaded image. Orders are typically taken on the artist’s own website or sites like Etsy. Products typically offered range from prints to clothing to phone cases. AI could contribute at various points in the process.

At the moment, a template has to be set up by the artist for each product they want to offer, which seems a perfect use for AI, probably with fine-tuning by the artist.

Preparing the full description for each product can be a complex process, especially when SEO is taken into account. Again, an AI could take on much of the donkey work, enabling artists to spend more time in making art. It may even be possible to partly automate the production of the basic descriptive text for an image. If an image can be created from text, it should be possible to generate text from an image.


Many department stores offer a selection of images in frames ready to hang. The images themselves are rarely very distinctive and probably available in stores across the country. It is likely that the image forms a significant part of the selling price.

Assuming the availability of an AI capable of generating images to a reasonably high resolution, I can see stores, or even framing shops, offering a custom process.

“Tell us what you want in your picture, and we will print and frame it for you.”


Many artists already work digitally. I can see how an interactive AI as described above under Advertising and Graphic Design could be used to assist. A sketch drawing could be elaborated by an AI, acting effectively as a studio assistant. This could then be refined to a finished state by the artist.

Printmakers can already use digital packages like Photoshop to prepare colour separations for printing or silk screens. It should be possible with an AI to go beyond simple CMYK separations and create multiple files which can be converted into print screens or perhaps used to make Risograph prints.
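The separation step itself is straightforward arithmetic. As a minimal sketch, this is the standard naive RGB-to-CMYK conversion (real packages like Photoshop apply colour profiles and ink limits on top of this, so treat it as illustrative only):

```python
# Naive RGB-to-CMYK separation, the conceptual core of preparing
# print screens from a digital image. Real workflows add colour
# management; this sketch shows only the basic channel split.
def rgb_to_cmyk(r, g, b):
    """Convert 0-255 RGB values to CMYK fractions in [0, 1]."""
    if (r, g, b) == (0, 0, 0):
        return 0.0, 0.0, 0.0, 1.0   # pure black: key plate only
    r_, g_, b_ = r / 255, g / 255, b / 255
    k = 1 - max(r_, g_, b_)         # key (black) from the brightest channel
    c = (1 - r_ - k) / (1 - k)
    m = (1 - g_ - k) / (1 - k)
    y = (1 - b_ - k) / (1 - k)
    return c, m, y, k

# Pure red separates into magenta and yellow only
print(rgb_to_cmyk(255, 0, 0))  # (0.0, 1.0, 1.0, 0.0)
```

Applying this per pixel yields four greyscale plates, one per ink; an AI-assisted tool could go further and produce separations for arbitrary spot-colour inks, which is what Risograph work needs.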

Testing the AI App

I looked at a range of apps, initially using only free versions and generally only the Android versions. Some of them were seriously hobbled by advertising or other limitations, so couldn’t be properly assessed.

Initially, I played with a series of different prompts to get a feel for how they worked. I then tried some standard prompts on all of them. I finally settled on Imagine, and paid a subscription. I’ll be basing the rest of this post on that app. I’ll include a couple of the worst horrors from others, but I won’t say which package produced them, since in all probability there will have been significant improvements that would make my criticism unfair.

The Imagine app in use.

My aim was as much to see what went wrong, as it was to generate usable images. I wrote prompts designed to push the AI system as far as possible. The prompts brought together named people, who could never have met in real life, and put them in unlikely situations. Some were deliberately vague. Others tried out the effect of male and female versions of the same prompt, different ethnicities and ages. I wrote prompts for single characters, for multiple characters interacting with each other, and for characters with props and/or animals and in different settings. I’ve given some examples below.

Imagine has different models for the image generation engine, plus a number of preset variations or styles. This adds extra complexity, so for some prompts, I ran them with different models, holding the style constant, and vice versa.


Obviously, it isn’t enough to talk about these apps. The only test of their capabilities is to see what they produce. Part 4 will look at a selection, good and bad, of images and offer some thoughts on prompt writing as a creative process.


As with AI in general, AI art raises some interesting moral and philosophical questions. They may not be as fundamental as the Trolley Problem, but they will affect the livelihood of many people and will have a significant social impact. Finding a path through those questions, as the thought experiments show, will not be easy.

Much more quickly, though, we will get apps and packages that do specific jobs. Some are there already – colourising old B&W films for example. These are likely to have significant economic impact.

Art, AI and AI art – Part 2

An AI image of Aphrodite rising from the waves, after the original by Botticelli

Introduction to Part 2

This post began as a limited review of what has become known as AI Art. In order to do that, I had to delve deeper into AI in general. Consequently, the post has grown significantly and is now in four parts. Part 1 looked at AI in general. This post, Part 2, will look at more specific issues raised by AI art. Part 3 will look at the topic from the perspective of an artist. Finally, Part 4 will be a gallery of AI-generated images.

This isn’t a consumer review of the various AI art packages available. There are too many, and my budget doesn’t run to subscriptions to enough of them to make such a review meaningful. My main focus is on commonly raised issues such as copyright, or the supposed lack of creativity. I have drawn only on one AI app, Imagine AI, for which I took out a subscription. I tried a few others, using the free versions, but these are usually hobbled by ads or limited in their capabilities.

Links to each post will be here, once the series is complete.

How do AI art generators work?

What they do is take a string of text and, from that, generate pictures in various styles. How do they achieve that? The short answer is that I have no idea. So, I asked ChatGPT again! (Actually, I asked several times, in different ways.) I’ve edited the responses, so the text below is my own words, using ChatGPT as a researcher.

In essence, there are several steps, each capable of inserting bias into the output.

Data Collection and Preprocessing

The AI art generator is trained on a large dataset of existing images. This can include paintings, drawings, photographs, and more. Generally, each image is paired with a text that in some way describes what the image is about. The data can be noisy and contain inconsistencies, so a certain amount of preprocessing is required. The content of the dataset is critical to the images that can be produced. If it only has people in it, the model won’t be able to generate images of cats or buildings. If the distribution of ethnicities is skewed, so will be the eventual output.
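As a toy sketch of what such a dataset looks like, and of how a skew shows up, consider a handful of invented image-caption pairs (all the filenames and captions below are made up):

```python
from collections import Counter

# Hypothetical training pairs: (image_file, caption).
dataset = [
    ("img_001.jpg", "portrait of a young woman"),
    ("img_002.jpg", "portrait of a young woman in a hat"),
    ("img_003.jpg", "portrait of a young man"),
    ("img_004.jpg", "a tabby cat on a windowsill"),
    ("img_005.jpg", "portrait of a young woman with flowers"),
]

# A crude audit: count how often each subject word appears.
subjects = Counter()
for _, caption in dataset:
    for word in ("woman", "man", "cat"):
        if word in caption.split():
            subjects[word] += 1

print(subjects)  # the skew towards "woman" is already visible
```

Even in five captions, the imbalance is obvious; scale that up to billions of scraped pairs and the same kind of skew becomes invisible without a deliberate audit.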

Selection of Model Architecture

The ‘model’ is essentially the software that interprets the data and generates the images. There are numerous models in use. The choice of model is critical: it determines the kind of images the AI art generator can produce. In practice, the model may have several components. A base model might be trained on a large database of images, while a fine-tuning model is used to direct the base model output towards a particular aesthetic.


Training the Model

During training, the AI model learns to capture the underlying patterns and features of the images in the dataset of artworks. How this is done depends on the model in use. It seems, however, that they all depend on a process of comparing randomly generated images with the dataset and refining the generated image to bring it as close as possible to the original.
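That refinement idea can be caricatured in a few lines of code. Real generators are diffusion models that learn a denoising step from millions of examples; in this sketch the ‘image’ is just four numbers and the corrective step is hard-coded, purely to show the shape of the loop:

```python
import random

random.seed(0)

# Stand-in for a "real" image: a short list of pixel values.
target = [0.2, 0.8, 0.5, 0.1]

# Start from pure noise, as a diffusion-style generator does.
image = [random.random() for _ in target]

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

start_error = distance(image, target)

# Repeatedly nudge each value a small step towards the target.
# A real model learns this step from the training data; here it
# is hard-coded, purely to show the iterative structure.
for _ in range(100):
    image = [x + 0.1 * (t - x) for x, t in zip(image, target)]

end_error = distance(image, target)
print(start_error, end_error)  # the error shrinks steadily
```

The crucial difference is that a trained model can make that corrective step for images it has never seen, guided by a text prompt rather than a stored target.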

Generating Art

After training, the AI can be used to generate new art. This is a significant task in its own right. The app’s AI model needs to understand the semantics of the text and extract relevant information about the visual elements mentioned. It then combines the information extracted from the text prompt with its own learned knowledge of various visual features, such as objects, colours, textures, and more. This fusion of textual and visual features guides the model in generating an image that corresponds to the given prompt.
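A crude way to picture the text-understanding step is a lookup from words to visual features. Real apps use learned text encoders that map whole prompts to vectors, not dictionaries; the associations below are invented for illustration only:

```python
# A toy stand-in for the model's learned associations between
# words and visual features. Real systems learn these as vectors;
# a dictionary is used here purely for illustration.
VISUAL_KNOWLEDGE = {
    "woman":   {"subject": "woman"},
    "hat":     {"prop": "broad-brimmed hat"},
    "green":   {"palette": "green"},
    "flowers": {"prop2": "flowers"},
}

def parse_prompt(prompt):
    """Extract the visual elements a prompt mentions."""
    features = {}
    for word in prompt.lower().replace(",", " ").split():
        features.update(VISUAL_KNOWLEDGE.get(word, {}))
    return features

print(parse_prompt("A woman in green, with a green hat and flowers"))
```

Anything not in the model’s ‘knowledge’ is silently ignored, which is one reason prompts sometimes produce images missing elements the writer thought were clearly specified.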

Fine-Tuning and Iteration

There is a skill in writing the text prompts. The writer needs to understand how the text to image element of the app works in practice. In use, therefore, there is often a need for fine-tuning. Artists may adjust the prompt or other parameters to achieve the results they have in mind. Feedback from this process may also help in development and refinement of the model.

Style Transfer and Mixing

Some AI art generators allow for style transfer or mixing. The AI will generate a new image based on the content of a specific piece, but in the style of another.


Post-Processing

The generated image may then be subject to further post-processing to achieve specific effects or to edit out artefacts such as extra fingers.

Is it really intelligent?

Many of these apps describe themselves as AI ‘Art’ generators. That is, I think, a gross exaggeration of their capabilities. There is little ‘intelligence’ involved. The system does not know that a given picture shows a dog. It knows that a given image from the training data is associated with a word or phrase, say dog. It doesn’t know that dogs have four legs. Likewise, it doesn’t know anatomy at all. It perhaps knows that dog images tend to include the shapes we identify as legs, broadly speaking one at each corner, but it doesn’t know why, or how they work, or even which way they should face, except as a pattern.

Importance of training data

Indeed, in the unlikely event of a corruption in the training data, such as identifying every image of a dog as a sheep, and vice versa, the AI would still function perfectly, but the sheep would be herding the dogs. If the dataset did not include any pictures of dogs, it could not generate a picture of a dog.
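The dog/sheep thought experiment is easy to demonstrate with a toy nearest-neighbour classifier. The ‘images’ here are just pairs of invented feature values; the point is only that the system keeps working perfectly on swapped labels, because it never knew what a dog was in the first place:

```python
# Toy "images": two invented feature values per animal
# (say, ear length and wooliness). Entirely made-up numbers.
training_data = [
    ((0.9, 0.1), "dog"),
    ((0.8, 0.2), "dog"),
    ((0.1, 0.9), "sheep"),
    ((0.2, 0.8), "sheep"),
]

def classify(features, data):
    """Label a sample with the label of its nearest neighbour."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(data, key=lambda item: dist(item[0], features))[1]

dog_photo = (0.85, 0.15)
print(classify(dog_photo, training_data))  # "dog"

# Corrupt the labels: every dog becomes a sheep and vice versa.
swap = {"dog": "sheep", "sheep": "dog"}
corrupted = [(f, swap[label]) for f, label in training_data]

# The system still works "perfectly" - it just calls dogs sheep.
print(classify(dog_photo, corrupted))  # "sheep"
```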

On top of that, if there is any scope for confusion in the text prompts, these programs will find it. To be fair, humans are not very good at understanding texts either, as five minutes on Twitter will demonstrate. Even so, I’m sure that art AI will get better, technically at least. It will even learn to count limbs and mouths.

Whatever we call it, we know real challenges are coming. Convincing ‘deep fake’ videos are already possible. I’m guessing that making them involves some human intervention at the end to smooth out the anomalies. That will change, at which point the film studios will start to invest.

We are still a long way from General AI though. An art AI can’t yet be redeployed on medical research, even if some of the pattern matching components are similar.

Is AI art, theft?

These apps do not generate anything from nothing. They depend upon images created by third parties, which have been scraped from the web along with their associated text. It is often claimed that this dependency amounts to plagiarism or breach of copyright. There are several class-action lawsuits pending in the US, arguing just that.


These claims include:

  • Violation of the sections of the Digital Millennium Copyright Act (DMCA) that cover stripping images of copyright-related information
  • Direct copyright infringement by training the AI on the scraped images, and reproducing and distributing derivative works of those images
  • Vicarious copyright infringement for allowing users to create and sell fake works of well-known artists (essentially impersonating the artists)
  • Violation of the statutory and common law rights of publicity related to the ability to request art in the style of a specific artist

Misconceived claims

It is difficult to see how they can succeed, but once cases get to court, aberrant decisions are not exactly rare. For what it’s worth, though, my comments are below. (IANAL)

  • The argument about stripping images of copyright information seems to be based on an assumption the images are retained. If no version of an image exists without the copyright data, how is it stripped?
  • The link between the original data and the images created using the AI seems extremely tenuous and ignores the role of the text prompts, which are in themselves original and subject to copyright protection.
  • A style cannot be copyrighted. The law does not protect an idea, only the expression of an idea. In prosaic terms, the idea of a vacuum cleaner cannot be copyrighted, but the design of a given machine can be. If a given user creates images in the style of a known artist, that is not, of itself, a breach of copyright. If they attempt to pass off that image as the work of that artist, it is dishonesty on the part of the user, not the AI company. This is no different to any other case of forgery. Suing the AI company is like suing the manufacturer of inks used by a forger.
  • If style cannot be protected, how can it be a breach to ask for something in that style?

A flawed premise

Essentially, the claims seem to be based on the premise that the output is just a mash-up of the training data. They argue that the AI is basically just a giant archive of compressed images from which, when given a text prompt, it “interpolates” or combines the images in its archives to provide its output. The complaint actually uses the term “collage tool” throughout. This sort of claim is, of course, widely expressed on the internet. It rests, though, in my view, on a flawed understanding of how these programs really work. For that reason, the claim that the outputs correlate directly with the inputs doesn’t stand. For example, see this comparison of the outputs from two different AIs using the same input data.

As the IP lawyer linked above suggests:

…it may well be argued that the “use” of any one image from the training data is de minimis and/or not substantial enough to call the output a derivative work of any one image. Of course, none of this even begins to touch on the user’s contribution to the output via the specificity of the text prompt. There is some sense in which it’s true that there is no Stable Diffusion without the training data, but there is equally some sense in which there is no Stable Diffusion without users pouring their own creative energy into its prompts.

In passing, I have never found a match for any image I have generated using these apps on Google Lens or TinEye. I haven’t checked every one, only a sample, but enough to suggest the ‘use’ of the original data is, indeed, de minimis, since it can’t actually be identified. Usually I would see lots of other AI-generated images. I suspect this says more about originality than any claims to the copying of styles.

I suppose, if an artist consistently uses a specific motif, such as Terence Cuneo’s mouse, it could be argued there was a copyright issue, but even then I can’t see such an argument getting very far. If someone included a mouse in a painting with the specific intent of passing it off as by Cuneo, that is forgery, not breach of copyright.

Pre-AI examples

This situation isn’t unique. Long before AI was anything but science fiction, I saw an image posted on Flickr of a covered bridge somewhere in New England. The artist concerned had taken hundreds, perhaps thousands, of photos of the same bridge, a well-known landmark, all posted on Flickr by other photographers, and digitally layered them together. He had not sought prior approval. The final image was a soft, misty concoction only just recognisable as a structure, let alone a bridge. The discussion was fierce, with multiple accusations of theft, threats of legal action and so on.

In practice, though, what was the breach? No one could positively identify their original work. Even if an individual image had been removed, it seems highly unlikely that there would have been any discernible impact on the final image. I would argue that the use of images from the internet to ‘train’ the AI is analogous to the artist’s use of the original photos of that bridge. In the absence of any identifiable and proven use of an image, there is no actionable breach.

Who has the rights to the image?

An additional complication, in the UK at least, stems from the fact that unlike many countries, the law makes express provision for copyright protection of computer-generated works. Where a work is “generated by computer in circumstances such that there is no human author of the work”, the author is taken to be “the person by whom the arrangements necessary for the creation of the work are undertaken”. Protection lasts for 50 years from the date the work is made.

It could be argued that in the case of AI art packages, the person making the necessary arrangements is the person writing the text prompt. As yet, that hasn’t been tested in a UK court.

See Also

A paper produced by the Tate Legal and Copyright Department. I can give no assurance it is still current.

Is AI use of training data, moral?

Broader issues of morality are also often raised. There are two aspects to this.

There are moral rights within copyright legislation. Article 6bis of the Berne Convention says:

Independent of the author’s economic rights, and even after the transfer of the said rights, the author shall have the right to claim authorship of the work and to object to any distortion, modification of, or other derogatory action in relation to the said work, which would be prejudicial to the author’s honor or reputation.

If the use of a specific work in an AI-generated image cannot be identified or even proven to be there in the first place, it is difficult to believe that its use in that way is ‘prejudicial to the author’s honor or reputation’.

Broader morality

There is also a broader moral issue. Is it ethical to use someone else’s work, unacknowledged and without remuneration, to create something else? As with all moral arguments, there is no definitive answer. This Instagram account is interesting in that respect.

There is a fine line between taking inspiration and copying. That line is not changed by the existence of AI. Copying of artistic works has a long tradition. As Andrew Marr says in this Observer article, “the history of painting is also the history of copying, both literally and by quotation, of appropriation, of modifying, working jointly and in teams, reworking and borrowing.”

The iconic portrait of Henry VIII is actually a copy. The original, by Hans Holbein, was destroyed by fire in 1698, but is still well known because of the many copies. It is probably one of the most famous portraits of any English or British monarch. Copying of other works has also been a long-standing method of teaching.

Is it acceptable to sell copies of other peoples work?

That of course begs the question of whether AI art is a copy. Setting that aside, it also takes us back to the issue of forgery, or the intent of the copyist. For many years, the Chinese village of Dafen was supposed to be the source of 60% of all art sold worldwide. Now the artists working there are turning to making original work for the Chinese market. Their huge sales of copies over the decades suggest that buyers have no objection to buying copies. None of those sales pretended to be anything but.

Is giclée a scam?

Many artists sell copies of their own work, via so-called ‘giclée’ (i.e. inkjet) reproductions. The marketing of these reproductions often seems to stray close to the line, with widespread use of empty, if impressively arty-sounding, phrases – ‘limited edition fine art print’ and the like. I’ve even seen a reputable gallery offering a monoprint in an edition of 50, with no explanation in the description of how this miraculous work of art was made. It was, of course, an inkjet reproduction. To be accurate, there was an explanation, but it was on a separate page, with no link from the sale page.

Ignoring the fact that these are not prints as an artist-printmaker would expect to see them, the language and marketing methods used are designed to obscure the fact that these are not, of themselves, works of art, but copies of works of art.

In that context, I believe an anonymous painted copy of a Van Gogh to be more honest about what it is than an inkjet reproduction of an oil painting by a modern minor artist. It at least has some creative input directly into it, whereas the reproduction is pretty much a mechanical copy. I’ll return to this in Part 3.

Bias in AI art

The possibility of bias in AI in general is a real cause for concern. In the specific case of AI art, the problem may be less immediately obvious, but as AI art is used more widely, the representations it generates will become problematic if they are biased towards particular groups or cultures. If the datasets used to train AI models are not representative or diverse enough, the output is likely to be biased or even unfair in the representations created. One remedy would be to increase transparency about data sources.

Issues likely to affect the dataset include:

  • A lack of balance in the representation of gender, age, ethnicity and culture
  • A lack of awareness of historical bias, which can then become replicated in the AI
  • If labels attached to images during preprocessing or dataset creation inaccurately describe the content of images or are influenced by subjective judgments, these biases may be perpetuated in the model.
  • Changes to the AI model after deployment may introduce bias if not properly managed and documented.
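The first of these issues, representation balance, is at least easy to measure. A minimal audit, using invented demographic tags and an arbitrary 15% threshold, might look like this:

```python
from collections import Counter

# Invented demographic tags for a hypothetical training set.
tags = ["young"] * 70 + ["middle-aged"] * 25 + ["older"] * 5

def audit(labels, threshold=0.15):
    """Return the share of any category falling below the threshold."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()
            if n / total < threshold}

print(audit(tags))  # {'older': 0.05} - clearly under-represented
```

Publishing even simple summary figures like these would go a long way towards the transparency called for above; the harder problems, such as inherited historical bias, need more than counting.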

Lack of transparency may lead to other problems:

  • AI systems often work as “black boxes”: they provide results without explaining how those results were obtained.
  • Difficulty in meeting regulatory requirements such as on data sources
  • Poor documentation of the data sources and data handling procedures, preprocessing steps, and algorithms used in the AI.
  • Inability to demonstrate the existence of clear user consent mechanisms, and adherence to data protection regulations (e.g., GDPR)

These can all lead to poor accountability and lack of trust.

How does this relate to AI?

AI, as it currently stands, does not copy existing works. Nor does it collage together parts of multiple works. Somehow, and I do not pretend to understand the technical process, it manages to generate new images. They may become repetitive. They may, especially the pseudo-photographs, reveal their AI origin, but despite all that they somehow produce work which is not a direct copy – i.e. original.

For the future, much depends on the direction of development. Will these apps move towards autonomy, generating images independently on the basis of a written brief from a client? Or will they move towards becoming a tool for use by artists and designers, supporting and extending work in ‘traditional’ media? The two paths are not mutually exclusive, so in the end the response from the market will be decisive.