Stable Diffusion & Midjourney: Full Review & Comparison!🚀🌟

AI Samson
28 Nov 2022 | 05:42

TLDR: In this comparison, Midjourney's AI-generated art shows stronger narrative, coherence, and anatomical accuracy than Stable Diffusion's across a range of prompts, from portraits to landscapes. While Stable Diffusion has improved at stock-photo-style images, it lacks the aesthetic maturity and depth of Midjourney's outputs, which often carry a melancholic yet engaging tone. Despite these advances, Stable Diffusion's outputs are sometimes rudimentary and lack the intricate detail and composition found in Midjourney's creations.


  • 🌌 For the "dream of a distant galaxy" prompt, Midjourney creates a more narrative-driven piece, including a character and context.
  • 💏 In the portrait of an elegant fantasy couple, Midjourney shows better consistency in facial features and anatomy than Stable Diffusion.
  • 👩 Midjourney's tired woman in a Valentino gown has a more engaging composition and feeling, despite her tiny hands.
  • 🤖 Stable Diffusion's output tends to be more abstract and less coherent, as seen in the fantasy cyberpunk princess comparison.
  • 🏋️‍♀️ Midjourney's depiction of a character with remarkable abs shows better symmetry and background composition, leading the viewer's gaze effectively.
  • 🌟 The absence of nudity and celebrities in Stable Diffusion's data set may have impaired its ability to render anatomy accurately.
  • 🦁 In the stock photo comparison of a lion, Stable Diffusion's performance is closer to Midjourney's, but it still lacks the underlying taste and aesthetic.
  • 🎨 Midjourney's art often has a melancholic feel, resonating with deeper human emotions and exploring the shadows within us.
  • 📸 Stable Diffusion seems to excel mainly at generic, overexposed, and unrealistic images, akin to typical stock photos.
  • 🏞️ While Stable Diffusion improves in landscapes, it does not reach the depth and emotional resonance of Midjourney's Icelandic beach scene.

Q & A

  • What was the main purpose of the comparison between Midjourney and Stable Diffusion in the transcript?

    -The main purpose was to evaluate and compare how both AI systems perform when generating images from the same prompts, covering themes from portraits to landscapes.

  • How did the narrative quality of the 'dream of a distant galaxy' image differ between Midjourney and Stable Diffusion?

    -Midjourney included a character gazing out into a space odyssey, which gives the image a narrative, while Stable Diffusion's output was more garish and less coherent, lacking a clear narrative.

  • What was observed about the consistency in facial features and anatomy in the 'elegant fantasy couple kissing' image?

    -Midjourney showed greater consistency in facial features and anatomy, with accurate details such as five fingers on each hand, whereas Stable Diffusion's image had less coherent anatomy.

  • What was the main critique about the 'tired woman in a Valentino gown' image produced by Stable Diffusion?

    -The main critique was that the woman's hands looked more like trotters than a pair of hands, and the overall composition was more abstract than Midjourney's more engaging piece.

  • How did the 'fantasy cyberpunk princess' image demonstrate the strengths of Midjourney over Stable Diffusion?

    -Midjourney's image had remarkable abs, wonderful symmetry, and leading lines that directed the viewer's gaze effectively, while Stable Diffusion's composition was less detailed and its anatomy was less accurate.

  • What was noted about the likeness of the celebrity, Timothée Chalamet, in the outputs of both AI systems?

    -Midjourney's output provided a greater likeness to Timothée Chalamet, despite using an older dataset. Stable Diffusion also managed to create a likeness, indicating some residual information in its dataset.

  • How did the comparison of a stock photo of a lion show the strengths of Stable Diffusion?

    -Stable Diffusion's lion image was very realistic and could be mistaken for a real photo, showing its strength in creating realistic images, especially in the stock photo area.

  • What was the general critique about Stable Diffusion's output in terms of aesthetics?

    -Stable Diffusion's images were considered more rudimentary, immature, and lacking an aesthetic eye, often producing generic images similar to those found on stock sites.

  • What emotional tone was often observed in Midjourney's images?

    -Midjourney's images often had a slightly melancholic feel, reflecting a deeper exploration of the human experience and emotions.

  • In the final landscape comparison, how did the Icelandic beach image produced by Midjourney differ from Stable Diffusion's?

    -Midjourney's Icelandic beach image was more engaging and of higher quality than Stable Diffusion's, which, while improving, was not at the same level as Midjourney in landscape composition.

  • What was the speaker's final verdict on using Midjourney and Stable Diffusion for their work?

    -The speaker decided to continue using Midjourney for their work because it produces more aesthetically pleasing and coherent images.



🎨 Artistic Comparison of AI-Generated Images

This paragraph presents a comparative analysis of AI-generated images from two models: Midjourney and Stable Diffusion. The comparison spans various themes, such as portraits, landscapes, and fantasy scenes, and highlights each model's strengths and weaknesses in narrative coherence, anatomical accuracy, and aesthetic appeal. Specific examples include a dreamy galaxy scene, an elegant fantasy couple, a tired woman in a Valentino gown, a cyberpunk princess, a celebrity portrait of Timothée Chalamet, a lion stock photo, and an Icelandic beach landscape. While Stable Diffusion shows promise in certain areas, Midjourney demonstrates greater consistency and maturity in its outputs, particularly in capturing emotional depth and creating more engaging compositions.


🏞️ Evaluation of AI Art in Landscapes and Still Life

In this paragraph, the focus shifts to how the two AI art models, Stable Diffusion and Midjourney, perform on landscapes and still life images. The comparison reveals that while Stable Diffusion has improved in these areas, it still lags behind Midjourney in anatomical accuracy and consistency. The speaker expresses a personal preference for Midjourney because of its more aesthetically pleasing and emotionally resonant outputs. The paragraph concludes with the speaker's intention to continue using Midjourney for their work and an invitation for the audience to share their thoughts and preferences for future developments in AI art. The speaker, Samson Bowles, signs off on a positive note, highlighting the delightful aspects of design and personal enjoyment.




💡Midjourney

Refers to an AI system evaluated in the video, which is noted for its ability to create images with greater narrative and coherence. It is compared favorably to Stable Diffusion for generating more engaging and aesthetically pleasing compositions, as seen in examples such as the dreamy distant galaxy and the fantasy cyberpunk princess.

💡Stable Diffusion

An AI system that is compared to Midjourney in the video. It is described as producing images that are less coherent and more garish, with less attention to anatomy and consistency. Despite these shortcomings, Stable Diffusion is noted to be improving in areas such as landscapes and stock photos.


💡narrative

In the context of the video, 'narrative' refers to the storytelling element present in the AI-generated images. A strong narrative is seen as a positive attribute, as it adds depth and engagement to the artwork. Midjourney is praised for the stronger narrative in its creations, which helps draw the viewer into the scene and understand the story being told.


💡anatomy

Refers to the accurate depiction of the human body's structure in the AI-generated images. The video discusses the importance of anatomical correctness and consistency, with Midjourney favored for its more accurate representation of human anatomy, such as the number of fingers on a hand or the proportions of the body.


💡composition

In art, 'composition' refers to the arrangement of elements in a work and the overall visual effect it creates. The video highlights the importance of a well-composed image, with Midjourney praised for coherent and engaging compositions that lead the viewer's gaze and create a focal point.


💡aesthetic

Refers to the visual appeal and artistic beauty of the AI-generated images. The video discusses how Midjourney produces images with a more refined and pleasing aesthetic, in contrast to Stable Diffusion, which is seen as producing more generic and rudimentary images.


💡melancholic

In the context of the video, 'melancholic' describes Midjourney's tendency to create images with a somewhat sad or reflective tone, which is seen as a positive quality. It suggests a depth and exploration of the human condition that resonates with viewers on a deeper level.


💡data set

The video discusses the impact of removing nudity and celebrities from the data set used by Stable Diffusion. It explores the idea that even without explicit celebrity data, a residual likeness can still be generated, as seen in the example of a young Timothée Chalamet.


💡landscapes

Refers to the depiction of natural or urban scenes in the AI-generated images. The video notes that while Stable Diffusion has improved at creating landscapes, it still does not reach the same level of quality and engagement as Midjourney.

💡stock photos

In the context of the video, 'stock photos' refers to generic, pre-existing images that are often used for commercial purposes. The discussion highlights Stable Diffusion's ability to create images that closely resemble stock photos in both quality and style.

💡Young Timothée Chalamet

Timothée Chalamet is a real-life actor whose likeness is used as an example in the video to illustrate the capabilities of the AI systems. The discussion around his image touches on data sets, likeness, and the impact of removing celebrities from AI training data.


Comparative analysis of Midjourney and Stable Diffusion AI art generation

Midjourney's art has a greater narrative quality, as seen in the dream of a distant galaxy piece

Stable Diffusion produces more garish and less coherent outputs than Midjourney

In the portrait of an elegant fantasy couple, Midjourney maintains consistency in facial features and anatomy

Stable Diffusion's output of the fantasy couple lacks detail and has anatomical inconsistencies

Midjourney's depiction of a tired woman at a roadside diner is more engaging and realistic

Stable Diffusion's version of the woman appears more abstract and less human-like

Midjourney's fantasy cyberpunk princess has remarkable abs and a well-composed background

Stable Diffusion's cyberpunk princess lacks intricacy and has anatomical failings

Midjourney's output of young Timothée Chalamet retains a likeness despite using an older dataset

Stable Diffusion's output of Chalamet shows a residual likeness, but with a more boyish appearance

Stable Diffusion's performance is comparable to Midjourney's in creating a realistic lion stock photo

Stable Diffusion's images are often generic and lack the aesthetic quality of Midjourney's outputs

Midjourney's art tends to have a melancholic feel, resonating with deeper human emotions

The Icelandic beach landscape by Midjourney is superior to Stable Diffusion's in depth and engagement

While Stable Diffusion improves in landscapes and stock photos, it regresses in anatomy and consistency

The speaker, Samson Bowles, prefers Midjourney for its aesthetic and emotional depth

The talk concludes with an invitation for the audience to share their thoughts and preferences