
ChatGPT or DeepSeek: Which AI platform creates the most realistic images

2025-03-01 13:01:00

By Ben Khalesi

Artificial intelligence (AI) has reshaped digital art and creative design. Generative AI lets you create stunning artwork on your tablet or Chromebook. This guide compares Janus-Pro-7B (DeepSeek) and DALL·E 3 (ChatGPT) to see which one generates more realistic images.



DALL·E 3 uses diffusion modeling and ChatGPT to generate images

DALL·E 3 is a generative model with a diffusion-based decoder trained on vast multimodal datasets, which allows it to generate detailed images across diverse artistic styles. A key advancement in DALL·E 3 is its tight integration with ChatGPT, whose large-scale transformer language model processes the prompt before it reaches the image generator.

This integration lets it parse complex descriptions with greater semantic accuracy. Unlike multimodal architectures that both understand and generate images, DALL·E 3 is optimized purely for generation and has no image-understanding pipeline of its own. ChatGPT can interpret images only because OpenAI integrates separate vision models that process and analyze them.
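Diffusion models create an image by learning to undo a gradual noising process. OpenAI has not released DALL·E 3's code, so the following is only a minimal Python sketch of the textbook DDPM forward step, x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, applied to a toy "image" of three pixel values. The linear beta schedule, its constants, and the function names are assumptions chosen for illustration. A trained decoder's job is to predict the noise ε; here the true noise stands in for that prediction, which is why the clean values come back.

```python
import math
import random
from typing import List

def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t = prod(1 - beta_s) for an assumed linear beta schedule."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0: List[float], alpha_bar: float, noise: List[float]) -> List[float]:
    """Forward step: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * p + b * n for p, n in zip(x0, noise)]

def estimate_x0(xt: List[float], alpha_bar: float, predicted_noise: List[float]) -> List[float]:
    """Undo the forward step given a noise estimate (the quantity a trained denoiser predicts)."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(p - b * n) / a for p, n in zip(xt, predicted_noise)]

if __name__ == "__main__":
    random.seed(0)
    alpha_bars = linear_alpha_bars(1000)
    x0 = [0.2, -0.5, 0.9]                        # toy "image": three pixel values
    noise = [random.gauss(0.0, 1.0) for _ in x0]
    xt = add_noise(x0, alpha_bars[500], noise)   # noised version at step 500
    print(xt)
    # Using the true noise as a perfect "prediction" recovers x0 up to rounding error.
    print(estimate_x0(xt, alpha_bars[500], noise))
```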

Janus-Pro-7B separates understanding and image generation with a dual-encoder design

Janus-Pro-7B is a 7-billion-parameter multimodal model from DeepSeek. Its neural networks are trained for precise, structured outputs, and its decoupled architecture separates visual understanding from text-to-image generation. Unlike DALL·E 3, which only produces images, Janus-Pro-7B can both analyze images and generate them, responding with text as well as images.

Rather than a single visual encoder, Janus-Pro-7B uses two specialized ones. The Understanding Encoder looks at a picture, identifies what's in it (objects, people, or scenes), and interprets how those elements relate, so the model can describe the image in meaningful text. The Generation Encoder converts a description into visual elements, allowing the model to generate images from text prompts.
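To make the decoupled design concrete, here is a toy Python sketch that follows the description above rather than DeepSeek's actual code. The class names (`UnderstandingEncoder`, `GenerationEncoder`, `DecoupledJanusSketch`) and the placeholder arithmetic inside them are hypothetical; the only point is that each task is routed through its own encoder instead of sharing one. In the real model both paths feed a single shared autoregressive transformer, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VisualTokens:
    path: str        # which encoder produced these tokens
    ids: List[int]

class UnderstandingEncoder:
    """Toy stand-in for the understanding path: pixels -> coarse semantic features."""
    def encode(self, pixels: List[int]) -> VisualTokens:
        # Bucket 0-255 brightness values into 4 coarse levels (placeholder for real features).
        return VisualTokens("understanding", [p // 64 for p in pixels])

class GenerationEncoder:
    """Toy stand-in for the generation path: text prompt -> discrete visual token IDs."""
    def __init__(self, codebook_size: int = 16) -> None:
        self.codebook_size = codebook_size

    def encode(self, prompt: str) -> VisualTokens:
        # Deterministic placeholder: map each word to an ID in a small codebook.
        return VisualTokens("generation", [sum(map(ord, w)) % self.codebook_size for w in prompt.split()])

class DecoupledJanusSketch:
    """Each task gets its own encoder, so neither representation has to serve both jobs."""
    def __init__(self) -> None:
        self.understanding = UnderstandingEncoder()
        self.generation = GenerationEncoder()

    def describe_image(self, pixels: List[int]) -> VisualTokens:
        return self.understanding.encode(pixels)

    def image_from_text(self, prompt: str) -> VisualTokens:
        return self.generation.encode(prompt)

model = DecoupledJanusSketch()
print(model.describe_image([0, 100, 255]))  # VisualTokens(path='understanding', ids=[0, 1, 3])
print(model.image_from_text("a cat"))       # VisualTokens(path='generation', ids=[1, 8])
```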

Comparing realism in AI-generated images from Janus-Pro-7B and DALL·E 3

Prompt: A realistic photo of a potted cactus and a bicycle.

The first image generated by DALL·E 3 has overly controlled lighting and lacks the natural imperfections essential for realism. Even after the prompt was refined for more realism, DALL·E 3 did not match the quality of DeepSeek's image. It also added an extra plant and a vintage camera, neither of which the prompt specified. This shows a tendency to take creative liberties rather than strictly pursue realism.

Meanwhile, Janus-Pro-7B generated a single potted cactus against a blurred background, producing a natural photographic quality. The depth of field, lighting, and textures in the Janus-Pro-7B image feel authentic, and it renders realistic reflections, especially on the bicycle. Overall, Janus-Pro-7B delivers higher realism while staying accurate and faithful to the prompt.
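The extra plant and vintage camera suggest a simple way to score prompt adherence: compare the objects the prompt asked for with the objects that actually appear. In this sketch the labels are typed in by hand from the description above (assuming both requested objects appear in DALL·E 3's image); in practice they would have to come from a person or an object detector. The function name is hypothetical.

```python
from typing import Iterable, Set, Tuple

def prompt_adherence(requested: Iterable[str], depicted: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, extra): requested objects not depicted, and depicted objects not requested."""
    req, dep = set(requested), set(depicted)
    return req - dep, dep - req

missing, extra = prompt_adherence(
    ["potted cactus", "bicycle"],
    ["potted cactus", "bicycle", "extra plant", "vintage camera"],
)
print(sorted(missing), sorted(extra))  # [] ['extra plant', 'vintage camera']
```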

Comparing spatial positioning in DALL·E 3 and Janus-Pro-7B

Prompt: An image of a black dog on the left, a cat in the middle, and a mouse on the right.

The first image generated by ChatGPT depicts an outdoor scene with a black dog, a cat, and a mouse in natural poses. Although the prompt specifies a structured left-to-right arrangement, the image follows that layout only loosely.

DeepSeek follows the prompt's spatial instructions precisely, placing the black dog on the left, the cat in the middle, and the mouse on the right. Both images are cartoonish, and DeepSeek's output is lower in resolution and less refined. Still, DeepSeek sticks to the requested positions, while ChatGPT's model takes artistic liberties that alter the layout.
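To check spatial adherence automatically rather than by eye, one could compare the horizontal centers of detected objects against the requested order. This minimal sketch assumes bounding boxes come from some object detector; the box format, the function name, and the coordinates in the example are made up for illustration, not measured from either image.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, x_max, y_max) in pixels

def follows_left_to_right(requested: List[str], boxes: Dict[str, Box]) -> bool:
    """True if every requested object was detected and their centers run left to right in order."""
    if any(name not in boxes for name in requested):
        return False
    centers = [(boxes[name][0] + boxes[name][2]) / 2 for name in requested]
    return all(a < b for a, b in zip(centers, centers[1:]))

# Hypothetical detections for the "dog, cat, mouse" prompt.
boxes = {
    "black dog": (10, 50, 110, 200),
    "cat": (150, 60, 250, 200),
    "mouse": (300, 150, 340, 190),
}
print(follows_left_to_right(["black dog", "cat", "mouse"], boxes))  # True
print(follows_left_to_right(["mouse", "cat", "black dog"], boxes))  # False
```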

Comparing DALL·E 3 and Janus-Pro-7B with multiple elements in complex prompts

Prompt: A fluffy orange cat with green eyes lounging on a stone pathway in a Japanese garden.

Dense prompts require a model to interpret multiple elements, constraints, and style details at once. On DPG-Bench, a benchmark built around dense prompts, Janus-Pro-7B scored 84.19 and DALL·E 3 scored 83.50, suggesting a similar ability to render complex scenes.

However, this dense prompt reveals differences in interpretation and refinement. DALL·E 3 includes nearly all the elements, rendering cherry blossoms, a stone pathway, and a Japanese garden with a pagoda and a bridge. Despite the impressive composition, though, its cat lacks realism.

DeepSeek covers most elements but misses key cultural markers, and its output is lower in resolution than DALL·E 3's. Even so, DeepSeek wins again because it depicts the fluffy orange cat more realistically, even if it sacrifices some background complexity.
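The two DPG-Bench scores quoted earlier differ by less than a point, which a few lines of Python make explicit. The helper name is hypothetical; the scores are the ones reported above.

```python
from typing import Tuple

def score_gap(score_a: float, score_b: float) -> Tuple[float, float]:
    """Absolute gap in points, and that gap as a percentage of score_b."""
    gap = score_a - score_b
    return gap, gap / score_b * 100

# DPG-Bench: Janus-Pro-7B vs. DALL·E 3.
gap, pct = score_gap(84.19, 83.50)
print(f"{gap:.2f} points, {pct:.2f}% of DALL·E 3's score")  # 0.69 points, 0.83% of DALL·E 3's score
```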

Comparing DALL·E 3 and Janus-Pro-7B in color accuracy

Prompt: A composition featuring a bright yellow banana, a deep red apple, a rich blue ceramic mug, and a green pear, all placed on a white marble table.

Color accuracy is a key difference between DeepSeek (Janus-Pro-7B) and DALL·E 3. DeepSeek's banana has a natural, balanced yellow tone, whereas DALL·E 3's looks waxy. DeepSeek's ceramic mug is a soft, muted blue, while DALL·E 3 renders it in deep teal. Both models depict a naturally textured red apple. For the pear, DALL·E 3 adds color variation with hints of orange, while DeepSeek's pear appears more uniform.

Lighting also affects color perception. DeepSeek uses softer, daylight-style lighting that keeps colors realistic, while DALL·E 3 uses harsher lighting and higher contrast, resulting in vivid but less natural colors. DeepSeek (Janus-Pro-7B) demonstrates superior color realism, particularly for the ceramic mug, while DALL·E 3 prioritizes a high-contrast, stylized look at the expense of color accuracy.
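These color judgments are visual. One hedged way to put a number on a "blue versus teal" difference is to compare hues on the color wheel with Python's standard `colorsys` module. The two reference colors below are textbook sRGB values (pure blue and HTML teal, #008080), not pixels sampled from either model's image, and the helper names are hypothetical.

```python
import colorsys
from typing import Tuple

RGB = Tuple[int, int, int]  # 8-bit sRGB channels, 0-255

def hue_degrees(r: int, g: int, b: int) -> float:
    """Hue of an 8-bit RGB color, in degrees on the 0-360 color wheel."""
    h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360

def hue_distance(c1: RGB, c2: RGB) -> float:
    """Shortest angular distance between two colors' hues (0-180 degrees), handling wraparound at red."""
    d = abs(hue_degrees(*c1) - hue_degrees(*c2)) % 360
    return min(d, 360 - d)

pure_blue = (0, 0, 255)
teal = (0, 128, 128)
print(round(hue_distance(pure_blue, teal)))  # 60
```

A larger hue distance from the requested "rich blue" would indicate a bigger color shift, though hue alone ignores saturation and brightness, which is where the lighting differences described above would show up.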


Final Verdict: DALL·E 3 for Creativity, Janus-Pro-7B for Realism

Choosing between DALL·E 3 and Janus-Pro-7B depends on your creative needs. DALL·E 3 delivers refined, vibrant outputs with more artistic flexibility. If you prioritize realism, accurate spatial positioning, and prompt adherence, Janus-Pro-7B produces a more natural photographic style.
