Will Artificial Intelligence (AI) Take Over 3D Visualization? I Tested Dall-E 2 to Find Out.

Background

For decades now, we've been hearing that artificial intelligence (AI) will dominate our future. And while we may be numb to these predictions from movies, games and pop culture, AI has been creeping into our lives in recent years, whether we know it or not. Machine learning has been especially effective at augmenting everyday technology in things like Siri, Google Photos' image recognition, self-driving vehicles and more.

But even with these advancements, it has been easy to dismiss the potential of AI in the years ahead. Sure, it can help us with specific tasks, but will it ever come close to human intelligence? Sure, it can handle and augment automated tasks, but creative work will always be dominated by humans, right?

Right?

In recent months, even the creative side of the AI argument has taken a huge blow as OpenAI's Dall-E 2 has been slowly rolled out to the public and companies like Google and Facebook have announced similar developments in AI research. Often called 'prompt to image' or 'text to image', this technology flips the script on the image-to-text capabilities of recent years: Facebook can easily identify your face in any image, and Google can identify nearly any object in nearly any image and make it searchable.

I'll leave the detailed description of how prompt-to-image technology works to others (who understand it better than I do) but, put simply, training these systems on billions of images with text descriptions has made it possible to generate mind-blowing, photorealistic images in seconds.

I've been watching this from afar over the past few months, considering what effect this and similar technologies will have on our specific corner of the world (3D visualization in civil and transportation). So when I finally received access to Dall-E 2 this past week (the other platforms haven't opened public access yet), I decided to put it through its paces to see how close AI is to augmenting or taking over visualization in our industry.

Predictions

I've seen some early comparisons of Dall-E 2 to human work, with varying results. I've also seen many comments from artists arguing that this technology is far from the true human touch and will only serve as another tool to augment human artists.

All that said, I want to be clear that I believe this technology will absolutely disrupt the world. It will put millions out of jobs and spread well past images into movies, music, video games and other creative work. Sure, an artist can have control over a creation in a way AI may never fully match, but if we can generate dozens of images of whatever we can imagine in minutes in 2022, what chance will artists have in 2030? My children will likely grow up in a world where they can speak an idea for a movie and instantly watch it, with quality surpassing our current blockbusters.

So yeah, I’m bullish on this technology and have been curious how it may augment or even disrupt my own business and industry.

I was confident that nothing currently available would replace our products now or any time soon, but my bullish perspective prepared me to see some surprising results.

That said, SPOILER ALERT, I was barely able to produce anything that wasn’t pure visual garbage. 

Transportation Visualization Results

Dall-E 2 can create images in seconds from any text prompt, which was my first test. It also allows you to edit or recreate previously rendered images, which I tested later.
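
As an aside, for anyone who would rather script these tests than click through the web interface, here is a minimal sketch of what prompt-to-image generation could look like with OpenAI's Python library. To be clear, this is hedged: I ran all of my tests through the web interface, programmatic access to Dall-E 2 is still limited, and the prompt below is just my first test from this article.

```python
# A minimal sketch of prompt-to-image generation with OpenAI's Python
# library. Assumes programmatic access to Dall-E 2 and an
# OPENAI_API_KEY environment variable, which the library reads
# automatically.
import openai

response = openai.Image.create(
    prompt="a roundabout connecting a 4 lane road with a 2 lane road",
    n=4,               # number of images to generate per prompt
    size="1024x1024",  # supported sizes: 256x256, 512x512, 1024x1024
)

# Each result comes back as a temporary URL to a generated image.
for item in response["data"]:
    print(item["url"])
```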

First, a simple test: 

a roundabout connecting a 4 lane road with a 2 lane road

From a distance, these look believable. Upon closer inspection, however, these images, and the designs they depict, really fall apart. The striping is wrong. The roads would never work. The images, and the roads themselves, just make no sense.

I did a few similar tests with similar results. Dall-E 2 didn’t know what a diverging diamond interchange was, nor how to represent it. 

a diverging diamond interchange with a freeway going over the top

Beyond Typicals Style Results

While Beyond CAD is a more general civil visualization tool capable of presenting nearly any type of project, Beyond Typicals is a much more specific style of visualization and aesthetic, so I thought the results might be better. However, the results were similarly poor.

3d cross section of a 4 lane road with bike lanes going in each direction, palm trees in the median
a 3d cross section rendering of a street with two lanes in each direction and a median with palm trees in a downtown with tall buildings
50 foot road section showing the earth beneath the road with two lanes in each direction and a train going down the middle

Non-Specific Results

I was determined to show that Dall-E 2 could produce usable results in some manner, so I tried a different tactic. If the detail of transportation projects was too complex to handle, what if I tried something more narrative that I might use as part of a proposal? This was where Dall-E 2 came closest to producing something usable, but most of the results suffered from the same visual confusion.

a cyclist with helmet riding on a busy downtown street in the designated bike lane, golden hour

While the three pictures on the right are not just wrong but dangerous (riding head-on toward a truck?!), the picture on the left is almost usable from a distance. Never mind the cyclist running a red light, the floating streetlights, or the garbled detail of the bike and rider; it seems almost passable, and with enough prompting and testing, images like these might take the place of stock photos.

The best image I was able to create with Dall-E 2 related to transportation
a roundabout with a few cars and people and trees and flowers in the style of thomas kinkade

Image Editing

Alright, so Dall-E 2 clearly struggles with the specifics and details of transportation, but what if we used it to edit previously rendered images? I took a variety of Beyond CAD and Beyond Typicals renders and used both the 'create similar images' feature and the 'inpainting' feature to see how they would fare. Again, no matter what I threw at it, the results were consistently disappointing. Oh, and I should point out that Dall-E 2 rarely handles text well, which is a problem for any sort of signage (Google's Imagen has shown better results with text).

Beyond CAD render on the far left with the remaining similar images generated by Dall-E 2
Beyond CAD render on the far left with the remaining similar images generated by Dall-E 2
Beyond CAD render on the far left with the remaining similar images generated by Dall-E 2
Beyond Typicals render on the far left with the remaining similar images generated by Dall-E 2
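
For those curious, the 'create similar images' feature corresponds to generating variations from an uploaded render. Another hedged sketch with OpenAI's Python library, assuming API access; the file name here is hypothetical, and the source must be a square PNG:

```python
# A sketch of the 'create similar images' feature via the image
# variations endpoint in OpenAI's Python library. The file name is
# hypothetical; the source image must be a square PNG under 4 MB.
import openai

with open("beyond_cad_render.png", "rb") as render:
    response = openai.Image.create_variation(
        image=render,
        n=3,               # how many similar images to generate
        size="1024x1024",
    )

for item in response["data"]:
    print(item["url"])
```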

Inpainting

Inpainting allows the Dall-E 2 user to replace a specific part of an image with AI-generated content. In these tests, it seemed confused about both what the original image was and what I was trying to replace.

Trying to use inpainting to replace the palm trees with conifer trees
Trying to use inpainting to replace the rail lane with a bus lane
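
Under the hood, inpainting amounts to supplying the original image plus a mask that marks the region to regenerate. A minimal sketch with OpenAI's Python library; the file names and prompt are hypothetical, and this again assumes API access:

```python
# A sketch of inpainting via the image edit endpoint in OpenAI's
# Python library. The mask is a PNG with the same dimensions as the
# image, where fully transparent pixels mark the region Dall-E 2
# should repaint. File names and prompt are hypothetical.
import openai

with open("typical_section.png", "rb") as image, \
     open("median_mask.png", "rb") as mask:
    response = openai.Image.create_edit(
        image=image,
        mask=mask,
        prompt="a 3d cross section of a road with conifer trees in the median",
        n=2,
        size="1024x1024",
    )

for item in response["data"]:
    print(item["url"])
```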

Non-Transportation Results

I want to be clear that using Dall-E 2 outside these transportation related tests has been mostly a delight. It can absolutely produce spectacular images in seconds. Below are some of my non-transportation related favorites (with some influence from my kids).

DALL·E 2022-08-01 08.53.07 - high school basketball player holding a ball, comic book style
DALL·E 2022-08-01 08.53.33 - 8 year old boy hitting a golf ball, golden hour, oil painting
DALL·E 2022-08-01 08.56.45 - a snow rattlesnake
DALL·E 2022-08-01 12.02.11 - a lamb crossed with a tiger as a stuffed animal

Conclusion

Dall-E 2 seems to be at the forefront of a revolutionary and disruptive technology. It will quickly shake up industries like stock photography, product design, thumbnail generation, art production and more.

But what about 3D visualization in civil and transportation? Well, I initially thought this industry would start to see this technology used within the next few years, but after testing it myself, that seems further off. The unique detail and specific variables associated with these complex projects are just too much for AI to handle now or in the near future. Design elements like striping, lane configuration, landscaping and aesthetics simply confuse the system. And while Dall-E 2 and similar technologies can handle complex elements like lighting, refraction, shadows and hair, they have been trained on those subjects across billions of pictures with consistent results; roads and highways are neither as common nor as consistently represented. Something like a 3D typical section is difficult for Dall-E 2 to handle because there are probably so few of them in the training data, and even those likely aren't labeled consistently.

Oh, and this is just as it relates to still image renders. Looking for a full 3D flythrough of your project? That is much further out than simple image renders, not to mention real-time or virtual reality 3D visualization.

I would imagine that inpainting (and similar technologies) will be the first of these tools used in conjunction with 3D CAD design and visualization. And as for how this relates to Beyond CAD, Beyond Typicals and any other future products of ours? We will definitely keep an eye on these tools and look to integrate them with our own products for the best and most efficient results.

AI is changing the world, and these technologies prove that it may happen faster than we previously imagined. All of that said, however, a very specific use and niche like what is made possible with Beyond CAD and Beyond Typicals will still need talented engineers, CAD designers, artists and software for the foreseeable future.