a multimodal AI search engine Devoteam Rebirth Blog

Reading time : 3 minutes

We often ask ourselves the question of the impact of AI on the creative and artistic professions. As we are optimistic technophiles, we tried to approach the problem from the reverse side: how could generative AI help us facilitate access to art? It is with this in mind that we developed the Art Search Engine prototype presented at the Innovation Corner of the AWS Paris 2024 Summit.

Art Search Engine relies on a public dataset composed of 60,000 worksin 10 different artistic styles: art nouveau, baroque, expressionism, impressionism, renaissance, etc. Based on a user request, Search AI can perform several actions:

Generate an image text based
Generate a query based on user preferences, such as artistic styles
Go search in the database similar works to a text request or an image request
Generate a description and a title for the work created

To do this, this multimodal engine uses different LLMs through Amazon Bedrock: Stable Diffusion, Claude, Titan.

To develop this prototype, the team made up of Thomas Ounas, Shield of N’Bouyaa And Fabien Lallemand completed the different stages of the project in less than a month: testing the model APIs, setting up the database connections, and deploying the infrastructure supporting the prototype.

Application architecture

The architecture is structured as follows:

Amazon API Gateway to trigger the StepFunction
AWS StepFunctions to implement the orchestration of different tasks
AWS Lambda to perform each action in the StepFunction workflow
Amazon Bedrock for access to the different LLMs (Stable Diffusion XL, Claude 2.1, Titan, Claude 3 Sonnet)
Amazon OpenSearch (serverless) used as Vector Store for storing image embeddings and certain metadata
Amazon DynamoDB for caching and storing information
Amazon S3 for storing raw images of the different artworks in the dataset
Amazon SES for sending results by email
AWS System Manager Parameter Store for storing certain parameters, such as prompts used for LLMs depending on different tasks

The different models used make it possible to respond to the terms of the user request: Stable Diffusion generates images, Claude 3 the descriptions and titles of the works, Claude 2.1 generates a prompt, etc.

Architecture diagram

Workflow step function

Introduction to GenAI

This prototype, developed in a very short time despite technical constraints (mainly the selection and testing of models, the load test in anticipation of the Summit), offers a fun introduction to GenAI.

It also allows us to better understand the benefit of a service like Amazonian substrate in the development of applications based on LLMs. This solution was also presented at AWS London Summit as part of the AWS Generative AI Accelerator.

Technology

a multimodal AI search engine Devoteam Rebirth Blog

Application architecture

Architecture diagram

Workflow step function

Introduction to GenAI

Leave a Reply Cancel reply