a multimodal AI search engine Devoteam Rebirth Blog


Reading time : 3 minutes

We often ask ourselves the question of the impact of AI on the creative and artistic professions. As we are optimistic technophiles, we tried to approach the problem from the reverse side: how could generative AI help us facilitate access to art? It is with this in mind that we developed the Art Search Engine prototype presented at the Innovation Corner of the AWS Paris 2024 Summit.

Art Search Engine relies on a public dataset composed of 60,000 worksin 10 different artistic styles: art nouveau, baroque, expressionism, impressionism, renaissance, etc. Based on a user request, Search AI can perform several actions:

  • Generate an image text based
  • Generate a query based on user preferences, such as artistic styles
  • Go search in the database similar works to a text request or an image request
  • Generate a description and a title for the work created

To do this, this multimodal engine uses different LLMs through Amazon Bedrock: Stable Diffusion, Claude, Titan.

To develop this prototype, the team made up of Thomas Ounas, Shield of N’Bouyaa And Fabien Lallemand completed the different stages of the project in less than a month: testing the model APIs, setting up the database connections, and deploying the infrastructure supporting the prototype.

Application architecture

The architecture is structured as follows:

  • Amazon API Gateway to trigger the StepFunction
  • AWS StepFunctions to implement the orchestration of different tasks
  • AWS Lambda to perform each action in the StepFunction workflow
  • Amazon Bedrock for access to the different LLMs (Stable Diffusion XL, Claude 2.1, Titan, Claude 3 Sonnet)
  • Amazon OpenSearch (serverless) used as Vector Store for storing image embeddings and certain metadata
  • Amazon DynamoDB for caching and storing information
  • Amazon S3 for storing raw images of the different artworks in the dataset
  • Amazon SES for sending results by email
  • AWS System Manager Parameter Store for storing certain parameters, such as prompts used for LLMs depending on different tasks

The different models used make it possible to respond to the terms of the user request: Stable Diffusion generates images, Claude 3 the descriptions and titles of the works, Claude 2.1 generates a prompt, etc.

Architecture diagram

Workflow step function

Introduction to GenAI

This prototype, developed in a very short time despite technical constraints (mainly the selection and testing of models, the load test in anticipation of the Summit), offers a fun introduction to GenAI.

It also allows us to better understand the benefit of a service like Amazonian substrate in the development of applications based on LLMs. This solution was also presented at AWS London Summit as part of the AWS Generative AI Accelerator.



Technology

Leave a Reply

Your email address will not be published. Required fields are marked *

Back To Top