Marqo: An End-to-End Vector Search Engine
Marqo is an end-to-end vector search engine that provides state-of-the-art embeddings, high-performance search capabilities, and support for multimodal data. With Marqo, you can easily generate, store, and retrieve vectors using a single API, without the need to bring your own embeddings.

Core Features

🤖 State-of-the-art embeddings: Marqo lets you use the latest machine learning models from PyTorch, Hugging Face, OpenAI, and more. You can start with a pre-configured model or bring your own. Marqo also supports ONNX models for faster inference and higher throughput, and runs on both CPU and GPU.

⚡ Performance: Marqo stores embeddings in in-memory HNSW indexes, enabling very fast approximate nearest-neighbor search. You can scale to indexes with hundreds of millions of documents through horizontal index sharding. Marqo also supports async, non-blocking data upload and search operations.

🌌 Documents-in-documents-out: Marqo handles vector generation, storage, and retrieval out of the box. You can build search, entity resolution, and data exploration applications using text and images. Marqo allows you to build complex semantic queries by combining weighted search terms and filter search results using its query DSL. You can also store unstructured data and semi-structured metadata together in documents, using a range of supported datatypes like booleans, integers, and keywords.
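As a sketch of the query DSL described above: weighted search terms are expressed as a mapping from term to weight, and filters use a Lucene-style filter string. The index name and field names below are hypothetical, and exact DSL details may vary between Marqo versions.

```python
# Weighted search terms: positive weights boost a concept,
# negative weights push results away from it.
weighted_query = {
    "red running shoes": 1.0,   # what we want
    "sandals": -0.5,            # what we want to steer away from
}

# Lucene-style filter string restricting results by metadata fields
# (the "in_stock" and "price" fields are hypothetical examples)
filter_string = "in_stock:true AND price:[0 TO 100]"

# With a Marqo instance running locally, the query could be issued as:
# mq = marqo.Client(url='http://localhost:8882')
# results = mq.index("products").search(q=weighted_query,
#                                       filter_string=filter_string)
```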

🍱 Managed cloud: Marqo offers a managed cloud solution that provides low-latency optimized deployment, scalability, high availability, 24/7 support, and access control. You can easily scale your inference capabilities with just a few clicks.

Integrations

Marqo is integrated into popular AI and data processing frameworks, making it easy to incorporate vector search capabilities into your existing workflows. Here are a couple of notable integrations:

🛹 Griptape: Griptape enables the safe and reliable deployment of LLM-based agents for enterprise applications. The MarqoVectorStoreDriver integration allows these agents to access scalable search capabilities with your own data. With this integration, you can leverage open source or custom fine-tuned models through Marqo to deliver relevant results to your LLMs.

🦜🔗 LangChain: The LangChain integration lets you leverage open source or custom fine-tuned models through Marqo in LangChain applications with a vector search component. The Marqo vector store implementation can plug into existing chains such as Retrieval QA and Conversational Retrieval QA.

Getting Started

To get started with Marqo, follow these steps:

  1. Install Docker from the official Docker website. Make sure Docker is allocated at least 8GB of memory and 50GB of storage.

  2. Run the following command to start Marqo:

docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:latest
  3. Install the Marqo client:
pip install marqo
  4. Start indexing and searching. Here's a simple example:
import marqo

# Connect to the Marqo instance started in step 2
mq = marqo.Client(url='http://localhost:8882')

# Create an index using Marqo's default embedding model
mq.create_index("my-first-index")

# Add documents; fields listed in tensor_fields are embedded as vectors
mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, "
                       "mobility, life support, and communications for astronauts",
        "_id": "article_591"  # optional custom document ID
    }],
    tensor_fields=["Title", "Description"]
)

# Run a natural-language query against the indexed fields
results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?",
    searchable_attributes=["Title", "Description"]
)

This example demonstrates how to create an index, add documents to it, and perform a search query. The results will include documents that match the search query, along with their scores and highlights.
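The shape of the response can be sketched as follows. The values below are illustrative, not actual output, and the exact field layout may differ across Marqo versions:

```python
# A trimmed, illustrative Marqo search response (values are made up)
sample_results = {
    "hits": [
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "_id": "article_591",
            "_score": 0.62,
            "_highlights": [
                {"Description": "The EMU is a spacesuit that provides ..."}
            ],
        }
    ]
}

# Pull out the best match and its relevance score
top_hit = sample_results["hits"][0]
print(top_hit["Title"], top_hit["_score"])
```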

Use Cases

Marqo can be used in a variety of use cases, including:

  • Building search engines for text and image data
  • Entity resolution and deduplication
  • Data exploration and discovery
  • Question answering systems
  • Recommendation systems
  • Semantic search and similarity matching
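For the image-search use case above, a Marqo index can be configured with a CLIP-family model so text and images share one embedding space. A minimal sketch of such index settings follows; the option names reflect the open-source Marqo API and may differ between versions:

```python
# Index settings for a multimodal (text + image) index.
# "ViT-L/14" refers to a CLIP checkpoint available in open-source Marqo;
# treat_urls_and_pointers_as_images tells Marqo to download and embed
# image URLs found in document fields rather than treating them as text.
image_index_settings = {
    "model": "ViT-L/14",
    "treat_urls_and_pointers_as_images": True,
}

# With a running instance, the index could be created and populated as:
# mq.create_index("my-image-index", **image_index_settings)
# mq.index("my-image-index").add_documents(
#     [{"caption": "a rocket launch",
#       "image": "https://example.com/rocket.jpg"}],
#     tensor_fields=["caption", "image"],
# )
```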

Future Directions

Marqo is continuously evolving and expanding its capabilities. Some future directions for the project include:

  • Integration with more machine learning frameworks and models
  • Support for additional data types and formats
  • Enhanced query DSL and search capabilities
  • Improved scalability and performance optimizations
  • Integration with more AI and data processing frameworks

Conclusion

Marqo is a powerful end-to-end vector search engine that simplifies the process of generating, storing, and retrieving vectors. With its state-of-the-art embeddings, high-performance search capabilities, and support for multimodal data, Marqo enables developers to build advanced search and recommendation systems with ease. Whether you're a beginner or an experienced AI enthusiast, Marqo provides the tools and features you need to take your projects to the next level.

To learn more about Marqo, visit the Marqo website and check out the documentation. You can also join the Marqo community on Discourse and Slack to connect with other users and share your experiences.