Introducing Dataherald: Query Your Structured Data in Natural Language

Have you ever wished you could ask your database questions in plain English and get accurate answers? Dataherald is a natural language-to-SQL engine built for enterprise-level question answering over structured data. It lets you set up an API over your database that understands and responds to queries in natural language. Typical uses include letting business users access insights from the data warehouse without going through a data analyst, integrating Q&A functionality into a SaaS application, and creating a ChatGPT plug-in from your proprietary data.

How Does Dataherald Work?

Dataherald is designed to be modular, allowing different implementations of core components to be plugged in. It ships with ready-made implementations of components such as text-to-SQL and evaluation. The engine is designed to be easy to set up with major data warehouses, to return responses quickly, and to improve in accuracy with usage.

Getting Started with Dataherald

The simplest way to get started with Dataherald is to use the hosted version, which is currently being rolled out to select customers. You can sign up for the waitlist on the Dataherald website. If you prefer to self-host the engine, you can use Docker. By default, the engine uses MongoDB to store application data.

To run Dataherald locally using Docker, follow these steps:

  1. Create a .env file based on the provided .env.example file. Set the necessary fields, including your OpenAI credentials, LLM model, organization ID, and encryption key for storing DB connection data in MongoDB.

  2. Install and run Docker on your machine.

  3. Create a Docker network for communication between services by running the following command (without a trailing period): docker network create backendnetwork

  4. Build the Docker images, create the containers, and start them with the command: docker-compose up --build (the --build flag rebuilds the images before the containers start, so you only need it on the first run or when the image has changed due to updates).

  5. Verify that the containers are up with docker ps. You should see two containers: one for the Dataherald app and one for MongoDB.

  6. Open your browser and visit http://localhost/docs to access the Dataherald API documentation.
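
Step 1 is the easiest one to get wrong, since a field left empty in .env may only show up as an error later. As a minimal sketch (not part of Dataherald), the Python helper below compares a .env file against .env.example and lists keys that are missing or empty. The function names parse_env and missing_keys are hypothetical, and the parser assumes simple KEY=VALUE lines only.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(example_text: str, env_text: str) -> list[str]:
    """Return keys listed in the example that are absent or empty in env."""
    example = parse_env(example_text)
    env = parse_env(env_text)
    return [key for key in example if not env.get(key)]


if __name__ == "__main__":
    example_path, env_path = Path(".env.example"), Path(".env")
    # Only run the check when both files exist in the current directory.
    if example_path.exists() and env_path.exists():
        for key in missing_keys(example_path.read_text(), env_path.read_text()):
            print(f"missing or empty: {key}")
```

Run it from the directory that contains both files. Some keys in .env.example may legitimately be optional, so treat the output as a checklist rather than a list of errors.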

Connecting to and Querying Your SQL Databases

Once the Dataherald engine is up and running, you can start using it to query your SQL databases. Here's how:

  1. Connect to your data warehouses. Dataherald currently supports connections to PostgreSQL, BigQuery, Databricks, and Snowflake. You can create connections through the API or at application start-up using environment variables.

  2. Add context about the data to the engine. This step is optional but recommended for improving the accuracy of the generated SQL. You can add context in three ways: by scanning the database tables and columns, adding verified SQL (golden SQL), or adding string descriptions of the tables and columns.

  3. Query the data in natural language. Use the POST /api/v1/question endpoint to send your natural language question to the Dataherald engine. Make sure to specify the database alias in the request.
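
To make step 3 concrete, here is a minimal Python sketch of a client for the POST /api/v1/question endpoint, using only the standard library. The JSON field names "question" and "db_alias", the alias "my_db", and the helper names are assumptions for illustration; check the API documentation served by your deployment for the actual request schema. The base URL assumes the local Docker setup described earlier.

```python
import json
from urllib import request

BASE_URL = "http://localhost"  # assumption: the host that serves /docs locally


def build_question_request(question: str, db_alias: str,
                           base_url: str = BASE_URL) -> request.Request:
    """Build (but do not send) a POST request to /api/v1/question."""
    # Assumption: the body fields are named "question" and "db_alias".
    payload = {"question": question, "db_alias": db_alias}
    return request.Request(
        url=f"{base_url}/api/v1/question",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, db_alias: str) -> dict:
    """Send the question to a running engine and decode the JSON response."""
    with request.urlopen(build_question_request(question, db_alias)) as resp:
        return json.load(resp)
```

With the engine running, a call such as ask("How many users signed up last month?", "my_db") would send the question against the connection registered under that alias. Building the request separately from sending it makes the payload easy to inspect before anything goes over the network.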

Replacing Core Modules

Dataherald is built with replaceable modules, allowing you to customize and extend its functionality. Some of the main modules that can be replaced include the SQL generator, vector store, DB, and evaluator. Dataherald already includes multiple implementations for testing and benchmarking purposes.
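
The general idea behind replaceable modules can be sketched in a few lines of Python. This is an illustration of the plug-in pattern only: the class names SQLGenerator, CannedSQLGenerator, and Engine are hypothetical and do not reflect Dataherald's actual classes or interfaces.

```python
from abc import ABC, abstractmethod


class SQLGenerator(ABC):
    """Hypothetical interface; not Dataherald's actual base class."""

    @abstractmethod
    def generate(self, question: str) -> str:
        """Turn a natural language question into a SQL string."""


class CannedSQLGenerator(SQLGenerator):
    """Toy implementation that looks questions up in a fixed mapping."""

    def __init__(self, known: dict[str, str]):
        self.known = known

    def generate(self, question: str) -> str:
        # Fall back to a harmless query when the question is unknown.
        return self.known.get(question.strip().lower(), "SELECT 1")


class Engine:
    """Depends only on the interface, so implementations can be swapped."""

    def __init__(self, generator: SQLGenerator):
        self.generator = generator

    def to_sql(self, question: str) -> str:
        return self.generator.generate(question)


engine = Engine(CannedSQLGenerator({"how many users?": "SELECT COUNT(*) FROM users"}))
print(engine.to_sql("How many users?"))
```

Because Engine only calls generate, an LLM-backed generator could replace the canned one without any change to the calling code, which is the property that makes side-by-side testing and benchmarking of implementations practical.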

Benefits and Use Cases

Dataherald offers several benefits and can be used in various scenarios. Here are a few examples:

  • Empower business users: With Dataherald, business users can directly access insights from the data warehouse without relying on data analysts or technical experts.

  • SaaS application integration: You can integrate Dataherald into your SaaS application to enable Q&A functionality for your users. This allows them to query the production databases directly from within your application.

  • ChatGPT plug-in: Dataherald can be used to create a ChatGPT plug-in that can understand and respond to queries based on your proprietary data. This opens up possibilities for building intelligent chatbots and virtual assistants.

Future Directions

Dataherald is an actively developed project, and the API may change over time. The team behind Dataherald is continuously working on improving and expanding its capabilities. They welcome contributions from the open-source community and are open to new features, improved infrastructure, and better documentation.

Conclusion

Dataherald is a natural language-to-SQL engine that lets you query your structured data in plain English. Its modular design, ease of use, and speed make it a useful tool for getting more out of your data, whether you're a business user, a SaaS application developer, or an AI enthusiast. Give it a try!

To learn more about Dataherald and get started, visit the official website and check out the documentation. You can also join the Dataherald community on Discord to connect with other users and get support.

🔗 Dataherald GitHub Repository