Goal

Enable document upload and querying through Supabase to support complex dataset parsing and knowledge base access via custom tools in Voice AI and chat interfaces. This system unlocks voice-accessible knowledge bases and enhanced querying for chat-based assistants.

Resources

Prerequisites

OpenAI Account

  • API Key for embedding and processing capabilities

Supabase Account

  • API Key (Project Settings > API)
  • Project URL (Project Settings > API)

Buildship Account

  • Access to the Remix project (linked above)

AI Assistant Account

  • Active assistant and custom tool setup

Implementation Steps

Step 1: Set Up Buildship Project

Open and Duplicate the Remix Project

  1. Open the Remix Link in your Buildship account
  2. Duplicate the project from the link provided
  3. This will create a copy of the pre-configured workflows in your account

Configure API Keys

Add your API credentials to the project. Required keys:
  • OpenAI API Key for embedding generation
  • Supabase API Key for database access
  • Supabase Project URL (found in Supabase under Project Settings > API, near the top of the page)

Update Supabase Nodes

Update all Supabase nodes with your credentials. Node locations:
  • 5 nodes in the “Add Document Chunks” workflow
  • 1 node in the “RAG using Supabase” workflow
Total: 6 nodes must be updated with the correct API key and project URL

Deploy the Project

  1. Confirm that all Supabase nodes (project URL and API key) and OpenAI nodes (API key) have been updated
  2. Click “Ship” in the top right corner to save your changes
  3. Shipping generates the API URLs needed in later steps

Step 2: Configure Supabase Database

Enable Vector Extension

  1. In Supabase, open the Database tab
  2. Under Extensions, enable the “vector” extension
  3. This extension is required for storing and querying document embeddings
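
If you prefer the SQL Editor, enabling the extension there is equivalent to toggling it in the dashboard. A minimal sketch:

-- Equivalent to enabling "vector" under the Database > Extensions tab
CREATE EXTENSION IF NOT EXISTS vector;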

Set Up Database Tables

  1. Go to the “SQL Editor” tab
  2. Run the following SQL commands individually
  3. Each command should finish with a “Success. No rows returned” message

Create Files Table

CREATE TABLE files (
    "id" UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    "createdAt" TIMESTAMP WITH TIME ZONE DEFAULT timezone('utc'::text, now()),
    "size" NUMERIC,
    "mimeType" TEXT,
    "encoding" TEXT,
    "originalName" TEXT,
    "downloadUrl" TEXT
);

Create Chunks Table

CREATE TABLE chunks (
    "id" TEXT PRIMARY KEY,
    "fileId" TEXT,
    "position" INTEGER,
    "originalName" TEXT,
    "extractedText" TEXT,
    "downloadUrl" TEXT,
    "embedding" vector(1536),
    "fts" TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', "extractedText")) STORED
);
CREATE INDEX ON chunks USING GIN ("fts");

Create Hybrid Search Function

CREATE OR REPLACE FUNCTION hybrid_search(
  query_text TEXT,
  query_embedding VECTOR(1536),
  match_count INT,
  full_text_weight FLOAT = 1,
  semantic_weight FLOAT = 1,
  rrf_k INT = 50
)
RETURNS SETOF chunks
LANGUAGE SQL
AS $$
WITH full_text AS (
  SELECT "id", ROW_NUMBER() OVER (ORDER BY ts_rank_cd("fts", websearch_to_tsquery('english', query_text)) DESC) AS rank_ix
  FROM chunks
  WHERE "fts" @@ websearch_to_tsquery('english', query_text)
  ORDER BY rank_ix
  LIMIT LEAST(match_count, 30) * 2
),
semantic AS (
  SELECT "id", ROW_NUMBER() OVER (ORDER BY "embedding" <#> query_embedding) AS rank_ix
  FROM chunks
  ORDER BY rank_ix
  LIMIT LEAST(match_count, 30) * 2
)
SELECT chunks.*
FROM full_text
FULL OUTER JOIN semantic ON full_text."id" = semantic."id"
JOIN chunks ON COALESCE(full_text."id", semantic."id") = chunks."id"
ORDER BY
  COALESCE(1.0 / (rrf_k + full_text.rank_ix), 0.0) * full_text_weight +
  COALESCE(1.0 / (rrf_k + semantic.rank_ix), 0.0) * semantic_weight
DESC
LIMIT LEAST(match_count, 30);
$$;
Expected Output: Each command should return “Success. No rows returned”
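
Once documents have been uploaded (Step 3), the function can be sanity-checked from the SQL Editor. The sketch below reuses an embedding already stored in the chunks table as a stand-in query vector instead of generating a fresh one through OpenAI, and the query text is a placeholder:

-- Run hybrid_search with a stored embedding as a stand-in query vector.
-- full_text_weight, semantic_weight, and rrf_k fall back to their defaults (1, 1, 50).
SELECT c."id", c."originalName", LEFT(c."extractedText", 120) AS preview
FROM (SELECT "embedding" FROM chunks LIMIT 1) AS q,
     LATERAL hybrid_search('your search terms here', q."embedding", 5) AS c;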

Step 3: Configure Upload Portal

Set Up the Upload Interface

  1. Go to: https://createassistants.com/supabase-knowledge
  2. Add your PDF Upload workflow API URL into the “Buildship API PDF Upload URL” field under the “Upload” tab
  3. This URL should be generated from your Buildship project after shipping

Upload Documents

Document Format Requirements:
  • Primary format: PDF
  • Other formats: Convert to PDF first
    • Google Docs: File > Download > PDF
    • MS Word: File > Save As (or Export) and choose PDF
    • Online converters: Use any reliable PDF converter
Upload Process:
  1. Select your PDF files for upload
  2. Submit the upload - this will schedule the processing
  3. Check status in the Buildship workflow logs
  4. Monitor progress through the Buildship dashboard

Step 4: Verify Database Population

Check Uploaded Data

  1. Go to your Supabase database
  2. Open the “Table Editor” tab
  3. Click on the “chunks” table
  4. Refresh the page to see uploaded data
  5. Verify data appearance - you should see processed document chunks with embeddings

Data Validation

What to look for:
  • Document chunks with extracted text
  • Embedding vectors (1536 dimensions)
  • File metadata including original names
  • Proper indexing for search functionality
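
The same checks can be run from the SQL Editor. The query below is a sketch based on the tables created in Step 2; it reports, per uploaded file, how many chunks were stored, how many carry an embedding, and the embedding dimensionality (which should be 1536):

-- Per-file summary of stored chunks and their embeddings
SELECT "originalName",
       COUNT(*)                      AS chunk_count,
       COUNT("embedding")            AS chunks_with_embedding,
       MAX(vector_dims("embedding")) AS embedding_dims
FROM chunks
GROUP BY "originalName";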

Step 5: Integrate with AI Assistant

Create Custom Tool

  1. Add a custom tool to your assistant
  2. Configure the tool to query your RAG database
  3. Set up proper parameters for search queries

Tool Configuration Example

  • Tool Name: query_knowledge_base
  • Description: “Search the custom knowledge base for relevant information using both semantic and full-text search capabilities.”
  • Endpoint: Your Buildship “RAG using Supabase” workflow URL
  • Parameters:
    • query: The search query or question
    • match_count: Number of results to return (default: 5)

Testing the System

Test in multiple interfaces:
  • Voice AI: Ask questions about uploaded documents
  • Chat AI: Query the knowledge base through text
  • Web Orbs: Access knowledge through the web interface

System Architecture

Data Flow

  1. Document Upload → PDF processing in Buildship
  2. Text Extraction → Document chunking and preprocessing
  3. Embedding Generation → OpenAI embeddings for semantic search
  4. Database Storage → Supabase with vector and full-text search
  5. Query Processing → Hybrid search combining semantic and keyword matching
  6. Result Delivery → Formatted responses through AI assistant

Search Capabilities

Hybrid Search Features:
  • Semantic Search: Using vector embeddings for meaning-based matching
  • Full-Text Search: Traditional keyword-based search
  • Weighted Combination: Configurable balance between search types
  • Ranking Algorithm: Reciprocal Rank Fusion (RRF) to merge keyword and semantic rankings

Advanced Configuration

Customization Options

Search Parameters

  • Full-text weight: Adjust importance of keyword matching
  • Semantic weight: Adjust importance of meaning-based search
  • RRF constant: Fine-tune ranking algorithm
  • Match count: Control number of returned results
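
These parameters map one-to-one onto the hybrid_search arguments defined in Step 2, so they can be tuned per call without changing the function. The values below are illustrative rather than recommendations, and the stored embedding again stands in for one generated by OpenAI:

-- Weight semantic matches more heavily than keyword matches for this call
SELECT c."originalName", LEFT(c."extractedText", 120) AS preview
FROM (SELECT "embedding" FROM chunks LIMIT 1) AS q,
     LATERAL hybrid_search(
       query_text       => 'your search terms here',
       query_embedding  => q."embedding",
       match_count      => 8,
       full_text_weight => 0.5,
       semantic_weight  => 1.5,
       rrf_k            => 60
     ) AS c;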

Document Processing

  • Chunk size: Optimize for your document types
  • Overlap settings: Ensure context preservation
  • File type support: Extend beyond PDF if needed

Performance Optimization

Database Performance

  • Indexing strategy: Optimize for your query patterns (see the example below)
  • Connection pooling: Manage database connections efficiently
  • Query optimization: Monitor and improve search performance
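
One concrete indexing option, assuming pgvector 0.5.0 or later, is an approximate-nearest-neighbor index on the embedding column. Because hybrid_search ranks with the inner-product operator (<#>), the index must use the matching operator class; whether the planner uses it inside the function depends on your data and settings, so treat this as a starting point:

-- HNSW index matching the <#> (inner product) operator used by hybrid_search
CREATE INDEX ON chunks USING hnsw ("embedding" vector_ip_ops);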

Embedding Efficiency

  • Batch processing: Process multiple documents efficiently
  • Caching strategy: Store frequently accessed embeddings
  • Model selection: Choose appropriate embedding models

Troubleshooting

Common Issues

Upload failures:
  • Verify Buildship API URL is correct
  • Check PDF file format and size
  • Monitor Buildship workflow logs for errors
Database connection errors:
  • Confirm Supabase API keys are correct
  • Verify project URL formatting
  • Check database permissions
Search not returning results:
  • Ensure documents were processed successfully
  • Verify embeddings were generated
  • Check database table population
Performance issues:
  • Monitor database query performance
  • Optimize search parameters
  • Consider document chunking strategy

Debugging Steps

Verify Setup

  1. Check Buildship logs for processing errors
  2. Inspect Supabase tables for data integrity
  3. Test API endpoints individually
  4. Validate the search function with simple queries (see the example below)
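
For step 4, isolating the full-text half of the hybrid search is a quick way to confirm that the fts column is populated and matching (the search terms are placeholders):

-- Full-text side only: should return rows if any chunks contain the search terms
SELECT "id", "originalName"
FROM chunks
WHERE "fts" @@ websearch_to_tsquery('english', 'your search terms here')
LIMIT 10;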

Performance Monitoring

  • Query response times
  • Database resource usage
  • Search result relevance
  • User satisfaction metrics

Security Considerations

Data Protection

  • API key security: Store credentials securely
  • Access control: Implement proper permissions (see the RLS sketch below)
  • Data encryption: Ensure sensitive information protection
  • Audit logging: Track database access and changes
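
On the access-control point, one Supabase-specific measure is enabling Row Level Security on the new tables so the public anon key cannot read or modify them directly. This sketch assumes your Buildship workflows authenticate with the service-role key, which bypasses RLS:

-- Block direct access via the anon/authenticated keys;
-- server-side workflows using the service_role key bypass RLS.
ALTER TABLE files ENABLE ROW LEVEL SECURITY;
ALTER TABLE chunks ENABLE ROW LEVEL SECURITY;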

Privacy Compliance

  • Document handling: Ensure compliance with data regulations
  • User consent: Obtain appropriate permissions for data processing
  • Data retention: Implement appropriate retention policies
  • Cross-border considerations: Handle international data transfer requirements

Benefits and Use Cases

Key Benefits

  • Voice-accessible knowledge bases for hands-free information access
  • Enhanced query capabilities with hybrid search
  • Scalable document processing for large knowledge bases
  • Multi-interface support across voice, chat, and web
  • Real-time information retrieval from uploaded documents
  • Semantic understanding for intelligent question answering

Common Use Cases

  • Customer support knowledge bases
  • Technical documentation systems
  • Educational content libraries
  • Company policy and procedure databases
  • Research paper repositories
  • Product information systems

Maintenance and Updates

Regular Maintenance

  • Monitor database performance
  • Update embeddings for modified documents
  • Clean up unused data (see the example below)
  • Backup database regularly
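
For the clean-up item, one possible starting point is removing chunks whose parent row in the files table no longer exists. This sketch assumes such orphaned chunks are safe to delete and casts files.id (a UUID) to text to match chunks."fileId":

-- Delete chunks whose fileId no longer matches a row in the files table
DELETE FROM chunks
WHERE "fileId" IS NOT NULL
  AND "fileId" NOT IN (SELECT "id"::text FROM files);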

System Updates

  • Keep dependencies current
  • Monitor API changes
  • Update search algorithms
  • Optimize based on usage patterns

This RAG Custom Database system provides a powerful foundation for AI-powered knowledge retrieval across multiple interfaces, enabling sophisticated document querying and information access through voice, chat, and web applications.