Goal

Enable document upload and querying through Supabase to support complex dataset parsing and knowledge base access via custom tools in Voice AI and chat interfaces. This system unlocks voice-accessible knowledge bases and enhanced querying for chat-based assistants.

Resources

Prerequisites

OpenAI Account

  • API Key for embedding and processing capabilities

Supabase Account

  • API Key (Project Settings > API)
  • Project URL (Project Settings > API)

Buildship Account

  • Access to the Remix project (linked above)

AI Assistant Account

  • Active assistant and custom tool setup

Implementation Steps

Step 1: Set Up Buildship Project

Open and Duplicate the Remix Project

  1. Open the Remix Link in your Buildship account
  2. Duplicate the project from the link provided
  3. This will create a copy of the pre-configured workflows in your account

Configure API Keys

Add your API credentials to the project. Required keys:
  • OpenAI API Key for embedding generation
  • Supabase API Key for database access
  • Supabase Project URL (found in Supabase under Project Settings > API, near the top of the page)

Update Supabase Nodes

Update all Supabase nodes with your credentials. Node locations:
  • 5 nodes in the “Add Document Chunks” workflow
  • 1 node in the “RAG using Supabase” workflow
Total: 6 nodes must be updated with the correct API key and project URL

Deploy the Project

  1. Confirm that all Supabase nodes (project URL and API key) and OpenAI nodes (API key) have been updated
  2. Click “Ship” in the top right corner to save your changes
  3. Shipping generates the API URLs needed in later steps

Step 2: Configure Supabase Database

Enable Vector Extension

  1. In Supabase, open the Database tab
  2. Under Extensions, enable the “vector” extension
  3. This extension is required for storing and querying document embeddings
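
If you prefer the SQL Editor, enabling the extension there is equivalent to toggling it in the dashboard. A minimal sketch:

-- Equivalent to enabling "vector" under the Database > Extensions tab
CREATE EXTENSION IF NOT EXISTS vector;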

Set Up Database Tables

  1. Go to the “SQL Editor” tab
  2. Run the following SQL commands individually
  3. Each command should finish with a “Success. No rows returned” message

Create Files Table

CREATE TABLE files (
    "id" UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    "createdAt" TIMESTAMP WITH TIME ZONE DEFAULT timezone('utc'::text, now()),
    "size" NUMERIC,
    "mimeType" TEXT,
    "encoding" TEXT,
    "originalName" TEXT,
    "downloadUrl" TEXT
);

Create Chunks Table

CREATE TABLE chunks (
    "id" TEXT PRIMARY KEY,
    "fileId" TEXT,
    "position" INTEGER,
    "originalName" TEXT,
    "extractedText" TEXT,
    "downloadUrl" TEXT,
    "embedding" vector(1536),
    "fts" TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', "extractedText")) STORED
);
CREATE INDEX ON chunks USING GIN ("fts");

Create Hybrid Search Function

CREATE OR REPLACE FUNCTION hybrid_search(
  query_text TEXT,
  query_embedding VECTOR(1536),
  match_count INT,
  full_text_weight FLOAT = 1,
  semantic_weight FLOAT = 1,
  rrf_k INT = 50
)
RETURNS SETOF chunks
LANGUAGE SQL
AS $$
WITH full_text AS (
  SELECT "id", ROW_NUMBER() OVER (ORDER BY ts_rank_cd("fts", websearch_to_tsquery('english', query_text)) DESC) AS rank_ix
  FROM chunks
  WHERE "fts" @@ websearch_to_tsquery('english', query_text)
  ORDER BY rank_ix
  LIMIT LEAST(match_count, 30) * 2
),
semantic AS (
  SELECT "id", ROW_NUMBER() OVER (ORDER BY "embedding" <#> query_embedding) AS rank_ix
  FROM chunks
  ORDER BY rank_ix
  LIMIT LEAST(match_count, 30) * 2
)
SELECT chunks.*
FROM full_text
FULL OUTER JOIN semantic ON full_text."id" = semantic."id"
JOIN chunks ON COALESCE(full_text."id", semantic."id") = chunks."id"
ORDER BY
  COALESCE(1.0 / (rrf_k + full_text.rank_ix), 0.0) * full_text_weight +
  COALESCE(1.0 / (rrf_k + semantic.rank_ix), 0.0) * semantic_weight
DESC
LIMIT LEAST(match_count, 30);
$$;
Expected Output: Each command should return “Success. No rows returned”
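
Once documents have been uploaded (Step 3), the function can be sanity-checked from the SQL Editor. The sketch below reuses an embedding already stored in the chunks table as a stand-in query vector instead of generating a fresh one through OpenAI, and the query text is a placeholder:

-- Run hybrid_search with a stored embedding as a stand-in query vector.
-- full_text_weight, semantic_weight, and rrf_k fall back to their defaults (1, 1, 50).
SELECT c."id", c."originalName", LEFT(c."extractedText", 120) AS preview
FROM (SELECT "embedding" FROM chunks LIMIT 1) AS q,
     LATERAL hybrid_search('your search terms here', q."embedding", 5) AS c;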

Step 3: Configure Upload Portal

Set Up the Upload Interface

  1. Go to: https://createassistants.com/supabase-knowledge
  2. Add your PDF Upload workflow API URL into the “Buildship API PDF Upload URL” field under the “Upload” tab
  3. This URL should be generated from your Buildship project after shipping

Upload Documents

Document Format Requirements:
  • Primary format: PDF
  • Other formats: Convert to PDF first
    • Google Docs: File > Download > PDF
    • MS Word: File > Save As (or Export) and choose PDF
    • Online converters: Use any reliable PDF converter
Upload Process:
  1. Select your PDF files for upload
  2. Submit the upload - this will schedule the processing
  3. Check status in the Buildship workflow logs
  4. Monitor progress through the Buildship dashboard

Step 4: Verify Database Population

Check Uploaded Data

  1. Go to your Supabase database
  2. Open the “Table Editor” tab
  3. Click on the “chunks” table
  4. Refresh the page to see uploaded data
  5. Verify data appearance - you should see processed document chunks with embeddings

Data Validation

What to look for:
  • Document chunks with extracted text
  • Embedding vectors (1536 dimensions)
  • File metadata including original names
  • Proper indexing for search functionality
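
The same checks can be run from the SQL Editor. The query below is a sketch based on the tables created in Step 2; it reports, per uploaded file, how many chunks were stored, how many carry an embedding, and the embedding dimensionality (which should be 1536):

-- Per-file summary of stored chunks and their embeddings
SELECT "originalName",
       COUNT(*)                      AS chunk_count,
       COUNT("embedding")            AS chunks_with_embedding,
       MAX(vector_dims("embedding")) AS embedding_dims
FROM chunks
GROUP BY "originalName";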

Step 5: Integrate with AI Assistant

Create Custom Tool

  1. Add a custom tool to your assistant
  2. Configure the tool to query your RAG database
  3. Set up proper parameters for search queries

Tool Configuration Example

  • Tool Name: query_knowledge_base
  • Description: “Search the custom knowledge base for relevant information using both semantic and full-text search capabilities.”
  • Endpoint: Your Buildship “RAG using Supabase” workflow URL
  • Parameters:
    • query: The search query or question
    • match_count: Number of results to return (default: 5)

Testing the System

Test in multiple interfaces:
  • Voice AI: Ask questions about uploaded documents
  • Chat AI: Query the knowledge base through text
  • Web Orbs: Access knowledge through the web interface

System Architecture

Data Flow

  1. Document Upload → PDF processing in Buildship
  2. Text Extraction → Document chunking and preprocessing
  3. Embedding Generation → OpenAI embeddings for semantic search
  4. Database Storage → Supabase with vector and full-text search
  5. Query Processing → Hybrid search combining semantic and keyword matching
  6. Result Delivery → Formatted responses through AI assistant

Search Capabilities

Hybrid Search Features:
  • Semantic Search: Using vector embeddings for meaning-based matching
  • Full-Text Search: Traditional keyword-based search
  • Weighted Combination: Configurable balance between search types
  • Ranking Algorithm: Reciprocal Rank Fusion (RRF) to merge keyword and semantic rankings

Advanced Configuration

Customization Options

Search Parameters

  • Full-text weight: Adjust importance of keyword matching
  • Semantic weight: Adjust importance of meaning-based search
  • RRF constant: Fine-tune ranking algorithm
  • Match count: Control number of returned results
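
These parameters map one-to-one onto the hybrid_search arguments defined in Step 2, so they can be tuned per call without changing the function. The values below are illustrative rather than recommendations, and the stored embedding again stands in for one generated by OpenAI:

-- Weight semantic matches more heavily than keyword matches for this call
SELECT c."originalName", LEFT(c."extractedText", 120) AS preview
FROM (SELECT "embedding" FROM chunks LIMIT 1) AS q,
     LATERAL hybrid_search(
       query_text       => 'your search terms here',
       query_embedding  => q."embedding",
       match_count      => 8,
       full_text_weight => 0.5,
       semantic_weight  => 1.5,
       rrf_k            => 60
     ) AS c;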

Document Processing

  • Chunk size: Optimize for your document types
  • Overlap settings: Ensure context preservation
  • File type support: Extend beyond PDF if needed

Performance Optimization

Database Performance

  • Indexing strategy: Optimize for your query patterns (see the example below)
  • Connection pooling: Manage database connections efficiently
  • Query optimization: Monitor and improve search performance
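
One concrete indexing option, assuming pgvector 0.5.0 or later, is an approximate-nearest-neighbor index on the embedding column. Because hybrid_search ranks with the inner-product operator (<#>), the index must use the matching operator class; whether the planner uses it inside the function depends on your data and settings, so treat this as a starting point:

-- HNSW index matching the <#> (inner product) operator used by hybrid_search
CREATE INDEX ON chunks USING hnsw ("embedding" vector_ip_ops);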

Embedding Efficiency

  • Batch processing: Process multiple documents efficiently
  • Caching strategy: Store frequently accessed embeddings
  • Model selection: Choose appropriate embedding models

Troubleshooting

Common Issues

Upload failures:
  • Verify Buildship API URL is correct
  • Check PDF file format and size
  • Monitor Buildship workflow logs for errors
Database connection errors:
  • Confirm Supabase API keys are correct
  • Verify project URL formatting
  • Check database permissions
Search not returning results:
  • Ensure documents were processed successfully
  • Verify embeddings were generated
  • Check database table population
Performance issues:
  • Monitor database query performance
  • Optimize search parameters
  • Consider document chunking strategy

Debugging Steps

Verify Setup

  1. Check Buildship logs for processing errors
  2. Inspect Supabase tables for data integrity
  3. Test API endpoints individually
  4. Validate the search function with simple queries (see the example below)
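
For step 4, isolating the full-text half of the hybrid search is a quick way to confirm that the fts column is populated and matching (the search terms are placeholders):

-- Full-text side only: should return rows if any chunks contain the search terms
SELECT "id", "originalName"
FROM chunks
WHERE "fts" @@ websearch_to_tsquery('english', 'your search terms here')
LIMIT 10;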

Performance Monitoring

  • Query response times
  • Database resource usage
  • Search result relevance
  • User satisfaction metrics

Security Considerations

Data Protection

  • API key security: Store credentials securely
  • Access control: Implement proper permissions (see the RLS sketch below)
  • Data encryption: Ensure sensitive information protection
  • Audit logging: Track database access and changes
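
On the access-control point, one Supabase-specific measure is enabling Row Level Security on the new tables so the public anon key cannot read or modify them directly. This sketch assumes your Buildship workflows authenticate with the service-role key, which bypasses RLS:

-- Block direct access via the anon/authenticated keys;
-- server-side workflows using the service_role key bypass RLS.
ALTER TABLE files ENABLE ROW LEVEL SECURITY;
ALTER TABLE chunks ENABLE ROW LEVEL SECURITY;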

Privacy Compliance

  • Document handling: Ensure compliance with data regulations
  • User consent: Obtain appropriate permissions for data processing
  • Data retention: Implement appropriate retention policies
  • Cross-border considerations: Handle international data transfer requirements

Benefits and Use Cases

Key Benefits

  • Voice-accessible knowledge bases for hands-free information access
  • Enhanced query capabilities with hybrid search
  • Scalable document processing for large knowledge bases
  • Multi-interface support across voice, chat, and web
  • Real-time information retrieval from uploaded documents
  • Semantic understanding for intelligent question answering

Common Use Cases

  • Customer support knowledge bases
  • Technical documentation systems
  • Educational content libraries
  • Company policy and procedure databases
  • Research paper repositories
  • Product information systems

Maintenance and Updates

Regular Maintenance

  • Monitor database performance
  • Update embeddings for modified documents
  • Clean up unused data (see the example below)
  • Backup database regularly
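
For the clean-up item, one possible starting point is removing chunks whose parent row in the files table no longer exists. This sketch assumes such orphaned chunks are safe to delete and casts files.id (a UUID) to text to match chunks."fileId":

-- Delete chunks whose fileId no longer matches a row in the files table
DELETE FROM chunks
WHERE "fileId" IS NOT NULL
  AND "fileId" NOT IN (SELECT "id"::text FROM files);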

System Updates

  • Keep dependencies current
  • Monitor API changes
  • Update search algorithms
  • Optimize based on usage patterns

This RAG Custom Database system provides a powerful foundation for AI-powered knowledge retrieval across multiple interfaces, enabling sophisticated document querying and information access through voice, chat, and web applications.