# How To Create A RAG Custom Database

## Goal

Upload documents to Supabase and query them through a custom tool, so that Voice AI and chat assistants can answer questions from your own knowledge base, including large or complex datasets.

## Resources

* **Buildship Remix Link:** [https://app.buildship.com/remix/f1643e2b-bd80-48b0-b556-49dea270b2f9](https://app.buildship.com/remix/f1643e2b-bd80-48b0-b556-49dea270b2f9)
* **AI Assistant Upload Portal:** [https://createassistants.com/supabase-knowledge](https://createassistants.com/supabase-knowledge)

## Prerequisites

### OpenAI Account

* **API Key** for embedding and processing capabilities

### Supabase Account

* **API Key** (Project Settings > API)
* **Project URL** (Project Settings > API)

### Buildship Account

* **Access to the Remix project** (linked above)

### AI Assistant Account

* **Active assistant** and custom tool setup

## Implementation Steps

### Step 1: Set Up Buildship Project

#### Open and Duplicate the Remix Project

1. **Open the Remix Link** in your Buildship account
2. **Duplicate the project** from the link provided
3. This will create a copy of the pre-configured workflows in your account

#### Configure API Keys

Add your API credentials to the project:

**Required Keys:**

* **OpenAI API Key** for embedding generation
* **Supabase API Key** for database access
* **Supabase Project URL** (Found in Supabase → Project Settings > API, right at the top)

#### Update Supabase Nodes

**Update all Supabase nodes** with your credentials:

**Node Locations:**

* **5 nodes** in the "Add Document Chunks" workflow
* **1 node** in the "RAG using Supabase" workflow

**Total: 6 nodes** that need to be updated with correct keys and URL

#### Deploy the Project

1. **Confirm that every node is updated:** Supabase nodes need the project URL and API key, and OpenAI nodes need the API key
2. **Click "Ship"** in the top right corner to save and deploy your changes
3. Shipping generates the API URLs needed for later steps

### Step 2: Configure Supabase Database

#### Enable Vector Extension

1. **In Supabase, open the Database tab** and go to **Extensions**
2. **Enable the "vector" extension** (pgvector), or run `CREATE EXTENSION IF NOT EXISTS vector;` in the SQL Editor
3. This extension is required for storing and querying document embeddings

#### Set Up Database Tables

1. **Go to the "SQL Editor" tab**
2. **Run the following SQL commands** one at a time, in the order shown
3. Each command should result in the output "Success. No rows returned"

#### Create Files Table

```sql theme={null}
CREATE TABLE files (
    "id" UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    "createdAt" TIMESTAMP WITH TIME ZONE DEFAULT timezone('utc'::text, now()),
    "size" NUMERIC,
    "mimeType" TEXT,
    "encoding" TEXT,
    "originalName" TEXT,
    "downloadUrl" TEXT
);
```

#### Create Chunks Table

```sql theme={null}
CREATE TABLE chunks (
    "id" TEXT PRIMARY KEY,
    "fileId" TEXT,
    "position" INTEGER,
    "originalName" TEXT,
    "extractedText" TEXT,
    "downloadUrl" TEXT,
    "embedding" vector(1536),
    "fts" TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', "extractedText")) STORED
);
```

#### Create Index for Full-Text Search

```sql theme={null}
CREATE INDEX ON chunks USING GIN ("fts");
```

#### Create Hybrid Search Function

```sql theme={null}
CREATE OR REPLACE FUNCTION hybrid_search(
  query_text TEXT,
  query_embedding VECTOR(1536),
  match_count INT,
  full_text_weight FLOAT = 1,
  semantic_weight FLOAT = 1,
  rrf_k INT = 50
)
RETURNS SETOF chunks
LANGUAGE SQL
AS $$
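WITH full_text AS (
  -- Keyword branch: rank the chunks whose text matches the query
  SELECT "id", ROW_NUMBER() OVER (ORDER BY ts_rank_cd("fts", websearch_to_tsquery('english', query_text)) DESC) AS rank_ix
  FROM chunks
  WHERE "fts" @@ websearch_to_tsquery('english', query_text)
  ORDER BY rank_ix  -- without this, LIMIT may keep arbitrary rows instead of the top-ranked ones
  LIMIT LEAST(match_count, 30) * 2
),
semantic AS (
  -- Vector branch: <#> is pgvector's negative inner product, so smaller means more similar
  SELECT "id", ROW_NUMBER() OVER (ORDER BY "embedding" <#> query_embedding) AS rank_ix
  FROM chunks
  ORDER BY rank_ix
  LIMIT LEAST(match_count, 30) * 2
)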
SELECT chunks.*
FROM full_text
FULL OUTER JOIN semantic ON full_text."id" = semantic."id"
JOIN chunks ON COALESCE(full_text."id", semantic."id") = chunks."id"
ORDER BY
  COALESCE(1.0 / (rrf_k + full_text.rank_ix), 0.0) * full_text_weight +
  COALESCE(1.0 / (rrf_k + semantic.rank_ix), 0.0) * semantic_weight
DESC
LIMIT LEAST(match_count, 30);
$$;
```

> **Expected Output:** Each command should return "Success. No rows returned"
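
To make the ordering step of `hybrid_search` easier to reason about, here is a minimal Python sketch of the same Reciprocal Rank Fusion arithmetic. It is illustrative only: the function name `rrf_fuse` and the sample chunk ids are assumptions, the input lists are assumed to be already ranked best-first, and the real ranking still happens inside Postgres.

```python
from typing import Dict, List


def rrf_fuse(
    full_text_ids: List[str],
    semantic_ids: List[str],
    match_count: int,
    full_text_weight: float = 1.0,
    semantic_weight: float = 1.0,
    rrf_k: int = 50,
) -> List[str]:
    """Fuse two ranked id lists the way hybrid_search orders its rows.

    Each list is ordered best-first, so position 0 corresponds to rank_ix = 1.
    """
    scores: Dict[str, float] = {}
    branches = ((full_text_weight, full_text_ids), (semantic_weight, semantic_ids))
    for weight, ranked in branches:
        for rank_ix, chunk_id in enumerate(ranked, start=1):
            # A chunk missing from one list gets no contribution from it,
            # which mirrors COALESCE(..., 0.0) in the SQL.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + weight / (rrf_k + rank_ix)
    ordered = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    # The SQL function caps results at 30 regardless of match_count.
    return ordered[: min(match_count, 30)]


# Hypothetical chunk ids, best match first in each list.
keyword_hits = ["c2", "c7"]
vector_hits = ["c5", "c2", "c9"]
fused = rrf_fuse(keyword_hits, vector_hits, match_count=3)
print(fused)
```

In this sample, `c2` comes first because it appears in both lists, even though it is not the top semantic match. That is the main effect of rank fusion: agreement between the two searches outweighs a single high rank.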

### Step 3: Configure Upload Portal

#### Set Up the Upload Interface

1. **Go to:** [https://createassistants.com/supabase-knowledge](https://createassistants.com/supabase-knowledge)
2. **Add your PDF Upload workflow API URL** into the "Buildship API PDF Upload URL" field under the "Upload" tab
3. This URL should be generated from your Buildship project after shipping

#### Upload Documents

**Document Format Requirements:**

* **Primary format:** PDF
* **Other formats:** Convert to PDF first
  * **Google Docs:** File > Download > PDF Document (.pdf)
  * **MS Word:** File > Save As (or Export) > PDF
  * **Online converters:** Use any reliable PDF converter

**Upload Process:**

1. **Select your PDF files** for upload
2. **Submit the upload.** This schedules the files for processing
3. **Monitor progress** in the Buildship dashboard
4. **Check the workflow logs** there to confirm each file finished processing without errors

### Step 4: Verify Database Population

#### Check Uploaded Data

1. **Go to your Supabase project**
2. **Open the "Table Editor" tab**
3. **Click on the "chunks" table**
4. **Refresh the page** to see uploaded data
5. **Verify the data:** you should see processed document chunks with embeddings

#### Data Validation

**What to look for:**

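* **Document chunks** with text in the `extractedText` column
* **Embedding vectors** in the `embedding` column (1536 dimensions)
* **File metadata** such as `originalName` and `downloadUrl`
* **A populated `fts` column**, which the full-text index relies on for keyword search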
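
If you would rather check rows in a script than by eye, the sketch below shows one way to validate a single row from the `chunks` table. It assumes you have already fetched the row as a dictionary keyed by column name, and that the `embedding` value arrives either as a list of numbers or as pgvector's text form (for example `"[0.1,0.2]"`). The helper names are hypothetical.

```python
import json
from typing import Any, Dict, List

EXPECTED_DIMENSIONS = 1536  # matches vector(1536) in the chunks table


def parse_embedding(value: Any) -> List[float]:
    """Accept either a list of numbers or pgvector's text form, e.g. "[0.1,0.2]"."""
    if isinstance(value, str):
        value = json.loads(value)
    return [float(x) for x in value]


def validate_chunk(row: Dict[str, Any]) -> List[str]:
    """Return the problems found in one row of the chunks table."""
    problems = []
    if not (row.get("extractedText") or "").strip():
        problems.append("extractedText is empty")
    if not row.get("originalName"):
        problems.append("originalName is missing")
    embedding = row.get("embedding")
    if embedding is None:
        problems.append("embedding is missing")
    elif len(parse_embedding(embedding)) != EXPECTED_DIMENSIONS:
        problems.append("embedding does not have 1536 dimensions")
    return problems


# Hypothetical row with a truncated embedding, to show what a failure looks like.
bad_row = {"extractedText": "Refund policy ...", "originalName": "policy.pdf", "embedding": [0.1, 0.2]}
print(validate_chunk(bad_row))
```

An empty list means the row passed every check.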

### Step 5: Integrate with AI Assistant

#### Create Custom Tool

1. **Add a custom tool** to your assistant
2. **Point the tool** at your Buildship "RAG using Supabase" workflow URL so it can query the RAG database
3. **Define the search parameters** the assistant will send with each query

#### Tool Configuration Example

* **Tool Name:** `query_knowledge_base`
* **Description:** "Search the custom knowledge base for relevant information using both semantic and full-text search capabilities."
* **Endpoint:** Your Buildship "RAG using Supabase" workflow URL
* **Parameters:**
  * `query`: The search query or question
  * `match_count`: Number of results to return (default: 5)
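
Before wiring the tool into an assistant, it can help to call the workflow directly. The sketch below builds such a request with Python's standard library. It assumes the workflow accepts a POST with a JSON body containing `query` and `match_count`; check the trigger node in your Buildship workflow to confirm the method and field names. The URL is a placeholder, and the function name is hypothetical.

```python
import json
import urllib.request


def build_query_request(endpoint_url: str, query: str, match_count: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST request for the RAG workflow."""
    payload = {"query": query, "match_count": match_count}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: replace it with the URL Buildship shows after you click "Ship".
request = build_query_request(
    "https://example.invalid/rag-using-supabase",
    "What is the refund policy?",
)

# To actually call the workflow, uncomment the lines below.
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.loads(response.read()))
```

Building the request separately from sending it makes it easy to inspect the exact payload the assistant's tool should reproduce.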

#### Testing the System

**Test in multiple interfaces:**

* **Voice AI:** Ask questions about uploaded documents
* **Chat AI:** Query the knowledge base through text
* **Web Orbs:** Access knowledge through web interface

## System Architecture

### Data Flow

1. **Document Upload** → PDF processing in Buildship
2. **Text Extraction** → Document chunking and preprocessing
3. **Embedding Generation** → OpenAI embeddings for semantic search
4. **Database Storage** → Supabase with vector and full-text search
5. **Query Processing** → Hybrid search combining semantic and keyword matching
6. **Result Delivery** → Formatted responses through AI assistant

### Search Capabilities

**Hybrid Search Features:**

* **Semantic Search:** Using vector embeddings for meaning-based matching
* **Full-Text Search:** Traditional keyword-based search
* **Weighted Combination:** Configurable balance between search types
* **Ranking Algorithm:** RRF (Reciprocal Rank Fusion), which merges the semantic and full-text rankings into a single ordered result list

## Advanced Configuration

### Customization Options

#### Search Parameters

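* **Full-text weight** (`full_text_weight`, default 1): Raise it to favor keyword matches
* **Semantic weight** (`semantic_weight`, default 1): Raise it to favor meaning-based matches
* **RRF constant** (`rrf_k`, default 50): Larger values flatten the score difference between top-ranked and lower-ranked results
* **Match count** (`match_count`): Number of results to return; `hybrid_search` caps it at 30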

#### Document Processing

* **Chunk size:** Optimize for your document types
* **Overlap settings:** Ensure context preservation
* **File type support:** Extend beyond PDF if needed
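
To show how chunk size and overlap interact, here is a minimal character-based chunker. It is a sketch, not the logic used inside the Remix workflow (which is not shown on this page); the function name and the default values are assumptions chosen for illustration.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghij"
pieces = chunk_text(sample, chunk_size=4, overlap=1)
print(pieces)
```

Each chunk repeats the last `overlap` characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk when the overlap is large enough. Larger overlaps preserve more context but produce more chunks to embed and store.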

### Performance Optimization

#### Database Performance

* **Indexing strategy:** Optimize for your query patterns
* **Connection pooling:** Manage database connections efficiently
* **Query optimization:** Monitor and improve search performance

#### Embedding Efficiency

* **Batch processing:** Process multiple documents efficiently
* **Caching strategy:** Store frequently accessed embeddings
* **Model selection:** Choose appropriate embedding models
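
Batching and caching can be combined in a small wrapper around whatever function produces your embeddings. The sketch below is one possible shape: `EmbeddingCache` and `fake_embed` are hypothetical names, the cache lives in memory only, and the stand-in embedder returns tiny fake vectors so the example runs without an API key.

```python
import hashlib
from typing import Callable, Dict, List

Embedder = Callable[[List[str]], List[List[float]]]


class EmbeddingCache:
    """Cache embeddings by a SHA-256 hash of the chunk text."""

    def __init__(self, embed_batch: Embedder) -> None:
        self._embed_batch = embed_batch
        self._store: Dict[str, List[float]] = {}

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_many(self, texts: List[str]) -> List[List[float]]:
        # Collect each uncached text once, preserving first-seen order.
        missing: Dict[str, str] = {}
        for text in texts:
            key = self._key(text)
            if key not in self._store and key not in missing:
                missing[key] = text
        if missing:
            # One batched call for everything that is not cached yet.
            vectors = self._embed_batch(list(missing.values()))
            for key, vector in zip(missing, vectors):
                self._store[key] = vector
        return [self._store[self._key(text)] for text in texts]


calls: List[List[str]] = []


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in embedder: records each batch and returns tiny fake vectors."""
    calls.append(batch)
    return [[float(len(text))] for text in batch]


cache = EmbeddingCache(fake_embed)
first = cache.get_many(["alpha", "beta", "alpha"])
second = cache.get_many(["beta", "gamma"])
print(calls)
```

Only texts that have not been seen before reach the embedder, and they are sent together in a single batch. Swapping `fake_embed` for a real embedding call keeps the same interface.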

## Troubleshooting

### Common Issues

**Upload failures:**

* Verify Buildship API URL is correct
* Check PDF file format and size
* Monitor Buildship workflow logs for errors

**Database connection errors:**

* Confirm Supabase API keys are correct
* Verify project URL formatting
* Check database permissions
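
A quick format check can rule out the most common URL mistakes before you dig into permissions. The sketch below assumes a default hosted project URL of the form `https://<project-ref>.supabase.co` with no path; custom domains and self-hosted instances will not match, so treat a reported problem as a hint, not a verdict. The function name is hypothetical.

```python
import re
from typing import List
from urllib.parse import urlparse

# Assumption: a default hosted project URL, not a custom domain or self-hosted instance.
_PROJECT_HOST = re.compile(r"^[a-z0-9]+\.supabase\.co$")


def check_project_url(url: str) -> List[str]:
    """Return human-readable problems; an empty list means the URL looks fine."""
    problems = []
    cleaned = url.strip()
    if cleaned != url:
        problems.append("URL has leading or trailing whitespace")
    parsed = urlparse(cleaned)
    if parsed.scheme != "https":
        problems.append("URL should start with https://")
    if not _PROJECT_HOST.match(parsed.netloc):
        problems.append("host should look like <project-ref>.supabase.co")
    if parsed.path not in ("", "/"):
        problems.append("URL should not include a path such as /rest/v1")
    return problems


# Hypothetical value pasted with an extra path segment.
print(check_project_url("https://abcdefghijklmnopqrst.supabase.co/rest/v1"))
```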

**Search not returning results:**

* Ensure documents were processed successfully
* Verify embeddings were generated
* Check database table population

**Performance issues:**

* Monitor database query performance
* Optimize search parameters
* Consider document chunking strategy

### Debugging Steps

#### Verify Setup

1. **Check Buildship logs** for processing errors
2. **Inspect Supabase tables** for data integrity
3. **Test API endpoints** individually
4. **Validate search function** with simple queries

#### Performance Monitoring

* **Query response times**
* **Database resource usage**
* **Search result relevance**
* **User satisfaction metrics**

## Security Considerations

### Data Protection

* **API key security:** Store credentials securely
* **Access control:** Implement proper permissions
* **Data encryption:** Ensure sensitive information protection
* **Audit logging:** Track database access and changes
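
For scripts that call your workflows or Supabase from outside Buildship, one common way to keep credentials out of source code is to read them from environment variables. The sketch below assumes the variable names shown, which are hypothetical; use whatever naming your deployment already follows.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names, chosen to match the three credentials this guide uses.
REQUIRED_VARS = ("OPENAI_API_KEY", "SUPABASE_API_KEY", "SUPABASE_PROJECT_URL")


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read required credentials from the environment and fail fast if any are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def mask(secret: str, visible: int = 4) -> str:
    """Show only the last few characters so keys never appear whole in logs."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Example with an in-memory mapping instead of the real environment.
sample_env = {
    "OPENAI_API_KEY": "sk-example-1234",
    "SUPABASE_API_KEY": "example-key-5678",
    "SUPABASE_PROJECT_URL": "https://example.supabase.co",
}
credentials = load_credentials(sample_env)
print(mask(credentials["OPENAI_API_KEY"]))
```

Failing at startup when a variable is missing is usually easier to debug than a rejected request later in the pipeline.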

### Privacy Compliance

* **Document handling:** Ensure compliance with data regulations
* **User consent:** Obtain appropriate permissions for data processing
* **Data retention:** Implement appropriate retention policies
* **Cross-border considerations:** Handle international data transfer requirements

## Benefits and Use Cases

### Key Benefits

* ✅ **Voice-accessible knowledge bases** for hands-free information access
* ✅ **Enhanced query capabilities** with hybrid search
* ✅ **Scalable document processing** for large knowledge bases
* ✅ **Multi-interface support** across voice, chat, and web
* ✅ **Real-time information retrieval** from uploaded documents
* ✅ **Semantic understanding** for intelligent question answering

### Common Use Cases

* **Customer support knowledge bases**
* **Technical documentation systems**
* **Educational content libraries**
* **Company policy and procedure databases**
* **Research paper repositories**
* **Product information systems**

## Maintenance and Updates

### Regular Maintenance

* **Monitor database performance**
* **Update embeddings** for modified documents
* **Clean up unused data**
* **Backup database regularly**

### System Updates

* **Keep dependencies current**
* **Monitor API changes**
* **Update search algorithms**
* **Optimize based on usage patterns**

With these pieces in place, the RAG custom database gives your assistants a shared knowledge retrieval layer, so uploaded documents can be queried through voice, chat, and web interfaces.
