In previous posts, I’ve demonstrated multimodal search against the Unsplash Lite dataset using both Chroma and pgvector with the CLIP model. While the local search performs great with minimal latency (~1.6 ms average), the embedding phase is very slow (~130 ms for a simple text string).
In this post, I will look at reducing this latency, surveying different ways to speed up the query embedding phase. The method I will explore in depth is moving to Oracle Autonomous Database 26ai, which offers a unique performance advantage: it runs the embedding model directly inside the database kernel.
Why optimize?
For a dataset of 25,000 images (with 512-dimensional embeddings for ViT-B/32), our primary bottleneck is the query embedding phase, as even brute-force similarity search takes less than 5 ms for that volume. Optimizing CLIP inference for query embedding involves shifting from general-purpose deep learning libraries like PyTorch/SentenceTransformers to specialized inference runtimes and managed services. The following are standard optimization practices:
- Switch to ONNX Runtime (CPU/GPU): SentenceTransformer uses PyTorch by default, which carries significant overhead. Converting the model to ONNX and using ONNX Runtime (with OpenVINO for Intel CPUs or TensorRT for NVIDIA GPUs) can reduce latency by 3x to 5x. This is the approach I will take in this post.
- Quantization (FP16 or INT8): Converting the weights from FP32 to FP16 (minimal accuracy loss) or INT8 (requires calibration) significantly reduces the computational load and memory bandwidth requirements.
- Model Pruning & Distillation: Use smaller “distilled” versions like MobileCLIP or FastCLIP if the slightly lower accuracy is acceptable for your use case.
- Preprocessing Optimization: For image queries, ensure image resizing and normalization (typically done with Pillow or OpenCV) are vectorized and pre-allocated to avoid Python’s Global Interpreter Lock (GIL) bottlenecks.
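To ground the claim that search itself is not the bottleneck at this scale, here is a quick sketch that times an exact, brute-force top-5 scan over 25k × 512-dim normalized vectors with NumPy. The vectors are random stand-ins for the real Unsplash embeddings:

```python
import time
import numpy as np

# Random unit vectors standing in for the 25k Unsplash CLIP embeddings
rng = np.random.default_rng(0)
db = rng.standard_normal((25_000, 512)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

query = rng.standard_normal(512).astype(np.float32)
query /= np.linalg.norm(query)

start = time.perf_counter()
scores = db @ query             # cosine similarity (all vectors are unit-norm)
top5 = np.argsort(-scores)[:5]  # exact top-5, no index needed
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Brute-force top-5 over 25k x 512 vectors: {elapsed_ms:.2f} ms")
```

On a typical laptop CPU this lands in the low single-digit milliseconds, which is why the ~130 ms embedding step dominates end-to-end latency.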
Why Oracle Autonomous AI Database (OADB)?
I’ve been a big fan of Oracle Cloud’s free-tier offerings as I found them to be generous and robust enough for independent developers. In the database realm, each Always-Free tier account can provision two Autonomous Database instances with 1 OCPU and 20 GB of storage each.
In my user scenario, I need a database similar to pgvector: one that provides a first-class VECTOR type, distance functions, and approximate/exact similarity search. Autonomous Database nails this. What caught my eye, in addition to the built-in vector store, is its In-Database ONNX Runtime, which eliminates the need to provision, configure, and maintain a separate environment to generate embeddings. It also removes the need to move data out of the database to a separate service, improving both latency and security. By moving my CLIP model inside the database, I should be able to eliminate the overhead of Python-to-DB data transfers and leverage Oracle’s native C-based ONNX runtime for on-the-fly embedding of queries directly in SQL via the VECTOR_EMBEDDING function.
We are limited to models that can be converted to ONNX. While CLIP is supported, extremely large or custom architectures may require more effort to import than simply calling an API.
Here are the advantages of OADB:
- Latency: Because the CLIP model runs inside the DB kernel via an integrated ONNX runtime, there is no network round-trip. End-to-end SQL execution for an embedding typically takes on the order of 10 ms.
- Elimination of Inter-Process Communication (IPC): In our current setup, the query embedding (created in Python, whether via ONNX or SentenceTransformer) must be generated in our app’s memory space, serialized and sent over a network/socket to Chroma or pgvector, and then deserialized. In OADB, the embedding is generated and passed to the search engine in-memory.
- Hardware-Level Optimizations: Oracle’s in-database ONNX runtime is highly optimized for SIMD (AVX-512) instructions on Intel/AMD CPUs. It bypasses the Python interpreter’s overhead entirely.
- Unified Execution Plan: We can run a single SQL query like:
`SELECT * FROM images ORDER BY VECTOR_DISTANCE(image_vector, VECTOR_EMBEDDING(clip_model USING :query_text)) FETCH FIRST 5 ROWS ONLY;` The database optimizer treats the embedding and the search as a single operation, maximizing CPU cache locality.
- Throughput: Throughput is bound by the OCPUs or GPUs allocated to an Autonomous instance. It is highly efficient for bulk vectorization because it avoids the overhead of thousands of HTTP/JSON calls.
- Converged Database: The Transaction Processing workload has multi-model support: it handles JSON, graph, text, spatial, vector, and relational data within a single platform. It includes a native JSON data type (OSON) that is faster than standard text-based JSON, allowing us to index and search documents with SQL or simple APIs. In our case, we can store image embeddings (vectors), documents (JSON/text), and relational data (photo metadata) in one place, serving as the low-latency backend for a search application with a user-facing frontend.
For example, we can create a single table that looks like this to handle our entire requirement:
CREATE TABLE image_library (
  id NUMBER PRIMARY KEY,
  image_name VARCHAR2(255),
  -- The Vector:
  embedding VECTOR(512, FLOAT32),
  -- The Document:
  metadata JSON,
  -- Other Data:
  upload_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  tags VARCHAR2(4000)
);

This enables us to run a hybrid search in one query: “Find images similar to this text AND where the document metadata says the category is ‘Nature’.” In this way, a single query can retrieve documents, images, audio, video, and table rows together without ETL hops.
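As an illustration of that hybrid pattern, the sketch below only composes the SQL text; the `JSON_VALUE` path and the `clip_model` name are assumptions matching the example schema above, and executing the statement would require the table and model set up in the later phases:

```python
def build_hybrid_search_sql(k: int = 5) -> str:
    """Compose a hybrid (JSON filter + vector similarity) query string.

    The table/column names follow the image_library example; :category and
    :query_text are bind variables to be supplied at execution time.
    """
    return f"""
SELECT image_name
FROM image_library
WHERE JSON_VALUE(metadata, '$.category') = :category
ORDER BY VECTOR_DISTANCE(
    embedding,
    VECTOR_EMBEDDING(clip_model USING :query_text AS DATA),
    COSINE)
FETCH FIRST {k} ROWS ONLY"""

sql = build_hybrid_search_sql(5)
print(sql)
```

The metadata filter and the vector ordering live in the same statement, so the optimizer can plan them together instead of round-tripping between a document store and a vector store.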
Choosing from different OADB workloads
For a vector search use case involving 25,000 images and real-time queries, here is the breakdown:
| Category | Best For | Technical Characteristics | Why Not Ideal for our Vector Search App? | Recommendation |
|---|---|---|---|---|
| Autonomous Transaction Processing (ATP) | Real‑time applications, high concurrency, frequent small updates, OLTP workloads. | Row‑oriented storage, low‑latency index lookups, optimized for single-query response such as point queries, low‑latency vector search and transactional workloads. | None — this is the correct engine for a “search bar” experience where each query must return in tens of milliseconds. | Use ATP. Best match for low‑latency vector search + metadata queries. ATP is faster for single‑query, low‑latency workloads because row‑store avoids columnar decompression and is optimized for OLTP access paths. |
| Autonomous Data Warehouse (ADW) | Analytics, BI dashboards, massive scans, aggregations, complex reports. | Columnar storage, vectorized execution, optimized for high-throughput batch and analytic scans, not for interactive search. | Columnar layout adds overhead for single‑row access patterns; latency is higher for single‑query vector search. | Use only if the workload is analytics‑heavy (millions of rows scanned per query). |
| Lakehouse | Big data analytics, ETL, large‑scale warehousing. | Designed for high‑throughput ingestion and large analytical scans; not optimized for millisecond‑level point queries. | Latency too high for interactive search; optimized for batch analytics, not real‑time vector retrieval. | Not recommended for my use case. |
| JSON Database | Pure document‑store apps with JSON‑only data models. | Essentially ATP with a restricted feature set; optimized for JSON operations. | I have relational + vector + metadata needs. JSON DB removes features we need (indexes, SQL flexibility). | Stick with full ATP for maximum flexibility. |
| APEX Service | Low‑code developers building UI‑driven apps quickly. | ATP instance with a UI layer; same underlying engine. | Not useful if we’re building our own frontend. | Use standard ATP instead of APEX. |
Perform embedding search in Oracle Database 26ai
Let’s set up our environment! We will first provision a free-tier Autonomous Database instance, create a working connection to it, then upload our existing Unsplash image embeddings to a table with native vector support. We will then convert the CLIP model to ONNX format and upload it to the database. Finally, we will build a search index for better performance, and query the images with natural language.
Phase 1: Setup Oracle 26ai Autonomous Database (Free Tier)
Sign Up/Login: Go to the Oracle Cloud Console.
Create Database: Navigate to Oracle Database > Autonomous Database.
Configuration:
- Display Name: VectorDB
- Database Version: Ensure you select 26ai.
- Workload Type: Transaction Processing
- Deployment Type: Shared Infrastructure.
- Always Free: Toggle the Always Free switch to ON.
- Credentials: Set a strong ADMIN password.
Phase 2: Setup database connection
Oracle Autonomous Database supports both mTLS and TLS connections. Here’s a breakdown of the two authentication modes:
| Feature | mTLS (Mutual TLS) | TLS (One-way TLS) |
|---|---|---|
| Authentication | User/Pass + Client Certificate (Wallet) | Username / Password only |
| Setup | Download & manage .zip wallet files | Copy & paste a connection string |
| Port | 1522 | 1521 |
| Best For | High-security corporate apps | Quick prototyping |
By default, it is set to “mTLS only” for security. TLS authentication is easier to use, provides better connection latency, and does not require you to download client credentials (wallet) when using many modern drivers (like the python-oracledb driver in “Thin” mode). If your database is currently set to “Secure access from everywhere” without any rules, Oracle prevents you from disabling mTLS to protect you from brute-force attacks. To switch to TLS, the database must first be secured by either an Access Control List (ACL) or a Private Endpoint.
Since I will be building a frontend for multimodal search in a serverless environment, I will enable TLS as it works much better with a simple connection string than trying to manage an encrypted zip file in a serverless environment.
Enable TLS
Enable the Access Control List (ACL):
- On the Autonomous Database Details page, go to Network section.
- Click Edit next to Access Control List.
- In the slide-out, click + Add My IP Address. This ensures only the current computer can talk to the database.
- Click Update. The database will enter an “Updating” state for a moment.
Update mTLS Setting:
- Wait for the DB to show the green Available status.
- Click the Edit button next to Mutual TLS (mTLS) Authentication
- Uncheck the box “Require mutual TLS (mTLS) authentication.”
- Click Update.
Obtain Connection string
- Click “Database Connection”: Look for this button near the top of the page, right next to “Stop” and “Restart”
- Look for a dropdown or toggle labeled Mutual TLS (mTLS) and change it to TLS.
- In the table of connection strings below (usually named high, medium, and low), find the high service (this gives the most resources to vector search).
- Click Copy on that connection string. It will look something like this:

(description= (address=(protocol=tcps)(port=1522)(host=adb.us-ashburn-1.oraclecloud.com))(connect_data=(service_name=db_name_high.adb.oraclecloud.com))(security=(ssl_server_dn_match=yes)))
Phase 3: Upload 25k Embeddings to Database
Since we already have the vectors in pickle format, we will use the python-oracledb driver to push them from our local environment to the DB.
Create a table
The SQL Worksheet is our primary interface for running SQL and PL/SQL commands against our database instance.
Navigate to Oracle Database > Autonomous Database.
Click on the database name (VectorDB).
On the Database Details page, look for the Database Actions button and select SQL.
This opens a web-based IDE (SQL Worksheet) where you can paste and run all scripts.
Create the Table: Run this in the SQL Worksheet
CREATE TABLE image_search (
  id NUMBER PRIMARY KEY,
  image_name VARCHAR2(255),
  embedding VECTOR(512, FLOAT32) -- CLIP-ViT-B-32 produces 512-dim vectors
);
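Since the column is declared VECTOR(512, FLOAT32) and we will search it with cosine distance, a quick pre-flight check of dimensionality and normalization can save a failed bulk insert. This sketch uses random stand-in data in place of the real pickle contents:

```python
import numpy as np

# Random data standing in for the embeddings loaded from the pickle file
rng = np.random.default_rng(1)
embeddings = rng.standard_normal((100, 512)).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# The table expects exactly 512 dimensions per vector
assert embeddings.shape[1] == 512, "VECTOR(512, FLOAT32) expects 512 dims"

# Cosine distance behaves best on (near) unit-norm vectors, as CLIP produces
norms = np.linalg.norm(embeddings, axis=1)
assert np.allclose(norms, 1.0, atol=1e-3), "cosine distance assumes unit norm"

print("All vectors are 512-dim and L2-normalized")
```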
Upload embeddings to new table
- Install the driver with pip install oracledb
- Run this script:
import oracledb
import pickle
# Database instance credentials
user = "ADMIN"
password = "Admin_Password"
# The Connection string from Phase 2
dsn_str = "(description= (address=(protocol=tcps)(port=1522)(host=...)))"
# Connect in THIN mode (no wallet needed)
connection = oracledb.connect(
user=user,
password=password,
dsn=dsn_str
)
print("Successfully connected using TLS!")
with open('embeddings.pkl', 'rb') as f:
    raw_data = pickle.load(f)  # A tuple of (filenames, embeddings)

# 1. Extract the two parts from the raw_data tuple
filenames = raw_data[0]
embeddings = raw_data[1]

# 2. Reformat into a list of tuples for the database
# We'll use the loop index 'i' as a unique ID
formatted_data = []
for i in range(len(filenames)):
    img_name = filenames[i]
    # Ensure the vector is a standard Python list of floats
    vector = embeddings[i].tolist()
    formatted_data.append((i, img_name, vector))

# 3. Batch insert into Autonomous Database
with connection.cursor() as cursor:
    # Define the SQL
    sql = "INSERT INTO image_search (id, image_name, embedding) VALUES (:1, :2, :3)"
    # Specify the column types:
    #   oracledb.DB_TYPE_NUMBER  for ID
    #   oracledb.DB_TYPE_VARCHAR for Name
    #   oracledb.DB_TYPE_VECTOR  for the Embedding
    cursor.setinputsizes(oracledb.DB_TYPE_NUMBER, oracledb.DB_TYPE_VARCHAR, oracledb.DB_TYPE_VECTOR)
    # Execute the insert in chunks (executemany has no batch_size argument,
    # so we slice the data ourselves)
    batch_size = 1000
    for start in range(0, len(formatted_data), batch_size):
        cursor.executemany(sql, formatted_data[start:start + batch_size])
    connection.commit()

print(f"Success! Uploaded {len(formatted_data)} records.")

When this is done, we will see Success! Uploaded 24996 records. in the console. We can verify the data in SQL Worksheet with:
SELECT count(*) FROM image_search;
-- And check the vector format:
SELECT id, image_name, embedding FROM image_search FETCH FIRST 1 ROWS ONLY;

Phase 4: Convert CLIP to ONNX and Load into DB
To use embedding models in the database, we need to convert the model into ONNX format. Oracle provides preconfigured ONNX pipeline models. These are augmented pipelines that include tokenization and post-processing steps for generating vector embeddings in the Oracle Database.
If your desired model is not prebuilt by Oracle, you can use the Oracle Machine Learning for Python (OML4Py) utility to convert your model.
Export to ONNX (Local Python)
Oracle’s oml library can wrap the CLIP model with the necessary pre-processing (like image decoding, resizing, normalization) so it works natively in SQL. Run this in an environment with OML4Py installed:
from oml.utils import ONNXPipeline, ONNXPipelineConfig
# Use the built-in multimodal CLIP template
config = ONNXPipelineConfig.from_template("multimodal_clip")
# Specify the exact CLIP-ViT-B-32 model from Hugging Face
pipeline = ONNXPipeline("openai/clip-vit-base-patch32", config=config)
# Export to files (generates two ONNX files with preprocessing + normalization)
pipeline.export2file("clip_vit_base_patch32")

Notice that this creates two files:
- clip_vit_base_patch32_txt.onnx (text encoder)
- clip_vit_base_patch32_img.onnx (image encoder)

Unlike the original multimodal model, which is a single file, each modality file contains its own pre-processing and post-processing pipeline.
Upload Files to OCI Object Storage
Next we are going to upload both .onnx files to a free-tier OCI Object Storage bucket accessible from our Autonomous Database. The Always-Free tier offers 20 GB of capacity across Standard, Infrequent Access, and Archive Storage tiers with an S3-compatible API for integration with existing tools and applications.
1. Create Bucket
In OCI Console, go to Storage > Buckets
- Bucket Name: Provide a clear, unique name (e.g., AI_Models).
For the remaining options:
| Option | Setting | Reason |
|---|---|---|
| Default Storage Tier | Standard | The database requires fast, immediate access to the model file during the loading process. The Archive tier will cause the load command to fail because the data must be manually restored before use. |
| Enable Auto-Tiering | Disabled (Optional) | While good for costs, it’s safer to keep models in Standard to avoid latency during re-loads. |
| Enable Object Versioning | Enabled | This allows us to keep track of different versions of the model file without overwriting the original. |
| Emit Object Events | Disabled | You likely don’t need automation triggers (like OCI Functions) for this simple use case. |
| Uncommitted Multipart Uploads Cleanup | Enabled | Good hygiene to clean up failed large-file uploads after 7 days. |
| Encryption | Encrypt using Oracle-managed keys | Using customer-managed keys adds significant complexity (Vault setup) that is usually unnecessary for this phase. |
| Bucket Visibility | Private (the default) | Never make the bucket public for model storage. Instead, use a Pre-Authenticated Request (PAR) or Database Credentials to give the database secure, temporary access. |
2. Upload Model
After creating a bucket, click Upload Object to upload the .onnx files. When you click Upload in the OCI Bucket, use these specific settings for each model file:
- Storage Tier: Keep this as Standard (inherited from bucket).
- Object Name: Keep the exported file names (e.g., clip_vit_base_patch32_txt.onnx). Avoid special characters or spaces.
3. Grant Database Access
Once we upload the .onnx files, the easiest way for the database to “see” them is to create a Pre-Authenticated Request (PAR). This generates a temporary, secure URL that the database can use without needing complex login credentials.
- In the Bucket, find the clip_vit_base_patch32_txt.onnx file and click the three dots (Actions menu).
- Select Create Pre-authenticated Request.
- Access Type: Select Object and Permit object reads.
- Expiration: Set it for a few weeks (enough time to finish the setup).
- Click Create and COPY THE URL IMMEDIATELY. You will not be able to see this URL again.
Load Model into the Database
Autonomous Database supports importing ONNX models for classification, regression, clustering, and embeddings. The models will be imported as first-class MINING MODEL objects in the database schema. Inference can then be done using the family of OML scoring operators, including PREDICTION, CLUSTER, and VECTOR_EMBEDDING.
In the following SQL statement, we use DBMS_CLOUD.GET_OBJECT to fetch each file into a BLOB, then we import the CLIP model via DBMS_DATA_MINING.IMPORT_ONNX_MODEL.
DECLARE
ONNX_MOD_FILE VARCHAR2(100) := 'clip_vit_base_patch32_txt.onnx';
MODNAME VARCHAR2(500);
LOCATION_URI VARCHAR2(200) := 'PAR URI';
BEGIN
DBMS_OUTPUT.PUT_LINE('ONNX model file name in Object Storage is: '||ONNX_MOD_FILE);
--------------------------------------------
-- Define a model name for the loaded model
--------------------------------------------
SELECT UPPER(REGEXP_SUBSTR(ONNX_MOD_FILE, '[^.]+')) INTO MODNAME from dual;
DBMS_OUTPUT.PUT_LINE('Model will be loaded and saved with name: '||MODNAME);
-----------------------------------------------------
-- Read the ONNX model file from Object Storage into
-- the Autonomous Database data pump directory
-----------------------------------------------------
BEGIN DBMS_VECTOR.DROP_ONNX_MODEL(model_name => MODNAME);
EXCEPTION WHEN OTHERS THEN NULL; END;
DBMS_CLOUD.GET_OBJECT(
credential_name => NULL,
directory_name => 'DATA_PUMP_DIR',
object_uri => LOCATION_URI||ONNX_MOD_FILE);
-----------------------------------------
-- Load the ONNX model to the database
-----------------------------------------
DBMS_VECTOR.LOAD_ONNX_MODEL(
directory => 'DATA_PUMP_DIR',
file_name => ONNX_MOD_FILE,
model_name => MODNAME);
DBMS_OUTPUT.PUT_LINE('New model successfully loaded with name: '||MODNAME);
END;
/

Verify the loaded model by running SELECT model_name, mining_function FROM user_mining_models; to confirm CLIP_VIT_BASE_PATCH32_TXT is now a “first-class citizen” in our DB.
Phase 5: Create an index
We will need to set up a vector index on our embedding column for significantly faster similarity searches, especially as our dataset grows. Without an index, queries perform exhaustive full-table scans, which slow down noticeably. With an index, Oracle uses approximate nearest neighbor (ANN) search for sub-second response times on large datasets while maintaining high accuracy.
Oracle Database 26ai supports advanced vector indexes like HNSW (Hierarchical Navigable Small World) for excellent accuracy-recall balance with cosine distance (ideal for normalized CLIP embeddings).
CREATE VECTOR INDEX img_hnsw_idx
ON image_search (embedding)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
DISTANCE COSINE
WITH TARGET ACCURACY 95;

- DISTANCE COSINE: Required for CLIP embeddings (which are L2-normalized; cosine is the standard metric).
- WITH TARGET ACCURACY 95: Optional but recommended—controls the recall/accuracy tradeoff during approximate searches (higher = better accuracy, slightly slower). You can adjust it (e.g., 90–99) or omit it for the default.
HNSW indexes are in-memory only and load into the database’s Vector Pool. For large datasets (millions of 512-dim vectors), this may take time/memory. The Vector Pool is often auto-managed, and can be monitored via V$VECTOR_MEMORY_POOL.
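TARGET ACCURACY is essentially a recall target: with 95, the approximate search aims to return roughly 95% of the true nearest neighbors. If you want to measure this yourself, recall@k is just the overlap between the exact and approximate result sets, sketched here with hypothetical ID lists:

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

# Toy illustration: exact top-5 vs a hypothetical ANN result missing one hit
exact = [3, 17, 42, 99, 7]
approx = [3, 17, 42, 99, 55]
print(f"recall@5 = {recall_at_k(exact, approx):.2f}")  # → 0.80
```

In practice you would gather `exact` from a query without the index hint (exact FETCH FIRST) and `approx` from the FETCH APPROX variant, then average recall over a sample of queries.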
Run this to check our index is properly created:
SELECT index_name,
index_type,
table_name,
status
FROM all_indexes
WHERE owner = 'ADMIN'
ORDER BY index_name;

We will see our img_hnsw_idx index showing up with:
INDEX_TYPE = 'VECTOR'
STATUS = VALID
We can also check vector index metadata in USER_IND_COLUMNS to confirm the column the vector index is built on:
SELECT index_name,
column_name,
column_position
FROM user_ind_columns
WHERE index_name = 'IMG_HNSW_IDX';

which will output:
INDEX_NAME COLUMN_NAME COLUMN_POSITION
------------ ----------- ---------------
IMG_HNSW_IDX EMBEDDING 1
Phase 6: Perform a Similarity Search
Now we can use our CLIP model directly in SQL to search our images using plain English.
- We call VECTOR_EMBEDDING() with our model, passing to it a search text as a bind variable.
- We then use the VECTOR_DISTANCE() function to compute the distance between our embedding column and the query embedding. Use COSINE distance since the pipeline normalizes embeddings.
- We order the output by the vector distance between our query and existing embeddings.
- The database uses our HNSW index automatically for FETCH APPROX.
SELECT image_name,
VECTOR_DISTANCE(embedding, VECTOR_EMBEDDING(clip_vit_base_patch32_txt USING :search_text), COSINE) as score
FROM image_search
ORDER BY score
FETCH APPROX FIRST 10 ROWS ONLY;

or
SELECT image_name
FROM image_search
ORDER BY VECTOR_DISTANCE(
embedding,
VECTOR_EMBEDDING(CLIP_VIT_BASE_PATCH32_TXT USING :search_text),
COSINE
)
FETCH FIRST 5 ROWS ONLY;

Benchmarking Latency
Let’s compare different ways to embed and search
- Using SentenceTransformer to embed, as shown in sample documentation and my previous post
- Convert the CLIP model to ONNX and embed using a local ONNX runtime
- Connect to our Oracle database from a Python client and perform embed and search in the database
- Directly embed and search in the database using SQL Developer
SentenceTransformer
from sentence_transformers import SentenceTransformer, util
import time
# Load CLIP model
model = SentenceTransformer('clip-ViT-B-32')
start_time = time.perf_counter()
text_emb = model.encode('Two dogs in the snow')
end_time = time.perf_counter()
latency_ms = (end_time - start_time) * 1000
print(f"\nLocal Inference Time: {latency_ms:.2f} ms")

Output:
Local Inference Time: 130.43 ms
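A single encode() call like the one above can be skewed by one-off costs (first-call graph setup, cache warm-up). A small harness that discards warm-up runs and reports the median gives more stable numbers. The workload below is a stand-in; swap in model.encode('...') for the real measurement:

```python
import time
import statistics

def benchmark(fn, runs=20, warmup=3):
    """Return the median latency in ms over repeated runs, after warm-up."""
    for _ in range(warmup):
        fn()  # discard warm-up iterations (cold caches, lazy init)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000)
    return statistics.median(samples)

# Stand-in workload; replace with lambda: model.encode('Two dogs in the snow')
median_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"Median latency: {median_ms:.2f} ms")
```

The median is preferred over the mean here because a single GC pause or OS scheduling hiccup can inflate an average badly.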
Local ONNX
In this test, we are using the same augmented ONNX model from Oracle to ensure a fair comparison.
import numpy as np
import onnxruntime as ort
from onnxruntime_extensions import get_library_path
from pathlib import Path
import time
# 1. Setup path
current_dir = Path.cwd()
model_file = current_dir / "clip_vit_base_patch32_txt.onnx"

if not model_file.exists():
    print(f"❌ File not found at: {model_file}")
else:
    # 2. Register the custom ops
    so = ort.SessionOptions()
    so.register_custom_ops_library(get_library_path())
    # 3. Load session with custom options
    session = ort.InferenceSession(str(model_file), sess_options=so)
    print("Model loaded successfully with Custom Extensions!")
    # 4. Prepare input
    # Note: Augmented models exported from Oracle often expect 'TEXT' or 'DATA'
    input_name = session.get_inputs()[0].name
    text_input = np.array(["a photo of a blue car"], dtype=object)
    # 5. Benchmark
    start_time = time.perf_counter()
    outputs = session.run(None, {input_name: text_input})
    end_time = time.perf_counter()
    latency_ms = (end_time - start_time) * 1000
    print(f"\nLocal Inference Time: {latency_ms:.2f} ms")
    print("Vector (First 5 dims):", outputs[0][0][:5])

Output:
Model loaded successfully with Custom Extensions!
Local Inference Time: 29.47 ms
Vector (First 5 dims): [ 0.01946913 -0.00345288 -0.01435798 0.02153437 -0.00712121]
Python with Oracle
import time
import statistics
import oracledb

# Reuse the ADMIN credentials and TLS DSN from Phase 3 in a session pool
pool = oracledb.create_pool(user=user, password=password, dsn=dsn_str, min=1, max=4)
def run_separated_benchmarks(text_query="a photo of a blue car", k=5, num_search_runs=10):
    print(f"Separating embedding (once) from search ({num_search_runs} times)")
    print(f"Query: '{text_query}' (k={k})")
    print("First embedding may be cold; searches will benefit from warmed model/index.")
    print("-" * 60)
    embedding_latency_ms = None
    search_latencies_ms = []
    with pool.acquire() as connection:
        with connection.cursor() as cursor:
            # --- PHASE 1: Single Embedding (timed separately) ---
            t0 = time.perf_counter()
            cursor.execute("""
                SELECT VECTOR_EMBEDDING(CLIP_VIT_BASE_PATCH32_TXT USING :txt AS DATA)
                FROM DUAL
            """, txt=text_query)
            query_vector = cursor.fetchone()[0]
            t1 = time.perf_counter()
            embedding_latency_ms = (t1 - t0) * 1000
            print(f"Embedding Latency: {embedding_latency_ms:.2f} ms")
            print(f"Query Vector (first 5 dims): {query_vector.toarray()[0][:5] if hasattr(query_vector, 'toarray') else query_vector[:5]}")
            print("-" * 60)
            # --- PHASE 2: Multiple Searches Using the Same Vector ---
            for i in range(num_search_runs):
                t2 = time.perf_counter()
                cursor.execute("""
                    SELECT /*+ NO_PARALLEL USE_VECTOR_INDEX(img_hnsw_idx) */
                           image_name,
                           VECTOR_DISTANCE(embedding, :bv, COSINE) AS distance
                    FROM image_search
                    ORDER BY distance
                    FETCH APPROX FIRST :k ROWS ONLY
                    WITH TARGET ACCURACY 90
                """, bv=query_vector, k=k)
                results = cursor.fetchall()
                t3 = time.perf_counter()
                search_latency_ms = (t3 - t2) * 1000
                search_latencies_ms.append(search_latency_ms)
                print(f"Search Run {i+1:2d}: {search_latency_ms:6.2f} ms | Top result: {results[0][0]} (dist: {results[0][1]:.4f})")
    # Search statistics
    avg_search = sum(search_latencies_ms) / len(search_latencies_ms)
    min_search = min(search_latencies_ms)
    max_search = max(search_latencies_ms)
    median_search = statistics.median(search_latencies_ms)
    print("-" * 60)
    print(f"Embedding (once) : {embedding_latency_ms:.2f} ms")
    print(f"Search Average : {avg_search:.2f} ms")
    print(f"Search Min : {min_search:.2f} ms")
    print(f"Search Max : {max_search:.2f} ms")
    print(f"Search Median : {median_search:.2f} ms")
    print(f"Steady-State Total (~embedding + search): {embedding_latency_ms + median_search:.2f} ms")

# Run it
run_separated_benchmarks()

Output:
Separating embedding (once) from search (10 times)
Query: 'a photo of a blue car' (k=5)
First embedding may be cold; searches will benefit from warmed model/index.
------------------------------------------------------------
Embedding Latency: 73.18 ms
Query Vector (first 5 dims): array('f', [0.019469130784273148, -0.003452877514064312, -0.014357976615428925, 0.021534373983740807, -0.007121213711798191])
------------------------------------------------------------
Search Run 1: 124.96 ms | Top result: bQATESa8mig.jpg (dist: 0.6908)
Search Run 2: 119.27 ms | Top result: bQATESa8mig.jpg (dist: 0.6908)
Search Run 3: 107.14 ms | Top result: bQATESa8mig.jpg (dist: 0.6908)
Search Run 4: 108.00 ms | Top result: bQATESa8mig.jpg (dist: 0.6908)
Search Run 5: 108.30 ms | Top result: bQATESa8mig.jpg (dist: 0.6908)
Search Run 6: 107.83 ms | Top result: bQATESa8mig.jpg (dist: 0.6908)
Search Run 7: 107.59 ms | Top result: bQATESa8mig.jpg (dist: 0.6908)
Search Run 8: 107.66 ms | Top result: bQATESa8mig.jpg (dist: 0.6908)
Search Run 9: 109.31 ms | Top result: bQATESa8mig.jpg (dist: 0.6908)
Search Run 10: 107.37 ms | Top result: bQATESa8mig.jpg (dist: 0.6908)
------------------------------------------------------------
Embedding (once) : 73.18 ms
Search Average : 110.74 ms
Search Min : 107.14 ms
Search Max : 124.96 ms
Search Median : 107.92 ms
Steady-State Total (~embedding + search): 181.10 ms
Notice that these steady-state latencies (~73 ms embedding + ~108 ms search, ~181 ms total) include client-side overhead from my local machine (west coast) to the Oracle Cloud Infrastructure (OCI) region hosting my Autonomous Database instance in Phoenix.
Each benchmark query involves:
- Network: ~30–50 ms per query (dominant in search latency); one round-trip to send the SQL and receive a small result set (top-5 rows, negligible data transfer); potentially extra time for connection setup/handshake if the pool isn’t warm.
- Pure database execution (which we will test next): embedding (CPU-bound ONNX inference) and search (HNSW traversal + distance calculation).
Measuring Pure Database Performance (Excluding Network)
Let’s check both the embedding and query phases to see if there’s any gain by moving to Oracle Database. Note that the first run for both embedding and query will incur cold-start latency, so a warm up is performed to normalize the time.
To isolate server-side execution time, I will clock both phases inside the database in Database Actions → SQL Worksheet (browser-based, executes server-side with negligible client overhead for timing). I will use a PL/SQL block timed with SYSTIMESTAMP (fractional-second precision; DBMS_UTILITY.GET_TIME only offers 10 ms resolution). I will fetch the top-5 results during search timing to ensure the full ANN execution runs:
SET SERVEROUTPUT ON;
DECLARE
l_start TIMESTAMP WITH TIME ZONE;
l_end TIMESTAMP WITH TIME ZONE;
l_embedding VECTOR;
diff INTERVAL DAY TO SECOND;
total_ms NUMBER;
-- Cursor fetch (to ensure full ANN execution is timed) for search results
CURSOR c_search IS
SELECT /*+ NO_PARALLEL USE_VECTOR_INDEX(img_hnsw_idx) */ image_name
FROM image_search
ORDER BY VECTOR_DISTANCE(embedding, l_embedding, COSINE)
FETCH APPROX FIRST 5 ROWS ONLY WITH TARGET ACCURACY 90;
l_dummy VARCHAR2(32767);
BEGIN
-- Optional warm-up
SELECT VECTOR_EMBEDDING(CLIP_VIT_BASE_PATCH32_TXT USING 'warmup text' AS DATA)
INTO l_embedding FROM DUAL;
-- Embedding Timing
l_start := SYSTIMESTAMP;
SELECT VECTOR_EMBEDDING(CLIP_VIT_BASE_PATCH32_TXT USING 'a photo of a blue car' AS DATA)
INTO l_embedding FROM DUAL;
l_end := SYSTIMESTAMP;
diff := l_end - l_start;
total_ms :=
EXTRACT(DAY FROM diff) * 86400000 +
EXTRACT(HOUR FROM diff) * 3600000 +
EXTRACT(MINUTE FROM diff) * 60000 +
EXTRACT(SECOND FROM diff) * 1000;
DBMS_OUTPUT.PUT_LINE('Pure Embedding Time: ' || TO_CHAR(total_ms, 'FM9990.000') || ' ms');
-- Search Timing
l_start := SYSTIMESTAMP;
OPEN c_search;
FOR i IN 1..5 LOOP
FETCH c_search INTO l_dummy;
EXIT WHEN c_search%NOTFOUND;
END LOOP;
CLOSE c_search;
l_end := SYSTIMESTAMP;
diff := l_end - l_start;
total_ms :=
EXTRACT(DAY FROM diff) * 86400000 +
EXTRACT(HOUR FROM diff) * 3600000 +
EXTRACT(MINUTE FROM diff) * 60000 +
EXTRACT(SECOND FROM diff) * 1000;
DBMS_OUTPUT.PUT_LINE('Pure Search Time: ' || TO_CHAR(total_ms, 'FM9990.000') || ' ms');
END;
/

Running this multiple times in SQL Worksheet, we get these results:
Pure Embedding Time: 12.730 ms
Pure Search Time: 2.118 ms
Pure Embedding Time: 12.524 ms
Pure Search Time: 2.203 ms
Pure Embedding Time: 12.960 ms
Pure Search Time: 2.199 ms
This is sub-20 ms end-to-end inside the database, which is blazing fast for a general-purpose RDBMS with custom ONNX models and vector indexing, all hosted in a free-tier shared infrastructure! Breaking it down:
- Pure Embedding Time (in-DB ONNX CLIP-ViT-B-32 text inference): ~12.5–13 ms consistently. ONNX CLIP inference is fast once the model session is cached.
- Pure Search Time (top-5 HNSW ANN over 25k × 512-dim vectors, fully in-memory): ~2.1–2.2 ms consistently.
- Total Pure Database Execution: ~14.6–15.2 ms per query (embedding + search).
On scaled OCPU or dedicated hardware, sub-10 ms search is standard for this scale. Our runs prove that the HNSW index + query are extremely efficient server-side due to:
- Full in-memory caching: After 1–2 runs, the entire HNSW graph resides in the Vector Memory Pool (SGA), and the ONNX model session is persistently loaded. The “cold start” penalty (model/index loading) is gone in these runs.
- Result/Plan Caching: The query is deterministic (same text → same embedding vector → same top-5 results). Oracle aggressively caches:
  - Scalar subquery results (the embedding vector).
  - Full query results (via the implicit result cache for small outputs).
  - Execution plans.
- Optimized Path: No network and no row-fetching overhead in our timing (cursor fetch is minimal); pure ANN traversal on warmed data is often <10 ms real time for 25k × 512-dim vectors on a decent CPU.
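The cursor `c_search` in the PL/SQL block is opened without its definition shown. A plausible shape for it is a top-5 approximate similarity query; the table and column names below (`unsplash_images`, `photo_id`, `embedding`) are placeholders, not from the post. Here it is sketched as the SQL text one would bind from the `oracledb` driver:

```python
# Hypothetical table/column names; the ONNX model name matches the post.
search_sql = """
    SELECT photo_id
    FROM   unsplash_images
    ORDER  BY VECTOR_DISTANCE(
                embedding,
                VECTOR_EMBEDDING(CLIP_VIT_BASE_PATCH32_TXT
                                 USING :query_text AS DATA),
                COSINE)
    FETCH APPROX FIRST 5 ROWS ONLY
"""

# With a live connection this would be:
#   cursor.execute(search_sql, query_text="a photo of a blue car")
print(search_sql)
```

`FETCH APPROX FIRST n ROWS ONLY` is what routes the query through the HNSW index rather than an exact scan.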
Comparing with the Client-Side Pipeline
These pure server-side numbers confirm the network's share of our earlier client-side benchmarks (~73 ms embed + ~108 ms search = ~181 ms total):
- ~165 ms was network + client overhead (mostly round-trip latency from the West Coast to the OCI Phoenix region, likely 50–100 ms RTT, plus `oracledb` driver handling).
- Pure DB work accounts for only ~8–10% of what we measured from a local machine!
If we want to optimize further, we can co-locate the app with the ADB region (e.g., OCI Compute in the same region) for sub-100 ms end-to-end.
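The overhead split above is simple arithmetic on the figures measured in this post:

```python
# Back-of-envelope split of the earlier client-side benchmark
# (~73 ms embed + ~108 ms search) against the pure in-DB timings.
client_total_ms = 73 + 108        # measured from the local machine
pure_db_ms = 12.7 + 2.2           # in-DB embedding + HNSW search

overhead_ms = client_total_ms - pure_db_ms   # network + client handling
db_share = pure_db_ms / client_total_ms      # fraction spent in the DB

print(f"network/client overhead: ~{overhead_ms:.0f} ms")  # ~166 ms
print(f"pure DB share: ~{db_share:.0%}")                  # ~8%
```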
Comparing with pgvector/Chroma
As for local pgvector/Chroma (~1–5 ms search, ~29 ms local embed): Oracle matches or beats them on search once you normalize for free-tier, shared cloud hardware, and it wins on query embedding (<13 ms in-DB vs ~29 ms locally) because there is no data movement and no external call.
Latency Comparison Breakdown
For a dataset of 25k images, we have seen in the above tests how the switch from a “Local Python + Vector DB” setup to “In-Database AI” impacts latency:
| Phase | Local Pipeline (PyTorch / ONNX + pgvector/Chroma) | Oracle OADB 26ai (In‑Database ONNX + HNSW) | Technical Notes / Why OADB Wins |
|---|---|---|---|
| Embedding Generation | PyTorch CPU: ~130 ms<br>Local ONNX Runtime: ~30 ms | In‑Database ONNX Runtime: ~13 ms | OADB uses the same C++ ONNX Runtime engine we use locally. The speedup vs PyTorch comes from skipping Python + Torch overhead. |
| IPC / Data Transfer | 2–10 ms (Python → DB driver serialization → network hop → vector DB deserialization) | 0 ms | OADB executes embedding + vector search inside the kernel. No serialization, no network, no Python IPC. |
| Similarity Search (25k vectors) | 1–5 ms (pgvector/Chroma + network) | ~2 ms | For 25k vectors, HNSW search is sub‑millisecond on any modern CPU. OADB's HNSW index is implemented natively inside the SQL engine with shared‑memory access. |
| Total Latency | ~140 ms (PyTorch path)<br>~33–40 ms (local ONNX path) | ~15 ms | OADB removes Python overhead and network hops. Gains are largest when replacing PyTorch. |
| Overall Improvement | — | ~10× faster embedding vs PyTorch<br>2.3× faster embedding vs local ONNX<br>55–67% faster end‑to‑end vs local ONNX<br>~89% faster end‑to‑end vs PyTorch pipelines | The biggest win is eliminating Python + network + driver overhead. |
Massive End‑to‑End Latency Reduction
- ~89% faster end‑to‑end vs the existing PyTorch + pgvector/Chroma pipeline (133–145 ms → 15 ms)
- 55–67% faster end‑to‑end vs the local ONNX + pgvector/Chroma pipeline (33–45 ms → 15 ms)
Embedding Performance Gains
- 4.3× faster embedding by replacing PyTorch with local ONNX (130 ms → 30 ms)
- 2.3× faster embedding vs local ONNX (30 ms → 13 ms)
- 10× faster embedding vs PyTorch when using OADB's in‑database ONNX (130 ms → 13 ms)

Query Embedding Phase Gain: The primary gain here isn't the inference speed (ONNX is ONNX) but the elimination of context switching. In a local setup, the CPU must switch context between the Python app and the database process; in OADB, execution stays within the database's thread/process context.
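These headline ratios are straightforward to reproduce from the measured latencies:

```python
# Measured embedding latencies from this post's benchmarks (ms)
pytorch_ms, local_onnx_ms, in_db_ms = 130, 30, 13

print(f"PyTorch -> local ONNX: {pytorch_ms / local_onnx_ms:.1f}x")  # 4.3x
print(f"local ONNX -> in-DB:   {local_onnx_ms / in_db_ms:.1f}x")    # 2.3x
print(f"PyTorch -> in-DB:      {pytorch_ms / in_db_ms:.1f}x")       # 10.0x

# End-to-end reduction vs the PyTorch pipeline (~140 ms -> ~15 ms)
print(f"end-to-end reduction:  {1 - 15 / 140:.0%}")                 # 89%
```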
Search Phase Gain: In pgvector or Chroma, the search engine receives a "blob" or array of numbers via the SQL/gRPC protocol, which it must validate and parse. In OADB, the result of the `VECTOR_EMBEDDING` function is already a native `VECTOR` type in memory, ready for the `VECTOR_DISTANCE` operator.
Potential batch throughput gains
While the latency gain for a single query is modest (about 10–20 ms saved), the throughput difference for batch operations can be massive:
- Parallel Execution: OADB can parallelize the `VECTOR_EMBEDDING` function across all available OCPUs. In a local Python setup, you are often limited by Python's Global Interpreter Lock (GIL) or by the overhead of managing a multiprocessing pool to feed the DB.
- Pipelining: The database engine can pipeline the embedding of one query while the distance operator is searching for the previous one.
- Batch Ingestion: If we ever need to re-embed our 25k images (e.g., moving to a newer CLIP model), OADB can do this with a single `UPDATE` statement. This can be 3×–5× faster than a Python script that has to pull each image, embed it, and send the vector back to the DB over a network socket.
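A sketch of that single-statement re-embed, assuming a hypothetical `unsplash_images` table with an `image_blob` column and a hypothetical replacement image model `CLIP_VIT_B32_IMG` already loaded into the database, shown as the SQL text one would execute from `oracledb`:

```python
# Hypothetical names: unsplash_images, image_blob, embedding, CLIP_VIT_B32_IMG.
# Re-embeds every row in one statement; the database parallelizes the
# VECTOR_EMBEDDING calls across available OCPUs.
reembed_sql = """
    UPDATE unsplash_images
    SET    embedding = VECTOR_EMBEDDING(CLIP_VIT_B32_IMG
                                        USING image_blob AS DATA)
"""

# With a live connection:
#   cursor.execute(reembed_sql); connection.commit()
print(reembed_sql)
```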
Conclusion
In this post, I have scratched the surface of Oracle AI Autonomous Database by testing its in-memory ONNX runtime and integrated vector store. I've shown that it delivers impressive embedding and query performance by:
- Eliminating Python overhead, driver serialization, and network hops
- Running embedding + vector search inside the database kernel
- Providing a predictable, low‑variance latency ideal for real‑time search UX
In my next post, I will create a search frontend and test OADB with image and spatial queries. I am also curious about 26ai's native JSON support, which should make my web-scrape ingestion much faster, and I want to try Natural-Language-to-SQL via Select AI, which lets a user ask questions in English and have the database call a preconfigured LLM provider to generate SQL, or search the local vector index for RAG. Stay tuned!