API Documentation
Complete reference for integrating with alAPI's LLM, OCR, and Retrieval services
Base URL: https://dev.alapi.deep.sa/v1
OpenAI Compatible: Use the official OpenAI SDK with our base URL. It is a drop-in replacement for existing applications.
LLM API
OpenAI-compatible API for chat completions and embeddings. Use your favorite models through a unified interface.
Authentication
All API requests require authentication using a Bearer token in the Authorization header.
Request Header:
Authorization: Bearer YOUR_API_KEY
API Key: Generate API keys from your Dashboard.
SDK Setup Example:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://dev.alapi.deep.sa/v1"
)
Chat Completions
Create Chat Completion
Creates a model response for the given chat conversation
Endpoint:
https://dev.alapi.deep.sa/v1/chat/completions
Request Body:
{
  "model": "llama-3.3-70b-versatile",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | ID of the model to use |
| messages | array | Yes | Array of message objects with role and content |
| temperature | number | No | Sampling temperature (0-2). Default: 1 |
| max_tokens | integer | No | Maximum tokens to generate |
| stream | boolean | No | If true, returns a stream of events |
| top_p | number | No | Nucleus sampling parameter. Default: 1 |
Response (200 OK):
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1706745000,
  "model": "llama-3.3-70b-versatile",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30
  }
}
Code Examples:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://dev.alapi.deep.sa/v1"
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Saudi Arabia?"}
    ],
    temperature=0.7,
    max_tokens=1024
)

print(response.choices[0].message.content)
Streaming Responses
Server-Sent Events (SSE)
Stream responses token by token for real-time output
How it works: Set stream: true in your request. The response will be sent as Server-Sent Events, with each chunk containing a delta of the response content.
Code Examples:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://dev.alapi.deep.sa/v1"
)

stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "Write a short poem about coding"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Embeddings
Create Embeddings
Creates an embedding vector representing the input text
Endpoint:
https://dev.alapi.deep.sa/v1/embeddings
Request Body:
{
  "model": "deep-sa/alEmbedding",
  "input": "The quick brown fox jumps over the lazy dog"
}
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | ID of the embedding model to use |
| input | string or array | Yes | Text to embed. Can be a string or an array of strings |
| encoding_format | string | No | Format for the embeddings: 'float' or 'base64'. Default: float |
| dimensions | integer | No | Number of dimensions for the output embeddings (model-dependent) |
Response (200 OK):
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, ...]
    }
  ],
  "model": "deep-sa/alEmbedding",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
Code Examples:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://dev.alapi.deep.sa/v1"
)

response = client.embeddings.create(
    model="deep-sa/alEmbedding",
    input="The quick brown fox jumps over the lazy dog"
)

embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")
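Since input also accepts an array, several strings can be embedded in one request. A minimal sketch using raw HTTP with requests (the helper name embed_batch is illustrative, not part of the API):

```python
import requests

API_BASE = "https://dev.alapi.deep.sa/v1"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

def embed_batch(texts, model="deep-sa/alEmbedding"):
    """POST /v1/embeddings with an array input and return vectors in input order."""
    resp = requests.post(
        f"{API_BASE}/embeddings",
        headers=headers,
        json={"model": model, "input": texts},
    ).json()
    # Each data item carries its input index; sort defensively before returning.
    return [item["embedding"] for item in sorted(resp["data"], key=lambda d: d["index"])]

# vectors = embed_batch(["first text", "second text"])
```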
Available Models
List Models
Returns the list of currently available models
Endpoint:
https://dev.alapi.deep.sa/v1/models
Required Scope:
This endpoint requires an API key with the models scope.
Response (200 OK):
{
  "object": "list",
  "data": [
    {
      "id": "deep-sa/alEmbedding",
      "object": "model",
      "created": 1769554378,
      "owned_by": "deepcloud",
      "type": "embedding"
    },
    {
      "id": "deep-sa/alLLM",
      "object": "model",
      "created": 1769554340,
      "owned_by": "deepcloud",
      "type": "llm"
    },
    {
      "id": "google/gemini-2.5-flash",
      "object": "model",
      "created": 1764841607,
      "owned_by": "google_gemini",
      "type": "llm"
    },
    {
      "id": "google/gemini-2.5-flash-lite",
      "object": "model",
      "created": 1764841599,
      "owned_by": "google_gemini",
      "type": "llm"
    },
    {
      "id": "google/gemini-3-flash",
      "object": "model",
      "created": 1766048167,
      "owned_by": "google_gemini",
      "type": "llm"
    },
    {
      "id": "google/gemini-3-pro",
      "object": "model",
      "created": 1765783673,
      "owned_by": "google_gemini",
      "type": "llm"
    },
    {
      "id": "gpt-oss-120b",
      "object": "model",
      "created": 1764841644,
      "owned_by": "groq",
      "type": "llm"
    },
    {
      "id": "gpt-oss-20b",
      "object": "model",
      "created": 1764841650,
      "owned_by": "groq",
      "type": "llm"
    },
    {
      "id": "llama-3.3-70b",
      "object": "model",
      "created": 1764841634,
      "owned_by": "groq",
      "type": "llm"
    },
    {
      "id": "llama-4-maverick-17b",
      "object": "model",
      "created": 1765976388,
      "owned_by": "groq",
      "type": "llm"
    },
    {
      "id": "openai/gpt-5-mini",
      "object": "model",
      "created": 1765786635,
      "owned_by": "openai",
      "type": "llm"
    },
    {
      "id": "qwen3-32b",
      "object": "model",
      "created": 1764841624,
      "owned_by": "groq",
      "type": "llm"
    }
  ]
}
Code Examples:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://dev.alapi.deep.sa/v1"
)

models = client.models.list()
for model in models.data:
    print(f"{model.id} ({model.type})")
The following models are currently available through alAPI. Use the model name in your API requests.
| Model Name | Type | Provider | Avg Latency |
|---|---|---|---|
| deep-sa/alEmbedding | embedding | deepcloud | ~585ms |
| deep-sa/alLLM | llm | deepcloud | ~2913ms |
| google/gemini-2.5-flash | llm | google_gemini | ~6457ms |
| google/gemini-2.5-flash-lite | llm | google_gemini | - |
| google/gemini-3-flash | llm | google_gemini | ~3470ms |
| google/gemini-3-pro | llm | google_gemini | ~17051ms |
| gpt-oss-120b | llm | groq | ~1625ms |
| gpt-oss-20b | llm | groq | ~900ms |
| llama-3.3-70b | llm | groq | ~767ms |
| llama-4-maverick-17b | llm | groq | ~1370ms |
| openai/gpt-5-mini | llm | openai | ~5616ms |
| qwen3-32b | llm | groq | ~888ms |
OCR API
Extract text from documents with advanced Arabic and English OCR. Supports PDFs and images with automatic deskewing and layout detection.
Supported file formats: PDF and image files. Arabic and English text extraction with automatic language detection.
Upload Document
Upload File for OCR
Upload a document and start OCR processing
Endpoint:
https://dev.alapi.deep.sa/v1/ocr/upload
Request Format:
Content-Type: multipart/form-data
| Name | Type | Description |
|---|---|---|
| file | file | Document file (PDF) |
Response (200 OK):
{
  "token": "abc123xyz...",
  "status": "queued",
  "progress": 0,
  "upload_progress": 100,
  "queue_position": 1
}
Code Examples:
import requests

url = "https://dev.alapi.deep.sa/v1/ocr/upload"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

with open("document.pdf", "rb") as f:
    files = {"file": f}
    response = requests.post(url, headers=headers, files=files)

result = response.json()
token = result["token"]
print(f"Job started, token: {token}")
Job Status
Get Job Status
Retrieve the current status and results of an OCR job
Endpoint:
https://dev.alapi.deep.sa/v1/ocr/jobs/{token}
Status Values:
| Status | Description |
|---|---|
| queued | Job is waiting in the processing queue |
| processing | Document is being processed |
| done | Processing complete. Results available in the pages array |
| error | Processing failed. Check the error field for details |
Response (when status is done):
{
  "filename": "document.pdf",
  "status": "done",
  "progress": 100,
  "total_pages": 5,
  "pages": [
    {
      "page_num": 1,
      "text": "Extracted text from page 1...",
      "num_segments": 12,
      "elapsed": 2.45
    }
  ],
  "expires_at": "2026-02-05T18:30:00Z"
}
Code Examples:
curl "https://dev.alapi.deep.sa/v1/ocr/jobs/YOUR_TOKEN_HERE" \
-H "Authorization: Bearer YOUR_API_KEY"
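The curl call above returns the current snapshot only; in practice a client polls until the job leaves the queue. A sketch of such a loop (wait_for_job is an illustrative helper; the interval and timeout values are arbitrary):

```python
import time
import requests

API_BASE = "https://dev.alapi.deep.sa/v1"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

def wait_for_job(token, poll_interval=2.0, timeout=600.0):
    """Poll GET /v1/ocr/jobs/{token} until status is done or error."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = requests.get(f"{API_BASE}/ocr/jobs/{token}", headers=headers).json()
        if job["status"] in ("done", "error"):
            return job
        time.sleep(poll_interval)
    raise TimeoutError(f"OCR job {token} did not finish within {timeout}s")

# job = wait_for_job("YOUR_TOKEN_HERE")
# if job["status"] == "done":
#     print(job["pages"][0]["text"])
```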
Retry a Job
Resubmit a failed or stuck job without re-uploading the file. alAPI keeps a copy of every upload in object storage for the lifetime of the job, so retries reuse that copy. The client-facing token stays the same across retries.
Retry OCR Job
Safe to call on any job in the failed state with retryable: true.
Endpoint:
https://dev.alapi.deep.sa/v1/ocr/jobs/{token}/retry
Response (200 OK):
{
  "token": "…same token you already have…",
  "status": "pending",
  "retry_count": 1
}
Error responses:
| Status | Meaning |
|---|---|
| 404 | Token unknown or not yours |
| 409 | Job is already done / still uploading |
| 422 | Legacy job without a retryable copy — re-upload |
| 429 | Hit retry_count cap (3 by default) |
| 502 | Upstream alOCR transport failure — safe to retry |
Tip: a background worker automatically retries failed-retryable jobs with exponential backoff. You only need to call /retry if you want to fail fast or surface an error to a user.
cURL:
curl -X POST "https://dev.alapi.deep.sa/v1/ocr/jobs/YOUR_TOKEN_HERE/retry" \
-H "Authorization: Bearer YOUR_API_KEY"
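In code, the only status worth retrying on automatically is 502; the 4xx responses above mean the job cannot be retried as-is. A small sketch (retry_job and its attempt cap are illustrative):

```python
import requests

API_BASE = "https://dev.alapi.deep.sa/v1"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

def retry_job(token, max_attempts=3):
    """POST the retry endpoint, re-trying only on 502 transport failures."""
    for _ in range(max_attempts):
        resp = requests.post(f"{API_BASE}/ocr/jobs/{token}/retry", headers=headers)
        if resp.status_code == 200:
            return resp.json()       # same token, status reset to pending
        if resp.status_code != 502:  # 404/409/422/429: not retryable, surface it
            break
    resp.raise_for_status()

# job = retry_job("YOUR_TOKEN_HERE")
```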
Get Single Page
Get Page by Number
Retrieve OCR text and thumbnail for a single page
Endpoint:
https://dev.alapi.deep.sa/v1/ocr/jobs/{token}/{page_num}
Request Format:
| Name | Type | Description |
|---|---|---|
| token | string | Access token received from upload |
| page_num | integer | Page number (1-indexed) |
Response (200 OK):
{
  "page_num": 1,
  "text": "Extracted text from this page...",
  "thumbnail": {
    "url": "https://objectstorage.me-jeddah-1.oraclecloud.com/..."
  },
  "image": {
    "url": "https://...presigned-s3-url..."
  },
  "status": "done",
  "num_segments": 12,
  "elapsed": 3.45
}
The image field is null if the image is not yet available (job in progress or images not uploaded).
Code Examples:
import requests

API_BASE = "https://dev.alapi.deep.sa/v1"
API_KEY = "YOUR_API_KEY"
headers = {"Authorization": f"Bearer {API_KEY}"}

token = "YOUR_TOKEN_HERE"
page_num = 3

response = requests.get(
    f"{API_BASE}/ocr/jobs/{token}/{page_num}",
    headers=headers
)
page = response.json()
print(f"Page {page['page_num']}: {page['status']}")
print(page["text"])
Error Handling:
| Status | Description |
|---|---|
| 400 | page_num must be >= 1 |
| 404 | Job or page not found |
| 410 | Job has expired |
Thumbnails
Get Page Thumbnails
Retrieve page thumbnails with pagination
Endpoint:
https://dev.alapi.deep.sa/v1/ocr/thumbnails/{token}?start=0&limit=10
Response:
{
  "thumbnails": [
    {
      "page_num": 1,
      "data": "data:image/png;base64,iVBORw0KGgo..."
    }
  ],
  "start": 0,
  "end": 10,
  "total_pages": 25,
  "has_more": true
}
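Using the start, end, and has_more fields, a client can walk every page of a large document. A sketch (fetch_all_thumbnails is an illustrative helper, not part of the API):

```python
import requests

API_BASE = "https://dev.alapi.deep.sa/v1"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

def fetch_all_thumbnails(token, limit=10):
    """Page through GET /v1/ocr/thumbnails/{token} until has_more is False."""
    thumbnails, start = [], 0
    while True:
        resp = requests.get(
            f"{API_BASE}/ocr/thumbnails/{token}",
            headers=headers,
            params={"start": start, "limit": limit},
        ).json()
        thumbnails.extend(resp["thumbnails"])
        if not resp["has_more"]:
            return thumbnails
        start = resp["end"]  # next window starts where this one ended

# thumbs = fetch_all_thumbnails("YOUR_TOKEN_HERE")
```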
Page Images
Get Full-Resolution Page Images
Retrieve paginated full-resolution page images (same pattern as thumbnails)
Endpoint:
https://dev.alapi.deep.sa/v1/ocr/images/{token}?start=0&limit=10
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| token | string | Yes | Access token received from upload |
| start | integer | No | Offset to start from (default: 0) |
| limit | integer | No | Number of images to return (default: 10, max: 50) |
Response (200 OK):
{
  "images": [
    {
      "page_num": 1,
      "url": "https://...presigned-s3-url..."
    }
  ],
  "start": 0,
  "end": 10,
  "total_pages": 25,
  "has_more": true
}
When S3 is not configured, images are returned as inline base64 data URIs instead of presigned URLs.
Base64 Fallback (no S3):
{
  "images": [
    {
      "page_num": 1,
      "data": "data:image/png;base64,..."
    }
  ],
  // ...pagination fields same as above
}
alRetrieval API
Ingest documents and query them with natural-language semantic retrieval. Supports PDF, DOCX, PPTX, TXT, and Markdown files.
Supported file formats
PDF, DOCX, PPTX, TXT, MD
PDFs are processed via OCR; DOCX, PPTX, TXT, and MD have text extracted directly — no OCR cost
Scope Required: All alRetrieval endpoints require an API key with the alretrieval scope. Pass it as a Bearer token in the Authorization header.
Create Collection
Create a New Collection
Creates an empty upstream collection owned by the authenticated user. Returns the collection_id used in all subsequent requests.
Endpoint:
POST /v1/alRetrieval/collections
Response (200 OK):
{
  "collection_id": "9f3f8d5a0c9a4fe8b5d8a7bc0f13d2e1"
}
Code Examples:
import requests
headers = {"Authorization": "Bearer YOUR_API_KEY"}
response = requests.post("https://dev.alapi.deep.sa/v1/alRetrieval/collections", headers=headers)
collection_id = response.json()["collection_id"]
print(f"Collection: {collection_id}")
List Collections
List Your Collections
Returns all collections owned by the authenticated user, sorted by creation date (newest first).
Endpoint:
GET /v1/alRetrieval/collections
Response (200 OK):
{
  "collections": [
    {
      "collection_id": "9f3f8d5a0c9a4fe8b5d8a7bc0f13d2e1",
      "created_at": "2026-04-26T12:00:00Z"
    }
  ]
}
Code Examples:
import requests

headers = {"Authorization": "Bearer YOUR_API_KEY"}
response = requests.get("https://dev.alapi.deep.sa/v1/alRetrieval/collections", headers=headers)
for col in response.json()["collections"]:
    print(col["collection_id"], col["created_at"])
Ingest Document
Upload Document for Ingestion
Upload a file or send pre-extracted pages. Processing is asynchronous — poll the status endpoint until ready.
Endpoint:
POST /v1/alRetrieval/collections/{collection_id}/documents/ingest
Supported Content Types:
| Content-Type | Description |
|---|---|
| multipart/form-data | File upload (PDF, DOCX, PPTX, TXT, MD). Form field: file |
| application/json | Pre-extracted pages as JSON (title, filename, pages array) |
Processing Pipelines by File Type:
| Extension | Pipeline | Description |
|---|---|---|
| .pdf | OCR | Sent to AlOCR for page-by-page extraction. Charges OCR page balance. |
| .docx | Direct | Text extracted from Word XML. Pages split on page breaks; falls back to one page if no breaks are present. |
| .pptx | Direct | Text extracted from slide XML. Each slide becomes a page. |
| .txt / .md | Direct | Content used as-is (must be valid UTF-8). Single page. |
Response (200 OK):
{
  "id": "a1b2c3d4e5f6",
  "status": "queued",
  "pages": 12,
  "collection": "9f3f8d5a0c9a4fe8b5d8a7bc0f13d2e1"
}
Code Examples:
import requests

collection_id = "9f3f8d5a0c9a4fe8b5d8a7bc0f13d2e1"
url = f"https://dev.alapi.deep.sa/v1/alRetrieval/collections/{collection_id}/documents/ingest"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

with open("report.pdf", "rb") as f:
    doc = requests.post(url, headers=headers, files={"file": f}).json()

print(f"doc_id={doc['id']} status={doc['status']}")
Error Handling:
| Status | Description |
|---|---|
400 | Invalid request, missing file, unsupported file type, or empty content |
402 | Insufficient OCR page balance (PDF only) |
404 | Collection not found or not owned by caller |
415 | Unsupported Content-Type |
Document Status
Get Document Status
Check the processing status of an ingested document
Endpoint:
GET /v1/alRetrieval/collections/{collection_id}/documents/{doc_id}/status
Status Values:
| Status | Description |
|---|---|
processing | Document is being processed (OCR, text extraction, or indexing) |
ready | Document is indexed and ready for queries |
error | Processing failed. Check the error field for details. |
Response (200 OK):
{
  "doc_id": "a1b2c3d4e5f6",
  "status": "ready"
}
Code Examples:
import requests, time

collection_id = "9f3f8d5a0c9a4fe8b5d8a7bc0f13d2e1"
doc_id = "a1b2c3d4e5f6"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

while True:
    resp = requests.get(
        f"https://dev.alapi.deep.sa/v1/alRetrieval/collections/{collection_id}/documents/{doc_id}/status",
        headers=headers
    ).json()
    if resp["status"] == "ready":
        break
    if resp["status"] == "error":
        raise RuntimeError(resp.get("error"))
    time.sleep(2)
Query Document
Semantic Retrieval Query
Search an ingested document with a natural-language question. Charges one LLM request.
Endpoint:
POST /v1/alRetrieval/collections/{collection_id}/query
Request Body:
| Name | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Natural-language question to search the document |
| doc_ids | string[] | No | List of doc_id values to scope the query. Omit to search all ready documents in the collection. |
Response (200 OK):
{
  "context": [
    {
      "title": "Section Title",
      "page_start": 3,
      "page_end": 3,
      "score": 0.85,
      "snippet": "Relevant text from the document...",
      "is_direct_hit": true,
      "document_id": "a1b2c3d4e5f6"
    }
  ],
  "effective_scope_doc_ids": ["a1b2c3d4e5f6"]
}
Response Fields (context array):
| Name | Description |
|---|---|
| title | Section or chunk title |
| page_start / page_end | Page range of the matching chunk |
| score | Relevance score (higher = more relevant) |
| snippet | Preview of the matching content |
| document_id | Bridge document ID from which the context chunk was retrieved |
| effective_scope_doc_ids | Actual doc_ids searched (equals request doc_ids, or all ready docs in the collection when omitted) |
Code Examples:
import requests

collection_id = "9f3f8d5a0c9a4fe8b5d8a7bc0f13d2e1"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

response = requests.post(
    f"https://dev.alapi.deep.sa/v1/alRetrieval/collections/{collection_id}/query",
    headers=headers,
    json={"query": "What are the key findings?"}
)
result = response.json()
for chunk in result["context"]:
    print(f"[score={chunk['score']:.2f}] {chunk['snippet'][:100]}")
Error Handling:
| Status | Description |
|---|---|
400 | Invalid request body |
402 | Request limit exceeded |
404 | Document not found or not owned by caller |
422 | Missing query field |
List Collection Documents
List Documents in a Collection
Returns all documents in a collection with enriched metadata including a presigned S3 download URL for each ingested document.
Endpoint:
GET /v1/alRetrieval/collections/{collection_id}/documents
Response (200 OK):
{
  "collection_id": "9f3f8d5a0c9a4fe8b5d8a7bc0f13d2e1",
  "documents": [
    {
      "doc_id": "a1b2c3d4e5f6",
      "filename": "report.pdf",
      "status": "ready",
      "mime_type": "application/pdf",
      "pages": 12,
      "file_url": "https://…?expires=3600",
      "created_at": "2026-04-26T12:00:00Z"
    }
  ]
}
Each ingested document includes a time-limited file_url (presigned S3 URL) valid for 1 hour. Re-fetch the endpoint to get a fresh URL.
Response Fields (documents array):
| Name | Description |
|---|---|
| doc_id | Unique document ID (bridge record ID) |
| filename | Original uploaded filename |
| status | Processing status: processing, ready, or error |
| file_url | Presigned S3 download URL for the stored source payload/file. Expires in 1 hour. |
| pages | Number of pages in the document |
Code Examples:
import requests

collection_id = "9f3f8d5a0c9a4fe8b5d8a7bc0f13d2e1"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

response = requests.get(
    f"https://dev.alapi.deep.sa/v1/alRetrieval/collections/{collection_id}/documents",
    headers=headers
)
for doc in response.json()["documents"]:
    print(doc["doc_id"], doc["filename"], doc["status"])
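Since each file_url expires after an hour, re-list the collection just before downloading. A sketch (download_document is an illustrative helper, not part of the API):

```python
import requests

API_BASE = "https://dev.alapi.deep.sa/v1"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

def download_document(collection_id, doc_id, dest):
    """Fetch a fresh presigned file_url for one document and save it to dest."""
    listing = requests.get(
        f"{API_BASE}/alRetrieval/collections/{collection_id}/documents",
        headers=headers,
    ).json()
    doc = next(d for d in listing["documents"] if d["doc_id"] == doc_id)
    # The presigned URL carries its own credentials; no Authorization header
    data = requests.get(doc["file_url"]).content
    with open(dest, "wb") as f:
        f.write(data)
    return doc["filename"]

# download_document("YOUR_COLLECTION_ID", "a1b2c3d4e5f6", "report.pdf")
```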
Error Handling
The API uses standard HTTP status codes to indicate success or failure of requests.
Error Response Format:
{
  "error": {
    "message": "Error description",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
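A client can turn this envelope into a readable exception. A sketch (raise_for_api_error is illustrative; it falls back to the raw body when the response is not the JSON envelope):

```python
import requests

def raise_for_api_error(resp):
    """Raise on any non-2xx alAPI response, including the error envelope details."""
    if resp.ok:
        return
    try:
        err = resp.json()["error"]
        detail = f"{err.get('type')}/{err.get('code')}: {err.get('message')}"
    except (ValueError, KeyError):
        detail = resp.text  # body was not the standard JSON envelope
    raise RuntimeError(f"HTTP {resp.status_code}: {detail}")

# resp = requests.get("https://dev.alapi.deep.sa/v1/models",
#                     headers={"Authorization": "Bearer YOUR_API_KEY"})
# raise_for_api_error(resp)
```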
Ready to Get Started?
Generate an API key from your dashboard and start building