API Access
Integrate Roysa's traceable multimodal AI — documents, images, audio, and video — into your applications
API Credits
Minimum purchase: $10
Quick Start
Works with documents, images, audio & video
# Works for PDF, image, audio (mp3/wav/...), and video (mp4/mov/...)
curl -X POST https://roysa-chatbot-781352878414.us-central1.run.app/extract \
-H "X-API-Key: YOUR_API_KEY" \
-F "file=@invoice.pdf" \
-F 'fields=[{"name":"Vendor"},{"name":"Invoice Number"},{"name":"Total Amount"},{"name":"Due Date"}]'
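The same request can be sketched in Python. This is a minimal sketch, assuming the third-party requests library is installed and that the endpoint replies with JSON; build_fields and extract are hypothetical helper names, not part of the API.

```python
import json

EXTRACT_URL = "https://roysa-chatbot-781352878414.us-central1.run.app/extract"


def build_fields(names):
    """Serialize field names into the JSON list used by the curl example."""
    return json.dumps([{"name": name} for name in names])


def extract(path, names, api_key):
    """POST a file to /extract as multipart form data, mirroring the curl call."""
    import requests  # third-party; imported lazily so build_fields works without it

    with open(path, "rb") as fh:
        response = requests.post(
            EXTRACT_URL,
            headers={"X-API-Key": api_key},
            files={"file": fh},
            data={"fields": build_fields(names)},
            timeout=120,
        )
    response.raise_for_status()
    return response.json()  # assumption: the response body is JSON
```

A call would look like extract("invoice.pdf", ["Vendor", "Invoice Number", "Total Amount", "Due Date"], "YOUR_API_KEY").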
Input Formats
In the table below, Docs means any of the document formats it lists. Every document
endpoint (extract, extract-schema, generate-schema, parse, classify, split, ask, review, verify, compute,
geo, redact, translate) accepts all of them. Non-PDF documents are converted to PDF automatically before
grounded extraction, so behavior is identical across formats.
Maximum file size: 50 MB. Audio and video files are accepted by /extract, /transcribe,
and /process-media. To run Ask, Review, or Compute on media, transcribe it first, then pass the
transcript as text / transcript_text.
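The two constraints above can be checked on the client before uploading. The sketch below is illustrative only: check_upload is a hypothetical helper, and it assumes "50 MB" means 50 × 1024 × 1024 bytes, which is not confirmed here.

```python
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" is read as 50 MiB
MEDIA_ENDPOINTS = {"/extract", "/transcribe", "/process-media"}


def check_upload(size_bytes, endpoint, is_media):
    """Return a list of problems found before uploading; an empty list means OK."""
    problems = []
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 50 MB limit")
    if is_media and endpoint not in MEDIA_ENDPOINTS:
        problems.append(f"{endpoint} does not accept audio/video; transcribe first")
    return problems
```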
Endpoint Reference
1 · Extract data
Turn a document into structured data. Don't know the fields yet? Start with Generate schema. Want a few flat fields? Use Extract. Nested data or repeating rows? Use Extract (schema). Need the whole doc as markdown/blocks? Use Parse.
| Task | Endpoint | What it does · when to use | Cost (credits) |
|---|---|---|---|
| Generate schema | POST /generate-schema | Infers the field list (a JSON Schema) from sample docs. Use first, when you don't know what fields exist, then feed it into Extract. | 1 / sample |
| Extract | POST /extract | Pulls a flat list of fields you name (e.g. name, date, total) → values + confidence + bounding boxes. Also reads audio/video. | 1 / page |
| Extract (schema) | POST /extract-schema | Same idea, but for nested objects & repeating rows (line items, multiple policies) described by a JSON Schema. | 1 / page |
| Parse | POST /parse | Converts the whole document to clean markdown + typed layout blocks (headings/tables/…); grounded=true adds a box per block. | 1 / page |
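To illustrate the difference between a flat field list and a schema, here is a sketch of a JSON Schema for an invoice with repeating line items. The property names are hypothetical, and how the schema is attached to an /extract-schema request is not specified in this reference.

```python
import json

# Hypothetical invoice schema: one flat field plus a repeating "line_items" array.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
}

# The flat form used by /extract in the Quick Start cannot express repeating rows.
flat_fields = [{"name": "Vendor"}]

schema_json = json.dumps(invoice_schema)
```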
2 · Understand & route
Figure out what a document is, where it splits, or just ask it a question.
| Task | Endpoint | What it does · when to use | Cost (credits) |
|---|---|---|---|
| Classify | POST /classify | Identifies the document type (invoice, COI, resume…) + confidence + alternatives. | 1 / req |
| Split | POST /split | Finds boundaries inside a multi-document PDF pack and labels each segment. | 1 / req |
| Ask | POST /document-ask | Free-form Q&A / summaries → answer + grounded references. Reuse session_id for follow-ups. | 1 / req |
3 · Verify (deterministic)
Check declared rules against grounded values — the same document + same rules always give the identical verdict. Define review saves the rules once; Review runs them on an incoming doc; Verify is the same check framed as a yes/no gate for a doc you're about to send out; Compute derives new numbers from grounded fields.
| Task | Endpoint | What it does · when to use | Cost (credits) |
|---|---|---|---|
| Define review | POST /reviews | Saves a named set of criteria once → review_id. A reusable template; checks nothing by itself. | - |
| Review | POST /review | Runs the check: document + criteria (or a saved review_id) → pass/fail per criterion, each with the field, operator, reference, extracted value, and source box. | 1 / req |
| Verify | POST /verify | Same engine as Review, for an outbound/generated doc → adds a top-level verified true/false gate. | 1 / req |
| Compute | POST /compute | Derives new values from grounded fields (sum/avg/min/max…) with provenance, e.g. a total that isn't printed in the doc. | 1 / req |
4 · Transform
Hand back a modified version or a derived view of the document.
| Task | Endpoint | What it does · when to use | Cost (credits) |
|---|---|---|---|
| Redact | POST /redact | Finds sensitive info (PII) and, with apply=true, blacks it out. The opposite of Extract: remove vs. read. | 1 / req |
| Translate | POST /translate-document | Translates the document into another language → translated PDF. | 1 / page |
| Geo | POST /geo | Extracts geographic entities (addresses, parcels, coordinates) with boxes, for mapping. | 1 / req |
5 · Media (audio & video)
Speech-to-text and media analysis. For Ask/Review/Compute on media, transcribe first, then pass the transcript as text.
| Task | Endpoint | What it does · when to use | Cost (credits) |
|---|---|---|---|
| Transcribe | POST /transcribe | Audio/video → speaker-labeled transcript PDF. | 3 / min |
| Process media | POST /process-media | Audio/video → transcript_segments + video_intelligence JSON (feed into Ask/Review/Compute). | 3 / min |
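The prices in the endpoint tables can be turned into a rough credit estimate. This is a sketch under stated assumptions: estimate_credits is a hypothetical helper, partial media minutes are assumed to round up, and POST /reviews is left out because its table entry lists no cost.

```python
import math

# Credit prices transcribed from the endpoint tables.
PER_PAGE = {"/extract", "/extract-schema", "/parse", "/translate-document"}
PER_REQUEST = {"/classify", "/split", "/document-ask", "/review",
               "/verify", "/compute", "/redact", "/geo"}
PER_MINUTE = {"/transcribe": 3, "/process-media": 3}


def estimate_credits(endpoint, pages=0, samples=0, minutes=0.0):
    """Estimate the credits one call would cost, based on the listed prices."""
    if endpoint == "/generate-schema":
        return samples                      # 1 credit per sample
    if endpoint in PER_PAGE:
        return pages                        # 1 credit per page
    if endpoint in PER_REQUEST:
        return 1                            # 1 credit per request
    if endpoint in PER_MINUTE:
        # Assumption: partial minutes are billed as whole minutes.
        return PER_MINUTE[endpoint] * math.ceil(minutes)
    raise ValueError(f"no listed price for {endpoint}")
```

For example, a 12-page document sent to /extract would be estimated at 12 credits, and 2.5 minutes of audio sent to /transcribe at 9 credits under the round-up assumption.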
Authentication: Pass your key in either of two headers: X-API-Key: rk_... or
Authorization: Bearer rk_....
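Both header styles can be built with a small helper. The function name auth_headers and the style labels are hypothetical; only the header names and the Bearer prefix come from the text above.

```python
def auth_headers(api_key, style="x-api-key"):
    """Build the request headers for either documented authentication style."""
    if style == "x-api-key":
        return {"X-API-Key": api_key}
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    raise ValueError(f"unknown auth style: {style}")
```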
Deterministic review (the differentiator): POST /review takes declared
criteria (field, operator, value) and returns a
white-box, reproducible verdict: the same document and the same criteria yield identical verdicts on every run.
Operators: equals, not_equals, gt/gte/lt/lte, before/after/on_or_before/on_or_after,
contains/not_contains, in/not_in, matches, starts_with/ends_with (plus their not_ variants), between, is_empty/is_not_empty,
is_true/is_false. In date checks, the token "today" resolves to as_of, which keeps the
checks reproducible. Persist a named review with POST /reviews and re-invoke it with
/review?review_id=….
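A criteria list might be assembled as follows. This is a sketch, not the confirmed request format: the keys field, operator, and value come from the description above, while sending the list as a JSON string, omitting value for unary operators, and the example field names are all assumptions.

```python
import json

# Operators named in this reference. The not_ variants of starts_with/ends_with
# are left out because their exact spelling is not given.
OPERATORS = {
    "equals", "not_equals", "gt", "gte", "lt", "lte",
    "before", "after", "on_or_before", "on_or_after",
    "contains", "not_contains", "in", "not_in", "matches",
    "starts_with", "ends_with", "between",
    "is_empty", "is_not_empty", "is_true", "is_false",
}


def criterion(field, operator, value=None):
    """Build one criterion dict, rejecting operators not listed above."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    rule = {"field": field, "operator": operator}
    if value is not None:
        rule["value"] = value  # assumption: unary operators need no value
    return rule


criteria = [
    criterion("Total Amount", "lte", 5000),
    criterion("Due Date", "on_or_after", "today"),  # "today" resolves to as_of
    criterion("Vendor", "is_not_empty"),
]
criteria_json = json.dumps(criteria)
```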
Computed fields: POST /compute derives values
(sum, difference, product, quotient, average, min, max, count, concat) from grounded inputs and
returns provenance (which inputs fed each value). A missing input yields null — never a
fabricated number.
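The null-on-missing rule can be illustrated locally. The sketch below only mimics the behavior described above for a few operations; it is not the service's implementation, and the value/provenance result shape is a hypothetical stand-in for the real response.

```python
# Local illustration covering a subset of the listed operations.
OPS = {
    "sum": sum,
    "min": min,
    "max": max,
    "average": lambda values: sum(values) / len(values),
}


def compute(op, inputs):
    """inputs maps a grounded field name to a number, or None when it is missing."""
    provenance = list(inputs)  # which inputs fed the value
    values = list(inputs.values())
    if any(v is None for v in values):
        # A missing input yields null rather than a guessed number.
        return {"value": None, "provenance": provenance}
    return {"value": OPS[op](values), "provenance": provenance}
```

With inputs {"Line 1": 40.0, "Line 2": 2.0}, compute("sum", ...) gives 42.0; replacing either input with None gives a null value instead.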
Audio/Video with Ask: Call POST /process-media first to get
transcript_segments, join the text, then send it as transcript_text to
POST /document-ask. For Review / Compute on audio/video, pass that same
joined transcript as text / transcript_text to POST /review or
POST /compute (these endpoints have no separate media path).
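The join step can be sketched as below. The shape of each segment is an assumption (a dict with a "text" key), and the question form field is hypothetical; only transcript_segments and transcript_text are named in this reference.

```python
def join_transcript(segments):
    """Join segment texts in order, skipping empty ones.

    Assumes each item of transcript_segments is a dict with a "text" key.
    """
    return " ".join(
        seg["text"].strip() for seg in segments if seg.get("text", "").strip()
    )


def ask_form(transcript_text, question):
    """Form fields for POST /document-ask on a transcript ("question" is assumed)."""
    return {"transcript_text": transcript_text, "question": question}
```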
Sessions: The session_id in every /document-ask response can be
reused for follow-up questions on the same document — no re-upload and 1 credit per question.
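Session reuse might be tracked with a small helper like the one below. AskSession is a hypothetical class; the request field names question and session_id are assumptions, and the first call (which uploads the document) is not shown.

```python
class AskSession:
    """Remembers the session_id returned by /document-ask between questions."""

    def __init__(self):
        self.session_id = None
        self.credits_spent = 0

    def next_form(self, question):
        """Form fields for the next question; adds session_id once one is known."""
        form = {"question": question}
        if self.session_id is not None:
            form["session_id"] = self.session_id
        return form

    def record(self, response_json):
        """Store the session_id from a response and count 1 credit for the question."""
        self.session_id = response_json["session_id"]
        self.credits_spent += 1
```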
Supported audio: MP3, WAV, FLAC, OGG, M4A, AAC, WMA, OPUS · Video: MP4, MOV, AVI, MKV, WEBM, WMV, FLV, M4V
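A client can sort files into audio and video before choosing an endpoint. This sketch assumes the format can be judged from the file extension alone, which is not stated here; media_kind is a hypothetical helper.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {"mp3", "wav", "flac", "ogg", "m4a", "aac", "wma", "opus"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi", "mkv", "webm", "wmv", "flv", "m4v"}


def media_kind(filename):
    """Return "audio", "video", or None based on the file extension."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None
```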
Error codes: 402 Insufficient credits · 400 Bad
request · 500 Processing error
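The three documented status codes can be mapped to a single exception type. RoysaAPIError and check_status are hypothetical names, and treating any other non-2xx code as "Unexpected status" is an assumption.

```python
ERROR_MEANINGS = {
    400: "Bad request",
    402: "Insufficient credits",
    500: "Processing error",
}


class RoysaAPIError(Exception):
    """Raised for non-2xx responses; carries the status code and its meaning."""

    def __init__(self, status_code, meaning):
        super().__init__(f"{status_code} {meaning}")
        self.status_code = status_code
        self.meaning = meaning


def check_status(status_code):
    """Return None for 2xx codes, otherwise raise RoysaAPIError."""
    if 200 <= status_code < 300:
        return None
    meaning = ERROR_MEANINGS.get(status_code, "Unexpected status")
    raise RoysaAPIError(status_code, meaning)
```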
Rate Limits
Need higher limits? Contact us for custom plans.