API Access

Integrate Roysa's traceable multimodal AI — documents, images, audio, and video — into your applications

API Credits

Price: 8.5¢ per credit
$10 ≈ 118 pages

Minimum purchase: $10
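The pricing arithmetic can be checked with a short sketch. It assumes the page count is the number of credits rounded to the nearest whole number, which is what makes $10 come out at roughly 118; the function name is hypothetical and not part of the API.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_CREDIT = Decimal("0.085")  # 8.5 cents per credit, as listed above
MINIMUM_PURCHASE = Decimal("10")     # minimum purchase in dollars


def credits_for(dollars):
    """Return the approximate number of credits a dollar amount buys."""
    amount = Decimal(str(dollars))
    if amount < MINIMUM_PURCHASE:
        raise ValueError("the minimum purchase is $10")
    # Assumption: round to the nearest credit (the page itself only says "≈").
    credits = (amount / PRICE_PER_CREDIT).to_integral_value(rounding=ROUND_HALF_UP)
    return int(credits)


print(credits_for(10))  # 10 / 0.085 = 117.6..., which rounds to 118
```

At 1 credit per page, the credit count is also the approximate number of document pages the purchase covers.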

Your API Key

Beta


Quick Start

Works with documents, images, audio & video

Endpoint: POST /extract
Input: PDF, Image, Audio, Video
Output: JSON with fields, confidence, bounding boxes, speakers, timestamps
Cost: 1 credit / page · 3 credits / minute for audio & video
Extract structured fields from a PDF, image, audio, or video
# Works for PDF, image, audio (mp3/wav/...), and video (mp4/mov/...)
curl -X POST https://roysa-chatbot-781352878414.us-central1.run.app/extract \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@invoice.pdf" \
  -F 'fields=[{"name":"Vendor"},{"name":"Invoice Number"},{"name":"Total Amount"},{"name":"Due Date"}]'
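For callers who prefer Python to curl, the same request can be assembled with the standard library. This is a minimal sketch, not an official client: the helper names are hypothetical, the multipart part for the file is sent as application/octet-stream by assumption, and the response format is not described in this page, so the sketch stops at building the request.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://roysa-chatbot-781352878414.us-central1.run.app"


def build_fields(names):
    """Serialize field names into the JSON list used by the curl example."""
    return json.dumps([{"name": name} for name in names])


def build_multipart(form_fields, file_name, file_bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in form_fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_extract_request(api_key, file_name, file_bytes, field_names):
    """Build (but do not send) a POST /extract request."""
    body, content_type = build_multipart(
        {"fields": build_fields(field_names)}, file_name, file_bytes
    )
    return urllib.request.Request(
        BASE_URL + "/extract",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": content_type},
    )
```

To send it, read the file bytes, call build_extract_request("YOUR_API_KEY", "invoice.pdf", data, ["Vendor", "Invoice Number", "Total Amount", "Due Date"]) and pass the result to urllib.request.urlopen.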

Input Formats

In the table below, Docs means any document format here. Every document endpoint (extract, extract-schema, generate-schema, parse, classify, split, ask, review, verify, compute, geo, redact, translate) accepts all of them — non-PDF documents are converted to PDF automatically before grounded extraction, so behavior is identical across formats.

PDF: scanned & digital
Images: JPEG, PNG, TIFF, WEBP, GIF, BMP
Word: .doc, .docx, ODT, RTF
Excel / CSV: .xls, .xlsx, ODS, .csv, .tsv
PowerPoint: .ppt, .pptx, ODP
Plain text: .txt, .md, .json, .xml, .yaml, .html
Audio: MP3, WAV, FLAC, M4A, AAC, OGG, OPUS, WMA
Video: MP4, MOV, AVI, MKV, WEBM, WMV, FLV, M4V

Max file size: 50 MB. Audio and video files are accepted only by /extract, /transcribe, and /process-media. To run Ask, Review, or Compute on media, transcribe first, then pass the transcript as text / transcript_text.
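A client can apply these two rules before uploading. The sketch below is a local pre-check only, with hypothetical function names. It assumes file extensions match the lowercase format names listed above and that 50 MB means 50 × 1024 × 1024 bytes; the page does not say which definition of MB the service uses.

```python
import os

AUDIO = {"mp3", "wav", "flac", "m4a", "aac", "ogg", "opus", "wma"}
VIDEO = {"mp4", "mov", "avi", "mkv", "webm", "wmv", "flv", "m4v"}
MEDIA_ENDPOINTS = ("/extract", "/transcribe", "/process-media")
MAX_BYTES = 50 * 1024 * 1024  # assumption: 50 MB counted in binary megabytes


def is_media(filename):
    """True when the extension is one of the listed audio or video formats."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    return ext in AUDIO or ext in VIDEO


def passes_upload_rules(filename, size_bytes, endpoint):
    """Check the size limit and the media-endpoint restriction.

    For non-media files this only checks the size limit; it does not
    verify that the endpoint exists or accepts that document type.
    """
    if size_bytes > MAX_BYTES:
        return False
    if is_media(filename):
        return endpoint in MEDIA_ENDPOINTS
    return True
```

For example, passes_upload_rules("call.mp3", 1_000_000, "/review") is False, which signals that the recording should go through /transcribe or /process-media first.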

Endpoint Reference

1 · Extract data

Turn a document into structured data. Don't know the fields yet? Start with Generate schema. Want a few flat fields? Use Extract. Nested data or repeating rows? Use Extract (schema). Need the whole doc as markdown/blocks? Use Parse.

Task | Endpoint | What it does · when to use | Cost (credits)
Generate schema | POST /generate-schema | Infers the field list (a JSON Schema) from sample docs. Use it first, when you don't know what fields exist, then feed the result into Extract. | 1 / sample
Extract | POST /extract | Pulls a flat list of fields you name (e.g. name, date, total) → values + confidence + bounding boxes. Also reads audio/video. | 1 / page
Extract (schema) | POST /extract-schema | Same idea, but for nested objects & repeating rows (line items, multiple policies) described by a JSON Schema. | 1 / page
Parse | POST /parse | Converts the whole document to clean markdown + typed layout blocks (headings/tables/…); grounded=true adds a box per block. | 1 / page
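To make the difference between flat fields and a schema concrete, here is a sketch of a JSON Schema with one nested object and one repeating row type. All property names are hypothetical examples, and this page does not state which request parameter carries the schema for POST /extract-schema, so the sketch only builds and serializes the schema.

```python
import json

# Hypothetical invoice schema: property names are illustrative only.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {
            "type": "object",  # nested object
            "properties": {
                "name": {"type": "string"},
                "address": {"type": "string"},
            },
        },
        "line_items": {
            "type": "array",  # repeating rows
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
}

schema_json = json.dumps(invoice_schema)
```

A flat field list such as Vendor and Total Amount fits POST /extract; a shape like line_items, where the number of rows varies per document, is the case the table assigns to POST /extract-schema.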

2 · Understand & route

Figure out what a document is, where it splits, or just ask it a question.

Task | Endpoint | What it does · when to use | Cost (credits)
Classify | POST /classify | Identifies the document type (invoice, COI, resume…) + confidence + alternatives. | 1 / req
Split | POST /split | Finds boundaries inside a multi-document PDF pack and labels each segment. | 1 / req
Ask | POST /document-ask | Free-form Q&A / summaries → answer + grounded references. Reuse session_id for follow-ups. | 1 / req

3 · Verify (deterministic)

Check declared rules against grounded values — the same document + same rules always give the identical verdict. Define review saves the rules once; Review runs them on an incoming doc; Verify is the same check framed as a yes/no gate for a doc you're about to send out; Compute derives new numbers from grounded fields.

Task | Endpoint | What it does · when to use | Cost (credits)
Define review | POST /reviews | Saves a named set of criteria once → review_id. A reusable template; checks nothing by itself. |
Review | POST /review | Runs the check: document + criteria (or a saved review_id) → pass/fail per criterion, each with the field, operator, reference, extracted value, and source box. | 1 / req
Verify | POST /verify | Same engine as Review, for an outbound/generated doc → adds a top-level verified true/false gate. | 1 / req
Compute | POST /compute | Derives new values from grounded fields (sum/avg/min/max…) with provenance, e.g. a total that isn't printed in the doc. | 1 / req

4 · Transform

Hand back a modified version or a derived view of the document.

Task | Endpoint | What it does · when to use | Cost (credits)
Redact | POST /redact | Finds sensitive info (PII) and, with apply=true, blacks it out. The opposite of Extract: remove vs read. | 1 / req
Translate | POST /translate-document | Translates the document into another language → translated PDF. | 1 / page
Geo | POST /geo | Extracts geographic entities (addresses, parcels, coordinates) with boxes, for mapping. | 1 / req

5 · Media (audio & video)

Speech-to-text and media analysis. For Ask/Review/Compute on media, transcribe first, then pass the transcript as text.

Task | Endpoint | What it does · when to use | Cost (credits)
Transcribe | POST /transcribe | Audio/video → speaker-labeled transcript PDF. | 3 / min
Process media | POST /process-media | Audio/video → transcript_segments + video_intelligence JSON (feed into Ask/Review/Compute). | 3 / min
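The three pricing units in the endpoint tables (per page, per request, per media minute) can be combined into a rough budget estimate. The sketch below is hypothetical and assumes partial media minutes are rounded up; the page does not say how fractions of a minute are billed.

```python
import math

CREDITS_PER_PAGE = 1
CREDITS_PER_REQUEST = 1
CREDITS_PER_MEDIA_MINUTE = 3


def estimate_credits(pages=0, requests=0, media_seconds=0.0):
    """Estimate credits for per-page, per-request, and per-minute work."""
    # Assumption: any started minute of audio/video is billed in full.
    media_minutes = math.ceil(media_seconds / 60)
    return (
        pages * CREDITS_PER_PAGE
        + requests * CREDITS_PER_REQUEST
        + media_minutes * CREDITS_PER_MEDIA_MINUTE
    )
```

For instance, extracting a 12-page PDF, classifying it once, and transcribing a 150-second recording would be estimated at 12 + 1 + 9 = 22 credits under the round-up assumption.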

Authentication: Pass your key as X-API-Key: rk_... or Authorization: Bearer rk_....
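Both header styles carry the same key, so a client only needs to pick one. A small sketch with a hypothetical helper name:

```python
def auth_headers(api_key, style="x-api-key"):
    """Return the HTTP header for one of the two documented auth styles."""
    if style == "x-api-key":
        return {"X-API-Key": api_key}
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    raise ValueError(f"unknown auth style: {style!r}")
```

The Bearer form can be convenient with HTTP libraries that already have built-in support for Authorization headers.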

Deterministic review (the differentiator): POST /review takes declared criteria (field, operator, value) and returns a white-box, reproducible verdict — the same document + same criteria yield identical verdicts every run. Operators: equals, not_equals, gt/gte/lt/lte, before/after/on_or_before/on_or_after, contains/not_contains, in/not_in, matches, starts_with/ends_with (+ not_), between, is_empty/is_not_empty, is_true/is_false. The token today resolves to as_of for reproducible date checks. Persist a named review with POST /reviews and re-invoke it with /review?review_id=….
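The idea of a declared, reproducible check can be illustrated locally. The sketch below is not the service's implementation: it evaluates a subset of the listed operators against already-extracted values, assumes criteria are dictionaries with the keys field, operator, and value, assumes between is inclusive, and assumes a missing field fails its criterion. The result keys are hypothetical.

```python
import operator

# Subset of the documented operators, mapped to plain Python comparisons.
OPERATORS = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "contains": lambda actual, ref: ref in actual,
    "not_contains": lambda actual, ref: ref not in actual,
    "in": lambda actual, ref: actual in ref,
    "not_in": lambda actual, ref: actual not in ref,
    "starts_with": lambda actual, ref: actual.startswith(ref),
    "ends_with": lambda actual, ref: actual.endswith(ref),
    "between": lambda actual, ref: ref[0] <= actual <= ref[1],  # inclusive by assumption
}


def run_review(extracted, criteria):
    """Evaluate each criterion; the same inputs always give the same output."""
    results = []
    for criterion in criteria:
        actual = extracted.get(criterion["field"])
        check = OPERATORS[criterion["operator"]]
        # Assumption: a field that was not extracted fails the criterion.
        passed = actual is not None and bool(check(actual, criterion["value"]))
        results.append({
            "field": criterion["field"],
            "operator": criterion["operator"],
            "reference": criterion["value"],
            "extracted": actual,
            "passed": passed,
        })
    return results
```

Because the verdict is a pure function of the extracted values and the criteria, running it twice on the same inputs returns identical results, which is the property the paragraph above describes.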

Computed fields: POST /compute derives values (sum, difference, product, quotient, average, min, max, count, concat) from grounded inputs and returns provenance (which inputs fed each value). A missing input yields null — never a fabricated number.
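The null-on-missing rule and the provenance record can be sketched locally. This is an illustration of the described behavior, not the service's code: the function name and result keys are hypothetical, only a subset of operations is covered, and difference is assumed to mean the first input minus the rest.

```python
import math

OPERATIONS = {
    "sum": sum,
    "difference": lambda values: values[0] - sum(values[1:]),  # assumed semantics
    "product": math.prod,
    "average": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
}


def compute(operation, fields, inputs):
    """Derive a value from named inputs, recording which inputs were used."""
    values = [fields.get(name) for name in inputs]
    if any(value is None for value in values):
        result = None  # a missing input yields null, never a guessed number
    else:
        result = OPERATIONS[operation](values)
    return {"value": result, "provenance": list(inputs)}
```

For example, summing line amounts 100, 250, and 75 gives 425 with all three input names in the provenance, while the same call with one amount missing returns a null value.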

Audio/Video with Ask: Call POST /process-media first to get transcript_segments, join the text, then send it as transcript_text to POST /document-ask. For Review / Compute on audio/video, pass that same joined transcript as text / transcript_text to POST /review or POST /compute (no new media path).
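The join step might look like the sketch below. The shape of a transcript segment is not specified on this page, so the code assumes each segment is a dictionary with a "text" key; the "question" parameter name is also hypothetical. Only transcript_text comes from the description above.

```python
def join_transcript(segments):
    """Join segment texts into one string, skipping empty segments."""
    texts = (segment.get("text", "").strip() for segment in segments)
    return " ".join(text for text in texts if text)


def build_ask_payload(question, segments):
    """Build form fields for POST /document-ask from media segments."""
    return {
        "question": question,  # hypothetical parameter name
        "transcript_text": join_transcript(segments),
    }
```

The same joined string can be passed as text / transcript_text to POST /review or POST /compute.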

Sessions: The session_id in every /document-ask response can be reused for follow-up questions on the same document — no re-upload and 1 credit per question.

Supported audio: MP3, WAV, FLAC, OGG, M4A, AAC, WMA, OPUS  ·  Video: MP4, MOV, AVI, MKV, WEBM, WMV, FLV, M4V

Error codes: 402 Insufficient credits  ·  400 Bad request  ·  500 Processing error

Rate Limits

Requests per minute: 100
Requests per day: 1M
Max file size: 50 MB

Need higher limits? Contact us for custom plans.
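A client can stay under the per-minute limit with a small local throttle. The sketch below assumes a sliding 60-second window; the page does not say whether the service counts requests in sliding or fixed windows, so treat it as a conservative client-side guard with hypothetical names.

```python
import time
from collections import deque


class MinuteThrottle:
    """Track recent request times and report how long to wait before the next."""

    def __init__(self, limit=100, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent = deque()

    def wait_time(self):
        """Seconds to wait before another request fits in the window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self):
        """Call once for every request actually sent."""
        self._sent.append(self.clock())
```

Typical use is to call time.sleep(throttle.wait_time()) before each request and throttle.record() right after sending it. The clock parameter exists so the behavior can be tested without real waiting.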