
Multimodal Content

This page explains how external applications attach text, image, video, and PDF resources to GUMem. Upload the file first to get a resource URL of the form gumem://resources/<hex>, then reference that URL through resource_urls in conversation Message input or behavior ActionLog input.

A resource URL stores only a reference to the file. If the file's content should become Memory, write the OCR text, caption, transcript, summary, or human description into content.

Session Multimodal Data

Write Flow

Session multimodal data is written in two steps:

  1. Call POST /api/resources to upload the file and get gumem://resources/<hex>.
  2. Call SDK add_messages / addMessages, or HTTP POST /api/sessions/{session_id}/messages, and pass resource_urls in the Message input.

POST /api/resources uses multipart/form-data:

| Field | Required | Description |
| --- | --- | --- |
| Authorization | Yes | Header in the form Api-Key <api_key>. |
| user_id | Yes | User ID from your application. |
| session_id | No | Related Session ID. Empty or missing values use default. |
| content_type | Yes | Content type of the uploaded file. |
| file | Yes | Binary file uploaded through the multipart file field. |
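The fields above can be assembled into a multipart/form-data request with the Python standard library alone. The following is a minimal sketch, not an official client: the helper name build_resource_upload is hypothetical, and the code only builds the headers and body, because the shape of the upload response is not described on this page.

```python
import uuid


def build_resource_upload(api_key, user_id, content_type, filename, data, session_id=None):
    """Return (headers, body) for a multipart/form-data POST /api/resources request.

    Hypothetical helper: it mirrors the field table above and nothing more.
    """
    boundary = uuid.uuid4().hex
    fields = {"user_id": user_id, "content_type": content_type}
    if session_id:
        # session_id is optional; it is simply left out when not provided.
        fields["session_id"] = session_id

    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The binary payload goes into the multipart field named "file".
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        + data
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    headers = {
        "Authorization": f"Api-Key {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return headers, b"".join(parts)
```

The returned pair can be passed to any HTTP client, for example urllib.request.Request(url, data=body, headers=headers, method="POST").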

In Message input, resource_urls is an array and should contain only internal resource URLs:

```json
{
  "role": "user",
  "content": "User uploaded a receipt. OCR text: dinner at Bistro A, total 86.40 SGD.",
  "resource_urls": ["gumem://resources/<hex>"]
}
```
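Because resource_urls should contain only internal resource URLs, it can help to check the list before sending a Message. The sketch below assumes that <hex> is a non-empty run of hexadecimal digits; the exact format is not specified on this page, and build_message is a hypothetical helper rather than an SDK function.

```python
import re

# Assumption: <hex> is one or more hexadecimal digits.
RESOURCE_URL = re.compile(r"gumem://resources/[0-9a-fA-F]+")


def build_message(role, content, resource_urls=()):
    """Build a Message input dict, rejecting anything that is not a gumem:// resource URL."""
    bad = [url for url in resource_urls if not RESOURCE_URL.fullmatch(url)]
    if bad:
        raise ValueError(f"not internal resource URLs: {bad}")
    message = {"role": role, "content": content}
    if resource_urls:
        message["resource_urls"] = list(resource_urls)
    return message
```

An external link such as https://example.com/receipt.png raises ValueError here, which catches the mistake before the request is made.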

Supported File Types

Text, image, video, and PDF resources are all uploaded through the multipart file field. Set content_type during upload to identify the resource type.

| Type | content_type |
| --- | --- |
| Text | text/plain |
| Image | image/png, image/jpeg, image/webp |
| Video | video/mp4 |
| PDF | application/pdf |
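One way to pick content_type is to map the file extension to one of the values in the table. This is a sketch under an assumption: the extension mapping below is illustrative and not defined by GUMem; only the content_type values come from the table.

```python
from pathlib import Path

# Assumed extension mapping; the content_type values are the ones listed above.
CONTENT_TYPES = {
    ".txt": "text/plain",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".mp4": "video/mp4",
    ".pdf": "application/pdf",
}


def content_type_for(path):
    """Return the content_type for a file path, or raise ValueError if it is not listed."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"unsupported file type: {path}")
    return CONTENT_TYPES[suffix]
```

For example, content_type_for("./receipt.png") returns "image/png", which matches the value used in the image upload example.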

Image Integration Example

Upload the image resource first:

```bash
curl -X POST "http://localhost:8000/api/resources" \
  -H "Authorization: Api-Key <api_key>" \
  -F "user_id=user_123" \
  -F "session_id=session_123" \
  -F "content_type=image/png" \
  -F "file=@./receipt.png"
```

Take the resource URL from the response, such as gumem://resources/<hex>, then write the conversation Message:

```bash
curl -X POST "http://localhost:8000/api/sessions/session_123/messages" \
  -H "Authorization: Api-Key <api_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "User uploaded a receipt image. OCR text: dinner at Bistro A, total 86.40 SGD, paid on 2026-04-24.",
        "resource_urls": ["gumem://resources/<hex>"]
      }
    ]
  }'
```
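The same message write can be prepared from Python without the SDK. The sketch below builds, but does not send, a request equivalent to the curl call above; build_messages_request is a hypothetical helper, and the base URL is whatever host your deployment uses.

```python
import json
import urllib.request
from urllib.parse import quote


def build_messages_request(base_url, api_key, session_id, messages):
    """Build a POST /api/sessions/{session_id}/messages request (not sent here)."""
    session_path = quote(session_id, safe="")  # keep odd characters out of the path
    url = f"{base_url.rstrip('/')}/api/sessions/{session_path}/messages"
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen sends it. Replace the <hex> placeholder in resource_urls with the value returned by the upload step first.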

Behavior Multimodal Data

Behavior multimodal data uses the same resource flow as session multimodal data: call POST /api/resources to upload the resource first, then write the behavior record.

The difference is the write method. Use User Actions / ActionLog methods, such as gumem.userActions.create(...) in the Node SDK or gumem.user_actions.create(...) in the Python SDK. Pass resource_urls in ActionLog input to associate the files produced or referenced when the behavior happened.

```ts
await gumem.userActions.create({
  user_id: "user_123",
  timestamp: new Date(),
  content: "User uploaded a signed contract PDF during onboarding.",
  session_id: "session_123",
  event_type: "document_upload",
  page: "onboarding",
  resource_urls: ["gumem://resources/<hex>"]
});
```

```python
from datetime import datetime, timezone

gumem.user_actions.create({
    "user_id": "user_123",
    "timestamp": datetime(2026, 4, 24, 12, 30, tzinfo=timezone.utc),
    "content": "User uploaded a signed contract PDF during onboarding.",
    "session_id": "session_123",
    "event_type": "document_upload",
    "page": "onboarding",
    "resource_urls": ["gumem://resources/<hex>"],
})
```

resource_urls only records the link between the ActionLog and its resources. If GUMem should remember what is inside the file, write a summary, transcript, recognition result, or human description into content.

File Size Limits

Callers should check file size before upload against the limit for the resource type:

| Type | Recommended limit |
| --- | --- |
| Text | 1 MB |
| Image | 10 MB |
| PDF | 25 MB |
| Video | 100 MB |

Uploads fail when files exceed the limit. Compress the file before upload, or split long text, long PDFs, and long videos into smaller resources before uploading.
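A pre-upload check and a simple text splitter could look like the sketch below. It assumes the limits are binary megabytes (1 MB = 1024 * 1024 bytes), which this page does not state, and both helper names are hypothetical. The splitter cuts on UTF-8 character boundaries only, so a chunk may end mid-sentence.

```python
MB = 1024 * 1024  # assumption: limits are measured in binary megabytes

# Recommended limits from the table above, in bytes.
SIZE_LIMITS = {"text": 1 * MB, "image": 10 * MB, "pdf": 25 * MB, "video": 100 * MB}


def fits_limit(kind, size_bytes):
    """Return True if a resource of this type is within its recommended limit."""
    return size_bytes <= SIZE_LIMITS[kind]


def split_text(text, max_bytes=SIZE_LIMITS["text"]):
    """Split text into chunks whose UTF-8 encoding is at most max_bytes each."""
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4 to hold any UTF-8 character")
    data = text.encode("utf-8")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # Back off while the cut would land inside a multi-byte character.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks
```

Each chunk can then be uploaded as its own text/plain resource. PDFs and videos need format-aware tools to split and are not covered by this sketch.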