Multimodal Content

This page explains how external applications attach text, image, video, and PDF resources to GUMem. Upload the file first to obtain a resource URL of the form gumem://resources/<hex>, then reference that URL through resource_urls in a conversation Message input or a behavior ActionLog input.

A resource URL is only a reference to the file. If the file's content should become Memory, write the OCR text, captions, transcripts, summaries, or human descriptions into content.

Session Multimodal Data

Write Flow

Session multimodal data is written in two steps:

1. Call POST /api/resources to upload the file and get gumem://resources/<hex>.
2. Call SDK add_messages / addMessages, or HTTP POST /api/sessions/{session_id}/messages, and pass resource_urls in the Message input.

POST /api/resources uses multipart/form-data:

Field	Required	Description
`Authorization`	Yes	HTTP request header, not a form field. Format: `Api-Key <api_key>`.
`user_id`	Yes	User ID from your application.
`session_id`	No	Related Session ID. Empty or missing values use `default`.
`content_type`	Yes	Content type of the uploaded file.
`file`	Yes	Binary file uploaded through the multipart `file` field.
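
The fields above can be assembled into a multipart request with the Python standard library alone. The following is a minimal sketch, not an official client: the helper name build_upload_request and its parameters are hypothetical, and the request is only built, not sent, because the response shape is not described on this page.

```python
import uuid
import urllib.request


def build_upload_request(base_url, api_key, user_id, content_type,
                         filename, file_bytes, session_id=None):
    """Build (but do not send) a multipart POST /api/resources request."""
    boundary = uuid.uuid4().hex
    fields = {"user_id": user_id, "content_type": content_type}
    if session_id:
        # Optional field: when omitted, the server uses "default".
        fields["session_id"] = session_id

    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The binary payload goes in the multipart field named "file".
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n".encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        f"{base_url}/api/resources",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send the upload. Any HTTP client that supports multipart/form-data works equally well.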

In Message input, resource_urls is an array and should contain only internal resource URLs:

json

{
  "role": "user",
  "content": "User uploaded a receipt. OCR text: dinner at Bistro A, total 86.40 SGD.",
  "resource_urls": ["gumem://resources/<hex>"]
}
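
Because resource_urls should carry only internal resource URLs, a client can filter the list before writing a Message. This is a sketch under one assumption: that <hex> is a non-empty run of hexadecimal digits. The exact length and casing are not specified on this page, and the helper name split_resource_urls is hypothetical.

```python
import re

# Assumption: <hex> is one or more hexadecimal digits (length not documented).
RESOURCE_URL = re.compile(r"gumem://resources/[0-9a-fA-F]+")


def split_resource_urls(urls):
    """Return (internal, rejected) lists, preserving input order."""
    internal = [u for u in urls if RESOURCE_URL.fullmatch(u)]
    rejected = [u for u in urls if not RESOURCE_URL.fullmatch(u)]
    return internal, rejected
```

Rejected entries, such as external https:// links, can be uploaded through POST /api/resources first or described in content instead.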

Supported File Types

Text, image, video, and PDF resources are all uploaded through the multipart file field. Set content_type during upload to identify the resource type.

Type	`content_type`
Text	`text/plain`
Image	`image/png`, `image/jpeg`, `image/webp`
Video	`video/mp4`
PDF	`application/pdf`
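
One way to pick content_type on the client is a lookup from file extension to the values in the table above. The extension mapping is an assumption for illustration (the page lists content types, not extensions), and content_type_for is a hypothetical helper.

```python
from pathlib import Path

# Content types come from the table above; the extensions are assumed.
CONTENT_TYPES = {
    ".txt": "text/plain",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".mp4": "video/mp4",
    ".pdf": "application/pdf",
}


def content_type_for(path):
    """Return the content_type for a file path, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"unsupported file type: {path!r}")
    return CONTENT_TYPES[suffix]
```

Failing early on an unlisted extension avoids uploading a file with a content_type outside the supported set.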

Image Integration Example

Upload the image resource first:

bash

curl -X POST "http://localhost:8000/api/resources" \
  -H "Authorization: Api-Key <api_key>" \
  -F "user_id=user_123" \
  -F "session_id=session_123" \
  -F "content_type=image/png" \
  -F "file=@./receipt.png"

Take the resource URL from the response, such as gumem://resources/<hex>, then write the conversation Message:

bash

curl -X POST "http://localhost:8000/api/sessions/session_123/messages" \
  -H "Authorization: Api-Key <api_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "User uploaded a receipt image. OCR text: dinner at Bistro A, total 86.40 SGD, paid on 2026-04-24.",
        "resource_urls": ["gumem://resources/<hex>"]
      }
    ]
  }'
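
The same Message write can be prepared from Python without an SDK. The sketch below only mirrors the curl call above using the standard library. The helper name build_messages_request is hypothetical, and the request is built but not sent because the response format is not described here.

```python
import json
import urllib.parse
import urllib.request


def build_messages_request(base_url, api_key, session_id, messages):
    """Build (but do not send) POST /api/sessions/{session_id}/messages."""
    # Percent-encode the session ID so it is safe as a path segment.
    path_id = urllib.parse.quote(session_id, safe="")
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/sessions/{path_id}/messages",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )


message = {
    "role": "user",
    "content": "User uploaded a receipt image. OCR text: dinner at Bistro A, "
               "total 86.40 SGD, paid on 2026-04-24.",
    # Placeholder: substitute the URL returned by POST /api/resources.
    "resource_urls": ["gumem://resources/<hex>"],
}
request = build_messages_request(
    "http://localhost:8000", "<api_key>", "session_123", [message]
)
```

Sending it is a matter of passing request to urllib.request.urlopen once the placeholders are replaced with real values.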

Behavior Multimodal Data

Behavior multimodal data uses the same resource flow as session multimodal data: call POST /api/resources to upload the resource first, then write the behavior record.

The difference is the write method. Use User Actions / ActionLog methods, such as gumem.userActions.create(...) in the Node SDK or gumem.user_actions.create(...) in the Python SDK. Pass resource_urls in ActionLog input to associate the files produced or referenced when the behavior happened.

javascript

await gumem.userActions.create({
  user_id: "user_123",
  timestamp: new Date(),
  content: "User uploaded a signed contract PDF during onboarding.",
  session_id: "session_123",
  event_type: "document_upload",
  page: "onboarding",
  resource_urls: ["gumem://resources/<hex>"]
});

python

from datetime import datetime, timezone

gumem.user_actions.create({
    "user_id": "user_123",
    "timestamp": datetime(2026, 4, 24, 12, 30, tzinfo=timezone.utc),
    "content": "User uploaded a signed contract PDF during onboarding.",
    "session_id": "session_123",
    "event_type": "document_upload",
    "page": "onboarding",
    "resource_urls": ["gumem://resources/<hex>"],
})

resource_urls only records the association between the ActionLog and its resources. If GUMem should remember what is inside the file, write a summary, transcript, recognition result, or human description into content.

File Size Limits

Callers should check file size before upload against the limit for the resource type:

Type	Recommended limit
Text	1 MB
Image	10 MB
PDF	25 MB
Video	100 MB

Uploads fail when files exceed the limit. Compress the file before upload, or split long text, long PDFs, and long videos into smaller resources before uploading.
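
A pre-upload check against the limits above, plus a simple splitter for long text, might look like the sketch below. It assumes 1 MB means 1024 * 1024 bytes, which this page does not specify, and the helper names fits_limit and split_text are hypothetical.

```python
MB = 1024 * 1024  # Assumption: 1 MB = 1024 * 1024 bytes.

# Limits from the table above, keyed by resource type.
LIMITS = {"text": 1 * MB, "image": 10 * MB, "pdf": 25 * MB, "video": 100 * MB}


def resource_kind(content_type):
    """Map a content_type such as image/png to its resource type."""
    if content_type == "application/pdf":
        return "pdf"
    return content_type.split("/", 1)[0]


def fits_limit(content_type, size_bytes):
    """Return True when size_bytes is within the limit for this type."""
    return size_bytes <= LIMITS[resource_kind(content_type)]


def split_text(text, limit_bytes=LIMITS["text"]):
    """Split text into chunks whose UTF-8 size is at most limit_bytes.

    Characters are never cut in half; limit_bytes is assumed to be at
    least 4, the largest UTF-8 encoding of a single character.
    """
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if current and size + n > limit_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Each chunk returned by split_text can then be uploaded as its own text/plain resource. Splitting PDFs and videos requires format-aware tools and is outside this sketch.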
