Multimodal Content
This page explains how external applications attach text, image, video, and PDF resources to GUMem. Upload the file first to get a resource URL of the form gumem://resources/<hex>, then reference that resource through resource_urls in conversation Message input or behavior ActionLog input.
Resource URLs keep file references only. If the resource content should become Memory, write OCR text, captions, transcripts, summaries, or human descriptions into content.
Session Multimodal Data
Write Flow
Session multimodal data is written in two steps:
- Call POST /api/resources to upload the file and get gumem://resources/<hex>.
- Call SDK add_messages/addMessages, or HTTP POST /api/sessions/{session_id}/messages, and pass resource_urls in the Message input.
POST /api/resources uses multipart/form-data:
| Field | Required | Description |
|---|---|---|
| Authorization | Yes | Header in the form Api-Key <api_key>. |
| user_id | Yes | User ID from your application. |
| session_id | No | Related Session ID. Empty or missing values use default. |
| content_type | Yes | Content type of the uploaded file. |
| file | Yes | Binary file uploaded through the multipart file field. |
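The fields in the table above can be assembled with the Python standard library alone. The sketch below is a minimal illustration, not an official client: build_upload_request is a hypothetical helper name, and the base URL is whatever your deployment uses (the curl examples on this page use http://localhost:8000). Because this page does not describe the response body, the sketch only builds the request and does not send it or parse a response.

```python
import uuid
import urllib.request
from typing import Optional


def build_upload_request(
    base_url: str,
    api_key: str,
    user_id: str,
    file_name: str,
    file_bytes: bytes,
    content_type: str,
    session_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /api/resources.

    Hypothetical helper for illustration; field names follow the table above.
    """
    boundary = uuid.uuid4().hex
    fields = {"user_id": user_id, "content_type": content_type}
    if session_id:  # session_id is optional, so omit it when not provided
        fields["session_id"] = session_id

    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The binary payload goes in the multipart field named "file".
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/resources",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send the upload. Keeping the build step separate from the send step makes the request easy to inspect in tests before any network call is made.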
In Message input, resource_urls is an array and should contain only internal resource URLs:
{
"role": "user",
"content": "User uploaded a receipt. OCR text: dinner at Bistro A, total 86.40 SGD.",
"resource_urls": ["gumem://resources/<hex>"]
}
Supported File Types
text, image, video, and pdf resources are all uploaded through the multipart file field. Set content_type during upload to identify the resource type.
| Type | content_type |
|---|---|
| Text | text/plain |
| Image | image/png, image/jpeg, image/webp |
| Video | video/mp4 |
| PDF | application/pdf |
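One way to pick the content_type before upload is to derive it from the file extension. The sketch below is illustrative only: the content types come from the table above, while the extension mapping and the helper names content_type_for and resource_type_of are assumptions made for this example and are not defined by GUMem.

```python
from pathlib import Path

# Content types listed in the table above, grouped by resource type.
SUPPORTED_CONTENT_TYPES = {
    "text": {"text/plain"},
    "image": {"image/png", "image/jpeg", "image/webp"},
    "video": {"video/mp4"},
    "pdf": {"application/pdf"},
}

# Assumed extension mapping for illustration; adjust it to your own files.
EXTENSION_TO_CONTENT_TYPE = {
    ".txt": "text/plain",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".mp4": "video/mp4",
    ".pdf": "application/pdf",
}


def content_type_for(file_name: str) -> str:
    """Return the content_type to send for a file, based on its extension."""
    suffix = Path(file_name).suffix.lower()
    if suffix not in EXTENSION_TO_CONTENT_TYPE:
        raise ValueError(f"No listed content_type for extension {suffix!r}")
    return EXTENSION_TO_CONTENT_TYPE[suffix]


def resource_type_of(content_type: str) -> str:
    """Return the resource type (text, image, video, pdf) for a content_type."""
    for resource_type, content_types in SUPPORTED_CONTENT_TYPES.items():
        if content_type in content_types:
            return resource_type
    raise ValueError(f"content_type {content_type!r} is not in the listed types")
```

Rejecting unknown extensions on the client side is a design choice for this sketch. This page does not say how the server treats content types outside the table.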
Images Integration Example
Upload the image resource first:
curl -X POST "http://localhost:8000/api/resources" \
-H "Authorization: Api-Key <api_key>" \
-F "user_id=user_123" \
-F "session_id=session_123" \
-F "content_type=image/png" \
-F "file=@./receipt.png"
Take the resource URL from the response, such as gumem://resources/<hex>, then write the conversation Message:
curl -X POST "http://localhost:8000/api/sessions/session_123/messages" \
-H "Authorization: Api-Key <api_key>" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "user",
"content": "User uploaded a receipt image. OCR text: dinner at Bistro A, total 86.40 SGD, paid on 2026-04-24.",
"resource_urls": ["gumem://resources/<hex>"]
}
]
}'
Behavior Multimodal Data
Behavior multimodal data uses the same resource flow as session multimodal data: call POST /api/resources to upload the resource first, then write the behavior record.
The difference is the write method. Use User Actions / ActionLog methods, such as gumem.userActions.create(...) in the Node SDK or gumem.user_actions.create(...) in the Python SDK. Pass resource_urls in ActionLog input to associate the files produced or referenced when the behavior happened.
await gumem.userActions.create({
user_id: "user_123",
timestamp: new Date(),
content: "User uploaded a signed contract PDF during onboarding.",
session_id: "session_123",
event_type: "document_upload",
page: "onboarding",
resource_urls: ["gumem://resources/<hex>"]
});
from datetime import datetime, timezone
gumem.user_actions.create({
"user_id": "user_123",
"timestamp": datetime(2026, 4, 24, 12, 30, tzinfo=timezone.utc),
"content": "User uploaded a signed contract PDF during onboarding.",
"session_id": "session_123",
"event_type": "document_upload",
"page": "onboarding",
"resource_urls": ["gumem://resources/<hex>"],
})
resource_urls only represents the reference between the ActionLog and the resources. If GUMem should remember what is inside the file, write a summary, transcript, recognition result, or human description into content.
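A caller can enforce both points before writing a record: resource_urls holds only internal gumem:// URLs, and content carries the text that should be remembered. The following sketch shows one possible guard. The helper names validate_resource_urls and with_resources are hypothetical, and the pattern assumes <hex> is one or more hexadecimal digits, since the exact format is not documented on this page.

```python
import re
from typing import Dict, List

# Assumption: <hex> is one or more hexadecimal digits (length not documented here).
RESOURCE_URL_PATTERN = re.compile(r"gumem://resources/[0-9a-fA-F]+")


def validate_resource_urls(resource_urls: List[str]) -> List[str]:
    """Return the URLs unchanged, or raise if any is not an internal resource URL."""
    invalid = [url for url in resource_urls if not RESOURCE_URL_PATTERN.fullmatch(url)]
    if invalid:
        raise ValueError(f"Not internal GUMem resource URLs: {invalid}")
    return list(resource_urls)


def with_resources(record: Dict, resource_urls: List[str]) -> Dict:
    """Return a copy of a Message or ActionLog input with validated resource_urls.

    Rejects records whose content is empty, because the resource URL alone
    is only a reference and does not describe what the file contains.
    """
    if not str(record.get("content", "")).strip():
        raise ValueError("content is empty; describe the file (OCR text, summary, ...)")
    return {**record, "resource_urls": validate_resource_urls(resource_urls)}


# Example ActionLog input using the field names from the SDK examples above.
action = with_resources(
    {
        "user_id": "user_123",
        "content": "User uploaded a signed contract PDF during onboarding.",
        "event_type": "document_upload",
    },
    ["gumem://resources/0a1b2c"],
)
```

The resulting dictionary can then be passed to the write method you already use, for example gumem.user_actions.create(...) together with the remaining fields such as timestamp.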
File Size Limits
Callers should control file size before upload based on the resource type:
| Type | Recommended limit |
|---|---|
| Text | 1 MB |
| Image | 10 MB |
| PDF | 25 MB |
| Video | 100 MB |
Uploads fail when files exceed the limit. Compress the file before upload, or split long text, long PDFs, and long videos into smaller resources before uploading.
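A pre-upload size check along these lines can catch oversized files before a request is made. This is a sketch under stated assumptions: check_upload_size is a hypothetical helper, the limits are copied from the table above with 25 MB applied to PDF, and MB is taken to mean 1024 * 1024 bytes, which this page does not specify.

```python
# Assumption: 1 MB = 1024 * 1024 bytes; the page does not define the unit precisely.
MB = 1024 * 1024

# Limits from the table above, keyed by resource type.
RECOMMENDED_LIMITS = {
    "text": 1 * MB,
    "image": 10 * MB,
    "pdf": 25 * MB,
    "video": 100 * MB,
}


def check_upload_size(resource_type: str, size_bytes: int) -> None:
    """Raise ValueError if a file of this type exceeds its recommended limit."""
    if resource_type not in RECOMMENDED_LIMITS:
        raise ValueError(f"Unknown resource type: {resource_type!r}")
    limit = RECOMMENDED_LIMITS[resource_type]
    if size_bytes > limit:
        raise ValueError(
            f"{resource_type} file is {size_bytes} bytes; "
            f"limit is {limit} bytes. Compress or split it before upload."
        )
```

For a file on disk, os.path.getsize(path) gives the size_bytes value to pass in.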