09 — Connectors & Platform Integration

How each platform connects to Arvis, what media types are supported, and how to add new platforms.

Supported Platforms

Platform	Text	Images (receive)	Voice/Audio	Files (receive)	Files/Images (send)	Read Receipts
Discord	✓	✓ (CDN download)	Planned	✓ (CDN download)	✓ (attachments)	-
Telegram	✓	✓ (base64 embedded)	✓ (Whisper)	✓ (Bot API download)	✓ (photo/audio/video/doc)	-
Slack	✓	Partial	-	-	✓ (filesUploadV2)	-
WhatsApp	✓	✓ (Graph API)	✓ (Whisper)	Caption only	✓ (media upload)	✓ (blue checks)
Matrix	✓	-	-	-	✓ (mxc:// upload)	-
Web/Dashboard	✓	✓ (base64 upload)	-	✓ (base64 upload)	✓ (base64 WS)	✓ (ACK)
SMS	✓	-	-	-	-	-
Email	✓	-	-	-	-	-
Webhooks	✓	-	-	-	-	-

How Connectors Work

Each connector implements the same simple interface:

Listen for incoming messages from the platform
Normalize them into IncomingMessage format
Emit them on the MessageBus
Listen for send events on the bus
Forward responses back to the platform

Platform SDK → Connector → bus.emit('message') → Arvis core
Arvis core → bus.emit('send') → Connector → Platform SDK

Discord Connector

Token env vars: DISCORD_TOKEN, DISCORD_TOKEN_1, ..., DISCORD_TOKEN_50

What it handles:

Text messages (messageCreate event)
Image attachments (stored as CDN URLs, downloaded in arvis.ts before LLM call)
Mentions (@BotName message)
DMs (routed to conductor)
Typing indicator (channel.sendTyping())

Channel IDs: 18-digit snowflakes. Right-click → Copy Channel ID (enable Developer Mode).

Bot permissions required:

Read Messages / View Channels
Send Messages
Read Message History
Message Content Intent (enable in Bot settings on developer portal)

Rate limiting: Discord's default rate limit is ~50 messages/second. Arvis serializes sends so you won't hit it with normal usage.

Telegram Connector

Token env vars: TELEGRAM_BOT_TOKEN, TELEGRAM_BOT_TOKEN_1, ...

What it handles:

Text messages
Photos → downloaded via Bot API → base64 embedded in attachment.data
Voice messages → downloaded OGG → transcribed via OpenAI Whisper → content = [Voice]: "transcription"
Commands (/start, /help, etc.)
Inline keyboard button clicks

Voice transcription: Requires OPENAI_API_KEY to be set. If not set, voice messages are silently dropped.

message:voice event
  → bot.getFile(file_id) → download OGG
  → OpenAI Whisper API (model: whisper-1)
  → content = '[Voice]: "What is the price of BTC?"'
  → Emitted as normal text message

Photo handling:

message:photo event
  → bot.getFile(largest_size) → download JPEG
  → base64 encode
  → attachment.data = base64 string
  → attachment.mimeType = 'image/jpeg'
  → Passed to LLM as vision input

Slack Connector

Token env vars: SLACK_BOT_TOKEN, SLACK_APP_TOKEN, SLACK_SIGNING_SECRET

Uses Socket Mode (no public URL needed).

What it handles:

Channel messages
DMs
Mentions (@BotName)
Typing indicator (users.setPresence + conversations.typing)

Setup:

Create Slack app at api.slack.com/apps
Enable Socket Mode
Add scopes: chat:write, channels:history, im:history, app_mentions:read
Get Bot Token (xoxb-...) and App Token (xapp-...)

WhatsApp Connector

Token env vars: WHATSAPP_ACCESS_TOKEN, WHATSAPP_PHONE_NUMBER_ID, WHATSAPP_VERIFY_TOKEN

Uses Meta's WhatsApp Business Cloud API. Requires:

A Meta Business account
A verified phone number in WhatsApp Business
A webhook URL accessible from the internet

What it handles:

Text messages
Images → downloaded via Graph API → base64 embedded
Audio → downloaded via Graph API → Whisper transcription
Reactions (parsed as [Reaction: emoji])
Read receipts — Arvis sends read receipts (blue checkmarks) when processing incoming messages

Webhook setup:

Set WHATSAPP_WEBHOOK_PATH=/whatsapp in .env
Point Meta's webhook to: https://your-domain.com/whatsapp
Verification token: same as WHATSAPP_VERIFY_TOKEN in .env

Matrix Connector

Token env vars: MATRIX_HOMESERVER_URL, MATRIX_ACCESS_TOKEN, MATRIX_USER_ID

What it handles:

Text messages in rooms the bot has joined
DMs

Setup:

Create a Matrix account for your bot on your homeserver
Get the access token from Element → Settings → Help & About → Access Token
Set homeserver URL, access token, and user ID in .env

Web/Dashboard Connector

Built into the dashboard. Always available on port 5070 (configurable via WEB_CONNECTOR_PORT).

What it handles:

Browser-based real-time chat via WebSocket
Dashboard → Chat page
Dashboard → Agents → [Agent] → Chat tab
File uploads — drag-and-drop or paperclip button, sent as base64 via WebSocket
Message ACK — server sends { type: 'ack', messageId } after receiving each message
Images uploaded are saved to data/uploads/{conversationId}/ and passed to the LLM as vision input

Protocol:

Client connects to ws://localhost:5070
Client sends: { type: 'auth', apiKey: 'WEB_API_KEY' }
Server responds: { type: 'auth_ok' }
Client sends: { type: 'message', channelId: 'dashboard-agent-3', content: 'Hello' }
Server sends: { type: 'typing', channelId: '...' }
Server sends: { type: 'message', content: 'Hello! How can I help?' }

SMS Connector

Token env vars: TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER

HTTP server on port 5080 receives Twilio webhooks.

Signature verification: Validates Twilio HMAC-SHA1 signature on every request.

Long message handling: SMS has a 160-character limit. Responses >160 chars are chunked automatically.

Setup:

Get a Twilio phone number
Set webhook URL in Twilio console: https://your-domain.com/sms
Set your .env credentials

Email Connector

Token env vars: EMAIL_IMAP_HOST, EMAIL_IMAP_PORT, EMAIL_IMAP_USER, EMAIL_IMAP_PASS, EMAIL_SMTP_HOST, etc.

Polls IMAP inbox every 30 seconds for new messages. Sends responses via SMTP.

What it handles:

Plain text emails
HTML emails (stripped to text)
Reply-to threading (basic)

Channel ID format: Email address of the sender (e.g., john@example.com)

Image Understanding (Vision)

When images are attached to messages, they're passed to the LLM as vision input:

ConnectorReceives image
  ↓
Downloads image (if URL) or reads base64 (if already embedded)
  ↓
Stored in attachment.data (base64) + attachment.mimeType
  ↓
arvis.ts handleMessage() collects all images[]
  ↓
Passed in JobPayload.images
  ↓
Provider runner builds multi-modal content:
  Anthropic: content: [{ type: 'image', source: { type: 'base64', ... }}, { type: 'text', text: prompt }]
  OpenAI:    content: [{ type: 'image_url', ... }, { type: 'text', text: prompt }]
  Google:    parts: [{ inlineData: { mimeType, data } }, { text: prompt }]

Which models support vision:

Anthropic: claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5
OpenAI: gpt-4.1, gpt-4.1-mini, gpt-4o
Google: gemini-2.5-pro, gemini-2.5-flash, gemini-2.0-flash-lite

Multi-Bot Setup

You can run multiple bots for the same platform, each linked to a different agent:

DISCORD_TOKEN=token-for-conductor-bot
DISCORD_TOKEN_1=token-for-sol-price-bot
DISCORD_TOKEN_2=token-for-support-bot

In Dashboard → Channels → Bot Instances:

DISCORD_TOKEN → linked to Conductor
DISCORD_TOKEN_1 → linked to SOL Price Monitor
DISCORD_TOKEN_2 → linked to Support Agent

Messages to each bot are routed directly to the assigned agent (Router Step 0).

Platform Limitations By Feature

Feature	Discord	Telegram	Slack	WhatsApp	Matrix
Message length	2000 chars	4096 chars	40k chars	4096 chars	8000 chars
Markdown	Partial (bold, code)	Full HTML	Mrkdwn	None	HTML (org.matrix.custom.html)
Typing indicator	✓	✓	✓	✓	-
Message edit	Planned	Planned	-	-	-
Thread reply	Planned	-	Planned	-	-
Button/keyboard	-	✓ (inline)	✓ (blocks)	✓	-
Voice receive	Planned	✓ (Whisper)	-	✓ (Whisper)	-
Image/file send	✓	✓ (photo/audio/video/doc)	✓ (filesUploadV2)	✓ (media upload)	✓ (mxc:// upload)