88% of organizations use AI in at least one business function. Only 7% have mature programs. For Integration Leads, that gap doesn't point to a strategy deficit. It points to missing contracts, unclear ownership, and architectures that were never designed to carry AI workloads in production.
The numbers get sharper when you look at the labor market. In 2023, Data Engineer was the most common AI-related role in the U.S., accounting for 23.8% of all AI job postings (1,898 out of roughly 7,991). The signal: AI value concentrates where data moves between systems, not where models get trained. That's integration territory.
Assumptions Before Architecture
Every pattern below assumes a specific landscape. If yours differs, the recommendations change.
- System landscape: At least one SAP system (ECC or S/4HANA) plus an iPaaS or middleware layer handling B2B/EDI flows.
- Volumes: 10K–500K messages/day across interfaces; latency SLOs between near-real-time (sub-5s) and batch (nightly).
- AI scope: LLM-based features integrated into existing flows (classification, extraction, routing assistance), not standalone model training.
- Constraints: GDPR or equivalent data residency rules; existing change management and audit requirements.
- Team: Integration team owns interface contracts and runtime; a separate data/AI team owns model selection and evaluation.
If you can't name the data contract and the owner, you don't have an integration. You have a dependency.
The Pattern: LLM as a Service Behind the Integration Layer
The cleanest production pattern treats the LLM as a stateless service behind your existing integration layer. Your middleware (SAP Integration Suite, MuleSoft, Boomi, or equivalent) remains the system of record for routing, transformation, and error handling. The LLM sits behind an internal API gateway with its own rate limits, token budgets, and circuit breakers.
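The circuit breaker mentioned above can be sketched as follows. This is a minimal, single-threaded Python illustration of the common closed/open/half-open cycle. The names `CircuitBreaker` and `CircuitOpenError` and the default threshold and cooldown values are hypothetical. In practice you would likely configure the breaker your gateway or middleware already provides rather than write one.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the AI service while the circuit is open."""


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold=5, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock        # injectable so tests can control time
        self.failures = 0
        self.opened_at = None     # None means the circuit is closed

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                raise CircuitOpenError("AI service circuit open; use the contract fallback")
            # Cooldown elapsed: half-open, so this single trial call goes through.
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            # Open on reaching the threshold; a failed half-open trial reopens immediately.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0         # any success closes the circuit and resets the count
        self.opened_at = None
        return result
```

When the breaker raises `CircuitOpenError`, the integration layer should take the fallback defined in the shared contract instead of queueing more retries against a service it already knows is failing. The sketch is not thread-safe. A production breaker needs locking or an atomic state store.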
Components and boundaries look like this:
- Integration layer (owned by Integration Lead): message routing, schema validation, retry logic, idempotency keys, audit logging.
- AI service layer (owned by AI/data team): prompt management, model versioning, guardrails (input/output validation rules that constrain what the model can receive and return), evaluation (systematic measurement of model output quality against defined criteria).
- Shared contract: a versioned interface contract specifying request/response schemas, SLOs, token limits, and fallback behavior when the AI service is unavailable or returns low-confidence results.
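As a sketch of what such a contract might capture, here is a minimal Python version. It assumes the AI service returns a JSON object with a `confidence` field. Every name here (`AIServiceContract`, `Fallback`, `decide_path`, the `invoice-classification` interface) and every threshold value is hypothetical. One option is to keep the contract itself as an OpenAPI or JSON Schema document under version control and check code like this against it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Fallback(Enum):
    # What the integration layer does when the AI path can't be trusted.
    RULE_BASED = "rule_based"      # use the existing deterministic logic
    MANUAL_QUEUE = "manual_queue"  # park the message for human review


@dataclass(frozen=True)
class AIServiceContract:
    """Hypothetical shape of a versioned integration/AI contract."""
    name: str
    version: str                # bumped by the AI team on any output change
    required_fields: frozenset  # response fields the downstream mapping relies on
    p95_latency_ms: int         # SLO the integration layer monitors
    max_tokens_per_call: int    # token limit enforced at the gateway
    min_confidence: float       # below this, the result is not trusted
    fallback: Fallback


def decide_path(contract: AIServiceContract, response: Optional[dict]) -> str:
    """Return 'ai' if the response can be used, else the contracted fallback."""
    if response is None:                                   # unavailable or timed out
        return contract.fallback.value
    if not contract.required_fields.issubset(response):    # structural drift
        return contract.fallback.value
    if response.get("confidence", 0.0) < contract.min_confidence:
        return contract.fallback.value
    return "ai"


invoice_classifier = AIServiceContract(
    name="invoice-classification",
    version="1.2.0",
    required_fields=frozenset({"category", "confidence"}),
    p95_latency_ms=5000,
    max_tokens_per_call=2000,
    min_confidence=0.85,
    fallback=Fallback.RULE_BASED,
)

print(decide_path(invoice_classifier, None))  # rule_based
print(decide_path(invoice_classifier, {"category": "freight", "confidence": 0.93}))  # ai
```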
This separation matters because it preserves your existing observability and incident response. The AI service is another downstream system. It gets monitored like one.
Three Failure Modes You'll Hit First
1. Contract drift between integration and AI layers. The AI team updates prompt templates or switches models. Output structure shifts. Your downstream mapping breaks. This surfaces as silent data corruption in the target system, not as an error in your middleware logs. Fix: version the AI service contract independently, run schema validation on every response, alert on structural deviations.
2. Token and cost blowups on retry. Your integration layer retries a failed call. The LLM processes it again. You pay twice (or ten times, if backpressure isn't configured). Worse, LLM output isn't deterministic: the same prompt can return a different answer on each retry, creating duplicates or contradictions downstream. Fix: idempotency keys at the integration layer, a token budget ceiling per interface per hour, and a dead-letter queue for calls that exceed cost thresholds.
3. Authorization gaps at the AI service boundary. The LLM has access to data your integration user shouldn't see, or vice versa. This happens when teams treat the AI layer as internal and skip authN/authZ design. In regulated environments, this is an audit finding waiting to happen. Fix: enforce the same authorization model at the AI service boundary that you enforce at every other system boundary. No exceptions for "internal" services.
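The fixes for all three failure modes can be combined in one thin client on the integration side. The Python sketch below is illustrative only. `RESPONSE_SCHEMA`, `schema_violations`, `GuardedAIClient`, and the injected `call_model` function (assumed to return a response dict plus the tokens consumed) are hypothetical names. The in-memory cache, token counter, and dead-letter list stand in for whatever persistent stores your middleware provides.

```python
import time
from collections import defaultdict

# Hypothetical response schema for contract version 1.2.0.
RESPONSE_SCHEMA = {"category": str, "confidence": float}


def schema_violations(resp: dict) -> list:
    """Failure mode 1: report structural deviations instead of mapping silently."""
    errors = [f"missing:{k}" for k in RESPONSE_SCHEMA if k not in resp]
    errors += [f"type:{k}" for k, t in RESPONSE_SCHEMA.items()
               if k in resp and not isinstance(resp[k], t)]
    errors += [f"unexpected:{k}" for k in sorted(resp.keys() - RESPONSE_SCHEMA.keys())]
    return errors


class GuardedAIClient:
    """Failure modes 2 and 3: authZ check, idempotency, hourly token budget, dead-letter."""

    def __init__(self, call_model, tokens_per_hour, scopes, clock=time.time):
        self.call_model = call_model   # hypothetical: payload -> (response dict, tokens used)
        self.tokens_per_hour = tokens_per_hour
        self.scopes = scopes           # interface -> data scopes it is authorized to send
        self.clock = clock
        self.cache = {}                # idempotency key -> validated response
        self.spent = defaultdict(int)  # (interface, hour bucket) -> tokens consumed
        self.dead_letter = []

    def process(self, interface, idem_key, scope, payload):
        if scope not in self.scopes.get(interface, set()):
            raise PermissionError(f"{interface} is not authorized for {scope} data")
        if idem_key in self.cache:     # a retry: return the first answer, don't pay again
            return self.cache[idem_key]
        bucket = (interface, int(self.clock() // 3600))
        if self.spent[bucket] >= self.tokens_per_hour:
            self.dead_letter.append((idem_key, "token_budget_exceeded"))
            return None
        resp, tokens = self.call_model(payload)
        self.spent[bucket] += tokens
        errors = schema_violations(resp)
        if errors:
            self.dead_letter.append((idem_key, ",".join(errors)))
            return None
        self.cache[idem_key] = resp
        return resp


# Usage with a made-up model stub that records each paid call.
calls = []

def fake_model(payload):
    calls.append(payload)
    return {"category": "freight", "confidence": 0.91}, 400

client = GuardedAIClient(fake_model, tokens_per_hour=1000,
                         scopes={"invoices": {"internal"}}, clock=lambda: 0)
client.process("invoices", "msg-1", "internal", {"text": "Freight, March"})
client.process("invoices", "msg-1", "internal", {"text": "Freight, March"})  # retry
print(len(calls))  # 1
```

Note the ordering: authorization is checked before the idempotency cache, so a cached answer is never returned to a caller without the right scope. The token ceiling is checked before each call, so the last call in an hour can overshoot it by one call's worth of tokens. A stricter version would reserve an estimated token count up front. Returning the cached answer on retry also sidesteps nondeterminism, because downstream always sees the first validated output.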
Operational Checklist: Security, Observability, Ownership
Security: Data classification for every field sent to the LLM. No PII or restricted data without explicit approval and encryption in transit. Audit log of every prompt/response pair, retained per your compliance schedule.
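A minimal sketch of field-level classification enforced before anything reaches the LLM, assuming classifications are available as a simple field-to-level map. The field names, the levels (`public`, `internal`, `pii`, `restricted`), and the helpers `build_llm_payload` and `audit_line` are hypothetical. In practice the map would come from your data catalog. Encryption in transit is assumed to be handled by TLS at the gateway and is not shown.

```python
import json
from datetime import datetime, timezone

# Hypothetical classification for one interface's fields.
FIELD_CLASSIFICATION = {
    "invoice_text": "internal",
    "vendor_name": "internal",
    "contact_email": "pii",
    "iban": "restricted",
}
SENDABLE = {"public", "internal"}


def build_llm_payload(record, approved=frozenset()):
    """Drop PII/restricted fields unless explicitly approved; refuse unclassified ones."""
    payload = {}
    for name, value in record.items():
        level = FIELD_CLASSIFICATION.get(name)
        if level is None:
            raise ValueError(f"field {name!r} has no data classification")
        if level in SENDABLE or name in approved:
            payload[name] = value
    return payload


def audit_line(interface, prompt, response):
    """One JSON line per prompt/response pair, for an append-only audit store."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "interface": interface,
        "prompt": prompt,
        "response": response,
    }, sort_keys=True)


# Usage with a made-up record.
record = {"invoice_text": "Freight charges, March", "vendor_name": "ACME GmbH",
          "contact_email": "ap@example.com", "iban": "XX00TEST"}
print(build_llm_payload(record))
# {'invoice_text': 'Freight charges, March', 'vendor_name': 'ACME GmbH'}
```

Refusing unclassified fields, rather than passing them through, means a field newly added to the source schema cannot reach the model until someone has classified it.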
Observability: Latency percentiles (p50, p95, p99) on AI service calls. Token consumption per interface per day. Error rate by failure type (timeout, schema mismatch, low-confidence response). Alert thresholds tied to SLOs, not arbitrary numbers.
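These metrics are straightforward to compute from per-call records. The following Python sketch assumes each call is logged as a `(latency_ms, outcome)` pair, where `outcome` is `"ok"` or a failure type. It uses the nearest-rank percentile definition (monitoring tools may interpolate and report slightly different values), and `SLO_P95_MS` is a hypothetical value taken from the interface contract. The sample data is made up.

```python
import math
from collections import Counter


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(calls):
    """calls: list of (latency_ms, outcome); outcome is 'ok' or a failure type."""
    latencies = [ms for ms, _ in calls]
    outcomes = Counter(outcome for _, outcome in calls)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_rate": {k: n / len(calls) for k, n in outcomes.items() if k != "ok"},
    }


SLO_P95_MS = 5000  # hypothetical SLO from the interface contract

# Made-up sample: 100 calls to one interface.
calls = ([(120, "ok")] * 90 + [(800, "low_confidence")] * 5
         + [(5200, "timeout")] * 4 + [(9000, "schema_mismatch")])
stats = summarize(calls)
p95_alert = stats["p95_ms"] > SLO_P95_MS
```

With this sample, p95 is 800 ms and p99 is 5,200 ms. The p95 alert stays quiet even though 5% of calls exceed the 5-second target, which is why p99 belongs on the dashboard too.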
Ownership and runbooks: Integration Lead owns the interface contract, runtime monitoring, and incident response for routing and transformation failures. AI/data team owns model performance, prompt drift, and evaluation. A shared escalation path covers contract mismatches. Who gets paged at 2 a.m. must be documented before go-live, not after the first incident.
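Documenting who gets paged can be as simple as a reviewed mapping from failure type to owner, checked before go-live. A minimal Python sketch follows. The failure types and on-call rotation names are purely hypothetical. The split follows the ownership described above, with contract mismatches routed to the shared escalation path.

```python
# Hypothetical failure types and on-call rotation names, following the
# ownership split above; adapt to your own paging tool.
OWNERS = {
    "routing_failure": "integration-oncall",
    "transformation_failure": "integration-oncall",
    "model_performance": "ai-team-oncall",
    "prompt_drift": "ai-team-oncall",
    "contract_mismatch": "shared-escalation",  # both teams are paged
}


def who_gets_paged(failure_type):
    """Return the owning rotation, or fail loudly if the runbook has a gap."""
    if failure_type not in OWNERS:
        raise ValueError(f"no owner documented for {failure_type!r}")
    return OWNERS[failure_type]


def unowned(failure_types):
    """Pre-go-live check: failure types that no one would be paged for."""
    return sorted(set(failure_types) - set(OWNERS))
```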
Research on AI-assisted development consistently shows that AI can speed up delivery but, without close supervision, degrades quality and increases technical debt. Developer skill matters more than AI usage: experienced engineers using AI show small maintainability improvements, but AI doesn't automatically improve code quality. The same principle applies to AI-generated integration logic: supervision, testing, and modular design aren't optional.
What to Do Now, What to Defer, What to Measure
Do now: Pick one existing interface with a classification or extraction step that's currently manual or rule-based. Build the LLM service behind your integration layer with a versioned contract. Run it in shadow mode (process both paths, compare outputs, don't trust the AI path yet).
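Shadow mode can be sketched as a wrapper that runs both paths, records whether they agree, and only ever returns the existing result. This Python version is a minimal illustration under assumptions: `ShadowRunner` and the two path functions are hypothetical, messages are assumed to carry an `id`, and the AI path runs synchronously for simplicity. In production you would probably run it asynchronously or on a copy of the message so it adds no latency to the live flow.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ShadowRunner:
    """Runs the AI path next to the existing path; only the existing result is used."""
    current_path: Callable[[dict], str]  # today's rule-based or manual logic
    ai_path: Callable[[dict], str]       # hypothetical call to the AI service
    log: list = field(default_factory=list)

    def process(self, message: dict) -> str:
        trusted = self.current_path(message)
        try:
            candidate = self.ai_path(message)
        except Exception as exc:         # an AI failure must never block the live flow
            candidate = f"error:{type(exc).__name__}"
        self.log.append({"id": message["id"], "current": trusted,
                         "ai": candidate, "agree": trusted == candidate})
        return trusted                   # downstream only ever sees the trusted result

    def agreement_rate(self) -> float:
        if not self.log:
            return 0.0
        return sum(r["agree"] for r in self.log) / len(self.log)


# Usage with made-up classifiers and messages.
def rules(msg):
    return "freight" if "freight" in msg["text"].lower() else "other"


def ai(msg):
    text = msg["text"].lower()
    return "freight" if "freight" in text or "shipping" in text else "other"


runner = ShadowRunner(rules, ai)
results = [runner.process(m) for m in (
    {"id": 1, "text": "Freight charges, March"},
    {"id": 2, "text": "Shipping surcharge"},
    {"id": 3, "text": "Office chairs"},
)]
```

Here the live flow still gets `["freight", "other", "other"]`, while the log records that the AI path disagreed on message 2. Those disagreements are the records worth sending to human review.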
Defer: Multi-agent orchestration, autonomous decision-making, and anything described as "fully autonomous." With 42% of organizations still in pilots and 44% still planning, the organizations that win aren't the ones that go widest. They're the ones that make one pattern repeatable.
Measure in production: Output accuracy against a human-reviewed baseline (weekly, record-level). Cost per transaction. Drift rate (how often the AI output deviates from the contracted schema). Time-to-detect and time-to-resolve for AI-layer incidents.
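These production measures can be computed from a weekly sample of records plus an incident log. The Python sketch below assumes hypothetical record fields (`ai_label`, `human_label`, `schema_valid`, `cost`) and incident timestamps (`started`, `detected`, `resolved`). It also assumes time-to-resolve is measured from detection, which your incident process may define differently. The sample values are made up.

```python
from datetime import datetime
from statistics import mean


def weekly_metrics(records, incidents):
    """records: AI-processed transactions for one week; incidents: AI-layer incidents."""
    reviewed = [r for r in records if r["human_label"] is not None]
    correct = sum(r["ai_label"] == r["human_label"] for r in reviewed)
    detect = [(i["detected"] - i["started"]).total_seconds() / 60 for i in incidents]
    resolve = [(i["resolved"] - i["detected"]).total_seconds() / 60 for i in incidents]
    return {
        # Accuracy counts only human-reviewed records; a schema failure counts as wrong.
        "accuracy": correct / len(reviewed) if reviewed else None,
        # Drift rate: share of responses that violated the contracted schema.
        "drift_rate": sum(not r["schema_valid"] for r in records) / len(records),
        "cost_per_txn": sum(r["cost"] for r in records) / len(records),
        "mttd_min": mean(detect) if detect else None,
        "mttr_min": mean(resolve) if resolve else None,  # assumed: from detection
    }


# Made-up week: cost is in whatever currency unit your billing reports per call.
records = [
    {"ai_label": "freight", "human_label": "freight", "schema_valid": True, "cost": 0.004},
    {"ai_label": "other", "human_label": "freight", "schema_valid": True, "cost": 0.004},
    {"ai_label": None, "human_label": None, "schema_valid": False, "cost": 0.004},
    {"ai_label": "tax", "human_label": "tax", "schema_valid": True, "cost": 0.004},
]
incidents = [
    {"started": datetime(2024, 3, 2, 2, 0), "detected": datetime(2024, 3, 2, 2, 20),
     "resolved": datetime(2024, 3, 2, 3, 5)},
    {"started": datetime(2024, 3, 5, 10, 0), "detected": datetime(2024, 3, 5, 10, 10),
     "resolved": datetime(2024, 3, 5, 10, 40)},
]
metrics = weekly_metrics(records, incidents)
```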
The maturity gap between "we use AI" and "we run AI reliably" closes one interface contract at a time. Not with a platform purchase. Not with a strategy deck. With a versioned contract, a named owner, and a runbook that someone actually follows when the model returns garbage at 2 a.m. on a Saturday. That's the work.