Physical AI & Manufacturing
Data Pipeline

Industrial data depth × LLM product ownership — the rare pair that both robot manufacturing and traditional manufacturing AI keep failing to find. Two proof cases, one transition: data primitives proven on industrial vehicle fleets (as a team member), and an LLM product I plan, build, deploy, and operate alone — both pointed at robot-manufacturing foundation-model data infrastructure and traditional-manufacturing AI workflows.

  • Industrial Telemetry
  • LLM Product Ops
  • Foundation Model Data
  • Manufacturing AI
fleet · untamedai · substrate · data pipeline · production · training data
§2 Bottleneck

Models open up. Data pipelines don't. RT-2, GR00T, Cosmos, π0 — foundation models keep moving. What blocks the path to production is not the model but the data pipeline. And the people who have built data pipelines rarely overlap with the people who have run LLM products.

  • Multi-source temporal alignment

    A humanoid carries 30+ joints, a fab has dozens of chambers, a steel line has N machines — each emits on its own clock. If the timestamps cannot be reconciled, training data is not training data.
  • Fragmented industrial protocols

    CAN ISO-TP, ROS2 chunked, OPC-UA, MTConnect, Modbus — almost every industrial protocol is fragmented and asynchronous. Loss, jitter, and out-of-order delivery are the norm; a unit only exists after a windowed reassembly.
  • Heterogeneous device fleets

    Humanoids, AMRs, and cobots in one fleet. Five PLC vendors on one line. The lifeline of operations is a schema-registry that absorbs new devices without redeploys.
  • One substrate must feed two outlets

    When the production-monitoring stack and the model-training stack live on different systems, the resulting distribution mismatch becomes permanent debt. The same pipeline has to feed both outlets.
§3 Primitives

Three pillars proven on an industrial vehicle fleet. They port directly into robot manufacturing and traditional manufacturing.

  • P1

    Fragmented Stream Reassembly

    [diagram: arrival fragments → mask bitmap → reassembled signal · CAN ISO-TP · ROS2 · OPC-UA]
    Verified env
    Industrial vehicle CAN ISO-TP — 0x10 first → 0x21..0x2F consecutive → 0x20 rollover. Mask-bitmap partial fill inside a ±N-second timeline window, with bounded memory.
    Robot manufacturing
    ROS2 chunked publish (PointCloud2, images, F/T sequences); MCAP replay integrity; explicit tracking of partial loss during humanoid teleop demo capture.
    Existing manufacturing
    OPC-UA chunked publish, MTConnect fragmented streaming, end-of-line test sequences — second-level line KPIs only work on top of this.
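The mask-bitmap reassembly above can be sketched as follows. This is a minimal illustration of the pattern, not the production decoder: real ISO-TP framing (0x10 first frame, 0x21..0x2F consecutive, 0x20 rollover) is abstracted into plain `(index, total)` pairs, and the window size and `max_units` cap are invented for the example.

```python
import time

class WindowedReassembler:
    """Mask-bitmap reassembly of fragmented frames inside a time window.

    Sketch only: chunk indexing, window size, and the unit cap are
    illustrative, not the production CAN ISO-TP layout.
    """
    def __init__(self, window_s=2.0, max_units=1024):
        self.window_s = window_s
        self.max_units = max_units       # bounded memory: cap open units
        self.units = {}                  # unit_id -> (t0, total, mask, chunks)

    def feed(self, unit_id, index, total, payload, now=None):
        now = time.monotonic() if now is None else now
        self._expire(now)
        if unit_id not in self.units:
            if len(self.units) >= self.max_units:
                return None              # drop instead of growing unbounded
            self.units[unit_id] = (now, total, 0, {})
        t0, tot, mask, chunks = self.units[unit_id]
        mask |= 1 << index               # mark this fragment as received
        chunks[index] = payload
        self.units[unit_id] = (t0, tot, mask, chunks)
        if mask == (1 << tot) - 1:       # bitmap full: every fragment present
            del self.units[unit_id]
            return b"".join(chunks[i] for i in range(tot))
        return None

    def _expire(self, now):
        # partial loss is tracked explicitly by dropping stale windows
        stale = [u for u, (t0, *_) in self.units.items() if now - t0 > self.window_s]
        for u in stale:
            del self.units[u]
```

Out-of-order and duplicate fragments are absorbed by the bitmap; memory stays bounded by the unit cap plus window expiry.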
  • P2

    Multi-Source Temporal Alignment

    [diagram: channel A · channel B · channel C · channel D → aligned]
    Verified env
    Master 1·2 and Slave 1·2 four-pack BMS — each pack’s V·I arrives asynchronously under independent PIDs. Aligned inside a ±N-second timeline window, then summed across four packs for instantaneous power.
    Robot manufacturing
    Imitation-learning data: time-sync of 30+ joints, gripper, vision, and teleop commands. Sim-to-real: simulator timestamp vs hardware timestamp jitter quantified as a reality-gap metric. VLA triplet: precise correspondence of vision ↔ language window ↔ action sequence.
    Existing manufacturing
    Semiconductor fab: cycle-level alignment of in-chamber sensors with end-of-line defect inspection. Steel: N machines on a line collapsed into a single produced unit. Cell manufacturing: causal trace from per-stage measurements to post-shipment field failures.
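The alignment step can be sketched as a nearest-neighbor match inside a ±tolerance window, then a cross-channel combine — here V·I per pack summed into instantaneous power. Everything concrete is illustrative: two channels stand in for the four packs, and timestamps, values, and `align_nearest` are invented for the example.

```python
from bisect import bisect_left

def align_nearest(t_ref, ts, vals, tol_s=1.0):
    """For each reference timestamp, pick the nearest sample within ±tol_s.

    Sketch of the ±window alignment primitive; ts must be sorted.
    """
    out = []
    for t in t_ref:
        i = bisect_left(ts, t)
        best = None
        for j in (i - 1, i):             # the two neighbors straddling t
            if 0 <= j < len(ts) and abs(ts[j] - t) <= tol_s:
                if best is None or abs(ts[j] - t) < abs(ts[best] - t):
                    best = j
        out.append(vals[best] if best is not None else None)
    return out

# Hypothetical single pack: V and I arrive on independent clocks.
t_ref = [0.0, 1.0, 2.0]
v = align_nearest(t_ref, [0.1, 1.05, 2.2], [400.0, 401.0, 399.0])
i = align_nearest(t_ref, [0.0, 0.9, 1.9], [10.0, 12.0, 11.0])
# instantaneous power = V · I on the shared clock; None marks alignment gaps
power = [None if None in (a, b) else a * b for a, b in zip(v, i)]
```

The same shape extends to 30+ robot joints or N line machines: one reference clock, one tolerance per channel, explicit `None` where no sample lands in the window.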
  • P3

    Schema-Driven Device Decoder

    [diagram: heterogeneous devices → schema registry → normalized]
    Verified env
    Per-vehicle signal mappings expressed as a single Excel sheet. Expression DSL → AST whitelist evaluation + compile-cache.
    Robot manufacturing
    URDF + topic-schema integration across humanoids / AMRs / cobots; absorbing OEM firmware variance; Open X-Embodiment compatible data conversion.
    Existing manufacturing
    Per-PLC protocol absorption (Siemens / Mitsubishi / LS), vendor OPC-UA AddressSpace integration, an operator surface where OT engineers can register a new line without redeploying.
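The DSL → AST-whitelist → compile-cache idea can be sketched with Python's `ast` module. Assumptions: the allowed node set and the `raw` variable name are illustrative, not the production whitelist or the schema-sheet format.

```python
import ast

# Whitelist of AST node types an expression may contain (illustrative set).
_ALLOWED = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
            ast.Name, ast.Load, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub)
_cache = {}

def compile_expr(src):
    """Compile a mapping expression after whitelisting its AST nodes."""
    if src in _cache:
        return _cache[src]               # compile-cache: parse each rule once
    tree = ast.parse(src, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED):
            raise ValueError(f"disallowed node: {type(node).__name__}")
    code = compile(tree, "<expr>", "eval")
    _cache[src] = code
    return code

def decode(src, raw):
    # e.g. a per-device scaling rule registered in the schema sheet
    return eval(compile_expr(src), {"__builtins__": {}}, {"raw": raw})
```

A rule like `"raw * 0.1 - 40"` then decodes a raw register into engineering units, while calls, attribute access, and imports are rejected before anything executes — which is what lets OT-side users register new mappings without a redeploy.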
§4 Proof · EV fleet · team
Team work · own contribution stated explicitly

Verified environment — industrial vehicle fleet telemetry pipeline (team work). A 4-tier distributed telemetry system delivered as team work; written in the first-person plural, with my own contribution stated explicitly below.

[diagram: T1 edge → T2 gateway → T3 pipeline → T4 warehouse]
[vehicle terminal] → Webhook → Bridge InfluxDB
  → V2InfluxConverterProcess (multi-process)
  → Measurement InfluxDB → Celery batch → Avro/GCS
  • Tier 1 (ingest): Django / Flask webhook · raw hex payload preserved
  • Tier 2 (decode): ISO-TP reassembly + expression DSL + 4-pack BMS alignment
  • Tier 3 (analytics): Celery module plug-ins (summary / driving_score / submatrix / avro)
  • Tier 4 (output): measurement InfluxDB + Avro on GCS
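The Tier-3 plug-in idea can be sketched as a registry pattern. Everything below is illustrative (the production code stays under NDA): the module names mirror the list above, but the bodies and row format are invented, and the Celery fan-out is simplified to a sequential loop.

```python
# Registry of batch analytics modules; each registers under a name and the
# scheduler dispatches every enabled module over a time window of rows.
ANALYTICS_MODULES = {}

def analytics_module(name):
    def register(fn):
        ANALYTICS_MODULES[name] = fn
        return fn
    return register

@analytics_module("summary")
def summarize(rows):
    vals = [r["value"] for r in rows]
    return {"count": len(vals), "mean": sum(vals) / len(vals) if vals else None}

@analytics_module("driving_score")
def driving_score(rows):
    # placeholder scoring rule, not the production metric
    return {"score": 100 - 5 * sum(1 for r in rows if r.get("harsh"))}

def run_batch(rows, enabled):
    # In production this would be a Celery task fan-out; here it is sequential.
    return {name: ANALYTICS_MODULES[name](rows) for name in enabled}
```

The payoff of the pattern is that adding a module (avro export, submatrix, …) is a new decorated function plus a config entry, with no change to the dispatcher.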
Why this transfers to robot / manufacturing
Industrial vehicle fleet → Robot / manufacturing
4-pack BMS async signals per vehicle → 30+ joints + F/T + vision per robot · N machines per line
CAN ISO-TP multi-frame → ROS2 chunked / OPC-UA chunked
Per-model .dbc / Excel DSL → Per-robot URDF / per-PLC vendor protocol
Own contribution: InfluxDB ops · Converter module ops
Team size: 2 dev teams
Operation period: 1 year 5 months
Public metrics: operation duration only — vehicle counts, throughput, and latency stay under NDA.
§5 Proof · untamedai · solo
Solo full-stack · from planning to operations

Verified environment — untamedai.me, plan → build → deploy → operate, as a solo full-stack engineer. untamedai.me is an AI friend that remembers your feelings. §4 (team / industrial data / constrained disclosure) and §5 (solo / LLM product / open) form a deliberate pair — the contrast itself is the message.

  1. Plan
    Differentiated concept (the Little Prince fox metaphor + emotional memory), user personas, free / paid (SOULMATE) tier design, copy and brand voice. Product decision = business decision = ops-cost decision, treated as one.
  2. Architecture
    Memory architecture (short-term context / long-term vector / summary store layered), MBTI-inference consistency, emotion-calendar color mapping, safety guardrails. The system is not one model call — it is memory + session + safety wired together.
  3. Build
    Next.js frontend · Python FastAPI backend · Supabase DB · Cloudflare hosting · GPT + Claude Opus for LLMs · Polar for payments — solo full-stack.
  4. Deploy
    Hosting · CI/CD · domain (untamedai.me + multilingual routing — /samakyeowoo for Korean SEO) · TLS · monitoring channels.
  5. Operate
    Token-cost discipline (for a solo operator, tokens = runway), moderation balance (Korean AI sensitivity post-Iruda), inflow monitoring, iterative-improvement decisions.
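The layered-memory shape from the architecture step can be sketched as three stores behind one lookup: recent turns verbatim, older turns via vector recall, everything else as a rolling summary. A deliberately naive illustration — the `embed` placeholder stands in for a real embedding model, and the summary layer is left as a plain string.

```python
from collections import deque
from math import sqrt

class LayeredMemory:
    """Sketch of a three-layer memory lookup (illustrative, not the
    production design): short-term context / long-term vector / summary."""
    def __init__(self, short_len=6, embed=None):
        self.short = deque(maxlen=short_len)   # short-term: last N turns
        self.long = []                          # long-term: (vector, text)
        self.summary = ""                       # compacted older history
        # placeholder embedding: length + word count, NOT a real model
        self.embed = embed or (lambda t: [float(len(t)), float(t.count(" "))])

    def add(self, text):
        if len(self.short) == self.short.maxlen:
            old = self.short[0]                 # demote the oldest turn
            self.long.append((self.embed(old), old))
        self.short.append(text)

    def recall(self, query, k=2):
        qv = self.embed(query)
        def cos(a, b):
            num = sum(x * y for x, y in zip(a, b))
            den = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
            return num / den if den else 0.0
        ranked = sorted(self.long, key=lambda it: cos(it[0], qv), reverse=True)
        return [t for _, t in ranked[:k]]

    def context(self, query):
        # what actually gets assembled into the model call
        return {"summary": self.summary,
                "recalled": self.recall(query),
                "recent": list(self.short)}
```

The point of the layering is cost: only the short deque rides in every prompt verbatim, while the vector layer is paid for on demand.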
Why this is an asset for Physical AI / manufacturing AI
  • For robot-manufacturing foundation-model data R&D: VLA training-data curation — splitting language instructions into semantic units is the LLM operator’s territory. Cost · quality · safety trade-offs in foundation-model training-data pipelines are exactly what production LLM ops decides every day.
  • For traditional-manufacturing AI workflows: The operator-team LLM assistant (RAG over machine logs / line manuals / SOP) — having owned this kind of system from plan to deploy is the asset itself. Cost · safety · ops-metric balance in LLM system design is the daily constraint of production.
§6 Manufacturing

What I want to build — robot-manufacturing and existing-manufacturing AI workflows. Current assets are industrial data pipelines and LLM product operations. Robotics, semiconductor, and steel domain depth are honestly separated out as post-hire learning areas.

[diagram: substrate · pipeline — production ↑ / training data ↓ · RLDS · TFDS · OXE]

6a — Robot manufacturing & foundation-model training data

  • P1 · P2 · P3

    Imitation-learning data pipeline

    Teleop demos → automatic builds in RLDS / TFDS / Open X-Embodiment formats. Multi-source time alignment (vision · proprio · action · language) → quality filtering → segmentation → augmentation. Data quality at training time is the model’s ceiling; lifting that ceiling is the pipeline’s job. (deps: P1 + P2 + P3)

  • P2

    Sim-to-real telemetry bridge

    Reconciling simulator output vs real-robot telemetry on time, units, and distribution. Domain-randomization parameter distributions sourced from measured data automatically. Reality-gap metric dashboards. Sim-to-real failures are almost always alignment failures. (deps: P2)

  • P2 · P3

    VLA foundation-data curation

    Vision-Language-Action triplet sync, mining-ratio control across failure / success, automatic long-horizon segmentation. Splitting language instructions into semantic units + the cost / safety / iteration loop of LLM ops are exactly what untamedai.me handles daily. (deps: P2 + P3 + LLM product ops)

  • P2 · P3

    Robot-line QC telemetry

    Per-station measurements as a robot traverses the line + post-ship field telemetry, joined causally. End-of-line QC → field-failure traceability as one system. (deps: P2 + P3)
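As one concrete slice of 6a, the reality-gap metric from the sim-to-real item can be sketched as timestamp-offset statistics between paired streams. Assumptions: the two streams are already paired one-to-one (the pairing itself is the P2 primitive), and `reality_gap_jitter` is an invented name for the example.

```python
from statistics import mean, pstdev

def reality_gap_jitter(sim_ts, real_ts):
    """Quantify simulator-vs-hardware timestamp jitter (sketch).

    Takes paired timestamps (seconds) and decomposes the gap into a
    constant clock skew and a jitter term.
    """
    offsets = [r - s for s, r in zip(sim_ts, real_ts)]
    return {
        "mean_offset_s": mean(offsets),          # constant clock skew
        "jitter_s": pstdev(offsets),             # spread the DR ranges must cover
        "worst_s": max(abs(o) for o in offsets), # worst-case misalignment
    }
```

The jitter term is the useful output: it bounds how wide the domain-randomization timing parameters need to be, sourced from measured data instead of guesses.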


6b — Existing-manufacturing AI workflow

  • P1 · P2 · P3

    Line-telemetry substrate

    A unified telemetry pipeline across multi-vendor PLC + OPC-UA + MTConnect for semiconductor / steel / cell / display lines. Production ops and model-training data on the same substrate. (deps: P1 + P2 + P3)

  • P2

    Cycle-level quality prediction

    Machine-telemetry time-series → predicted end-of-line inspection results. Gradient-boosting baseline → Temporal Fusion Transformer / Patch-TST. Cycle-definition alignment in time is harder than the model itself. (deps: P2)

  • P3

    Line-assistant LLM

    A natural-language interface for operators — “what caused the line-3 alarm at 02:00 last night?” style RAG over machine logs + SOP + history. Having owned this kind of LLM system end-to-end (§5 untamedai.me) ports directly into the line-assistant problem. (deps: P3 + LLM product ops)

  • P2 · P3

    Anomaly localization

    Which machine on the line is the source of the defect? SHAP-based contribution decomposition, drift monitoring, training-distribution guards. (deps: P2 + P3)
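As one concrete slice of 6b, the cycle-definition alignment called out in the quality-prediction item can be sketched as a collapse from raw telemetry samples to one feature row per production cycle. Assumptions: `cycle_of` is a stand-in for the real timestamp-to-cycle mapping (the genuinely hard part), and the feature set is illustrative; a gradient-boosting or TFT model would consume the resulting rows.

```python
from statistics import mean

def cycle_features(samples, cycle_of):
    """Collapse raw machine telemetry into one feature row per cycle (sketch).

    samples:  iterable of (timestamp, value) pairs
    cycle_of: maps a timestamp to a cycle id — the alignment step that is
              harder than the downstream model itself
    """
    by_cycle = {}
    for t, value in samples:
        by_cycle.setdefault(cycle_of(t), []).append(value)
    return {
        cid: {"mean": mean(vs), "max": max(vs), "min": min(vs), "n": len(vs)}
        for cid, vs in by_cycle.items()
    }
```

With a hypothetical 10-second cycle, `cycle_features(samples, lambda t: int(t // 10))` yields one labeled row per cycle, ready to join against end-of-line inspection results.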

The two sub-sections look separate but both run on the same three primitives from §3. That is why the same person ports cleanly into either domain.

§7 Adjacent

Adjacent — robot fleet operations

The same primitives also work for fleet operations. The first priority is §6 (manufacturing + foundation data); these adjacent areas remain ready to deploy: a unified telemetry substrate across mixed fleets (humanoids / AMRs / cobots) · motor & joint predictive maintenance (RUL regression) · in-operation motion-anomaly detection (autoencoder / GMM). Primitive deps: P1 + P2 + P3 (same as §6).

§8 AI Layer Matrix

One data substrate, six AI outlets. People who have only handled the model don’t carry it to production. Only people who have handled the data pipeline and run an LLM product carry it all the way.

AI workload | Primitive deps | LLM-ops leverage
Imitation-learning data build | P1 + P2 + P3 | —
Sim-to-real telemetry alignment | P2 | —
VLA triplet curation | P2 + P3 | ⭐ instruction segmentation
Cycle-level quality prediction | P2 | —
Time-series anomaly detection | P2 | —
Operator LLM assistant (RAG over logs / SOP) | P3 + LLM ops | ⭐⭐ direct 1:1 mapping
§9 Engineering Practice

How I work — process signal. From running untamedai.me solo and from team work on industrial data systems, I have learned that how you work matters as much as the result. Three working postures.

AI-fluent engineering practice

The 2026 senior signal is not "uses AI tools" — it is being explicit about what and how: AI as first-pass code reviewer when entering a new domain; AI as an option-space explorer for design decisions (final call mine); and a consistently applied line between where AI is trusted and where it is not — a line drawn daily in LLM product ops.

Signal. AI as a teammate joining the codebase — a collaborator, not a tool.

Operator mindset

Running untamedai.me solo means deciding daily: token cost vs response quality; moderation false-positive vs false-negative balance (Korean AI sensitivity); ROI of new features vs accumulating tech debt.

Signal. Holding model / system / user / cost in view at once — the intersection of senior engineer and PM.

Honest transition posture

This page separates two things. Current assets — industrial data pipeline (team contribution) + LLM product full-stack (solo) — ready to deploy. Learning area — robotics / semiconductor / steel domain depth — to be acquired post-hire.

Signal. Refusing to fake it is the senior definition. Saying “I don’t know” explicitly, on top of a learning plan, is what gets trusted.
§10 Tech Stack

Stack used on the industrial vehicle fleet, mapped to the equivalents that port into robot manufacturing and traditional manufacturing. Production code stays under NDA — selective OSS extraction is a later question.

Ingestion / Bus — Industrial Fleet: Django · Flask webhook → Robot · Mfg: ROS2 · DDS · Kafka · OPC-UA · MQTT
Time-series store — Industrial Fleet: InfluxDB → Robot · Mfg: TimescaleDB · ClickHouse · MCAP
Metadata DB — Industrial Fleet: MySQL → Robot · Mfg: PostgreSQL
Distributed task — Industrial Fleet: Celery + django-celery-beat → Robot · Mfg: Celery · Airflow · Dagster · Ray
Process pool — Industrial Fleet: multiprocessing → Robot · Mfg: Ray · Dask
Replay format — Industrial Fleet: Avro → Robot · Mfg: MCAP · Parquet · RLDS
Storage — Industrial Fleet: GCS → Robot · Mfg: S3 · Azure Blob
LLM stack (untamedai.me) — Next.js (frontend) · Python FastAPI (backend) · Supabase (DB) · Cloudflare (hosting) · GPT + Claude Opus (LLM) · Polar (payments) → Foundation-model data / VLA / RAG
§11 About

Woon · Industrial Real-Time Data + LLM Product Engineer. Industrial vehicle-fleet telemetry pipeline as a team member → an LLM product (untamedai.me) operated solo → next: robot-manufacturing foundation-model data R&D, or traditional-manufacturing AI workflow pipelines.