Physical AI & Manufacturing
Data Pipeline

Industrial data depth × LLM product ownership — the rare pair that both robot manufacturing and traditional manufacturing AI keep failing to find. Two proof cases, one transition: data primitives proven on industrial vehicle fleets (as a team member), and an LLM product I plan, build, deploy, and operate alone — both pointed at robot-manufacturing foundation-model data infrastructure and traditional-manufacturing AI workflows.

  • Industrial Telemetry
  • LLM Product Ops
  • Foundation Model Data
  • Manufacturing AI
fleet · untamedai · substrate · data pipeline · production · training data
§2 Bottleneck

Models open up. Data pipelines don't. RT-2, GR00T, Cosmos, π0 — foundation models keep moving. What blocks the path to production is not the model but the data pipeline. And the people who have built data pipelines rarely overlap with the people who have run LLM products.

  • Multi-source temporal alignment

    A humanoid carries 30+ joints, a fab has dozens of chambers, a steel line has N machines — each emits on its own clock. If the timestamps cannot be reconciled, training data is not training data.
  • Fragmented industrial protocols

    CAN ISO-TP, ROS2 chunked, OPC-UA, MTConnect, Modbus — almost every industrial protocol is fragmented and asynchronous. Loss, jitter, and out-of-order delivery are the norm; a unit only exists after a windowed reassembly.
  • Heterogeneous device fleets

    Humanoids, AMRs, and cobots in one fleet. Five PLC vendors on one line. The lifeline of operations is a schema-registry that absorbs new devices without redeploys.
  • One substrate must feed two outlets

    When the production-monitoring stack and the model-training stack live on different systems, the resulting distribution mismatch becomes permanent debt. The same pipeline has to feed both outlets.
§3 Primitives

Three pillars proven on an industrial vehicle fleet. They port directly into robot manufacturing and traditional manufacturing.

  • P1

    Fragmented Stream Reassembly

    [diagram: arrival fragments → mask bitmap → reassembled signal · CAN ISO-TP · ROS2 · OPC-UA]
    Verified env
    Industrial vehicle CAN ISO-TP — 0x10 first → 0x21..0x2F consecutive → 0x20 rollover. Mask-bitmap partial fill inside a ±N-second timeline window, with bounded memory.
    Robot manufacturing
    ROS2 chunked publish (PointCloud2, images, F/T sequences); MCAP replay integrity; explicit tracking of partial loss during humanoid teleop demo capture.
    Existing manufacturing
    OPC-UA chunked publish, MTConnect fragmented streaming, end-of-line test sequences — second-level line KPIs only work on top of this.
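The mask-bitmap reassembly above can be sketched as follows. This is a minimal illustration of the pattern, not the production decoder: real ISO-TP framing (0x10 first frame, 0x21..0x2F consecutive, 0x20 rollover) is abstracted into plain `(index, total)` pairs, and the window size and `max_units` cap are invented for the example.

```python
import time

class WindowedReassembler:
    """Mask-bitmap reassembly of fragmented frames inside a time window.

    Sketch only: chunk indexing, window size, and the unit cap are
    illustrative, not the production CAN ISO-TP layout.
    """
    def __init__(self, window_s=2.0, max_units=1024):
        self.window_s = window_s
        self.max_units = max_units       # bounded memory: cap open units
        self.units = {}                  # unit_id -> (t0, total, mask, chunks)

    def feed(self, unit_id, index, total, payload, now=None):
        now = time.monotonic() if now is None else now
        self._expire(now)
        if unit_id not in self.units:
            if len(self.units) >= self.max_units:
                return None              # drop instead of growing unbounded
            self.units[unit_id] = (now, total, 0, {})
        t0, tot, mask, chunks = self.units[unit_id]
        mask |= 1 << index               # mark this fragment as received
        chunks[index] = payload
        self.units[unit_id] = (t0, tot, mask, chunks)
        if mask == (1 << tot) - 1:       # bitmap full: every fragment present
            del self.units[unit_id]
            return b"".join(chunks[i] for i in range(tot))
        return None

    def _expire(self, now):
        # partial loss is tracked explicitly by dropping stale windows
        stale = [u for u, (t0, *_) in self.units.items() if now - t0 > self.window_s]
        for u in stale:
            del self.units[u]
```

Out-of-order and duplicate fragments are absorbed by the bitmap; memory stays bounded by the unit cap plus window expiry.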
  • P2

    Multi-Source Temporal Alignment

    [diagram: channel A · channel B · channel C · channel D → aligned]
    Verified env
    Master 1·2 and Slave 1·2 four-pack BMS — each pack’s V·I arrives asynchronously under independent PIDs. Aligned inside a ±N-second timeline window, then summed across four packs for instantaneous power.
    Robot manufacturing
    Imitation-learning data: time-sync of 30+ joints, gripper, vision, and teleop commands. Sim-to-real: simulator timestamp vs hardware timestamp jitter quantified as a reality-gap metric. VLA triplet: precise correspondence of vision ↔ language window ↔ action sequence.
    Existing manufacturing
    Semiconductor fab: cycle-level alignment of in-chamber sensors with end-of-line defect inspection. Steel: N machines on a line collapsed into a single produced unit. Cell manufacturing: causal trace from per-stage measurements to post-shipment field failures.
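The alignment step can be sketched as a nearest-neighbor match inside a ±tolerance window, then a cross-channel combine — here V·I per pack summed into instantaneous power. Everything concrete is illustrative: two channels stand in for the four packs, and timestamps, values, and `align_nearest` are invented for the example.

```python
from bisect import bisect_left

def align_nearest(t_ref, ts, vals, tol_s=1.0):
    """For each reference timestamp, pick the nearest sample within ±tol_s.

    Sketch of the ±window alignment primitive; ts must be sorted.
    """
    out = []
    for t in t_ref:
        i = bisect_left(ts, t)
        best = None
        for j in (i - 1, i):             # the two neighbors straddling t
            if 0 <= j < len(ts) and abs(ts[j] - t) <= tol_s:
                if best is None or abs(ts[j] - t) < abs(ts[best] - t):
                    best = j
        out.append(vals[best] if best is not None else None)
    return out

# Hypothetical single pack: V and I arrive on independent clocks.
t_ref = [0.0, 1.0, 2.0]
v = align_nearest(t_ref, [0.1, 1.05, 2.2], [400.0, 401.0, 399.0])
i = align_nearest(t_ref, [0.0, 0.9, 1.9], [10.0, 12.0, 11.0])
# instantaneous power = V · I on the shared clock; None marks alignment gaps
power = [None if None in (a, b) else a * b for a, b in zip(v, i)]
```

The same shape extends to 30+ robot joints or N line machines: one reference clock, one tolerance per channel, explicit `None` where no sample lands in the window.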
  • P3

    Schema-Driven Device Decoder

    [diagram: heterogeneous devices → schema registry → normalized]
    Verified env
    Per-vehicle signal mappings expressed as a single Excel sheet. Expression DSL → AST whitelist evaluation + compile-cache.
    Robot manufacturing
    URDF + topic-schema integration across humanoids / AMRs / cobots; absorbing OEM firmware variance; Open X-Embodiment compatible data conversion.
    Existing manufacturing
    Per-PLC protocol absorption (Siemens / Mitsubishi / LS), vendor OPC-UA AddressSpace integration, an operator surface where OT engineers can register a new line without redeploying.
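The DSL → AST-whitelist → compile-cache idea can be sketched with Python's `ast` module. Assumptions: the allowed node set and the `raw` variable name are illustrative, not the production whitelist or the schema-sheet format.

```python
import ast

# Whitelist of AST node types an expression may contain (illustrative set).
_ALLOWED = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
            ast.Name, ast.Load, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub)
_cache = {}

def compile_expr(src):
    """Compile a mapping expression after whitelisting its AST nodes."""
    if src in _cache:
        return _cache[src]               # compile-cache: parse each rule once
    tree = ast.parse(src, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED):
            raise ValueError(f"disallowed node: {type(node).__name__}")
    code = compile(tree, "<expr>", "eval")
    _cache[src] = code
    return code

def decode(src, raw):
    # e.g. a per-device scaling rule registered in the schema sheet
    return eval(compile_expr(src), {"__builtins__": {}}, {"raw": raw})
```

A rule like `"raw * 0.1 - 40"` then decodes a raw register into engineering units, while calls, attribute access, and imports are rejected before anything executes — which is what lets OT-side users register new mappings without a redeploy.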
§4 Proof · EV fleet · team
Team work · own contribution stated explicitly

Verified environment — industrial vehicle fleet telemetry pipeline (team work). A 4-tier distributed telemetry system delivered as team work; written in the first-person plural, with my own contribution stated explicitly below.

[diagram: T1 edge → T2 gateway → T3 pipeline → T4 warehouse]
[vehicle terminal] → Webhook → Bridge InfluxDB
  → V2InfluxConverterProcess (multi-process)
  → Measurement InfluxDB → Celery batch → Avro/GCS
  • Tier 1 (ingest): Django / Flask webhook · raw hex payload preserved
  • Tier 2 (decode): ISO-TP reassembly + expression DSL + 4-pack BMS alignment
  • Tier 3 (analytics): Celery module plug-ins (summary / driving_score / submatrix / avro)
  • Tier 4 (output): measurement InfluxDB + Avro on GCS
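The Tier-3 plug-in idea can be sketched as a registry pattern. Everything below is illustrative (the production code stays under NDA): the module names mirror the list above, but the bodies and row format are invented, and the Celery fan-out is simplified to a sequential loop.

```python
# Registry of batch analytics modules; each registers under a name and the
# scheduler dispatches every enabled module over a time window of rows.
ANALYTICS_MODULES = {}

def analytics_module(name):
    def register(fn):
        ANALYTICS_MODULES[name] = fn
        return fn
    return register

@analytics_module("summary")
def summarize(rows):
    vals = [r["value"] for r in rows]
    return {"count": len(vals), "mean": sum(vals) / len(vals) if vals else None}

@analytics_module("driving_score")
def driving_score(rows):
    # placeholder scoring rule, not the production metric
    return {"score": 100 - 5 * sum(1 for r in rows if r.get("harsh"))}

def run_batch(rows, enabled):
    # In production this would be a Celery task fan-out; here it is sequential.
    return {name: ANALYTICS_MODULES[name](rows) for name in enabled}
```

The payoff of the pattern is that adding a module (avro export, submatrix, …) is a new decorated function plus a config entry, with no change to the dispatcher.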
Why this transfers to robot / manufacturing
Industrial vehicle fleet → Robot / manufacturing
4-pack BMS async signals per vehicle → 30+ joints + F/T + vision per robot · N machines per line
CAN ISO-TP multi-frame → ROS2 chunked / OPC-UA chunked
Per-model .dbc / Excel DSL → Per-robot URDF / per-PLC vendor protocol
Own contribution: InfluxDB ops · Converter module ops
Team size: 2 dev teams
Operation period: 1 year 5 months
Public metrics: operation duration only — vehicle counts, throughput, and latency stay under NDA.
§5 Proof · untamedai · solo
Solo full-stack · from planning to operations

Verified environment — untamedai.me, plan → build → deploy → operate, as a solo full-stack engineer. untamedai.me is an AI friend that remembers your feelings. §4 (team / industrial data / constrained disclosure) and §5 (solo / LLM product / open) form a deliberate pair — the contrast itself is the message.

  1. Plan
    Differentiated concept (the Little Prince fox metaphor + emotional memory), user personas, free / paid (SOULMATE) tier design, copy and brand voice. Product decision = business decision = ops-cost decision, treated as one.
  2. Architecture
    Memory architecture (short-term context / long-term vector / summary store layered), MBTI-inference consistency, emotion-calendar color mapping, safety guardrails. The system is not one model call — it is memory + session + safety wired together.
  3. Build
    Next.js frontend · Python FastAPI backend · Supabase DB · Cloudflare hosting · GPT + Claude Opus for LLMs · Polar for payments — solo full-stack.
  4. Deploy
    Hosting · CI/CD · domain (untamedai.me + multilingual routing — /samakyeowoo for Korean SEO) · TLS · monitoring channels.
  5. Operate
    Token-cost discipline (for a solo operator, tokens = runway), moderation balance (Korean AI sensitivity post-Iruda), inflow monitoring, iterative-improvement decisions.
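The layered-memory shape from the architecture step can be sketched as three stores behind one lookup: recent turns verbatim, older turns via vector recall, everything else as a rolling summary. A deliberately naive illustration — the `embed` placeholder stands in for a real embedding model, and the summary layer is left as a plain string.

```python
from collections import deque
from math import sqrt

class LayeredMemory:
    """Sketch of a three-layer memory lookup (illustrative, not the
    production design): short-term context / long-term vector / summary."""
    def __init__(self, short_len=6, embed=None):
        self.short = deque(maxlen=short_len)   # short-term: last N turns
        self.long = []                          # long-term: (vector, text)
        self.summary = ""                       # compacted older history
        # placeholder embedding: length + word count, NOT a real model
        self.embed = embed or (lambda t: [float(len(t)), float(t.count(" "))])

    def add(self, text):
        if len(self.short) == self.short.maxlen:
            old = self.short[0]                 # demote the oldest turn
            self.long.append((self.embed(old), old))
        self.short.append(text)

    def recall(self, query, k=2):
        qv = self.embed(query)
        def cos(a, b):
            num = sum(x * y for x, y in zip(a, b))
            den = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
            return num / den if den else 0.0
        ranked = sorted(self.long, key=lambda it: cos(it[0], qv), reverse=True)
        return [t for _, t in ranked[:k]]

    def context(self, query):
        # what actually gets assembled into the model call
        return {"summary": self.summary,
                "recalled": self.recall(query),
                "recent": list(self.short)}
```

The point of the layering is cost: only the short deque rides in every prompt verbatim, while the vector layer is paid for on demand.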
Why this is an asset for Physical AI / manufacturing AI
  • For robot-manufacturing foundation-model data R&D: VLA training-data curation — splitting language instructions into semantic units is the LLM operator’s territory. Cost · quality · safety trade-offs in foundation-model training-data pipelines are exactly what production LLM ops decides every day.
  • For traditional-manufacturing AI workflows: The operator-team LLM assistant (RAG over machine logs / line manuals / SOP) — having owned this kind of system from plan to deploy is the asset itself. Cost · safety · ops-metric balance in LLM system design is the daily constraint of production.
§6 Manufacturing

What I want to build — robot-manufacturing and existing-manufacturing AI workflows. Current assets are industrial data pipelines and LLM product operations. Robotics, semiconductor, and steel domain depth are honestly separated out as post-hire learning areas.

[diagram: substrate · pipeline — production ↑ / training data ↓ · RLDS · TFDS · OXE]

6a — Robot manufacturing & foundation-model training data

  • P1 · P2 · P3

    Imitation-learning data pipeline

    Teleop demos → automatic builds in RLDS / TFDS / Open X-Embodiment formats. Multi-source time alignment (vision · proprio · action · language) → quality filtering → segmentation → augmentation. Data quality at training time is the model’s ceiling; lifting that ceiling is the pipeline’s job. (deps: P1 + P2 + P3)

  • P2

    Sim-to-real telemetry bridge

    Reconciling simulator output vs real-robot telemetry on time, units, and distribution. Domain-randomization parameter distributions sourced from measured data automatically. Reality-gap metric dashboards. Sim-to-real failures are almost always alignment failures. (deps: P2)

  • P2 · P3

    VLA foundation-data curation

    Vision-Language-Action triplet sync, mining-ratio control across failure / success, automatic long-horizon segmentation. Splitting language instructions into semantic units + the cost / safety / iteration loop of LLM ops are exactly what untamedai.me handles daily. (deps: P2 + P3 + LLM product ops)

  • P2 · P3

    Robot-line QC telemetry

    Per-station measurements as a robot traverses the line + post-ship field telemetry, joined causally. End-of-line QC → field-failure traceability as one system. (deps: P2 + P3)
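As one concrete slice of 6a, the reality-gap metric from the sim-to-real item can be sketched as timestamp-offset statistics between paired streams. Assumptions: the two streams are already paired one-to-one (the pairing itself is the P2 primitive), and `reality_gap_jitter` is an invented name for the example.

```python
from statistics import mean, pstdev

def reality_gap_jitter(sim_ts, real_ts):
    """Quantify simulator-vs-hardware timestamp jitter (sketch).

    Takes paired timestamps (seconds) and decomposes the gap into a
    constant clock skew and a jitter term.
    """
    offsets = [r - s for s, r in zip(sim_ts, real_ts)]
    return {
        "mean_offset_s": mean(offsets),          # constant clock skew
        "jitter_s": pstdev(offsets),             # spread the DR ranges must cover
        "worst_s": max(abs(o) for o in offsets), # worst-case misalignment
    }
```

The jitter term is the useful output: it bounds how wide the domain-randomization timing parameters need to be, sourced from measured data instead of guesses.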


6b — Existing-manufacturing AI workflow

  • P1 · P2 · P3

    Line-telemetry substrate

    A unified telemetry pipeline across multi-vendor PLC + OPC-UA + MTConnect for semiconductor / steel / cell / display lines. Production ops and model-training data on the same substrate. (deps: P1 + P2 + P3)

  • P2

    Cycle-level quality prediction

    Machine-telemetry time-series → predicted end-of-line inspection results. Gradient-boosting baseline → Temporal Fusion Transformer / Patch-TST. Cycle-definition alignment in time is harder than the model itself. (deps: P2)

  • P3

    Line-assistant LLM

    A natural-language interface for operators — “what caused the line-3 alarm at 02:00 last night?” style RAG over machine logs + SOP + history. Having owned this kind of LLM system end-to-end (§5 untamedai.me) ports directly into the line-assistant problem. (deps: P3 + LLM product ops)

  • P2 · P3

    Anomaly localization

    Which machine on the line is the source of the defect? SHAP-based contribution decomposition, drift monitoring, training-distribution guards. (deps: P2 + P3)
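As one concrete slice of 6b, the cycle-definition alignment called out in the quality-prediction item can be sketched as a collapse from raw telemetry samples to one feature row per production cycle. Assumptions: `cycle_of` is a stand-in for the real timestamp-to-cycle mapping (the genuinely hard part), and the feature set is illustrative; a gradient-boosting or TFT model would consume the resulting rows.

```python
from statistics import mean

def cycle_features(samples, cycle_of):
    """Collapse raw machine telemetry into one feature row per cycle (sketch).

    samples:  iterable of (timestamp, value) pairs
    cycle_of: maps a timestamp to a cycle id — the alignment step that is
              harder than the downstream model itself
    """
    by_cycle = {}
    for t, value in samples:
        by_cycle.setdefault(cycle_of(t), []).append(value)
    return {
        cid: {"mean": mean(vs), "max": max(vs), "min": min(vs), "n": len(vs)}
        for cid, vs in by_cycle.items()
    }
```

With a hypothetical 10-second cycle, `cycle_features(samples, lambda t: int(t // 10))` yields one labeled row per cycle, ready to join against end-of-line inspection results.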

The two sub-sections look separate but both run on the same three primitives from §3. That is why the same person ports cleanly into either domain.

§7 Adjacent

Adjacent — robot fleet operations

The same primitives also work for fleet operations. The first priority is §6 (manufacturing + foundation data); these adjacent areas remain ready to deploy: a unified telemetry substrate across mixed fleets (humanoids / AMRs / cobots) · motor & joint predictive maintenance (RUL regression) · in-operation motion-anomaly detection (autoencoder / GMM). Primitive deps: P1 + P2 + P3 (same as §6).

§8 AI Layer Matrix

One data substrate, six AI outlets. People who have only handled the model don’t carry it to production. Only people who have handled the data pipeline and run an LLM product carry it all the way.

AI workload | Primitive deps | LLM-ops leverage
Imitation-learning data build | P1 + P2 + P3 | —
Sim-to-real telemetry alignment | P2 | —
VLA triplet curation | P2 + P3 | ⭐ instruction segmentation
Cycle-level quality prediction | P2 | —
Time-series anomaly detection | P2 | —
Operator LLM assistant (RAG over logs / SOP) | P3 + LLM ops | ⭐⭐ direct 1:1 mapping
§9 Engineering Practice

How I work — process signal. From running untamedai.me solo and from team work on industrial data systems, I have learned that how you work matters as much as the result. Three working postures.

AI-fluent engineering practice

The 2026 senior signal is not "uses AI tools" — it is being explicit about what and how: AI as first-pass code reviewer when entering a new domain; AI as an option-space explorer for design decisions (final call mine); and a consistently applied line between where AI is trusted and where it is not — a line drawn daily in LLM product ops.

Signal. AI as a teammate joining the codebase — a collaborator, not a tool.

Operator mindset

Running untamedai.me solo means deciding daily: token cost vs response quality; moderation false-positive vs false-negative balance (Korean AI sensitivity); ROI of new features vs accumulating tech debt.

Signal. Holding model / system / user / cost in view at once — the intersection of senior engineer and PM.

Honest transition posture

This page separates two things. Current assets — industrial data pipeline (team contribution) + LLM product full-stack (solo) — ready to deploy. Learning area — robotics / semiconductor / steel domain depth — to be acquired post-hire.

Signal. Refusing to fake it is the senior definition. Saying “I don’t know” explicitly, on top of a learning plan, is what gets trusted.
§10 Tech Stack

Stack used on the industrial vehicle fleet, mapped to the equivalents that port into robot manufacturing and traditional manufacturing. Production code stays under NDA — selective OSS extraction is a later question.

Ingestion / Bus — Industrial Fleet: Django · Flask webhook → Robot · Mfg: ROS2 · DDS · Kafka · OPC-UA · MQTT
Time-series store — Industrial Fleet: InfluxDB → Robot · Mfg: TimescaleDB · ClickHouse · MCAP
Metadata DB — Industrial Fleet: MySQL → Robot · Mfg: PostgreSQL
Distributed task — Industrial Fleet: Celery + django-celery-beat → Robot · Mfg: Celery · Airflow · Dagster · Ray
Process pool — Industrial Fleet: multiprocessing → Robot · Mfg: Ray · Dask
Replay format — Industrial Fleet: Avro → Robot · Mfg: MCAP · Parquet · RLDS
Storage — Industrial Fleet: GCS → Robot · Mfg: S3 · Azure Blob
LLM stack (untamedai.me) — Next.js (frontend) · Python FastAPI (backend) · Supabase (DB) · Cloudflare (hosting) · GPT + Claude Opus (LLM) · Polar (payments) → Foundation-model data / VLA / RAG
§11 About

Woon · Industrial Real-Time Data + LLM Product Engineer. Industrial vehicle-fleet telemetry pipeline as a team member → an LLM product (untamedai.me) operated solo → next: robot-manufacturing foundation-model data R&D, or traditional-manufacturing AI workflow pipelines.