Engineering

Dec 2024–Jul 2025

AI-Powered Workflow Automation

TECHWAVE & OPEO · Industrial AI Internship

Python · Mistral (local LLM) · PyMuPDF · pandas · openpyxl · REST APIs · LM Studio · Ollama

102 days / year automated

€85,750 gross ROI / year

5 use cases deployed

7 months of mission

TECHWAVE is an industrial electronics group. For seven months, I sat at the interface between the AI consultancy OPEO and TECHWAVE's engineering and procurement teams — listening to where time was lost, then building Python pipelines to recover it.

Every pipeline runs entirely on-premise. No data leaves the company network. This was a non-negotiable constraint given the confidentiality of client specifications and regulatory documents — and it shaped every architectural decision.

OPEO, TECHWAVE and Arts et Métiers partnership
The three-way setup behind this internship: OPEO (AI consultancy methodology), TECHWAVE (industrial host and data), Arts et Métiers (academic framework).

All inference runs on-premise. Zero data leaves the network — and the client never had to trust a third-party API.

The five use cases

01

Order processing

Automated document parsing and entry into Sage 100 ERP. 11.94 days recovered per year from a single pipeline.

02

RAO sourcing

Component sourcing from supplier APIs (Mouser, DigiKey). The pipeline detects header rows, maps column names, queries pricing and availability, then exports a formatted Excel ready for import.

03

BOM update — Refabrication

Bills of materials updated automatically when a refab is triggered. 6 days/year recovered — deceptively simple, but the column-matching logic handles messy real-world files.

04

Traceability & review

Document review and traceability chains automated. 34.29 days recovered per year, the second-largest gain after the compliance pipeline.

05

Regulatory compliance matrices

The most technically complex pipeline. Input: regulatory PDFs (200+ pages). Output: structured compliance matrix with article, description, and classification. 39.64 days recovered per year.

RAO sourcing pipeline
The RAO sourcing pipeline — BOM in, automated distributor lookup (Mouser, DigiKey) with error handling, formatted Excel and Sage import out.
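
The "error handling" part of the lookup can be illustrated with a minimal sketch. It assumes the distributor client is passed in as a plain callable; `lookup_with_retry` and its parameters are hypothetical names, and no real Mouser or DigiKey endpoint is shown. The idea is that a failed part number becomes an error row in the output instead of stopping the whole BOM run.

```python
import time
from typing import Callable, Optional


def lookup_with_retry(
    part_number: str,
    lookup: Callable[[str], dict],
    retries: int = 3,
    delay_s: float = 0.0,
) -> dict:
    """Call a distributor lookup, retry on failure, and never raise.

    `lookup` is a hypothetical stand-in for a Mouser or DigiKey client call.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, retries + 1):
        try:
            result = lookup(part_number)
            return {"part_number": part_number, "status": "ok", **result}
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            time.sleep(delay_s * attempt)  # simple linear back-off
    # All attempts failed: keep the row so the export shows what is missing.
    return {"part_number": part_number, "status": "error", "error": str(last_error)}
```

Returning an error row rather than raising keeps one unknown reference from blocking the other lines of the export.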
BOM matching code
Column-matching logic for BOM ingestion — fuzzy-matches inconsistent supplier column headers (nomenclature, distributeur, référence) before the pipeline can run.
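
As an illustration of the column-matching idea, here is a minimal sketch using only the standard library (`difflib`). The canonical field names, alias lists and the 0.75 cutoff are assumptions chosen for the example, not the values used in the TECHWAVE pipeline.

```python
import difflib
import unicodedata

# Hypothetical canonical fields and the spellings accepted for each.
CANONICAL = {
    "reference": ["reference", "ref", "part number", "mpn"],
    "distributor": ["distributeur", "distributor", "fournisseur", "supplier"],
    "designation": ["nomenclature", "designation", "description"],
    "quantity": ["quantite", "qte", "qty", "quantity"],
}


def normalize(label) -> str:
    """Lowercase, strip accents and collapse whitespace and underscores."""
    text = unicodedata.normalize("NFKD", str(label))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(text.lower().replace("_", " ").split())


def match_column(label, cutoff=0.75):
    """Return the canonical field closest to `label`, or None below the cutoff."""
    norm = normalize(label)
    best_field, best_score = None, cutoff
    for field, aliases in CANONICAL.items():
        for alias in aliases:
            score = difflib.SequenceMatcher(None, norm, alias).ratio()
            if score >= best_score:
                best_field, best_score = field, score
    return best_field


def find_header_row(rows, min_hits=2):
    """Index of the first row with at least `min_hits` recognised columns."""
    for index, row in enumerate(rows):
        hits = {match_column(cell) for cell in row if cell not in (None, "")}
        hits.discard(None)
        if len(hits) >= min_hits:
            return index
    return None


def map_columns(header):
    """Map each canonical field to the position of its first matching column."""
    mapping = {}
    for position, label in enumerate(header):
        field = match_column(label)
        if field and field not in mapping:
            mapping[field] = position
    return mapping
```

Rows read from a spreadsheet (for example with openpyxl or pandas) can be passed in as plain lists, so title rows and blank rows above the real header are skipped before any lookup runs.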
Time savings across 5 use cases
Impact summary — 102 days/year automated across 5 use cases, €85,750 gross gain. Source: internal TECHWAVE analysis, shared with permission.
Detailed time savings breakdown per use case
The detailed breakdown behind the headline number — time before/after and gross gain for each of the 5 use cases, in days and euros per year.

The hardest pipeline: regulatory compliance

Regulatory PDFs are 200+ pages of dense structured text. Three approaches were tested before reaching production.

Rejected

Rule-based extraction

Deterministic section parsing — fast, but collapses on inconsistent formatting. Too many false positives.

Tested

Full LLM extraction

Flexible: the model reads intent, not just structure. But it hallucinates on dense regulatory text, and precision was too low for production.

Deployed

Hybrid approach

PyMuPDF handles structure deterministically. LLM only processes ambiguous blocks. Best of both worlds — chosen for production.

Three extraction methods compared
The three methods side by side — rule-based (1), full-LLM (2), and the hybrid pipeline (3) that combines deterministic PDF parsing with targeted LLM calls.
Rule-based extraction failure on a CDC document
Rule-based extraction, tested on a real CDC (cahier des charges): too many requirements detected, badly sorted. The deterministic approach alone wasn't viable.
Rule-based extraction failure on a CGA document
Same failure mode on a CGA (conditions générales d'achat) document — confirming the rule-based approach couldn't generalise across document types.
Hybrid pipeline architecture
The hybrid extraction pipeline — PyMuPDF parses document structure deterministically; the LLM only handles sections where structure is ambiguous.
Production compliance matrix output
The hybrid pipeline in production — clean compliance matrix with article, date, title, description and technical/legal classification, ready for review.
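
The routing idea behind the hybrid approach can be sketched as follows. This is a simplified illustration, not the production code: the article heading pattern, the `build_matrix` and `resolve_with_llm` names and the row fields are assumptions, and the technical/legal classification step is left out. Blocks whose first line matches the heading pattern are handled by rules alone; every other block is handed to an LLM callable, which may return a row or reject the block.

```python
import re
from typing import Callable, Iterable, Optional

# Assumed heading format, e.g. "Article 4.2 - Title" or "Art. 12: Title".
ARTICLE_RE = re.compile(
    r"^(?:Article|Art\.)\s+(\d+(?:\.\d+)*)\s*[-:.]?\s*(.*)$", re.IGNORECASE
)


def load_blocks(pdf_path: str) -> list:
    """Read text blocks with PyMuPDF (imported lazily, only needed for real PDFs)."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        # Each block is (x0, y0, x1, y1, text, block_no, block_type); type 0 is text.
        return [
            block[4].strip()
            for page in doc
            for block in page.get_text("blocks")
            if block[6] == 0
        ]


def build_matrix(
    blocks: Iterable[str],
    resolve_with_llm: Callable[[str], Optional[dict]],
) -> list:
    """Turn text blocks into matrix rows, calling the LLM only when rules fail."""
    rows = []
    for block in blocks:
        first_line, _, rest = block.strip().partition("\n")
        match = ARTICLE_RE.match(first_line.strip())
        if match:
            # Clear structure: handled deterministically, no model call.
            description = " ".join(f"{match.group(2)} {rest}".split())
            rows.append(
                {"article": match.group(1), "description": description, "source": "rules"}
            )
            continue
        # Ambiguous structure: the (hypothetical) LLM callable decides.
        resolved = resolve_with_llm(block)
        if resolved is not None:
            rows.append({**resolved, "source": "llm"})
    return rows
```

Keeping a `source` field on each row makes it easy for a reviewer to check the LLM-derived rows first, since those are the ones where hallucination is possible.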

Local LLM infrastructure

All LLM inference runs via Ollama (llama3), wrapped in a TECHWAVE-branded secure chat interface. The system accepts file drag-and-drop, encrypts connections end-to-end, and stores nothing externally. The interface was built to be adopted by non-technical procurement staff.

TECHWAVE-branded secure chat interface
The secure chat interface built around Ollama — TECHWAVE-branded, encrypted connection, drag-and-drop file ingestion for non-technical staff.
On-premise LLM architecture
Local LLM deployment — Ollama running on company hardware. All data stays inside the network perimeter.
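
For illustration, a call to a local Ollama server can be sketched with the standard library alone. The endpoint is Ollama's default local address and `/api/generate` route; the `LOCAL_HOSTS` allow-list and the function names are assumptions added here to make the "nothing leaves the network" constraint explicit in code, not a description of the deployed interface.

```python
import json
import urllib.request
from urllib.parse import urlparse

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
LOCAL_HOSTS = {"localhost", "127.0.0.1"}  # hypothetical allow-list


def build_request(prompt: str, model: str = "llama3", url: str = OLLAMA_URL):
    """Build a non-streaming generate request, refusing any non-local host."""
    host = urlparse(url).hostname
    if host not in LOCAL_HOSTS:
        raise ValueError(f"refusing non-local inference host: {host}")
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3", url: str = OLLAMA_URL,
             timeout_s: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_request(prompt, model, url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        body = json.load(response)
    return body["response"]
```

With an Ollama server running on the same machine, `generate("Summarise article 4.2 in one sentence.")` returns the model's answer as a string; pointing `url` at an external host raises before any data is sent.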

What this demonstrates

This internship was not about applying AI in a sandbox. The pipelines handle real client documents, run in production, and the ROI figures come from TECHWAVE's own time-tracking data. The key engineering decisions — hybrid pipelines, local LLM inference, modular Python architecture — came from real constraints (confidentiality, regulatory precision, staff adoption), not from textbook choices.


Client names, specific document content, and detailed metrics are confidential. Figures shown with permission.