AI-Powered Workflow Automation — Pierre-Antoine Faribaud

102

days / year automated

€85,750

gross ROI / year

5

use cases deployed

7

months of mission

TECHWAVE is an industrial electronics group. For seven months, I sat at the interface between the AI consultancy OPEO and TECHWAVE's engineering and procurement teams — listening to where time was lost, then building Python pipelines to recover it.

Every pipeline runs entirely on-premise. No data leaves the company network. This was a non-negotiable constraint given the confidentiality of client specifications and regulatory documents — and it shaped every architectural decision.

OPEO, TECHWAVE and Arts et Métiers partnership — The three-way setup behind this internship: OPEO (AI consultancy methodology), TECHWAVE (industrial host and data), Arts et Métiers (academic framework).

Because all inference runs on-premise, the client never had to trust a third-party API.

The five use cases

01

Order processing

Automated document parsing and entry into Sage 100 ERP. 11.94 days recovered per year from a single pipeline.

02

RAO sourcing

Component sourcing from supplier APIs (Mouser, DigiKey). The pipeline detects header rows, maps column names, queries pricing and availability, then exports a formatted Excel ready for import.

03

BOM update — Refabrication

Bills of materials updated automatically when a refab is triggered. 6 days/year recovered — deceptively simple, but the column-matching logic handles messy real-world files.

04

Traceability & review

Document review and traceability chains automated. 34.29 days recovered per year, the second-largest gain after the compliance pipeline.

05

Regulatory compliance matrices

The most technically complex pipeline. Input: regulatory PDFs (200+ pages). Output: structured compliance matrix with article, description, and classification. 39.64 days recovered per year.

The RAO sourcing pipeline — BOM in, automated distributor lookup (Mouser, DigiKey) with error handling, formatted Excel and Sage import out.
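A minimal sketch of what a distributor lookup with error handling could look like. It does not call the real Mouser or DigiKey APIs: each distributor is represented by a hypothetical callable (`LookupFn`) that the caller injects, and the rule "keep the cheapest in-stock offer" is an assumption made for illustration, not the selection rule used at TECHWAVE.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Offer:
    distributor: str
    part_number: str
    unit_price: float
    stock: int


# Hypothetical client signature: takes a part number, returns an Offer,
# returns None when the part is unknown, or raises on network/API failure.
LookupFn = Callable[[str], Optional[Offer]]


def lookup_part(part_number: str, clients: dict[str, LookupFn]):
    """Query every distributor; return (best offer or None, list of error messages)."""
    offers: list[Offer] = []
    errors: list[str] = []
    for name, client in clients.items():
        try:
            offer = client(part_number)
        except Exception as exc:
            # One failing distributor must not abort the whole BOM run.
            errors.append(f"{name}: {exc}")
            continue
        if offer is not None and offer.stock > 0:
            offers.append(offer)
    # Assumed selection rule: cheapest offer that is actually in stock.
    best = min(offers, key=lambda o: o.unit_price, default=None)
    return best, errors
```

Collecting errors instead of raising lets the export step flag the affected BOM lines for manual review while the remaining lines are still priced.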

BOM matching code — Column-matching logic for BOM ingestion — fuzzy-matches inconsistent supplier column headers (nomenclature, distributeur, référence) before the pipeline can run.
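The header detection and column matching described above can be sketched with the standard library alone. This is an illustration under assumptions, not the production code: the alias lists in `CANONICAL`, the 0.8 similarity cutoff, and the "at least two recognised headers" rule are all hypothetical choices.

```python
import difflib
import unicodedata

# Canonical column names with a few aliases; illustrative, not TECHWAVE's actual list.
CANONICAL = {
    "reference": ["reference", "ref", "part number", "mpn"],
    "distributeur": ["distributeur", "distributor", "fournisseur", "supplier"],
    "nomenclature": ["nomenclature", "designation", "description"],
}


def normalise(header) -> str:
    """Lowercase, strip accents and separators so 'Référence ' becomes 'reference'."""
    text = unicodedata.normalize("NFKD", str(header))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(text.lower().replace("_", " ").replace(".", " ").split())


def match_column(header, cutoff: float = 0.8):
    """Return the canonical name for a raw header, or None if nothing is close enough."""
    alias_to_canonical = {a: c for c, aliases in CANONICAL.items() for a in aliases}
    hits = difflib.get_close_matches(
        normalise(header), list(alias_to_canonical), n=1, cutoff=cutoff
    )
    return alias_to_canonical[hits[0]] if hits else None


def find_header_row(rows, min_matches: int = 2):
    """Return the index of the first row with enough recognised headers, else None."""
    for index, row in enumerate(rows):
        matched = {match_column(cell) for cell in row if cell not in (None, "")}
        matched.discard(None)
        if len(matched) >= min_matches:
            return index
    return None
```

Normalising before matching handles accents and casing deterministically, so the fuzzy comparison only has to absorb genuine spelling variation such as plurals or typos.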

Time savings across 5 use cases — Impact summary — 102 days/year automated across 5 use cases, €85,750 gross gain. Source: internal TECHWAVE analysis, shared with permission.
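The headline figures can be cross-checked against the per-use-case numbers quoted in this write-up. The short calculation below uses only figures stated on this page; the remainder and the per-day value it prints are plain arithmetic on those figures, not numbers reported by TECHWAVE.

```python
# Days recovered per year, as stated for four of the five use cases on this page.
stated_days = {
    "Order processing": 11.94,
    "BOM update (refabrication)": 6.0,
    "Traceability & review": 34.29,
    "Regulatory compliance matrices": 39.64,
}
HEADLINE_DAYS = 102        # headline figure, possibly rounded
HEADLINE_GAIN_EUR = 85_750

itemised = round(sum(stated_days.values()), 2)
not_itemised = round(HEADLINE_DAYS - itemised, 2)  # share not broken out in the text
gain_per_day = round(HEADLINE_GAIN_EUR / HEADLINE_DAYS, 2)

print(f"Itemised in the text:  {itemised} days/year")
print(f"Not itemised:          {not_itemised} days/year")
print(f"Implied gain per day:  EUR {gain_per_day}")
```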

Detailed time savings breakdown per use case — The detailed breakdown behind the headline number — time before/after and gross gain for each of the 5 use cases, in days and euros per year.

The hardest pipeline: regulatory compliance

Regulatory PDFs are 200+ pages of dense structured text. Three approaches were tested before reaching production.

✕ Rejected

Rule-based extraction

Deterministic section parsing — fast, but collapses on inconsistent formatting. Too many false positives.

~ Tested

Full LLM extraction

Flexible — the model reads intent, not just structure. But hallucinates on dense regulatory text. Precision too low for production.

✓ Deployed

Hybrid approach

PyMuPDF handles structure deterministically. The LLM only processes ambiguous blocks. This keeps rule-based precision where the layout is clean and limits hallucination risk to the few blocks the rules cannot resolve.

Three extraction methods compared — The three methods side by side — rule-based (1), full-LLM (2), and the hybrid pipeline (3) that combines deterministic PDF parsing with targeted LLM calls.

Rule-based extraction failure on a CDC document — Rule-based extraction, tested on a real CDC (cahier des charges): too many requirements detected, badly sorted. The deterministic approach alone wasn't viable.

Rule-based extraction failure on a CGA document — Same failure mode on a CGA (conditions générales d'achat) document — confirming the rule-based approach couldn't generalise across document types.

Hybrid pipeline architecture — The hybrid extraction pipeline — PyMuPDF parses document structure deterministically; the LLM only handles sections where structure is ambiguous.
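A rough sketch of the routing idea behind such a hybrid pipeline, under stated assumptions. The heading pattern `ARTICLE_RE`, the function names, and the rule "a block is ambiguous when the heading pattern does not match" are hypothetical. `ask_llm` is an injected callable standing in for the local model. Only two PyMuPDF calls are used, `fitz.open` and `page.get_text("blocks")`.

```python
import re
from typing import Callable, Iterable

# Assumed heading shape, e.g. "Article 4.2 - Title"; real documents need their own patterns.
ARTICLE_RE = re.compile(
    r"^\s*(?:Article|Art\.)\s+(\d+(?:\.\d+)*)\s*[:.\-]?\s*(.*)$", re.IGNORECASE
)


def read_blocks(pdf_path: str) -> list[str]:
    """Extract non-empty text blocks page by page with PyMuPDF (imported lazily)."""
    import fitz  # PyMuPDF

    blocks = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # Each block is (x0, y0, x1, y1, text, block_no, block_type); type 0 is text.
            for block in page.get_text("blocks"):
                if block[6] == 0 and block[4].strip():
                    blocks.append(block[4].strip())
    return blocks


def extract(blocks: Iterable[str], ask_llm: Callable[[str], dict]) -> list[dict]:
    """Parse clearly headed blocks with the regex; send the rest to the LLM."""
    rows = []
    for text in blocks:
        first_line, _, rest = text.partition("\n")
        match = ARTICLE_RE.match(first_line)
        if match:
            rows.append({
                "article": match.group(1),
                "title": match.group(2).strip(),
                "description": rest.strip(),
                "source": "rule",
            })
        else:
            row = dict(ask_llm(text))  # only ambiguous blocks reach the model
            row["source"] = "llm"
            rows.append(row)
    return rows
```

Tagging each row with its `source` makes it easy for a reviewer to focus on the LLM-derived rows, which carry the higher error risk.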

Production compliance matrix output — The hybrid pipeline in production — clean compliance matrix with article, date, title, description and technical/legal classification, ready for review.
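One way to represent a matrix row with the five fields named above is a small validated record. The sketch below is an assumption about shape only: the class name `MatrixRow`, the two allowed classification labels, and the CSV export (chosen to keep the example dependency-free) are illustrative and do not reflect the real output format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

ALLOWED_CLASSES = {"technical", "legal"}


@dataclass(frozen=True)
class MatrixRow:
    article: str
    date: str
    title: str
    description: str
    classification: str

    def __post_init__(self):
        # Reject anything outside the two labels before it reaches the reviewer.
        if self.classification not in ALLOWED_CLASSES:
            raise ValueError(f"unknown classification: {self.classification!r}")


def to_csv(rows) -> str:
    """Serialise rows to CSV text with one header line and one line per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(MatrixRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Placeholder content, not taken from any real document.
example = MatrixRow("4.2", "2024-01-01", "Storage", "Placeholder text.", "technical")
print(to_csv([example]))
```

Validating the classification at construction time means a malformed LLM answer fails loudly instead of silently producing a third category in the matrix.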

Local LLM infrastructure

All LLM inference runs via Ollama (llama3), wrapped in a TECHWAVE-branded secure chat interface. The system accepts file drag-and-drop, encrypts connections end-to-end, and stores nothing externally. The interface was built to be adopted by non-technical procurement staff.

On-premise LLM architecture — Local LLM deployment — Ollama running on company hardware. All data stays inside the network perimeter.
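For orientation, this is roughly what a call to a local Ollama server looks like over its REST API. The sketch assumes Ollama's default endpoint on `localhost:11434` and that the `llama3` model has already been pulled. The function names are hypothetical, and the chat interface described above is not shown.

```python
import json
import urllib.request

# Ollama's default local endpoint; the request never leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the 'response' field."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Using only `urllib` keeps the client free of third-party dependencies, which can simplify deployment on a locked-down internal network.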

What this demonstrates

This internship was not about applying AI in a sandbox. The pipelines handle real client documents and run in production, and the ROI figures come from TECHWAVE's own time-tracking data. The key engineering decisions, namely hybrid pipelines, local LLM inference and a modular Python architecture, came from real constraints: confidentiality, regulatory precision and staff adoption. None of them were textbook choices.

Client names, specific document content, and detailed metrics are confidential. Figures shown with permission.