The science behind Fieldloop
Research-grade NLP, deployed for sales intelligence.
The Fieldloop engine was developed as a joint project between Brighten Digital and the Faculty of Business and Economics at Mendel University in Brno. It is the analytical core that makes the editorial layer trustworthy. This page summarises how the engine works, why it was designed this way, and where to read the full peer-reviewed account.
The architecture in one paragraph
A BERTopic-style pipeline, tuned for multilingual CRM text.
The engine ingests short, free-text CRM mentions in Czech, English, and German. It removes personally identifying entities through a multilingual NER service, generates transformer embeddings of the cleaned text, reduces dimensionality with UMAP, and discovers thematic clusters with HDBSCAN. Class-based TF-IDF extracts representative keywords, an LLM assigns human-readable labels and business dimensions, and embedded Qdrant storage preserves continuity across chronological releases. The output is a structured set of topics, narrative shifts, and lineage indicators — the raw material the MediaHouse publication layer turns into editorial copy.
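The keyword-extraction step can be illustrated in isolation. Below is a minimal, standard-library sketch of class-based TF-IDF in the form popularised by BERTopic: a term's weight in a cluster is its normalised frequency in that cluster multiplied by log(1 + A / f_t), where A is the average number of words per cluster and f_t is the term's frequency across all clusters. The function name, the whitespace tokeniser, and the sample mentions are illustrative assumptions, not the engine's actual implementation.

```python
import math
from collections import Counter


def class_tfidf(clusters: dict[int, list[str]], top_n: int = 3) -> dict[int, list[str]]:
    """Return the top_n highest-weighted terms per cluster (c-TF-IDF sketch)."""
    if not clusters:
        return {}
    # Treat every cluster as one concatenated "class document".
    term_counts = {
        cid: Counter(word for doc in docs for word in doc.lower().split())
        for cid, docs in clusters.items()
    }
    # f_t: frequency of each term across all clusters.
    global_counts = Counter()
    for counts in term_counts.values():
        global_counts.update(counts)
    # A: average number of words per cluster.
    avg_words = sum(global_counts.values()) / len(term_counts)

    keywords = {}
    for cid, counts in term_counts.items():
        total = sum(counts.values())
        weights = {
            term: (tf / total) * math.log(1 + avg_words / global_counts[term])
            for term, tf in counts.items()
        }
        # Highest weight first; ties broken alphabetically for determinism.
        keywords[cid] = sorted(weights, key=lambda t: (-weights[t], t))[:top_n]
    return keywords


# Hypothetical, already-pseudonymised mentions grouped by cluster id.
sample = {
    0: ["[PERSON] asked about pricing", "pricing concerns raised by [ORG]", "pricing too high"],
    1: ["delivery delay reported", "delay in delivery to [LOC]", "delivery delay again"],
}
print(class_tfidf(sample, top_n=2))
```

In a real deployment the tokeniser would need stop-word handling and language-aware normalisation for Czech, English, and German; the sketch omits both to keep the weighting formula visible.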
Privacy by design
Pseudonymisation before any model — by architecture, not by policy.
Privacy in Fieldloop is structural. The engine is built so that personal data cannot leave the controlled boundary in identifiable form. The sections below describe how that is enforced at each stage of the pipeline.
What gets pseudonymised, and when
Before any document is embedded, before any topic is clustered, and before any LLM is invoked, the engine runs the source text through a multilingual named-entity recognition service. Personal names, organisation names, and location references are replaced with placeholders such as [PERSON], [ORG], and [LOC]. The NER service is a Docker-isolated sidecar combining a Slavic-language model and an English model, with a regular-expression fallback for cases the models miss. The cleaned text — and only the cleaned text — drives everything downstream.
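As a rough illustration of this step, the sketch below replaces entity spans with the placeholders named above and then applies a regular-expression fallback. It assumes the NER models have already returned character offsets; the Entity type, the honorific-based fallback pattern, and the sample sentence are hypothetical and stand in for whatever the sidecar actually uses.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity
    label: str   # "PERSON", "ORG" or "LOC"


# Hypothetical fallback rule: an honorific followed by one word.
TITLE_NAME = re.compile(r"(\b(?:Dr|Mr|Mrs|Ms)\.\s+)[^\W\d_]+")


def pseudonymise(text: str, entities: list[Entity]) -> str:
    """Replace model-detected spans, then apply the regex fallback."""
    # Work from right to left so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        text = text[:ent.start] + f"[{ent.label}]" + text[ent.end:]
    # Fallback pass for names the models missed; the honorific is kept.
    return TITLE_NAME.sub(r"\g<1>[PERSON]", text)


def span(text: str, surface: str, label: str) -> Entity:
    """Helper for the example: locate a surface form and build its span."""
    start = text.index(surface)
    return Entity(start, start + len(surface), label)


raw = "Dr. Novák from Zentiva in Brno asked about pricing. Call Mr. Smith."
detected = [span(raw, "Novák", "PERSON"), span(raw, "Zentiva", "ORG"), span(raw, "Brno", "LOC")]
print(pseudonymise(raw, detected))
```

Here the models are assumed to have missed "Smith", so the fallback pattern catches it. Only the returned string would be passed to embedding, clustering, and labelling.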
What is stored in the vector layer
The embedded Qdrant vector store contains only the cleaned, pseudonymised text. The original surface forms are retained separately, outside the analytical pipeline, in a tenant-isolated store. Re-identification — the mapping from [PERSON_xxx] back to a real name — happens only when the final published article is composed, and only inside the tenant's environment. No external service is ever asked to perform this mapping.
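The re-identification step described above can be sketched as a lookup performed inside the tenant boundary. The example assumes indexed placeholders in the style of [PERSON_xxx], extended by analogy to ORG and LOC, and a plain dictionary standing in for the tenant-isolated store; both the exact token format and the names used are assumptions made for illustration.

```python
import re

# Assumed token format: [PERSON_001], [ORG_007], [LOC_012], ...
PLACEHOLDER = re.compile(r"\[(?:PERSON|ORG|LOC)_\w+\]")


def reidentify(text: str, mapping: dict[str, str]) -> str:
    """Swap placeholders back to surface forms; unknown tokens stay untouched."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


# Hypothetical mapping held only in the tenant's own environment.
tenant_mapping = {
    "[PERSON_001]": "Jana Nováková",
    "[ORG_001]": "Example Pharma",
}
draft = "[PERSON_001] of [ORG_001] raised pricing concerns in [LOC_001]."
print(reidentify(draft, tenant_mapping))
```

Leaving unmapped tokens in place, rather than raising an error or guessing, means a gap in the mapping shows up as a visible placeholder in the draft instead of a wrong name.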
What goes to external models
External models (currently Anthropic Claude, plus OpenAI for image rendering) receive only pseudonymised text. Cluster labels, topic descriptions, and generated article copy are produced from text that has already had identifying entities replaced. The LLM never sees a real person's name, a real account name, or a real location during inference. At no point does an external provider receive enough data to re-identify a source.
Data residency and tenant isolation
Each Fieldloop tenant is isolated at the storage and processing level. Tenant data is not commingled in shared vector spaces or shared analytical runs. The default deployment topology places tenant infrastructure inside the European Economic Area, with the option to deploy inside a single national jurisdiction where compliance demands it (typical for pharma in Germany, Switzerland, and France). External LLM calls operate under enterprise agreements that contractually restrict retention and prohibit use of the data for model training.
Auditability
Every stage of the pipeline produces structured, inspectable output: which mentions entered a clustering run, which entities were detected, which were pseudonymised, which were used in which generated article. The architecture supports the audit-trail expectations typical of GxP-adjacent industries: who, what, when, on what input, with what output. Logs are tenant-scoped and retained according to the tenant's policy.
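One way to picture such a record is a small immutable structure with one field per audit question. The sketch below is an assumption about shape only: the field names, the choice to store SHA-256 digests instead of raw payloads, and the JSON log format are illustrative and are not taken from the engine.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    tenant_id: str      # scope: which tenant the record belongs to
    actor: str          # who: service or user that ran the stage
    stage: str          # what: pipeline stage name
    timestamp: str      # when: ISO 8601, UTC
    input_sha256: str   # on what input (digest, not the text itself)
    output_sha256: str  # with what output (digest, not the text itself)


def digest(payload: str) -> str:
    """Hex SHA-256 of a UTF-8 encoded payload."""
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_record(tenant_id: str, actor: str, stage: str,
                stage_input: str, stage_output: str,
                now: Optional[datetime] = None) -> AuditRecord:
    """Build one audit record for a completed pipeline stage."""
    now = now or datetime.now(timezone.utc)
    return AuditRecord(tenant_id, actor, stage, now.isoformat(),
                       digest(stage_input), digest(stage_output))


def to_log_line(record: AuditRecord) -> str:
    """Serialise a record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = make_record("tenant-a", "ner-sidecar", "pseudonymise",
                     "raw mention text", "[PERSON] asked about pricing")
print(to_log_line(record))
```

Recording digests keeps the log itself free of source text while still letting an auditor confirm that a given input produced a given output.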
Why this matters more for editorial AI than for dashboards
A dashboard surfaces numbers. A weekly editorial publication surfaces decisions. A privacy failure in a dashboard exposes a chart. A privacy failure in an editorial publication exposes a named individual in a leadership briefing that is then forwarded across the organisation and possibly externally. The asymmetry of consequence is the architectural reason Fieldloop treats privacy as upstream design rather than downstream policy.
Compliance posture
Fieldloop is designed to align with the European General Data Protection Regulation, with sector-specific provisions applicable to pharma (including pharmacovigilance data-handling expectations), financial services, and energy infrastructure. The pipeline supports the data-subject rights expected under GDPR — access, rectification, restriction, erasure — operating on the original-text store rather than on the analytical layer, because the analytical layer never contains identifying information in the first place. Forthcoming obligations under the EU AI Act are addressed by the same architectural choices.
Multilingual processing
Czech, English, and German — and a path to more.
The engine treats multilingual operation as a first-class design constraint rather than a feature to be added later. Embeddings are produced by a multilingual transformer model. NER is split into a Slavic-language pipeline and an English pipeline running side by side.
Topic labels are generated in a canonical analytical language for cross-market comparability, while source mentions preserve their original language for editorial fidelity. Adding a new market means extending the language coverage of a single subsystem, not rebuilding the pipeline.
Continuity & narrative shift
Tracking change, not just classifying mentions.
Most topic-modelling systems analyse one batch at a time. Fieldloop's engine is built to operate over a chronological series of releases — week after week, month after month — and to recognise when something has shifted. Continuity scores link topics across releases.
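The page does not specify how continuity scores are computed. As one plausible reading, the sketch below scores continuity as the cosine similarity between topic centroid vectors from two consecutive releases and links each current topic to its closest predecessor when the score clears a threshold. The function names, the 0.8 threshold, and the two-dimensional toy vectors are all assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_topics(previous: dict[str, list[float]],
                current: dict[str, list[float]],
                threshold: float = 0.8) -> dict[str, tuple]:
    """Map each current topic id to (linked previous id or None, best score)."""
    links = {}
    for cid, cvec in current.items():
        best_id, best_score = None, 0.0
        for pid, pvec in previous.items():
            score = cosine(cvec, pvec)
            if score > best_score:
                best_id, best_score = pid, score
        # Below the threshold the topic is treated as having no predecessor.
        links[cid] = (best_id if best_score >= threshold else None, best_score)
    return links


# Toy centroids; real embeddings would have hundreds of dimensions.
last_release = {"pricing": [1.0, 0.0], "delivery": [0.0, 1.0]}
this_release = {"t1": [0.9, 0.1], "t2": [0.7, 0.7]}
print(link_topics(last_release, this_release))
```

In this toy data "t1" links to "pricing", while "t2" sits between both earlier topics and falls below the threshold, so it is left unlinked.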
Narrative-shift categories capture volume increases, volume decreases, keyword drift within a stable topic, and the emergence of genuinely new themes. The editorial layer leans heavily on these signals to decide what is worth a story this week and what is background.
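The four categories above can be expressed as a simple rule cascade over two snapshots of the same topic. The sketch below is hypothetical: the TopicSnapshot type, the 1.5x volume ratio, the Jaccard-overlap test for keyword drift, and the extra "stable" outcome are assumptions chosen to make the categories concrete, not the engine's actual rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TopicSnapshot:
    mention_count: int    # number of mentions assigned to the topic
    keywords: frozenset   # representative keywords for the release


def classify_shift(prev: Optional[TopicSnapshot], curr: TopicSnapshot,
                   volume_ratio: float = 1.5,
                   drift_threshold: float = 0.5) -> str:
    """Return the first matching narrative-shift category for a topic."""
    if prev is None:
        # No linked predecessor in the previous release.
        return "new_theme"
    if curr.mention_count >= prev.mention_count * volume_ratio:
        return "volume_increase"
    if curr.mention_count * volume_ratio <= prev.mention_count:
        return "volume_decrease"
    # Jaccard overlap of keyword sets; low overlap means the wording moved.
    union = prev.keywords | curr.keywords
    overlap = len(prev.keywords & curr.keywords) / len(union) if union else 1.0
    if overlap < drift_threshold:
        return "keyword_drift"
    return "stable"


before = TopicSnapshot(10, frozenset({"price", "discount", "tender"}))
after = TopicSnapshot(11, frozenset({"price", "rebate", "contract"}))
print(classify_shift(before, after))
```

Because the rules are checked in order, a topic that both grows and drifts is reported as a volume increase; a production system might return several labels instead.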
Protective depth
Polished delivery raises the standard for what sits beneath it.
A weekly editorial publication is faster to read and easier to act on than an analytical dashboard. That is its point. It is also its risk. An elegant narrative is more persuasive than a noisy chart, and a poorly grounded narrative is therefore more dangerous than a poorly grounded chart.
The research argument behind Fieldloop is that rigour in the engine (privacy controls, robust clustering, traceable lineage, narrative-shift detection) is not optional infrastructure beneath the editorial layer. It is the protective infrastructure that makes high-velocity editorial delivery commercially and ethically defensible.
Who built this
A consultancy with regulatory scars, a university with methodological depth.
Brighten Digital is a Prague-based customer-experience consultancy and the technical lead of the Fieldloop programme. We are partners of Salesforce, SAP, and Oracle, delivering digital-transformation projects for large multinational organisations. Our delivery experience is concentrated in regulated industries (pharma, energy, financial services), where the gap between data and decision is widest and the consequences of getting it wrong are most asymmetric.
Mendel University in Brno, through its Faculty of Business and Economics, provides the methodological core: multilingual NLP, topic modelling, privacy-aware preprocessing, and evaluation rigour.
The two sides did not hand the work off to each other. They translated method and application jointly, in a shared iterative pipeline — a working pattern that is the subject of the paper as much as the architecture itself.
The full paper
From Campus to Company — Collaborative Innovation in the Age of AI
Authored by Brighten Digital and Mendel University in Brno. Presented at the IDIMT 2026 conference (Interdisciplinary Information Management Talks), held 2–4 September 2026 in Hradec Králové, Czech Republic. Lead academic author: Prof. Tomáš Pitner.
Counting down to presentation
The paper is under embargo until it is presented at IDIMT 2026. Leave your email and we will send you the paper the moment the embargo lifts.
The paper places Fieldloop in the broader context of university–industry collaboration in applied AI. It documents the engine's technical architecture, the role of synthetic-data validation, and the editorial delivery layer, and argues that protective depth (privacy, rigour, traceability) is what makes editorial AI defensible at speed.
Project funding and timeline
A research and development programme co-funded by the European Union.
Fieldloop has been under active development since May 2025. Total committed investment in the programme — engineering, research, and partner collaboration — exceeds EUR 1,000,000.
The work is conducted under the project Fieldloop.ai: Vývoj inteligentní automatizace pro CRM systémy (Fieldloop.ai: Development of Intelligent Automation for CRM Systems), registration number CZ.01.01.01/01/24_062/0007568, funded under the Operational Programme Technologies and Applications for Competitiveness (OP TAK), administered by the Ministry of Industry and Trade of the Czech Republic, and co-financed by the European Union.
The programme runs in collaboration with Mendel University in Brno, Faculty of Business and Economics.