Text Processing

We are intimately acquainted with every aspect of complex text processing: unusual character sets for input, publication research, and the precise postprocessing of large documents containing tables and pictures for publication.

Structured texts

The structured filing and automated downstream processing of documents have posed a data processing challenge for decades: storage in databases, research and indexing, web publishing, printed products, translation of only the modified portions, and so on.
XML editors may well prove to be the solution of choice. But users want to be able to use their familiar word processors (in most cases Microsoft Word) without having to worry about formal structures.
Our solutions extend Word with add-ins so that authoring complies with the strict formal criteria of a DTD (Document Type Definition) without the user being aware of it. The resulting texts can be published directly on the Web (e.g. as DocBook XML files), exchanged losslessly between organizations and processed automatically. Our solutions are employed in particular for the authoring of legislative documents in European Institutions and at the federal and state level in Germany.
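
To illustrate what "complying with strict formal criteria" means in practice, here is a minimal sketch in Python. It is not our add-in code and it does not read a real DTD: the element names (`act`, `title`, `article`, `number`, `paragraph`) and the `CONTENT_MODEL` table are hypothetical, and the check is a deliberate simplification of the content models a DTD can express.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: each element name maps to the exact sequence
# of child elements it must contain. A real DTD can also express optional
# and repeated children; this sketch does not.
CONTENT_MODEL = {
    "act": ["title", "article"],
    "article": ["number", "paragraph"],
}


def structure_errors(xml_text):
    """Return one message per element whose children break CONTENT_MODEL."""
    root = ET.fromstring(xml_text)
    errors = []
    for element in root.iter():  # visits the root and all descendants
        expected = CONTENT_MODEL.get(element.tag)
        if expected is None:
            continue  # no rule for this element, e.g. a text-only leaf
        actual = [child.tag for child in element]
        if actual != expected:
            errors.append(
                f"<{element.tag}>: expected {expected}, found {actual}"
            )
    return errors


VALID = (
    "<act><title>T</title>"
    "<article><number>1</number><paragraph>P</paragraph></article></act>"
)
# Same document with the mandatory <title> missing.
INVALID = (
    "<act>"
    "<article><number>1</number><paragraph>P</paragraph></article></act>"
)

print(structure_errors(VALID))
print(structure_errors(INVALID))
```

An authoring add-in would run checks of this kind in the background and guide the author, rather than reporting errors after the fact.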

Complex publications

Thanks to the possibilities offered by the Net, the path from the author of a text to its publication is becoming ever shorter. PDF now provides a press-ready rendering, and a PDF document is frequently generated and published on a website in lieu of printing.
Since most users and authors are accustomed to Microsoft Word, it should also be usable for preparing publications that meet professional, DTP-like requirements. One example is the statistics brochures published by EU bodies, which feature complex tables and charts and are produced in Word.
Our Word add-ins also support the creation, authoring and checking of large publications involving many authors at various sites. We compile the disparate parts, full of charts and tables, into a uniformly formatted and numbered document, while checking that the individual contributions are up to date.

PDF, PDF/A and interactive PDF forms

PDF is a ubiquitous format for electronic documents that was developed by Adobe and published in 1993. At the end of 2005 PDF/A was approved by the International Organization for Standardization (ISO) as a standard for long-term archival: “A file format based on PDF, called PDF/A, that provides a mechanism for representing electronic documents in a manner that preserves their visual appearance over time, independent of the tools and systems used for creating, storing or rendering the files.” (excerpt from ISO 19005-1).
PDF is also increasingly used as a format for electronic forms.

Our expertise in these areas is repeatedly in demand by public-sector institutions. For these bodies we have conducted several studies on PDF technology, applied our expertise in development projects (e.g. generating PDF from XML data), and designed PDF forms.


Thanks to the experience with multilingual documents that we acquired while serving European Institutions, we were instrumental in the development of the Unicode standard for character encoding: our research contributions and studies fed into the considerations on which the standardization of the Unicode character set was based. Our expertise in this field is constantly sought after by the EU Institutions, as each enlargement brings in new member states with their own character sets and sorting rules.
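
Why sorting rules matter can be shown with a small Python sketch. It is an illustration only, not our production code: the word list and the function `naive_sort_key` are made up for this example, and stripping accents is a crude approximation. Proper collation follows the Unicode Collation Algorithm with language-specific tailoring (Swedish, for instance, sorts ä after z).

```python
import unicodedata


def naive_sort_key(word):
    """Approximate sort key: ignore accents and case, keep the word as tie-breaker."""
    # NFD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks so that "Ä" is compared like "A".
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)


# Hypothetical sample words mixing plain and accented initials.
words = ["zebra", "Ähre", "apple", "Öl", "ocean"]

# Plain code point order pushes accented letters behind "z".
by_code_point = sorted(words)

# The naive key places them next to their base letters instead.
by_naive_key = sorted(words, key=naive_sort_key)

print(by_code_point)
print(by_naive_key)
```

The two printed lists differ: code point order places "Ähre" and "Öl" after "zebra", while the naive key files them under A and O. Neither order is universally right, which is exactly why each language's rules have to be taken into account.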


In the early 1980s we transferred text content from dedicated word processors and memory typewriters, with their vendor-specific storage formats, into the word processing systems common at the time (Microsoft Word for DOS and WordPerfect). Our in-depth knowledge of internal format and storage structures is more in demand than ever: our server software, implemented as a simple e-mail service, converts text between various Microsoft Office file formats, ODF and PDF, and uses OCR to process faxes and scanned or photocopied documents. The result: users no longer have to install and configure a wide variety of converter software products.
Our understanding of the internal structures of word processing systems also helps us create compatibility between Office Open XML (OOXML) and the OpenDocument Format (ODF).