Document Processing Systems
(Documentum, IBM FileNet Content Manager, LivelinkECM-eDocs Suite, TRIM Context 6, Vignette Records & Documents, Arbortext, XQuery, XSLT, ...)
Algorithmic manipulation of symbolic documents has been a core capability of Mathematica since the mid-1990s. Widely used within Mathematica itself, it has made possible an expanding series of large-scale document processing projects, not least the nearly one million pages of content on Wolfram Research's own websites.
Mathematica is a unique bridge between documents and algorithmic computation. With its fundamentally symbolic architecture, it handles structured documents just like any other data, using both its vast network of built-in algorithms and its symbolic language, which integrates rule-based, pattern-based, functional, string-based and other programming paradigms.
With the ability to import and export hundreds of document, web, graphics, data and other formats, Mathematica can automatically extract relevant elements, process and analyze them, then use its built-in typesetting, layout, visualization and interface-building capabilities to programmatically create final static or dynamic content. Tightly integrated with both XML and external databases, and with support for modern distributed computing, Mathematica makes possible a major new level of document processing—allowing not only content and format transformations, but also full algorithmic analysis and processing of text and structure. With its immediate ability to call on sophisticated algorithms, Mathematica also makes possible new kinds of automated project management and document quality assurance that greatly enhance the efficiency of large-scale document processing projects.
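As a minimal sketch of what handling a structured document "just like any other data" can look like, the following imports a small XML fragment as a symbolic expression, extracts elements by pattern, and rewrites the tree with a single transformation rule. The sample markup and the variable names (xml, tree, paras, converted) are hypothetical illustrations, not taken from any of the projects mentioned here.

```wolfram
(* Hypothetical sample document, made up for this illustration *)
xml = "<doc><title>Report</title><para>First point.</para><para>Second point.</para></doc>";

(* Import as a symbolic tree built from XMLElement[tag, attributes, children] *)
tree = ImportString[xml, "XML"];

(* Extract the text of every <para> element by pattern matching at all levels *)
paras = Cases[tree, XMLElement["para", _, {text_String}] :> text, Infinity];

(* Rename every <para> element to <p> with one tree transformation rule *)
converted = tree /. XMLElement["para", attrs_, body_] :> XMLElement["p", attrs, body];

(* Serialize the transformed tree back to an XML string *)
ExportString[converted, "XML"]
```

Because the imported document is an ordinary expression, the same Cases and rule-replacement operations used on any other data apply to it unchanged.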
Document Processing System Features in Mathematica:
- State-of-the-art string manipulation
- Tree-based symbolic transformation rules
- Full support for general XML
- Advanced symbolic document object model
- Fully scalable to very large document collections
- Ability to extract elements and metadata from document formats
- Systemwide Unicode support
- webMathematica for dynamic content delivery
- Used in many large-scale industrial projects
- Worldwide network of experienced Mathematica programmers
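
The string manipulation item in the list above can be sketched with symbolic string patterns. This is an illustrative example only; the input text, the placeholder URLs and the date format are assumptions made up for the sketch.

```wolfram
(* Hypothetical input text; the URLs are placeholders *)
text = "See http://example.com/a and http://example.com/b for details.";

(* Extract every substring that starts with "http://" and runs up to the next whitespace *)
links = StringCases[text, "http://" ~~ (Except[WhitespaceCharacter] ..)]

(* Rewrite a date from MM/DD/YYYY to YYYY-MM-DD using named parts of a string pattern *)
StringReplace["Revised 03/15/2007.",
  (m : DigitCharacter ..) ~~ "/" ~~ (d : DigitCharacter ..) ~~ "/" ~~ (y : DigitCharacter ..) :>
    y <> "-" <> m <> "-" <> d]
```

String patterns are written with the same pattern constructs (named parts, repetition, Except) used for rules on expressions, so text-level and tree-level transformations can be combined in one program.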
Key Advantages of Mathematica for Document Processing:
- Full multiparadigm symbolic programming language
- World's most powerful practical pattern transformation system
- Import and export of hundreds of standard and specialized formats
- High-quality built-in text and math typesetting, table layout and graphics generation
- Systematic HTML generation, with image maps, CSS, etc.
- Symbolic representation of website content for generation, transformation, etc.
- Ability to extract text, links and other data from HTML, PDF, Word, TeX, etc.
- Import, processing and analysis of image, sound and multimedia data
- Support for high-quality print output, including PDF, TIFF, SCT, etc.
- Full integration with all standard SQL databases
- Built-in full-function WYSIWYG document interface
- Instant high-level construction of robust custom interfaces
- Standards-based integrated development environment
- Extensive data, tree and graph analysis, with highly automated visualization
- Large-scale string and text analysis, with built-in linguistic databases
- Support for mbox, Apache log, RSS and other systems formats
- Full support for all standard computer platforms
- Integrated support for grid computing, web services, etc.
- 20-year history of compatible language and system development
Interoperability with Document Processing Systems:
- Import and export of all standard document formats
- Support for arbitrary XML import and export
- Eclipse-based cross-language integrated development environment
- Immediate integration with Java, .NET, C/C++, scripting languages
Interesting Tidbits:
- Mathematica's vast in-product and on-web documentation system is built with Mathematica
- Wolfram MathWorld—the #1 math information website—is created with Mathematica
- The documents for Stephen Wolfram's award-winning book A New Kind of Science were processed with Mathematica
- The Wolfram Demonstrations Project is a pure Mathematica system
- The math formulas in U.S. patents are typeset with Mathematica
- Wolfram Research was a dominant force behind the MathML standard
- Mathematica supported symbolic documents before the term "XML" was coined