Tech Pubs: Classification vs XML Metadata

By Trish Laedtke


Technical publications software products vary in their approach to classification. Tech pubs authoring requires classification of the content. It’s one of the most important but often difficult parts of the tech writer’s job.

The content you create needs to be found before it can be used by others.

As tech writers create XML content, they add metadata and key words to classify it.

Some tech pubs organizations go so far as to create:

  • approved key word lists
  • hierarchies of categorization
  • formal taxonomies

What if a user needs to add a term? There’s a formal process to review, approve, and add it.

However, as our product lines grow and change, our “known” – our ontology – grows.
The context of metadata and keywords changes as well.

Now we’ve got an entire set of extra tasks. It will take even more time to manage the classification of tech publication content.

Content Classification

Prefer XML?

In XML content, the involvement and additional work grow exponentially. Why? Because we put metadata and keywords on the object AND in the XML content itself. We’ve been working towards richer, more complete data. But does all that richness have to be in the XML?

Are you an XML purist? You might want to stop reading now.

Is XML enough?

Many industries create product documentation in XML for its inherent benefits:

  • reuse and the associated savings for authoring
  • translating and automated publishing
  • more consistent data
  • shorter time to create documents

But not all industries are required to exchange XML, or deliver raw XML.

Commercial and hi-tech products, software, life sciences and medical devices, heavy equipment manufacturers, and energy groups are different. They often create their technical publications and publish directly to formatted online or hard copy documents.

So their XML is never interchanged with customers or program partners. And they may never need to deliver an electronic, industry-standard-compliant set of XML topics.

In those situations, why should we impose the extra time and effort to plug additional classification into the XML? For those tech writers, the end goal isn’t to make that XML as rich and complete as possible. The goal is to make the final publication as complete and effective as is appropriate for their end users.

Now we can look at classification and all the related ways of managing XML content in a whole new way.

The truth is, XML attribute classification is not enough.

Data retrieval

Unless you deal with products simpler than a toaster, key words and metadata taxonomies struggle to hold all the complexity of the ways we manage product and document configuration.

It means we could continue adding and updating classifications to content until, in some cases, there’s more metadata than content.

In an optimal scenario, classification comes as a side effect or result of doing work, or of the data you’re working with, not as extra steps to input classification content into the XML topic.

For customers with products managed in a PLM environment, much of this semantic and product classification is already available.

Options and variants that are applied to product configuration, change information and release states of products – all are available to the tech writer to use, and not just as source information.

When technical publications software lets documentation inherit this information from the engineering and product world, organize it by category or by part relationships, and reuse parts directly as content, we realize multiple benefits:

  • saving time – both in the initial creation of the illustration or topic, and in future research and updates to that content
  • lower risk of inaccurate information – we’re applying it from the source, instead of re-creating, retyping, or reinterpreting it into a classification hierarchy
  • all writers in the organization will have a more complete view of the entire ontology of information both available and applicable to documents and products
  • we can modify and apply classification as needed, without driving additional, often inaccurate, revisions to the content
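To make the idea of inherited classification concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory stand-in for product records (PartRecord, PARTS) and a hypothetical link table from topics to parts (TOPIC_PARTS). None of these names come from Teamcenter or any real PLM API; they only illustrate deriving a topic's classification from the parts it documents instead of typing it into the topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartRecord:
    """Hypothetical product record; the field names are assumptions."""
    part_number: str
    category: str
    release_state: str
    variants: tuple


# Hypothetical product data that already exists on the engineering side.
PARTS = {
    "P-1001": PartRecord("P-1001", "hydraulics", "released", ("EU", "US")),
    "P-2002": PartRecord("P-2002", "electrical", "in-work", ("US",)),
}

# Hypothetical relation: which parts each topic documents.
TOPIC_PARTS = {"t-100": ["P-1001", "P-2002"]}


def inherited_classification(topic_id):
    """Derive a topic's classification from its related part records."""
    records = [PARTS[p] for p in TOPIC_PARTS.get(topic_id, [])]
    return {
        "categories": sorted({r.category for r in records}),
        "release_states": sorted({r.release_state for r in records}),
        "variants": sorted({v for r in records for v in r.variants}),
    }


result = inherited_classification("t-100")
print(result)
```

In this sketch the writer never enters a keyword. If a part's release state changes at the source, the next lookup reflects it without anyone touching the topic.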

When we have to modify an XML topic simply to change the metadata, we’re imposing additional constraints and rules about how and where that topic can be reused.

At a minimum, writers would have to spend additional time analyzing whether the change to the content impacts the other documents that reference it.

By relating the classifications that are not critical to content, instead of embedding them, we allow topics to be more flexible in their reuse. We can add context and applicability without impacting the existing uses for that content.
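The difference between embedding and relating can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topic markup, the keywords attribute, and the classification_index dictionary are all hypothetical, and a content hash stands in for whatever a real system uses to decide that a topic has a new revision.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical topic; the element and attribute names are assumptions.
TOPIC_XML = (
    '<topic id="t-100"><title>Replace the filter</title>'
    "<body>Remove the cover.</body></topic>"
)


def fingerprint(xml_text):
    """Hash the serialized topic; a changed hash stands in for a new revision."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


def embed_keyword(xml_text, keyword):
    """Embedded approach: write the keyword into the topic itself."""
    root = ET.fromstring(xml_text)
    root.set("keywords", keyword)
    return ET.tostring(root, encoding="unicode")


# Related approach: classification lives outside the topic, keyed by topic id.
classification_index = {}


def relate_keyword(topic_id, keyword):
    """Record the keyword against the topic id without touching the XML."""
    classification_index.setdefault(topic_id, set()).add(keyword)


original = fingerprint(TOPIC_XML)

# Embedding changes the topic, so every document reusing it must be re-checked.
embedded_changed = fingerprint(embed_keyword(TOPIC_XML, "maintenance")) != original

# Relating leaves the topic byte-for-byte identical.
relate_keyword("t-100", "maintenance")
related_changed = fingerprint(TOPIC_XML) != original

print(embedded_changed)  # True
print(related_changed)   # False
```

The topic still gains the new classification in the second case. It is simply stored next to the content rather than inside it, so existing uses of the topic are unaffected.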

Obviously this doesn’t work for all customers.

Aerospace and defense customers who are required to provide their content in S1000D and similar compliant XML structures will always have metadata- and attribute-heavy authoring processes.

Tech pubs groups that produce interactive online documents depend on metadata and attributes in the XML to drive the appropriate display of document content. In the future, though, they may want to consider pushing that classification into the final output of the technical publications software, instead of imposing that work on the writers up front.
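As a rough sketch of what applying classification at output time could look like, the Python below adds attributes to a copy of the topic during a publish step. The publish function, the data- attribute prefix, and the classification values are all hypothetical and not taken from any particular product. The point is only that the authored source stays free of the metadata the interactive display needs.

```python
import xml.etree.ElementTree as ET


def publish(topic_xml, classification):
    """Return output XML with classification added; the source is not modified."""
    root = ET.fromstring(topic_xml)
    for name, value in sorted(classification.items()):
        # Hypothetical convention: prefix output-only attributes with "data-".
        root.set("data-" + name, value)
    return ET.tostring(root, encoding="unicode")


# Authored source carries no display metadata.
source = '<topic id="t-100"><title>Replace the filter</title></topic>'

# Classification supplied by the publishing system at output time.
published = publish(source, {"audience": "technician", "variant": "EU"})
print(published)
```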

About the author:

Trish Laedtke is the Product Manager for Content and Document Management applications in Teamcenter. Her focus is on integrating tech pubs and supporting roles into the PLM environment, and taking advantage of the knowledge stored in Teamcenter to provide more accurate and effective documentation.


This article first appeared on the Siemens Digital Industries Software blog at