About XML

XML is one of those boring ideas that can make business run more smoothly, like ISBN numbers or barcodes. Really it’s just some general rules for how to write down information so that computers as well as people can read it – mainly computers, though. It’s not even a full set of rules; it’s just enough to help people make a start on designing their own formats for sharing information.

So, in publishing, people have taken the XML rules and added some extra, book-specific ones until they came up with a standardised way of describing new titles – or old ones, for that matter. The standardisation makes it easy for the whole industry to share information. Publishers can tell all the book stores, book clubs, library services and industry databases about a new title by sending them all a copy of the same XML file. In other sectors, XML is being used as the starting point for storing a multitude of different types of information. There are XML standards for creating an invoice or describing a gene – or (to return to publishing) for storing the contents of an e-book in a standardised way.

The guts of XML (made palatable for non-techies)

The first thing to learn about XML is good news: it’s written in English. Or rather it’s written using ordinary words, with a few squiggles added, and not in some sort of computer hieroglyphics.

So, imagine you’re sending out details of a new book you’re publishing. Naturally you want everyone to add its details to their stock systems so they can easily order it. Let’s start with the title, The Life and Times of Ned Lud. A human being can take a guess that it’s a book title, but in XML you always label information to make it clear. So we might write this:

    <TitleText>The Life and Times of Ned Lud</TitleText>

It’s like that maxim for lecturing: tell people what you’re going to say, then say it, then tell them what you just said. So that line above says: Here comes something called TitleText, ‘The Life and Times of Ned Lud’, that’s the end of the TitleText. When you put ‘/’ in front of a label you’re marking the end of something.

Of course you can make up your own names for information. You could choose a different label for a book’s title, but it just so happens that <TitleText> is the name that’s been agreed on by a largish group of book publishers as part of the ONIX standard. The bit in the angle brackets is called a ‘tag’. It’s fairly easy to see why; you ‘tag’ information to say what it means.

For instance, if all you have is the piece of text ‘Winston Churchill’, it’s difficult to tell whether that’s a book about him or a book written by him – both are plausible. So everything gets tagged for clarity’s sake. Take a look at this:

    <Author>Katie Daynes</Author>
    <TitleText>Winston Churchill</TitleText>
    <Publisher>Usborne Publishing Ltd</Publisher>

Hopefully it’s pretty obvious what that all means. Each ‘tag’ has a start and an end and the bit in the middle is the information you want to share – also known as the ‘contents’ of the tag. A computer can easily read it and so can a person (with a little effort). Unfortunately for us, information about books gets complicated and so the XML used to store it has to get complicated too.
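
If you’re curious what ‘a computer can easily read it’ looks like in practice, here is a minimal sketch in Python, using the xml.etree.ElementTree module that comes with the language. One assumption to flag: the <Book> wrapper is my own addition for illustration, because an XML document needs a single outermost tag and the three lines above don’t show one.

```python
import xml.etree.ElementTree as ET

# The <Book> wrapper is hypothetical; XML needs one outermost tag.
record = """
<Book>
    <Author>Katie Daynes</Author>
    <TitleText>Winston Churchill</TitleText>
    <Publisher>Usborne Publishing Ltd</Publisher>
</Book>
""".strip()

book = ET.fromstring(record)

# Each tag name tells the program what its contents mean.
details = {child.tag: child.text for child in book}

print(details["Author"])     # Katie Daynes
print(details["TitleText"])  # Winston Churchill
```

Because every piece of information is labelled, the program never has to guess whether ‘Winston Churchill’ is the author or the title.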

For instance, what do we do if there’s more than one author? Or if there’s no author, just an editor and some contributors? And what about all the other pieces of information we might want to share, like publication date, price and distributor details? The people who wrote ONIX came up with something that allows you to store a vast amount of information about a title in a structured way.

One feature of XML the ONIX designers made use of was the idea of putting one tag inside another. I want to show you what that looks like, but real ONIX documents are a bit difficult to read, so this next example is just a made-up one; it doesn’t follow the ONIX standard. But on the plus side, it’s actually possible for a human to understand it.

        <MainTitle>The Life and Times of Ned Lud</MainTitle>
        <SubTitle>Backward Looking Visionary</SubTitle>

So in this made-up example, if a <Surname> tag is inside an <Author> tag, then it’s the name of an Author; if it’s inside an <Illustrator> tag, then it’s the name of an illustrator.
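
Here is a small Python sketch of that idea, for anyone who wants to see a program make the distinction. It is only an illustration under stated assumptions: the excerpt above shows just the two title lines, so the <Book> and <Title> wrappers and both surnames below are hypothetical, and like the example itself none of it follows the ONIX standard.

```python
import xml.etree.ElementTree as ET

# A made-up record; it does not follow the ONIX standard.
# <Book>, <Title> and the two surnames are hypothetical.
record = """
<Book>
    <Title>
        <MainTitle>The Life and Times of Ned Lud</MainTitle>
        <SubTitle>Backward Looking Visionary</SubTitle>
    </Title>
    <Author>
        <Surname>Smith</Surname>
    </Author>
    <Illustrator>
        <Surname>Jones</Surname>
    </Illustrator>
</Book>
""".strip()

book = ET.fromstring(record)

# The same <Surname> tag means different things depending on its parent.
author_surname = book.findtext("Author/Surname")
illustrator_surname = book.findtext("Illustrator/Surname")
main_title = book.findtext("Title/MainTitle")

print(f"Author: {author_surname}")
print(f"Illustrator: {illustrator_surname}")
```

The path ‘Author/Surname’ reads as ‘the Surname tag inside the Author tag’, which is exactly the rule described above.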

Ok. So there’s lots more you could learn about XML, but that’s enough so you can join in fun dinner party conversations on the subject and look techies in the eye without flinching. Let’s get back to the real world of publishing.

The ONIX Standard

If you want an easy way to tell Nielsen or Amazon or Waterstones about a new book, you can put all the relevant info in an ONIX message and e-mail or FTP it to them. In case you’re interested, the British contributors to the ONIX standard were the BIC, made up of the Library Association, The British Library, The Booksellers Association and The Publishers Association, so it’s got some weight behind it.

It’s a gigantic and complicated standard because it needs to be able to hold gigantic and complicated amounts of information for each title. For instance, it gives you tags for listing the back cover quotes on your book, and giving the names of each quote contributor and the organisations they work for. It lets you include details of discounts and promotions by date and region. It holds information on formats and rights and physical dimensions – and even what units the measurements are being given in.

Unfortunately for anyone who wants to open an ONIX message and actually read the contents, the standard also makes use of numbers where a name would have been easier to read. For instance, if you want to know whether a book has been published yet you could look at the <PublishingStatus> tag. Here’s one:

    <PublishingStatus>04</PublishingStatus>

But what does ‘04’ mean? Well, if you hunt down a copy of the ONIX documentation you’ll find a list called List 64: Publishing status, with entries such as:

    04: Active. The product was published, and is still active in the sense that the publisher will accept orders for it, though it may or may not be immediately available, for which see <SupplyDetail>.

    05: No longer our product. Ownership of the product has been transferred to another publisher (with details of acquiring publisher if possible in PR.19).

And many more. There are over 150 of these lists explaining what the various different numbers and codes mean, so while humans can get the gist of what’s in an ONIX message, the details are often hard to follow. The ONIX people could have chosen to use the words ‘Active’ and ‘No longer our product’ instead of the numbers ‘04’ and ‘05’, but they probably took the view that machines, rather than people, would be reading these messages and numbers were more concise.
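
For a machine, turning those numbers back into words is just a lookup. The Python sketch below shows the idea using only the two List 64 entries quoted above; the real list is longer. The function name describe_status is my own invention, and <PublishingStatus> is the ONIX tag that carries List 64 codes.

```python
import xml.etree.ElementTree as ET

# Only the two List 64 entries quoted above; the real list has more.
PUBLISHING_STATUS = {
    "04": "Active",
    "05": "No longer our product",
}


def describe_status(fragment: str) -> str:
    """Return the human-readable label for a <PublishingStatus> tag."""
    code = ET.fromstring(fragment).text
    # Fall back to a clear message for codes this sketch doesn't know.
    return PUBLISHING_STATUS.get(code, f"Unknown code {code}")


status = describe_status("<PublishingStatus>04</PublishingStatus>")
print(status)  # Active
```

A real program would load all the code lists from the ONIX documentation rather than typing them in by hand.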

How can I use ONIX?

Having established that ONIX messages are swines to read – unless you’re a machine – the obvious thing to do is enlist the help of a machine whenever you want to work with ONIX.

Bibliocloud, and any publishing management system worth its salt, makes up ONIX messages. Instead of us looking up what all the tags and numbers mean, the software does that. The system gives you helpful forms to fill out; they list the available options in words. Then, behind the scenes, the program inserts the relevant code on your behalf. So if you choose ‘Paperback’ from the drop-down list, the program puts the code ‘BC’ into your ONIX message, saving you the bother of looking it up. (If we actually had to write ONIX messages by hand, we probably wouldn’t bother.)
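
To give a flavour of what ‘behind the scenes’ might mean, here is a hedged Python sketch of the reverse lookup: words in, code out. It is not how Bibliocloud or any particular system is actually written. The function name product_form_tag is hypothetical, and wrapping the code in a <ProductForm> tag is my assumption about where the ‘BC’ ends up; the only detail taken from the text above is that ‘Paperback’ becomes ‘BC’.

```python
import xml.etree.ElementTree as ET

# 'BC' for Paperback comes from the text above; a real system would
# hold the whole code list rather than this single entry.
PRODUCT_FORM_CODES = {"Paperback": "BC"}


def product_form_tag(choice: str) -> str:
    """Turn a drop-down choice into a tagged code."""
    element = ET.Element("ProductForm")
    element.text = PRODUCT_FORM_CODES[choice]
    return ET.tostring(element, encoding="unicode")


print(product_form_tag("Paperback"))  # <ProductForm>BC</ProductForm>
```

The person filling in the form only ever sees the word ‘Paperback’; the code is the program’s problem.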

You send your ONIX files to anyone you like: Nielsen, Bowker, Amazon and so on. You can also download them if you want to import them into other programs such as InDesign to make your catalogues or AIs. This is nothing new: check out a seven-year-old video of mine here.

But most publishers still aren’t harnessing the power of structured data, whether that’s ONIX, another sort of XML or JSON (which is like XML, but more concise). If you did your last catalogue by hand, and survived the ordeal, you should really find out more about what computers can do for you and save yourself (and your company) from death-by-copy-and-pasting.

