Document Markup and Publishing

Problems are frequently encountered when publishing documents online. Some pitfalls are technical or typographical; others generally arise because the author or publisher is unaware of the problems caused by office and desktop-publishing software, particularly Microsoft Office.

Table of Contents

  • Professional Document Management
  • Microsoft-Office Problems
  • Portable Document Format
  • TeX and LaTeX
  • Cover designs may vary

    Professional Document Management

    Do not use WYSIWYG Applications

    Business and non-technical users love WYSIWYG (What You See Is What You Get) applications, such as MS-Word or PowerPoint.

    But WYSIWYG applications are notorious for generating bad markup: invalid character encodings, meaningless or empty proprietary tags, badly structured tables, and so on.

    For professional management of paper-based and browser-based documentation, the problems caused by bad markup far outweigh the ease-of-use benefits of WYSIWYG. WYSIWYG applications are therefore best avoided.
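    To make the "bad markup" point concrete, the sketch below uses Python's standard html.parser to count three kinds of residue typical of HTML exported from MS-Word: namespace-prefixed tags such as <o:p>, elements containing nothing but whitespace, and inline styles carrying Office-only "mso-" properties. The heuristics, the class and function names, and the sample fragment are illustrative assumptions, not a complete validator.

```python
from html.parser import HTMLParser


class WordResidueCounter(HTMLParser):
    """Count markup residue of the kind Word's HTML export tends to leave behind.

    Heuristics (assumptions for illustration, not a complete validator):
      * tags with a namespace prefix, such as <o:p>, are Office-specific;
      * an element that closes with only whitespace inside it is "empty";
      * a style attribute containing "mso-" holds Office-only CSS properties.
    """

    def __init__(self) -> None:
        super().__init__()
        self.prefixed_tags = 0
        self.empty_elements = 0
        self.mso_styles = 0
        self._open_and_empty = None  # most recent start tag with no content yet

    def handle_starttag(self, tag, attrs):
        if ":" in tag:
            self.prefixed_tags += 1
        for name, value in attrs:
            if name == "style" and value and "mso-" in value:
                self.mso_styles += 1
        self._open_and_empty = tag

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are legitimately empty; don't count them.
        self.handle_starttag(tag, attrs)
        self._open_and_empty = None

    def handle_data(self, data):
        # &nbsp; arrives as U+00A0, which str.strip() treats as whitespace.
        if data.strip():
            self._open_and_empty = None

    def handle_endtag(self, tag):
        if tag == self._open_and_empty:
            self.empty_elements += 1
        self._open_and_empty = None


def scan_word_residue(html: str) -> dict:
    """Return counts of prefixed tags, empty elements and mso- styles in html."""
    counter = WordResidueCounter()
    counter.feed(html)
    counter.close()
    return {
        "prefixed_tags": counter.prefixed_tags,
        "empty_elements": counter.empty_elements,
        "mso_styles": counter.mso_styles,
    }


# Hand-written fragment in the style of Word's "Save as Web Page" output.
sample = (
    '<p class="MsoNormal" style="margin:0;mso-bidi-font-size:11pt">'
    "Hello<o:p></o:p></p>"
    '<p class="MsoNormal"><o:p>&nbsp;</o:p></p>'
)
print(scan_word_residue(sample))
```

    Running a check like this over an exported page gives a rough measure of how much markup has to be cleaned up before the page is fit to publish.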

    Microsoft-Office Problems

    Although we consider only technical document publishing here, many of the comments on this page also apply to other kinds of documents.

    Do not Post MS-Word Documents

    You should not post MS-Word documents online. If you do, you condemn your visitors to using proprietary software that they may not possess, or wish to possess.

    Further, as many people know to their cost, MS-Word documents do not always format identically, even on apparently identical platforms.

    MS-Office & OpenOffice

    MS-Office is used extensively in business, but documents produced with it should be converted to PDF before being posted online. Users of OpenOffice can do this very easily, since it exports documents directly to PDF.
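    When many documents need converting, the export can be scripted. The sketch below assumes LibreOffice (the actively maintained fork of OpenOffice.org) is installed, whose soffice program accepts the --headless, --convert-to and --outdir options; the function names and file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def office_to_pdf_command(source: Path, outdir: Path, office_binary: str = "soffice") -> list[str]:
    """Build a headless LibreOffice command that writes source as a PDF into outdir."""
    return [
        office_binary,
        "--headless",           # run without opening a window
        "--convert-to", "pdf",  # target format; LibreOffice picks its default PDF export filter
        "--outdir", str(outdir),
        str(source),
    ]


def office_to_pdf(source: Path, outdir: Path) -> Path:
    """Convert one Office file (e.g. .docx, .pptx) to PDF and return the expected PDF path."""
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        raise FileNotFoundError("LibreOffice (soffice) was not found on PATH")
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(office_to_pdf_command(source, outdir, binary), check=True)
    # LibreOffice names the result after the input file, with a .pdf extension.
    return outdir / (source.stem + ".pdf")
```

    For example, office_to_pdf(Path("report.docx"), Path("pdf")) would be expected to produce pdf/report.pdf. The command is built in a separate function so it can be inspected or logged without running LibreOffice.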

    Portable Document Format

    Portable Document Format (PDF) is an internationally recognised standard (ISO 32000) for distributing documents electronically.

    It supports ISO paper sizes such as A4, handles JPEG images and hyperlinks well, and can easily be password-protected with encryption.

    Sometimes it is useful to have a PDF version of an HTML page, particularly if the text is extensive or has complicated typography (such as mathematics). You will almost certainly link from the HTML page to the PDF file, but make certain you also include a hyperlink (URL) from the PDF file back to the specific HTML page. This is necessary because search engines sometimes record the presence of a PDF file but not the corresponding HTML page.
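    With LaTeX, the back-link can be written into the source using the hyperref package, whose \url command produces a clickable link in the PDF. The sketch below generates such a source file from a template; the template, the function name and the example URL are hypothetical.

```python
from string import Template

# Minimal LaTeX skeleton. The hyperref package makes \url{...} a clickable link
# in the PDF; the "hidelinks" option suppresses the coloured boxes around links.
BACKLINK_TEMPLATE = Template(r"""\documentclass[a4paper]{article}
\usepackage[hidelinks]{hyperref}
\begin{document}
\noindent This document is also available as a web page:
\url{$html_url}

$body
\end{document}
""")


def latex_with_backlink(html_url: str, body: str) -> str:
    """Return LaTeX source whose opening lines link back to the HTML page."""
    return BACKLINK_TEMPLATE.substitute(html_url=html_url, body=body)


print(latex_with_backlink("https://example.org/notes/page.html", "Main text goes here."))
```

    Placing the link at the very start of the document means anyone who arrives at the PDF from a search engine sees the route back to the HTML page straight away.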

    Free viewers

    PDF viewers are freely available. Most people have heard of Adobe Acrobat Reader, but there are also Ghostview, xpdf, kpdf, Foxit Reader, etc.

    The use of pdfTeX or pdfLaTeX is recommended for producing PDF documents.
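    A build script around pdfLaTeX can be as simple as the sketch below. It assumes a TeX distribution such as TeX Live or MiKTeX provides the pdflatex command on the PATH; -interaction=nonstopmode, -halt-on-error and -output-directory are standard pdflatex options, while the function names and file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pdflatex_command(tex_file: Path, outdir: Path) -> list[str]:
    """Build a non-interactive pdflatex command line."""
    return [
        "pdflatex",
        "-interaction=nonstopmode",  # never pause to ask for terminal input
        "-halt-on-error",            # stop at the first error instead of carrying on
        f"-output-directory={outdir}",
        str(tex_file),
    ]


def build_pdf(tex_file: Path, outdir: Path, passes: int = 2) -> Path:
    """Run pdflatex on tex_file and return the path of the resulting PDF."""
    if shutil.which("pdflatex") is None:
        raise FileNotFoundError("pdflatex was not found on PATH")
    outdir.mkdir(parents=True, exist_ok=True)
    # A second pass lets LaTeX resolve cross-references and the table of contents.
    for _ in range(passes):
        subprocess.run(pdflatex_command(tex_file, outdir), check=True)
    return outdir / (tex_file.stem + ".pdf")
```

    For example, build_pdf(Path("notes.tex"), Path("build")) would be expected to leave build/notes.pdf; because check=True is set, a LaTeX error raises an exception rather than silently producing a broken or missing PDF.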

    Distiller

    Adobe Acrobat Distiller is sometimes prescribed for creating PDF files. Unfortunately, Distiller often produces very large files: sizes of a few megabytes are not unusual for documents of only a few pages. The source of the problem may lie with the original form of the document (e.g. MS-Word) rather than with Distiller itself. For this reason, TeX/LaTeX is to be preferred.

    TeX and LaTeX

    These are not word processors like MS-Word, but complete typesetting systems. They ensure correct letter-spacing, letter shapes and justification, and produce a properly finished, professional document (see the Wikipedia entry). Further, TeX and LaTeX are free and run on MS-Windows as well as Unix-like systems.

    There are also commercial variants of TeX/LaTeX for users who require support. The best known are PCTeX and Scientific Word. Many people use, and seem to like, Scientific Word. I do not recommend it, however, since it sometimes generates non-portable files (thus defeating one of the advantages of TeX and LaTeX) and also seems occasionally to have printing and BibTeX problems.