CSTUG

CSTUG: XML Repository


Notice This is the very first version of the XML repository. Although it has passed some testing, it may still contain errors. If you find one or wish to suggest a new feature, please write to bulletin AT cstug DOT cz.
 
Introduction The tables of contents of CSTUG Bulletins are available in XML. The pages were originally designed for Mozilla because the XML files may be easily viewed in that browser. However, Mozilla does not reliably transfer character-set information over the web on all operating systems, and a great many browsers do not support XSLT at all. It was therefore decided to present the pure XML files together with good-looking files statically transformed into HTML. Remember that these HTML files are intended only as a readable view of the XML and XSL files. Every HTML file contains a link to the original XML or XSL file, and links within the documents are converted to HTML hyperlinks. The stylesheets are presented as HTML views and are also included in the ZIP files.
 
Usage All XSLT processors should be able to read both the XML files and stylesheets directly from the web. The files are designed to allow such usage. The Relax NG Schema resides in http://bulletin.cstug.cz/xml/ and all other files reside in http://bulletin.cstug.cz/xml/documents/ (both the originals and the HTML versions). The files can also be used locally. You can get them as ZIP files.

Character set All files are distributed in UTF-8 because every XML parser is required to support it.

Document structure The contents are split into several files. Their structure is defined by the Relax NG Schema with Schematron assertions, which is also available in a hypertext-documented form. The files may be validated with, for example, msv, available from java.sun.com. Alternatively, you can use the non-normative W3C XML Schema files generated by trang. Unfortunately, unlike Relax NG, XML Schema does not support an optional group of attributes.

Validation As described above, you can validate your files against the Relax NG Schema directly from the web. MSV can validate against various schema languages, including Relax NG, and recognizes the language automatically. If you wish to validate your local files file1.xml, file2.xml, and file3.xml and you have a script named msv (msv.cmd in OS/2 or msv.bat in Windows), you can use the following command (remove the backslash and put everything on a single line):
msv http://bulletin.cstug.cz/xml/cstugbulletin.rng \
    file1.xml file2.xml file3.xml
You can also validate files present on this server, e.g. the "toc" file can be validated by (remove the backslash and put everything on a single line):
msv http://bulletin.cstug.cz/xml/cstugbulletin.rng \
    http://bulletin.cstug.cz/xml/documents/toc.xml
MSV itself does not recognize Schematron assertions. If you wish to make use of them, your script must explicitly invoke java -jar relames.jar instead of java -jar msv.jar.
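The msv script referred to above can be very small. The following is only a sketch for a POSIX shell. It assumes that msv.jar and relames.jar sit in a directory named by a hypothetical MSV_HOME variable (the default /opt/msv is invented) and that the first argument, plain or schematron (both names made up for this example), selects the validator. Adapt it to your own installation.

```shell
#!/bin/sh
# Sketch of an msv wrapper. MSV_HOME and its default path are assumptions,
# not something the MSV distribution defines.
MSV_HOME="${MSV_HOME:-/opt/msv}"

# Print the jar to run: relames.jar when Schematron assertions should be
# checked, msv.jar otherwise.
pick_jar() {
  if [ "${1:-}" = "schematron" ]; then
    printf '%s\n' "$MSV_HOME/relames.jar"
  else
    printf '%s\n' "$MSV_HOME/msv.jar"
  fi
}

# validate MODE SCHEMA FILE...
# Passes the schema and the files to the selected validator unchanged.
validate() {
  mode=$1
  shift
  java -jar "$(pick_jar "$mode")" "$@"
}

# Example call (not executed here):
#   validate schematron http://bulletin.cstug.cz/xml/cstugbulletin.rng file1.xml
```

Keeping the jar selection in its own function means the same script can serve both the plain Relax NG check and the check with Schematron assertions.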
 
Note for OS/2 users If your msv.cmd is implemented in Rexx, you have a problem. Rexx stops reading the command-line arguments at the first double slash, so it sees the first argument as http: and nothing after it. Unless you handle this yourself by other means, you must write http:\\ instead of http://. Java will cope with it (you can even use backslashes instead of slashes anywhere within the URL). If you would rather not rely on Java handling backslashes, use the following workaround, which converts every :\\ back to :// inside the script:
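/* Read the whole command line into one string */
parse arg arguments
/* Split at the first ':\\'; post stays empty when none is found */
parse var arguments pre ':\\' post
do while pre \= '' & post \= ''
  /* Put the URL separator back and look for the next ':\\' */
  arguments = pre || '://' || post
  parse var arguments pre ':\\' post
end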
The arguments may then be parsed as you wish.

XML documents The XML documents and their HTML views reside in http://bulletin.cstug.cz/xml/documents/. Each HTML view contains a link to the original XML or XSL file, and the CSTUG logo points back to this document. Remember that the views are nothing but HTML renderings of the XML files: they are pretty-printed, links to other documents are converted to active HTML hyperlinks, and acronyms, authors and keywords, which are stored in the XML files as identifiers, are expanded to their text. You should not expect anything else.

Abstracts This project contains stylesheets for retrieving titles and abstracts in a form suitable for typesetting in LaTeX. The output file requires keyval.sty and this LaTeX package with definitions of acronyms (last modified Wednesday, 08-May-2024 10:00:48 CEST). The output is not a standalone file; it is meant to be included in another document and may even require hand editing before it can be used. The files can either be used directly from the web or downloaded as a local copy. In general, the information is obtained by:
xslt -o output.tex source.xml stylesheet.xsl parameters
  • xslt represents your script for invoking the XSLT processor. The example is written with Saxon in mind, but other XSLT processors can be used as well and their invocation should be similar.
  • output.tex is the output TeX file.
  • source.xml is the source XML document. You can retrieve information from:
    1. the whole set of documents if you specify the source as http://bulletin.cstug.cz/xml/documents/toc.xml (or just toc.xml if you have a local copy)
    2. all issues within a year, e.g. the sources for the year 2000 are contained in http://bulletin.cstug.cz/xml/documents/2000.xml
    3. a single issue, e.g. the source for issue 4 of the year 2000 is http://bulletin.cstug.cz/xml/documents/2000~4.xml, and the source for the triple issue 1-3 of the year 2000 is http://bulletin.cstug.cz/xml/documents/2000~1-3.xml
    You can find the name of the particular file by browsing the HTML views of the documents (start with toc.xml).
  • stylesheet.xsl is the stylesheet to be used. We have prepared http://bulletin.cstug.cz/xml/documents/articles.xsl for use with XSLT 1.0 processors and http://bulletin.cstug.cz/xml/documents/articles2.xsl for XSLT 2.0 processors. The documentation is inside the stylesheets.
  • The parameters are optional. There are two things which can be specified:
    1. You should select the output encoding because it is not specified in the stylesheet, so the TeX file would otherwise be in UTF-8. Unless you have the latest encTeX (it is included in TeX Live as well as in teTeX), you will probably choose ISO-8859-2, Windows-1250 or CP852, depending on your operating system. If you use Saxon v.8, you can do it on the command line, e.g. by !encoding=ISO-8859-2. If your XSLT processor lacks such a feature, you must create your own stylesheet which first imports one of the stylesheets, e.g. <xsl:import href="http://bulletin.cstug.cz/xml/documents/articles2.xsl"/> (for XSLT 2.0 processors), and then specifies the output encoding, e.g. by <xsl:output encoding="ISO-8859-2"/>.
    2. You can specify the language using the lang parameter. If you do not specify any language, the stylesheet will write titles, keywords and abstracts in all languages in the order in which they appear in the XML file. If the lang parameter is given, the stylesheet will select the language in the following order of preference (information in some languages may be missing):
      1. the requested language
      2. the language of the article
      3. Czech language
      4. the first item in the XML file
      The parameter must be a 3-letter code according to ISO 639-2. You can force the stylesheet to try to output the information in the language of the article by using a nonsense language code (the first condition will thus never be fulfilled).
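For processors without a command-line switch for the output encoding, the wrapper stylesheet described in item 1 of the parameters can be generated by a few lines of shell. This is a sketch only: the function name write_wrapper and the file name articles-latin2.xsl are made up for this example, and version="2.0" assumes an XSLT 2.0 processor used together with articles2.xsl.

```shell
#!/bin/sh
# write_wrapper FILE ENCODING
# Writes a small stylesheet that imports articles2.xsl and sets only the
# output encoding. The function name is hypothetical.
write_wrapper() {
  cat > "$1" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- xsl:import must come before any other top-level element -->
  <xsl:import href="http://bulletin.cstug.cz/xml/documents/articles2.xsl"/>
  <xsl:output encoding="$2"/>
</xsl:stylesheet>
EOF
}

# Example use (not executed here; xslt stands for your own processor script):
#   write_wrapper articles-latin2.xsl ISO-8859-2
#   xslt -o output.tex toc.xml articles-latin2.xsl
```

With an XSLT 1.0 processor, the same idea should apply with articles.xsl as the imported stylesheet and version="1.0" on the xsl:stylesheet element.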
In addition, we supply two sample stylesheets which demonstrate how the above-mentioned stylesheets can be imported and their behaviour easily modified.
  1. abstracts.xsl (for XSLT 1.0 processors) and abstracts2.xsl (for XSLT 2.0 processors) output articles which contain an abstract in any language.
  2. abstractslang.xsl (for XSLT 1.0 processors) and abstractslang2.xsl (for XSLT 2.0 processors) output articles which contain an abstract in the requested language. It makes no sense to invoke these stylesheets without the lang parameter because the output will be empty.
Notice the use of <xsl:apply-imports/> in the stylesheets.

Bibliography One of the stylesheets can create bibliography references in XML or LaTeX. You can try the sample file bib-sample.xml with one of the stylesheets: bibliography.xsl for XSLT 1.0 processors or bibliography2.xsl for XSLT 2.0 processors. The documentation is inside the stylesheets. Use the files either directly from the web or get them in one of the ZIP files. You will also need this LaTeX package with definitions of acronyms (last modified Wednesday, 08-May-2024 10:00:48 CEST).

BibTeX I do not use BibTeX myself and do not understand its file structure. However, if a volunteer develops a stylesheet for transforming the list of references into a BibTeX file, I will be glad to include it in this package.

ZIP files
  1. Relax NG schema with a stylesheet plus Schema files (last modified Wednesday, 08-May-2024 10:00:47 CEST)
  2. XML version of the contents of CSTUG Bulletins and other related XML files (last modified Wednesday, 08-May-2024 10:00:47 CEST)
  3. Relevant stylesheets both for XSLT 1.0 and XSLT 2.0 processors (last modified Wednesday, 08-May-2024 10:00:48 CEST)
  4. Stylesheets for generation of bibliography references including a sample XML file (last modified Wednesday, 08-May-2024 10:00:47 CEST)
  5. The manual (this document)

Stylesheets The stylesheets contain some documentation inside. You can see their HTML views, which contain links to the original files. The stylesheets are included in the ZIP files above. The stylesheets should work with any XSLT 1.0 processor; they can even be used in Mozilla by linking them to the XML or XSL file via the <?xml-stylesheet ... ?> processing instruction. Only the files whose names contain the number 2 are designed for XSLT 2.0 processors, and some of them explicitly require Saxon v.8.

Acknowledgment I would like to thank Jiří Kosek for valuable advice and testing the preliminary version with several XSLT processors.

CSTUG Last modified: Wednesday, 08-May-2024 10:00:47 CEST

(C) CSTUG, 2005