XML API standardization
Big Picture
For JRE 1.5, Sun has included with the JRE new, standardized versions of the XML APIs supporting DOM Level 3 and JAXP 1.3. These APIs are available for JRE 1.4 via jars that can be downloaded and installed into JRE 1.4.
uPortal 2.5 will standardize upon and use these APIs.
What specific APIs are we talking about?
They are the APIs and implementation .jars included in JRE 1.5 (J2SE 5) and distributed at Java.net for use in JRE 1.3 and JRE 1.4. uPortal 2.5 requires JRE 1.4 or later and so we aren't particularly concerned with how to make these APIs available in JRE 1.3.
The JAXP 1.3 standalone release includes:
- dom.jar
- jaxp-api.jar
- sax.jar
- xalan.jar
- xercesImpl.jar
What exactly do I have to do to make uPortal 2.5 work?
What about our legacy APIs?
IPortalDocument, PortalDocumentImpl, and the uPortal DocumentFactory API become formally deprecated by this change. Instead, code using XML functionality in uPortal 2.5 is encouraged to move to the new standard XML APIs.
Specific code examples
Getting DOM 3 support
Previously the core org.w3c.dom.Document implementation in uPortal was a DOM level 2 Document. DOM 3 support was added by means of an IPortalDocument interface and PortalDocumentImpl implementation.
package org.jasig.portal.utils;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

/**
 * An interface that allows a Document to cache elements by known keys.
 * This is used to locally store and manage the ID element mappings
 * regardless of the actual DOM implementation.
 *
 * @author Nick Bolton
 * @version $Revision: 1.3 $
 */
public interface IPortalDocument extends Document {

    /**
     * Registers an identifier name with a specified element node.
     *
     * @param idName a key used to store an <code>Element</code> object.
     * @param element an <code>Element</code> object to map.
     * document.
     */
    public void putIdentifier(String idName, Element element);

    /**
     * Copies the element cache from the source document. This will
     * provide equivalent mappings from IDs to elements in this
     * document provided the elements exist in the source document.
     *
     * @param sourceDoc The source doc to copy from.
     */
    public void copyCache(IPortalDocument sourceDoc);
}
Now, the core org.w3c.dom.Document implementation in uPortal is a DOM level 3 Document natively supporting the methods that were added by IPortalDocument. It is therefore no longer necessary for any code to expect IPortalDocument instances – instead W3C DOM 3 Documents can be consumed directly.
The uPortal 2.5 IPortalDocument is therefore formally deprecated and adds no methods beyond what is available in a baseline org.w3c.dom.Document (DOM Level 3):
/**
 * IPortalDocument used to provide DOM 3 support on top of a DOM 2
 * org.w3c.dom.Document implementation. Since uPortal 2.5, our Documents
 * have been DOM 3 themselves and as such IPortalDocument is no longer needed.
 * This interface is formally deprecated and will be removed in a future release.
 * @deprecated use Document directly instead
 */
public interface IPortalDocument extends Document {
}
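As a rough illustration of consuming a DOM 3 Document directly, the sketch below registers an ID on an element and looks the element up again using only org.w3c.dom methods. It assumes that the DOM Level 3 method Element.setIdAttribute is the closest standard counterpart to the old putIdentifier call. That mapping, the attribute name "ID", and the class and method names are illustrative assumptions and are not taken from the uPortal source.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Dom3IdExample {

    /**
     * Builds a one-element document, marks an attribute as an ID using the
     * DOM Level 3 API, and returns the tag name of the element found by that ID
     * (or null if the lookup fails).
     */
    public static String registerAndLookup(String tagName, String idValue) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        // Attach the element to the document before looking it up by ID.
        Element element = doc.createElement(tagName);
        doc.appendChild(element);

        // "ID" is a hypothetical attribute name chosen for this sketch.
        element.setAttribute("ID", idValue);
        // DOM Level 3: declare that the "ID" attribute is of type ID.
        element.setIdAttribute("ID", true);

        Element found = doc.getElementById(idValue);
        return found == null ? null : found.getTagName();
    }
}
```

No cast to IPortalDocument is involved; everything goes through the standard Document and Element interfaces.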
Getting a new, empty Document
Previously, using the uPortal DocumentFactory:
import org.jasig.portal.utils.DocumentFactory;
import org.w3c.dom.Document;
...
Document doc = DocumentFactory.getNewDocument();
Now, using the standard javax.xml API directly:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
...
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
While the 2.4 uPortal DocumentFactory implementation used PropertiesManager to determine the name of the class that it should Class.forName().newInstance() to get an instance of IPortalDocument to return, the uPortal 2.5 DocumentFactory implementation will simply return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument(). For most existing clients of DocumentFactory, this will be sufficient since they were using only the DOM Level 2 Document API, of which DOM Level 3 is a superset. Clients that were casting the return value from Document to IPortalDocument will no longer be able to do so. They will need to change to recognize that the return value is no longer an IPortalDocument and is instead a Document which directly provides the methods desired.
Why Deprecate DocumentFactory?
Why deprecate DocumentFactory and ask clients to use the javax.xml DocumentBuilderFactory API directly? After all,
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
is longer than
Document doc = DocumentFactory.getNewDocument();
The reason to deprecate DocumentFactory is to encourage direct use of the javax.xml API and thereby reduce dependency within uPortal code upon other uPortal code. Code that does not depend upon uPortal DocumentFactory will be more reusable outside of uPortal. DocumentFactory in uPortal 2.5 no longer has a unique role to play – the work it was doing has been assumed by the core Java XML API in its provision of a DocumentBuilderFactory.
While
DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
is long, it becomes an idiom: the way one gets a fresh Document instance in Java, not just in uPortal.
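One practical difference when adopting the idiom is that newDocumentBuilder() declares the checked ParserConfigurationException. The following is a minimal sketch of one way a caller might wrap the idiom; the class and method names are hypothetical and this is not the uPortal DocumentFactory implementation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

public class NewDocumentExample {

    /** Returns a fresh, empty DOM Document obtained through JAXP. */
    public static Document newEmptyDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
        } catch (ParserConfigurationException e) {
            // Thrown when no DocumentBuilder satisfying the factory
            // configuration can be created; rethrown unchecked in this sketch.
            throw new IllegalStateException("No usable JAXP DocumentBuilder available", e);
        }
    }
}
```

Whether to rethrow as an unchecked exception or to propagate the checked one is a design choice left to the calling code.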
The long version
If you're thinking "Wait, wait! You've been too concise – I really want to read a longer account of what's going on here." – here it is. This is a re-presentation of content Howard Gilbert presented to the Shibboleth community.
Feedback from the problems uncovered deploying XML applications drives the evolution of the W3C standards. New versions of the standards solve real problems. Thus the migration of code to new versions of XML support may be driven by necessity rather than a desire to pick up neat new features. Applications that are centered entirely on XML (plausibly, uPortal) are forced to keep up to date.
Things would be simpler if the W3C produced new standards that were compatible with their previous standards. Unfortunately, they have adopted a policy of replacing the definition of each interface with a new version that keeps the same interface name but adds methods. (For example, the transition from DOM 2 to DOM 3 added new methods to the same interface, org.w3c.dom.Document.) This means that the bundle of interfaces associated with one version of the standard is tightly coupled to a separate jar file containing versions of the implementing classes that support the new methods.
One of the basic programming interfaces is the DOM (Document Object Model). The DOM interfaces are defined by packages of the form org.w3c.dom.* and define a set of objects and methods that provide operations on the objects. A DOM 2 standard was developed years ago, and DOM 3 component standards are now released. Driven by requirements emerging from layout management implementations, uPortal requires DOM 3 support.
The Apache Xerces project was formed from submissions from IBM (XML4J) and Sun (ProjectX). It represents a common codebase to which all parties can submit bugfixes and new features. Apache distributes versions of Xerces directly, but Sun distributes a version of the same code with slightly different packaging.
Starting with Java 1.4, Sun decided that XML was so important that it should be a standard part of the J2SE runtime library. However, Sun's standards require that all XML requests filter through the JAXP API, just as all database requests go through JDBC and all directory requests go through JNDI. The Apache code contains some programming interfaces with concrete classes left over from the old IBM XML4J days. So although Sun's distribution is based on Apache Xerces, they tend to rename some of the classes to require everyone to go through the public JAXP interface.
Unfortunately, Sun decided to freeze the features and standards at major release boundaries. When Java 1.4.0 came out in Feb. 2002, the standards were DOM 2 and JAXP 1.2. So although bugs were fixed, these versions of the standards remained the basis for the Sun library through releases of 1.4.1 and 1.4.2 (up to 1.4.2_06). The only way to override this type of built in function is to use the "endorsed" library function of Java, and the only other version of code reasonably available was the distribution from Apache.
The current version of the Xerces XML support distributed by Apache contains interface definitions based on the old DOM 2 standard, and classes that implement that standard. Apache provides an Ant build option to create a version of its current Xerces release with the DOM 3 interfaces and implementations, but it regards a library built this way as experimental beta code. The plan is to convert to DOM 3 support in the 2.7.0 release of Xerces, which currently has no planned release date.
In the Summer of 2004, Sun finally released a new major release. Designated as 1.5 under the old system, or as J2SE 5.0 in a new naming convention, this release includes as standard both support for DOM 3 and JAXP 1.3. In November they also released a version of the same XML library for use on earlier Java releases.
So at this moment, Sun has leapfrogged ahead of Apache. Eventually Apache will release 2.7.0 and catch up, but even then the Sun version of the code will have the advantage that it is built into Java (at least if you are running J2SE 5.0). It provides all the function needed for OpenSAML and Shibboleth, and some useful new features, but will require some conversion.
The proposal is to convert the uPortal project to use the new Sun version of these libraries rather than the older Apache version. If a customer is using J2SE 5.0 as his JRE, then no libraries are needed and everything will work with just the standard Java runtime. For older JREs, the five Sun jar files replace the previously distributed two Apache jar files in the /endorsed directory.
This enables converting some existing code to use the JAXP factory standard instead of using the uPortal DocumentFactory API. This has the benefit that code written for uPortal can more easily be picked up and used in other environments – dependency is directly upon standard libraries rather than upon uPortal-specific APIs. This goes the other direction too – code written to these standard APIs can be picked up and dropped into uPortal. An additional benefit to the conversion is that XSD schema files can become first class programming objects.
The Libraries
A customer who uses J2SE 5.0 as his JRE (and a Servlet container such as Tomcat 5.5 that supports it) has the desired level of XML support and requires no libraries.
A customer using some version of Java 1.4.x requires the Sun distribution of new XML support for old Java systems. If this is checked into the current OpenSAML and Shibboleth projects, the /endorsed directory would now have five Jar files replacing the previous two jar files:
- dom.jar (contains the org.w3c.dom interface packages)
- sax.jar (contains the org.xml.sax interface packages)
- jaxp-api.jar (contains the javax.xml interface packages)
- xercesImpl.jar (Xerces, but with the packages renamed as com.sun.org.apache.xerces...)
- xalan.jar (Xalan, but with the packages renamed as com.sun.org.apache.xalan...)
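On a Java 1.4.x system it can be useful to confirm that the DOM Level 3 interfaces from the endorsed jars are actually the ones being loaded. The sketch below is one hedged way to check this by reflection. It relies on the fact that getTextContent was added to org.w3c.dom.Node in DOM Level 3. The class and method names are hypothetical.

```java
import org.w3c.dom.Node;

public class Dom3Check {

    /**
     * Returns true if the org.w3c.dom.Node interface visible to this class
     * declares getTextContent(), a method introduced in DOM Level 3.
     */
    public static boolean hasDom3() {
        try {
            // Present only when DOM Level 3 interfaces are on the classpath.
            Node.class.getMethod("getTextContent");
            return true;
        } catch (NoSuchMethodException e) {
            // DOM Level 2 interfaces are being loaded instead.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("DOM Level 3 interfaces available: " + hasDom3());
    }
}
```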
Essentially, Sun breaks the Apache xml-apis library of interfaces into three separate Jar files representing the three different interface standards (DOM, SAX, and JAXP) from three separate organizations. This seems like a sensible piece of housekeeping.
The implementing classes (org.apache...) then have their packages renamed to com.sun.org.apache... Direct use of Apache implementing classes bypasses JAXP. It is essentially the same thing as using an Oracle database class directly instead of going through JDBC. Since Sun has to maintain the same classes as Apache, they did not want to change the source. However, by renaming the packages they could be sure that any code that makes direct use of an Apache class would have to be converted.
DocumentBuilderFactory (not "new DOMParser", not org.jasig.portal.utils.DocumentFactory.getNewDocument())
The Sun approach to functional libraries is to create a factory interface with pluggable providers. JAXP is the factory interface for XML. Sun provides a set of implementing classes, but I suppose you might find an alternate source of classes to implement one or more of the XML standards.
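To see the pluggable-provider arrangement in action, the following sketch asks JAXP for a factory and reports which concrete class was selected. The class and method names are hypothetical. The name returned depends on the runtime and its configuration; with the Sun libraries described here one would expect a class under com.sun.org.apache.xerces, though that is not guaranteed.

```java
import javax.xml.parsers.DocumentBuilderFactory;

public class JaxpProviderExample {

    /** Returns the name of the concrete DocumentBuilderFactory chosen by JAXP. */
    public static String documentBuilderFactoryProvider() {
        // Calling code sees only the abstract factory type; the provider
        // behind it is located by the JAXP lookup mechanism.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(documentBuilderFactoryProvider());
    }
}
```

An alternate provider can be selected without code changes, for example through the javax.xml.parsers.DocumentBuilderFactory system property.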
Apache used to expose some concrete classes to perform specific functions. Some Shibboleth and OpenSAML source includes the following statement to define the concrete class that provides XML to DOM parsing:
import org.apache.xerces.parsers.DOMParser;
Sun doesn't want you to use concrete classes directly, so it renamed the packages. There is still a DOMParser class, but when Sun distributes it, it is com.sun.org.apache.xerces.internal.parsers.DOMParser. If you convert from Apache to Sun libraries, then the old import statements and direct use of DOMParser and a few other concrete classes will not compile.
To correct such statements, replace the direct use of classes with the JAXP factory interface. The first step is to create a DocumentBuilderFactory object. This object is then parameterized with information about the type of XML parser you want (especially the XSD Schemas it should use). Then, the DocumentBuilderFactory can be called to create one or more DocumentBuilder objects. DocumentBuilder is almost the same as DOMParser, though a few method details are different.
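The steps above can be sketched roughly as follows. The inline schema, the "greeting" element, and the class and method names are invented for illustration; this is not code from uPortal, OpenSAML, or Shibboleth.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class JaxpParseExample {

    // A tiny made-up schema declaring a single string element.
    private static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='greeting' type='xs:string'/>"
        + "</xs:schema>";

    /** Parses an XML string into a DOM Document via the JAXP factory interface. */
    public static Document parse(String xml) throws Exception {
        // In JAXP 1.3 a compiled XSD is an ordinary Schema object.
        SchemaFactory schemaFactory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));

        // Step 1: create the factory and parameterize it.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema);

        // Step 2: ask the factory for a DocumentBuilder (the DOMParser analogue).
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Step 3: parse the input into a DOM tree.
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

A configured factory can hand out more than one DocumentBuilder, so the schema needs to be compiled only once. Code that needs validation errors to fail the parse would typically also install an ErrorHandler on the builder; the sketch leaves the default in place.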
There is a similar Transformer factory interface to get an object that will convert DOM back to a string of characters (serialize the XML).
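A minimal sketch of that serialization path, using the standard javax.xml.transform classes, might look like the following. The class and method names and the sample document are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SerializeExample {

    /** Serializes a DOM Document to a string using an identity Transformer. */
    public static String serialize(Document doc) throws Exception {
        // newTransformer() with no stylesheet copies the source to the result.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Builds a small sample document for demonstration purposes. */
    public static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        root.appendChild(doc.createTextNode("hi"));
        doc.appendChild(root);
        return doc;
    }
}
```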
Although there are some rough one-to-one translations between old classes and new factories, the details of methods and properties are important. The existing code contains some optimizations, and the same things need to be expressed in terms of the new API.