-
Lack of interoperability among software systems and repositories from different domains is a major barrier to the exchange of digital content between communities. This project will explore how semantic interoperability (the accurate translation of meaning) in the following four domains can be enhanced through the use of XSLT-based crosswalks between key XML specifications: 1) digital libraries and repositories (METS); 2) educational technologies and learning management systems (SCORM, IMS-Content Packaging (IMS-CP), and IMS-Metadata (IMS-MD)); 3) web syndication and portal technologies (RSS); and 4) desktop applications and structured content authoring tools (e.g., Microsoft Office 11).
mission of the IU
-
The UC Berkeley Interactive University Project has focused on how content encoded in a particular XML specification can be translated to another XML specification through crosswalks. We have done preliminary work with XML specifications from the digital library and educational technology communities -- as well as those used in the syndication of web content and mainstream desktop productivity tools. The mission of the Interactive University Project is to use the Internet to open UC Berkeley's unique resources and people to the public, especially California's K-12 schools and citizens. The IU aims to engage the academic core of the campus -- faculty, academic departments, organized research units, libraries, and research museums -- in using technology to structure content so that it can add value to teaching, research, and public service. A key means of structuring content will be to create and use XML-based digital objects. The IU is building the Berkeley Open Learning Environment (B-OLE) for the sharing and creating of such digital objects both on and off the campus.
the IU and B-OLE/Scholar's Box -- and how that leads to semantic interop problem
-
A key component of the B-OLE is the Scholar's Box that will enable faculty, students, and the public to create, manipulate, annotate, and share personal collections of digital cultural objects gathered from multiple digital repositories -- core activities in both scholarship and teaching. In creating ideas, developing presentations, sifting through evidence, researching papers, or compiling readers, scholars build de facto collections from which they create their desired product. Gathering, manipulating, organizing, annotating, and sharing personal collections of cultural objects is also a core activity that can support many teaching and learning practices and styles. Ideally, the Scholar's Box would enable users to draw upon multiple sources in seamless, integrated ways regardless of underlying protocols and data/metadata encoding schemes. Creating the full spectrum of interoperability required for such functionality remains an extremely challenging and multifaceted research problem. [5] Among the various aspects of interoperability, the problem of semantic interoperability, "integrating resources that were developed using different vocabularies and different perspectives on the data" [3], has been of special interest to the IU.
challenge of the semantic interop problem in the large
-
There have been a variety of attempts to solve the general semantic interoperability problem through the creation of an abstract scheme in which specific vocabularies can be subsumed as particular cases of the scheme. Translation between any two vocabularies is then handled by using the abstract scheme as an intermediary. That is, a translation from a given specification X to specification Y is accomplished by translating specification X to the abstract scheme and then from the abstract scheme to specification Y. The existence of such an abstract scheme renders unnecessary direct translations between specifications, which would grow rapidly in number as the number of specifications increases. [2-4] Of course, constructing an abstract scheme that can accurately subsume all specific vocabularies of interest remains a major unsolved challenge. Nevertheless, projects that aim to solve the semantic interoperability problem in the large are valuable since the desire to work seamlessly with the multiplicity of digital content in varied formats will continue to grow.
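The growth argument above can be made concrete with a small count. This is a minimal sketch, assuming that every crosswalk is one-directional (X to Y and Y to X are written separately) and that a single abstract scheme serves as the hub; the function names are hypothetical and used only for illustration.

```python
def direct_crosswalks(n: int) -> int:
    """One-directional crosswalks needed when each of n specifications
    is translated directly to every other specification."""
    return n * (n - 1)


def hub_crosswalks(n: int) -> int:
    """One-directional crosswalks needed when each of n specifications
    is translated to and from a single abstract scheme."""
    return 2 * n


# Compare the two strategies as the number of specifications grows.
for n in (3, 5, 10, 20):
    print(n, direct_crosswalks(n), hub_crosswalks(n))
```

For three specifications the two counts are equal (6 each), while for ten specifications the direct approach needs 90 crosswalks against 20 through the abstract scheme.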
our pragmatic approach to enhancing semantic interop
-
Meanwhile, the IU has been pursuing a pragmatic approach to enhancing semantic interoperability among libraries, educational technology, and web syndication. By focusing on a small number of XML-based interoperability specifications that are of importance in the various domains, we have written direct crosswalks between the specifications, thereby avoiding the need for an abstract scheme. In our work, we have produced baseline translations, rather than crosswalks of the highest fidelity. We have transported materials between library repositories and instructional technology applications via these functional converters for three purposes: 1) to demonstrate the interchange of digital content between libraries and instructional technology systems; 2) to learn from practitioners in the educational and library communities where to invest further effort in making the crosswalks useful in production services; and 3) to encourage the developers of interoperability specifications in the library and educational technology communities to harmonize related specifications where possible, thus reducing the need for crosswalks in the first place. This project will enable the IU to make further progress on these three fronts.
where we plan to implement our crosswalk work; IU partnerships
-
The Interactive University Project is deeply committed to this project because enhancing semantic interoperability is crucial to the functioning of the Scholar's Box -- a high priority for the IU. Although the IU is focused on leveraging practical techniques to enhance interoperability, we also integrate insights from long-range research efforts. Working in dialog with experts from the library and educational technology domains helps us develop useful crosswalks. The IU already has partnerships on campus (with the University Library, Berkeley Art Museum, Educational Technology Services, Berkeley Natural History Museums, the Multimedia Authoring Center for Teaching in Anthropology) and off-campus (the California Digital Library) that will help in finding those experts.
next steps for the crosswalks -- still applicable today
-
We have written preliminary crosswalks (Version 1) among METS, IMS-CP, and RSS. The crosswalks enable materials to be moved from one environment to another. Structural elements of the materials are preserved whenever possible. A start has been made at metadata translation. In some cases, the crosswalks capture the one and only appropriate translation. In other cases, a choice was made from among a number of reasonable possibilities. Version 2 of the crosswalks will be a refinement based on the input of the Advisory Board (see below for a description of the Advisory Board). The Advisory Board will help us to understand properly the semantics of specifications from the various domains and specifically the nuances around translating concepts from one domain to another. Moreover, we need to understand the specific contexts in which the crosswalks will be deployed. Since there is often more than one viable crosswalk, we will document the reasoning behind the choices we make so that the crosswalks can be intelligently recontextualized as needed. The Advisory Board will help us determine important practical scenarios to address.
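To illustrate what a baseline crosswalk of this kind involves, here is a minimal sketch in Python. It is not the IU's XSLT, and it reflects assumptions rather than the project's actual choices: the sample METS document is invented, the target is taken to be RSS 2.0, each structural div that points directly at a file becomes one RSS item, and the channel omits elements (such as link and description) that a complete feed would need.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
NS = {"mets": METS_NS}

# Invented sample document, used only for illustration.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink" LABEL="Sample scrapbook">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="F1">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/page1.jpg"/>
      </mets:file>
      <mets:file ID="F2">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/page2.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div LABEL="Scrapbook">
      <mets:div LABEL="Page 1"><mets:fptr FILEID="F1"/></mets:div>
      <mets:div LABEL="Page 2"><mets:fptr FILEID="F2"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def mets_to_rss(mets_xml: str) -> ET.Element:
    """Baseline METS-to-RSS translation (one of several reasonable mappings)."""
    mets = ET.fromstring(mets_xml)

    # Index file locations by file ID so structural divs can be resolved to URLs.
    hrefs = {}
    for file_el in mets.iterfind(".//mets:file", NS):
        loc = file_el.find("mets:FLocat", NS)
        if loc is not None:
            hrefs[file_el.get("ID")] = loc.get(f"{{{XLINK_NS}}}href")

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = mets.get("LABEL", "Untitled")

    # Design choice (an assumption): only divs with a direct file pointer become items.
    for div in mets.iterfind(".//mets:structMap//mets:div", NS):
        fptr = div.find("mets:fptr", NS)
        if fptr is None:
            continue
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = div.get("LABEL", "")
        ET.SubElement(item, "link").text = hrefs.get(fptr.get("FILEID"), "")
    return rss


if __name__ == "__main__":
    print(ET.tostring(mets_to_rss(SAMPLE_METS), encoding="unicode"))
```

The nesting of the structural map is flattened here, which is exactly the kind of lossy decision whose reasoning should be documented alongside the crosswalk.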
relating crosswalks to other approaches -- specific semantic interop projects
-
The lack of complete semantic interoperability will remain a problem for the foreseeable future. Although crosswalks are a practical approach to improving interoperability among a small number of specific domains, crosswalks are by no means the only way to address the problem. Two examples of other, more ambitious and long-term efforts to enable semantic interoperability among large numbers of arbitrary domains are: 1) SIMILE ("Semantic Interoperability of Metadata and Information in unLike Environments") -- a recently funded $4 million three-year collaboration among the MIT Library (DSpace), the MIT CS department, and the W3C to leverage the connection among libraries, the semantic web, and personal information management [2]; and 2) the HARMONY project, a three-year project that investigated "a conceptual model for interoperability among community-specific metadata vocabularies" [1].
application profiles
first encounter with MOA2 and why we got excited about MOA2/METS
-
http://iu.berkeley.edu/rdhyee/2003/08/22#a901 For well over a year now, the Interactive University Project -- and I in particular -- have been involved in translating various XML-based formats to other formats. How did we ever get into such a line of work? It basically all started when we at the IU were shown the scrapbook of Yoshiko Uchida (http://sunsite.berkeley.edu/xdlib/servlet/archobj?DOCCHOICE=http://sunsite.berkeley.edu/xmlrepos/jarda/brk00007.00000149c.xml) digitized by the Bancroft Library. Uchida was within a few weeks of her graduation from UC Berkeley when, in 1942, she and her family were sent to an internment camp. (I think that I got this part of her story right.) The scrapbook is a moving record of her experience, a composite of newspaper clippings, cards, correspondence, and other personal effects. The UC Berkeley library digitized the scrapbook and encoded it in the MOA2 format (http://sunsite.berkeley.edu/xmlrepos/jarda/brk00007.00000149c.xml), an XML format for encoding digital objects. We at the IU were immediately impressed not only by the technical quality of the digital material but also by its stunning educational potential. Not only did the library at Berkeley have a treasure trove of documents like the Uchida scrapbook that had already been digitized or were awaiting digitization, but the library was also encoding these documents in a standard XML format, the MOA2 format, which later evolved into METS. Because the IU is in the business of opening up the resources of the campus to the public, especially to K-12 teachers, we saw that the encoding of large quantities of high-quality research content was a huge boon to the public for the following reasons:
-
1) That the digital documents were encoded in XML, and that the XML source was available to the public, makes it much easier for others to create new and multiple representations and interpretations of a document as a whole or in part. (Though the XML markup of the documents was done primarily to make the exchange and handling of these documents easier for the housing institutions such as libraries, museums, and archives, a beneficial side effect is that the markup is also available for other uses by those outside the library and museum community.)
2) The adoption of METS by a significant number of libraries, museums, and archives to mark up their content would result in many thousands of interesting, high-quality objects in a common format, which in turn encourages the development of tools.
3) The markup of archival objects clearly documents the relationships among the parts of a complex object and their attendant metadata. Such clear indicators make for more intelligent disaggregation and reaggregation of objects.
At the same time, the second-generation web was influencing the development of educational technology, weblogging, and general-purpose computational tools. Each of those domains had its own XML specifications.
what can be learned from RSS
-
There is a lot to learn from RSS, for example. (format wars, use of RDF, how is content syndication and aggregation happening, how are large numbers of people responding to the format, will RSS dominate over other XML formats?)
why METS and why IMS-CP specifically -- there are others after all
-
METS and IMS-CP are not the only relevant XML formats. For example, http://www.chin.gc.ca/English/Standards/metadata_educational.html lists at least DC and others such as GEM. But we want to focus on the intersection between libraries and ed. tech.
It seems to be the nature of the second-generation web that if the material from a given community (say, the library) is useful outside of that community, there will be a need to cross community boundaries. As we do this work, we have specific use cases (surrounding collection building and use -- the Scholar's Box) that drive our work in certain directions, getting us to pull repositories together with tools. Rick and I come from two different communities -- the library and instructional technology -- and this partnership is an example of the dialog needed to work out interoperability issues. I come from the perspective of someone ... If this paper does nothing else, it should help the two communities gain a better awareness of the activities that have gone on largely in parallel -- and offer tentative bridges between the two through the mechanism of the crosswalk. It is probably too late, unrealistic, and actually inappropriate to ask for the abandonment of the specification efforts. One big reason is that there are different needs (I can reference Friesen here). So there is no need for unification for unification's sake. And we don't plan to offer all the answers here -- the issues are much too complex and involve many more people than the two of us authors writing and theorizing.
Why XSLT for our crosswalks?
-
The translations we have been doing fall under the rubric of metadata standards crosswalks
-
When we started translating METS to other XML formats such as RSS and IMS-CP, I suspected that this type of translation work must have a historical and professional context but did not understand that context. I soon came across the term "metadata crosswalks." (A Google search for "metadata standards crosswalk" yields pointers to a good number of important efforts in this area: http://www.google.com/search?hl=en&lr=&safe=off&q=metadata+standards+crosswalk&btnG=Google+Search ) A good working definition of crosswalk comes from the Dublin Core Metadata Glossary (http://library.csun.edu/mwoodley/dublincoreglossary.html):
-
"A table that maps the relationships and equivalencies between two or more metadata formats. Crosswalks or metadata mapping support the ability of search engines to search effectively across heterogeneous databases, i.e. crosswalks help promote interoperability."
-
Crosswalks: the Path to Universal Access? http://www.getty.edu/research/institute/standards/intrometadata/2_articles/woodley/index.html
Issues in Crosswalking Content Metadata Standards http://www.niso.org/press/whitepapers/crsswalk.html
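The glossary's notion of a crosswalk as a table of equivalencies can be sketched directly as a lookup table. The mapping below, from a few Dublin Core element names to RSS 2.0 item elements, is illustrative only: the particular pairings (for example, identifier to link, or creator to author) are assumptions made for this sketch, not an authoritative or published crosswalk, and the sample record is invented.

```python
# Illustrative crosswalk table: Dublin Core element name -> RSS 2.0 item element.
# These pairings are assumptions for the sketch, not a published mapping.
DC_TO_RSS = {
    "title": "title",
    "description": "description",
    "identifier": "link",
    "creator": "author",
}


def apply_crosswalk(record: dict, table: dict) -> tuple:
    """Translate field names using the table.

    Returns (mapped, unmapped) so that anything the crosswalk cannot
    express in the target format is reported rather than silently dropped.
    """
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = table.get(field)
        if target is None:
            unmapped[field] = value
        else:
            mapped[target] = value
    return mapped, unmapped


# Invented sample record.
record = {
    "title": "Sample object",
    "creator": "A. Author",
    "identifier": "http://example.org/obj/1",
    "rights": "All rights reserved",
}
mapped, unmapped = apply_crosswalk(record, DC_TO_RSS)
print(mapped)
print(unmapped)
```

Keeping the unmapped fields visible is one simple way to see where a baseline crosswalk loses information and where further effort might be invested.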
-
What exactly are the theoretical, practical, and historical strengths and weaknesses of crosswalks? Are there any well-thought-out mechanisms for generating good crosswalks? What is the relationship between a traditional crosswalk approach and other approaches to solving the semantic interoperability problem?
-
it's both practically actionable
-
a tutorial -- the best source yet -- that runs (slightly differently) in IE6 and Mozilla 1.4: http://devedge.netscape.com/viewsource/2003/xslt-browser/
looks like IE 5.5 (?) and above has pretty good support; Opera and Gecko and many other browsers don't
tinyxml -- http://www.bayes.co.uk/xml/
Paul Prescod in Jan 2002 -- support is not that widespread
Cameron Laird makes the point that CSS might be what is actually needed instead of XSLT for some applications
-
XSLT is "the official transformation language" of the XML family. That is, if you want to process XML while limiting yourself as much as possible to a unified body of technologies, then XSLT is the choice.
See the example of this architecture in amazon.com web services: the ability to let people give you "code" without opening yourself up to running totally arbitrary (and possibly dangerous) code that needs to be sandboxed. It will be interesting to see whether there will be XSLT-based viruses/worms/attacks on amazon.com and other services that might eventually do the same thing of letting people run transforms on their systems. (Denial of service -- wasteful XSLT?)
XSLT is XML itself, opening itself up to nice recursive tricks that we have yet to play.
XSLT is appropriately expressive of the logic of translating tree-based documents (XML).
The declarative rather than procedural approach works well in the XML-document-centric/document-choreography/pipelining model.
It also didn't require extensive back-ends -- there is a server-side service to pass XSLT along. This has the advantage of not having to run one's own server-side stuff but just riding off the Web as a whole.
There is great value in working apps (viewers/players) for studying interop. Specs without tools are a big barrier. The fact that there still isn't a publicly available viewer for IMS-CP, I think, hinders IMS-CP adoption and development. There is nothing like easily available tools for people to experiment with to get into it. However, it is important not to read too much into a specification from the workings of a particular application either, since particular applications may contain varying, incomplete, or incorrect interpretations of a specification. RLI will be a concrete test in the near future. Crosswalks don't have to be perfect to be useful. If you insist on high-quality crosswalks all the time, you won't be able to translate among that many different domains -- the combinatorics go up too quickly. "Pretty Good" interoperability is what we are after. The crosswalks should enable the migration of content from editing tools in one domain (such as METS) to IMS-CP editing tools. And if we write
What next? Larger context?
