Open content is a natural partner to The Scholar's Box, a tool for users to gather content, recreate it to the users' purposes, and then share that new work. Today, I analyzed what it would take to explicitly connect open content repositories to the Scholar's Box; the analysis follows below. The conclusion is that there doesn't currently seem to be the sort of federated search API over open content that would be ideal -- but that by using, say, Yahoo's Creative Commons search API, we could hook the Scholar's Box up to this open content.
What is Open Content?
What exactly is "open content"? According to
crupedia.com, "Open content, coined by analogy with "open source," (though technically it is actually share-alike) describes any kind of creative work including articles, pictures, audio, and video that is published in a format that explicitly allows the copying of the information." Perhaps even more telling than formal definitions is how the open content concept has been operationalized by David Wiley (who is credited with doing a huge amount for open content) and
hewlett.org: The William and Flora Hewlett Foundation Home, a major funder of David's work and other
initiatives around open content.
I saw from
OpenContent that the notion of open content is broadly construed. I figured that the term would encompass the OpenCourseWare projects that Hewlett has co-funded.
oishii! - ephemeral pheromonal de.icio.us-ness showed me that David and colleagues are thinking of things beyond OCW, including social bookmarking systems such as del.icio.us. Indeed, the recent
$1.3 million grant to Creative Commons is another indication that the Hewlett Foundation is thinking of open content in a variety of contexts. Given this expansive sense of open content, I can then tackle the issues of 1) how to do a search over the world of open content and bring back a handle for the content and 2) how to process the content as more than a reference, including issues of data typing, disaggregation, and granularity. In this essay, let me take up the question of searching open content and leave the second topic to a later essay.
Connecting open content to the Scholar's Box
Hooking the Scholar's Box to open content sources/repositories has been something we have been interested in doing since we started building SB. (It's a natural connection since SB functionality thrives on materials for which intellectual property issues are amenable to reuse of the material.) In an early prototype of SB, we were searching MIT's OCW, particularly its images, by screenscraping the http://ocw.mit.edu site. We gave up that approach as too fragile when the screenscraper broke within a short period of time. We were hoping that MIT OCW would get around to providing a public API that provided fine-grained access to the content of the site. (A while back, I was in touch with MacKenzie Smith and William Reilly of the
CWSpace Project, wondering whether I'd be able to access OCW materials once they are moved into DSpace. I'm not up on the latest on that front.)
I started looking at http://opencontent.org, hoping to find hints of a search API. What I have seen is David's use of google.com to search the site at
Google OCW and the
OpenCourseWare Finder. I had great fun with the OCWFinder, because when I first saw it, I thought that it would reveal a dynamic API to all the OpenCourseWare material. I finally figured out that OCWFinder is dependent on http://www.opencontent.org/ocwfinder/all.xml, which I would guess is a static file that David is generating either through some screenscraping algorithm or closed API. (I was hoping for something akin to the
del.icio.us/doc/api delicious api.) My conclusion, then, is that there are currently no search APIs provided by the OCW community that I can use to get at OCW open content.
I do, however, see potential in using the search engines -- as David is doing. Google is a natural choice, but perhaps the more interesting one at this point is Yahoo's search engine, for two reasons: 1) it has
an option for searching Creative Commons licensed materials and 2) this type of search is available through an API that has more open terms than the Google API. For example, see
Yahoo! Search Results for Milosz (CC materials) and the XML from
a search of CC modifiable stuff on Milosz. The Creative Commons itself hosts a Nutch-based
search engine -- but I don't see any signs of an API yet. The availability of APIs is definitely hit-or-miss these days.
Flickr's search API has an option for looking for CC marked photos. Apparently, ourmedia.org doesn't yet have any API:
Open API | Ourmedia. The bottom line right now is that by using a mix of Yahoo's search API aimed at Creative Commons materials and a bunch of other techniques (to be elaborated later), we can start explicitly hooking SB into the world of open content. A search API that is particularly tuned to open content sources might make for even better functionality, but that might still need to be proven. (For example, the NSDL does provide the
NSDL Repository API, and we have implemented searching the NSDL repository in SB through that API -- but I have not yet seen a particular benefit from using that API over, say, using a general search engine.)
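To make the "mix of techniques" idea a bit more concrete, here is a minimal sketch in Python of what the plumbing might look like. It is not how SB actually does it. The first function builds a request URL for Flickr's flickr.photos.search method restricted to CC-licensed photos; the license IDs 1 through 6 are my assumption for the Creative Commons licenses and should be checked against flickr.photos.licenses.getInfo. The `Hit` type and `federated_search` function are hypothetical names invented for this illustration: each source is just a function that takes a query and returns hits, and the merge step skips sources that fail and drops duplicate URLs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"
# Assumption: IDs 1-6 are the Creative Commons licenses on Flickr.
# Verify with flickr.photos.licenses.getInfo before relying on this.
CC_LICENSE_IDS = "1,2,3,4,5,6"


def flickr_cc_search_url(text: str, api_key: str) -> str:
    """Build (but do not send) a flickr.photos.search request for CC photos."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "text": text,
        "license": CC_LICENSE_IDS,
        "format": "json",
        "nojsoncallback": "1",
    }
    return FLICKR_REST + "?" + urlencode(params)


@dataclass(frozen=True)
class Hit:
    """Hypothetical, minimal 'handle' for one piece of open content."""
    title: str
    url: str
    source: str


# A source is any function that maps a query string to hits.
SearchFn = Callable[[str], Iterable[Hit]]


def federated_search(query: str, sources: List[SearchFn]) -> List[Hit]:
    """Query every source in order, skip failures, and drop duplicate URLs."""
    seen = set()
    merged: List[Hit] = []
    for search in sources:
        try:
            hits = list(search(query))
        except Exception:
            # One broken source (say, a screenscraper) should not sink the search.
            continue
        for hit in hits:
            if hit.url not in seen:
                seen.add(hit.url)
                merged.append(hit)
    return merged
```

Each real source (a Yahoo CC search, the NSDL API, Flickr) would be wrapped as one `SearchFn` that fetches and parses its own response format. Keeping the failure handling in the merge step reflects the lesson of the OCW screenscraper: any single source may stop working without notice.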
There is a lot of related work that I have not worked into this essay, but which may be relevant for further thinking. The library community has invested a lot of effort into thinking about federated searching, metasearch and OAI harvesting. I have been wondering what can be learned from efforts to federate search over learning object repositories such as
Splash.
I look forward to finding out more about open content projects. I'm particularly curious about the status of EduCommons, specifically the following description:
EduCommons Update — wiley.ed.usu.edu: "One of the nastinesses of making entire courses worth of content available in this way is linking to and embedding granular content. In EduCommons all resources are stored in the repository as first class objects with their own unique identifiers. Putting things in the repository is easy, but once the resource names change based on the GUID assignment, hooking all the pieces back together can be tedious to say the least."
