CdlScreenScraping


We have prototypes of the ScholarsBox that can incorporate materials from the CaliforniaDigitalLibrary, and we want to incorporate even more. Here are notes on how to do more ScreenScraping on the CDL. (ScreenScraping might be the wrong word, since the CaliforniaDigitalLibrary is kind enough to produce XML, not just HTML!)

The main URL for the public CDL portal: http://californiadigitallibrary.org

A key tip: for any given CDL query, tack &xslt=raw+xml onto the URL and you might be able to get XML (e.g., compare the HTML for a search on horse with the XML returned by the same query). Learn more from the WebNetTalkOnRss page.
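As a rough sketch of that tip, the Python below appends the xslt=raw+xml parameter to an existing query URL. The function name with_raw_xml and the example.org URL are placeholders for illustration only, not real CDL endpoints; substitute whatever query URL the CDL portal gives you.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_raw_xml(url):
    """Return url with xslt=raw+xml appended to its query string."""
    parts = urlsplit(url)
    # Keep the existing parameters, dropping any earlier xslt value.
    params = [(k, v)
              for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "xslt"]
    # urlencode writes the space in "raw xml" as '+', giving xslt=raw+xml.
    params.append(("xslt", "raw xml"))
    return urlunsplit(parts._replace(query=urlencode(params)))

# Placeholder URL, not an actual CDL search address.
print(with_raw_xml("http://example.org/search?query=horse"))
```

Whether a given CDL script honors the parameter is something to check query by query, as the tip says.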

Browsing Collections:

Note that these collection browsing URLs use a different search script than the standard keyword searches, one that only spans collections. You can add a "search=foo" parameter to a collection browsing URL, but you can't limit the search to images or text, since the script doesn't return those results anyway. Similarly, you can't limit a normal search to only results from (say) UCB.
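A minimal sketch of adding the "search=foo" parameter to a collection browsing URL follows. The helper name add_search_term and the example.org URL are hypothetical; the real collection browsing URL has to come from the CDL portal itself.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_search_term(browse_url, term):
    """Return browse_url with its search parameter set to term."""
    parts = urlsplit(browse_url)
    # Preserve the other parameters; replace any existing search value.
    params = [(k, v)
              for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "search"]
    params.append(("search", term))
    return urlunsplit(parts._replace(query=urlencode(params)))

# Placeholder URL standing in for a collection browsing address.
print(add_search_term("http://example.org/browse?collection=demo", "foo"))
```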

Browsing images:

Searching images:

Different ways to limit a search, using "London" as the example:

Melvyl

Another CDL system is Melvyl, the UC system-wide union catalog. If you ever want to figure out how to create the right URL to do a particular query, you might find the Melvyl Access guide handy. The Ex Libris ALEPH server that powers Melvyl can also send raw XML, which may be useful.