Contents:
- Overview
- An Easy Way to Get Started
- Our Data
- Harvesting Our Data
- Issue Areas and "Sets"
- Technology and Thanks
- Questions? Comments?
We are excited to share the nonprofit-produced research data we archive.
Everything you need to know to get started with our data appears below. However, we hope you will take a little time to understand the ideas behind a couple of protocols and standards that we adhere to so that our data reaches the widest possible audience.
We use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to provide our data. Never heard of OAI-PMH? All you ever wanted to know is available on the OAI website. If you don't have a lot of time (and who does?), scan this informative tutorial to get the lay of the land. In a nutshell:
"The essence of the open archives approach is to enable access to Web-accessible material through interoperable repositories for metadata sharing, publishing and archiving. It arose out of the e-print community, where a growing need for a low-barrier interoperability solution to access across fairly heterogeneous repositories led to the establishment of the Open Archives Initiative (OAI). The OAI develops and promotes a low-barrier interoperability framework and associated standards, originally to enhance access to e-print archives, but now taking into account access to other digital materials. As it says in the OAI mission statement, 'The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.'"
From "OAI for Beginners: Basic OAI concepts and features"
In keeping with OAI-PMH, our data is expressed in XML and uses unqualified Dublin Core. There is a lot more to learn about Dublin Core on the DC website. Here's a sentence to give you the gist of what's going on:
The Dublin Core Metadata Initiative (DCMI) is an organization dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for describing resources that enable more intelligent information discovery systems.
From "Dublin Core Metadata Initiative: About the Initiative"
We have set up a data harvester that might make it easier to orient yourself to all of this talk about metadata, harvesting, data providers, Dublin Core, OAI-PMH, etc. Our data harvester makes it very simple for you to take a peek at our metadata collection, as well as browse and search for metadata records. Give it a try!
The rest of the information and instructions in this document are meant for folks who want to extract our metadata in its raw form for their own purposes.
Our data is delivered in XML -- one XML file per research listing. This chart provides info on all the data points we collect. Note that not every research listing includes all of these data points. At a minimum, the data points that appear in bold type will be included in every XML file.
| Data point | Dublin Core element | Description |
|---|---|---|
| Coverage | <dc:coverage> | The geographic areas the research considers (uncontrolled list). |
| Creator | <dc:creator> | The author(s) of the research. |
| Date | <dc:date> | The date the research was published. Format YYYY-MM-DD. |
| Description | <dc:description> | The summary provided for the research work. |
| Identifier | <dc:identifier> | The listing's unique identifier (in the example record below, the listing's URL on www.issuelab.org). |
| Language | <dc:language> | The language in which the research was authored/published. |
| Publisher | <dc:publisher> | The nonprofit organization(s) that made the research in question available to/through the IssueLab archive. |
| Rights | <dc:rights> | The copyright and usage instructions for the research work. |
| Subject | <dc:subject> | The issue areas that the research work falls under (see "Issue Areas and Sets" below). Research works can fall into up to three issue areas. |
| Title | <dc:title> | Title of the research work. |
| Type | <dc:type> | The kind of research work. Controlled list: Dataset; Whitepaper; Policy Brief; FactSheet; Survey; Ethnography; CaseStudy; Testimonial; MovingImage; StillImage; InteractiveResource. |
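If you want to sanity-check harvested records against the rules in this chart, here is a minimal Python sketch. It assumes each record has already been turned into a dictionary mapping Dublin Core element names (without the dc: prefix) to lists of values; that shape and the function names are illustrative, not something our data provider returns. It checks only three rules from the table: dates in YYYY-MM-DD format, at most three issue areas, and types drawn from the controlled list.

```python
import re
from datetime import date
from typing import Dict, List

# Controlled list of <dc:type> values, copied from the table above.
ALLOWED_TYPES = {
    "Dataset", "Whitepaper", "Policy Brief", "FactSheet", "Survey",
    "Ethnography", "CaseStudy", "Testimonial", "MovingImage",
    "StillImage", "InteractiveResource",
}
ISO_DAY = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_day(value: str) -> bool:
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not ISO_DAY.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2007-13-01
    except ValueError:
        return False
    return True


def check_record(fields: Dict[str, List[str]]) -> List[str]:
    """Return the problems found in one record (assumed shape: element name -> values)."""
    problems = []
    for value in fields.get("date", []):
        if not is_iso_day(value):
            problems.append(f"date is not a valid YYYY-MM-DD date: {value!r}")
    if len(fields.get("subject", [])) > 3:
        problems.append("more than three issue areas")
    for value in fields.get("type", []):
        if value not in ALLOWED_TYPES:
            problems.append(f"type not in controlled list: {value!r}")
    return problems
```

For the example record below, check_record({"date": ["2007-08-01"], "subject": ["education_and_literacy", "human_rights_and_civil_liberties", "immigration"]}) should return an empty list.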
Here is an example of the contents of one research metadata record:
<?xml version="1.0" encoding="UTF-8" ?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2008-04-09T18:53:51Z</responseDate>
<request identifier="2003_2004_statewide_survey_of_immigrants_and_refugees" verb="GetRecord" metadataPrefix="oai_dc">http://harvest.issuelab.org/provider/oai</request>
<GetRecord>
<record>
<header>
<identifier>oai:harvest.issuelab.org:2003_2004_statewide_survey_of_immigrants_and_refugees</identifier>
<datestamp>2008-04-09T17:26:47Z</datestamp>
<setSpec>human_rights_and_civil_liberties</setSpec>
<setSpec>education_and_literacy</setSpec>
<setSpec>immigration</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>2003-2004 Statewide Survey of Immigrants and Refugees</dc:title>
<dc:subject>education_and_literacy</dc:subject>
<dc:subject>human_rights_and_civil_liberties</dc:subject>
<dc:subject>immigration</dc:subject>
<dc:creator>Illinois Coalition for Immigrant and Refugee Rights</dc:creator>
<dc:publisher>Illinois Coalition for Immigrant and Refugee Rights</dc:publisher>
<dc:date>2007-08-01</dc:date>
<dc:description>This report weaves together demographic and field research conducted by ICIRR in 2003 to assess the needs of immigrants and refugees throughout Illinois and to recommend the formation of an Illinois immigrant integration policy.</dc:description>
<dc:identifier>http://www.issuelab.org/research/2003_2004_statewide_survey_of_immigrants_and_refugees</dc:identifier>
</oai_dc:dc>
</metadata>
</record>
</GetRecord>
</OAI-PMH>
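A record like the one above can be read with any namespace-aware XML parser. As a sketch, Python's standard xml.etree.ElementTree module can flatten each <record> into its OAI identifier, datestamp, set memberships and Dublin Core fields. The namespace URIs are the ones declared in the example; the function names and the output layout are hypothetical. Because it iterates over every <record>, the same code should also handle ListRecords responses.

```python
import xml.etree.ElementTree as ET
from typing import List, Union

# Namespace URIs declared in the example response above.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
DC_TAG_PREFIX = "{" + NS["dc"] + "}"


def parse_record(record: ET.Element) -> dict:
    """Flatten one OAI-PMH <record> carrying oai_dc metadata into a dict."""
    header = record.find("oai:header", NS)
    flat = {
        "oai_identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [spec.text for spec in header.findall("oai:setSpec", NS)],
        "dc": {},
    }
    # Deleted records carry a header but no <metadata>, so dc may be missing.
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    if dc is not None:
        for child in dc:
            if child.tag.startswith(DC_TAG_PREFIX):
                name = child.tag[len(DC_TAG_PREFIX):]
                # Elements such as <dc:subject> can repeat, so collect lists.
                flat["dc"].setdefault(name, []).append(child.text or "")
    return flat


def parse_response(xml_data: Union[str, bytes]) -> List[dict]:
    """Parse a GetRecord or ListRecords response into flattened records."""
    root = ET.fromstring(xml_data)
    return [parse_record(rec) for rec in root.iter("{%s}record" % NS["oai"])]
```

Applied to the example above, the single flattened record should list three sets and three dc subject values.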
As we mentioned above, we use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to let others access our data. The protocol provides six case-sensitive commands, or "verbs", for querying: Identify, ListSets, ListMetadataFormats, ListIdentifiers, ListRecords and GetRecord.
Here are the most basic HTTP requests you can issue. Note: ListIdentifiers and ListRecords require the "metadataPrefix" parameter in addition to the "verb", and GetRecord requires both "metadataPrefix" and "identifier":
http://harvest.issuelab.org/provider/oai?verb=Identify
http://harvest.issuelab.org/provider/oai?verb=ListSets
http://harvest.issuelab.org/provider/oai?verb=ListMetadataFormats
http://harvest.issuelab.org/provider/oai?verb=ListIdentifiers&metadataPrefix=oai_dc
http://harvest.issuelab.org/provider/oai?verb=ListRecords&metadataPrefix=oai_dc
http://harvest.issuelab.org/provider/oai?verb=GetRecord&identifier=[record identifier]&metadataPrefix=oai_dc
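OAI-PMH is strict about which arguments go with which verb, and a malformed request comes back as an error rather than data. The hypothetical helper below (a Python sketch, not part of our software) encodes the per-verb argument rules from the OAI-PMH 2.0 specification and builds request URLs against the base URL used above. It also handles resumptionToken, an argument the protocol uses to continue a list returned in batches, which must be sent on its own.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Base URL used in the examples on this page.
BASE_URL = "http://harvest.issuelab.org/provider/oai"

# (required, optional) arguments per verb, following OAI-PMH 2.0.
# resumptionToken is handled separately because it is exclusive.
VERB_ARGS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
}
# Verbs whose lists may be continued with a resumptionToken.
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


def oai_url(verb: str, params: Optional[Dict[str, str]] = None) -> str:
    """Build an OAI-PMH request URL, rejecting argument sets the protocol forbids."""
    params = dict(params or {})
    if verb not in VERB_ARGS:  # exact match: verbs are case-sensitive
        raise ValueError(f"unknown verb: {verb!r}")
    given = set(params)
    if "resumptionToken" in given:
        if verb not in RESUMABLE or given != {"resumptionToken"}:
            raise ValueError("resumptionToken must be the only argument")
    else:
        required, optional = VERB_ARGS[verb]
        missing, extra = required - given, given - required - optional
        if missing or extra:
            raise ValueError(f"{verb}: missing {sorted(missing)}, unexpected {sorted(extra)}")
    # 'verb' goes first to match the URLs above; urlencode percent-encodes values.
    return f"{BASE_URL}?{urlencode({'verb': verb, **params})}"
```

For instance, oai_url("ListRecords", {"metadataPrefix": "oai_dc"}) should reproduce the ListRecords URL above, while oai_url("listrecords", ...) raises ValueError because verbs are case-sensitive.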
You can do quite a bit more with these verbs by adding optional arguments. For example, you can ask for all records in our "Health and Medicine" issue area whose metadata was harvested between 2007-01-01 and 2008-01-01 like this:
http://harvest.issuelab.org/provider/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2007-01-01&until=2008-01-01&set=health_and_medicine
Want records in a particular issue area from a particular date until the present? Just leave off the "until" parameter. Example:
http://harvest.issuelab.org/provider/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2007-01-01&set=health_and_medicine
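Building these selective-harvesting URLs by hand is error-prone, so here is a small Python sketch that assembles them and rejects malformed dates or a "from" later than "until". In OAI-PMH both bounds are inclusive. The function name list_records_url is hypothetical, and the sketch always requests oai_dc, since that is the metadata format shown on this page.

```python
import re
from datetime import date
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://harvest.issuelab.org/provider/oai"
ISO_DAY = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def _day(value: str) -> date:
    """Parse a YYYY-MM-DD string, raising ValueError for anything else."""
    if not ISO_DAY.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return date.fromisoformat(value)  # also rejects impossible dates like 2007-02-30


def list_records_url(from_date: Optional[str] = None, until: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build a selective ListRecords URL; leave until=None for an open-ended window."""
    start = _day(from_date) if from_date is not None else None
    end = _day(until) if until is not None else None
    if start is not None and end is not None and start > end:
        raise ValueError("'from' must not be later than 'until'")
    # Same argument order as the example URLs above.
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    for name, value in (("from", from_date), ("until", until), ("set", set_spec)):
        if value is not None:
            params[name] = value
    return f"{BASE_URL}?{urlencode(params)}"
```

list_records_url("2007-01-01", "2008-01-01", "health_and_medicine") should return the first example URL above; dropping the until argument should return the open-ended one.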
The "from" and "until" date parameters refer to the date a metadata record was harvested, not the date a research listing was originally published or archived/modified within the IssueLab archive. To search for records by date of publication (e.g., all research listings originally released on a certain date), use our website's search tool.
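If you have already harvested records (for example, with a full ListRecords request that omits "from" and "until") and want to narrow them by publication date yourself, you can filter on the <dc:date> element on your side. The Python sketch below assumes each record is a dictionary mapping Dublin Core element names to lists of values (a hypothetical shape, not something our provider returns) and keeps records whose date falls within an inclusive range.

```python
from datetime import date
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, List[str]]  # assumed shape: element name -> values


def published_between(records: Iterable[Record], start: str, end: str) -> Iterator[Record]:
    """Yield records whose <dc:date> falls within [start, end], both inclusive."""
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    for record in records:
        for value in record.get("date", []):
            try:
                day = date.fromisoformat(value)
            except ValueError:
                continue  # skip values date.fromisoformat cannot parse
            if lo <= day <= hi:
                yield record
                break  # yield each record at most once
```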
Here are the arguments you can use in your HTTP request alongside the verb (remember, nothing happens unless you issue one of the six verbs!): identifier, metadataPrefix, from, until, set and resumptionToken.
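Large ListRecords and ListIdentifiers responses may be split into batches. In OAI-PMH, each incomplete batch ends with a <resumptionToken>, and the harvester asks for the next batch by sending the verb plus that token alone; an empty or missing token marks the last batch. This page does not say how large our batches are, so the Python sketch below simply follows tokens until they run out. It uses only the standard library; the harvest function and its fetch parameter (which lets you swap in another HTTP client or a test stub) are hypothetical. OAI-PMH reports an empty result as a noRecordsMatch error, which the sketch treats as "no records".

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "http://harvest.issuelab.org/provider/oai"


def http_get(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET with the standard library."""
    with urlopen(url, timeout=60) as resp:
        return resp.read()


def harvest(params: Dict[str, str],
            fetch: Callable[[str], bytes] = http_get) -> Iterator[ET.Element]:
    """Yield every <record> of a ListRecords request, following resumptionTokens."""
    query = {"verb": "ListRecords", **params}
    while True:
        root = ET.fromstring(fetch(f"{BASE_URL}?{urlencode(query)}"))
        error = root.find(f"{OAI}error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # OAI-PMH signals an empty result as an error
            raise RuntimeError(f"{error.get('code')}: {error.text}")
        yield from root.iter(f"{OAI}record")
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # absent or empty token: this was the last batch
        # Follow-up requests carry only the verb and the token.
        query = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

For example, harvest({"metadataPrefix": "oai_dc", "set": "animals"}) would walk through every record in the "animals" set, one batch at a time.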
Interested in retrieving data about a particular IssueLab issue area? We provide "sets", OAI-PMH's way of letting you home in on just the data you want. To use the set parameter, you would use a URL like this one (here we request all research listed under the "animals" issue area):
http://harvest.issuelab.org/provider/oai?verb=ListRecords&metadataPrefix=oai_dc&set=animals
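Each set in an OAI-PMH ListSets response pairs a setSpec, the value you pass as set=, with a human-readable setName. Here is a minimal Python sketch for turning that response into a lookup table. parse_sets is a hypothetical name, and the sketch ignores resumptionToken paging, which a very long set list could in principle require.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def parse_sets(xml_data: Union[str, bytes]) -> Dict[str, str]:
    """Map each setSpec (the value for set=...) to its human-readable setName."""
    root = ET.fromstring(xml_data)
    return {
        entry.findtext(f"{OAI}setSpec"): entry.findtext(f"{OAI}setName")
        for entry in root.iter(f"{OAI}set")
    }
```

Feeding it the body returned by verb=ListSets should give you every set name you can use in the set parameter.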
Here's the complete list of issue areas with their corresponding set names. Use the set name in your HTTP request:
We are using data provider software created and shared by the University of Michigan: issue verb=Identify and you'll see that it's UMich's umich_oai_toolkit returning the results. :) Complete information is available from the University of Michigan.
We are using Public Knowledge Project's OAI Harvester software to enable searching, browsing and display of our metadata repository. Visit http://pkp.sfu.ca/harvester_download for complete information.
Both of these groups make these software packages available at no cost. We are happy to do our part in spreading the word about these terrific open source tools. We wrote our own PHP script to transfer our XML files into umich_oai_toolkit's required MySQL blob format. We're happy to share our open source record loader software.
Don't hesitate to get in touch with us should you require assistance. We're happy to help! Send a note to oai at issuelab.org.
Page last updated: 2008-04-08
http://harvest.issuelab.org
Come 'n git it! IssueLab's research data is ready for harvesting!