We are excited to share the nonprofit-produced research data we archive.
Everything you need to know to get started with our data appears below. However, we hope you will take a little time to understand the ideas behind a couple of protocols and standards that we adhere to in order to make our data available to the widest audience.
We use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to provide our data. Never heard of OAI-PMH? All you ever wanted to know is available on the OAI website. If you don't have a lot of time (and who does?), scan this informative tutorial to get the lay of the land. In a nutshell,
"The essence of the open archives approach is to enable access to Web-accessible material through interoperable repositories for metadata sharing, publishing and archiving. It arose out of the e-print community, where a growing need for a low-barrier interoperability solution to access across fairly heterogeneous repositories lead to the establishment of the Open Archives Initiative (OAI). The OAI develops and promotes a low-barrier interoperability framework and associated standards, originally to enhance access to e-print archives, but now taking into account access to other digital materials. As it says in the OAI mission statement "The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content."
From "OAI for Beginners: Basic OAI concepts and features"
In keeping with OAI-PMH, our data is expressed in XML and uses unqualified Dublin Core. There is a lot more to learn about Dublin Core on the DC website. Here's a sentence to give you the gist of what's going on:
The Dublin Core Metadata Initiative (DCMI) is an organization dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for describing resources that enable more intelligent information discovery systems.
From "Dublin Core Metadata Initiative: About the Initiative"
Our data is delivered in XML -- one XML file per research listing. This chart provides info on all the data points we collect. Note that not every research listing includes all of these data points. At a minimum, the data points that appear in bold type will be included in every XML file.
| Data point | Element | Description |
| Coverage | <dc:coverage> | The geographic areas the research considers (uncontrolled list). |
| Creator | <dc:creator> | The author(s) of the research. |
| Date | <dc:date> | The date the research was published. Format YYYY-MM-DD. |
| Description | <dc:description> | The summary provided for the research work. |
| Format | <dc:format> | When available, the file format of a saved resource. Formats include: pdf; doc; xls; rtf; txt; ppt, etc. Note: If a record includes "<dc:format>scribd;xxxxx;xxxxx</dc:format>", a Scribd.com document is available. More info. |
| Identifier | <dc:identifier> | The listing's unique identifier. |
| Language | <dc:language> | The language in which the research was authored/published. |
| Publisher | <dc:publisher> | The nonprofit organization(s) that made the research in question available to/through the IssueLab archive. |
| Rights | <dc:rights> | The copyright and usage instructions for the research work. |
| Subject | <dc:subject> | The issue areas that the research work falls under (see "Issue Areas and Sets" below). Research works can fall into up to three issue areas. |
| Title | <dc:title> | Title of the research work. |
| Type | <dc:type> | The research can be categorized as these types. Controlled list: CaseStudy; Dataset; Ethnography; Evaluation; FactSheet; InteractiveResource; Literature/Research Review; MovingImage; Policy/Issue Brief; Presentation/Slideshow; Report/Whitepaper; StillImage; Survey; Testimony; Toolkit/Guide. |
Here is an example of the contents of one research metadata record:
<?xml version="1.0" encoding="UTF-8" ?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2008-04-09T18:53:51Z</responseDate>
<request identifier="2003_2004_statewide_survey_of_immigrants_and_refugees" verb="GetRecord" metadataPrefix="oai_dc">http://harvest.issuelab.org/provider/oai</request>
<GetRecord>
<record>
<header>
<identifier>oai:harvest.issuelab.org:2003_2004_statewide_survey_of_immigrants_and_refugees</identifier>
<datestamp>2008-04-09T17:26:47Z</datestamp>
<setSpec>human_rights_and_civil_liberties</setSpec>
<setSpec>education_and_literacy</setSpec>
<setSpec>immigration</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>2003-2004 Statewide Survey of Immigrants and Refugees</dc:title>
<dc:subject>education_and_literacy</dc:subject>
<dc:subject>human_rights_and_civil_liberties</dc:subject>
<dc:subject>immigration</dc:subject>
<dc:creator>Illinois Coalition for Immigrant and Refugee Rights</dc:creator>
<dc:publisher>Illinois Coalition for Immigrant and Refugee Rights</dc:publisher>
<dc:date>2007-08-01</dc:date>
<dc:description>This report weaves together demographic and field research conducted by ICIRR in 2003 to assess the needs of immigrants and refugees throughout Illinois and to recommend the formation of an Illinois immigrant integration policy.</dc:description>
<dc:identifier>http://www.issuelab.org/research/2003_2004_statewide_survey_of_immigrants_and_refugees</dc:identifier>
<dc:type>Text</dc:type>
<dc:language>eng</dc:language>
<dc:format>pdf</dc:format>
<dc:format>scribd;37067679;key-174p87aga4148zpw329a</dc:format>
</oai_dc:dc>
</metadata>
</record>
</GetRecord>
</OAI-PMH>
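As a rough illustration of how a record like the one above might be read, here is a minimal Python sketch using only the standard library. The function name `parse_record`, the layout of the dictionary it returns, and the trimmed `SAMPLE` string are assumptions made for this example; the namespace URIs and field values are copied from the record above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the example record above.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"

# A trimmed copy of the example record so the sketch runs offline.
# The XML declaration and schemaLocation attributes are left out for brevity.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<GetRecord>
<record>
<header>
<identifier>oai:harvest.issuelab.org:2003_2004_statewide_survey_of_immigrants_and_refugees</identifier>
<setSpec>human_rights_and_civil_liberties</setSpec>
<setSpec>education_and_literacy</setSpec>
<setSpec>immigration</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>2003-2004 Statewide Survey of Immigrants and Refugees</dc:title>
<dc:date>2007-08-01</dc:date>
<dc:format>pdf</dc:format>
<dc:format>scribd;37067679;key-174p87aga4148zpw329a</dc:format>
</oai_dc:dc>
</metadata>
</record>
</GetRecord>
</OAI-PMH>
"""


def parse_record(xml_text):
    """Return the OAI identifier, set names, and Dublin Core fields of one record."""
    root = ET.fromstring(xml_text.strip())
    record = root.find(".//oai:record", NS)
    result = {
        "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
        "sets": [e.text for e in record.findall("oai:header/oai:setSpec", NS)],
        "fields": {},
    }
    # Dublin Core elements may repeat (e.g. dc:format), so collect lists.
    for element in record.iterfind("oai:metadata//*", NS):
        if element.tag.startswith(DC_PREFIX):
            name = element.tag[len(DC_PREFIX):]
            result["fields"].setdefault(name, []).append(element.text)
    return result


if __name__ == "__main__":
    record = parse_record(SAMPLE)
    print(record["identifier"])
    print(record["fields"]["title"][0])
```

Because not every listing includes every data point, a caller would likely want to use `record["fields"].get("coverage", [])` rather than index directly.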
As we mentioned above, we use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to let others access our data. The OAI protocol provides six case-sensitive commands, or "verbs", for querying purposes: Identify, ListSets, ListMetadataFormats, ListIdentifiers, ListRecords, and GetRecord.
Here are the most basic HTTP requests you can issue. Note: three of the verbs (ListIdentifiers, ListRecords, and GetRecord) require that you include the "metadataPrefix" parameter as well as the "verb":
http://harvest.issuelab.org/provider/oai?verb=Identify
http://harvest.issuelab.org/provider/oai?verb=ListSets
http://harvest.issuelab.org/provider/oai?verb=ListMetadataFormats
http://harvest.issuelab.org/provider/oai?verb=ListIdentifiers&metadataPrefix=oai_dc
http://harvest.issuelab.org/provider/oai?verb=ListRecords&metadataPrefix=oai_dc
http://harvest.issuelab.org/provider/oai?verb=GetRecord&identifier=<record identifier>&metadataPrefix=oai_dc
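If you would rather build these request URLs in a script than by hand, something like the following Python sketch could work. The helper name `build_request` and its validation rules are assumptions made for this example (they mirror the note above about "metadataPrefix"); only the base URL, verbs, and parameter names come from this page.

```python
from urllib.parse import urlencode

BASE_URL = "http://harvest.issuelab.org/provider/oai"

# The six OAI-PMH verbs. They are case-sensitive.
VERBS = {
    "Identify",
    "ListSets",
    "ListMetadataFormats",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}
# Verbs that need a metadataPrefix in the basic requests shown above.
NEEDS_PREFIX = {"ListIdentifiers", "ListRecords", "GetRecord"}


def build_request(verb, **arguments):
    """Return a request URL for the given verb and extra arguments."""
    if verb not in VERBS:
        raise ValueError("unknown verb (remember, verbs are case-sensitive): " + verb)
    if verb in NEEDS_PREFIX and "metadataPrefix" not in arguments:
        raise ValueError(verb + " requires the metadataPrefix argument")
    if verb == "GetRecord" and "identifier" not in arguments:
        raise ValueError("GetRecord requires the identifier argument")
    query = {"verb": verb}
    query.update(arguments)
    # urlencode percent-encodes reserved characters such as ":" in identifiers.
    return BASE_URL + "?" + urlencode(query)


if __name__ == "__main__":
    print(build_request("Identify"))
    print(build_request("ListRecords", metadataPrefix="oai_dc"))
```

To fetch a response you could pass the resulting URL to `urllib.request.urlopen`; that step is left out so the sketch runs without a network connection.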
You can actually do quite a bit with these verbs when mixed with some other tricks. For example, you can ask for all records saved to the IssueLab repository between 2010-01-06 and 2011-01-01 within our "Health and Medicine" issue area like this (you must use date format: YYYY-MM-DDTHH:MM:SSZ):
http://harvest.issuelab.org/provider/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2010-01-06T16:25:14Z&until=2011-01-01T23:59:59Z&set=health_and_medicine
Want records in a particular issue area stored from a particular date until the present? Just leave off the "until" parameter. Example:
http://harvest.issuelab.org/provider/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2010-01-06T16:25:14Z&set=health_and_medicine
The "from" and "until" date parameters available to you pertain to the date a record became available in our data provider service, not the date that a research listing was originally published. To search for records by date of publication (e.g., all research listings that were originally released on a certain date), use our website's search tool.
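The date-limited requests above can also be assembled in code. This Python sketch formats timestamps in the required YYYY-MM-DDTHH:MM:SSZ form and leaves off "until" when no end date is given. The function names `oai_timestamp` and `list_records_url` are made up for this example, and the choice to reject datetimes without a timezone is an assumption of the sketch, not a rule of the service.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "http://harvest.issuelab.org/provider/oai"


def oai_timestamp(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SSZ (UTC)."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def list_records_url(set_name, start, end=None):
    """Build a ListRecords URL for one set, from `start` until `end` (or the present)."""
    query = {
        "verb": "ListRecords",
        "metadataPrefix": "oai_dc",
        "from": oai_timestamp(start),
    }
    if end is not None:
        query["until"] = oai_timestamp(end)
    query["set"] = set_name
    # safe=":" keeps the colons in the timestamps readable, as in the URLs above.
    return BASE_URL + "?" + urlencode(query, safe=":")


if __name__ == "__main__":
    start = datetime(2010, 1, 6, 16, 25, 14, tzinfo=timezone.utc)
    end = datetime(2011, 1, 1, 23, 59, 59, tzinfo=timezone.utc)
    print(list_records_url("health_and_medicine", start, end))
    print(list_records_url("health_and_medicine", start))
```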
Here are the arguments that you can use in your HTTP request (don't forget -- nothing happens unless you issue one of the six verbs!): verb, metadataPrefix, identifier, from, until, and set, all of which appear in the examples on this page. The OAI-PMH specification also defines resumptionToken, which is used to continue a long list response.
Interested in retrieving data about a particular IssueLab issue area? We provide "sets" -- OAI-PMH's way of letting you home in on just the data you want. To use the set parameter, you would use a URL like this one (we'll request all research listed under the "animals" issue area):
http://harvest.issuelab.org/provider/oai?verb=ListRecords&metadataPrefix=oai_dc&set=animals
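Each issue area has a corresponding set name, and it is the set name that you use in your HTTP request (for example, the "Health and Medicine" issue area is the set health_and_medicine). A ListSets request returns the complete list.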
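The issue areas and their set names can also be read programmatically from a ListSets response. The Python sketch below assumes the standard OAI-PMH response layout (a `set` element holding `setSpec` and `setName`). The `SAMPLE_LIST_SETS` string is a hand-written illustration, not actual output from our server, and the function name `parse_sets` is made up for this example.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written illustration of a ListSets response (not real server output).
SAMPLE_LIST_SETS = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<ListSets>
<set><setSpec>animals</setSpec><setName>Animals</setName></set>
<set><setSpec>health_and_medicine</setSpec><setName>Health and Medicine</setName></set>
</ListSets>
</OAI-PMH>
"""


def parse_sets(xml_text):
    """Map each set name used in requests (setSpec) to its display name (setName)."""
    root = ET.fromstring(xml_text.strip())
    return {
        entry.findtext("oai:setSpec", namespaces=NS): entry.findtext("oai:setName", namespaces=NS)
        for entry in root.iterfind(".//oai:set", NS)
    }


if __name__ == "__main__":
    for spec, name in parse_sets(SAMPLE_LIST_SETS).items():
        print(spec, "->", name)
```

The keys of the returned dictionary are the values you would pass as the "set" parameter.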
If a record includes "<dc:format>scribd;....</dc:format>", a Scribd.com document is available. Everything you need to embed a thumbnail graphic or entire dynamic document, generated at Scribd.com, appears in this line as follows.
Example: <dc:format>scribd;37067679;key-174p87aga4148zpw329a</dc:format>
The data that appears after the first semi-colon (;) is Scribd's document ID number. In the example above, that is 37067679.
The data that appears after the second semi-colon (;) is Scribd's access key. In the example above, that is key-174p87aga4148zpw329a.
You will need both the document ID and the access key to generate/embed resources from Scribd. Complete information about using Scribd's API is available on the Scribd.com website.
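Splitting a Scribd format value into its two parts takes only a few lines. In this Python sketch the names `ScribdRef` and `parse_scribd_format` are made up for the example, and returning `None` for ordinary format values such as "pdf" is a design choice of the sketch. It does not call Scribd's API.

```python
from typing import NamedTuple, Optional


class ScribdRef(NamedTuple):
    document_id: str
    access_key: str


def parse_scribd_format(value: str) -> Optional[ScribdRef]:
    """Split a 'scribd;<document ID>;<access key>' value; return None for other formats."""
    parts = value.strip().split(";")
    if len(parts) != 3 or parts[0] != "scribd":
        return None
    return ScribdRef(document_id=parts[1], access_key=parts[2])


if __name__ == "__main__":
    ref = parse_scribd_format("scribd;37067679;key-174p87aga4148zpw329a")
    print(ref.document_id, ref.access_key)
```

Since a record can carry several `<dc:format>` values, you would typically run this over each one and keep the first result that is not `None`.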
We are using data provider software created and shared by the University of Michigan. Enter verb=Identify and it's UMich's umich_oai_toolkit that returns the results. :) Complete information is here.
This software package is available at no cost. We are happy to do our part in spreading the word about these terrific open source tools. We wrote our own PHP script to transfer our XML files into umich_oai_toolkit's required MySQL blob format. We're happy to share our open source record loader software.
Don't hesitate to get in touch with us should you require assistance. We're happy to help! Send a note to oai at issuelab.org.
Page last updated: 2011-02-08