These examples use the search API to search the PLOS corpus of scientific articles. They are not intended to be a full explanation of how to use Solr. A full explanation of the Solr query language can be found here and a tutorial here. PLOS search queries deviate from the standard Solr query URL by using ‘search’ instead of ‘select’ when making a request to the endpoint.
Parameters passed to the endpoint are identical to those described in the Solr documentation. All Solr searches need the parameter ‘q’ (for query). The ‘q’ parameter defines the field(s) that will be searched and by what criteria. For example, if we wanted information on all articles with ‘DNA’ in the title, we would construct a URL whose query is ‘q=title:DNA’.
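A minimal sketch of building such a URL in Python follows. The base address `http://api.plos.org/search` is an assumption (it is not stated in the text above), and `build_search_url` is a hypothetical helper name used only for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm the current address before relying on it.
BASE_URL = "http://api.plos.org/search"


def build_search_url(query, **params):
    """Return a search URL with 'q' set to `query` plus any extra Solr parameters."""
    # safe=":" keeps the field separator readable (title:DNA rather than title%3ADNA).
    return BASE_URL + "?" + urlencode({"q": query, **params}, safe=":")


url = build_search_url("title:DNA")
print(url)
```

Extra Solr parameters can be passed as keyword arguments, for example `build_search_url("title:DNA", rows=5)`.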
With knowledge of the available fields, complex queries can be constructed. Suppose we want the article DOIs (id) and abstracts of all articles with ‘Drosophila’ in the title and ‘RNA’ in the body. The query is ‘q=title:"Drosophila" AND body:"RNA"’ combined with ‘fl=id,abstract’.
The example above introduces several new concepts. Let’s analyze each in more detail. The parameter ‘q=title:"Drosophila" AND body:"RNA"’ specifies a search for documents with ‘Drosophila’ in the title AND ‘RNA’ in the body of the article. Multiple fields can be combined using AND, OR, NOT and wildcard characters, so queries can be quite complex. If you open such a URL in a browser, you will notice that the browser encodes the special characters in the query as follows:
q=title:%22Drosophila%22%20AND%20body:%22RNA%22
where ‘%22’ is a double quote and ‘%20’ is a space. If you are constructing these URLs in your favorite language, most programming libraries provide methods to properly encode arbitrary strings for URLs.
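As one illustration of such a library method, Python's standard `urllib.parse.quote` can percent-encode the query string. This is only a sketch; the choice to leave `:` unencoded is a readability preference here, not a requirement stated by the API.

```python
from urllib.parse import quote, unquote

query = 'title:"Drosophila" AND body:"RNA"'

# Percent-encode everything except unreserved characters and ':'.
encoded = quote(query, safe=":")
print(encoded)  # title:%22Drosophila%22%20AND%20body:%22RNA%22

# Decoding restores the original query text.
decoded = unquote(encoded)
print(decoded == query)  # True
```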
Lastly, we added the ‘fl’ parameter with a comma-separated list of stored fields. The ‘fl’ parameter specifies which stored fields to return in the query response. The example above returns an XML response containing the ‘id’ (DOI) and ‘abstract’ of numerous articles. Stored fields have the attribute ‘stored="true"’ in the Solr schema.xml.
Solr responses are XML by default. This can be changed to JSON by using the ‘wt=json’ parameter in the URL.
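The sketch below combines ‘q’, ‘fl’ and ‘wt=json’ in one URL and shows how a JSON response might be read. It assumes the endpoint `http://api.plos.org/search` and the usual Solr JSON layout, in which matches sit under `response` → `docs`. The sample payload is made up for illustration (placeholder DOIs and abstracts), and the exact shape of each field in a real response may differ.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "http://api.plos.org/search"  # assumed endpoint

params = {
    "q": 'title:"Drosophila" AND body:"RNA"',
    "fl": "id,abstract",  # stored fields to return
    "wt": "json",         # ask for JSON instead of the default XML
}
# quote_via=quote encodes spaces as %20; ':' and ',' are left readable.
url = BASE_URL + "?" + urlencode(params, quote_via=quote, safe=":,")
print(url)

# Made-up payload in the usual Solr JSON layout; not real PLOS data.
sample_response = """
{"response": {"numFound": 2, "start": 0, "docs": [
  {"id": "10.1371/journal.example.0000001", "abstract": "First placeholder abstract."},
  {"id": "10.1371/journal.example.0000002", "abstract": "Second placeholder abstract."}
]}}
"""


def parse_docs(text):
    """Return the list of matching documents from a Solr JSON response."""
    return json.loads(text)["response"]["docs"]


docs = parse_docs(sample_response)
ids = [doc["id"] for doc in docs]
print(ids)
```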
Setting Limits and Paging Search Results
By default, the PLOS Search API returns no more than 10 document matches, and most queries will have far more than that. You will want to control the number of results returned, either to process all the matches for your query or to process them in batches so that your script is not overwhelmed with too much data. This can be accomplished with the ‘start’ and ‘rows’ parameters, where ‘start’ specifies the starting row (a zero-based offset) and ‘rows’ specifies the maximum number of rows to return. To get the first 100 matches with ‘Drosophila’ in the title and ‘RNA’ in the body, returning the DOI only, add ‘fl=id&start=0&rows=100’ to the query.
To get the next 100 matches, set ‘start=100’ and keep ‘rows=100’.
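Paging with ‘start’ and ‘rows’ can be sketched as a small generator that yields one URL per batch. The endpoint `http://api.plos.org/search` and the helper name `page_urls` are assumptions for illustration. In practice the total number of matches would come from the response itself (`numFound` in Solr's JSON output) rather than being hard-coded.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://api.plos.org/search"  # assumed endpoint


def page_urls(query, fields, rows, total):
    """Yield one search URL per batch of `rows` results, covering `total` matches."""
    # 'start' is a zero-based offset, so batches begin at 0, rows, 2*rows, ...
    for start in range(0, total, rows):
        params = {"q": query, "fl": fields, "start": start, "rows": rows}
        yield BASE_URL + "?" + urlencode(params, quote_via=quote, safe=":,")


urls = list(page_urls('title:"Drosophila" AND body:"RNA"', "id", rows=100, total=250))
for u in urls:
    print(u)
```

With 250 matches and 100 rows per request, this produces three URLs with ‘start’ set to 0, 100 and 200.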
If you have any questions, please join the PLOS API developers group and post your questions to the group.