Wikipedia & Wikidata API & Data: A Cheat Sheet
Did you know there are four official APIs for fetching Wikipedia content?
Terms
- A wiki is a type of website/service that lets users add and edit content collaboratively, typically with features like internal links and revision history.
- MediaWiki is software for running wiki sites. It’s open-source and self-hostable, so anyone can run their own instance (think WordPress).
- English Wikipedia is one very large and popular MediaWiki instance.
- Wikipedia is a collection of wikis in many languages (and variants), each a MediaWiki instance.
- Wikimedia is a movement; it’s also the foundation entity that runs Wikipedia and other projects.
- Wikibase is a MediaWiki extension that enables a wiki to hold structured, linked open data.
- Wikidata is a knowledge database. Wikidata is an instance of MediaWiki with the Wikibase extension; it’s also one of the Wikimedia projects. (This sentence is living proof of the confusion between these terms).
There Are 4 APIs for Fetching Wikipedia Content
| | MediaWiki Action API | MediaWiki REST API | Wikimedia REST API | Wikimedia Enterprise |
|---|---|---|---|---|
| available on | all MediaWiki instances | most active MediaWiki instances (>= v1.35, released July 2020) | only Wikimedia projects | major Wikimedia projects |
| available for | public usage (page content, searching, etc.); authenticated usage (accounts, email sending, content editing, etc.) | public usage (page, media files, history, transformation, etc.); authenticated usage (content editing) | public usage (page content, search, transformation); special offer: en.wiktionary.org offers an API for structured definition data | snapshot (dumps), on-demand (structured documents), realtime changes |
| style | action-based | RESTful | RESTful | RESTful |
| request format | form data-based | json | json | json |
| response format | json, xml (deprecated), php (deprecated) | json, html | json, html, pdf | json |
| documentation | API:Main_page | API:REST_API | Wikimedia_REST_API | API Documentation |
| references | - | API:REST_API/Reference | - | - |
| spec | - | - | OpenAPI 3 spec | - |
| API Explorer | API Sandbox | - | API Explorer (Swagger) | - |
| endpoint* | [project url]/[script path]/api.php; sample: https://en.wikipedia.org/w/api.php | [project url]/[script path]/rest.php/v[version number]; sample: https://en.wikipedia.org/w/rest.php/v1 | [project url]/api/rest_v1; sample: https://en.wikipedia.org/api/rest_v1/ | https://api.enterprise.wikimedia.com/ |
| auth | login token-based | OAuth token-based | token-based | - |
| clients/SDKs | no official clients (there is an official list of available clients) | no official clients | no official clients | official SDKs for Go and Python |
[*] The ScriptPath is a config value of MediaWiki. It’s probably /w or blank for most instances. See Manual:$wgScriptPath.
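As a quick sketch of how the three public APIs differ in practice, here is one way to build requests against the en.wikipedia.org sample endpoints from the table above, using only Python’s standard library (function names are my own; the routes are the documented ones):

```python
# Sketch: building request URLs for the three public Wikipedia APIs,
# using the en.wikipedia.org sample endpoints from the table above.
from urllib.parse import urlencode

ACTION_API = "https://en.wikipedia.org/w/api.php"
MEDIAWIKI_REST = "https://en.wikipedia.org/w/rest.php/v1"
WIKIMEDIA_REST = "https://en.wikipedia.org/api/rest_v1"

def action_search_url(term: str) -> str:
    """MediaWiki Action API: full-text search via action=query&list=search."""
    params = {"action": "query", "list": "search", "srsearch": term, "format": "json"}
    return f"{ACTION_API}?{urlencode(params)}"

def rest_search_url(term: str, limit: int = 5) -> str:
    """MediaWiki REST API: GET /search/page."""
    return f"{MEDIAWIKI_REST}/search/page?{urlencode({'q': term, 'limit': limit})}"

def summary_url(title: str) -> str:
    """Wikimedia REST API: GET /page/summary/{title}."""
    return f"{WIKIMEDIA_REST}/page/summary/{title.replace(' ', '_')}"

if __name__ == "__main__":
    # Actually fetching requires network access, e.g.:
    #   import json, urllib.request
    #   data = json.load(urllib.request.urlopen(summary_url("Albert Einstein")))
    print(summary_url("Albert Einstein"))
```

Note how the Action API packs everything into query parameters of a single api.php script, while the two REST APIs put the resource in the path.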
APIs to Enhance Wikipedia Experience
These APIs are not for fetching data from Wikipedia but for developing MediaWiki itself and extensions that enhance the experience of reading and editing Wikipedia (think WordPress plugins).
Wikidata is a treasure
SPARQL Query Services
Wikidata is quite different from Wikipedia as it is a structured knowledge base. The data (statements) is organized into items (nodes), properties (edges) and values. For example:
| item | property | value |
|---|---|---|
| Albert Einstein (Q937) | educated at (P69) | St John’s College (Q206702) |
| Bulbasaur (Q847571) | instance of (P31) | grass-type Pokémon (Q25930653) |
In other words, it’s a giant graph database. With the Wikidata Query Service, you can use SPARQL to run complex searches. For example, this query asks for all grass-type Pokémon. It’s also quite powerful because you can query data from other wikis and use external identifiers like IMDb IDs.
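A minimal sketch of such a query, built from the IDs in the table above (P31 = instance of, Q25930653 = grass-type Pokémon) and sent to the public SPARQL endpoint; only the Python standard library is used, and the helper name is my own:

```python
# Sketch: a SPARQL query for all grass-type Pokémon, using the item and
# property IDs from the table above, against query.wikidata.org.
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

GRASS_POKEMON_QUERY = """
SELECT ?pokemon ?pokemonLabel WHERE {
  ?pokemon wdt:P31 wd:Q25930653 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

def sparql_request(query: str) -> urllib.request.Request:
    """Build a GET request asking the endpoint for JSON results."""
    url = f"{SPARQL_ENDPOINT}?{urllib.parse.urlencode({'query': query, 'format': 'json'})}"
    # The service asks clients to send a descriptive User-Agent.
    return urllib.request.Request(url, headers={"User-Agent": "example-script/0.1"})

if __name__ == "__main__":
    # Running the query needs network access:
    #   import json
    #   with urllib.request.urlopen(sparql_request(GRASS_POKEMON_QUERY)) as resp:
    #       results = json.load(resp)["results"]["bindings"]
    print(sparql_request(GRASS_POKEMON_QUERY).full_url)
```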
Useful links:
- Portal
- SPARQL query examples
- Human query interface
- API endpoint: https://query.wikidata.org/sparql
- RDF dump format
- Docs for Blazegraph (the query engine behind the Wikidata SPARQL query service)
- A list of tools to work with Wikidata (applies to both SPARQL and RESTful API)
Wikibase RESTful API
Wikibase (and by extension, Wikidata) also provides a RESTful API. It’s not as powerful as SPARQL, but it’s simpler and more familiar to most developers.
Useful links for the Wikibase/Wikidata RESTful API:
- Portal
- API explorer
- endpoint: https://www.wikidata.org/w/rest.php/wikibase/v0
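To show the shape of this API, here is a sketch that builds URLs for two of its item routes against the v0 endpoint listed above, reusing Q937 (Albert Einstein) from the earlier table; the helper names are my own:

```python
# Sketch: URLs for the Wikibase REST API (v0) on Wikidata.
WIKIBASE_REST = "https://www.wikidata.org/w/rest.php/wikibase/v0"

def item_url(item_id: str) -> str:
    """GET /entities/items/{item_id} returns the full item as JSON."""
    return f"{WIKIBASE_REST}/entities/items/{item_id}"

def label_url(item_id: str, lang: str) -> str:
    """GET /entities/items/{item_id}/labels/{language_code} returns one label."""
    return f"{WIKIBASE_REST}/entities/items/{item_id}/labels/{lang}"

if __name__ == "__main__":
    # With network access:
    #   import json, urllib.request
    #   item = json.load(urllib.request.urlopen(item_url("Q937")))
    print(item_url("Q937"))
    print(label_url("Q937", "en"))
```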
Page View Statistics Data
Wikimedia offers a Pageviews Analysis tool for examining traffic data on Wikimedia wikis. This tool lets users query and compare traffic statistics by page, site, or language, with data aggregated by day, month, or year.
Useful links:
- Wikipedia:Statistics
- Wikipedia:Pageview_statistics#Pageviews_analysis
- Docs for Pageviews Analysis
- Pageviews Analysis
- Source code for pageviews
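The raw counts behind the Pageviews Analysis tool are served by the Wikimedia REST API’s pageviews metrics endpoints. As a sketch (treat the exact route shape as an assumption to verify against the docs linked above):

```python
# Sketch: the per-article pageviews route of the Wikimedia REST API
# metrics endpoints (route shape per the Wikimedia REST API docs).
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"

def per_article_url(project: str, article: str, start: str, end: str,
                    access: str = "all-access", agent: str = "all-agents",
                    granularity: str = "daily") -> str:
    """View counts for one article; start and end dates are YYYYMMDD."""
    title = article.replace(" ", "_")
    return f"{BASE}/per-article/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"

# Example: English Wikipedia views of "Albert Einstein" in January 2024.
url = per_article_url("en.wikipedia", "Albert Einstein", "20240101", "20240131")
```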
Exporting, Dumps, and Datasets
MediaWiki natively supports importing and exporting data. There are also plenty of tools and projects for that purpose.
Useful links:
- Special:Export is available on all MediaWiki instances for exporting pages
- Wikipedia:Database_download is the portal page for related info.
- dumps.wikimedia.org is the index page for official dumps.
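Special:Export can also be hit directly by URL. A sketch, assuming the default /wiki/ article path used by Wikipedia (other MediaWiki instances may differ; see $wgArticlePath):

```python
# Sketch: Special:Export serves a page's current revision as XML.
# Assumes the "/wiki/" article path; other instances may configure it differently.
def export_url(project: str, title: str) -> str:
    return f"https://{project}/wiki/Special:Export/{title.replace(' ', '_')}"

url = export_url("en.wikipedia.org", "Albert Einstein")
# → https://en.wikipedia.org/wiki/Special:Export/Albert_Einstein
```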
There are also unofficial datasets available. DBpedia is possibly the most popular one. Besides data dumps, DBpedia also provides many resources, including its own SPARQL query service.