Did you know there are officially four APIs for fetching Wikipedia content?

Terms

  • Wiki is a type of website/service that lets users add and edit content collaboratively. Wikis typically offer features like internal linking and revision history.
  • MediaWiki is software for running wiki sites. It’s open source and self-hostable, so anyone can run their own instance (think WordPress).
  • English Wikipedia is one very large and popular MediaWiki instance.
  • Wikipedia is a collection of wikis in many languages (and variants), each a MediaWiki instance.
  • Wikimedia is a movement; it’s also the foundation entity that runs Wikipedia and other projects.
  • Wikibase is a MediaWiki extension that enables a wiki to hold structured, linked open data.
  • Wikidata is a knowledge database. Wikidata is an instance of MediaWiki with the Wikibase extension; it’s also one of the Wikimedia projects. (This sentence is living proof of the confusion between these terms).

There Are 4 APIs for Fetching Wikipedia Content

MediaWiki Action API

  • available on: all MediaWiki instances
  • usage: public (page content, searching, etc.); authenticated (accounts, email sending, content editing, etc.)
  • style: action-based
  • request format: form data
  • response format: json; xml and php (both deprecated)
  • documentation: API:Main_page
  • API explorer: API Sandbox
  • endpoint*: [project url]/[script path]/api.php (sample: https://en.wikipedia.org/w/api.php)
  • auth: login token-based
  • clients/SDKs: no official clients, but there is an official list of available clients

MediaWiki REST API

  • available on: most active MediaWiki instances (>= v1.35, released in July 2020)
  • usage: public (page, media files, history, transformation, etc.); authenticated (content editing)
  • style: RESTful
  • request format: json
  • response format: json, html
  • documentation: API:REST_API (reference: API:REST_API/Reference)
  • endpoint: [project url]/[script path]/rest.php/v[version number] (sample: https://en.wikipedia.org/w/rest.php/v1)
  • auth: OAuth token-based
  • clients/SDKs: no official clients

Wikimedia REST API

  • available on: Wikimedia projects only
  • usage: public (page content, search, transformation, etc.); as a special offer, en.wiktionary.org provides an API for structured definition data
  • style: RESTful
  • request format: json
  • response format: json, html, pdf
  • documentation: Wikimedia_REST_API
  • spec: OpenAPI 3 spec
  • API explorer: API Explorer (Swagger)
  • endpoint: [project url]/api/rest_v1 (sample: https://en.wikipedia.org/api/rest_v1/)
  • clients/SDKs: no official clients

Wikimedia Enterprise

  • available on: major Wikimedia projects
  • usage: snapshot (dumps), on-demand (structured documents), realtime changes
  • style: RESTful
  • request format: json
  • response format: json
  • documentation: API Documentation
  • endpoint: https://api.enterprise.wikimedia.com/
  • auth: token-based
  • clients/SDKs: official SDKs for Go and Python

[*] The script path is a MediaWiki config value; it’s probably /w or blank for most instances. See Manual:$wgScriptPath.
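As a sketch of how the endpoint styles differ, here is how a request for the same article could be built against the three public APIs. The base URLs follow the samples above; the specific query parameters and paths are illustrative, so check each API's reference before relying on them.

```python
from urllib.parse import urlencode, quote

TITLE = "Albert Einstein"

# MediaWiki Action API: a single endpoint, with behavior selected by
# the `action` parameter (here: action=query with a text extract).
action_url = "https://en.wikipedia.org/w/api.php?" + urlencode({
    "action": "query",
    "prop": "extracts",
    "titles": TITLE,
    "format": "json",
})

# MediaWiki REST API: the resource is addressed by the URL path.
mw_rest_url = f"https://en.wikipedia.org/w/rest.php/v1/page/{quote(TITLE.replace(' ', '_'))}"

# Wikimedia REST API: also path-addressed, under /api/rest_v1.
wm_rest_url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{quote(TITLE.replace(' ', '_'))}"

print(action_url)
print(mw_rest_url)
print(wm_rest_url)

# Actually fetching requires network access, e.g. with the stdlib:
# import json, urllib.request
# with urllib.request.urlopen(wm_rest_url) as resp:
#     summary = json.load(resp)
```

The structural difference is visible in the URLs themselves: the Action API encodes everything in query parameters, while both REST APIs put the resource in the path.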

APIs to Enhance Wikipedia Experience

These APIs are not for fetching data from Wikipedia, but for developing MediaWiki itself and extensions that enhance the experience of reading and editing Wikipedia (think WordPress plugins).

Wikidata is a treasure

SPARQL Query Services

Wikidata is quite different from Wikipedia as it is a structured knowledge base. The data (statements) is organized into items (nodes), properties (edges) and values. For example:

  item                    | property          | value
  Albert Einstein (Q937)  | educated at (P69) | St John’s College (Q206702)
  Bulbasaur (Q847571)     | instance of (P31) | grass-type Pokémon (Q25930653)

In other words, it’s a giant graph database. With the Wikidata Query Service, you can use SPARQL to run complex searches. For example, a query can ask for all grass-type Pokémon. It’s also quite powerful because you can query data from other wikis and use external identifiers like IMDb IDs.
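Such a query might look like the following sketch, reusing the property and item IDs from the table above. The query service accepts the SPARQL text as a GET parameter:

```python
from urllib.parse import urlencode

# P31 = instance of, Q25930653 = grass-type Pokémon (IDs as in the table above).
query = """
SELECT ?pokemon ?pokemonLabel WHERE {
  ?pokemon wdt:P31 wd:Q25930653 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

# The Wikidata Query Service endpoint; format=json asks for SPARQL JSON results.
url = "https://query.wikidata.org/sparql?" + urlencode({"query": query, "format": "json"})
print(url)

# With network access you could then fetch the results:
# import json, urllib.request
# req = urllib.request.Request(url, headers={"User-Agent": "example-script/0.1"})
# with urllib.request.urlopen(req) as resp:
#     rows = json.load(resp)["results"]["bindings"]
```

The `wikibase:label` service line is a Wikidata convenience that attaches human-readable English labels to the result variables.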

Useful links:

Wikibase RESTful API

Wikibase (and by extension, Wikidata) also provides a RESTful API. Not as powerful as SPARQL, but it’s simpler and more familiar to most developers.
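As a sketch, assuming Wikidata's `/w/rest.php/wikibase/v1` base path (the version prefix may change, so check the current documentation), fetching an item or one property's statements is a plain GET:

```python
from urllib.parse import quote

# Base path for Wikidata's Wikibase REST API; the version prefix is an
# assumption here, verify it against the current docs.
BASE = "https://www.wikidata.org/w/rest.php/wikibase/v1"

def item_url(item_id: str) -> str:
    """URL for fetching a whole item, e.g. Q937 (Albert Einstein)."""
    return f"{BASE}/entities/items/{quote(item_id)}"

def statements_url(item_id: str, property_id: str) -> str:
    """URL for one property's statements, e.g. P69 (educated at)."""
    return f"{BASE}/entities/items/{quote(item_id)}/statements?property={quote(property_id)}"

print(item_url("Q937"))
print(statements_url("Q937", "P69"))
```

Compared to SPARQL, this only addresses one entity at a time, but a GET on a stable URL is all most applications need.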

Useful links for the Wikibase/Wikidata RESTful API:

Page View Statistics Data

Wikimedia offers a Pageviews Analysis tool for examining traffic data on Wikimedia wikis. It allows users to query and compare traffic statistics by page, site, and language, with data aggregated by day, month, or year.
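The same per-article counts are also exposed programmatically through the public Wikimedia metrics REST API. A minimal URL builder, with access and agent fixed to `all-access`/`all-agents` for brevity and dates in `YYYYMMDD` form:

```python
from urllib.parse import quote

def pageviews_url(project: str, article: str, start: str, end: str,
                  granularity: str = "daily") -> str:
    """Per-article pageviews from the Wikimedia metrics API.

    `project` is e.g. "en.wikipedia"; `start`/`end` are YYYYMMDD dates.
    The access/agent path segments are fixed here for brevity.
    """
    return (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"{project}/all-access/all-agents/{quote(article, safe='')}/"
        f"{granularity}/{start}/{end}"
    )

print(pageviews_url("en.wikipedia", "Albert_Einstein", "20240101", "20240131"))
```

The response (fetched over HTTP) is JSON with one entry per day in the requested range.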

Useful links:

Exporting, Dumps, and Datasets

MediaWiki natively supports importing and exporting data. There are also plenty of tools and projects for that purpose.
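For single pages, the built-in Special:Export page returns wikitext wrapped in MediaWiki's XML export format, so exporting can be as simple as constructing a URL (the en.wikipedia.org base is just an example, any MediaWiki instance has the same special page):

```python
from urllib.parse import quote

def export_url(title: str, project: str = "https://en.wikipedia.org") -> str:
    """Special:Export URL for one page; the response is MediaWiki export XML."""
    return f"{project}/wiki/Special:Export/{quote(title.replace(' ', '_'))}"

print(export_url("Albert Einstein"))

# Full-site dumps, by contrast, live at https://dumps.wikimedia.org/ and are
# downloaded as files (e.g. compressed pages-articles XML) rather than via an API.
```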

Useful links:

There are also unofficial datasets available. DBpedia is possibly the most popular one. Besides data dumps, DBpedia provides many resources, including its own SPARQL query service.
