MediaWiki Tools

A high-level library of tools for filtering pages using the rich data available on MediaWiki sites, such as categories and infoboxes. It uses both web scraping and API methods (where available and feasible) to gather information.

Goals

  • Generate useful data (and datasets) from a wiki.
  • Work on any MediaWiki site (including fandom.com), with or without an API.
  • Get arbitrary subsets of pages based on categories and template parameters (todo).
  • Be very robust to variations and inconsistencies in user input.
  • Be efficient.
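
To make the "with an API" route concrete, here is a minimal sketch of how a category's members can be requested through the standard MediaWiki Action API (`action=query` with `list=categorymembers`). It is not taken from this library's source: the helper names `category_members_params` and `category_members_url` are hypothetical, and the `/w/api.php` path is an assumption that holds for Wikipedia but differs on other hosts (fandom.com wikis, for example, typically serve it at `/api.php`).

```python
from urllib.parse import urlencode


def category_members_params(category, limit=500):
    """Build Action API query parameters for listing a category's members.

    Hypothetical helper for illustration; not part of mediawiki-tools.
    """
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",  # the API expects the namespace prefix
        "cmlimit": limit,                   # 500 is the usual cap for normal users
        "format": "json",
    }


def category_members_url(host, category, api_path="/w/api.php"):
    """Return the full request URL (api_path is an assumption, see above)."""
    query = urlencode(category_members_params(category))
    return f"https://{host}{api_path}?{query}"


url = category_members_url("en.wikipedia.org", "Countries in Asia")
print(url)
```

Large categories are paginated, so a real client would also have to follow the `cmcontinue` token returned in each response until it is absent.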

Installation

Install it using pip.

pip install mediawiki-tools

Requires Python >3.8 because I like the walrus operator.

Usage

Check out the basic usage guide and detailed API documentation.

Example

Question: Which countries in Asia use English as a spoken language?

Answer:

from mwtools import MediaWikiTools

wiki = MediaWikiTools('en.wikipedia.org')

wiki.get_set(['Countries in Asia', 
              'English-speaking countries and territories'], 
             'and')
# ['Philippines', 'Pakistan', 'Bahrain', 'Singapore', 'Brunei', 'India']

Question: Which countries in Asia or Europe use English as a spoken language?

Answer:

wiki.get_set(['Countries in Asia', 'Countries in Europe',
              'English-speaking countries and territories'], 
             ['or','and'])
# ['Philippines',
#  'United Kingdom',
#  'Brunei',
#  'Malta',
#  'India',
#  'Pakistan',
#  'Scotland',
#  'Republic of Ireland',
#  'Singapore',
#  'Bahrain']

Question: Which of these countries are not island nations?

Answer:

wiki.get_set(['Countries in Asia', 'Countries in Europe',
              'English-speaking countries and territories',
              'Island countries'], 
             ['or', 'and', 'not'])
# ['Pakistan', 'India']
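
The results above are consistent with the operators being applied left to right, i.e. `((Asia or Europe) and English-speaking) not Island`. The following is a minimal sketch of that folding logic using plain Python sets. It is an assumption inferred from the examples, not the library's actual implementation: `combine` is a hypothetical helper, the treatment of a single operator string as "use this operator between every pair" is a guess, and the sample sets are made-up toy data rather than real category contents.

```python
def combine(sets, operators):
    """Fold sets left to right using 'and', 'or' and 'not' (set difference).

    Hypothetical helper for illustration; not part of mediawiki-tools.
    """
    if isinstance(operators, str):
        # Assumption: a single operator is reused between every pair of sets.
        operators = [operators] * (len(sets) - 1)
    if len(operators) != len(sets) - 1:
        raise ValueError("need exactly one operator between each pair of sets")

    result = set(sets[0])
    for op, members in zip(operators, sets[1:]):
        if op == "and":
            result &= set(members)   # intersection
        elif op == "or":
            result |= set(members)   # union
        elif op == "not":
            result -= set(members)   # difference
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return result


# Toy data, deliberately tiny and NOT the real category contents.
asia = {"India", "Pakistan", "Singapore", "Japan"}
europe = {"Malta", "France"}
english = {"India", "Pakistan", "Singapore", "Malta"}
islands = {"Singapore", "Malta", "Japan"}

print(sorted(combine([asia, english], "and")))
# ['India', 'Pakistan', 'Singapore']
print(sorted(combine([asia, europe, english, islands], ["or", "and", "not"])))
# ['India', 'Pakistan']
```

Because the fold is strictly left to right, the order of the categories matters: placing `'not'` earlier in the list would remove pages before later `'or'` steps add more.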