Setup Wikidata

Entity Linking

Wikidata entities must first be identified in the text before related subgraphs can be retrieved. Any entity linking algorithm that targets Wikidata can be used for this step.

Among them, REL (Radboud Entity Linker) can be easily set up locally for offline inference while achieving near state-of-the-art performance. We show how to use REL for entity linking on Wikidata.

Setup Entity Linker REL

  1. Install REL using pip

    pip install radboud-el

  2. Download necessary files (2019 dump)

    # Place them under resources/rel
    mkdir -p resources/rel && cd resources/rel
    # Download generic files
    wget http://gem.cs.ru.nl/generic.tar.gz
    # Download Wikipedia corpus (2019)
    wget http://gem.cs.ru.nl/wiki_2019.tar.gz
    # Download entity disambiguation model (2019)
    wget http://gem.cs.ru.nl/ed-wiki-2019.tar.gz
    
    # Unzip files
    tar -zxvf generic.tar.gz && rm generic.tar.gz
    tar -zxvf wiki_2019.tar.gz && rm wiki_2019.tar.gz
    tar -zxvf ed-wiki-2019.tar.gz && rm ed-wiki-2019.tar.gz
    

    The unzipped folder structure should look like this. If not, please adjust accordingly.

    resources/rel
    ├── generic
    └── wiki_2019
        ├── basic_data
        │   └── anchor_files
        └── generated
    

    Please refer to REL’s documentation for further details.
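The layout check above can also be done programmatically. The following is a minimal sketch, not part of REL: the helper name `missing_rel_dirs` and the constant `EXPECTED_DIRS` are hypothetical, and the list of directories is taken only from the tree shown above.

```python
from pathlib import Path

# Sub-directories from the tree above, relative to resources/rel.
# (Illustrative list; extend it if your setup needs more files.)
EXPECTED_DIRS = (
    "generic",
    "wiki_2019/basic_data/anchor_files",
    "wiki_2019/generated",
)


def missing_rel_dirs(base_dir="resources/rel"):
    """Return the expected sub-directories that do not exist under base_dir."""
    base = Path(base_dir)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
```

Calling `missing_rel_dirs()` from the project root returns an empty list when every directory in the tree is present, and otherwise the relative paths that still need to be created or moved.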

Setup Wikimapper

REL links text spans to Wikipedia article titles. We then need Wikimapper to further map them to Wikidata IDs.
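For context, the linking step that produces these Wikipedia titles might look roughly like the sketch below. It follows the end-to-end example in REL's documentation as we understand it, so treat the details as assumptions: the helper names `build_rel_input` and `link_entities` are hypothetical, `"ner-fast"` is one of the Flair taggers REL can load, and the `model_path` value may need to be replaced with the local path of the unpacked disambiguation model, depending on your REL version.

```python
def build_rel_input(docs):
    """Convert {doc_id: text} into REL's input format {doc_id: [text, spans]}.

    An empty span list asks REL to detect the mentions itself.
    """
    return {doc_id: [text, []] for doc_id, text in docs.items()}


def link_entities(docs, base_url="resources/rel", wiki_version="wiki_2019",
                  model_path="ed-wiki-2019"):
    """Run mention detection and entity disambiguation with REL.

    model_path is an assumption; point it at your local model if needed.
    """
    # Imported lazily so build_rel_input works without REL installed.
    from REL.entity_disambiguation import EntityDisambiguation
    from REL.mention_detection import MentionDetection
    from REL.ner import load_flair_ner
    from REL.utils import process_results

    rel_input = build_rel_input(docs)

    # Step 1: find candidate mentions with a Flair NER tagger.
    mention_detection = MentionDetection(base_url, wiki_version)
    tagger = load_flair_ner("ner-fast")
    mentions, _ = mention_detection.find_mentions(rel_input, tagger)

    # Step 2: disambiguate each mention to a Wikipedia title.
    config = {"mode": "eval", "model_path": model_path}
    model = EntityDisambiguation(base_url, wiki_version, config)
    predictions, _ = model.predict(mentions)

    return process_results(mentions, predictions, rel_input)
```

A call such as `link_entities({"doc1": "Obama will visit Germany."})` should return, per document, a list of linked mentions that include the predicted Wikipedia title. Those titles are what Wikimapper converts to Wikidata IDs.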

  1. Install Wikimapper using pip

    pip install wikimapper
    
  2. Prepare Wikimapper database

    • You can create your own database index; see "Create your own index" in Wikimapper's documentation.

    • You can download the precomputed index from Wikimapper’s author (2019 dump)

      mkdir -p resources/wikimapper && cd resources/wikimapper
      wget https://public.ukp.informatik.tu-darmstadt.de/wikimapper/index_enwiki-20190420.db
      
    • Alternatively, you can download the indices we computed ourselves. They are newer (February 2023) and come in cased and uncased variants.

      mkdir -p resources/wikimapper && cd resources/wikimapper
      # The files are hosted on Google Drive. gdown is a convenient Google Drive download helper
      pip install gdown
      # index_enwiki-latest-cased.db
      gdown 1yMdzP4inW9CW5YbRZYVvsZYANHAERipL
      # index_enwiki-latest-uncased.db
      gdown 1hbfaaotNrWP3ecqk8B1Wnhf1ARZRakb9
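Once an index is in place, mapping titles to Wikidata IDs can be sketched as follows. This is an illustration under stated assumptions: `titles_to_wikidata_ids` and `load_mapper` are hypothetical helper names, the default path assumes the cased index downloaded above, and the only Wikimapper call used is `WikiMapper.title_to_id`, which returns `None` when no mapping is found.

```python
def titles_to_wikidata_ids(titles, mapper):
    """Map Wikipedia titles to Wikidata IDs, skipping titles with no match.

    `mapper` is any object with a title_to_id(title) method,
    for example a wikimapper.WikiMapper instance.
    """
    ids = {}
    for title in titles:
        # Wikimapper expects underscores instead of spaces in titles.
        wikidata_id = mapper.title_to_id(title.replace(" ", "_"))
        if wikidata_id is not None:
            ids[title] = wikidata_id
    return ids


def load_mapper(db_path="resources/wikimapper/index_enwiki-latest-cased.db"):
    """Open a Wikimapper index (path is an assumption based on the steps above)."""
    from wikimapper import WikiMapper  # lazy import: only needed here

    return WikiMapper(db_path)
```

For example, `titles_to_wikidata_ids(["Barack Obama"], load_mapper())` would look up the title `Barack_Obama` in the index. Passing the mapper in as an argument keeps the function easy to test with a stub.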
      

SPARQL Endpoint

See also

If you have no root access, you can also set up qEndpoint rootlessly.

We use qEndpoint to spin up a Wikidata endpoint that serves a Wikidata Truthy dump. If you have not installed Docker yet, please see Get Docker.

  1. Download

    sudo docker run -p 1234:1234 --name qendpoint-wikidata qacompany/qendpoint-wikidata
    
  2. Run

    sudo docker start qendpoint-wikidata
    
  3. Add support for Wikidata prefixes. With this, you can omit the Wikidata prefix declarations from the queries you send to the endpoint.

    wget https://raw.githubusercontent.com/the-qa-company/qEndpoint/master/wikibase/prefixes.sparql
    sudo docker cp prefixes.sparql qendpoint-wikidata:/app/qendpoint && rm prefixes.sparql
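
To confirm the endpoint is up, a query can be sent from Python. The sketch below uses only the standard library and makes a few assumptions: the endpoint path `/api/endpoint/sparql` on port 1234 is taken from qEndpoint's documentation as we understand it, the helper names `encode_query` and `run_query` are hypothetical, and the example query omits `PREFIX` lines because it relies on the prefixes file installed in step 3.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint path; adjust if your qEndpoint version differs.
ENDPOINT = "http://localhost:1234/api/endpoint/sparql"

# No PREFIX declarations: relies on the prefixes file from step 3.
EXAMPLE_QUERY = "SELECT ?p ?o WHERE { wd:Q42 ?p ?o } LIMIT 5"


def encode_query(query):
    """Form-encode a SPARQL query for a POST request body."""
    return urllib.parse.urlencode({"query": query}).encode("utf-8")


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """POST the query and return the result bindings as a list of dicts."""
    request = urllib.request.Request(
        endpoint,
        data=encode_query(query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

With the container running, `run_query(EXAMPLE_QUERY)` should return up to five predicate/object bindings for the entity `wd:Q42`. If the query fails with an unknown-prefix error, the prefixes file from step 3 is probably not in place.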