Welcome to PES MATCH’s documentation!¶
PES MATCH¶
Python package and example pipeline for matching census and post-enumeration survey (PES) data. For simplicity, only deterministic and clerical matching methods are provided.
For additional information consider reading:
Setup instructions¶
Once Python is installed, download this repository and execute the command pip install -e .
or make install
to install/update the package and its dependencies.
Using this code¶
The aim of this repository is to provide a set of functions for cleaning and then matching census and PES data. The package gives matchers the tools to automatically match records at the chosen level of geography and then resolve conflicts/difficult cases using the Clerical Resolution Online Widget (CROW). Associative matching can also be implemented to assist in finding all possible matches between census and PES households. Once all methods have been applied at a single level of geography, matches can be combined and residuals collected for the next stage of matching.
An example of a typical matching pipeline is provided in the pipeline/
repository, however users may wish to apply matching methods in their own chosen order.
Files can be executed in the command line using python file_name.py
or within an IDE using the inbuilt run function.
Pipeline running order¶
Step 1: Run scripts in
pipeline/processing/
to clean raw filesStep 2: Update matchkeys in
pipeline/X_Stage_X/
before running the scripts in order. 2 stages are included but more can be created if neededStep 3 (Optional): Run clerical search (
pipeline/Clerical_Search/
) to find the remaining matches within chosen level of geography e.g., enumeration area (EA)
When is CROW required?¶
CROW is required for clerical matching (to manually resolve non-unique matches) after running the following scripts:
pipeline/1_Stage_1/01_stage_1_person.py
pipeline/2_Stage_2/01_stage_2_person.py
Directories and files of note¶
Descriptions of project directories and other significant files:
src/pes_match/
- the core package in the project.src/pes_match/cleaning.py
- functions to clean and prepare census / PES data for matchingsrc/pes_match/cluster.py
- function to cluster together non-unique matches between census / PES datasrc/pes_match/crow.py
- functions to prepare datasets for CROW and read in outputs from CROWsrc/pes_match/matching.py
- functions to match census and PES data togethersrc/pes_match/parameters.py
- parameters that can be updated before or during matching
pipeline/
- example pipeline that can be used as a starting point for matching census and PES data. Matchkeys can be updated within the scriptsCROW/
- contains the code and config files for the Clerical Resolution Online WidgetData/
- contains mock data and stores cleaned data, clerical inputs/outputs, checkpoint files and final outputstests/
- tests for the code undersrc/pes_match/
docs/
- package documentation that is built and hosted as HTML by GitHub Actions based on themain
branch code. See documentation for sphinx for updating or building docs manuallydocs/CONTRIBUTING.md
- general guidance for contributing to the projectdocs/CODE_OF_CONDUCT.md
- code of conduct
.github/workflows
- configs for GitHub Actions