ukpopulation
ukpopulation: unified national and subnational population estimates and projections, including variants - Published in JOSS (2018)
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org, zenodo.org
- ✓ Committers with academic emails: 3 of 6 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Repository
Population and demographics projection module, developed for ITRC/MISTRAL
Statistics
- Stars: 13
- Watchers: 9
- Forks: 9
- Open Issues: 3
- Releases: 9
README.md
ukpopulation: UK Demographic Projections
Latest Release: 1.2.2
- update to 2018 mid-year population estimates
- fixes for changed data formats for some of the subnational population projections
- credit and thanks to @BenjaminIsaac0111 and @ld_archer for the work
1.2 Release
Adds support for custom subnational population projections.
Custom SNPP Data
An externally generated SNPP dataset (from e.g. simim) can be registered with the ukpopulation package and used as if it was the standard ONS/StatsWales/NRScotland/NISRA projection:
```python3
>>> import pandas as pd
>>> import ukpopulation.customsnppdata as CustomSNPPData
>>> customdata = pd.read_csv("customsnpp.csv")
>>> customdata.head()
  GEOGRAPHY_CODE  GENDER  C_AGE  OBS_VALUE  PROJECTED_YEAR_NAME
0      E06000005       1      0      603.0                 2018
1      E06000005       1      1      600.0                 2018
2      E06000005       1      2      624.0                 2018
3      E06000005       1      3      636.0                 2018
4      E06000005       1      4      661.0                 2018
>>> CustomSNPPData.register_custom_projection("customsnpp", customdata, "cachedirectory")
Writing custom SNPP customsnpp to cache/ukpopulationcustomsnppcustomsnpp.csv
>>> CustomSNPPData.list_custom_projections("cachedirectory")
['customsnpp']
```
The external dataset must follow the format/column name conventions as above, but can also contain extra data if required for other use. The GENDER column should only take the values 1 (male) or 2 (female); the C_AGE column should contain the range 0-90 inclusive (90 meaning 90 or over).
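Before registering a custom dataset it can help to check these conventions up front. The following is a minimal sketch, not part of ukpopulation: the helper name `check_custom_snpp` and the list-of-dicts row representation are assumptions made for illustration, and it only checks column presence and value ranges (not that every age from 0 to 90 is present).

```python
# Hypothetical pre-registration check; not part of the ukpopulation API.
REQUIRED_COLUMNS = {"GEOGRAPHY_CODE", "GENDER", "C_AGE", "OBS_VALUE", "PROJECTED_YEAR_NAME"}


def check_custom_snpp(rows):
    """Return a list of human-readable problems found in `rows` (a list of dicts)."""
    problems = []
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        if row["GENDER"] not in (1, 2):
            problems.append(f"row {i}: GENDER must be 1 or 2, got {row['GENDER']}")
        if not 0 <= row["C_AGE"] <= 90:
            problems.append(f"row {i}: C_AGE must be in 0-90, got {row['C_AGE']}")
    return problems


rows = [
    {"GEOGRAPHY_CODE": "E06000005", "GENDER": 1, "C_AGE": 0,
     "OBS_VALUE": 603.0, "PROJECTED_YEAR_NAME": 2018},
    # Deliberately invalid row: bad gender code and out-of-range age.
    {"GEOGRAPHY_CODE": "E06000005", "GENDER": 3, "C_AGE": 91,
     "OBS_VALUE": 600.0, "PROJECTED_YEAR_NAME": 2018},
]
problems = check_custom_snpp(rows)
for problem in problems:
    print(problem)
```

Extra columns are ignored by this check, in line with the note that the dataset may carry additional data.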
1.1 release
- adds UK household projections
- initial support for custom SNPP variants
- better consistency across the MYE/NPP/SNPP APIs (breaks backwards compatibility)
- fixes some bugs/issues
Household Projections
Version 1.1 adds functionality that aggregates household projection data for the UK at LAD (or equivalent) level. Each country's statistical agency provides a disaggregation by household type, but as there is little or no consistency between them, no attempt has (yet) been made to provide a UK-wide unified disaggregation. The year ranges mirror the year ranges for the SNPP for each country (see below). Extrapolation, or application of a national projection variant to the data, is not provided at this stage.
Custom SNPP Variants
Given externally-generated data describing variations to an official projection variant, by geography (LAD) and year, this new functionality generates a full SNPP dataset, disaggregated proportionately by age and gender. The custom variant can optionally be forced to nearest-integer values, preserving the original (rounded) total.
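The idea of proportional disaggregation with an integer-preserving total can be sketched as follows. This is an illustrative sketch only: the function name `disaggregate`, the largest-remainder rounding rule and the sample numbers are assumptions, and the package's own implementation may round differently.

```python
import math


def disaggregate(total, weights, integerise=False):
    """Split `total` across cells in proportion to `weights` (dict of cell -> weight)."""
    weight_sum = sum(weights.values())
    shares = {cell: total * w / weight_sum for cell, w in weights.items()}
    if not integerise:
        return shares
    # Largest-remainder rounding: floor every share, then give the leftover
    # units to the cells with the largest fractional parts, so that the
    # integer cells still sum to the rounded total.
    floors = {cell: math.floor(v) for cell, v in shares.items()}
    leftover = round(total) - sum(floors.values())
    by_remainder = sorted(shares, key=lambda cell: shares[cell] - floors[cell], reverse=True)
    for cell in by_remainder[:leftover]:
        floors[cell] += 1
    return floors


# Illustrative principal projection for one LAD and year, keyed by (gender, age).
principal = {(1, 0): 603.0, (1, 1): 600.0, (2, 0): 574.0, (2, 1): 571.0}
# Hypothetical custom total for the same LAD and year.
custom = disaggregate(2500, principal, integerise=True)
print(custom, sum(custom.values()))
```

Each cell keeps its share of the principal projection, so the age-gender structure is preserved while the total follows the custom variant.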
API consistency
The public methods of the MYEData, SNPPData, NPPData and SNHPData classes now consistently order arguments: firstly categories (where appropriate) then geogs, then years.
The statistical agencies of the United Kingdom (ONS, StatsWales, NR Scotland, and NISRA) all produce annual population estimates and projection data. Although the data are essentially the same, the quantity, format, and availability vary between agencies and datasets. All of the population projection data are available by (single year of) age and gender. Household projection data are more varied, with each country/agency producing inconsistent disaggregations of households in terms of HRP age and/or household type. For now, this package does not disaggregate by age and leaves the household type disaggregations as-is, which are:
| Lowest common denominator | England (ONS) | Wales (StatsWales) | Scotland (NRScotland) | N.Ireland (NISRA) |
|---|---|---|---|---|
| Single person | 'One person households: Female'<br>'One person households: Male' | '1 person' | '1 adult: female'<br>'1 adult: male' | 'One adult households' |
| Adults and children | 'Households with one dependent child'<br>'Households with three or more dependent children'<br>'Households with two dependent children' | '2 person (1 adult, 1 child)'<br>'3 person (2 adults, 1 child)'<br>'3 person (1 adult, 2 children)'<br>'4 person (2+ adults, 1+ children)'<br>'4 person (1 adult, 3 children)'<br>'5+ person (2+ adults, 1+ children)'<br>'5+ person (1 adult, 4+ children)' | '1 adult 1+ children'<br>'2+ adults 1+ children' | 'One adult households with children'<br>'Other households with children' |
| Adults only | 'Other households with two or more adults' | '2 person (No children)'<br>'3 person (No children)'<br>'4 person (No children)'<br>'5+ person (No children)' | '2 adults'<br>'3+ adults' | 'Two adults without children'<br>'Other households without children' |
National population projections (NPP) are the responsibility of ONS, which provides the data for each country within the UK, including 15 variants covering a number of possible future scenarios. The current data are based on 2018 population estimates and project a century ahead, to 2118.
Subnational population projections (SNPP) are the responsibility of each country's agencies (ONS for England), and project 25 years from a base year that depends on the country in question:
| Country | Latest SNPP year range (as of May 2020) |
|---|---|
| England | 2018-2043 |
| Wales | 2018-2043 |
| Scotland | 2018-2043 |
| Northern Ireland | 2016-2041 |
Mid-year population estimates (MYE) are available for the entire UK by local authority, single year of age and gender, from 1991 to 2016 inclusive.
Projection Coverage
The countries within the UK produce their own SNPP data, and also produce some (patchy) variant projections. The ONS currently regard these (the England ones at least) as "experimental".
| Scenario/Variant | Code | E | S | W | N | NPP |
|---|---|---|---|---|---|---|
| Principal | ppp | x | x | x | x | x |
| High fertility | hpp | x | x | | | x |
| Low fertility | lpp | | x | | | x |
| High life expectancy | php | | x | | | x |
| Low life expectancy | plp | | x | | | x |
| Moderately high life expectancy | pjp | | | | | x |
| Moderately low life expectancy | pkp | | | | | x |
| High migration | pph | | x | | | x |
| Low migration | ppl | | x | | | x |
| High population | hhh | | | x | | x |
| Low population | lll | | | x | | x |
| 0% future EU migration | ppq | | | | | x |
| 50% future EU migration | ppr | | | | | x |
| 150% future EU migration | pps | | | | | x |
| Zero net migration | ppz | x | x | x | | x |
| Young age structure | hlh | | | | | |
| Old age structure | lhl | | | | | |
| Replacement fertility | rpp | | | | | |
| Constant fertility | cpp | | | | | |
| No mortality improvement | pnp | | | | | |
| No change | cnp | | | | | |
| Long term balanced net migration | ppb | | | x | | |
Accessibility
Nomisweb provides an API which allows relatively easy programmatic access to the data, and is by far the preferred source of data. Currently not all the data is available from this source, but this may change.
Nomisweb currently hosts the ONS principal NPP data for the UK, the SNPP data for England, and all of the MYE data.
All other data (ONS NPP variants, and SNPP data for Wales, Scotland and Northern Ireland) are available in different formats from the appropriate agency's website.
Rationale
The purpose of this package is to provide a unified interface to both SNPP and NPP data, including variants:
- encapsulating the downloading, processing and caching of the NPP and SNPP data from the various sources.
- consistently differentiating by age (single year, up to 90) and gender over the various datasets.
- providing a unified format for all the data.
- providing a method of synthesising SNPP variant projections using SNPP principal and NPP principal/variant projections
- providing a method of extrapolating SNPP data using NPP data
- enabling easy filtering and aggregating of the data, e.g. extracting projections of the working-age population.
Methodology and Detail
Data Sources
- Nomisweb: UK NPP by country/age/gender, England SNPP by LAD/age/gender, UK MYE by LAD/age/gender.
- ONS: UK NPP variants by country/age/gender.
- Stats Wales: Wales SNPP by LAD/age/gender.
- National Records of Scotland: Scotland SNPP by LAD equivalent/age/gender.
- Northern Ireland Statistics and Research Agency: Northern Ireland SNPP by LAD equivalent/age/gender.
Data Processing
- Note that the SNPP for each country, and the NPP data may not have the same reference year. (See table above).
- NPP data is broken down by country (England/Wales/Scotland/Northern Ireland), for all the variant projections indicated in the second table above.
- Column headings and category values follow the nomisweb/census conventions:
  - `GEOGRAPHY_CODE`: ONS country, LAD, or LAD-equivalent code
  - `GENDER`: 1=Male, 2=Female
  - `C_AGE`: 0-90, where 90 represents 90 or over (to avoid ambiguity, this is an exception: nomisweb census values are typically age+1)
  - `PROJECTED_YEAR_NAME`: 2014-2116
  - `OBS_VALUE`: count of persons
- All data are cached for swift retrieval.
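To make the long-format convention concrete, the sketch below filters and sums records that use these column names. It is a standard-library illustration only: `sum_by` is a hypothetical helper, and unlike the package's `aggregate` method (which takes the categories to aggregate over) it takes the columns to keep. The values are invented.

```python
from collections import defaultdict


def sum_by(rows, group_by, ages=None):
    """Sum OBS_VALUE per combination of `group_by` columns, optionally filtering on C_AGE."""
    totals = defaultdict(float)
    for row in rows:
        if ages is not None and row["C_AGE"] not in ages:
            continue
        key = tuple(row[col] for col in group_by)
        totals[key] += row["OBS_VALUE"]
    return dict(totals)


# Invented records in the GEOGRAPHY_CODE/GENDER/C_AGE/PROJECTED_YEAR_NAME/OBS_VALUE layout.
rows = [
    {"GEOGRAPHY_CODE": "E08000021", "GENDER": 1, "C_AGE": 15, "PROJECTED_YEAR_NAME": 2018, "OBS_VALUE": 10.0},
    {"GEOGRAPHY_CODE": "E08000021", "GENDER": 1, "C_AGE": 16, "PROJECTED_YEAR_NAME": 2018, "OBS_VALUE": 20.0},
    {"GEOGRAPHY_CODE": "E08000021", "GENDER": 2, "C_AGE": 16, "PROJECTED_YEAR_NAME": 2018, "OBS_VALUE": 30.0},
    {"GEOGRAPHY_CODE": "E08000021", "GENDER": 2, "C_AGE": 90, "PROJECTED_YEAR_NAME": 2018, "OBS_VALUE": 40.0},
    {"GEOGRAPHY_CODE": "E08000021", "GENDER": 1, "C_AGE": 16, "PROJECTED_YEAR_NAME": 2019, "OBS_VALUE": 50.0},
]

# Working-age (16-74) totals by geography and year, summed over age and gender.
working_age = sum_by(rows, ["GEOGRAPHY_CODE", "PROJECTED_YEAR_NAME"], ages=range(16, 75))
print(working_age)
```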
Extrapolation
The SNPP data can be extrapolated using the longer-term NPP data. This is done independently for each age and gender in order to try to capture the age-gender structure of the original population. Aggregation only takes place on the extrapolated age-gender specific values. This means that the trends shown by SNPP geographies with different age-gender structures will differ.
If, conversely, the extrapolation was done on the aggregated populations for each SNPP geography, then each SNPP geography would have an identical trend, which would be identical to that of the NPP data.
This methodology can be stated more formally by the following equation for the aggregate SNPP $S(g,y)$ for a given geography $g$ and year $y$:

$$
S(g,y) = \sum_{a,s} S(g,a,s,\bar{y}) \, \frac{N(c(g),a,s,y)}{N(c(g),a,s,\bar{y})}
$$

where $N$ is the NPP, $a$ is age, $s$ is gender, $\bar{y}$ is a reference year (typically the final year in the SNPP data), and $c(g)$ represents a mapping from an SNPP geography (LAD) to an NPP one (country).
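The extrapolation described above scales each age-gender cell of the SNPP reference year by the growth of the matching NPP cell, and only then sums. A minimal sketch with invented numbers follows; the function names and the dictionary layout are assumptions for illustration, not the package's implementation.

```python
def extrapolate_total(snpp_ref, npp, country, ref_year, year):
    """Aggregate extrapolated population for one geography.

    snpp_ref: {(age, gender): population} for the geography in `ref_year`
    npp:      {(country, age, gender, year): population}
    """
    return sum(
        pop * npp[(country, age, gender, year)] / npp[(country, age, gender, ref_year)]
        for (age, gender), pop in snpp_ref.items()
    )


def npp_total(npp, year):
    """Total NPP population in `year`, summed over every cell."""
    return sum(v for (_, _, _, yr), v in npp.items() if yr == year)


# Invented two-cell example: one young cell and one old cell.
snpp_2043 = {(0, 1): 100.0, (90, 2): 50.0}
npp = {
    ("E92000001", 0, 1, 2043): 1000.0,
    ("E92000001", 0, 1, 2050): 1100.0,
    ("E92000001", 90, 2, 2043): 200.0,
    ("E92000001", 90, 2, 2050): 300.0,
}

by_cell = extrapolate_total(snpp_2043, npp, "E92000001", 2043, 2050)
# For contrast: scaling the already-aggregated total by the aggregate NPP trend.
on_totals = sum(snpp_2043.values()) * npp_total(npp, 2050) / npp_total(npp, 2043)
print(by_cell, on_totals)
```

The two results differ because the area in this example is older than the national population, which is the reason the extrapolation is applied per age-gender cell before aggregating.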
Projection of Variants
The extrapolation methodology above can equally be applied to synthesising SNPP variants from SNPP principal and NPP variant data. The equivalent expression is:

$$
S_V(g,y) = \sum_{a,s} S_0(g,a,s,y) \, \frac{N_V(c(g),a,s,y)}{N_0(c(g),a,s,y)}
$$

where the subscripts $V$ and $0$ refer to the variant and the principal projections respectively.
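As a sketch of the variant calculation, each SNPP principal cell is weighted by the ratio of the NPP variant to the NPP principal for the same age, gender and year, and the weighted cells are then summed. The names and numbers below are invented for illustration, and the NPP dictionaries are assumed to hold data for the LAD's parent country only.

```python
def apply_variant(snpp_principal, npp_variant, npp_principal):
    """Scale every (age, gender, year) cell by NPP variant / NPP principal."""
    return {
        cell: pop * npp_variant[cell] / npp_principal[cell]
        for cell, pop in snpp_principal.items()
    }


# Invented cells keyed by (age, gender, year).
snpp_principal = {(0, 1, 2030): 100.0, (1, 1, 2030): 200.0}
npp_principal = {(0, 1, 2030): 1000.0, (1, 1, 2030): 2000.0}
npp_variant = {(0, 1, 2030): 1050.0, (1, 1, 2030): 2000.0}

snpp_variant = apply_variant(snpp_principal, npp_variant, npp_principal)
total_2030 = sum(v for (_, _, year), v in snpp_variant.items() if year == 2030)
print(snpp_variant, total_2030)
```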
Installation
Requirements
API Key
This package uses the UKCensusAPI package to obtain some of the projection data. The package requires an API key to function correctly, see here for details.
Package
Requires Python 3.5 or higher. Dependencies should resolve automatically; if they do not, see Troubleshooting.
PyPI
```bash
python3 -m pip install ukpopulation
```
Conda
```bash
conda install ukpopulation
```
This assumes you have added the conda-forge channel, which can be done with
```bash
conda config --add channels conda-forge
```
Some of the examples (see below) plot graphs and have a dependency on matplotlib, which can be installed with either pip or conda as appropriate.
Development version
To clone the repo and install locally:
```bash
git clone https://github.com/nismod/ukpopulation
cd ukpopulation
./setup.py install
```
(substituting the URL if a fork has been taken). The test data cache directory contains a file NOMIS_API_KEY which defines a dummy key for testing purposes only. The test suite can be run from the project root directory using:
```bash
./setup.py test
```
Troubleshooting
Ensure that the `pip` you are using belongs to a Python 3 installation:
```bash
pip --version
# e.g. pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)
```
If it does not, replace `pip` with `pip3` or `python3 -m pip`.
If the installation has missing dependencies, try:
```bash
pip install -r requirements.txt
./setup.py install
```
If (possibly only with Python 3.5) you encounter
```text
AttributeError: module 'html5lib.treebuilders' has no attribute '_base'
```
then
```bash
pip install html5lib==0.9999999
```
should fix it. A better solution is to upgrade to Python 3.6.
If matplotlib fails to install due to a missing dependency (tkinter), this can be fixed on Debian variants by
```bash
sudo apt install python3-tk
```
If your problem isn't addressed above, please post an issue including as much supporting information as possible.
Usage Examples
Retrieve SNPP for specific LADs
Detailed data
This example fetches the 2018 projection for Newcastle by gender and age.
```python
import ukpopulation.snppdata as SNPPData
snpp = SNPPData.SNPPData()
```
```text
Cache directory: ./raw_data/
using cached LAD codes: ./raw_data/lad_codes.json
Collating SNPP data for England...
./raw_data/NM_2006_1_metadata.json found, using cached metadata...
Using cached data: ./raw_data/NM_2006_1_56aba41fc0fab32f58ead6ae91a867b4.tsv
./raw_data/NM_2006_1_metadata.json found, using cached metadata...
Using cached data: ./raw_data/NM_2006_1_dbe6c087fb46306789f7d54b125482e4.tsv
Collating SNPP data for Wales...
Collating SNPP data for Scotland...
Collating SNPP data for Northern Ireland...
```
```python
newcastle = snpp.filter("E08000021", 2018)
newcastle.head()
```
```text
   C_AGE  GENDER GEOGRAPHY_CODE  OBS_VALUE  PROJECTED_YEAR_NAME
0      0       1      E08000021     1814.0                 2018
1      1       1      E08000021     1780.0                 2018
2      2       1      E08000021     1770.0                 2018
3      3       1      E08000021     1757.0                 2018
4      4       1      E08000021     1747.0                 2018
```
Aggregated data
This example fetches the total population projections for Newcastle from 2018 to 2038 (`range(2018, 2039)` excludes its upper bound).
```python
import ukpopulation.snppdata as SNPPData
snpp = SNPPData.SNPPData()
```
```text
Cache directory: ./raw_data/
using cached LAD codes: ./raw_data/lad_codes.json
Collating SNPP data for England...
./raw_data/NM_2006_1_metadata.json found, using cached metadata...
Using cached data: ./raw_data/NM_2006_1_56aba41fc0fab32f58ead6ae91a867b4.tsv
./raw_data/NM_2006_1_metadata.json found, using cached metadata...
Using cached data: ./raw_data/NM_2006_1_dbe6c087fb46306789f7d54b125482e4.tsv
Collating SNPP data for Wales...
Collating SNPP data for Scotland...
Collating SNPP data for Northern Ireland...
```
```python
# Sum over gender and age, leaving one total per geography and year.
newcastle = snpp.aggregate(["GENDER", "C_AGE"], "E08000021", range(2018, 2039))
newcastle.head()
```
```text
  GEOGRAPHY_CODE  PROJECTED_YEAR_NAME  OBS_VALUE
0      E08000021                 2018   299132.0
1      E08000021                 2019   300530.0
2      E08000021                 2020   301699.0
3      E08000021                 2021   302729.0
4      E08000021                 2022   303896.0
```
Retrieve NPP data filtered by age
Here's how to get the total working-age population by country from 2016 to 2050:
```python
import ukpopulation.nppdata as NPPData
npp = NPPData.NPPData()
```
```text
Cache directory: ./raw_data/
using cached LAD codes: ./raw_data/lad_codes.json
Loading NPP principal (ppp) data for England, Wales, Scotland & Northern Ireland
./raw_data/NM_2009_1_metadata.json found, using cached metadata...
Using cached data: ./raw_data/NM_2009_1_444caf1f672f0646722e389963289973.tsv
```
```python
ukworkingage = npp.aggregate(["GENDER", "C_AGE"], "ppp", NPPData.NPPData.UK, range(2016, 2051), ages=range(16, 75))
ukworkingage.head()
```
```text
  GEOGRAPHY_CODE  PROJECTED_YEAR_NAME  OBS_VALUE
0      E92000001                 2016   40269470
1      E92000001                 2017   40460118
2      E92000001                 2018   40591965
3      E92000001                 2019   40704521
4      E92000001                 2020   40834471
```
And this aggregates the figures for Great Britain:
```python
gbworkingage = npp.aggregate(["GEOGRAPHY_CODE", "GENDER", "C_AGE"], "ppp", NPPData.NPPData.GB, range(2016, 2051), ages=range(16, 75))
gbworkingage.head()
```
```text
   PROJECTED_YEAR_NAME  OBS_VALUE
0                 2016   46590014
1                 2017   46801693
2                 2018   46944219
3                 2019   47063069
4                 2020   47201882
```
NB SNPP data can also be filtered by age and/or gender and/or geography in the same way.
Retrieve NPP variants for England & Wales
First detailed data (by age, gender and country), then aggregated by age and gender.
```python
>>> import ukpopulation.nppdata as NPPData
>>> npp = NPPData.NPPData()
Cache directory: ./raw_data/
using cached LAD codes: ./raw_data/lad_codes.json
Loading NPP principal (ppp) data for England, Wales, Scotland & Northern Ireland
./raw_data/NM_2009_1_metadata.json found, using cached metadata...
Using cached data: ./raw_data/NM_2009_1_444caf1f672f0646722e389963289973.tsv
>>> highgrowth = npp.detail("hhh", NPPData.NPPData.EW)
>>> highgrowth.head()
   C_AGE  GENDER  OBS_VALUE  PROJECTED_YEAR_NAME GEOGRAPHY_CODE
0      0       1     343198                 2016      E92000001
1      0       1     334025                 2017      E92000001
2      0       1     345332                 2018      E92000001
3      0       1     349796                 2019      E92000001
4      0       1     354274                 2020      E92000001
>>> highgrowthagg = npp.aggregate(["GENDER", "C_AGE"], "hhh", NPPData.NPPData.EW)
>>> highgrowthagg.head()
  GEOGRAPHY_CODE  PROJECTED_YEAR_NAME  OBS_VALUE
0      E92000001                 2016   55268067
1      E92000001                 2017   55660155
2      E92000001                 2018   56115027
3      E92000001                 2019   56568795
4      E92000001                 2020   57019007
```
Extrapolate MYE using SNPP and NPP data
Single Area
Construct aggregate data for Exeter from 2011-2065:
- use MYE data up to 2016, aggregated by age and gender.
- then use SNPP data up to 2041, aggregated by age and gender.
- extrapolate using NPP data and Exeter's (2041) age-gender structure.
- aggregate the extrapolated data by age and gender.
- plot the data.

Bulk Calculation
In this example we extrapolate and aggregate the SNPP for every LAD in Wales:
- for each area:
  - extrapolate from 2039 to 2050 using the 2039 age-gender structure.
  - aggregate the extrapolated data by age and gender.
  - append to the full dataset.
- save the Wales dataset as CSV:

| GEOGRAPHY_CODE | PROJECTED_YEAR_NAME | OBS_VALUE |
| -------------- | ------------------- | --------- |
| W06000011 | 2040 | 262903.24103359133 |
| W06000011 | 2041 | 262933.2340468692 |
| W06000011 | 2042 | 263162.3661643687 |
| W06000011 | 2043 | 263332.96819104964 |
| W06000011 | 2044 | 263593.29826455784 |
| W06000011 | 2045 | 263923.03553008236 |
| W06000011 | 2046 | 264243.6253810904 |
| W06000011 | 2047 | 264168.2113917932 |
| W06000011 | 2048 | 264211.4576059673 |
| ... | ... | ... |
Construct an SNPP variant by applying NPP variant to a specific LAD
Here we apply the "hhh" (high growth) and "lll" (low growth) NPP variants to the SNPP data for Newcastle:
- calculate the principal ("ppp") projection by simply aggregating the SNPP data for Newcastle, 2018-2039, by age and gender.
- calculate the variants by weighting the unaggregated data (i.e. by age and gender) by the ratio of the NPP variant/principal.
- aggregate the variant data by age and gender.
- plot the results.

Extrapolating an SNPP variant
Here we build on the examples above by not only applying the NPP variant, but extrapolating too. The process first involves extrapolating the SNPP by the NPP principal variant. The extrapolated data then has the variant adjustments applied to it.

Comparing household and population projections for a single LAD
In this example we simply plot the aggregate household projections for Newcastle against the (principal) population projection. You can see that population growth starts to tail off more than the household growth. This suggests a decrease in household size. Further inspection of the data should confirm this.
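The inference about household size can be checked by dividing the aggregate population by the aggregate number of households for each year. The sketch below uses invented placeholder figures, not values from the projections, and `implied_household_size` is a hypothetical helper.

```python
def implied_household_size(population, households):
    """Average persons per household for each year present in both series."""
    return {
        year: population[year] / households[year]
        for year in sorted(population)
        if year in households
    }


# Placeholder aggregates for a single LAD (illustrative only).
population = {2018: 300000.0, 2028: 315000.0}
households = {2018: 125000.0, 2028: 135000.0}

sizes = implied_household_size(population, households)
for year, size in sizes.items():
    print(year, round(size, 2))
```

A falling ratio over time is what "household growth outpacing population growth" looks like in the data.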

Code Documentation
Package documentation can be viewed like so:
```python
import ukpopulation.myedata as MYEData
help(MYEData)
import ukpopulation.nppdata as NPPData
help(NPPData)
import ukpopulation.snppdata as SNPPData
help(SNPPData)
import ukpopulation.snhpdata as SNHPData
help(SNHPData)
```
Contributions
Contributions to this package are welcomed via the usual pull request mechanism.
Support
If you encounter a bug, feel the documentation is incorrect or incomplete, or want to suggest new features, please post an issue in the issues tab.
Citing
Please acknowledge this software if you use it in your work:
```bibtex
@software{neworder,
  doi = { 10.5281/zenodo.4244147 },
  author = { Andrew P Smith and Tom Russell },
  year = { 2020 },
  version = { 1.2.2 },
  url = { https://github.com/nismod/ukpopulation },
  title = { ukpopulation: UK Demographic Projections }
}
```
Acknowledgements
This package was developed as a component of the EPSRC-funded MISTRAL programme, part of the Infrastructure Transitions Research Consortium.
Owner
- Name: National Infrastructure Systems Model
- Login: nismod
- Kind: organization
- Location: United Kingdom
- Website: www.itrc.org.uk
- Repositories: 30
- Profile: https://github.com/nismod
JOSS Publication
ukpopulation: unified national and subnational population estimates and projections, including variants
Authors
Tags
python data science population projection
GitHub Events
Total
- Fork event: 2
Last Year
- Fork event: 2
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| virgesmith | a****h@l****k | 132 |
| Andrew Smith | v****h | 38 |
| Benjamin Wilson | m****a@l****k | 29 |
| Tom Russell | t****l@g****m | 6 |
| dependabot[bot] | 4****] | 5 |
| ld-archer | l****r@l****k | 4 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 46
- Total pull requests: 26
- Average time to close issues: 5 months
- Average time to close pull requests: about 4 hours
- Total issue authors: 7
- Total pull request authors: 5
- Average comments per issue: 0.54
- Average comments per pull request: 0.12
- Merged pull requests: 22
- Bot issues: 0
- Bot pull requests: 8
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- virgesmith (28)
- BenjaminIsaac0111 (7)
- soodoku (5)
- tomalrussell (3)
- sveneggimann (1)
- bmcskelly (1)
- ld-archer (1)
Pull Request Authors
- BenjaminIsaac0111 (13)
- dependabot[bot] (8)
- virgesmith (2)
- ld-archer (2)
- tomalrussell (1)
Packages
- Total packages: 2
- Total downloads: 26 last month (PyPI)
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 1 (may contain duplicates)
- Total versions: 15
- Total maintainers: 1
pypi.org: ukpopulation
Download, cache, collate, filter, manipulate and extrapolate UK population and household estimates/projections
- Homepage: https://github.com/nismod/ukpopulation
- Documentation: https://ukpopulation.readthedocs.io/
- License: MIT License
- Latest release: 1.2.2 (published over 5 years ago)
Rankings
Maintainers (1)
conda-forge.org: ukpopulation
- Homepage: https://github.com/nismod/ukpopulation
- License: MIT
- Latest release: 1.2.2 (published over 5 years ago)
Rankings
Dependencies
- CacheControl ==0.12.6
- appdirs ==1.4.3
- beautifulsoup4 ==4.9.3
- certifi ==2020.6.20
- chardet ==3.0.4
- colorama ==0.4.3
- contextlib2 ==0.6.0
- distlib ==0.3.0
- distro ==1.4.0
- et-xmlfile ==1.0.1
- html5lib ==1.1
- idna ==2.8
- ipaddr ==2.2.0
- jdcal ==1.4.1
- lml ==0.1.0
- lockfile ==0.12.2
- lxml ==4.6.5
- msgpack ==0.6.2
- numpy ==1.21.0
- openpyxl ==3.0.7
- packaging ==20.3
- pandas ==1.2.4
- pep517 ==0.8.2
- progress ==1.5
- pyexcel ==0.6.6
- pyexcel-io ==0.6.4
- pyexcel-xls ==0.6.2
- pyexcel-xlsx ==0.6.0
- pyparsing ==2.4.6
- python-dateutil ==2.8.1
- pytoml ==0.1.21
- pytz ==2020.4
- requests ==2.25.1
- retrying ==1.3.3
- six ==1.15.0
- soupsieve ==2.0.1
- texttable ==1.6.3
- ukcensusapi ==1.1.6
- urllib3 ==1.26.5
- webencodings ==0.5.1
- xlrd ==1.2.0
- xlwt ==1.3.0

