cambridge-estates-building-energy-archive

Data archive for Cambridge University Estates building energy usage dataset (for use with CityLearn)

https://github.com/eeci/cambridge-estates-building-energy-archive

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
    Organization eeci has institutional domain (www.eeci.cam.ac.uk)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: EECi
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 166 MB
Statistics
  • Stars: 1
  • Watchers: 3
  • Forks: 0
  • Open Issues: 1
  • Releases: 3
Created almost 3 years ago · Last pushed 12 months ago
Metadata Files

README.md

Cambridge University Estates building energy usage archive (2000-2023)

This repository hosts a dataset of historic building energy usage (electricity and gas) from buildings across the Cambridge University Estates covering the period 2000 to 2023. The electricity usage data includes lighting, plug loads, and plant equipment electricity consumption. It is assumed that for the period covered, none of the buildings have heat pumps installed, and so the gas usage data corresponds to the total heating energy usage for the buildings.

Interactive plots visualising the available data can be found at EECi.github.io/Cambridge-Estates-Building-Energy-Archive.

Tools are provided for identifying and constructing building energy datasets in a format compatible with the CityLearn environment for building energy control simulation. Details of this format can be found in the CityLearn documentation. All predicted variables are perfect predictions, i.e. they are copied from the true measured data.
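To illustrate what a perfect prediction means in practice, the following is a minimal sketch, not the repository's own code. It assumes hourly rows and that a prediction at horizon h is the true value h rows ahead. The function `add_perfect_predictions`, the column name `temperature`, and the `_predicted_{h}h` naming are hypothetical and are not taken from the CityLearn schema.

```python
import pandas as pd


def add_perfect_predictions(df, column, horizons):
    """Append perfect-foresight prediction columns for `column`.

    Assumption: the value predicted at row t for horizon h is the true value
    at row t + h. Rows near the end, where t + h runs past the data, fall
    back to the last true measurement so that no gaps are introduced.
    """
    out = df.copy()
    for h in horizons:
        shifted = out[column].shift(-h)
        out[f"{column}_predicted_{h}h"] = shifted.fillna(out[column].iloc[-1])
    return out


# Toy hourly temperature series (made-up values, not repository data).
weather = pd.DataFrame({"temperature": [10.0, 11.0, 12.0, 13.0, 14.0]})
weather = add_perfect_predictions(weather, "temperature", horizons=[1, 2])
print(weather)
```

Because the predictions carry no forecast error, controllers evaluated on such data see an upper bound on forecast quality.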

DataSources.md provides details of the source of the data variables within the datasets, and any pre-processing performed.

Updates for Version 2

Version 2 of this dataset provides two major updates:

1. Gas usage data is provided for all buildings where it is available.
2. The dataset is expanded to include more buildings and more years of data. Some buildings from Version 1 are removed.

NOTE: the anonymised building IDs in Version 2 do not correspond to the building IDs in Version 1.

Version 2.1

The solar panel model parameters used in the Renewables.Ninja API call have been adjusted to make the solar generation data more realistic (the defaults from the web portal are now used). Previously, the solar generation data was overly optimistic, with excessively high capacity factors caused by use of the optimal tracking and tilt option.
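As a sanity check on solar generation data, the capacity factor can be computed as the energy actually generated divided by the energy the array would generate at rated output over the same period. The sketch below is illustrative only: the function `capacity_factor`, the 1 kW rating, and the hourly values are made up and do not come from the dataset.

```python
import numpy as np


def capacity_factor(generation_kwh, rated_capacity_kw, hours_per_step=1.0):
    """Return total energy generated divided by the energy the array would
    produce running at rated capacity for the whole period."""
    generation = np.asarray(generation_kwh, dtype=float)
    max_energy = rated_capacity_kw * hours_per_step * generation.size
    return generation.sum() / max_energy


# One made-up day of hourly output (kWh) from a hypothetical 1 kW array.
hourly_kwh = [0.0] * 8 + [0.1, 0.3, 0.5, 0.6, 0.6, 0.5, 0.3, 0.1] + [0.0] * 8
cf = capacity_factor(hourly_kwh, rated_capacity_kw=1.0)
print(f"capacity factor: {cf:.3f}")
```

Applying the same calculation to a full year of generation data gives a single number that can be compared between dataset versions.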

Note on Data Processing

Very lightweight pre-processing is performed on the energy usage data obtained from the Cambridge University Estates building monitoring systems.

There are two key steps:

1. The data is screened for years with sufficient data availability and visually inspected data quality.
2. Missing data is replaced with zeros (for compatibility with the CityLearn environment).
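The two steps can be sketched roughly as follows. This is an assumption-laden illustration rather than the repository's actual pipeline: the function `screen_and_fill` and the 90% availability threshold are hypothetical, the data is synthetic, and the visual quality inspection is not modelled.

```python
import numpy as np
import pandas as pd


def screen_and_fill(series, min_availability=0.9):
    """Split an hourly series by calendar year, keep only years whose share
    of non-missing readings is at least `min_availability` (a hypothetical
    threshold), and replace the remaining gaps with zeros."""
    kept = {}
    for year, values in series.groupby(series.index.year):
        availability = values.notna().mean()
        if availability >= min_availability:
            kept[year] = values.fillna(0.0)
    return kept


# Synthetic two-year hourly load with one short gap and one long gap.
index = pd.date_range(
    "2020-01-01 00:00", "2021-12-31 23:00", freq=pd.Timedelta(hours=1)
)
load = pd.Series(5.0, index=index)
load.loc["2020-03-01":"2020-03-02"] = np.nan  # 48 missing hours in 2020
load.loc["2021-01-01":"2021-08-31"] = np.nan  # most of 2021 missing

clean = screen_and_fill(load)
print(sorted(clean))  # years that passed the screen
```

Note that after zero-filling, a genuine zero reading and a missing reading are indistinguishable in the output.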

Further detail on pre-processing is available in DataSources.md.

As a result, the provided data contains substantial real-world 'messiness'. These data quality and availability issues are common in practical building monitoring systems. Hence, this dataset provides an opportunity to study data quality issues in building energy management. The following studies are suggested:

  • Data missingness/validity detection and data imputation
  • Change-point detection for building/occupant behaviour changes
  • Building control scheme robustness (i.e. stability under unreliable input data)
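As a starting point for missingness detection, one simple heuristic is to flag long runs of consecutive zeros, since missing readings have been replaced with zeros. The sketch below is one possible approach, not a method provided by the repository. The function `flag_zero_runs` and the run-length threshold are hypothetical, and genuine zero usage (e.g. gas in summer) would also be flagged.

```python
import pandas as pd


def flag_zero_runs(series, min_run=6):
    """Return a boolean mask marking readings that sit inside a run of at
    least `min_run` consecutive zeros."""
    is_zero = series.eq(0)
    # Give each stretch of identical is_zero values its own run id.
    run_id = is_zero.ne(is_zero.shift()).cumsum()
    run_length = is_zero.groupby(run_id).transform("size")
    return is_zero & (run_length >= min_run)


# Made-up usage values: one isolated zero and one run of four zeros.
usage = pd.Series([3.0, 0.0, 2.5, 0.0, 0.0, 0.0, 0.0, 4.0])
mask = flag_zero_runs(usage, min_run=3)
print(mask.tolist())
```

The flagged spans could then be treated as candidates for imputation, or excluded when evaluating a controller.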

Directory Structure

  • building_data
    • processed_data; pre-processed electricity and gas data for each building (csv files for each available year of data)
    • find_cont_datasets.ipynb; notebook for identifying sets of buildings with data available over continuous time periods
    • summarise_building_data.ipynb; script for reporting and visualising summary data on building data availability
    • view_building_data.py; script for visualising building data time series
    • viz_data_availability.py; script for visualising data availability across buildings
  • aux_data; pre-processed data for weather, solar, electricity pricing, and grid carbon intensity (csv files for each available year of data) + scripts for gathering and processing data
  • DataSources.md; documentation on the source of the data variables within the datasets, and any pre-processing performed
  • prep_dataset.ipynb; script for preparing datasets for CityLearn environment
  • resources; resources for generating datasets
  • datasets; output directory for generated datasets
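The idea behind finding sets of buildings with data over a continuous period can be sketched as below. This is a simplified illustration and does not reproduce find_cont_datasets.ipynb: the function `buildings_covering`, the building IDs, and the availability table are all hypothetical.

```python
def buildings_covering(years_by_building, start_year, end_year):
    """Return the sorted IDs of buildings that have data for every year in
    the inclusive range [start_year, end_year]."""
    required = set(range(start_year, end_year + 1))
    return sorted(
        building
        for building, years in years_by_building.items()
        if required <= set(years)
    )


# Hypothetical availability table: building ID -> years with data.
availability = {
    "building_A": [2015, 2016, 2017, 2018],
    "building_B": [2016, 2018],
    "building_C": [2016, 2017, 2018, 2019],
}

print(buildings_covering(availability, 2016, 2018))
```

In practice the availability table would be built from the per-year csv files in processed_data before choosing the period.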

Citation

If you use any data provided in this repository, please cite it as follows:

@misc{langtry2024CambridgeUniversityEstates,
  author = {Langtry, Max and Choudhary, Ruchi},
  month = may,
  title = {Cambridge University Estates building energy usage archive},
  version = {2.1},
  year = 2024,
  doi = {10.5281/zenodo.10955332},
  url = {https://github.com/EECi/Cambridge-Estates-Building-Energy-Archive},
}

Acknowledgements

We would like to thank the Cambridge University Estates division for their help making this data publicly accessible.

Owner

  • Name: Energy Efficient Cities Initiative, University of Cambridge
  • Login: EECi
  • Kind: organization
  • Location: Cambridge, UK

GitHub Events

Total
  • Issues event: 1
  • Issue comment event: 1
  • Push event: 6
Last Year
  • Issues event: 1
  • Issue comment event: 1
  • Push event: 6

Dependencies

requirements.txt pypi
  • CityLearn ==1.7.0
  • ipykernel ==6.29.3
  • ipython ==8.18.1
  • jupyter ==1.0.0
  • matplotlib ==3.8.3
  • numpy ==1.21.6
  • openpyxl ==3.1.2
  • pandas ==1.3.5
  • plotly ==5.20.0
  • scipy ==1.11.4
  • seaborn ==0.13.2
  • tqdm ==4.66.2