covid19-jhudashboard-livedata-extraction
Covid19 numbers extraction from Johns Hopkins University Dashboard
https://github.com/rvg296/covid19-jhudashboard-livedata-extraction
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.9%) to scientific vocabulary
Repository
Covid19 numbers extraction from Johns Hopkins University Dashboard
Basic Info
- Host: GitHub
- Owner: rvg296
- Language: Jupyter Notebook
- Default Branch: master
- Size: 1.6 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
readme.md
Automatic data extraction & export of categorized data from JHU COVID-19 Dashboard
Introduction
Johns Hopkins University (JHU) has put tremendous effort into gathering COVID-19 data from reliable sources and presenting it as an ArcGIS Dashboard. They have also opened all the feature layers to the public for use in their own dashboards or analyses.
Challenge
Even though most of the data is open, anyone who wants all the numbers would find it tedious to look up the service layers, add or connect them in their ArcGIS Online organization, download the dataset, and aggregate it county-wise. Hence, this is an attempt to automatically extract the numbers (Active, Confirmed, Deaths) from the dashboard, both country-wise and US county-wise, and then categorize, aggregate, style, and export them to a neatly formatted spreadsheet. This is done with pandas and the ArcGIS API for Python.
Data Source
Here is the first hosted feature service that was open-sourced by Johns Hopkins.
This feature service has 3 feature layers in it namely
In response to high demand, JHU also published a US county-wise feature service on March 23, 2020.
Extraction Workflow
- Get the feature services published in ArcGIS Online using their Item IDs for both worldwide & US.
- Get the Cases_Country feature layer from the worldwide dataset and the Cases feature layer from the US county-wise dataset.
- Convert both the feature layers into spatial data frames for data extraction.
Countrywide cases
- We extract only the required fields from the dataset, namely:
- Country_Region
- Deaths
- Confirmed
- Recovered
- Active
- Next, apply styling using the pandas Styler API.
- Bar and gradient styles are used to visualize the data.
- After visualization, export the styled dataframe to a neatly formatted spreadsheet.
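A minimal sketch of the country-wide steps above, using the pandas Styler API. The sample rows are made up for illustration, and the function names, colors, colormap, and output filename are assumptions rather than the notebook's actual choices.

```python
import pandas as pd

# Field names as listed above.
COUNTRY_FIELDS = ["Country_Region", "Deaths", "Confirmed", "Recovered", "Active"]


def select_country_fields(country_sdf):
    """Keep only the required fields from the country-wide data frame."""
    return country_sdf[COUNTRY_FIELDS].copy()


def style_countries(df):
    """Apply a bar style and a gradient style; returns a pandas Styler."""
    return (
        df.style
        .bar(subset=["Confirmed"], color="#f4c7a1")
        .background_gradient(subset=["Deaths"], cmap="Reds")
    )


# Made-up rows for illustration only; real values come from the feature layer.
sample = pd.DataFrame({
    "Country_Region": ["A", "B"],
    "Deaths": [1, 3],
    "Confirmed": [10, 30],
    "Recovered": [4, 12],
    "Active": [5, 15],
    "Lat": [0.0, 1.0],  # an extra column that gets dropped
})
countries = select_country_fields(sample)

# Styling needs jinja2 and matplotlib; the Excel export needs openpyxl.
# style_countries(countries).to_excel("countrywide_cases.xlsx", index=False)
```

The bar style is rendered with CSS in the notebook and may not carry over to the spreadsheet, whereas background colors generally do.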
US-CountyWise cases
Since JHU stated that they have not found a reliable source for Active and Recovered cases at the county level, these fields are not included. This may change once reliable data is found.
In addition, as mentioned by JHU, all cases for which the state is known but the exact county is not are assigned a latitude and longitude of 0,0 and named Unassigned in the county column.
From the US county-wise spatial dataframe we extract only State, County, Confirmed, and Deaths.
Even though the extraction is straightforward, we wanted the data grouped by state and county, so a pandas pivot is performed.
Once that is completed, styling is applied to the dataframe to quickly identify counties where deaths occurred, state-wide death totals, and finally US-wide totals.
After we visualize the dataframe, we can export it to a neatly formatted spreadsheet like above.
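The grouping and totals described above could look roughly like this in pandas. The rows are made up for illustration, the column names follow the text (State, County, Confirmed, Deaths), and the `highlight_deaths` helper, its color, and the output filename are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only; real values come from the feature layer.
us = pd.DataFrame({
    "State": ["X", "X", "X", "Y"],
    "County": ["Alpha", "Beta", "Unassigned", "Gamma"],
    "Confirmed": [10, 5, 2, 7],
    "Deaths": [1, 0, 0, 2],
})

# Group by state and county with a pivot table.
by_county = pd.pivot_table(
    us, index=["State", "County"], values=["Confirmed", "Deaths"], aggfunc="sum"
)

# State-wide totals and US-wide totals.
state_totals = by_county.groupby(level="State").sum()
us_totals = by_county.sum()


def highlight_deaths(row):
    """Return one CSS string per cell, highlighting rows where deaths occurred."""
    style = "background-color: #f8d7da" if row["Deaths"] > 0 else ""
    return [style] * len(row)


# Styling needs jinja2; the Excel export needs openpyxl.
# by_county.style.apply(highlight_deaths, axis=1).to_excel("us_countywise_cases.xlsx")
```

Rows named Unassigned stay in the pivot, so state totals still include cases whose county is unknown.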
Discrepancies:
You might observe a small difference in the Confirmed and Active numbers for the US when comparing the US county-wise dataset with the country-wide dataset. This is because the feature layers are updated at different times: the country-wide feature layer is updated ahead of the US county-wise layer.
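One way to quantify this gap is to compare the US row of the country-wide data with the sum over all US counties. This is a sketch with made-up numbers; the function name is hypothetical, and it assumes the country-wide layer labels the United States as `US` in `Country_Region`.

```python
import pandas as pd


def us_confirmed_gap(country_df, county_df):
    """Confirmed count in the country-wide US row minus the sum over US counties."""
    is_us = country_df["Country_Region"] == "US"  # assumed label for the US row
    country_total = country_df.loc[is_us, "Confirmed"].sum()
    return int(country_total - county_df["Confirmed"].sum())


# Made-up rows for illustration only.
country_df = pd.DataFrame({"Country_Region": ["US", "B"], "Confirmed": [25, 30]})
county_df = pd.DataFrame({"Confirmed": [10, 5, 2, 7]})
gap = us_confirmed_gap(country_df, county_df)
```

A positive gap is consistent with the country-wide layer having been updated more recently than the county-wise layer.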
Notebook Viewer:
Because GitHub cannot render interactive JavaScript plots and dataframe styling inside a notebook in the repository, the notebook, including the dataframe styling, is also available in nbviewer. Click to access the notebook viewer here
Sample Screenshots:
The numbers quoted below may vary depending on when you run the script.
In Jupyter Notebook
In Excel
References:
Owner
- Name: Rohit Mendadhala
- Login: rvg296
- Kind: user
- Location: Texas
- Website: rvg296.github.io
- Repositories: 1
- Profile: https://github.com/rvg296
Passionate about geospatial problem solving, geospatial data science, geospatial development
Citation (citation.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Mendadhala"
    given-names: "Rohit Venkat Gandhi"
    orcid: "https://orcid.org/0000-0003-4847-0880"
title: "Covid19 John Hopkins University Dashboard LiveData Extraction"
version: 1.0.0
doi: 10.5072/zenodo.1171530
date-released: 03/2020
url: "https://github.com/rvg296/Covid19-JHUDashboard-LiveData-Extraction"