icd

Tools for working with icd codes and comorbidities

https://github.com/mark-hoffmann/icd

Keywords

data-analysis icd python

Last synced: 7 months ago

Repository

Basic Info

Host: GitHub
Owner: mark-hoffmann
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 19.5 KB

Statistics

Stars: 27
Watchers: 1
Forks: 8
Open Issues: 2
Releases: 0

Created over 8 years ago · Last pushed almost 5 years ago

Metadata Files

Readme License

README.rst

icd
===

.. image:: https://img.shields.io/pypi/v/icd.svg
    :target: https://pypi.python.org/pypi/icd
    :alt: Latest PyPI version

.. image:: https://travis-ci.org/mark-hoffmann/icd.png
   :target: https://travis-ci.org/mark-hoffmann/icd
   :alt: Latest Travis CI build status

.. image:: https://codecov.io/gh/mark-hoffmann/icd/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/mark-hoffmann/icd
   :alt: Coverage

Tools for working with icd codes and comorbidities. This package was inspired by the R package *icd* and is a simple Python implementation of some of its base functionality. It has been benchmarked to handle large datasets (tens of millions of rows) for various icd code manipulation tasks.

If you are interested in contributing to this repository, feel free to send me an email.

Usage
-----
Basic usage covers two very common tasks when dealing with icd code data:

- Transforming datasets from a long to wide format
- Processing icd codes for known comorbidity mappings

|
|

**Transforming from long to wide**


Data is commonly stored in a long format, where a key for an individual such as *person_id* has many claims (*claim_id*) belonging to it.

For example:

+------------+------------+-----------+------------+------------+
| claim_id   | person_id  | icd_cd_1  |  icd_cd_2  |  icd_cd_3  |
+============+============+===========+============+============+
|    001     |    A       | code_6    |  code_2    |            |
+------------+------------+-----------+------------+------------+
|    002     |    A       | code_8    |            |            |
+------------+------------+-----------+------------+------------+
|    003     |    A       | code_3    |  code_2    |  code_6    |
+------------+------------+-----------+------------+------------+
|    004     |    B       | code_1    |            |            |
+------------+------------+-----------+------------+------------+
|    005     |    B       | code_2    |  code_3    |            |
+------------+------------+-----------+------------+------------+
|    006     |    C       | code_4    |  code_2    |  code_5    |
+------------+------------+-----------+------------+------------+

For easier processing, we transform the table into a more collapsed version with one row per *person_id*. The number of *icd* columns then becomes the maximum number of unique codes for any single *person_id*.

For example:

+------------+-----------+------------+------------+------------+
| person_id  | icd_cd_1  |  icd_cd_2  |  icd_cd_3  |   icd_cd_4 |
+============+===========+============+============+============+
|    A       |  code_6   | code_2     |  code_8    |    code_3  |
+------------+-----------+------------+------------+------------+
|    B       |  code_1   | code_2     |  code_3    |            |
+------------+-----------+------------+------------+------------+
|    C       |  code_4   | code_2     |  code_5    |            |
+------------+-----------+------------+------------+------------+

To accomplish this task, use the function *long_to_short_transformation*:

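.. code-block:: python

  import pandas as pd
  import icd

  # One row per claim; missing codes are empty strings, not NaN.
  data = {"person_id": [1, 1, 1, 2, 2, 3],
          "dx_1": ["F11", "E40", "", "F32", "C77", "G10"],
          "dx_2": ["F1P", "E400", "", "F322", "C737", ""]}
  df = pd.DataFrame.from_dict(data)

  # Roll the claims up to one row per person_id.
  wide = icd.long_to_short_transformation(df, "person_id", ["dx_1", "dx_2"])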

Here *df* is your pandas DataFrame, *"person_id"* is the column you want to roll up on, and *["dx_1","dx_2"]* is the list of columns that contain icd codes.

Note that even if you only have one icd column, it **must still be passed as a list**. You must also **replace NaN values** with an **empty string** such as "".
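The two requirements above can be met with plain pandas before calling the library. This is a minimal sketch; the column name *dx_1* and the toy data are only illustrative, and the final call is left as a comment so the snippet runs without *icd* installed.

```python
import numpy as np
import pandas as pd

# Toy claims table with a single code column and one missing value.
df = pd.DataFrame({"person_id": [1, 1, 2],
                   "dx_1": ["F11", np.nan, "C77"]})

# A single icd column is still wrapped in a list.
code_cols = ["dx_1"]

# Replace NaN with the empty string in the code columns only.
df[code_cols] = df[code_cols].fillna("")

# The prepared frame would then be passed on, for example:
# icd.long_to_short_transformation(df, "person_id", code_cols)
```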

The function returns a new DataFrame indexed by *person_id*, with a *person_id* column and as many code columns as needed, named *icd_0*, *icd_1*, ... , *icd_n*.
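To make the shape of the transformation concrete, the same roll-up can be sketched in plain pandas. This is not the library's implementation: the helper name *collapse_codes* is hypothetical, and the order in which codes land in the *icd_n* columns may differ from what *long_to_short_transformation* produces.

```python
import pandas as pd

def collapse_codes(df, id_col, code_cols):
    """Collect each person's unique, non-empty codes into icd_0..icd_n columns."""
    # Stack the code columns into one long column of (id, code) pairs.
    stacked = df.melt(id_vars=id_col, value_vars=code_cols, value_name="code")
    # Drop empty-string placeholders and codes repeated for the same person.
    stacked = stacked[stacked["code"] != ""].drop_duplicates([id_col, "code"])
    # Number each person's codes 0..k-1, then pivot those numbers into columns.
    stacked["slot"] = stacked.groupby(id_col).cumcount()
    wide = stacked.pivot(index=id_col, columns="slot", values="code").fillna("")
    wide.columns = [f"icd_{i}" for i in wide.columns]
    return wide.reset_index()

data = {"person_id": [1, 1, 1, 2, 2, 3],
        "dx_1": ["F11", "E40", "", "F32", "C77", "G10"],
        "dx_2": ["F1P", "E400", "", "F322", "C737", ""]}
wide = collapse_codes(pd.DataFrame(data), "person_id", ["dx_1", "dx_2"])
print(wide)
```

Persons 1 and 2 each have four unique codes, so the sketch yields four code columns, and person 3 is padded with empty strings.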

|
|

**Processing icd codes to known comorbidities**

The second task is mapping these icd codes to comorbidities. For this, you can use the function *icd_to_comorbidities*. It converts a table of the format:

+------------+-----------+------------+------------+------------+
| person_id  | icd_cd_1  |  icd_cd_2  |  icd_cd_3  |   icd_cd_4 |
+============+===========+============+============+============+
|    A       |  code_6   | code_2     |  code_8    |    code_3  |
+------------+-----------+------------+------------+------------+
|    B       |  code_1   | code_2     |  code_3    |            |
+------------+-----------+------------+------------+------------+
|    C       |  code_4   | code_2     |  code_5    |            |
+------------+-----------+------------+------------+------------+

To the format:

+------------+-----------+------------+------------+------------+
| person_id  | comorb_1  |  comorb_2  |  comorb_3  |   comorb_4 |
+============+===========+============+============+============+
|    A       |  True     | False      |  True      |    True    |
+------------+-----------+------------+------------+------------+
|    B       |  False    | True       |  False     |     False  |
+------------+-----------+------------+------------+------------+
|    C       |  False    | False      |  False     |   False    |
+------------+-----------+------------+------------+------------+

The resulting comorbidity columns depend on the mapping used.
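As an illustration of how a mapping could turn code columns into boolean flags, here is a small pandas sketch. It rests on two assumptions that the README does not confirm: that a mapping is a dict of comorbidity name to a list of codes, and that a code counts as a match when it starts with one of the listed entries. The helper *flag_comorbidities* and the toy mapping are hypothetical and are not part of *icd*.

```python
import pandas as pd

def flag_comorbidities(df, id_col, code_cols, mapping):
    """Return one boolean column per comorbidity (prefix matching is assumed)."""
    out = pd.DataFrame({id_col: df[id_col]})
    for name, prefixes in mapping.items():
        # str.startswith accepts a tuple, so one call tests every prefix.
        hits = df[code_cols].apply(
            lambda col: col.map(lambda code: code.startswith(tuple(prefixes))))
        # A person is flagged if any of their code columns matched.
        out[name] = hits.any(axis=1)
    return out

df = pd.DataFrame({"person_id": [1, 2, 3],
                   "icd_0": ["F1P", "F322", ""],
                   "icd_1": ["F11", "C77", "G10"]})
toy_mapping = {"metastatic_carcinoma": ["C77", "C78", "C79", "C80"],
               "aids_hiv": ["B20", "B21", "B22", "B24"]}
flags = flag_comorbidities(df, "person_id", ["icd_0", "icd_1"], toy_mapping)
print(flags)
```

Only person 2 carries a code beginning with *C77*, so only that row is flagged for the first comorbidity. Empty strings never match, which is one reason to fill missing codes with "" rather than NaN.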

|

An example of doing this:

.. code-block:: python

  import pandas as pd
  import icd

  df = pd.DataFrame.from_dict({'icd_0': {1: 'F1P', 2: 'F322', 3: ''},
                               'icd_1': {1: 'F11', 2: 'C77', 3: 'G10'},
                               'icd_2': {1: '', 2: 'C737', 3: ''},
                               'icd_3': {1: 'E400', 2: 'F32', 3: ''},
                               'icd_4': {1: 'E40', 2: '', 3: ''},
                               'person_id': {1: 1, 2: 2, 3: 3}})
  icd.icd_to_comorbidities(df, "person_id", ["icd_0","icd_1","icd_2","icd_3","icd_4"])

|

The default mapping is *quan_elixhauser10*, a transcription of the original Elixhauser icd 9 comorbidities, published in a paper by Quan.

Optionally, you can provide a *mapping* keyword argument as such:

.. code-block:: python

  icd.icd_to_comorbidities(df, "person_id", ["icd_0","icd_1","icd_2","icd_3","icd_4"], mapping="quan_elixhauser10")

The currently supported mappings are the default *"quan_elixhauser10"* and *"charlson10"*, both referenced from the same paper. The mappings are also available laid out in SAS code.


If you want to create a custom comorbidity mapping, pass a dict as the *mapping* argument instead of a supported keyword string. The dict must map each comorbidity name to a list of icd codes, in the following format:

.. code-block:: python

  # Keys become the output column names; values are the icd codes for that comorbidity.
  custom_mapping = {
      "paraplegia_and_hemiplegia": ['G81','G82','G041','G114','G801','G802','G830','G831','G832','G833','G834','G839'],
      "renal_disease": ['N18','N19','N052','N053','N054','N055','N056','N057','N250','I120','I131','N032','N033','N034','N035','N036','N037','Z490','Z491','Z492','Z940','Z992'],
      "cancer": ['C00','C01','C02','C03','C04','C05','C06','C07','C08','C09','C10','C11','C12','C13','C14','C15','C16','C17','C18','C19','C20','C21','C22','C23','C24','C25','C26','C30','C31','C32','C33','C34','C37','C38','C39','C40','C41','C43','C45','C46','C47','C48','C49','C50','C51','C52','C53','C54','C55','C56','C57','C58','C60','C61','C62','C63','C64','C65','C66','C67','C68','C69','C70','C71','C72','C73','C74','C75','C76','C81','C82','C83','C84','C85','C88','C90','C91','C92','C93','C94','C95','C96','C97'],
      "moderate_or_severe_liver_disease": ['K704','K711','K721','K729','K765','K766','K767','I850','I859','I864','I982'],
      "metastatic_carcinoma": ['C77','C78','C79','C80'],
      "aids_hiv": ['B20','B21','B22','B24'],
  }
  icd.icd_to_comorbidities(df, "person_id", ["icd_0","icd_1","icd_2","icd_3","icd_4"], mapping=custom_mapping)

The above function returns a new DataFrame with the *person_id* values as the index, a column named after whatever id string is passed in (here "person_id"), and one column per comorbidity populated with either **True** or **False**.
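Because the result is a table of booleans, ordinary pandas aggregation applies to it. The sketch below uses a hand-built stand-in with the shape described above (the values are made up, not real output of *icd_to_comorbidities*) to count comorbidities per person and compute the share of people with each one.

```python
import pandas as pd

# Hand-built stand-in for the returned frame: id as both index and column,
# plus one True/False column per comorbidity. Values are illustrative only.
result = pd.DataFrame({"person_id": [1, 2, 3],
                       "cancer": [False, True, False],
                       "renal_disease": [True, True, False]})
result = result.set_index("person_id", drop=False)

# Every column except the id column is a comorbidity flag.
flag_cols = [c for c in result.columns if c != "person_id"]

# True counts as 1, so a row-wise sum is the number of comorbidities per person.
result["n_comorbidities"] = result[flag_cols].sum(axis=1)

# A column-wise mean is the share of people with each comorbidity.
prevalence = result[flag_cols].mean()
print(result)
print(prevalence)
```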

Installation
------------

icd can be installed from the PyPI package index with:

.. code-block:: bash

  pip install icd



Requirements
^^^^^^^^^^^^
- pandas

Compatibility
-------------

icd currently supports Python 3.4, 3.5, and 3.6.

Licence
-------

MIT

Authors
-------

`icd` was written by Mark Hoffmann.

Owner

Name: Mark
Login: mark-hoffmann
Kind: user
Location: Chicago, IL
Company: Meta

Website: https://markkhoffmann.com
Twitter: markkhoffmann
Repositories: 6
Profile: https://github.com/mark-hoffmann

AI Engineer Meta | Previously --- Chief Architect - Ubiety, ML/AI - NASA Jet Propulsion Laboratory / DARPA

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Committers

Last synced: over 2 years ago

All Time

Total Commits: 20
Total Committers: 4
Avg Commits per committer: 5.0
Development Distribution Score (DDS): 0.3

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
mark-hoffmann	m**n@3**m	14
Mark Hoffmann	m**n@h**m	3
Mark Hoffmann	m**n@g**m	2
Mark	m****n	1

Committer Domains (Top 20 + Academic)

homeaware.com: 1
38thstreetstudios.com: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 3
Total pull requests: 1
Average time to close issues: over 2 years
Average time to close pull requests: less than a minute
Total issue authors: 3
Total pull request authors: 1
Average comments per issue: 1.67
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

skaragou (1)
supatuffpinkpuff (1)
fonnesbeck (1)

Pull Request Authors

mark-hoffmann (1)

Packages

Total packages: 1
Total downloads:
- pypi 153 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 4
Total maintainers: 1

pypi.org: icd

Tools for working with icd codes and comorbidities

Homepage: https://github.com/mark-hoffmann/icd
Documentation: https://icd.readthedocs.io/
License: mit
Latest release: 0.1.3
published almost 5 years ago

Versions: 4
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 153 Last month

Rankings

Dependent packages count: 10.0%

Forks count: 11.9%

Stargazers count: 12.0%

Average: 16.1%

Dependent repos count: 21.7%

Downloads: 25.0%

Maintainers (1)

mark-hoffmann
