Science Score: 39.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 6 DOI reference(s) in README -
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (13.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: litalbarkai
- License: apache-2.0
- Language: C++
- Default Branch: main
- Size: 2.59 MB
Statistics
- Stars: 6
- Watchers: 1
- Forks: 4
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Open Redatam 
About
Open Redatam is an open source software for extracting raw information from REDATAM databases. It was created to recover information of REDATAM databases for statistical analysis using standard tools such as SPSS, STATA, R, etc.
Please read our article for the full context of this project (Open Access):
Vargas Sepúlveda, Mauricio and Barkai, Lital. 2025. "The REDATAM format and its challenges for data access and information creation in public policy." Data & Policy 7 (January): e18. https://dx.doi.org/10.1017/dap.2025.4.
This software is a full C++ ground-up rewrite of the original Redatam Converter created by Pablo de Grande and written in C#. Rewriting the original C# code in C++ allows for better portability and the ability to use the program within R, Python, and other languages.
For R and Python users (otherwise skip this section)
If you use R: We have an R package 📦 that allows to directly read REDATAM databases in R.
If you use Python: We have a Python package 📦 that allows to directly read REDATAM databases in Python.
If you only need the processed data: We provide tidy microdata 📊 in R and CSV format.
Usage
For a given census, such as the Chilean Census 2017, the following options are equivalent.
Desktop app

Command line
bash
redatam input-dir/dictionary.dicx output-dir
The REDATAM database will be exported to CSV files and an XML summary of the tables and variables.
Installation
From binaries
Ubuntu
Download the DEB file and install it. This will install redatam and redatamgui in /usr/local/bin/ with the necessary dependencies and a desktop entry. The installer creates an entry in the Applications folder for Open Redatam GUI and you can also use the command line tool from the Terminal by calling redatam.
Mac
Download the DMG file. The image contains redatam and redatamgui, which you can copy to "Applications". The app is not verified because it does not make sense for us to pay 200 USD/year just to sign one image. You can install it anyways this by going to the System Settings in the Apple menu and then:
- Select Privacy & Security.
- Scroll down to the Security section.
- Click "Open Anyway" beneath the message "RedatamGUI was blocked for use because it is not from an identified developer."
Windows
Download the EXE file and install it. The installer creates an entry in the Start Menu for Open Redatam GUI and you can also use the command line tool from Power Shell by calling C:\Program Files (x86)\Open Redatam\redatam.exe.
From source
The software requires C++11 or higher to compile.
On Linux, run the following commands:
bash
git clone https://github.com/pachadotdev/open-redatam.git
sudo apt-get update
sudo apt-get install -y qtbase5-dev qtbase5-dev-tools qt5-qmake
make
Then run ./redatam or ./redatamgui.
On Mac, run the following commands:
bash
git clone https://github.com/pachadotdev/open-redatam.git
brew install qt@5
export PATH="/opt/homebrew/opt/qt@5/bin:$PATH"
export LDFLAGS="-L/opt/homebrew/opt/qt@5/lib"
export CPPFLAGS="-I/opt/homebrew/opt/qt@5/include"
export PKG_CONFIG_PATH="/opt/homebrew/opt/qt@5/lib/pkgconfig"
make
Then run ./redatam or ./redatamgui.
On Windows, you need Visual Studio Code 2019 with C++ development tools and Qt 5 for MSVC 2019 64-bit.
Then run the following commands:
```bash git clone https://github.com/pachadotdev/open-redatam.git
cd redatamwindows cmake -G "Visual Studio 16 2019" . cmake --build . --config Release cmake --install . --config Release
cd redatamguiwindows cmake . -G "Visual Studio 16 2019" cmake --build . --config Release "C:\Qt\5.15.2\msvc2019_64\bin\windeployqt.exe" --release .\Release\redatamgui.exe cd .. ```
Validation against IPUMS data
A simple validation exercise is to compare counts and percentages by sex and age groups against the IPUMS data to determine if the data was converted correctly.
Some census files feature a sample. For this exercise, we used the dictionaries for the countries containing the universe in DIC and DICX formats.
Bolivia 2012
DIC and DICX parsing leads to the same results.
r
sex n_ipums pct_ipums n_rdtm pct_rdtm n_diff pct_diff
<int+lbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1 [Male] 5020766. 49.9 5019447 49.9 1319. 0.00839
2 2 [Female] 5040041. 50.1 5040409 50.1 -368. -0.00839
r
age2 n_ipums pct_ipums n_rdtm pct_rdtm
<int+lbl> <dbl> <dbl> <int> <dbl>
1 1 [0 to 4] 1091540. 10.8 1089948 10.8
2 2 [5 to 9] 988984. 9.83 992654 9.87
3 3 [10 to 14] 1076853. 10.7 1078164 10.7
4 4 [15 to 19] 1107708. 11.0 1106284 11.0
5 12 [20 to 24] 981961. 9.76 978606 9.73
6 13 [25 to 29] 812924. 8.08 817395 8.13
7 14 [30 to 34] 754287. 7.50 753831 7.49
8 15 [35 to 39] 631461. 6.28 631032 6.27
9 16 [40 to 44] 543112. 5.40 544701 5.41
10 17 [45 to 49] 461436. 4.59 461984 4.59
11 18 [50 to 54] 404489. 4.02 403220 4.01
12 19 [55 to 59] 322182. 3.20 324025 3.22
13 20 [60 to 64] 280089. 2.78 279867 2.78
14 21 [65 to 69] 207331. 2.06 204529 2.03
15 22 [70 to 74] 152851. 1.52 152423 1.52
16 23 [75 to 79] 99799. 0.992 99276 0.987
17 24 [80 to 84] 81862. 0.814 81095 0.806
18 25 [85+] 61937. 0.616 60822 0.605
Chile 2017
DIC and DICX parsing leads to the same results.
r
sex n_ipums pct_ipums n_rdtm pct_rdtm n_diff pct_diff
<int+lbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1 [Male] 8607570 49.0 8601989 48.9 5581 0.0457
2 2 [Female] 8961420 51.0 8972014 51.1 -10594 -0.0457
r
age2 n_ipums pct_ipums n_rdtm pct_rdtm n_diff pct_diff
<int+lbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1 [0 to 4] 1157530 6.59 1166146 6.64 -8616 -0.0471
2 2 [5 to 9] 1215960 6.92 1210189 6.89 5771 0.0348
3 3 [10 to 14] 1147610 6.53 1147415 6.53 195 0.00297
4 4 [15 to 19] 1236800 7.04 1244697 7.08 -7897 -0.0429
5 12 [20 to 24] 1384320 7.88 1387822 7.90 -3502 -0.0177
6 13 [25 to 29] 1475500 8.40 1474150 8.39 1350 0.0101
7 14 [30 to 34] 1294480 7.37 1293637 7.36 843 0.00690
8 15 [35 to 39] 1202670 6.85 1207777 6.87 -5107 -0.0271
9 16 [40 to 44] 1195460 6.80 1198503 6.82 -3043 -0.0154
10 17 [45 to 49] 1162380 6.62 1160763 6.61 1617 0.0111
11 18 [50 to 54] 1186210 6.75 1184954 6.74 1256 0.00907
12 19 [55 to 59] 1047910 5.96 1047779 5.96 131 0.00245
13 20 [60 to 64] 849470 4.84 846915 4.82 2555 0.0159
14 21 [65 to 69] 656700 3.74 653002 3.72 3698 0.0221
15 22 [70 to 74] 518120 2.95 515909 2.94 2211 0.0134
16 23 [75 to 79] 365350 2.08 363589 2.07 1761 0.0106
17 24 [80 to 84] 240640 1.37 239446 1.36 1194 0.00718
18 25 [85+] 231880 1.32 231310 1.32 570 0.00362
Dominican Republic 2002
DIC and DICX parsing leads to the same results.
r
sex n_ipums pct_ipums n_rdtm pct_rdtm n_diff pct_diff
<int+lbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1 [Male] 4270670 49.8 4265215 49.8 5455 -0.0149
2 2 [Female] 4305390 50.2 4297326 50.2 8064 0.0149
r
age2 n_ipums pct_ipums n_rdtm pct_rdtm n_diff pct_diff
<dbl+lbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1 [0 to 4] 977620 11.4 973644 11.4 3976 0.0284
2 2 [5 to 9] 974680 11.4 971881 11.4 2799 0.0147
3 3 [10 to 14] 960410 11.2 959338 11.2 1072 -0.00516
4 4 [15 to 19] 839980 9.79 838239 9.79 1741 0.00487
5 12 [20 to 24] 784940 9.15 785802 9.18 -862 -0.0245
6 13 [25 to 29] 687440 8.02 687785 8.03 -345 -0.0167
7 14 [30 to 34] 649820 7.58 646112 7.55 3708 0.0313
8 15 [35 to 39] 591470 6.90 590750 6.90 720 -0.00248
9 16 [40 to 44] 475180 5.54 476647 5.57 -1467 -0.0259
10 17 [45 to 49] 379460 4.42 380028 4.44 -568 -0.0136
11 18 [50 to 54] 331210 3.86 330713 3.86 497 -0.000293
12 19 [55 to 59] 233910 2.73 233976 2.73 -66 -0.00508
13 20 [60 to 64] 209800 2.45 207933 2.43 1867 0.0179
14 21 [65 to 69] 157940 1.84 158365 1.85 -425 -0.00787
15 22 [70 to 74] 135330 1.58 136068 1.59 -738 -0.0111
16 23 [75 to 79] 77990 0.909 77871 0.909 119 -0.0000460
17 24 [80 to 84] 54960 0.641 54402 0.635 558 0.00550
18 25 [85+] 53660 0.626 52987 0.619 673 0.00687
19 98 [Unknown] 260 0.00303 NA NA NA NA
Ecuador 2010
DIC and DICX parsing leads to the same results.
r
sex n_ipums pct_ipums n_rdtm pct_rdtm n_diff pct_diff
<int+lbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1 [Male] 7178200 49.6 7177683 49.6 517 0.00757
2 2 [Female] 7304130 50.4 7305816 50.4 -1686 -0.00757
r
age2 n_ipums pct_ipums n_rdtm pct_rdtm
<int+lbl> <dbl> <dbl> <int> <dbl>
1 1 [0 to 4] 1459980 10.1 1462277 10.1
2 2 [5 to 9] 1530380 10.6 1526806 10.5
3 3 [10 to 14] 1542170 10.6 1539342 10.6
4 4 [15 to 19] 1421490 9.82 1419537 9.80
5 12 [20 to 24] 1288560 8.90 1292126 8.92
6 13 [25 to 29] 1203030 8.31 1200564 8.29
7 14 [30 to 34] 1066870 7.37 1067289 7.37
8 15 [35 to 39] 941870 6.50 938726 6.48
9 16 [40 to 44] 819470 5.66 819002 5.65
10 17 [45 to 49] 753060 5.20 750141 5.18
11 18 [50 to 54] 608370 4.20 610132 4.21
12 19 [55 to 59] 513910 3.55 515893 3.56
13 20 [60 to 64] 397050 2.74 400759 2.77
14 21 [65 to 69] 322970 2.23 323817 2.24
15 22 [70 to 74] 238280 1.65 240091 1.66
16 23 [75 to 79] 164550 1.14 165218 1.14
17 24 [80 to 84] 115750 0.799 115552 0.798
18 25 [85+] 94570 0.653 96227 0.664
El Salvador 2007
DIC and DICX parsing leads to the same results
r
sex n_ipums pct_ipums n_rdtm pct_rdtm n_diff pct_diff
<int+lbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1 [Male] 2718590 47.3 2719371 47.3 -781 -0.00970
2 2 [Female] 3025050 52.7 3024742 52.7 308 0.00970
r
age2 n_ipums pct_ipums n_rdtm pct_rdtm n_diff pct_diff
<dbl+lbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1 [0 to 4] 556940 9.70 555893 9.68 1047 0.0190
2 2 [5 to 9] 683410 11.9 684727 11.9 -1317 -0.0219
3 3 [10 to 14] 706600 12.3 706347 12.3 253 0.00542
4 4 [15 to 19] 599880 10.4 600565 10.5 -685 -0.0111
5 12 [20 to 24] 485100 8.45 486542 8.47 -1442 -0.0244
6 13 [25 to 29] 459660 8.00 457890 7.97 1770 0.0315
7 14 [30 to 34] 400100 6.97 402249 7.00 -2149 -0.0368
8 15 [35 to 39] 355850 6.20 353147 6.15 2703 0.0476
9 16 [40 to 44] 303920 5.29 303631 5.29 289 0.00547
10 17 [45 to 49] 251620 4.38 252122 4.39 -502 -0.00838
11 18 [50 to 54] 217660 3.79 215734 3.76 1926 0.0338
12 19 [55 to 59] 182920 3.18 183075 3.19 -155 -0.00244
13 20 [60 to 64] 150930 2.63 151864 2.64 -934 -0.0160
14 21 [65 to 69] 125590 2.19 125157 2.18 433 0.00772
15 22 [70 to 74] 97110 1.69 97457 1.70 -347 -0.00590
16 23 [75 to 79] 75400 1.31 75984 1.32 -584 -0.0101
17 24 [80 to 84] 46770 0.814 46870 0.816 -100 -0.00167
18 25 [85+] 44180 0.769 44859 0.781 -679 -0.0118
Peru 2017
DIC and DICX parsing leads to the same results.
r
sex n_ipums pct_ipums n_rdtm pct_rdtm n_diff pct_diff
<int+lbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1 [Male] 13659450 49.7 13622640 49.7 36810 0.0494
2 2 [Female] 13799500 50.3 13789517 50.3 9983 -0.0494
r
age2 n_ipums pct_ipums n_rdtm pct_rdtm n_diff pct_diff
<dbl+lbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1 [0 to 4] 2730740 9.94 2724620 9.94 6120 0.00535
2 2 [5 to 9] 2689060 9.79 2683928 9.79 5132 0.00200
3 3 [10 to 14] 2947320 10.7 2948985 10.8 -1665 -0.0244
4 4 [15 to 19] 2732690 9.95 2730785 9.96 1905 -0.0100
5 12 [20 to 24] 2542550 9.26 2531554 9.24 10996 0.0243
6 13 [25 to 29] 2304810 8.39 2291865 8.36 12945 0.0329
7 14 [30 to 34] 2073730 7.55 2074691 7.57 -961 -0.0164
8 15 [35 to 39] 1880710 6.85 1871852 6.83 8858 0.0206
9 16 [40 to 44] 1653130 6.02 1642059 5.99 11071 0.0301
10 17 [45 to 49] 1374440 5.01 1371385 5.00 3055 0.00260
11 18 [50 to 54] 1152430 4.20 1152647 4.20 -217 -0.00796
12 19 [55 to 59] 885160 3.22 892143 3.25 -6983 -0.0310
13 20 [60 to 64] 731310 2.66 730956 2.67 354 -0.00325
14 21 [65 to 69] 579310 2.11 579302 2.11 8 -0.00357
15 22 [70 to 74] 452130 1.65 452998 1.65 -868 -0.00598
16 23 [75 to 79] 342750 1.25 343999 1.25 -1249 -0.00669
17 24 [80 to 84] 200710 0.731 203636 0.743 -2926 -0.0119
18 25 [85+] 185970 0.677 184752 0.674 1218 0.00329
Uruguay 2011
DIC and DICX parsing leads to the same results.
r
sex n_ipums pct_ipums n_rdtm pct_rdtm n_diff pct_diff
<int+lbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1 [Male] 1577700 48.0 1577416 48.0 284 0.0324
2 2 [Female] 1706550 52.0 1708461 52.0 -1911 -0.0324
r
age2 n_ipums pct_ipums n_rdtm pct_rdtm n_diff pct_diff
<int+lbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 1 [0 to 4] 219860 6.69 220345 6.71 -485 -0.0114
2 2 [5 to 9] 236080 7.19 238068 7.25 -1988 -0.0569
3 3 [10 to 14] 255570 7.78 256552 7.81 -982 -0.0260
4 4 [15 to 19] 263090 8.01 261691 7.96 1399 0.0465
5 12 [20 to 24] 240700 7.33 241006 7.33 -306 -0.00568
6 13 [25 to 29] 229820 7.00 228385 6.95 1435 0.0471
7 14 [30 to 34] 232930 7.09 233365 7.10 -435 -0.00973
8 15 [35 to 39] 221690 6.75 222521 6.77 -831 -0.0219
9 16 [40 to 44] 203740 6.20 203098 6.18 642 0.0226
10 17 [45 to 49] 198160 6.03 198773 6.05 -613 -0.0157
11 18 [50 to 54] 195040 5.94 194565 5.92 475 0.0174
12 19 [55 to 59] 172560 5.25 173007 5.27 -447 -0.0110
13 20 [60 to 64] 151050 4.60 150775 4.59 275 0.0106
14 21 [65 to 69] 132090 4.02 131563 4.00 527 0.0180
15 22 [70 to 74] 111500 3.39 112395 3.42 -895 -0.0256
16 23 [75 to 79] 94810 2.89 93659 2.85 1151 0.0365
17 24 [80 to 84] 69880 2.13 70505 2.15 -625 -0.0180
18 25 [85+] 55680 1.70 55604 1.69 76 0.00315
Code structure
Readers
ByteArrayReader:
- Provides basic file reading and byte manipulation.
- Used by: BitArrayReader, FuzzyEntityParser, Variable.
BitArrayReader:
- Splits binary data into variable-sized chunks.
- Used by: Variable.
FuzzyEntityParser:
- Reads and parses .dic files.
- Depends on: ByteArrayReader.
- Used by: RedatamDatabase.
XMLParser:
- Reads and parses.dicxfiles.
- Depends on: pugixml (src/vendor).
- Used by:RedatamDatabase`.
Entities
Variable:
- Represents a data structure within the system.
- Depends on: ByteArrayReader, BitArrayReader.
- Used by: Entity.
Entity:
- A higher-level representation grouping Variable and other metadata.
- Depends on: Variable.
- Used by: RedatamDatabase.
Exporters
CSVExporter:
- Converts entities to CSV-compatible structures.
- Depends on: Entity, ParentIDCalculator and utils.
- Used by: RedatamDatabase.
XMLExporter:
- Converts entities and their variables into an XML-compatible structure.
- Depends on: Entity, utils, and pugixml (src/vendor).
- Used by: RedatamDatabase.
ParentIDCalculator:
- Calculates parent IDs for a child Entity based on row data.
- Depends on: Entity.
- Used by: CSVExporter.
RListExporter (R package):
- Converts entities to R-compatible structures.
- Depends on: Entity.
- Used by: RedatamDatabase.
PyDictExporter (Python package):
- Converts entities to Python-compatible structures.
- Depends on: Entity.
- Used by: RedatamDatabase.
Database
RedatamDatabase:
- The central orchestrator that manages entities and interacts with parsers and exporters.
- Depends on: Entity, FuzzyEntityParser, XMLParser, RListExporter, PyDictExporter.
Donating
If you find this software useful, please consider donating. You can donate via BuyMeACoffee.
References
De Grande, Pablo. 2016. “El formato Redatam.” Estudios demográficos y urbanos 31 (3): 811–32. Ruggles, Steven, Lara Cleveland, Rodrigo Lovaton, Sula Sarkar, Matthew Sobek, Derek Burk, Dan Ehrlich, Quinn Heimann, and Jane Lee. 2024. “Integrated Public Use Microdata Series (IPUMS).” 2024. https://international.ipums.org/international/.
Credits
Open Redatam was created and is supported by Lital Barkai (barkailital@gmail.com).
The tests, installation instructions and R and Python package were created by Mauricio "Pacha" Vargas Sepulveda (m.sepulveda@mail.utoronto.ca)
The original converter was created by Pablo De Grande. See here for more information.
This project uses pugixml created by Arseny Kapoulkine to structure a part of the output data.
The author wishes to acknowledge the statistical offices that provided the underlying data used for the validation: National Institute of Statistics, Bolivia; National Institute of Statistics, Chile; National Institute of Statistics and Censuses, Ecuador; National Institute of Statistics, Uruguay.
Owner
- Login: litalbarkai
- Kind: user
- Repositories: 1
- Profile: https://github.com/litalbarkai
GitHub Events
Total
- Issues event: 10
- Watch event: 5
- Member event: 1
- Issue comment event: 13
- Push event: 9
- Pull request review event: 1
- Pull request event: 14
- Fork event: 3
- Create event: 2
Last Year
- Issues event: 10
- Watch event: 5
- Member event: 1
- Issue comment event: 13
- Push event: 9
- Pull request review event: 1
- Pull request event: 14
- Fork event: 3
- Create event: 2
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 8
- Total pull requests: 19
- Average time to close issues: 22 days
- Average time to close pull requests: 8 days
- Total issue authors: 3
- Total pull request authors: 3
- Average comments per issue: 1.75
- Average comments per pull request: 0.63
- Merged pull requests: 13
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 8
- Pull requests: 19
- Average time to close issues: 22 days
- Average time to close pull requests: 8 days
- Issue authors: 3
- Pull request authors: 3
- Average comments per issue: 1.75
- Average comments per pull request: 0.63
- Merged pull requests: 13
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- pachadotdev (5)
- nlebovits (2)
- Aloniss (1)
Pull Request Authors
- pachadotdev (10)
- litalbarkai (7)
- nlebovits (2)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
-
Total downloads:
- cran 603 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 5
- Total maintainers: 1
cran.r-project.org: redatam
Import 'REDATAM' Files
- Homepage: https://github.com/litalbarkai/open-redatam
- Documentation: http://cran.r-project.org/web/packages/redatam/redatam.pdf
- License: Apache License (≥ 2)
-
Latest release: 2.1.2
published about 1 year ago