A Stata package for Sankey diagrams
Installation | Syntax | Citation guidelines | Examples | Feedback | Change log
sankey v1.9
(24 Jun 2025)
This package allows users to draw Sankey plots in Stata. It is based on the Sankey Guide published on the Stata Guide on Medium in October 2021.
Installation
The package can be installed via SSC or GitHub. The GitHub version might be more recent due to bug fixes and feature updates, and may contain syntax improvements and changes in default values. See the version numbers below. The GitHub version is eventually published on SSC.
SSC (v1.81):
ssc install sankey, replace
GitHub (v1.9):
net install sankey, from("https://raw.githubusercontent.com/asjadnaqvi/stata-sankey/main/installation/") replace
The palettes, colrspace, and graphfunctions packages are required to run this command:
ssc install palettes, replace
ssc install colrspace, replace
ssc install graphfunctions, replace
Even if you have these packages installed, please check for updates: ado update, update.
If you want to make a clean figure, then it is advisable to load a clean scheme. There are several available, and I personally use the following:
ssc install schemepack, replace
set scheme white_tableau
You can also push the scheme directly into the graph using the scheme(schemename) option. See the help file for details or the example below.
I also prefer narrow fonts in figures with long labels. You can change this as follows:
graph set window fontface "Arial Narrow"
Syntax
The syntax for the latest version is as follows:
sankey value [if] [in] [weight], from(var) to(var)
[ by(var) palette(str) colorby(layer|level) colorvar(var) stock stock2 colorvarmiss(str) colorboxmiss(str)
smooth(1-8) gap(num) recenter(mid|bot|top) ctitles(list) ctgap(num) ctsize(num) ctposition(bot|top)
ctcolor(str) ctwrap(num) labangle(str) labsize(str) labposition(str) labgap(str) showtotal labprop labscale(num)
valsize(str) valcondition(num) format(str) valgap(str) novalues valprop valscale(num)
novalright novalleft nolabels sort1(value|name[, reverse]) sort2(value|order[, reverse]) align fill
lwidth(str) lcolor(str) alpha(num) offset(num) boxwidth(str) percent wrap(num) * ]
See the help file help sankey for details.
The most basic use is as follows:
sankey value, from(var1) to(var2) [by(level)]
where var1 and var2 are source and destination variables respectively against which the value variable is plotted. The by() variable defines the levels and is optional since v1.72.
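As a minimal sketch of the data layout this syntax expects, the block below builds a small toy dataset with one row per flow and plots it. The node names, values, and layer numbering are made up for illustration and are not part of the package's example data.

```stata
* Toy flow data (hypothetical values, for illustration only)
clear
input str1 source str1 destination value layer
"A" "C" 10 0
"A" "D"  5 0
"B" "C"  3 0
"B" "D"  7 0
"C" "E" 13 1
"D" "E" 12 1
end

* One row per flow: from() is the origin node, to() is the target node,
* and by() identifies the layer the flow belongs to
sankey value, from(source) to(destination) by(layer)
```

Here the flows into C and D in layer 0 equal the flows out of them in layer 1, so the diagram is balanced.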
Citation guidelines
Software packages take countless hours of programming, testing, and bug fixing. If you use this package, then a citation would be highly appreciated.
The SSC citation is recommended. Please note that the GitHub version might be newer than the SSC version.
Examples
Get the example data from GitHub:
import excel using "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey_example2.xlsx?raw=true", clear first
Let's test the sankey command:
sankey value, from(source) to(destination) by(layer)

Smooth
sankey value, from(source) to(destination) by(layer) smooth(2)

sankey value, from(source) to(destination) by(layer) smooth(8)

Re-center
sankey value, from(source) to(destination) by(layer) recenter(bot)

sankey value, from(source) to(destination) by(layer) recenter(top)

Gaps
sankey value, from(source) to(destination) by(layer) gap(0)

sankey value, from(source) to(destination) by(layer) gap(20)

Values
sankey value, from(source) to(destination) by(layer) noval showtot

Sort (v1.6)
sankey value, from(source) to(destination) by(layer) sort1(name)

sankey value, from(source) to(destination) by(layer) sort1(value)

sankey value, from(source) to(destination) by(layer) sort1(value) sort2(value)

sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(value)

sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(value, reverse)

sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(order)

sankey value, from(source) to(destination) by(layer) sort1(name, reverse) sort2(order, reverse)

Custom sorting on a value:
```stata
gen source2 = .
gen destination2 = .

foreach x in source destination {
    replace `x'2 = 1 if `x'=="Blog"
    replace `x'2 = 2 if `x'=="LinkedIn"
    replace `x'2 = 3 if `x'=="Twitter"
    replace `x'2 = 4 if `x'=="Direct"
    replace `x'2 = 5 if `x'=="App"
    replace `x'2 = 6 if `x'=="Medium"
    replace `x'2 = 7 if `x'=="Website"
    replace `x'2 = 8 if `x'=="Homepage"
    replace `x'2 = 9 if `x'=="Total"
    replace `x'2 = 10 if `x'=="Google"
    replace `x'2 = 11 if `x'=="Facebook"
}

lab de labels 1 "Blog" 2 "LinkedIn" 3 "Twitter" 4 "Direct" 5 "App" 6 "Medium" 7 "Website" 8 "Homepage" 9 "Total" 10 "Google" 11 "Facebook", replace

lab val source2 labels
lab val destination2 labels

sankey value, from(source2) to(destination2) by(layer)
```

boxwidth
sankey value, from(source) to(destination) by(layer) boxwid(5)

valcond
sankey value, from(source) to(destination) by(layer) valcond(200)

sankey value, from(source) to(destination) by(layer) valcond(300)

Palettes
sankey value, from(source) to(destination) by(layer) palette(CET C6)

sankey value, from(source) to(destination) by(layer) colorby(level)

color by variable (v1.4)
```
gen trace1 = 1 if source=="App"

sankey value, from(source) to(destination) by(layer) colorvar(trace1)
```

```
cap drop trace2
gen trace2 = .
replace trace2 = 1 if source=="App" & destination=="App" & layer==0
replace trace2 = 2 if source=="App" & destination=="App" & layer==1
replace trace2 = 3 if source=="App" & destination=="App" & layer==2
replace trace2 = 4 if source=="App" & destination=="Total" & layer==3

sankey value, from(source) to(destination) by(layer) colorvar(trace2)
```

sankey value, from(source) to(destination) by(layer) colorvar(trace2) palette(Oranges)

sankey value, from(source) to(destination) by(layer) colorvar(trace2) palette(Blues) ///
colorvarmiss(gs13) colorboxmiss(gs13)

sankey value, from(source) to(destination) by(layer) colorvar(trace2) ///
palette(blue*0.1 blue*0.3 blue*0.5 blue*0.7) colorvarmiss(gs13) colorboxmiss(gs13)

column titles (v1.4)
sankey value, from(source) to(destination) by(layer) ctitles(Cat1 Cat2 Cat3 Cat4 Cat5)

sankey value, from(source) to(destination) by(layer) ctitles(Cat1 Cat2 Cat3 Cat4 Cat5) ctg(-100)

sankey value, from(source) to(destination) by(layer) ctitles("Cat 1" "Cat 2" "Cat 3" "Cat 4" "Cat 5") ctg(-100)

sankey value, from(source) to(destination) by(layer) ctitles("Cat 1" "Cat 2" "Cat 3" "Cat 4" "Cat 5") ctpos(top) ctg(100) recenter(top)

label rotation and offset
sankey value, from(source) to(destination) by(layer) noval showtot palette(CET C6) ///
laba(0) labpos(3) labg(-1) offset(10)

hide values and labels (v1.5)
sankey value, from(source) to(destination) by(layer) novalleft

sankey value, from(source) to(destination) by(layer) novalright

sankey value, from(source) to(destination) by(layer) noval

sankey value, from(source) to(destination) by(layer) nolabels

proportional values and labels (v1.5)
sankey value, from(source) to(destination) by(layer) valprop vals(2)

sankey value, from(source) to(destination) by(layer) labprop labs(2)


All together
sankey value, from(source) to(destination) by(layer) palette(CET C6) alpha(60) ///
labs(2.5) laba(0) labpos(3) labg(-1) offset(5) noval showtot ///
ctitles("Cat 1" "Cat 2" "Cat 3" "Cat 4" "Cat 5") ctg(-100) cts(3) ///
title("My sankey plot", size(6)) note("Made with the #sankey package.", size(2.2)) ///
xsize(2) ysize(1)

stocks (v1.6+)
import excel using "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey_stocks.xlsx?raw=true", clear first
sankey value, from(source) to(destination) by(layer) xsize(2) ysize(1) showtotal
sankey value, from(source) to(destination) by(layer) xsize(2) ysize(1) showtotal stock
sankey value, from(source) to(destination) by(layer) xsize(2) ysize(1) showtotal stock2

v1.9
Load trade data by regions:
use "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/trade_sankey_example.dta?raw=true", clear
Generate the default Sankey:
sankey value, from(ex_region) to(im_region)

Add better styling using the new options in v1.9:
sankey value, from(ex_region) to(im_region) ///
format(%15.1fc) labprop smooth(8) palette(HCL intense) sort1(value) sort2(value) ///
labs(2.4) laba(0) labpos(9 3) labg(2) gap(5) noval showtot lw(none) ///
title("{fontface Merriweather Bold:Global trade in 2022 (USD millions)}", size(4)) ///
note("Source: COMTRADE BACI HS07 2022.", size(2)) ///
plotregion(margin(l+16 r+16 b+5)) ///
ctitle("{bf:Exporting region}" "{bf:Importing region}") ctwrap(8) ctgap(5) ///
xsize(2) ysize(1)

Feedback
Please open an issue to report errors, feature enhancements, and/or other requests.
Change log
v1.9 (24 Jun 2025)
- Option ctwrap() added to wrap title labels.
- Option ctgap() now takes on values based on percentage of total height. This makes it easier to relatively displace the title labels.
- Option labpos() now accepts lists of positions for each layer.
- X-axis was sometimes adding additional space due to some internal tolerance limit. This has been fixed.
- Minor bug fixes.
v1.81 (16 Oct 2024)
- Weights are now allowed. It is still advisable to prepare the data beforehand.
- wrap() now requires graphfunctions for label wrapping that respects word boundaries.
- Option stock2 added that collapses stocks on the right (incoming) and removes own flows. In contrast, stock collapses stocks on the left (out-going).
- Various code fixes should remove additional small bugs.
v1.8 (22 Sep 2024)
- Added option align to align flows. Works only if there is just one parent (still beta).
- Added option fill to extrapolate missing flows. Works only if there is just one parent (still beta).
- Added option n() to allow users to increase the number of points for generating the arcs. Default is 30.
- Quite a large code clean up so the command should run a bit faster.
v1.74 (11 Jun 2024)
- Added wrap() option for wrapping labels.
- Minor code cleanups.
v1.73 (16 Mar 2024)
- If the from() and to() variables have value labels, then the order of the value labels is respected. This allows the users to have full control of the order of the drawing of the layers through value labels (requested by Katie Naylor + others).
- The command now throws an error if from() and to() have different format types. Both have to be either string or numeric variables. This check was necessary to implement the above change.
- Minor code cleanups.
v1.72 (12 Feb 2024)
- Fixed labprop incorrectly calculating the label sizes.
- valcond() now passes on to box labels. Was removed but has been put back in.
- by() changed to optional. Assumes one layer if not specified. This is mostly a quality of life improvement. A warning message is displayed to ensure that by() is not left out by mistake.
- ctsize() converted to string to allow size names.
- ctcolor() added.
- Help file improved.
- Minor code cleanups.
v1.71 (15 Jan 2024)
- Fixed a bug where numerical from() and to() variables with value labels were messing up the labels in the final figure (reported by Ian White).
v1.7 (06 Nov 2023)
- Fixed valcond() dropping bar values.
- Fixed ctitles() getting random colors. It now defaults to black.
- Added ctpos() option to change column title position.
- Added percent option, which is still beta. It converts flows to percent values.
v1.61 (22 Jul 2023)
- saving() option added (requested by Anirban Basu).
- Minor fixes.
v1.6 (11 Jun 2023)
- Complete rewrite of the base routines. The code is 30% smaller but several times faster.
- The option sortby() split into sort1() and sort2() for clarity.
- Added support for numerical variables with value labels.
- Option stock added to collapse own flows (source = destination) to box heights (requested by Oras Alabas).
- Several code optimizations and minor bug fixes.
v1.51 (25 May 2023)
- Added background checks for the from() and to() variables. This ensures that the code runs regardless of the variable types. Ideally both should be strings.
v1.5 (30 Apr 2023)
- Added labprop, titleprop, and labscale() for scaling values and labels.
- Added novalright, novalleft, nolabels options.
- Added sortby(., reverse) option.
- Help file improved in its layout.
v1.4 (23 Apr 2023)
- Fixed major bugs with unbalanced panels.
- Added column title options.
- Added option to draw colors by variables.
- Several bug fixes and improvements to the code.
v1.31 (04 Apr 2023)
- Fixed the color of categories. Previous version was resulting in wrong color assignments.
v1.3 (26 Feb 2023)
- Node bundling added, which aligns nodes in front of each other. This looks better, especially if flows are passing through certain nodes.
- Option sortby() added that allows alphabetical sorting (sortby(name)) or numerical sorting (sortby(value)) (thanks to Fabian Unterlass for detailed feedback).
- Option boxwidth() added to allow adjusting the width of node boxes.
v1.21 (15 Feb 2023)
- valcond() fixed.
- Error in gaps fixed.
v1.2 (02 Feb 2023)
- Unbalanced Sankeys are now allowed. This means that incoming and outgoing layers do not necessarily have to be equal. Outgoing can be larger than incoming.
- A category can now also start in the middle.
- Various bug fixes.
v1.1 (13 Dec 2022)
- Option valformat() renamed to just format(). This aligns it with standard Stata usage.
- A new option offset() added to displace x-axis on the right-hand side. Offset is given in percentage share of x-axis range. This allows rotated labels to be displaced properly.
- Checks for missing bilateral flow combinations. Hitting a non-flow combo was causing the code to crash.
v1.0 (08 Dec 2022)
- Public release.
Owner
- Name: Asjad Naqvi
- Login: asjadnaqvi
- Kind: user
- Location: Vienna
- Company: WIFO
- Website: https://asjadnaqvi.github.io/
- Twitter: AsjadNaqvi
- Repositories: 52
- Profile: https://github.com/asjadnaqvi
Vienna, Austria