Recent Releases of naniar
naniar - naniar 1.1.0
New
- Implement `impute_fixed`, `impute_zero`, and `impute_factor`. Notably, these do not implement the "scoped variants" that were previously implemented (for example, `impute_fixed_if`). This is in favour of using the new `across` workflow within `dplyr`, and it is easier to maintain. #261
- Add `digit` argument to `miss_var_summary` to help display %missing data correctly when there is a very small fraction of missingness. #284
- Implemented `impute_mode` - resolves #213.
- `geom_miss_point()` works with `shape` argument #290
- Fix bug with `all_complete`, which was implemented as `!anyNA(x)` but should be `all(complete.cases(x))`.
- Correctly implement `any_na()` (and `any_miss()`) and `any_complete()`. Rework examples to demonstrate workflow for finding complete variables.
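The fixed-value imputation described above can be sketched in base R. This is only an illustration, not naniar's implementation: `impute_fixed_sketch()` is a hypothetical stand-in, and the commented `dplyr`/`naniar` call at the end is an assumption about how the `across` workflow would be written.

```r
# Hypothetical stand-in for a fixed-value imputer (not naniar's own code):
# replace every NA in a vector with a single supplied value.
impute_fixed_sketch <- function(x, value) {
  x[is.na(x)] <- value
  x
}

dat <- data.frame(
  num = c(1.5, NA, 3.2),
  int = c(NA, 2L, 3L),
  chr = c("a", NA, "c")
)

# Base-R analogue of "apply to every numeric column":
# find the numeric columns, then impute each one with zero.
is_num <- vapply(dat, is.numeric, logical(1))
imputed <- dat
imputed[is_num] <- lapply(dat[is_num], impute_fixed_sketch, value = 0)

# With dplyr and naniar attached, the same step would presumably read:
# dat %>% mutate(across(where(is.numeric), impute_zero))
```

Only the numeric columns are touched, so the missing value in `chr` is left as `NA`.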
Bug fixes
- Fix bug with `shadow_long` not working when gathering variables of mixed type. Fix involves specifying a value transform, which defaults to character. #314
- Implement `Date`, `POSIXct` and `POSIXlt` methods for `impute_below()` - #158
- Provide `replace_na_with`, a complement to `replace_with_na` - #129
- Fix bug with `gg_miss_fct` where it used a deprecated function from forcats - #342
Misc
- Use `cli::cli_abort` and `cli::cli_warn` instead of `stop` and `warn` (#326)
- Use `expect_snapshot` instead of `expect_error` (#326)
Changes
- Soft deprecated `shadow_shift` - #193
- Soft deprecate `miss_case_cumsum()` and `miss_var_cumsum()` - #257
- R
Published by njtierney almost 2 years ago
naniar - naniar 1.0.0
Version 1.0.0 of naniar signifies that this release accompanies the publication of the associated JSS paper, doi:10.18637/jss.v105.i07. A few small changes have also been implemented in this release, which are described below.
There is still a lot to do in naniar, and this release does not signify that no changes are upcoming. Rather, it establishes that this is a stable release, and that any upcoming changes will go through a more formal deprecation process.
New
- The DOI in the CITATION is for a new JSS publication that will be registered after publication on CRAN.
- Replaced `tidyr::gather` with `tidyr::pivot_longer` - resolves #301
- Added `set_n_miss` and `set_prop_miss` functions - resolved #298
Bug Fixes
- Fix bug in `gg_miss_var()` where a warning appears due to a change in how to remove the legend #288.
Misc
- Removed gdtools from naniar as it is no longer needed #302.
- Added imports, `vctrs` and `cli`, which are both free dependencies, as they are already used within the tidyverse packages that naniar imports.
Published by njtierney about 3 years ago
naniar - "Spur of the lamp post"
naniar 0.6.0 (2020/08/17) "Spur of the lamp post"
- Provide warning for `replace_with_na` when columns are provided that don't exist (see #160). Thank you to michael-dewar for their help with this.
Breaking Changes
- Drop the "nabular" and "shadow" classes (#268) used in `nabular()` and `bind_shadow()`. Doing so removes the functions `as_shadow()`, `is_shadow()`, `is_nabular()`, `new_nabular()`, and `new_shadow()`. These were mostly used internally, and it is not expected that users would have used these functions. If these were used, please file an issue and I can implement them again.
Published by njtierney over 5 years ago
naniar - naniar 0.5.2 (2020/06/28) "Silver Apple"
naniar 0.5.2 (2020/06/28) "Silver Apple"
Minor Changes
- Improvements to code in `miss_var_summary()`, `miss_var_table()`, and `prop_miss_var()`, resulting in a 3-10x speedup.
Published by njtierney over 5 years ago
naniar - "Uncle Andrew's Applewood Wardrobe"
naniar 0.5.1 (2020/04/10) "Uncle Andrew's Applewood Wardrobe"
Minor Changes
- Fixes warnings and errors from `tibble` and subsequent downstream impacts on `simputation`.
Published by njtierney almost 6 years ago
naniar - The End of this Story and the Beginning of all of the Others
naniar 0.5.0 (2020/02/20) "The End of this Story and the Beginning of all of the Others"
Breaking Changes
- The following functions related to calculating the proportion/percentage of missingness were made Defunct and will no longer work:
  - `miss_var_prop()`
  - `complete_var_prop()`
  - `miss_var_pct()`
  - `complete_var_pct()`
  - `miss_case_prop()`
  - `complete_case_prop()`
  - `miss_case_pct()`
  - `complete_case_pct()`

  Instead use: `prop_miss_var()`, `prop_complete_var()`, `pct_miss_var()`, `pct_complete_var()`, `prop_miss_case()`, `prop_complete_case()`, `pct_miss_case()`, `pct_complete_case()`. (see #242)
- `replace_to_na()` was made defunct, please use `replace_with_na()` instead. (see #242)
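To make the renamed summaries concrete, here is a base-R sketch of what the `prop_*` / `pct_*` family is understood to compute, using the built-in `airquality` data. The base-R expressions are assumptions about the semantics (the proportion of variables or cases containing a missing value), not naniar's source code.

```r
# Proportion of variables (columns) that contain at least one NA;
# assumed to correspond to prop_miss_var(airquality).
prop_vars_missing <- mean(colSums(is.na(airquality)) > 0)

# Proportion of cases (rows) that contain at least one NA;
# assumed to correspond to prop_miss_case(airquality).
prop_cases_missing <- mean(!complete.cases(airquality))

# The pct_* variants are taken to be the same quantities times 100,
# and the *_complete_* variants their complements.
pct_cases_missing <- 100 * prop_cases_missing
prop_cases_complete <- 1 - prop_cases_missing
```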
Minor changes
- `miss_var_cumsum` and `miss_case_cumsum` are now exported
- use `map_dfc` instead of `map_df`
- Fix various extra warnings and improve test coverage
Bug Fixes
- Address bug where the number of missings in a row is not calculated properly - see #238 and #232. The solution involved using `rowSums(is.na(x))`, which was 3 times faster.
- Resolve bug in `gg_miss_fct()` where warning is given for non explicit NA values - see #241.
- skip vdiffr tests on github actions
- use `tibble()` not `data_frame()`
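The `rowSums(is.na(x))` approach mentioned in the bug fix above can be illustrated on a small data frame. The data frame `dat` is made up for the example.

```r
dat <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, "z"),
  c = c(TRUE, FALSE, NA)
)

# is.na() on a data frame gives a logical matrix; summing across each
# row counts how many values are missing in that case.
n_miss_per_row <- rowSums(is.na(dat))
```

Because the whole computation is a single vectorised matrix operation, it avoids looping over rows.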
Published by njtierney almost 6 years ago
naniar - The Planting of The Tree
Improvements
- The `geom_miss_point()` ggplot2 layer can now be converted into an interactive web-based version by the `ggplotly()` function in the plotly package. In order for this to work, naniar now exports the `geom2trace.GeomMissPoint()` function (users should never need to call `geom2trace.GeomMissPoint()` directly -- `ggplotly()` calls it for you).
- adds WORDLIST for spelling thanks to `usethis::use_spell_check()`
- fix documentation `@seealso` bug (#228) (@sfirke)
Dependency fixes
Thanks to a PR (#223) from @romainfrancois:
- This fixes two problems that were identified as part of reverse dependency checks of dplyr 0.8.0 release candidate. https://github.com/tidyverse/dplyr/blob/revdepdplyr080_RC/revdep/problems.md#naniar
- `n()` must be imported or prefixed like any other function. In the PR, I've changed `1:n()` to `dplyr::row_number()` as naniar seems to prefix all dplyr functions.
- `update_shadow` was only restoring the class attributes; changed so that it restores all attributes, as this was causing problems when data was a `grouped_df`. This likely was a problem before too, but dplyr 0.8.0 is stricter about what is a grouped data frame.
Published by njtierney about 7 years ago
naniar - naniar 0.4.1 (2018/11/20) "Aslan's Song"
Minor Change
- Fixes to `new_tibble` #220 - Thanks to Kirill Müller.
- Refactoring the capture of arguments from `rlang` #218 - thanks to Lionel Henry.
Published by njtierney about 7 years ago
naniar - An Unexpected Meeting
New Feature
- Add custom label support for missings and not missings with functions `add_label_missings` and `add_label_shadow()` and `add_any_miss()`. So you can now do `add_label_missings(data, missing = "custom_missing_label", complete = "custom_complete_label")`
- `impute_median()` and scoped variants
- `any_shade()` returns a logical TRUE or FALSE depending on if there are any `shade` values
- `nabular()` an alias for `bind_shadow()` to tie the `nabular` term into the work.
- `is_nabular()` checks if input is nabular.
- `geom_miss_point()` now gains the arguments from `shadow_shift()`/`impute_below()` for altering the amount of `jitter` and proportion below (`prop_below`).
- Added two new vignettes, "Exploring Imputed Values", and "Special Missing Values"
- `miss_var_summary` and `miss_case_summary` now no longer provide the cumulative sum of missingness in the summaries - this summary can be added back to the data with the option `add_cumsum = TRUE`. #186
- Added `gg_miss_upset` to replace workflow of: `data %>% as_shadow_upset() %>% UpSetR::upset()`
Major Change
- `recode_shadow` now works! This function allows you to recode your missing values into special missing values. These special missing values are stored in the shadow part of the dataframe, which ends in `_NA`.
- implemented `shade` where appropriate throughout naniar, and also added verifiers, `is_shade`, `are_shade`, `which_are_shade`, and removed `which_are_shadow`.
- `as_shadow` and `bind_shadow` now return data of class `shadow`. This will feed into `recode_shadow` methods for flexibly adding new types of missing data.
- Note that in the future `shadow` might be changed to `nabble` or something similar.
Minor feature
- Functions `add_label_shadow()` and `add_label_missings()` gain arguments so you can only label according to the missingness / shadowy-ness of given variables.
- new function `which_are_shadow()`, to tell you which values are shadows.
- new function `long_shadow()`, which converts data in shadow/nabular form into a long format suitable for plotting. Related to #165
- Added tests for `miss_scan_count`
Minor Changes
- `gg_miss_upset` gets a better default presentation by ordering by the largest intersections, and also an improved error message when data with only 1 or no variables have missing values.
- `shadow_shift` gains a more informative error message when it doesn't know the class.
- Changed `common_na_string` to include escape characters for "?", "*", "." so that if they are used in replacement or searching functions they don't return the wildcard results from the characters "?", "*", and ".".
- `miss_case_table` and `miss_var_table` now have final column names `pct_vars` and `pct_cases` instead of `pct_miss` - fixes #178.
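The reason for escaping these characters can be seen with base R's `grepl()`. This is a generic illustration of regex wildcards with a made-up vector `x`; it does not use naniar's own NA-string list.

```r
x <- c("3.5", "N/A", ".")

# Unescaped, "." is a regex wildcard and matches any non-empty string.
wildcard_hits <- grepl(".", x)

# Escaped (or with fixed = TRUE), it matches only a literal full stop.
literal_hits <- grepl("\\.", x)
fixed_hits <- grepl(".", x, fixed = TRUE)
```

Without the escape, a search for the NA marker "." would flag every non-empty value as missing.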
Breaking Changes
- Deprecated old names of the scalar missingness summaries, in favour of a more consistent syntax #171. The old and new names are:
|oldnames |newnames |
|:--------------------|:--------------------|
|miss_case_pct |pct_miss_case |
|miss_case_prop |prop_miss_case |
|miss_var_pct |pct_miss_var |
|miss_var_prop |prop_miss_var |
|complete_case_pct |pct_complete_case |
|complete_case_prop |prop_complete_case |
|complete_var_pct |pct_complete_var |
|complete_var_prop |prop_complete_var |
These old names will be made defunct in 0.5.0, and removed completely in 0.6.0.
- `impute_below` has changed to be an alias of `shadow_shift` - that is, it operates on a single vector. `impute_below_all` operates on all columns in a dataframe (as specified in #159)
Bug fix
- Ensured that `miss_scan_count` actually `return`'d something.
- `gg_miss_var(airquality)` now prints the ggplot - a typo meant that this did not print the plot
Published by njtierney over 7 years ago
naniar - Strawberry's Adventure
This release is a patch to remove a package imported but not used.
Minor Change
This is a patch release that removes tidyselect from the package Imports, as
it is unnecessary. Fixes #174
naniar_0.3.1.tar.gz
Published by njtierney over 7 years ago
naniar - Digory and his Uncle Are Both in Trouble
New Features
- Added `all_miss()` / `all_na()` equivalent to `all(is.na(x))`
- Added `any_complete()` equivalent to `all(complete.cases(x))`
- Added `any_miss()` equivalent to `anyNA(x)`
- Added `common_na_numbers` and finalised `common_na_strings` - to provide a list of commonly used NA values #168
- Added `miss_var_which`, to list the variable names with missings
- Added `as_shadow_upset` which gets the data into a format suitable for plotting as an `UpSetR` plot:

```r
airquality %>%
  as_shadow_upset() %>%
  UpSetR::upset()
```
Added some imputation functions to assist with exploring missingness structure and visualisation:
- `impute_below` performs as for `shadow_shift`, but performs on all columns. This means that it imputes missing values 10% below the range of the data (powered by `shadow_shift`), to facilitate graphical exploration of the data. Closes #145. There are also scoped variants that work for specific named columns: `impute_below_at`, and for columns that satisfy some predicate function: `impute_below_if`.
- `impute_mean`, imputes the mean value, and scoped variants `impute_mean_at`, and `impute_mean_if`.
- `impute_below` and `shadow_shift` gain arguments `prop_below` and `jitter` to control the degree of shift, and also the extent of jitter.
- Added `complete_{case/var}_{pct/prop}`, which complement `miss_{var/case}_{pct/prop}` #150
- Added `unbind_shadow` and `unbind_data` as helpers to remove shadow columns from data, and data from shadows, respectively.
- Added `is_shadow` and `are_shadow` to determine if something contains a shadow column. Similar to `rlang::is_na` and `rlang::are_na`, `is_shadow` returns a logical vector of length 1, and `are_shadow` returns a logical vector of length of the number of names of a data.frame. This might be revisited at a later point (see `any_shade` in `add_label_shadow`).
- Aesthetics now map as expected in `geom_miss_point()`. This means you can write things like `geom_miss_point(aes(colour = Month))` and it works appropriately. Fixed by Luke Smith in Pull request #144, fixing #137.
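The idea of imputing missing values "10% below the range of the data", as described for `impute_below` above, can be sketched in base R. `shift_below_sketch()` is a hypothetical helper written for this illustration, not naniar's implementation, and it leaves out the jitter that the `jitter` argument is described as controlling.

```r
# Hypothetical helper (not naniar's code): place missing values a fixed
# proportion of the data range below the observed minimum.
shift_below_sketch <- function(x, prop_below = 0.1) {
  x_range <- diff(range(x, na.rm = TRUE))
  x_min <- min(x, na.rm = TRUE)
  x[is.na(x)] <- x_min - prop_below * x_range
  x
}

ozone <- airquality$Ozone
shifted <- shift_below_sketch(ozone)
```

Every formerly missing value now sits below the smallest observed value, so in a scatterplot the missing cases appear as a separate band rather than being dropped.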
Minor Changes
- `miss_var_summary` and `miss_case_summary` now use `order = TRUE` by default, so cases and variables with the most missings are presented in descending order. Fixes #163
- Changes for Visualisation:
  - Changed the default colours used in `gg_miss_case` and `gg_miss_var` to lorikeet purple (from ochRe package: https://github.com/ropenscilabs/ochRe)
  - `gg_miss_case`
    - The y axis label is now ...
    - Default presentation is with `order_cases = TRUE`.
    - Gains a `show_pct` option to be consistent with `gg_miss_var` #153
  - `gg_miss_which` is rotated 90 degrees so it is easier to read variable names
  - `gg_miss_fct` uses a minimal theme and tilts the axis labels #118.
- imported `is_na` and `are_na` from `rlang`.
- Added `common_na_strings`, a list of common `NA` values #168.
- Added some detail on alternative methods for replacing with NA in the vignette "replacing values with NA".
Published by njtierney over 7 years ago
naniar - CRAN 0.1.0 Release "The Founding of naniar"
"The Founding of naniar": the first version on CRAN! The name is taken from Chapter 9 of The Magician's Nephew. Below is the updated NEWS file.
naniar 0.1.0 (2017/08/09) "The Founding of naniar"
=========================
- This is the first release of `naniar` onto CRAN. Updates to `naniar` will happen reasonably regularly after this, approximately every 1-2 months.
naniar 0.0.9.9995 (2017/08/07)
=========================
Name change
- After careful consideration, I have changed back to `naniar`
Major Change
- three new functions: `miss_case_cumsum` / `miss_var_cumsum` / `replace_to_na`
- two new visualisations: `gg_var_cumsum` & `gg_case_cumsum`
New Feature
- `group_by` is now respected by the following functions:
  - `miss_case_cumsum()`
  - `miss_case_summary()`
  - `miss_case_table()`
  - `miss_prop_summary()`
  - `miss_var_cumsum()`
  - `miss_var_run()`
  - `miss_var_span()`
  - `miss_var_summary()`
  - `miss_var_table()`
Minor changes
- Reviewed documentation for all functions and improved wording, grammar, and style.
- Converted roxygen to roxygen markdown
- updated vignettes and readme
- added a new vignette "naniar-visualisation", to give a quick overview of the visualisations provided with naniar.
- changed `label_missing*` to `label_miss` to be more consistent with the rest of naniar
- Add `pct` and `prop` helpers (#78)
- removed `miss_df_pct` - this was literally the same as `pct_miss` or `prop_miss`.
- break larger files into smaller, more manageable files (#83)
- `gg_miss_var` gets a `show_pct` argument to show the percentage of missing values (Thanks Jennifer for the helpful feedback! :))
Minor changes
- `miss_var_summary` & `miss_case_summary` now have consistent output (one was ordered by n_missing, not the other).
- prevent error in `miss_case_pct`
- `enquo_x` is now `x` (as advised by Hadley)
- Now has ByteCompile to TRUE
- add Colin to auth
narnia 0.0.9.9400 (2017/07/24)
=========================
new features
- `replace_to_na` is a complement to `tidyr::replace_na` and replaces a specified value from a variable to NA.
- `gg_miss_fct` returns a heatmap of the number of missings per variable for each level of a factor. This feature was very kindly contributed by Colin Fay.
- `gg_miss_` functions now return a ggplot object, which behave as such. `gg_miss_` basic themes can be overridden with ggplot functions. This fix was very kindly contributed by Colin Fay.
- removed defunct functions as per #63
- made `add_*` functions handle bare unquoted names where appropriate as per #61
- added tests for the `add_*` family
- got the svgs generated from vdiffr, thanks @karawoo!
breaking changes
- changed `geom_missing_point()` to `geom_miss_point()`, to keep consistent with the rest of the functions in `naniar`.
narnia 0.0.8.9100 (2017/06/23)
=========================
new features
- updated datasets `brfss` and `tao` as per #59
narnia 0.0.7.9992 (2017/06/22)
=========================
new features
- `add_label_missings()`
- `add_label_shadow()`
- `cast_shadow()`
- `cast_shadow_shift()`
- `cast_shadow_shift_label()`
- added github issue / contribution / pull request guides
- `ts` generic functions are now `miss_var_span` and `miss_var_run`, and `gg_miss_span` and work on `data.frame`'s, as opposed to just `ts` objects.
- `add_shadow_shift()` adds a column of shadow_shifted values to the current dataframe, adding "_shift" as a suffix
- `cast_shadow()` - acts like `bind_shadow()` but allows for specifying which columns to add
- `shadow_shift` now has a method for factors - powered by `forcats::fct_explicit_na()` #3
bug fixes
- `shadow_shift.numeric` works when there is no variance (#37)
name changes
- changed `is_na` function to `label_na`
- renamed most files to have `tidy-miss-[topic]`
- `gg_missing_*` is changed to `gg_miss_*` to fit with other syntax
Removed functions
- Removed old functions `miss_cat`, `shadow_df` and `shadow_cat`, as they are no longer needed, and have been superseded by `label_missing_2d`, `as_shadow`, and `is_na`.
minor changes
- drastically reduced the size of the pedestrian dataset: it now considers 4 sensor locations, just for 2016.
New features
- New dataset, `pedestrian` - contains hourly counts of pedestrians
- First pass at time series missing data summaries and plots:
  - `miss_ts_run()`: return the number of missings / complete in a single run
  - `miss_ts_summary()`: return the number of missings in a given time period
  - `gg_miss_ts()`: plot the number of missings in a given time period
Name changes
- renamed package from `naniar` to `narnia` - I had to explain the spelling a few times when I was introducing the package and I realised that I should change the name. Fortunately it isn't on CRAN yet.
naniar 0.0.6.9100 (2017/03/21)
=========================
- Added `prop_miss` and the complement `prop_complete`. Where `n_miss` returns the number of missing values, `prop_miss` returns the proportion of missing values. Likewise, `prop_complete` returns the proportion of complete values.
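The relationship between the count and the two proportions can be shown with base R on a single vector. The expressions below are base-R equivalents assumed from the description above, not naniar's own code, and the vector `x` is made up for the example.

```r
x <- c(4, NA, 7, NA, 10)

n_missing <- sum(is.na(x))        # count of missing values
prop_missing <- mean(is.na(x))    # proportion of missing values
prop_complete <- mean(!is.na(x))  # proportion of complete values
```

The two proportions always sum to one, which is what makes one the complement of the other.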
Defunct functions
- As stated in 0.0.5.9000, to address Issue #38, I am moving towards the format `miss_type_value/fun`, because it makes more sense to me when tabbing through functions.
The left hand side functions have been made defunct in favour of the right hand side.
- percent_missing_case() --> miss_case_pct()
- percent_missing_var() --> miss_var_pct()
- percent_missing_df() --> miss_df_pct()
- summary_missing_case() --> miss_case_summary()
- summary_missing_var() --> miss_var_summary()
- table_missing_case() --> miss_case_table()
- table_missing_var() --> miss_var_table()
naniar 0.0.5.9000 (2016/01/08)
=========================
Deprecated functions
- To address Issue #38, I am moving towards the format `miss_type_value/fun`, because it makes more sense to me when tabbing through functions.
  - `miss_*` = I want to explore missing values
  - `miss_case_*` = I want to explore missing cases
  - `miss_case_pct` = I want to find the percentage of cases containing a missing value
  - `miss_case_summary` = I want to find the number / percentage of missings in each case
  - `miss_case_table` = I want a tabulation of the number / percentage of cases missing
This is more consistent and easier to reason with.
Thus, I have renamed the following functions:
- percent_missing_case() --> miss_case_pct()
- percent_missing_var() --> miss_var_pct()
- percent_missing_df() --> miss_df_pct()
- summary_missing_case() --> miss_case_summary()
- summary_missing_var() --> miss_var_summary()
- table_missing_case() --> miss_case_table()
- table_missing_var() --> miss_var_table()
These will be made defunct in the next release, 0.0.6.9000 ("The Wood Between Worlds").
naniar 0.0.4.9000 (2016/12/31)
=========================
New features
- `n_complete` is a complement to `n_miss`, and counts the number of complete values in a vector, matrix, or dataframe.
Bug fixes
- `shadow_shift` now handles cases where there is only 1 complete value in a vector.
Other changes
- added much more comprehensive testing with `testthat`.
naniar 0.0.3.9901 (2016/12/18)
=========================
After a burst of effort on this package I have done some refactoring and thought hard about where this package is going to go. This meant that I had to make the decision to rename the package from ggmissing to naniar. The name may strike you as strange but it reflects the fact that there are many changes happening, and that we will be working on creating a nice utopia (like Narnia by CS Lewis) that helps us make it easier to work with missing data
New Features (under development)
- `add_n_miss` and `add_prop_miss` are helpers that add columns to a dataframe containing the number and proportion of missing values. An example has been provided to use decision trees to explore missing data structure as in Tierney et al
- `geom_miss_point()` now supports transparency, thanks to @seasmith (Luke Smith)
- more shadows. These are mainly around `bind_shadow` and `gather_shadow`, which are helper functions to assist with creating
Bug fixes
- `geom_missing_point()` broke after the new release of ggplot2 2.2.0, but this is now fixed by ensuring that it inherits from GeomPoint, rather than just a new Geom. Thanks to Mitchell O'hara-Wild for his help with this.
- missing data summaries `table_missing_var` and `table_missing_case` also now return more sensible numbers and variable names. It is possible these function names will change in the future, as these are kind of verbose.
- semantic versioning was incorrectly entered in the DESCRIPTION file as 0.2.9000, so I changed it to 0.0.2.9000, and then to 0.0.3.9000 now to indicate the new changes, hopefully this won't come back to bite me later. I think I accidentally did this with visdat at some point as well. Live and learn.
Other changes
- gathered related functions into single R files rather than leaving them in their own.
- correctly imported the `%>%` operator from magrittr, and removed a lot of chaff around `@importFrom` - really don't need to use `@importFrom` that often.
ggmissing 0.0.2.9000 (2016/07/29)
=========================
New Feature (under development)
- `geom_missing_point()` now works in a way that we expect! Thanks to Miles McBain for working out how to get this to work.
ggmissing 0.0.1.9000 (2016/07/29)
=========================
New Feature (under development)
- tidy summaries for missing data:
  - `percent_missing_df` returns the percentage of missing data for a data.frame
  - `percent_missing_var` the percentage of variables that contain missing values
  - `percent_missing_case` the percentage of cases that contain missing values.
  - `table_missing_var` table of missing information for variables
  - `table_missing_case` table of missing information for cases
  - `summary_missing_var` summary of missing information for variables (counts, percentages)
  - `summary_missing_case` summary of missing information for variables (counts, percentages)
- `gg_missing_col`: plot the missingness in each variable
- `gg_missing_row`: plot the missingness in each case
- `gg_missing_which`: plot which columns contain missing data.
Published by njtierney over 8 years ago
naniar - The Wrong Door
naniar 0.0.4.9000 (2016/12/31)
New features
- `n_complete` is a complement to `n_miss`, and counts the number of complete values in a vector, matrix, or dataframe.
Bug fixes
- `shadow_shift` now handles cases where there is only 1 complete value in a vector.
Other changes
- added much more comprehensive testing with `testthat`.
naniar 0.0.3.9901 (2016/12/18)
New features
- `add_n_miss` and `add_prop_miss` are helpers that add columns to a dataframe containing the number and proportion of missing values. An example has been provided to use decision trees to explore missing data structure as in Tierney et al
- `geom_miss_point()` now supports transparency, thanks to @seasmith (Luke Smith)
naniar 0.0.3.9000 (2016/12/18)
After a burst of effort on this package I have done some refactoring and thought hard about where this package is going to go. This meant that I had to make the decision to rename the package from ggmissing to naniar. The name may strike you as strange but it reflects the fact that there are many changes happening, and that we will be working on creating a nice utopia (like Narnia by CS Lewis) that helps us make it easier to work with missing data
New Features (under development)
- more shadows. These are mainly around `bind_shadow` and `gather_shadow`, which are helper functions to assist with creating
Bug fixes
- `geom_missing_point()` broke after the new release of ggplot2 2.2.0, but this is now fixed by ensuring that it inherits from GeomPoint, rather than just a new Geom. Thanks to Mitchell O'hara-Wild for his help with this.
- missing data summaries `table_missing_var` and `table_missing_case` also now return more sensible numbers and variable names. It is possible these function names will change in the future, as these are kind of verbose.
- semantic versioning was incorrectly entered in the DESCRIPTION file as 0.2.9000, so I changed it to 0.0.2.9000, and then to 0.0.3.9000 now to indicate the new changes, hopefully this won't come back to bite me later. I think I accidentally did this with visdat at some point as well. Live and learn.
Other changes
- gathered related functions into single R files rather than leaving them in their own.
- correctly imported the `%>%` operator from magrittr, and removed a lot of chaff around `@importFrom` - really don't need to use `@importFrom` that often.
ggmissing 0.0.2.9000 (2016/07/29)
New Feature (under development)
- `geom_missing_point()` now works in a way that we expect! Thanks to Miles McBain for working out how to get this to work.
ggmissing 0.0.1.9000 (2016/07/29)
New Feature (under development)
- tidy summaries for missing data:
  - `percent_missing_df` returns the percentage of missing data for a data.frame
  - `percent_missing_var` the percentage of variables that contain missing values
  - `percent_missing_case` the percentage of cases that contain missing values.
  - `table_missing_var` table of missing information for variables
  - `table_missing_case` table of missing information for cases
  - `summary_missing_var` summary of missing information for variables (counts, percentages)
  - `summary_missing_case` summary of missing information for variables (counts, percentages)
- `gg_missing_col`: plot the missingness in each variable
- `gg_missing_row`: plot the missingness in each case
- `gg_missing_which`: plot which columns contain missing data.
Published by njtierney about 9 years ago