Recent Releases of RSMTool
RSMTool - RSMTool 12.0.0
What's Changed
- Python 3.8 and 3.9 are no longer supported since the SKLL dependency was updated to v5.0.1.
- Remove rsmextra and special sections by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/676
- Add type hints (part 1) by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/677
- Add type hints (Part 2) by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/679
- Add type hints (Part 3) by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/681
- Add Type Hints (Part 4) by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/682
- Add Type Hints (Final Part) by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/683
- Fix pandas-related warnings by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/684
- Separate runtime and dev dependencies by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/685
Full Changelog: https://github.com/EducationalTestingService/rsmtool/compare/v11.3.0...v12.0.0
Scientific Software - Peer-reviewed
- Python
Published by desilinguist about 2 years ago
RSMTool - v11.3.0
New features
- Add section ordering for `rsmexplain` by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/667
- Update intermediate files notebook section for readability by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/670
Bugfixes & Improvements
- Update SHAP to the latest version 0.44.0 by @damien2012eng in https://github.com/EducationalTestingService/rsmtool/pull/664
- Unpin dependencies and fix minor issues by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/666
- Fix new warnings & remove manual suppressions by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/668
- Refactor CLI tests to modernize codecov by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/669
- Pin numpy to < 2 by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/673
Contributions & Code Reviews
@damien2012eng @desilinguist @Frost45 @mulhod @tamarl08
Full Changelog: https://github.com/EducationalTestingService/rsmtool/compare/v11.2.0...v11.3.0
Published by desilinguist over 2 years ago
RSMTool - RSMTool 11.2.0
New features
- Add sections for W&B logging to make information easier to find by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/659
- Add configuration option to disable truncation of outliers by @damien2012eng in https://github.com/EducationalTestingService/rsmtool/pull/661
Bugfixes & Improvements
- Fix bugs when processing cross-validation folds in `rsmxval` by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/662
Contributions & Code Reviews
@damien2012eng @desilinguist @mulhod @tamarl08
Full Changelog: https://github.com/EducationalTestingService/rsmtool/compare/v11.1.1...v11.2.0
Published by tamarl08 over 2 years ago
RSMTool - RSMTool 11.1.1
New features
- Add a new human-human confusion matrix for double-scored data by @mulhod in https://github.com/EducationalTestingService/rsmtool/pull/649
- Allow parallelization of grid search when using SKLL models in `rsmtool` by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/650
Bugfixes & Improvements
- Update pre-commit checks by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/647
- Enhance wandb logging of evaluation metrics by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/651
- Fix warnings in reports by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/654
Contributions & Code Reviews
@damien2012eng @desilinguist @mulhod @tamarl08 @tazin-afrin
Full Changelog: v11.0.1...v11.1.1
Published by tamarl08 over 2 years ago
RSMTool - RSMTool 11.0.1
New features
- New rsmexplain plots by @damien2012eng in https://github.com/EducationalTestingService/rsmtool/pull/603
- Full W&B integration to allow logging of output artifacts and report by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/617, https://github.com/EducationalTestingService/rsmtool/pull/620, https://github.com/EducationalTestingService/rsmtool/pull/621, https://github.com/EducationalTestingService/rsmtool/pull/623, https://github.com/EducationalTestingService/rsmtool/pull/627
- Add FAQ page to documentation by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/622
- Add support for Python 3.11 by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/628
- Add support for output files when auto-generating configurations by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/640
- Enhancements to fast_predict by @mulhod in https://github.com/EducationalTestingService/rsmtool/pull/632
- NOTE: The `.model` files produced by `rsmtool` are no longer SKLL model files. They are serialized `rsmtool.Modeler` objects. This change should be transparent to users who only use the `.model` files with `rsmpredict` and `rsmexplain`. However, if those files are used outside of RSMTool and expected to contain SKLL learners, users now need to load the `.model` file produced by `rsmtool` with the `Modeler.load_from_file()` method and then access the SKLL learner via the `.learner` attribute.
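For code that previously read the SKLL learner straight from the `.model` file, the change described in the note might look like the minimal sketch below. The helper name `load_skll_learner` is hypothetical and the import path `rsmtool.modeler` is an assumption; only `Modeler.load_from_file()` and the `.learner` attribute come from the note itself.

```python
def load_skll_learner(model_path):
    """Return the SKLL learner stored inside an RSMTool ``.model`` file.

    Hypothetical helper: it only wraps the two steps named in the note.
    """
    # Imported lazily so that defining this helper does not require RSMTool.
    # The module path is an assumption; adjust it to your installed version.
    from rsmtool.modeler import Modeler

    modeler = Modeler.load_from_file(model_path)  # deserialize the Modeler object
    return modeler.learner  # the underlying SKLL learner
```

A call such as `load_skll_learner("my_experiment.model")` (file name invented for illustration) would then stand in for wherever the file was previously loaded directly as a SKLL model.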
Bugfixes & Improvements
- Migrate nose to nose2 by @damien2012eng in https://github.com/EducationalTestingService/rsmtool/pull/610
- Upgrade `shap` by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/612
- Use example IDs when specifying `sample_ids` by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/613
- Expect `scale_with` value of 'raw' in rsmeval by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/614
- Fix `update_files` for `nose2`. by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/616
- Fix bug in wording of what will be highlighted for disattenuated correlation by @mulhod in https://github.com/EducationalTestingService/rsmtool/pull/594
- Pin skll version in doc requirements by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/619
- Remove unnecessary warnings in HTML reports. by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/624
- Include system information in RSMExplain reports by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/633
- Suppress alt text warnings when generating reports by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/634
- Fix W&B tests and add to CI builds. by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/637
- Switch to `ruff` for pre-commit checks by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/639
- Fix test dir usage in test_wandb by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/642
Contributions & Code Reviews
- @tamarl08
- @mulhod
- @damien2012eng
- @dblandan
- @Frost45
- @blongwill
Full Changelog: https://github.com/EducationalTestingService/rsmtool/compare/v10.0.0...v11.0.1
Published by tamarl08 almost 3 years ago
RSMTool - RSMTool 10.0.0
This is a major new release! It includes new functionality as well as updated dependencies.
New features
Dependencies
- Shap is now a required dependency. It is currently pinned to 0.41.0 but we plan to keep RSMTool updated with the latest SHAP versions as they are released.
- Numpy has been pinned to <= 1.23.5 since SHAP 0.41.0 does not work with numpy 1.24.x.
RSMExplain
- Added new command-line utility `rsmexplain` to generate an explanation report for an existing rsmtool experiment. Under the hood, `rsmexplain` leverages SHapley Additive exPlanations produced by `shap`.
- Added comprehensive documentation on how to run `rsmexplain`.
- Added support for automated and interactive configuration generation for `rsmexplain`.
- Added comprehensive functional tests for `rsmexplain`.
More reliable notebook merging
- Updated `rsmtool.reporter.merge_notebooks()` to use `nbconvert` and `nbformat` APIs instead of the JSON-based hack that was being used before.
Bugfixes & Improvements
- Use `legend_handles` instead of the deprecated `legendHandles` attribute for matplotlib to avoid deprecation warnings in notebooks.
- Minor documentation fixes in various places.
Contributions from: @damien2012eng, @desilinguist, @tamarl08, @dblandan, and @mulhod!
Published by damien2012eng almost 3 years ago
RSMTool - RSMTool 9.1.1
What's Changed
- Remove `np.warnings` from `fairness_utils.py`. by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/580
- Add new `fast_predict()` API method for prediction by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/581
- Convert all formatted strings to f-strings and add pre-commit with flynt by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/584
- Integrate black and run on all files by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/586
- Add isort, pydocstyle, flake8 as pre-commit checks by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/587
- Restore and increase test coverage by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/589
- Update contributing docs & remove extraneous whitespace by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/590
- Update SKLL to v3.2.0 by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/591
- Release v9.1.1 by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/592
Full Changelog: https://github.com/EducationalTestingService/rsmtool/compare/v9.0.1...v9.1.1
Published by desilinguist over 3 years ago
RSMTool - v9.0.1
What's Changed
This is a minor bugfix release.
- Delete the `stable` branch by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/573
- Disallow negative confidence intervals in fairness plots since they cause new versions of `pandas` to break by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/574
- Add workaround for broken SVGs in `nbconvert` by overriding `clean_html` by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/575
- Fix bug for integer IDs when using `rsmxval` by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/577
- Update SKLL dependency to v3.1.0 by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/578
Full Changelog: https://github.com/EducationalTestingService/rsmtool/compare/v9.0.0...v9.0.1
Published by desilinguist over 3 years ago
RSMTool - RSMTool 9.0
This is a major new release. It includes new functionality and breaking changes to the API as well as to dependencies.
RSMTool 9.0 is incompatible with previous versions
New features
Dependencies
RSMTool is now compatible with SKLL v3.0 and, therefore, scikit-learn v1.0.2.
RSMTool now supports Python 3.10, in addition to 3.8 and 3.9. Python 3.7 is no longer supported.
tqdm is now a required dependency.
Native cross-validation support
Add native support for cross-validation experiments to RSMTool. Using a single train-test split may lead to biased estimates of performance since those estimates will depend on the specific characteristics of that split. However, using cross-validation instead can provide more accurate estimates of scoring model performance since those estimates are averaged over multiple train-test splits that are randomly selected based on the data.
Add new command-line utility `rsmxval` to run cross-validation experiments. Under the hood, it leverages the RSMTool API functions `run_experiment()`, `run_evaluation()`, and `run_summary()` to generate multiple useful reports for the users.
Add support for automated configuration generation to `rsmxval` in both batch and interactive mode.
Add comprehensive documentation on how to run cross-validation experiments.
Add comprehensive functional tests for cross-validation.
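The motivation for cross-validation given above can be illustrated with a small, self-contained sketch. This is not how `rsmxval` is implemented: the function names, the single-feature least-squares "scoring model", and the toy data are all hypothetical, and only the idea of averaging performance over several train-test splits is taken from the release notes.

```python
import random
from statistics import mean


def kfold_indices(n_items, n_folds, seed=0):
    """Shuffle item indices and deal them into n_folds disjoint test folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def cross_validated_mse(xs, ys, n_folds=5, seed=0):
    """Return the per-fold test MSEs and their average."""
    fold_mses = []
    for test_idx in kfold_indices(len(xs), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        # Train on everything except the held-out fold.
        slope, intercept = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        # Evaluate only on the held-out fold.
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        fold_mses.append(mean(errors))
    return fold_mses, mean(fold_mses)


# Toy data: one feature and a noisy "human score".
rng = random.Random(42)
features = [float(i) for i in range(50)]
scores = [0.1 * x + 1.0 + rng.gauss(0, 0.5) for x in features]
per_fold, averaged = cross_validated_mse(features, scores)
```

The per-fold values in `per_fold` typically differ from one another, which is the dependence on a specific split that the paragraph above warns about; `averaged` is the estimate that smooths it out.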
API Changes
Add two new logging functions in `rsmtool.utils.logging`. These are only meant to be used by RSMTool developers, not users.
Factor out the code that was used to write a dataframe to disk into a separate utility method `DataWriter.write_frame_to_disk()` so that it can also be used by `rsmxval`. This can prove useful to advanced RSMTool users as well.
Add new cross-validation specific utility functions to `rsmtool.utils.cross_validation`.
Convert several class or static methods in various classes to instance methods in order to allow for passing and using an optional logger instance.
Tweak the `check_scaled_coefficients()` test utility function to take the output directory as an argument instead of taking an experiment name to allow its usage for `rsmxval` functional tests.
Bugfixes & Improvements
Fix the behavior of the `use_thumbnails` option in RSMTool configuration files. It was generating both the thumbnail as well as the full-sized figure due to the behavior of Matplotlib's `savefig()`. The solution was to turn off interactive plotting in all header notebooks.
Replace deprecated methods and keywords in RSMTool code as recommended by the latest versions of pandas, numpy, and scikit-learn.
Fix several duplicate target warnings when compiling the documentation. Make sure included RST files have an extension of `.rst.inc` so that they are not compiled twice. Turn all web links into anonymous references so that there are no conflicts with the same target names.
Make feature boxplots for subgroups in reports more flexible in terms of the number of features. Specifically, if the experiment has more than 150 features, no boxplots are shown. Previously this limit was 30. In addition, the message that the boxplots have been omitted is displayed more prominently when it happens. Finally, if the number of features is > 30 but <= 150, a new message asking the user to enable thumbnails is shown.
Update Gitlab CI plan to use Python 3.8 and Azure Pipelines to use Python 3.10. Add new cross-validation tests to both CI plans.
Published by desilinguist about 4 years ago
RSMTool - RSMTool 8.1.2
This is a bugfix release.
- Update the code for compatibility with `pandas 1.3.0` and `scikit-learn 0.24.2`.
Published by aloukina almost 5 years ago
RSMTool - RSMTool 8.1.1
This is a bugfix release with some minor improvements.
Continuous integration build for RSMTool migrated from Travis CI to Gitlab CI.
Minor bug fixed in `parse_json_with_comments` to handle URLs correctly.
Minor updates to warnings and documentation.
Published by Frost45 almost 5 years ago
RSMTool - RSMTool 8.1.0
This is a minor but backwards-incompatible release which includes changes necessary to make RSMTool compatible with SKLL v2.5.
What's new
- RSMTool is now compatible with SKLL 2.5!
Breaking Changes
Python 3.6 is no longer officially supported since the latest versions of `pandas` and `numpy` have dropped support for it. RSMTool officially supports Python 3.7, 3.8, and 3.9.
RSMTool no longer supports `.xls` files. For users who use Excel to prepare their data, we continue supporting `xlsx` files.
Models trained with earlier versions of RSMTool can no longer be used to generate predictions. If you use `rsmpredict` or `compute_and_save_predictions` to generate predictions based on existing models, you will need to re-train the models.
Published by aloukina about 5 years ago
RSMTool - RSMTool 8.0.2
This is a bugfix release with some minor improvements.
The version of `nbconvert` used by RSMTool is now pinned to `<6.0` due to a change in v6.0 and above that broke RSMTool report generation. We will remove the pin in a future release when the upstream issue is fixed.
RSMTool reports no longer display a pie chart for the model coefficients if any of the coefficients are negative.
Minor updates for compatibility with external packages.
Minor updates to warnings and documentation.
Published by aloukina over 5 years ago
RSMTool - RSMTool 8.0.1
This is a bugfix release with some minor improvements.
Update the code for compatibility with `pandas` 1.1.0.
`prmse_true` no longer raises an error if there are no double-scored responses. Instead the function displays a warning and returns None.
Command line tools `rsmtool`, `rsmeval`, `rsmpredict`, `rsmcompare` and `rsmsummarize` no longer raise an error if a user does not provide any command line arguments. Instead the tools display the help message.
Minor updates to documentation.
Improvements to the testing and coverage measurement process.
Published by aloukina almost 6 years ago
RSMTool - RSMTool 8.0
This is a major new release. It includes a lot of new functionality and multiple changes to the API.
RSMTool 8.0 is backwards incompatible with previous versions
New features
Dependencies
RSMTool is now compatible with SKLL v2.1
All dependencies other than `skll` are now unpinned.
RSMTool now supports Python versions 3.6, 3.7 and 3.8.
Interactive generation of configuration files
- Configuration files for `rsmtool`, `rsmeval`, `rsmpredict`, `rsmcompare` and `rsmsummarize` can now be generated automatically, either interactively or non-interactively. This exciting new functionality makes it easier to keep track of the many configuration options available in RSMTool and greatly simplifies the process of setting up the experiment. Watch the video demonstrating the new interactive generation or read the documentation.
Passing hyperparameters to SKLL models
- It is now possible to pass custom hyperparameter values to `skll` learners used through RSMTool. This is done using a new configuration field `skll_fixed_parameters`. The parameters are also displayed in the report.
Generalized version of PRMSE
The formula for PRMSE has been updated to a more general version derived by Matthew S. Johnson that allows computation of PRMSE for any number of raters. For two raters, the formula returns the same result as the formula used in previous versions of the tool.
The API now provides a new function `prmse_true()` which accepts scikit-learn style parameters and returns the PRMSE value.
It is now possible to supply the error variance of human raters necessary to compute PRMSE. This can be useful when the experiments require computing this parameter on data other than the evaluation set. This can be done via the `rater_error_variance` field in the configuration file or by passing the variance as a parameter to `prmse_true()`.
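To make the role of the rater error variance concrete, here is a simplified textbook-style sketch of PRMSE. It is not RSMTool's `prmse_true()` and does not reproduce the general formula mentioned above: it assumes every response has exactly two human ratings, and the function names `rater_error_variance` and `prmse_two_raters` are hypothetical.

```python
from statistics import mean, variance


def rater_error_variance(h1, h2):
    """Estimate rater error variance from double-scored responses.

    For two raters this is half the mean squared difference of the ratings.
    """
    return mean((a - b) ** 2 for a, b in zip(h1, h2)) / 2


def prmse_two_raters(system, h1, h2, error_variance=None):
    """Proportional reduction in MSE when predicting true scores from system scores.

    Simplifying assumption: exactly two ratings per response. An externally
    estimated error variance can be passed in, mirroring the idea of supplying
    it from data other than the evaluation set.
    """
    if error_variance is None:
        error_variance = rater_error_variance(h1, h2)
    n_raters = 2
    human_mean = [(a + b) / n_raters for a, b in zip(h1, h2)]
    # Averaging over raters shrinks the error variance by the number of raters.
    error_of_mean = error_variance / n_raters
    # Variance of true scores: observed variance minus the rater error part.
    var_true = variance(human_mean) - error_of_mean
    # MSE against true scores: observed MSE minus the rater error part.
    mse_true = mean((h - s) ** 2 for h, s in zip(human_mean, system)) - error_of_mean
    return 1 - mse_true / var_true


h1 = [1, 2, 3, 4]
h2 = [1, 2, 3, 4]
perfect = prmse_two_raters([1, 2, 3, 4], h1, h2)
imperfect = prmse_two_raters([2, 2, 3, 3], h1, h2, error_variance=0.0)
```

With no rater disagreement and a system that reproduces the human scores, the value is 1; any system error lowers it.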
Changes to RSMTool reports
- The report now always displays the headers for the "Consistency" and "True score evaluations" sections. If no second score is available, the report will indicate this. If you do not want these section headers to appear in your report, use the `general_section` field to exclude these sections. TIP: If you use automatic configuration generation, your configuration file will contain the full list of available sections that you can edit to exclude unnecessary sections.
Incompatible Changes
File formats
`rsmcompare` and `rsmsummarize` no longer support experiments that were generated with earlier versions of RSMTool. You will need to re-run the experiments that you want to compare or summarize.
`rsmtool` no longer supports old-style configuration files (not used since v5.5 or earlier).
`rsmtool` no longer supports feature files in `.json` format (not used since v5.5 or earlier).
The intermediate file containing true score evaluations `true_score_eval` no longer contains variance of human scores. This information can still be obtained from `consistency` files.
API Changes
The `Configuration` and `ConfigurationParser` objects in the `configuration_parser` module have been fully refactored. A new `Configuration` object can now be instantiated using a dictionary with keys using the same name as the fields in the configuration file. Validation and normalization is now done as part of initialization. See this PR for more detail.
`Configuration` objects no longer have a `filepath` attribute. Use the `configdir` attribute to indicate what any relative paths in the dictionary are relative to.
Functions in the erstwhile `rsmtool.utils` module have been moved to new locations. This includes several functions for computing evaluation metrics (`agreement`, `difference_of_standardized_means`, `partial_correlations`, `quadratic_weighted_kappa`, and `standardized_mean_difference`). See the API documentation for the new location of these functions.
The API for computing PRMSE has changed. See the API documentation for new functions.
Bugfixes & Improvements
v7.1.0 did not allow `run_*` functions to accept `pathlib.Path` objects for paths to configuration files. This is now allowed.
Error messages and warnings produced by RSMTool are now more meaningful and consistent.
Multiple changes to improve code readability and consistency.
Published by aloukina about 6 years ago
RSMTool - RSMTool 7.1
This is a minor release which includes changes necessary to make RSMTool compatible with SKLL 2.0.
What's new
RSMTool is now compatible with SKLL 2.0.
The implementation of `scipy.stats.pearsonr` used in RSMTool to compute Pearson's correlation coefficient has changed. The new implementation is equivalent to the old one in the majority of cases but tends to produce slightly different values for very small `N`. See https://github.com/EducationalTestingService/rsmtool/issues/343 for further detail.
If you use the Dash app on macOS, you can now download the complete RSMTool documentation for offline use. Go to Dash preferences, click on "Downloads", then "User Contributed", and search for "RSMTool".
The conda package for RSMTool is now available from the official ETS conda channel.
API changes
The `run_experiment`, `run_evaluation`, `run_comparison`, `run_summary`, and `compute_and_save_predictions` functions now accept Python dictionaries as input.
The `.filepath` attribute of the `Configuration` object will be deprecated in a future version and replaced with two new attributes: `configdir` and `filename`. Use `join(configdir, filename)` if you need the full path to the configuration file.
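As a small illustration of the `join(configdir, filename)` suggestion, the sketch below rebuilds a full path from the two pieces. The directory and file names are invented for the example, and plain strings stand in for the attributes of a real `Configuration` object.

```python
from os.path import join

# Hypothetical stand-ins for the `configdir` and `filename` attributes.
configdir = "experiments/essay_scoring"
filename = "config_rsmtool.json"

# Full path to the configuration file, replacing the old `.filepath` attribute.
full_path = join(configdir, filename)
```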
Other
- Minor changes to the documentation.
- Many functions used for tests have been refactored for efficiency.
Published by aloukina over 6 years ago
RSMTool - RSMTool 7.0
This is a major release which includes changes to several key evaluation metrics computed by RSMTool.
What's new
Changes to evaluation metrics
The exact definitions of all evaluation metrics and their method of computation are now available in the RSMTool documentation under evaluation metrics.
Changes to evaluation metrics
Quadratic weighted kappa (QWK) for `raw`, `raw_trim`, `scale` and `scale_trim` scores is now computed on continuous score values using the formula suggested by Haberman (2019). In previous versions of RSMTool such continuous score values were rounded to compute QWK.
Subgroup differences are now evaluated using a new metric, "Difference in standardized means". This metric was designed to be more robust to differences in scale between human and machine scores.
SMD for human-human agreement is now computed using pooled standard deviation of H1 and H2 for the double-scored sample in the denominator.
The default `tolerance` for score postprocessing is now set to 0.4998 (instead of 0.49998). This may result in small changes to the values of all evaluation metrics for `raw_trim` and `scale_trim` scores. See below for new configuration settings if you need to define custom tolerance.
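For readers who want to see what QWK on continuous scores looks like, the sketch below uses the standard moment-based form of quadratic weighted kappa, 2·Cov(H, S) / (Var(H) + Var(S) + (mean(H) − mean(S))²). It is offered as an illustration under the assumption that this matches the formula referred to above; the function name `continuous_qwk` is hypothetical and this is not RSMTool's own implementation.

```python
from statistics import mean


def continuous_qwk(human, system):
    """Quadratic weighted kappa computed directly on continuous scores.

    Uses population (divide-by-n) moments; no rounding of scores is applied.
    """
    n = len(human)
    mean_h, mean_s = mean(human), mean(system)
    cov = sum((h - mean_h) * (s - mean_s) for h, s in zip(human, system)) / n
    var_h = sum((h - mean_h) ** 2 for h in human) / n
    var_s = sum((s - mean_s) ** 2 for s in system) / n
    denominator = var_h + var_s + (mean_h - mean_s) ** 2
    if denominator == 0:
        raise ValueError("QWK is undefined when both score sets are the same constant.")
    return 2 * cov / denominator


same = continuous_qwk([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
shifted = continuous_qwk([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

Identical scores give a value of 1, while a constant shift between human and system scores lowers the value through the squared mean difference in the denominator.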
New evaluation metrics
Test-theory based evaluations: RSMTool and RSMEval now compute proportional reduction in mean squared error when using system scores to predict true scores.
RSMTool and RSMEval now compute various additional metrics of model fairness suggested in Loukina et al. 2019.
New configuration settings
A new configuration setting `experiment_names` for RSMSummarize allows specifying custom names for each experiment. These will be used to refer to the experiments in intermediate output files and in the report.
A new configuration setting `trim_tolerance` allows specifying custom tolerance when trimming scores to ceiling and floor values in RSMTool and RSMEval.
A new configuration setting `min_n_per_group` allows defining a threshold so that only groups with more than a certain number of members are included in the report. All groups are still included in the intermediate output files.
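The effect of the trimming tolerance can be sketched as follows, assuming that trimming clips each score to the range from `trim_min - tolerance` to `trim_max + tolerance`. The function name `trim_scores` and the example values are hypothetical; this is an illustration of the idea, not RSMTool's code.

```python
def trim_scores(values, trim_min, trim_max, tolerance=0.4998):
    """Clip scores to [trim_min - tolerance, trim_max + tolerance]."""
    floor = trim_min - tolerance
    ceiling = trim_max + tolerance
    return [min(max(value, floor), ceiling) for value in values]


# Scores on a hypothetical 1-6 scale: one below the floor, one inside
# the range, and one above the ceiling.
trimmed = trim_scores([0.2, 3.0, 6.9], trim_min=1, trim_max=6)
rounded = [round(value) for value in trimmed]
```

Because the tolerance stays just below 0.5, trimmed values still round to a score inside the original scale, which is presumably why a custom tolerance would also be kept under 0.5.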
Other new functionality
`.jsonlines` format is now one of the supported input file formats.
API changes
Several additional methods for computing standardized mean difference (SMD) are now available via `rsmtool.utils.standardized_mean_difference`.
The new routine for computing QWK is available via `rsmtool.utils.quadratic_weighted_kappa`.
The new metric, difference of standardized means (DSM), is available via `rsmtool.utils.difference_of_standardized_means`.
Functions for computing fairness analyses are now available via `rsmtool.fairness_utils.get_fairness_analyses`.
Bugfixes
`partial_correlations()` function has been updated to return a correctly formatted matrix in a situation where the covariance matrix is very close to zero.
The reports have been updated to correctly display plots for features with very long names.
Published by aloukina over 6 years ago
RSMTool -
This is a major release which includes a number of improvements primarily aimed to increase the flexibility of RSMTool API.
What's New
New functionality
RSMTool now supports input files in SAS `SAS7BDAT` format.
New learner `NNLRIterative`. This is a new built-in linear regression model that learns empirical OLS regression weights with feature selection using an iterative implementation of non-negative least squares regression.
Custom truncation thresholds. The user can now remove outliers using pre-existing truncation thresholds specified in the `features` file by using the field `use_truncation_thresholds`.
Users can now run the `.ipynb` notebook generated from the experiment interactively, without having to set any environment variables. Each experiment now generates a (hidden) environment JSON file, which the notebook will automatically read.
API changes
There is now a separate function `utils.standardized_mean_difference()` that can be used to compute SMD.
A new function `reader.try_to_load_file()` allows API users to specify what they want to happen if a file cannot be loaded. The function can be set to return `None`, to raise a warning, or to raise an error.
`DataContainer` class now includes additional helper methods. These methods allow users to `drop()` and `rename()` data frames in the DataContainer, and to select data frames using a specified prefix or suffix with the `get_frames()` method.
`Configuration` class now includes the additional helper methods `pop()` and `copy()`.
`utils.get_thumbnail_as_html()` now accepts an optional argument `path_to_thumbnail` which allows using two different paths for thumbnails and full-size images.
Other
Support for `seaborn 0.9.0` and `statsmodels 0.9.0`.
Support for `numpy 1.14.0`, `scipy 1.1.0`, and `pandas 0.23.0+`.
Support for `ipython 6.5.0` and `notebook 5.7.2`.
The documentation incorrectly stated the order of operations in the processing pipeline: the change of feature sign (if applicable) happens after standardization.
If the user specifies a list of features and one of such features has zero variance, the tool now displays the correct error message.
The logging messages displayed by `check_flag_column` now indicate the partition if different flag columns were used for training and evaluating the model.
Miscellaneous minor bug fixes in the notebooks.
Published by aloukina over 7 years ago
RSMTool - Version 6.0.1
This is a bugfix release.
- The "System Information" section of the reports now uses `pkg_resources` instead of `pip` to get the list of installed packages since `pip` disallows the use of its internal API starting with v10.
- Fix incorrect formatting in the documentation.
- Update `ipython` and `notebook` package versions in order to address an incompatibility issue with the latest version of the `tornado` web server that affects interactive use of `ipython notebook` but not the report generation itself.
- Updated the description of the marginal/partial correlation plot in the report.
Published by desilinguist about 8 years ago
RSMTool - Version 6.0
What's new?
This is a major release. The entire code base has been fully refactored to use a much more object-oriented design. This should make it much easier to make improvements and to add extensions. As a result, there have been significant changes to the RSMTool API (see link in documentation below for more details).
New features
New learners
- New regressors from the latest SKLL release (v1.5.1) have been added to `rsmtool`.
- `rsmtool` can now be used with both regressors and classifiers from SKLL, including classifiers that produce probabilistic output which can be used to produce expected values as predictions.
- See the SKLL documentation for the full list of learners.
Enhanced outputs
- Users can now specify the `file_format` configuration option to save intermediate files in either `tsv`, `csv`, or `xlsx` format.
- Users can specify a `use_thumbnails` configuration option that will embed clickable thumbnails in the HTML report, rather than full-sized images. Upon clicking the thumbnails, full-sized images will be displayed in a new window. This is particularly useful for larger reports with many images, improving both the readability and the loading speed of such reports.
- Reports for `rsmtool`, `rsmeval`, and `rsmsummarize` now contain a new section containing links to intermediate files (`intermediate_file_paths.ipynb`) so that users can now easily inspect these files from the report itself.
New configuration options
- Users can now specify `features` in the configuration file as a `list`. When providing a list of features, signs or transformations cannot be specified. This makes creating configuration files for simple experiments much easier and faster.
- Users can now specify a `skll_objective` for tuning the SKLL learners used in their experiments.
- Users can now specify a `flag_column_test` configuration option to use different flags for the test file and the training file.
- Users can now set the `standardize_features` boolean option if they do not want the feature values standardized. Standardization remains the default.
New evaluations
- `rsmtool` and `rsmeval` now compute disattenuated correlations if the data includes two human scores.
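A disattenuated correlation corrects the human-machine correlation for the unreliability of the human scores. The sketch below uses the common form in which the human-machine correlation is divided by the square root of the human-human correlation; that this is the exact variant used in the reports is an assumption, and the function name is hypothetical.

```python
from math import sqrt


def disattenuated_correlation(human_machine_corr, human_human_corr):
    """Correct a human-machine correlation for human rater unreliability."""
    if human_human_corr <= 0:
        raise ValueError("The human-human correlation must be positive.")
    return human_machine_corr / sqrt(human_human_corr)


# Example: r(H1, system) = 0.60 and r(H1, H2) = 0.64.
corrected = disattenuated_correlation(0.60, 0.64)
```

Because the human-human correlation is below 1 whenever raters disagree, the corrected value is larger than the raw human-machine correlation.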
Code changes
- New helper classes have been added to `rsmtool`, which allow easy reading, writing, and manipulation of multiple `pandas` data frames.
  - `container.DataContainer()`: A class to encapsulate multiple data frames.
  - `reader.DataReader()`: A class to read multiple tabular files into a `DataContainer()` object.
  - `writer.DataWriter()`: A class to write all data frames contained in a `DataContainer()` object to separate files, with a specified file extension.
- The `rsmtool` module is now installable via `pip`, in addition to being installable with `conda`.
- `preprocessor.trim()` can now take both numpy arrays and lists as inputs.
Bugfixes
- Fixed warning in `rsmcompare` when computing summary evaluations.
- Previously confusion matrices forced human scores to integers, while score distributions used the value "as is". Now both analyses use rounded human scores.
- Length columns are now forced to numeric, if they are non-numeric.
Documentation
- Added documentation for refactored API.
- Added detailed documentation about how to write RSMTool tests.
Published by desilinguist over 8 years ago
RSMTool - Version 5.7
What's new?
- Update Python to v3.6, pandas to v0.22.0 and SKLL to v1.5. This required minor changes to the code and updates to some of the test files.
- The conda installation command has changed. See the new command here.
Improvements
- In addition to bar plots, the evaluation_by_group notebook now includes a table showing the main metrics for each subgroup.
- When using the RSMTool API, it is now possible to specify a tolerance keyword argument for the trim method. Read more here.
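To make the tolerance argument concrete, here is a minimal sketch that assumes the tolerance widens the allowed score range on both sides before clipping. The function trim_scores is a hypothetical stand-in written for this illustration, not the RSMTool method itself, and the numbers are arbitrary.

```python
def trim_scores(values, trim_min, trim_max, tolerance):
    """Clip values to [trim_min - tolerance, trim_max + tolerance]."""
    lower = trim_min - tolerance
    upper = trim_max + tolerance
    return [min(max(float(value), lower), upper) for value in values]


# Predictions outside the widened range are pulled back to its edges.
raw_predictions = [0.2, 3.0, 6.9]
trimmed = trim_scores(raw_predictions, trim_min=1, trim_max=6, tolerance=0.5)
```

With a 1 to 6 scale and a tolerance of 0.5, predictions are kept within 0.5 to 6.5, so a later rounding step can still land on the scale endpoints.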
Bugfixes
- The differential feature functioning (DFF) plots are now correctly generated using preprocessed feature values. In the previous version, they incorrectly used raw feature values.
- In v0.19.0 of scikit-learn, the computation of explained_variance_ in the PCA implementation underwent some bugfixes. As a result, the results of PCA analyses no longer match those produced by previous versions of RSMTool and had to be changed.
Other minor changes
- Updated the utility script update_skll_model.py to make it compatible with SKLL v1.5.
- Minor updates to the documentation.
Published by desilinguist over 8 years ago
RSMTool - Version 5.6
This is an important release that has a critical bugfix as well as useful improvements.
Bugfixes
- Fixed a critical bug in the computation of standardized mean differences (SMDs). The denominator for SMDs should use population standard deviations, not the ones computed over the subgroups themselves.
- Added converters to the notebook header to allow correct treatment of candidate IDs with leading zeros.
- Modified the test utility functions to catch discrepancies caused by missing leading zeros.
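The leading-zero problem is easy to reproduce with pandas. The sketch below assumes a column named candidate and toy data. It shows how the converters argument of pandas.read_csv keeps IDs as strings, but it is not the actual notebook header code.

```python
import io

import pandas as pd

# Toy CSV content (assumed column names and values).
csv_text = "candidate,score\n00123,3\n00456,4\n"

# Default parsing infers integers, which silently drops the leading zeros.
default_frame = pd.read_csv(io.StringIO(csv_text))

# A converter forces the ID column to be read as text, preserving the zeros.
converted_frame = pd.read_csv(io.StringIO(csv_text), converters={"candidate": str})

default_ids = list(default_frame["candidate"])
converted_ids = list(converted_frame["candidate"])
```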
Improvements
- The tables generated by rsmsummarize are now saved in the same way as for other tools.
- rsmsummarize now shows a table with standardized coefficients for all models.
- The predictions for the post-processed training set are now also saved.
- Added a new notebook that shows differential feature functioning (DFF) plots by subgroup. To use it, add dff_by_group to the general_sections configuration option. Read more here.
- The features that have not been used in the model are now excluded from the datasets before they are sent to SKLL for prediction. This makes the prediction step much faster for large datasets.
- When testing whether the feature std. dev. in the training set is zero, we previously set the tolerance to 1e-06. This is not sufficient for features with very low values (these can result from an inverse transform of acoustic likelihoods, which are logs of very small values). The tolerance is now tightened to 1e-07.
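The effect of the tighter tolerance can be sketched with the standard library. The helper has_zero_std is a hypothetical name, and the comparison via math.isclose is an assumption about how such a check could be written, not the RSMTool code.

```python
import math
import statistics


def has_zero_std(values, tolerance):
    """Treat a feature as constant if its sample std. dev. is within tolerance of 0."""
    return math.isclose(statistics.stdev(values), 0.0, abs_tol=tolerance)


# A feature with tiny but real variation (sample std. dev. is about 5e-07).
tiny_feature = [0.0, 5e-07, 1e-06]
# A feature that is truly constant.
constant_feature = [3.0, 3.0, 3.0]

flagged_with_old_tolerance = has_zero_std(tiny_feature, tolerance=1e-06)
flagged_with_new_tolerance = has_zero_std(tiny_feature, tolerance=1e-07)
constant_still_flagged = has_zero_std(constant_feature, tolerance=1e-07)
```

Under the looser tolerance the tiny-valued feature is wrongly treated as constant. Under the tighter one it is kept, while the truly constant feature is still caught.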
Other Minor Changes
- Update the utility script update_skll_model.py to allow it to be used with other tools.
- Update tests and documentation.
Published by desilinguist almost 9 years ago
RSMTool - Version 5.5.2
This is primarily a bug fix release but it also has some improvements.
Bugfixes
- The notebooks are fixed so that any plots are now shown in their assigned places (this was broken in v5.5.1 due to the underlying matplotlib dependency being upgraded to v2.0).
Improvements
- The width of the subgroup plots is now determined more intelligently, so there are no more plots with really wide bars when there are only a few groups.
- Many of the unnecessary warnings that popped up in the reports and on the terminal are now suppressed and handled in code where appropriate.
Published by desilinguist over 9 years ago
RSMTool - Version 5.5.1
This is a minor bugfix release.
What's new?
- Update SKLL requirement to v1.3. This allows us to streamline the RSMTool conda recipe into a single recipe (using the MKL backend instead of OpenBLAS on macOS/Linux)
- Update all other conda packages to their latest versions.
- Minor fixes and updates to tests.
Published by desilinguist over 9 years ago
RSMTool - Version 5.5.0
This is a major release.
What's new?
- New tool: rsmsummarize, which can summarize any number of rsmtool experiments and produce a summary report.
- All input files can now be in any tabular format (CSV/TSV/XLS/XLSX). This is an improvement over previous releases where input files were required to be CSV files. For more details, see the documentation. This includes the feature description file, although the old JSON format is still supported for backwards compatibility (you will get a DeprecationWarning when using that format).
- rsmtool now includes a new model ScoreWeightedLR, which estimates feature coefficients using weighted least squares regression. The weights are computed as an inverse proportion of the total number of responses with a given score level.
- rsmtool now produces the feature sub-directory as part of its output for all experiments. Previously, this sub-directory was only produced for experiments with some form of feature selection.
- rsmcompare now requires the user to specify a "comparison ID" instead of generating one automatically from the experiment IDs of the two experiments being compared.
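The weighting scheme described for ScoreWeightedLR can be sketched with NumPy. This is an illustration under stated assumptions, not the RSMTool implementation: each response gets weight 1 / (number of responses with the same score), and the weighted problem is solved by scaling rows with the square root of the weights. The function names and toy data are hypothetical.

```python
from collections import Counter

import numpy as np


def score_level_weights(scores):
    """Weight each response by 1 / (number of responses sharing its score)."""
    counts = Counter(scores)
    return np.array([1.0 / counts[score] for score in scores])


def fit_weighted_least_squares(features, scores):
    """Return [intercept, coef_1, ...] minimizing sum(w_i * residual_i ** 2)."""
    features = np.asarray(features, dtype=float)
    scores = np.asarray(scores, dtype=float)
    weights = score_level_weights(scores.tolist())
    # Add an intercept column, then scale rows by sqrt(w) so that ordinary
    # least squares on the scaled system solves the weighted problem.
    design = np.column_stack([np.ones(len(scores)), features])
    root_w = np.sqrt(weights)
    coefficients, *_ = np.linalg.lstsq(
        design * root_w[:, None], scores * root_w, rcond=None
    )
    return coefficients


# Toy data: score 1 is over-represented, so each of its responses gets weight 1/3.
toy_features = [[0.9], [1.1], [1.0], [2.0]]
toy_scores = [1, 1, 1, 2]
weights = score_level_weights(toy_scores)
coefficients = fit_weighted_least_squares(toy_features, toy_scores)
```

Down-weighting frequent score levels keeps the fit from being dominated by the most common scores.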
Improvements
- Improved CSS for HTML report printing.
- Several updates and fixes to documentation.
- Fix errors in PCA computation when the number of components was smaller than the total number of features.
- Use skll API to convert featureset to data frame instead of writing our own function.
- Separate the file reading and processing functions in rsmpredict for more modularity.
- Wrap longer labels on box plots automatically.
- Update package dependencies to latest releases.
- Increase the report generation timeout from 10 minutes to 60 minutes. This is useful for experiments with very large data files.
- Fix bug that had system and human scores reversed in the confusion matrix.
- Limit the length of experiment IDs where appropriate such that we don't encounter "filename is too long" OS errors.
Published by desilinguist over 9 years ago
RSMTool - Version 5.2.1
This is a minor release that fixes a bug in how some JavaScript was loaded in the Jupyter notebooks.
Published by desilinguist over 9 years ago
RSMTool - Version 5.2.0
This release has minor features and bug fixes.
1. rsmcompare now includes extra checks to make sure the experiment paths and ids specified by the user actually exist.
2. Factored out rsmcompare code from the header notebook and moved to comparison.py.
3. Factored out the float formatting functions from the rsmtool/rsmcompare header notebooks and moved them to utils.py.
4. Added new tests for comparison.py and the float formatting and highlighting functions in utils.py.
5. Fixed the bug in rsmcompare which seemed to ignore zero scores in confusion matrices.
6. Fixed a bug in rsmcompare that prevented the score distribution table from being displayed correctly if the score levels differed between the two models.
Published by desilinguist almost 10 years ago
RSMTool - Version 5.1.1
This is a minor bugfix release.
1. Previously, if rsmpredict was given a model requiring a transformation that could yield Inf/NaN values for new data (e.g. sqrt(-1)), it would raise an error and terminate. Now, it simply excludes such responses and displays a warning.
2. Updated various conda files to use newer versions of the ipython and notebook packages since there seem to have been some updates that broke older recipes and requirements files.
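The new rsmpredict behaviour in item 1 can be sketched as follows. The function and field names are hypothetical, and the sketch only illustrates the exclude-and-warn pattern described above rather than the actual rsmpredict code.

```python
import math
import warnings


def safe_sqrt(value):
    """Return sqrt(value), or NaN when the input is outside the domain."""
    return math.sqrt(value) if value >= 0 else math.nan


def transform_and_filter(responses, feature, transform):
    """Apply a transformation and drop responses whose result is NaN or Inf."""
    kept, excluded_ids = [], []
    for response in responses:
        transformed = transform(response[feature])
        if math.isfinite(transformed):
            kept.append({**response, feature: transformed})
        else:
            excluded_ids.append(response["id"])
    if excluded_ids:
        # Warn instead of raising, so the remaining responses are still scored.
        warnings.warn(
            f"Excluded {len(excluded_ids)} response(s) with non-finite "
            f"'{feature}' values: {excluded_ids}"
        )
    return kept, excluded_ids


# Toy responses (assumed data): the second one cannot be square-rooted.
new_responses = [{"id": "a", "length": 4.0}, {"id": "b", "length": -1.0}]
kept, excluded_ids = transform_and_filter(new_responses, "length", safe_sqrt)
```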
Published by desilinguist almost 10 years ago
RSMTool - Version 5.1.0
This is a major release.
1. Completely overhauled the documentation. Instead of relying on a collection of loosely organized markdown files, the documentation is much more cohesive and hosted on readthedocs. It now includes a clear introduction to what RSMTool is as well as tutorials.
2. The RSMTool API is now richer and explicitly documented.
3. rsmcompare can now compare two rsmeval experiments as well as an rsmtool experiment to an rsmeval experiment.
4. Code coverage is now automatically computed as part of CI testing.
5. Expected warnings are now suppressed when running the tests.
6. Fixed several stylistic issues in the codebase raised by pep8 and pyflakes.
Published by desilinguist almost 10 years ago
RSMTool - Version 5.0.2
- Added files necessary for submission to the Journal of Open Source Software.
Published by desilinguist almost 10 years ago
RSMTool - Version 5.0.1
This is a hotfix release that fixes the following regression:
- rsmcompare no longer accidentally swaps the old and the new experiments.
Published by desilinguist almost 10 years ago
RSMTool - Version 5.0.0
New features
- Evaluations on the test set now include R2 and RMSE.
- The rsmtool reports now include model fit parameters (R2 and adjusted R2) for the training set.
- It is now possible to exclude candidates with fewer than X responses from model training/evaluation.
- rsmcompare can now handle experiments which used SKLL models.
- rsmcompare now includes a notebook for consistency between human raters (thanks @bndgyawali!)
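For readers unfamiliar with the metrics named above, here is a minimal sketch using their textbook definitions. Whether RSMTool computes them in exactly this way is an assumption, and the function names and toy scores are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_responses, n_features):
    """Penalize R2 for the number of features used in the model."""
    return 1 - (1 - r2) * (n_responses - 1) / (n_responses - n_features - 1)


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted scores."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy human and system scores (assumed data), with a two-feature model.
human = [1, 2, 3, 4, 5]
system = [1.5, 2.0, 2.5, 4.0, 5.0]

r2 = r_squared(human, system)
adjusted = adjusted_r_squared(r2, n_responses=len(human), n_features=2)
error = rmse(human, system)
```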
Bug fixes
- Correct handling of repeated feature names in the feature .json file.
- Correct printing of feature coefficients for SKLL models.
- Correct handling of quoted boolean values in config .json file.
- Fixed rounding and highlighting in feature correlation table.
- And several dozen more.
Published by desilinguist almost 10 years ago
RSMTool - Version 4.6.0
This is the first GitHub release for RSMTool. Before being open-sourced, RSMTool was an internal research project at the Educational Testing Service.
Published by desilinguist about 10 years ago