Recent Releases of RSMTool

RSMTool - RSMTool 12.0.0

What's Changed

  • Python 3.8 and 3.9 are no longer supported since the SKLL dependency was updated to v5.0.1.
  • Remove rsmextra and special sections by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/676
  • Add type hints (part 1) by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/677
  • Add type hints (Part 2) by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/679
  • Add type hints (Part 3) by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/681
  • Add Type Hints (Part 4) by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/682
  • Add Type Hints (Final Part) by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/683
  • Fix pandas-related warnings by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/684
  • Separate runtime and dev dependencies by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/685

Full Changelog: https://github.com/EducationalTestingService/rsmtool/compare/v11.3.0...v12.0.0

Scientific Software - Peer-reviewed - Python
Published by desilinguist about 2 years ago

RSMTool - v11.3.0

💡 New features 💡

  • Add section ordering for rsmexplain by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/667
  • Update intermediate files notebook section for readability by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/670

🛠️ Bugfixes & Improvements 🛠️

  • Update SHAP to the latest version 0.44.0 by @damien2012eng in https://github.com/EducationalTestingService/rsmtool/pull/664
  • Unpin dependencies and fix minor issues by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/666
  • Fix new warnings & remove manual suppressions by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/668
  • Refactor CLI tests to modernize codecov by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/669
  • Pin numpy to < 2 by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/673

🙏🏽 Contributions & Code Reviews 🙏🏽

@damien2012eng @desilinguist @Frost45 @mulhod @tamarl08

Full Changelog: https://github.com/EducationalTestingService/rsmtool/compare/v11.2.0...v11.3.0

Published by desilinguist over 2 years ago

RSMTool - RSMTool 11.2.0

💡 New features 💡

  • Add sections for W&B logging to make information easier to find by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/659
  • Add configuration option to disable truncation of outliers by @damien2012eng in https://github.com/EducationalTestingService/rsmtool/pull/661

🛠️ Bugfixes & Improvements 🛠️

  • Fix bugs when processing cross-validation folds in rsmxval by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/662

🙏🏽 Contributions & Code Reviews 🙏🏽

@damien2012eng @desilinguist @mulhod @tamarl08

Full Changelog: https://github.com/EducationalTestingService/rsmtool/compare/v11.1.1...v11.2.0

Published by tamarl08 over 2 years ago

RSMTool - RSMTool 11.1.1

💡 New features 💡

  • Add a new human-human confusion matrix for double-scored data by @mulhod in https://github.com/EducationalTestingService/rsmtool/pull/649
  • Allow parallelization of grid search when using SKLL models in rsmtool by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/650

🛠️ Bugfixes & Improvements 🛠️

  • Update pre-commit checks by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/647
  • Enhance wandb logging of evaluation metrics by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/651
  • Fix warnings in reports by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/654

🙏🏽 Contributions & Code Reviews 🙏🏽

@damien2012eng @desilinguist @mulhod @tamarl08 @tazin-afrin

Full Changelog: v11.0.1...v11.1.1

Published by tamarl08 over 2 years ago

RSMTool - RSMTool 11.0.1

💡 New features 💡

  • New rsmexplain plots by @damien2012eng in https://github.com/EducationalTestingService/rsmtool/pull/603
  • Full W&B integration to allow logging of output artifacts and report by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/617, https://github.com/EducationalTestingService/rsmtool/pull/620, https://github.com/EducationalTestingService/rsmtool/pull/621, https://github.com/EducationalTestingService/rsmtool/pull/623, https://github.com/EducationalTestingService/rsmtool/pull/627
  • Add FAQ page to documentation by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/622
  • Add support for Python 3.11 by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/628
  • Add support for output files when auto-generating configurations by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/640
  • Enhancements to fast_predict by @mulhod in https://github.com/EducationalTestingService/rsmtool/pull/632
  • NOTE: The .model files produced by rsmtool are no longer SKLL model files. They are serialized rsmtool.Modeler objects. This change should be transparent to the users if the only places they use the .model files are with rsmpredict and rsmexplain. However, if those files are used outside of RSMTool and expected to contain SKLL learners, then the following change is needed: users would now need to use the Modeler.load_from_file() method to load the .model file produced by rsmtool and then access the SKLL learner via the .learner attribute.
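The loading path described in the note above can be sketched as follows. This is a minimal sketch rather than a confirmed recipe: the import path rsmtool.modeler and the helper name load_skll_learner are assumptions, while Modeler.load_from_file() and the .learner attribute come from the note itself. The import is deferred so that the helper can be defined even where RSMTool is not installed.

```python
def load_skll_learner(model_path):
    """Return the SKLL learner stored inside an rsmtool .model file.

    `load_skll_learner` is a hypothetical helper name used only for
    illustration; it is not part of RSMTool.
    """
    # Assumption: the Modeler class lives in rsmtool.modeler.
    # Imported lazily so this function can be defined without RSMTool.
    from rsmtool.modeler import Modeler

    # Per the release note: load the serialized Modeler object first ...
    modeler = Modeler.load_from_file(model_path)
    # ... then reach the underlying SKLL learner via its .learner attribute.
    return modeler.learner
```

A call such as load_skll_learner("output/myexperiment.model") would then stand in for loading the file directly as a SKLL learner; the path shown is hypothetical.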

🛠️ Bugfixes & Improvements 🛠️

  • Migrate nose to nose2 by @damien2012eng in https://github.com/EducationalTestingService/rsmtool/pull/610
  • Upgrade shap by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/612
  • Use example IDs when specifying sample_ids by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/613
  • Expect scale_with value of 'raw' in rsmeval by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/614
  • Fix update_files for nose2. by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/616
  • Fix bug in wording of what will be highlighted for disattenuated correlation by @mulhod in https://github.com/EducationalTestingService/rsmtool/pull/594
  • Pin skll version in doc requirements by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/619
  • Remove unnecessary warnings in HTML reports. by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/624
  • Include system information in RSMExplain reports by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/633
  • Suppress alt text warnings when generating reports by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/634
  • Fix W&B tests and add to CI builds. by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/637
  • Switch to ruff for pre-commit checks by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/639
  • Fix test dir usage in test_wandb by @tamarl08 in https://github.com/EducationalTestingService/rsmtool/pull/642

🙏🏽 Contributions & Code Reviews 🙏🏽

  • @tamarl08
  • @mulhod
  • @damien2012eng
  • @dblandan
  • @Frost45
  • @blongwill

Full Changelog: https://github.com/EducationalTestingService/rsmtool/compare/v10.0.0...v11.0.1

Published by tamarl08 almost 3 years ago

RSMTool - RSMTool 10.0.0

This is a major new release! It includes new functionality as well as updated dependencies.

💡 New features 💡

Dependencies

  • Shap is now a required dependency. It is currently pinned to 0.41.0 but we plan to keep RSMTool updated with the latest SHAP versions as they are released.
  • Numpy has been pinned to <= 1.23.5 since SHAP 0.41.0 does not work with numpy 1.24.x.

RSMExplain

  • Added new command-line utility rsmexplain to generate an explanation report for an existing rsmtool experiment. Under the hood, rsmexplain leverages SHapley Additive exPlanations produced by shap.
  • Added comprehensive documentation on how to run rsmexplain.
  • Added support for automated and interactive configuration generation for rsmexplain.
  • Added comprehensive functional tests for rsmexplain.

More reliable notebook merging

  • Updated rsmtool.reporter.merge_notebooks() to use nbconvert and nbformat APIs instead of the JSON-based hack that was being used before.

🛠️ Bugfixes & Improvements 🛠️

  • Use legend_handles instead of the deprecated legendHandles attribute for matplotlib to avoid deprecation warnings in notebooks.
  • Minor documentation fixes in various places.

Contributions from: @damien2012eng, @desilinguist, @tamarl08, @dblandan, and @mulhod!

Published by damien2012eng almost 3 years ago

RSMTool - RSMTool 9.1.1

What's Changed

  • Remove np.warnings from fairness_utils.py. by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/580
  • Add new fast_predict() API method for prediction by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/581
  • Convert all formatted strings to f-strings and add pre-commit with flynt by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/584
  • Integrate black and run on all files by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/586
  • Add isort, pydocstyle, flake8 as pre-commit checks by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/587
  • Restore and increase test coverage by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/589
  • Update contributing docs & remove extraneous whitespace by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/590
  • Update SKLL to v3.2.0 by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/591
  • Release v9.1.1 by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/592

Full Changelog: https://github.com/EducationalTestingService/rsmtool/compare/v9.0.1...v9.1.1

Published by desilinguist over 3 years ago

RSMTool - v9.0.1

What's Changed

This is a minor bugfix release.

  • Delete the stable branch by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/573
  • Disallow negative confidence intervals in fairness plots since they cause new versions of pandas to break by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/574
  • Add workaround for broken SVGs in nbconvert by overriding clean_html by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/575
  • Fix bug for integer IDs when using rsmxval by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/577
  • Update SKLL dependency to v3.1.0 by @desilinguist in https://github.com/EducationalTestingService/rsmtool/pull/578

Full Changelog: https://github.com/EducationalTestingService/rsmtool/compare/v9.0.0...v9.0.1

Published by desilinguist over 3 years ago

RSMTool - RSMTool 9.0

This is a major new release. It includes new functionality and breaking changes to the API as well as to dependencies.

⚡️ RSMTool 9.0 is incompatible with previous versions ⚡️

💡 New features 💡

Dependencies

  • RSMTool is now compatible with SKLL v3.0 and, therefore, scikit-learn v1.0.2.

  • RSMTool now supports Python 3.10, in addition to 3.8 and 3.9. Python 3.7 is no longer supported.

  • tqdm is now a required dependency.

Native cross-validation support

  • Add native support for cross-validation experiments to RSMTool. Using a single train-test split may lead to biased estimates of performance since those estimates will depend on the specific characteristics of that split. However, using cross-validation instead can provide more accurate estimates of scoring model performance since those estimates are averaged over multiple train-test splits that are randomly selected based on the data.

  • Add new command-line utility rsmxval to run cross-validation experiments. Under the hood, it leverages the RSMTool API functions run_experiment(), run_evaluation(), and run_summary() to generate multiple useful reports for the users.

  • Add support for automated configuration generation to rsmxval in both batch and interactive mode.

  • Add comprehensive documentation on how to run cross-validation experiments.

  • Add comprehensive functional tests for cross-validation.
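The motivation above, averaging performance estimates over several train-test splits instead of relying on one, can be illustrated with a small standard-library sketch. It is not how rsmxval is implemented: the function names are hypothetical, and a mean-only predictor stands in for a real scoring model.

```python
import random
import statistics


def k_fold_indices(n_items, n_folds, seed=0):
    """Shuffle item indices and deal them into n_folds roughly equal folds."""
    if not 2 <= n_folds <= n_items:
        raise ValueError("n_folds must be between 2 and n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[fold::n_folds] for fold in range(n_folds)]


def cross_validated_rmse(scores, n_folds=5, seed=0):
    """Average the held-out RMSE of a mean-only 'model' across all folds."""
    fold_rmses = []
    for held_out in k_fold_indices(len(scores), n_folds, seed):
        held_out_set = set(held_out)
        # "Train" on every item outside the held-out fold.
        train = [s for i, s in enumerate(scores) if i not in held_out_set]
        prediction = statistics.fmean(train)
        # Evaluate only on the held-out fold.
        squared_errors = [(scores[i] - prediction) ** 2 for i in held_out]
        fold_rmses.append(statistics.fmean(squared_errors) ** 0.5)
    # The cross-validated estimate is the average over the folds.
    return statistics.fmean(fold_rmses), fold_rmses
```

Each fold yields its own estimate, so the spread of the per-fold values also gives a rough sense of how much a single split could mislead.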

API Changes

  • Add two new logging functions in rsmtool.utils.logging. These are only meant to be used by RSMTool developers, not users.

  • Factor out the code that was used to write a dataframe to disk into a separate utility method DataWriter.write_frame_to_disk() so that it can also be used by rsmxval. This can prove useful to advanced RSMTool users as well.

  • Add new cross-validation specific utility functions to rsmtool.utils.cross_validation.

  • Convert several class or static methods in various classes to instance methods in order to allow for passing and using an optional logger instance.

  • Tweak the check_scaled_coefficients() test utility function to take the output directory as an argument instead of taking an experiment name to allow its usage for rsmxval functional tests.

🛠 Bugfixes & Improvements 🛠

  • Fix the behavior of the use_thumbnails option in RSMTool configuration files. It was generating both the thumbnail as well as the full-sized figure due to the behavior of Matplotlib's savefig(). The solution was to turn off interactive plotting in all header notebooks.

  • Replace deprecated methods and keywords in RSMTool code as recommended by the latest versions of pandas, numpy, and scikit-learn.

  • Fix several duplicate target warnings when compiling the documentation. Make sure included RST files have an extension of .rst.inc so that they are not compiled twice. Turn all web links into anonymous references so that there are no conflicts with the same target names.

  • Make feature boxplots for subgroups in reports more flexible in terms of the number of features. Specifically, if the experiment has more than 150 features, no boxplots are shown. Previously this limit was 30. In addition, the message that the boxplots have been omitted is displayed more prominently when it happens. Finally, if the number of features is > 30 but <=150, a new message asking the user to enable thumbnails is shown.

  • Update Gitlab CI plan to use Python 3.8 and Azure Pipelines to use Python 3.10. Add new cross-validation tests to both CI plans.

Published by desilinguist about 4 years ago

RSMTool - RSMTool 8.1.2

This is a bugfix release.

  • Update the code for compatibility with pandas 1.3.0 and scikit-learn 0.24.2.

Published by aloukina almost 5 years ago

RSMTool - RSMTool 8.1.1

This is a bugfix release with some minor improvements.

  • Continuous integration build for RSMTool migrated from Travis CI to Gitlab CI.

  • Minor bug fixed in parse_json_with_comments to handle URLs correctly.

  • Minor updates to warnings and documentation.

Published by Frost45 almost 5 years ago

RSMTool - RSMTool 8.1.0

This is a minor but backwards-incompatible release which includes changes necessary to make RSMTool compatible with SKLL v2.5.

What's new

  • RSMTool is now compatible with SKLL 2.5!

💥 Breaking Changes 💥

  • Python 3.6 is no longer officially supported since the latest versions of pandas and numpy have dropped support for it. RSMTool officially supports Python 3.7, 3.8, and 3.9.

  • RSMTool no longer supports .xls files. For users who use Excel to prepare their data, we continue to support .xlsx files.

  • Models trained with earlier versions of RSMTool can no longer be used to generate predictions. If you use rsmpredict or compute_and_save_predictions to generate predictions based on existing models, you will need to re-train the models.

Published by aloukina about 5 years ago

RSMTool - RSMTool 8.0.2

This is a bugfix release with some minor improvements.

  • The version of nbconvert used by RSMTool is now pinned to <6.0 due to a change in v6.0 and above that broke RSMTool report generation. We will remove the pin in a future release when the upstream issue is fixed.

  • RSMTool reports no longer display a pie chart for the model coefficients if any of the coefficients are negative.

  • Minor updates for compatibility with external packages.

  • Minor updates to warnings and documentation.

Published by aloukina over 5 years ago

RSMTool - RSMTool 8.0.1

This is a bugfix release with some minor improvements.

  • Update the code for compatibility with pandas 1.1.0.

  • prmse_true no longer raises an error if there are no double-scored responses. Instead the function displays a warning and returns None.

  • Command line tools rsmtool, rsmeval, rsmpredict, rsmcompare and rsmsummarize no longer raise an error if a user does not provide any command line arguments. Instead the tools display the help message.

  • Minor updates to documentation.

  • Improvements to the testing and coverage measurement process.

Published by aloukina almost 6 years ago

RSMTool - RSMTool 8.0

This is a major new release. It includes a lot of new functionality and multiple changes to the API.

⚡️ RSMTool 8.0 is backwards incompatible with previous versions ⚡️

💡 New features 💡

Dependencies

  • RSMTool is now compatible with SKLL v2.1

  • All dependencies other than skll are now unpinned.

  • RSMTool now supports Python versions 3.6, 3.7 and 3.8.

Interactive generation of configuration files

  • Configuration files for rsmtool, rsmeval, rsmpredict, rsmcompare and rsmsummarize can now be generated automatically, either interactively or non-interactively. This exciting new functionality makes it easier to keep track of the many configuration options available in RSMTool and greatly simplifies the process of setting up the experiment. Watch the video demonstrating the new interactive generation or read the documentation.

Passing hyperparameters to SKLL models

  • It is now possible to pass custom hyperparameter values to skll learners used through RSMTool. This is done using a new configuration field skll_fixed_parameters. The parameters are also displayed in the report.
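As a rough illustration of the new field, the fragment below builds a configuration dictionary that fixes two hyperparameters and prints it as JSON. It is a sketch under assumptions: the field is assumed to take a dictionary mapping hyperparameter names to values, the experiment ID and file names are hypothetical, and C and kernel are scikit-learn SVR parameters chosen only as an example.

```python
import json

# Hypothetical experiment settings; only skll_fixed_parameters is the
# field introduced in this release.
rsmtool_config = {
    "experiment_id": "svr_fixed_params",  # hypothetical ID
    "model": "SVR",                       # a SKLL learner name
    "train_file": "train.csv",            # hypothetical path
    "test_file": "test.csv",              # hypothetical path
    # Assumption: a mapping from hyperparameter name to fixed value.
    "skll_fixed_parameters": {"C": 10.0, "kernel": "linear"},
}

# Serialize the dictionary in the JSON form used by configuration files.
config_json = json.dumps(rsmtool_config, indent=4)
print(config_json)
```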

Generalized version of PRMSE

  • The formula for PRMSE has been updated to a more general version derived by Matthew S. Johnson that allows computation of PRMSE for any number of raters. For two raters, the formula returns the same result as the formula used in previous versions of the tool.

  • The API now provides a new function prmse_true() which accepts scikit-learn style parameters and returns the PRMSE value.

  • It is now possible to supply the error variance of human raters needed to compute PRMSE. This is useful when an experiment requires estimating this variance on data other than the evaluation set. It can be supplied via the rater_error_variance field in the configuration file or by passing the variance as a parameter to prmse_true().

Changes to RSMTool reports

  • The report now always displays the headers for the "Consistency" and "True score evaluations" sections. If no second score is available, the report will indicate this. If you do not want these section headers to appear in your report, use the general_section field to exclude these sections. TIP: If you use automatic configuration generation, your configuration file will contain the full list of available sections that you can edit to exclude unnecessary sections.

💥 Incompatible Changes 💥

File formats

  • rsmcompare and rsmsummarize no longer support experiments that were generated with earlier versions of RSMTool. You will need to re-run the experiments that you want to compare or summarize.

  • rsmtool no longer supports old-style configuration files (not used since v5.5 or earlier).

  • rsmtool no longer supports feature files in .json format (not used since v5.5 or earlier).

  • The intermediate file containing true score evaluations, true_score_eval, no longer contains the variance of human scores. This information can still be obtained from consistency files.

API Changes

  • The Configuration and ConfigurationParser objects in the configuration_parser module have been fully refactored. A new Configuration object can now be instantiated using a dictionary whose keys have the same names as the fields in the configuration file. Validation and normalization are now done as part of initialization. See this PR for more detail.

  • Configuration objects no longer have a filepath attribute. Use the configdir attribute to indicate what any relative paths in the dictionary are relative to.

  • Functions in the erstwhile rsmtool.utils module have been moved to new locations. This includes several functions for computing evaluation metrics (agreement, difference_of_standardized_means, partial_correlations, quadratic_weighted_kappa, and standardized_mean_difference). See the API documentation for the new location of these functions.

  • The API for computing PRMSE has changed. See the API documentation for new functions.

🛠 Bugfixes & Improvements 🛠

  • v7.1.0 did not allow run_* functions to accept pathlib.Path objects for paths to configuration files. This is now allowed.

  • Error messages and warnings produced by RSMTool are now more meaningful and consistent.

  • Multiple changes to improve code readability and consistency.

Published by aloukina about 6 years ago

RSMTool - RSMTool 7.1

This is a minor release which includes changes necessary to make RSMTool compatible with SKLL 2.0.

What's new

  • RSMTool is now compatible with SKLL 2.0.

  • The implementation of scipy.stats.pearsonr used in RSMTool to compute Pearson's correlation coefficient has changed. The new implementation is equivalent to the old one in the majority of cases but tends to produce slightly different values for very small N. See https://github.com/EducationalTestingService/rsmtool/issues/343 for further detail.

  • If you use the Dash app on macOS, you can now download the complete RSMTool documentation for offline use. Go to Dash preferences, click on "Downloads", then "User Contributed", and search for "RSMTool".

  • The conda package for RSMTool is now available from the official ETS conda channel.

API changes

  • The run_experiment, run_evaluation, run_comparison, run_summary, and compute_and_save_predictions functions now accept Python dictionaries as input.

  • The .filepath attribute of the Configuration object will be deprecated in a future version and replaced with two new attributes: configdir and filename. Use join(configdir, filename) if you need the full path to the configuration file.
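The join(configdir, filename) idiom mentioned above can be sketched with a stand-in object. The SimpleNamespace below only mimics the two attributes named in the note; it is not a real Configuration object, and the directory and file name are hypothetical.

```python
from os.path import join
from types import SimpleNamespace

# Stand-in for a Configuration object: it carries only the two
# attributes named in the release note (values are hypothetical).
config = SimpleNamespace(
    configdir="/path/to/experiment",
    filename="config_rsmtool.json",
)

# Rebuild the full path that the .filepath attribute used to provide.
full_path = join(config.configdir, config.filename)
print(full_path)
```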

Other

  • Minor changes to the documentation.
  • Many functions used for tests have been refactored for efficiency.

Published by aloukina over 6 years ago

RSMTool - RSMTool 7.0

This is a major release which includes changes to several key evaluation metrics computed by RSMTool.

What's new

Changes to evaluation metrics

The exact definitions of all evaluation metrics and their method of computation are now available in the RSMTool documentation under evaluation metrics.

  • Quadratic weighted kappa (QWK) for raw, raw_trim, scale and scale_trim scores is now computed on continuous score values using the formula suggested by Haberman (2019). In previous versions of RSMTool such continuous score values were rounded to compute QWK.

  • Subgroup differences are now evaluated using a new metric, "Difference in standardized means". This metric was designed to be more robust to differences in scale between human and machine scores.

  • SMD for human-human agreement is now computed using pooled standard deviation of H1 and H2 for the double-scored sample in the denominator.

  • The default tolerance for score postprocessing is now set to 0.4998 (instead of 0.49998). This may result in small changes to the values of all evaluation metrics for raw_trim and scale_trim scores. See below for new configuration settings if you need to define a custom tolerance.
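The notes do not spell out the formula for QWK on continuous scores. One moment-based form, which coincides with the usual weighted-kappa definition when scores are integers, is 2·cov(h, s) / (var(h) + var(s) + (mean(h) − mean(s))²). The sketch below implements that form under the assumption that it matches what the notes attribute to Haberman (2019); the function name is hypothetical.

```python
import statistics


def continuous_qwk(human, system):
    """Quadratic weighted kappa computed directly on continuous scores.

    Uses population (divide-by-n) moments. Hypothetical helper for
    illustration, not RSMTool's own function.
    """
    if len(human) != len(system):
        raise ValueError("score lists must have the same length")
    n = len(human)
    mean_h = statistics.fmean(human)
    mean_s = statistics.fmean(system)
    var_h = sum((h - mean_h) ** 2 for h in human) / n
    var_s = sum((s - mean_s) ** 2 for s in system) / n
    cov = sum((h - mean_h) * (s - mean_s) for h, s in zip(human, system)) / n
    # Agreement drops as the variances or the mean difference grow
    # relative to the covariance.
    return 2 * cov / (var_h + var_s + (mean_h - mean_s) ** 2)
```

Because no rounding is involved, a system that is offset from the human scores by a constant is penalized through the squared mean difference in the denominator rather than through discretization.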

New evaluation metrics

New configuration settings

  • A new configuration setting experiment_names for RSMSummarize allows specifying custom names for each experiment. These will be used to refer to the experiments in intermediate output files and in the report.

  • A new configuration setting trim_tolerance allows specifying custom tolerance when trimming scores to ceiling and floor values in RSMTool and RSMEval.

  • A new configuration setting min_n_per_group allows defining a threshold so that only groups with more than a certain number of members are included into the report. All groups are still included into the intermediate output files.
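To make the role of trim_tolerance concrete, the sketch below clamps scores to the range from the floor minus the tolerance to the ceiling plus the tolerance. It assumes that this is how the tolerance is applied; the function name trim_scores and the example score range are hypothetical.

```python
def trim_scores(scores, trim_min, trim_max, trim_tolerance=0.4998):
    """Clamp each score to [trim_min - tolerance, trim_max + tolerance].

    Hypothetical helper: a sketch of tolerance-based trimming, not
    RSMTool's own implementation.
    """
    floor = trim_min - trim_tolerance
    ceiling = trim_max + trim_tolerance
    return [min(max(score, floor), ceiling) for score in scores]


# Scores on a hypothetical 1-6 scale; only out-of-range values change.
trimmed = trim_scores([0.2, 3.7, 6.9], trim_min=1, trim_max=6)
print(trimmed)
```

With a tolerance just under 0.5, a trimmed score still rounds to a value inside the original scale, which is presumably why the default sits slightly below one half.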

Other new functionality

API changes

Bugfixes

  • The partial_correlations() function has been updated to return a correctly formatted matrix in a situation where the covariance matrix is very close to zero.

  • The reports have been updated to correctly display plots for features with very long names.

Published by aloukina over 6 years ago

RSMTool -

This is a major release that includes a number of improvements aimed primarily at increasing the flexibility of the RSMTool API.

What's New

New functionality

  • RSMTool now supports input files in SAS SAS7BDAT format.

  • New learner NNLRIterative. This is a new built-in linear regression model that learns empirical OLS regression weights with feature selection using an iterative implementation of non-negative least squares regression.

  • Custom truncation thresholds. The user can now remove outliers using pre-existing truncation thresholds specified in the features file by using the field use_truncation_thresholds.

  • Users can now run the .ipynb notebook generated from the experiment interactively, without having to set any environment variables. Each experiment now generates a (hidden) environment JSON file, which the notebook will automatically read.

API changes

  • There is now a separate function utils.standardized_mean_difference() that can be used to compute SMD.

  • A new function reader.try_to_load_file() allows API users to specify what they want to happen if a file cannot be loaded. The function can be set to return None, to raise a warning, or to raise an error.

  • The DataContainer class now includes additional helper methods. These methods allow users to drop() and rename() data frames in the DataContainer, and to select data frames using a specified prefix or suffix with the get_frames() method.

  • The Configuration class now includes the additional helper methods pop() and copy().

  • utils.get_thumbnail_as_html() now accepts an optional argument path_to_thumbnail which allows using two different paths for thumbnails and full-size images.

Other

  • Support for seaborn 0.9.0 and statsmodels 0.9.0.

  • Support for numpy 1.14.0, scipy 1.1.0, and pandas 0.23.0+.

  • Support for ipython 6.5.0 and notebook 5.7.2.

  • The documentation incorrectly stated the order of operations in the processing pipeline: the change of feature sign (if applicable) happens after standardization.

  • If the user specifies a list of features and one of such features has zero variance, the tool now displays the correct error message.

  • The logging messages displayed by check_flag_column now indicate the partition if different flag columns were used for training and evaluating the model.

  • Miscellaneous minor bug fixes in the notebooks.

Published by aloukina over 7 years ago

RSMTool - Version 6.0.1

This is a bugfix release.

  • The "System Information" section of the reports now uses pkg_resources instead of pip to get the list of installed packages since pip disallows the use of its internal API starting with v10.
  • Fix incorrect formatting in the documentation.
  • Update ipython and notebook package versions in order to address an incompatibility issue with the latest version of the tornado web server that affects interactive use of ipython notebook but not the report generation itself.
  • Updated the description of the marginal/partial correlation plot in the report.

Published by desilinguist about 8 years ago

RSMTool - Version 6.0

What's new?

This is a major release. The entire code base has been fully refactored to use a much more object-oriented design. This should make it much easier to make improvements and to add extensions. As a result, there have been significant changes to the RSMTool API (see link in documentation below for more details).

New features

New learners

  • New regressors from the latest SKLL release (v1.5.1) have been added to rsmtool.
  • rsmtool can now be used with both regressors and classifiers from SKLL, including classifiers that produce probabilistic output which can be used to produce expected values as predictions.

    See the SKLL documentation for the full list of learners.

Enhanced outputs

  • Users can now specify the file_format configuration option to save intermediate files in either tsv, csv, or xlsx format.
  • Users can specify a use_thumbnails configuration option that will embed clickable thumbnails in the HTML report, rather than full-sized images. Upon clicking the thumbnails, full-sized images will be displayed in a new window. This is particularly useful for larger reports with many images, improving both the readability and the loading speed of such reports.
  • Reports for rsmtool, rsmeval, and rsmsummarize now contain a new section containing links to intermediate files (intermediate_file_paths.ipynb) so that users can now easily inspect these files from the report itself.

New configuration options

  • Users can now specify features in the configuration file as a list. When providing a list of features, signs or transformations cannot be specified. This makes creating configuration files for simple experiments much easier and faster.
  • Users can now specify a skll_objective for tuning the SKLL learners used in their experiments.
  • Users can now specify a flag_column_test configuration option to use different flags for the test file and the training file.
  • Users can now set the boolean standardize_features option to false if they do not want the feature values standardized. Standardization remains the default.

New evaluations

  • rsmtool and rsmeval now compute disattenuated correlations if the data includes two human scores.

Code changes

  • New helper classes have been added to rsmtool, which allow easy reading, writing, and manipulation of multiple pandas data frames.
    • container.DataContainer(): A class to encapsulate multiple data frames.
    • reader.DataReader(): A class to read multiple tabular files into a DataContainer() object.
    • writer.DataWriter(): A class to write all data frames contained in a DataContainer() object to separate files, with a specified file extension.
  • The rsmtool module is now installable via pip, in addition to being installable with conda.
  • preprocessor.trim() can now take both numpy arrays and lists as inputs.

Bugfixes

  • Fixed warning in rsmcompare when computing summary evaluations.
  • Previously confusion matrices forced human scores to integers, while score distributions used the value "as is". Now both analyses use rounded human scores.
  • Length columns are now forced to numeric, if they are non-numeric.

Documentation

Published by desilinguist over 8 years ago

RSMTool - Version 5.7

What's new?

  • Update Python to v3.6, pandas to v0.22.0 and SKLL to v1.5. This required minor changes to the code and updates to some of the test files.
  • The conda installation command has changed. See the new command here.

Improvements

  • In addition to bar plots, the evaluation_by_group notebook now includes a table showing the main metrics for each subgroup.
  • When using the RSMTool API, it is now possible to specify a tolerance keyword argument for the trim method. Read more here.
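
To illustrate what a trimming tolerance does, here is a minimal sketch. It assumes that trimming clamps each value to the range [trim_min - tolerance, trim_max + tolerance]. The function trim_scores and its signature are hypothetical stand-ins, not the RSMTool API.

```python
def trim_scores(values, trim_min, trim_max, tolerance):
    """Clamp each value to [trim_min - tolerance, trim_max + tolerance].

    Hypothetical helper for illustration; not the RSMTool trim method.
    """
    lower = trim_min - tolerance
    upper = trim_max + tolerance
    return [min(max(float(v), lower), upper) for v in values]


# Predictions on a 1-6 scale with a tolerance of 0.5 (values are made up).
trimmed = trim_scores([0.2, 3.0, 6.9], trim_min=1, trim_max=6, tolerance=0.5)
print(trimmed)
```

A larger tolerance lets predictions drift further outside the nominal score range before they are clamped.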

Bugfixes

  • The differential feature functioning (DFF) plots are now correctly generated using preprocessed feature values. In the previous version, they incorrectly used raw feature values.
  • scikit-learn v0.19.0 included bugfixes to the computation of explained_variance_ in its PCA implementation. As a result, the results of PCA analyses no longer match those produced by previous versions of RSMTool and had to be changed.

Other minor changes

  • Updated the utility script update_skll_model.py to make it compatible with SKLL v1.5.
  • Minor updates to the documentation.

Published by desilinguist over 8 years ago

RSMTool - Version 5.6

This is an important release that has a critical bugfix as well as useful improvements.

Bugfixes

  • Fixed critical bug in computation of standardized mean differences. The denominator for SMDs should be using population standard deviations, not the ones computed over the subgroups themselves.
  • Added converters to the notebook header to allow correct treatment of candidate IDs with leading zeros.
  • Modified the test utility functions to catch discrepancies caused by missing leading zeros.
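
The leading-zero problem can be reproduced with pandas directly. The sketch below uses the converters argument of pandas.read_csv to keep an ID column as strings. The column name candidate and the data are assumptions made for the example, and the notebook header may apply converters differently.

```python
import io

import pandas as pd

# Made-up data: candidate IDs with leading zeros.
csv_text = "candidate,score\n00123,3\n00045,4\n"

# Default parsing infers integers, so the leading zeros are lost.
default_df = pd.read_csv(io.StringIO(csv_text))

# A converter forces the column to be read as strings, preserving the zeros.
converted_df = pd.read_csv(io.StringIO(csv_text), converters={"candidate": str})

print(default_df["candidate"].tolist())
print(converted_df["candidate"].tolist())
```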

Improvements

  • The tables generated by rsmsummarize are now saved in the same way as for other tools.
  • rsmsummarize now shows a table with standardized coefficients for all models.
  • The predictions for the post-processed training set are now also saved.
  • Added a new notebook that shows differential feature functioning (DFF) plots by subgroup. To use it, add dff_by_group to the general_section configuration option. Read more here.
  • The features that have not been used in the model are now excluded from the datasets before they are sent to SKLL for prediction. This makes the prediction step much faster for large datasets.
  • When testing whether the feature std. dev. in the training set is zero, the tolerance was previously set to 1e-06. This is not sufficient for features with very low values (these can result from an inverse transform of acoustic likelihoods, which are logs of very small values). The tolerance has now been tightened to 1e-07.
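
The effect of the tolerance on the zero-variance check can be sketched as follows. The helper has_zero_std is hypothetical, and the use of the population standard deviation and of math.isclose are assumptions. These notes do not say how RSMTool performs the comparison.

```python
import math
import statistics


def has_zero_std(values, tolerance):
    """Return True if the std. dev. is within `tolerance` of zero."""
    return math.isclose(statistics.pstdev(values), 0.0, abs_tol=tolerance)


# A feature with very small but genuinely varying values (std. dev. 5e-07).
tiny_feature = [0.0, 1e-06]

# With the looser tolerance the feature is wrongly treated as constant.
print(has_zero_std(tiny_feature, tolerance=1e-06))

# With the tighter tolerance it is recognized as varying.
print(has_zero_std(tiny_feature, tolerance=1e-07))
```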

Other Minor Changes

  • Update the utility script update_skll_model.py to allow it to be used with other tools.
  • Update tests and documentation.

Published by desilinguist almost 9 years ago

RSMTool - Version 5.5.2

This is primarily a bug fix release but it also has some improvements.

Bugfixes

  • The notebooks are fixed so that any plots are now shown in their assigned places (this was broken in v5.5.1 due to the underlying matplotlib dependency being upgraded to v2.0).

Improvements

  • The widths of the subgroup plots are now more intelligently determined. No more plots with really wide bars when there are only a few groups.
  • Many of the unnecessary warnings that popped up in the reports and on the terminal are now suppressed and handled in code where appropriate.

Published by desilinguist over 9 years ago

RSMTool - Version 5.5.1

This is a minor bugfix release.

What's new?

  • Update SKLL requirement to v1.3. This allows us to streamline the RSMTool conda recipe into a single recipe (using the MKL backend instead of OpenBLAS on macOS/Linux).
  • Update all other conda packages to their latest versions.
  • Minor fixes and updates to tests.

Published by desilinguist over 9 years ago

RSMTool - Version 5.5.0

This is a major release.

What's new?

  • New tool: rsmsummarize which can summarize any number of rsmtool experiments and produce a summary report.
  • All input files can now be in any tabular format (CSV/TSV/XLS/XLSX). This is an improvement over previous releases where input files were required to be CSV files. For more details, see the documentation. This includes the feature description file, although the old JSON format is still supported for backwards compatibility (you will get a DeprecationWarning when using that format).
  • rsmtool now includes a new model, ScoreWeightedLR, which estimates feature coefficients using weighted least squares regression. The weights are inversely proportional to the total number of responses at each score level.
  • rsmtool now produces the feature sub-directory as part of its output for all experiments. Previously, this sub-directory was only produced for experiments with some form of feature selection.
  • rsmcompare now requires the user to specify a "comparison ID" instead of generating one automatically from the experiment IDs of the two experiments being compared.
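
The weighting idea behind ScoreWeightedLR can be sketched for a single feature as shown below. This is only an illustration of weighted least squares with weights of 1 / (number of responses at that score level). The function names are hypothetical, the data are made up, and the actual model may normalize the weights or handle multiple features differently.

```python
from collections import Counter


def score_level_weights(scores):
    """Weight each response by the inverse of its score level's frequency."""
    counts = Counter(scores)
    return [1.0 / counts[s] for s in scores]


def weighted_simple_regression(x, y, weights):
    """Weighted least squares fit of y = intercept + slope * x."""
    total = sum(weights)
    mean_x = sum(w * a for w, a in zip(weights, x)) / total
    mean_y = sum(w * b for w, b in zip(weights, y)) / total
    sxy = sum(w * (a - mean_x) * (b - mean_y) for w, a, b in zip(weights, x, y))
    sxx = sum(w * (a - mean_x) ** 2 for w, a in zip(weights, x))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Score level 1 is over-represented, so each of its responses counts less.
scores = [1, 1, 1, 2, 3]
feature = [0.1, 0.2, 0.0, 1.1, 2.0]

weights = score_level_weights(scores)
intercept, slope = weighted_simple_regression(feature, scores, weights)
print(weights)
print(intercept, slope)
```

With these weights, each score level contributes the same total weight to the fit regardless of how many responses received that score.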

Improvements

  • Improved CSS for HTML report printing.
  • Several updates and fixes to documentation.
  • Fix errors in PCA computation when the number of components was smaller than the total number of features.
  • Use skll API to convert featureset to data frame instead of writing our own function.
  • Separate the file reading and processing functions in rsmpredict for more modularity.
  • Wrap longer labels on box plots automatically.
  • Update package dependencies to latest releases.
  • Increase report generation timeout to be 60 minutes instead of 10 minutes. This is useful for experiments with very large data files.
  • Fix bug that had system and human scores reversed in the confusion matrix.
  • Limit the length of experiment IDs where appropriate such that we don't encounter "filename is too long" OS errors.

Published by desilinguist over 9 years ago

RSMTool - Version 5.2.1

This is a minor release that fixes a bug in how some javascript was loaded in the Jupyter notebooks.

Published by desilinguist over 9 years ago

RSMTool - Version 5.2.0

This release has minor features and bug fixes.

  1. rsmcompare now includes extra checks to make sure the experiment paths and ids specified by the user actually exist.
  2. Factored out rsmcompare code from the header notebook and moved to comparison.py.
  3. Factored out the float formatting functions from the rsmtool/rsmcompare header notebooks and moved them to utils.py.
  4. Added new tests for comparison.py and the float formatting and highlighting functions in utils.py.
  5. Fixed the bug in rsmcompare which seemed to ignore zero scores in confusion matrices.
  6. Fixed a bug in rsmcompare that prevented the score distribution table from being displayed correctly if the score levels differed between the two models.

Published by desilinguist almost 10 years ago

RSMTool - Version 5.1.1

This is a minor bugfix release.

  1. Previously, if rsmpredict was given a model requiring a transformation that could yield Inf/NaN values for new data (e.g. sqrt(-1)), it would raise an error and terminate. Now, it simply excludes such responses and displays a warning.
  2. Updated various conda files to use newer versions of the ipython and notebook packages since there seem to have been some updates that broke older recipes and requirements files.
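
The behavior described for rsmpredict, excluding responses whose transformed values are not finite and warning instead of failing, might look roughly like the sketch below. The function transform_and_filter is hypothetical and is not the rsmpredict implementation.

```python
import math
import warnings


def transform_and_filter(values, transform):
    """Apply `transform` and keep only responses with finite results.

    Returns (index, transformed_value) pairs for the retained responses.
    """
    kept = []
    n_excluded = 0
    for index, value in enumerate(values):
        try:
            result = transform(value)
        except (ValueError, ZeroDivisionError, OverflowError):
            # Treat a failed transformation like a NaN result.
            result = math.nan
        if math.isfinite(result):
            kept.append((index, result))
        else:
            n_excluded += 1
    if n_excluded:
        warnings.warn(
            f"Excluded {n_excluded} response(s) with non-finite transformed values."
        )
    return kept


# The second response cannot be square-rooted and is dropped with a warning.
kept = transform_and_filter([4.0, -1.0, 9.0], math.sqrt)
print(kept)
```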

Published by desilinguist almost 10 years ago

RSMTool - Version 5.1.0

This is a major release.

  1. Completely overhauled the documentation. Instead of relying on a collection of loosely organized markdown files, the documentation is much more cohesive and hosted on readthedocs. It now includes a clear introduction to what RSMTool is as well as tutorials.
  2. The RSMTool API is now richer and explicitly documented.
  3. rsmcompare can now compare two rsmeval experiments as well as an rsmtool experiment to an rsmeval experiment.
  4. Code coverage is now automatically computed as part of CI testing.
  5. Expected warnings are now suppressed when running the tests.
  6. Fixed several stylistic issues in the codebase raised by pep8 and pyflakes.

Published by desilinguist almost 10 years ago

RSMTool - Version 5.0.2

Published by desilinguist almost 10 years ago

RSMTool - Version 5.0.1

This is a hotfix release that fixes the following regression:

  • rsmcompare no longer accidentally swaps the old and the new experiments.

Published by desilinguist almost 10 years ago

RSMTool - Version 5.0.0

New features

  • Evaluations on the test set now include R2 and RMSE.
  • The rsmtool reports now include model fit parameters (R2 and adjusted R2) for the training set.
  • It is now possible to exclude candidates with fewer than X responses from model training/evaluation.
  • rsmcompare can now handle experiments which used SKLL models.
  • rsmcompare now includes a notebook for consistency between human raters (thanks @bndgyawali!)
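
For reference, the metrics named above can be computed as in the sketch below, using the standard textbook definitions of R2, adjusted R2, and RMSE. The function names and data are hypothetical, and these notes do not confirm that RSMTool computes the metrics in exactly this way.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """R2 penalized for the number of features in the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Made-up human scores and system predictions.
human = [1, 2, 3, 4]
predicted = [1.5, 2.0, 3.0, 3.5]

r2 = r_squared(human, predicted)
print(r2, adjusted_r_squared(r2, n_samples=4, n_features=1), rmse(human, predicted))
```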

Bug fixes

  • Correct handling of repeated feature names in the feature .json file.
  • Correct printing of feature coefficients for SKLL models.
  • Correct handling of quoted boolean values in config .json file.
  • Fixed rounding and highlighting in feature correlation table.
  • And several dozen more.

Published by desilinguist almost 10 years ago

RSMTool - Version 4.6.0

This is the first GitHub release for RSMTool. Before being open-sourced, RSMTool was an internal research project at the Educational Testing Service.

Published by desilinguist about 10 years ago