aises-virtue-ethics

Teach virtue ethics to AI, an alternative approach to ConstitutionalAI. Final project for the AISES course, fall 2025

https://github.com/carlomartinucci/aises-virtue-ethics

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
○
DOI references
○
Academic publication links
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (8.6%) to scientific vocabulary

Last synced: 9 months ago · JSON representation ·

Repository

Teach virtue ethics to AI, an alternative approach to ConstitutionalAI. Final project for the AISES course, fall 2025

Basic Info

Host: GitHub
Owner: carlomartinucci
Language: Python
Default Branch: main
Size: 2.98 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme Citation

aises-virtue-ethics

Teach virtue ethics to LLMs (and compare their moral decisions).

Final project for AISES course, spring 2025. Read document

TL;DR:

This repo is a training and evaluation framework to further alignment in a different way from the traditional RLHF or Constitutional AI approach. The idea is that LLMs already have a notion of morality, so we can try to improve a model's behaviour eliciting its ethical vision and teaching it to adhere to it, in a “virtue ethics” fashion.

The training part consists in fine-tuning a model with conversations generated using the model itself like this: - SYSTEM: you are a helpful assistant - USER: given the following scenario, what would you do? [SCENARIO] - ASSISTANT: [response] - USER: evaluate the response. Is it ethical? - ASSISTANT: [response]

The scenarios are 70 case studies taken from https://ethicsunwrapped.utexas.edu/case-studies.

The evaluation is done with two methods:

Against the https://huggingface.co/datasets/hendrycks/ethics benchmark, for commonsense, deontology, justice and utilitarianism subsets.
Asking a more powerful model to rank from 1 to 5 the “what would you do” responses to the scenarios.

First results are inconclusive on both evaluations: the differences between the two models are not statistically significant. This leads me to think that the fine-tuning didn’t update the model enough.

The immediate next steps are aimed at trying to produce a statistically different model, by employing different fine-tuning methods and adding more ethical scenarios.

Why

Traditional Reinforcement Learning with Human Feedback relies on labelling from humans to steer the model's behaviour, which means that alongside "correct" human preferences, the model will also learn to mimic the evaluator's biases. Constitutional AI uses a set of predetermined rules, and first asks the model to revise its own responses according to the rules, then it still uses Reinforcement Learning, but instead of using human feedback, it uses AI-generated feedback based on the said rules. In both cases the approach is top-down: we try to steer them towards the behaviours that we prefer, either explicitly or with rules.

The idea of a virtue ethics approach is that sufficiently large models already have a pretty comprehensive understanding of morality as an abstract topic, but they're not actually applying it to themselves, so we could try a bottom-up approach where we allow them to reflect on their behaviour and adjust it according to their own understanding of morality.

This is in some sense similar to how we raise children: we tell them what to do and what to avoid (RLHF), and we give them rules to follow (Constitutional AI), but we also teach them to think and reflect and act in a virtuous way according to their own well formed conscience. Now, LLMs are not children and fully trusting an LLM to develop its own ethical behaviour would be ill-advised, to say the least. Still, it seems a promising direction to explore.

How to use this repo

See the document for a walkthrough of the training and test processes.

Browsing the repository, each script has its documentation and instructions to run it. They are:

what_would_you_do.py asks a model its answer to the scenarios
is_the_answer_ethical.py asks a model an evaluation of the answers
create_sft_jsonl.py uses the scenarios, the answers and the evaluations to create a supervised fine-tuning jsonl file, to be used to fine-tune the model
eval_ethics_openai.py runs the ETHICS benchmark against a model
rate_answers.py asks a model to give a 1-5 rating to the answers of a scenario

Inside the rate_answers and eval-ethics folders there are two scripts that generates some graph bars based on the results produced by the other scripts.

References

See CITATIONS.md for dataset, paper and case study references.

Owner

Name: Carlo Martinucci
Login: carlomartinucci
Kind: user
Location: Padova
Company: @BendingSpoons

Repositories: 41
Profile: https://github.com/carlomartinucci

Think more.

Citation (CITATIONS.md)

# Citations

## ETHICS Dataset

This project uses the [ETHICS dataset](https://huggingface.co/datasets/hendrycks/ethics), created by Dan Hendrycks and collaborators:

```bibtex
@article{hendrycks2021ethics,
  title={Aligning AI With Shared Human Values},
  author={Dan Hendrycks and Collin Burns and Steven Basart and Andrew Critch and Jerry Li and Dawn Song and Jacob Steinhardt},
  journal={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2021}
}
```

## Ethics Unwrapped

The `scenario/ethicsunwrapped` folder includes case studies from [Ethics Unwrapped](https://ethicsunwrapped.utexas.edu/case-study).

## Murdough Center for Engineering Professionalism

The `scenario/murdoughcenter` folder includes case studies from the [Murdough Center for Engineering Professionalism](https://www.depts.ttu.edu/murdoughcenter/products/cases.php)

## Markkula Center for Applied Ethics

The `scenario/markkula` folder includes case studies from the [Markkula Center for Applied Ethics](https://www.scu.edu/ethics/ethics-resources/ethics-cases/)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

aises-virtue-ethics

Science Score: 44.0%

Repository

Basic Info

Statistics

Metadata Files

README.md

aises-virtue-ethics

TL;DR:

Why

How to use this repo

References

Owner

Citation (CITATIONS.md)

GitHub Events

Total

Last Year