https://github.com/camfort/camfort-ai

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 1 of 1 committers (100.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (15.1%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: camfort
Language: Python
Default Branch: master
Size: 26.4 KB

Statistics

Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Releases: 0

Created over 3 years ago · Last pushed over 3 years ago

Metadata Files

Readme

Experiments with OpenAI, CamFort and Fortran source code

Installation requirements

pip install -r requirements.txt

Docker alternative

./build.sh generates image openai:python3.10 locally from Dockerfile.openai-python
./run.sh [...] runs command [...] within the Docker container.

Getting an OpenAI API key

Sign up at https://openai.com/api/. You should automatically receive US$18 in credit valid for 3 months (as of this writing), which should be sufficient for testing. You can find your API key here: https://beta.openai.com/account/api-keys.

The key is a long string beginning with sk-. Put the OpenAI API key into the env var OPENAI_API_KEY or pass it to the scripts with the --api-key option.
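As an illustration of the two ways of supplying the key, here is a minimal sketch of how a script might resolve it. This is not the repository's code: the helper name resolve_api_key is hypothetical, the variable name OPENAI_API_KEY follows the spelling used in the finetuning section of this README, and letting the command-line option take precedence over the environment is an assumption.

```python
import argparse
import os


def resolve_api_key(argv=None, env=None):
    """Return the API key from --api-key if given, else from the environment.

    Hypothetical helper: precedence of the option over the env var is assumed.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--api-key", default=None)
    # parse_known_args ignores any other options the real scripts may accept
    args, _ = parser.parse_known_args(argv)
    key = args.api_key or env.get("OPENAI_API_KEY")
    if not key:
        raise SystemExit("No API key: pass --api-key or set OPENAI_API_KEY")
    return key
```

Passing the key on the command line is convenient for one-off runs, while the environment variable keeps it out of shell history.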

Function / subroutine search

Compiling a database of vectors

The script get_vectors.py builds a sqlite3 database of information about the functions and subroutines in Fortran files for later look-up.

It expects JSON describing function/subroutine locations, one object per line, either on stdin or in a file given with the -f option. Each line looks like this:

{"name": "function_name", "path": "path/to/fortran/file.f90", "firstLine": 10, "lastLine": 25}
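To make the record format concrete, the following sketch reads such lines and pulls out the source text each record points at. It is not taken from get_vectors.py: the function names are hypothetical, and treating firstLine and lastLine as 1-based, inclusive line numbers is an assumption.

```python
import json
from pathlib import Path


def read_locations(lines):
    """Parse one JSON object per non-empty line into a list of dicts."""
    return [json.loads(line) for line in lines if line.strip()]


def extract_source(entry, root="."):
    """Return the text of lines firstLine..lastLine of the referenced file.

    Assumes 1-based, inclusive line numbers (not confirmed by the README).
    """
    text = (Path(root) / entry["path"]).read_text(encoding="utf-8", errors="replace")
    lines = text.splitlines()
    return "\n".join(lines[entry["firstLine"] - 1 : entry["lastLine"]])
```

With the sample record above, extract_source would return lines 10 through 25 of path/to/fortran/file.f90.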

Example runs

fortran-src --dump-funs-and-subs src/ | ./run.sh python get_vectors.py --api-key "sk-..."
fortran-src --dump-funs-and-subs file.f90 | ./run.sh python get_vectors.py --database mydatabase.db --api-key "sk-..."

Searching the database

The script search_vectors.py queries the sqlite3 database with a free-text search query and returns the functions or subroutines that most closely match the description.

Example runs

./run.sh python search_vectors.py --api-key "sk-..." --database mydatabase.db calculate wet-bulb temperatures
./run.sh python search_vectors.py -n 10 precipitation in clouds
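The README does not say how matches are ranked. A common approach to searching a database of vectors is to embed the query and rank the stored vectors by cosine similarity, and the sketch below shows that idea only. The function names, the in-memory list of (name, vector) pairs, and the default of 5 results are all assumptions. The actual script may work differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vec, stored, n=5):
    """Rank (name, vector) pairs by similarity to query_vec; return the best n."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in stored]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:n]
```

In a real run the query vector would come from the same embedding model used to build the database, so that the two sets of vectors are comparable.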

OpenAI transcripts and finetuning

This section discusses the code and examples found in the openai-transcripts subdirectory.

Interactive OpenAI

OpenAI can be used interactively at https://beta.openai.com/playground. Typical settings include:

model: text-davinci-003 (most powerful) or code-davinci-002 (programming oriented).
temperature: 0, for the most deterministic output
maximum length: adjust as needed for longer output sequences (256 is not a bad start)
stop sequence: I often create one to stop the AI from going on and on. I use either !=end or """, lately.

File units-transcript1.txt contains a sample transcript from a session of teaching OpenAI about units. In between text explanations are series of examples. Usually, the final example in each series is a result of prompting the AI to fill in the 'output' automatically. Then after having built up some knowledge, I move on to the next concept.

It is possible to provide a series of prompt/completion data entries to 'finetune' the OpenAI model. Such finetuned models will appear in the list of models in the playground, or can be invoked from the programmatic API.

Converting to JSONL

Finetuning requires its input in JSONL format, which is just a text file in which each line contains an independent JSON object. A sample script has been provided to convert from a simple text format to JSONL.

python3 txt-to-jsonl.py < finetune-examples.txt > finetune.jsonl

The expected text input is a bit verbose and is adapted from the units transcript. It looks like this:

Input: """
<partially completed code>
"""
Output: """
<fully completed code>
"""
###
[...]

The output:

{"prompt": "<partially completed code>", "completion": "<fully completed code>"}
[...]
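The conversion can be sketched as follows. This is an illustration rather than the contents of txt-to-jsonl.py: the function name and the regular expression are assumptions, and the sketch would break if the code inside a record itself contained a line consisting of """.

```python
import json
import re

# One record: Input: """ ... """ followed by Output: """ ... """.
# The ### separator lines between records are simply skipped over.
RECORD = re.compile(
    r'Input: """\n(.*?)\n"""\s*Output: """\n(.*?)\n"""', re.DOTALL
)


def txt_to_jsonl(text):
    """Convert Input/Output records into JSONL (one JSON object per line)."""
    out = []
    for prompt, completion in RECORD.findall(text):
        out.append(json.dumps({"prompt": prompt, "completion": completion}))
    return "\n".join(out)
```

Because json.dumps escapes embedded newlines, multi-line code still ends up on a single line of the output file, which is what the JSONL format requires.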

Running the finetune process

Install the openai pip package and ensure your OPENAI_API_KEY env var is set with your key. You can then invoke the finetuning functionality like so:

openai api fine_tunes.create -t openai-transcripts/finetune.jsonl -m davinci

This will put you on the queue and eventually charge your account based on the number of tokens. It will tell you the cost before charging, so you can cancel if you like.

Running the model

The generated finetune model will then be available in the playground or on the command line (with OPENAI_API_KEY env var set):

openai api completions.create -m davinci:ft-personal-xx-yy-zz -p '<partially completed code for the query>'

where davinci:ft-personal-xx-yy-zz is the name of the created model.

Owner

Name: camfort
Login: camfort
Kind: organization

Website: http://camfort.github.io
Repositories: 18
Profile: https://github.com/camfort

Tooling for the static analysis and verification of Fortran code (joint project between the University of Kent, University of Cambridge, and Bloomberg LP)

Committers

Last synced: 12 months ago

All Time

Total Commits: 17
Total Committers: 1
Avg Commits per committer: 17.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Matthew Danish	m**5@c**k	17

Committer Domains (Top 20 + Academic)

cam.ac.uk: 1

Issues and Pull Requests

Last synced: 12 months ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
