https://github.com/lzw108/fmd

This is a continuous project on Financial Misinformation Detection (FMD).

https://github.com/lzw108/fmd

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.4%) to scientific vocabulary
Last synced: 8 months ago · JSON representation

Repository

This is a continuous project on Financial Misinformation Detection (FMD).

Basic Info
  • Host: GitHub
  • Owner: lzw108
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 1.89 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

Financial Misinformation Detection

This work also supported Financial Misinformation Detection (FMD) challenge at COLING 2025

Paper arXiv

News

📢 Jan. 20, 2025 Our FMDLlama paper has been accepted by WWW 2025 as a short paper.

📢 Jan. 20, 2025 The Financial Misinformation Detection Challenge has successfully wrapped up at COLING 2025. Learn more about the challenge.

📢 Sep. 26, 2024 New preprint paper related to this work: "FMDLlama: Financial Misinformation Detection based on Large Language Models" at arXiv.

Datasets

  • Practice data: Link
  • Complete train data: Link
  • Test data: TBD

Usage

Data preprocess

You can follow the practicedatapreprocess.ipynb file to get instruction train/val/test data in ./data/practicedata/instructdata/ path. The default is an instruction example, change accordingly as need.

Convert data format

```python

train

python src/converttoconvdata.py --origdata ./data/practicedata/instructdata/FMDtrain.json --writedata ./data/practicedata/instructdata/train.json --dataset_name fmd

val

python src/converttoconvdata.py --origdata ./data/practicedata/instructdata/FMDval.json --writedata ./data/practicedata/instructdata/val.json --dataset_name fmd ```

The commands above are to convert the data into dialogue data format for LLMs training. The current format is used for the LLaMA2 series (i.e. "Human": "sentence", "Assistant": "sentence" ). If you need to switch to other LLMs, please make the corresponding modifications.

Fine-tune

python bash ./src/run_sft.sh

Inference

python bash src/run_inference.sh

Evaluation

Follow the evaluation.ipynb file to get F1, rouge, bertscore, and final score.

License

This project is licensed under [MIT]. Please find more details in the MIT file.

Citation

@article{liu2024fmdllama, title={FMDLlama: Financial Misinformation Detection based on Large Language Models}, author={Liu, Zhiwei and Zhang, Xin and Yang, Kailai and Xie, Qianqian and Huang, Jimin and Ananiadou, Sophia}, journal={arXiv preprint arXiv:2409.16452}, year={2024} }

GitHub Events

Total
  • Push event: 4
Last Year
  • Push event: 4

Dependencies

requirements.txt pypi
  • bert-score *
  • datasets *
  • deepspeed *
  • flash-attn *
  • gradio_client *
  • peft *
  • rouge_score *
  • sentencepiece *
  • textblob *
  • torch *
  • transformers *
  • wandb *