https://github.com/anand-kamble/data-filtering

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.9%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: anand-kamble
  • Language: Python
  • Default Branch: main
  • Size: 346 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 2
  • Releases: 0
Created about 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme Changelog

Readme.md

Use of LLM to Improve Plane Maintenance

Data Processor

This project uses a large language model (LLM) to improve the efficiency and accuracy of plane maintenance by processing and analyzing the relevant data.

Parameters

  • config (data_config or None): The configuration object containing parameters for data processing. If None, a CopaError will be raised.
  • base_path (str, optional): The base path where the data is located relative to the run.sh script. Defaults to an empty string, which means the data is located in the current directory.
  • test_mode (bool, optional): If True, the DataProcessor will run in test mode. Defaults to False.
  • test_rows (int, optional): The number of rows to process in test mode. Defaults to 100.
  • drop_duplicates (bool, optional): If True, duplicate rows will be dropped during data processing. Defaults to False.
  • no_cache (bool, optional): If True, caching will be disabled during data processing. Defaults to False.
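
As an illustration of the parameters above, here is a minimal sketch of how their documented defaults and the `config` check could be modeled. It is not the project's actual code: `ProcessorOptions` is a hypothetical name, and the `CopaError` class defined here is only a stand-in for the project's own exception.

```python
from dataclasses import dataclass
from typing import Any, Optional


class CopaError(Exception):
    """Stand-in for the project's CopaError; the real class may differ."""


@dataclass
class ProcessorOptions:
    """Hypothetical container mirroring the documented parameters and defaults."""

    config: Optional[Any] = None   # documented type: data_config or None
    base_path: str = ""            # data location relative to run.sh
    test_mode: bool = False        # process only a subset of rows
    test_rows: int = 100           # number of rows used in test mode
    drop_duplicates: bool = False  # drop duplicate rows while processing
    no_cache: bool = False         # disable caching while processing

    def __post_init__(self) -> None:
        # The README states that a missing config raises CopaError.
        if self.config is None:
            raise CopaError("config must not be None")


# Assumed example value; the real data_config object is not shown in the README.
opts = ProcessorOptions(config={"source": "example.csv"}, test_mode=True)
print(opts.test_rows)  # 100
```

Constructing `ProcessorOptions()` without a `config` raises `CopaError` in this sketch, matching the behavior described for the `config` parameter.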

Installation

To set up the Python virtual environment for this project, you'll need Poetry. Poetry is a tool for dependency management and packaging in Python. Follow the steps below to install Poetry and run the project:

Step 1: Install Poetry

Installation instructions are available in the official Poetry documentation.

Step 2: Set Up the Environment

Once Poetry is installed, follow these steps to generate the filtered data:

  1. Install Dependencies: `poetry install`

  2. Activate the Virtual Environment: `poetry shell`

  3. Install Additional Tools (First-Time Setup Only): Required only for development.
    You can skip this step if you do not plan to contribute to the repository. `pip install -U commitizen`

  4. Run the Data Processor: `./run.sh` Alternatively, if you have exited the Poetry shell, you can run: `poetry run ./run.sh`

Default Execution

If no arguments are provided, the script will use the default parameters specified in the script.

Custom Execution

You can provide custom parameters when running the script. For example: `./run.sh --config "custom_config.json" --test_mode --test_rows 500`

This flexibility allows you to tailor the data processing to your specific needs.
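
The README does not show how `run.sh` forwards its arguments, so the following is only a sketch of how the three flags from the example above could be parsed on the Python side with `argparse`. The function name `build_parser` and the help texts are assumptions; only the flag names and the `test_rows` default of 100 come from this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the flags shown in the README example."""
    parser = argparse.ArgumentParser(description="Data processor options (sketch)")
    parser.add_argument("--config", default=None,
                        help="path to a configuration file")
    parser.add_argument("--test_mode", action="store_true",
                        help="run in test mode")
    parser.add_argument("--test_rows", type=int, default=100,
                        help="number of rows to process in test mode")
    return parser


# Parse the same arguments as the README's custom-execution example.
args = build_parser().parse_args(
    ["--config", "custom_config.json", "--test_mode", "--test_rows", "500"]
)
print(args.config, args.test_mode, args.test_rows)  # custom_config.json True 500
```

With no arguments, this sketch falls back to `config=None`, `test_mode=False`, and `test_rows=100`.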

Owner

  • Name: Anand Kamble
  • Login: anand-kamble
  • Kind: user
  • Location: Tallahassee, FL
  • Company: Florida State University

Graduate student in FSU Scientific Computing

GitHub Events

Total
  • Public event: 1
Last Year
  • Public event: 1

Dependencies

pyproject.toml pypi
  • jupyter ^1.0.0 develop
  • dfply ^0.3.3
  • fletcher ^0.7.2
  • halo ^0.0.31
  • modin ^0.29.0
  • pandas ^2.2.2
  • python >=3.11, < 3.13
  • static-frame ^2.6.0
  • tidypandas ^0.3.0
  • ydata-profiling ^4.7.0