synthpred

A Julia package for synthetic data analysis, advanced imputation (ARIMA, RNN), AutoML, and ensemble modeling.

https://github.com/tymill/synthpred

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

arima automl ensemble flux imputation julia machine-learning synthetic-data time-series
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: TyMill
  • License: mit
  • Language: Julia
  • Default Branch: main
  • Homepage:
  • Size: 308 KB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 2
Created 11 months ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

SynthPred.jl

SynthPred.jl is a Julia package for synthetic data analysis, advanced imputation (ARIMA, RNN), AutoML, and ensemble modeling.


🚀 Features

  • 🔍 Descriptive statistics and missing data reporting
  • 🧼 Simple and advanced imputation:
    • Mean, median, mode
    • Forward/backward fill
    • Gaussian distribution sampling
    • Time series-based: ARIMA
    • Sequence learning-based: RNN (Flux.jl)
  • 🤖 AutoML for classification (MLJ.jl-based)
  • ⚖️ Blending top-performing models via ensembling
  • 📊 Predictions on new data
  • 📑 JSON/CSV imputation reports
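
To make the simpler strategies in the list above concrete, here is a minimal sketch of mean, forward-fill, and Gaussian-sampling imputation on a single column. It uses only the Julia standard library. The function names (`impute_mean`, `impute_ffill`, `impute_gaussian`) are hypothetical and chosen for illustration; they are not taken from SynthPred.jl, whose own implementation may differ.

```julia
using Statistics, Random

# Hypothetical helpers for illustration only; not SynthPred.jl's API.

# Replace each `missing` with the mean of the observed values.
function impute_mean(v::AbstractVector)
    observed = collect(skipmissing(v))
    isempty(observed) && return collect(v)      # nothing to estimate from
    m = mean(observed)
    return [ismissing(x) ? m : float(x) for x in v]
end

# Carry the last observed value forward; leading missings stay missing.
function impute_ffill(v::AbstractVector)
    out = collect(v)                            # copy, same element type
    for i in 2:length(out)
        if ismissing(out[i])
            out[i] = out[i - 1]
        end
    end
    return out
end

# Draw replacements from a normal distribution fitted to the observed values.
function impute_gaussian(v::AbstractVector; rng = Random.default_rng())
    observed = collect(skipmissing(v))
    length(observed) < 2 && return collect(v)   # std needs at least two points
    μ, σ = mean(observed), std(observed)
    return [ismissing(x) ? μ + σ * randn(rng) : float(x) for x in v]
end

x = [1.0, missing, 3.0, missing]
impute_mean(x)    # [1.0, 2.0, 3.0, 2.0]
impute_ffill(x)   # [1.0, 1.0, 3.0, 3.0]
```

Mean imputation keeps the column average but shrinks its variance, while Gaussian sampling preserves the spread at the cost of adding noise, so the right choice depends on what the downstream model is sensitive to.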

📦 Installation

```julia
using Pkg
Pkg.add(url="https://github.com/TyMill/SynthPred.jl")
```


🧪 Quick Example

```julia
using SynthPred
using CSV, DataFrames

# Load training data
df = CSV.read("data/example.csv", DataFrame)

# Explore data
SynthPred.Exploration.describe_data(df)

# Impute missing values (e.g. RNN strategy)
dfclean, report = SynthPred.Imputer.imputeadvanced(df, "rnn", threshold=0.1)
SynthPred.Imputer.saveimputationreport(report, "reports/imputation_report.json")

# Run AutoML pipeline
topmodels, scores = SynthPred.AutoML.runautoml(dfclean, :target)
X = select(dfclean, Not(:target))
y = dfclean[:, :target]
ensemble = SynthPred.AutoML.blendtopmodels(topmodels, X, y)

# Predict on new data
Xnew = CSV.read("data/new_data.csv", DataFrame)
preds = SynthPred.AutoML.predictensemble(ensemble, Xnew)
println(preds)
```
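
The README does not say how the top models are blended. One common approach is soft voting, where each model's class probabilities are combined in a weighted average and the most probable class is chosen. The sketch below assumes that approach. `blend_probabilities` and `predict_labels` are hypothetical names and are not part of SynthPred.jl.

```julia
# Hypothetical helpers for illustration only; not SynthPred.jl's API.

# Blend per-model class probabilities (rows = samples, columns = classes)
# with a weighted average. Weights could be validation scores, for example.
function blend_probabilities(probs::Vector{<:AbstractMatrix},
                             weights::AbstractVector{<:Real})
    length(probs) == length(weights) ||
        throw(ArgumentError("one weight per model is required"))
    w = weights ./ sum(weights)                 # normalise weights to sum to 1
    return sum(w[i] .* probs[i] for i in eachindex(probs))
end

# Pick the most probable class for each sample.
predict_labels(blended::AbstractMatrix, classes::AbstractVector) =
    [classes[argmax(row)] for row in eachrow(blended)]

# Two models, two samples, two classes.
p1 = [0.9 0.1; 0.4 0.6]
p2 = [0.6 0.4; 0.2 0.8]
blended = blend_probabilities([p1, p2], [0.75, 0.25])
predict_labels(blended, ["no", "yes"])   # ["no", "yes"]
```

Weighting by validation score lets stronger models dominate without discarding the weaker ones entirely; equal weights reduce this to a plain average.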


📚 Documentation

Full documentation is available at: https://your-username.github.io/SynthPred.jl


🧪 Project Structure

SynthPred/
├── Project.toml
├── src/
│   ├── SynthPred.jl
│   ├── Exploration.jl
│   ├── Imputer.jl
│   └── AutoML.jl
├── data/
│   ├── example.csv
│   └── new_data.csv
├── reports/
│   └── imputation_report.json
├── docs/
│   └── src/index.md
├── test/
│   └── runtests.jl
└── main.jl


📌 Roadmap

  • [x] Core modules: Exploration, Imputer, AutoML
  • [x] ARIMA and RNN-based imputations
  • [x] AutoML + model blending with MLJ.jl
  • [x] Imputation reports (CSV/JSON)
  • [x] Documentation (Documenter.jl + GitHub Pages)
  • [ ] Exporting trained models (JLD2, BSON)
  • [ ] Web GUI with Pluto.jl or Dash.jl
  • [ ] Integration with JuliaHub and Zenodo DOI

🤝 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss your proposal.


📜 License

MIT License © 2025 Tymoteusz Miller


📬 Contact

📧 me@tymoteuszmiller.dev


Built with ❤️ in Julia for real-world ML and scientific discovery.

Owner

  • Login: TyMill
  • Kind: user

Citation (CITATION.bib)

@software{miller2025synthpred,
  author       = {Tymoteusz Miller},
  title        = {SynthPred.jl: A Julia Library for Synthetic Data Analysis, Advanced Imputation, and Ensemble AutoML},
  year         = {2025},
  publisher    = {GitHub},
  journal      = {SoftwareX (planned)},
  url          = {https://github.com/TyMill/SynthPred.jl},
  version      = {v0.1.0},
  doi          = {10.5281/zenodo.15090893}
}

GitHub Events

Total
  • Release event: 2
  • Watch event: 2
  • Push event: 42
  • Pull request event: 2
  • Fork event: 2
  • Create event: 3
Last Year
  • Release event: 2
  • Watch event: 2
  • Push event: 42
  • Pull request event: 2
  • Fork event: 2
  • Create event: 3

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 46
  • Total Committers: 1
  • Avg Commits per committer: 46.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 46
  • Committers: 1
  • Avg Commits per committer: 46.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
TyMill 8****l 46

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • TyMill (2)
Top Labels
Issue Labels
Pull Request Labels