https://github.com/eusdancerdev/statflow
statflow – A versatile statistical toolkit for Python, featuring core statistical methods, time series analysis, signal processing, and climatology tools.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.9%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
statflow
statflow is a comprehensive Python toolkit for statistical analysis, time series processing, and climatological data analysis. Built with modern scientific computing standards, it provides robust tools for statistical operations, signal processing, and specialised climatology workflows. The package emphasises professional-grade statistical computing with comprehensive type annotations, efficient algorithms, and extensive climatological indicators.
Features
Core Statistical Analysis:
- Advanced time series analysis with periodic statistics and trend detection
- Statistical hypothesis testing (Z-tests, Chi-squared tests)
- Moving operations (moving averages, window sums) for multi-dimensional data
- Comprehensive interpolation methods (polynomial, spline, linear) for NumPy, pandas, and xarray
- Signal processing with filtering (low-pass, high-pass, band-pass) and whitening techniques
- Regression analysis tools and approximation techniques
Climatological Analysis:
- Climate indicator calculations (WSDI, SU, CSU, FD, TN, RR, CWD, HWD)
- Periodic climatological statistics with multi-frequency support (hourly, daily, monthly, seasonal, yearly)
- Representative series generation including Hourly Design Year (HDY) following ISO 15927-4:2005
- Simple bias correction techniques with absolute and relative delta methods
- Comprehensive meteorological variable calculations (heat index, wind chill, dew point, specific humidity)
- Bioclimatic variable computation (19 standard bioclimatic indicators)
Advanced Data Processing:
- Multi-format data support (pandas DataFrames, xarray Datasets/DataArrays, NumPy arrays)
- Cumulative data decomposition and time series transformation
- Consecutive occurrence analysis for extreme event detection
- Autocorrelation analysis with optimised algorithms for large datasets
- Professional error handling with comprehensive input validation
Signal Processing & Filtering:
- Signal whitening techniques (classic, sklearn PCA, ZCA whitening)
- Multiple filtering approaches with frequency domain processing
- Fourier transform-based band-pass filtering methods
- Noise handling and signal enhancement tools
Installation
Prerequisites
- Python 3.10+: Required for modern type annotations and features
- Core Dependencies: NumPy, pandas, scipy, xarray for scientific computing
- Additional Dependencies: filewise, pygenutils (project packages)
For Regular Users
For regular users who want to use the package in their projects:
```bash
pip install statflow
```
This automatically installs statflow and all its dependencies from PyPI and GitHub repositories.
Package Updates
To stay up-to-date with the latest version of this package, simply run:
```bash
pip install --upgrade statflow
```
Development Setup
For Contributors and Developers
If you're planning to contribute to the project or work with the source code, follow these setup instructions:
Quick Setup (Recommended)
```bash
# Clone the repository
git clone https://github.com/EusDancerDev/statflow.git
cd statflow

# Install in editable mode with all dependencies
pip install -e .
```
Note: The -e flag installs the package in "editable" mode, meaning changes to the source code are immediately reflected without reinstalling.
This will automatically install all dependencies with version constraints.
Alternative Setup (Explicit Git Dependencies)
If you prefer to use the explicit development requirements file:
```bash
# Clone the repository
git clone https://github.com/EusDancerDev/statflow.git
cd statflow

# Install development dependencies from requirements-dev.txt
pip install -r requirements-dev.txt

# Install in editable mode
pip install -e .
```
This approach gives you the latest development versions of all interdependent packages for testing and development.
If you encounter import errors after cloning:
- For regular users: Run `pip install statflow` (all dependencies included)
- For developers: Run `pip install -e .[dev]` to include development dependencies
- Verify Python environment: Make sure you're using a compatible Python version (3.10+)
- Check scientific computing libraries: Ensure scipy, xarray, and other scientific packages are available
Verify Installation
To verify that your installation is working correctly, you can run this quick test:
```python
# Test script to verify installation
try:
    import statflow
    from filewise.general.introspection_utils import get_type_str
    from pygenutils.arrays_and_lists.data_manipulation import flatten_list
    from statflow.core.time_series import periodic_statistics

    print("✅ All imports successful!")
    print(f"✅ statflow version: {statflow.__version__}")
    print("✅ Installation is working correctly.")

except ImportError as e:
    print(f"❌ Import error: {e}")
    print("💡 For regular users: pip install statflow")
    print("💡 For developers: pip install -e .[dev]")
```
Implementation Notes
This project implements a dual-approach dependency management system:
- Production Dependencies: Version-constrained dependencies for PyPI compatibility
- Development Dependencies: Git-based dependencies for latest development versions
- Installation Methods:
  - Regular users: Simple `pip install statflow` with all dependencies included
  - Developers: `pip install -e .[dev]` for latest Git versions and development tools
- PyPI Compatibility: All packages can be published without Git dependency issues
- Development Flexibility: Contributors get access to latest versions for testing and development
Usage
Core Statistical Analysis
```python
from statflow.core.time_series import periodic_statistics, autocorrelate
from statflow.core.statistical_tests import z_test_two_means, chi_squared_test
import pandas as pd
import numpy as np

# Load your time series data
df = pd.read_csv("your_data.csv", parse_dates=['date'])

# Calculate periodic statistics
monthly_means = periodic_statistics(
    df,
    statistic="mean",
    freq="M",  # Monthly frequency
    drop_date_idx_col=False
)

# Perform hypothesis testing
sample1 = np.random.normal(10, 2, 100)
sample2 = np.random.normal(12, 2, 100)
z_stat, p_value, result = z_test_two_means(sample1, sample2)
print(f"Z-test result: {result}")

# Autocorrelation analysis
autocorr = autocorrelate(df['temperature'].values, twosided=False)
```
Signal Processing
```python
from statflow.core.signal_processing import low_pass_filter, band_pass1, signal_whitening
from statflow.core.moving_operations import moving_average, window_sum

# Apply signal filtering
filtered_signal = low_pass_filter(noisy_data, window_size=5)

# Band-pass filtering in frequency domain
band_filtered = band_pass1(
    original_signal,
    timestep=0.1,
    low_freq=0.1,
    high_freq=2.0
)

# Signal whitening for decorrelation
whitened_data = signal_whitening(signal_data, method="classic")

# Moving operations for time series
moving_avg = moving_average(time_series, N=7)  # 7-day moving average
cumulative_sum = window_sum(data_array, N=30)  # 30-point window sum
```
Interpolation Methods
```python
from statflow.core.interpolation_methods import interp_np, interp_pd, interp_xr, polynomial_fitting

# NumPy array interpolation
interpolated_np = interp_np(
    data_with_gaps,
    method='spline',
    order=3
)

# Pandas DataFrame interpolation
interpolated_pd = interp_pd(
    df_with_missing,
    method='polynomial',
    order=2
)

# Polynomial fitting with edge preservation
fitted_data = polynomial_fitting(
    y_values,
    poly_ord=3,
    fix_edges=True
)
```
Climatological Analysis
```python
from statflow.fields.climatology.indicators import calculate_WSDI, calculate_SU, calculate_hwd
from statflow.fields.climatology.periodic_climat_stats import climat_periodic_statistics
from statflow.fields.climatology.variables import (
    calculate_heat_index,
    calculate_dew_point,
    biovars,
)

# Climate indicators
# Warm Spell Duration Index
wsdi = calculate_WSDI(
    daily_tmax_data,
    tmax_threshold=30.0,
    min_consec_days=6
)

# Summer Days count
summer_days = calculate_SU(daily_tmax_data, tmax_threshold=25.0)

# Heat wave analysis
hwd_events, total_hwd = calculate_hwd(
    tmax_data, tmin_data,
    max_thresh=35.0,
    min_thresh=20.0,
    dates=date_index,
    min_days=3
)

# Climatological statistics
monthly_climat = climat_periodic_statistics(
    climate_data,
    statistic="mean",
    time_freq="monthly",
    keep_std_dates=True
)

# Meteorological calculations
heat_idx = calculate_heat_index(temperature, humidity, unit="celsius")
dew_point = calculate_dew_point(temperature, humidity)

# Bioclimatic variables (19 standard indicators)
bioclim_vars = biovars(
    tmax_monthly_climat,
    tmin_monthly_climat,
    precip_monthly_climat
)
```
Bias Correction
```python
from statflow.fields.climatology.simple_bias_correction import calculate_and_apply_deltas

# Simple bias correction between observed and reanalysis data
corrected_data = calculate_and_apply_deltas(
    observed_series=obs_data,
    reanalysis_series=reanalysis_data,
    time_freq="monthly",
    delta_type="absolute",  # or "relative"
    statistic="mean",
    preference="observed",  # treat observations as truth
    season_months=[12, 1, 2]  # for seasonal analysis
)
```
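To make the difference between absolute and relative deltas concrete, here is a minimal sketch of the idea in plain pandas. It is not statflow's implementation: the helper `apply_monthly_delta` and its arguments are hypothetical, it assumes both series share a `DatetimeIndex`, and it always corrects the reanalysis towards the observations.

```python
import numpy as np
import pandas as pd


def apply_monthly_delta(observed, reanalysis, delta_type="absolute"):
    """Shift or scale `reanalysis` so its monthly means match `observed`.

    Hypothetical helper for illustration only, not part of statflow.
    """
    obs_clim = observed.groupby(observed.index.month).mean()
    rea_clim = reanalysis.groupby(reanalysis.index.month).mean()
    months = reanalysis.index.month

    if delta_type == "absolute":
        # Additive delta: difference between the monthly means
        delta = (obs_clim - rea_clim).loc[months].to_numpy()
        return reanalysis + delta
    if delta_type == "relative":
        # Multiplicative delta: ratio of the monthly means
        delta = (obs_clim / rea_clim).loc[months].to_numpy()
        return reanalysis * delta
    raise ValueError(f"Unknown delta_type: {delta_type!r}")


dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
rng = np.random.default_rng(0)
observed = pd.Series(15.0 + rng.normal(0.0, 1.0, len(dates)), index=dates)
reanalysis = observed + 2.0  # synthetic reanalysis with a constant 2-degree warm bias

corrected = apply_monthly_delta(observed, reanalysis, delta_type="absolute")
```

Absolute deltas suit variables such as temperature, where an additive offset is meaningful. Relative deltas are usually preferred for non-negative quantities such as precipitation, because scaling cannot produce negative values.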
Representative Series (HDY)
```python
from statflow.fields.climatology.representative_series import calculate_HDY, hdy_interpolation

# Calculate Hourly Design Year following ISO 15927-4:2005
hdy_dataframe, selected_years = calculate_HDY(
    hourly_climate_df,
    varlist=['date', 'temperature', 'humidity', 'wind_speed'],
    varlist_primary=['date', 'temperature', 'humidity'],
    drop_new_idx_col=True
)

# Interpolate between months to smooth transitions
hdy_smooth, wind_dir_smooth = hdy_interpolation(
    hdy_dataframe,
    selected_years,
    previous_month_last_time_range="20:23",
    next_month_first_time_range="0:3",
    varlist_to_interpolate=['temperature', 'humidity'],
    polynomial_order=3
)
```
Project Structure
The package is organised as a comprehensive statistical analysis toolkit:
```text
statflow/
├── core/                              # Core statistical functionality
│   ├── approximation_techniques.py    # Curve fitting and approximation methods
│   ├── interpolation_methods.py       # Multi-format interpolation tools
│   ├── moving_operations.py           # Moving averages and window operations
│   ├── regressions.py                 # Regression analysis tools
│   ├── signal_processing.py           # Signal filtering and processing
│   ├── statistical_tests.py           # Hypothesis testing functions
│   └── time_series.py                 # Time series analysis and statistics
├── fields/                            # Domain-specific analysis modules
│   └── climatology/                   # Climate data analysis tools
│       ├── indicators.py              # Climate indicators (WSDI, SU, etc.)
│       ├── periodic_climat_stats.py   # Climatological statistics
│       ├── representative_series.py   # HDY and representative data
│       ├── simple_bias_correction.py  # Bias correction methods
│       └── variables.py               # Meteorological calculations
├── distributions/                     # Statistical distributions (future expansion)
├── utils/                             # Utility functions and helpers
│   └── helpers.py                     # Support functions for analysis
├── CHANGELOG.md                       # Detailed version history
├── VERSIONING.md                      # Version management documentation
└── README.md                          # Package documentation
```
Key Capabilities
1. Time Series Analysis
- Periodic Statistics: Calculate statistics across multiple time frequencies with robust datetime handling
- Cumulative Data Processing: Decompose cumulative time series into individual values
- Consecutive Analysis: Detect and count consecutive occurrences of extreme events
- Autocorrelation: Optimised autocorrelation analysis for pattern detection
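The cumulative decomposition and consecutive-occurrence ideas above can be sketched with NumPy alone. The helpers `decumulate` and `longest_run_above` below are hypothetical names for illustration, not statflow functions, and the run counter uses a plain loop for readability rather than an optimised approach.

```python
import numpy as np


def decumulate(cumulative):
    """Recover per-step values from a running total (hypothetical helper)."""
    cumulative = np.asarray(cumulative, dtype=float)
    # The first value is kept as is; later values are successive differences
    return np.diff(cumulative, prepend=0.0)


def longest_run_above(values, threshold):
    """Length of the longest run of consecutive values above `threshold`."""
    longest = current = 0
    for exceeds in np.asarray(values) > threshold:
        current = current + 1 if exceeds else 0
        longest = max(longest, current)
    return longest


cumulative_precip = [1.0, 3.0, 3.0, 7.5]
step_precip = decumulate(cumulative_precip)

tmax = [33.0, 36.0, 37.5, 36.2, 30.1, 35.5, 36.0]
heat_run = longest_run_above(tmax, 35.0)
```

Here `step_precip` holds the individual accumulation per step, and `heat_run` is the length of the longest spell with values above 35.0.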
2. Statistical Testing
- Hypothesis Tests: Z-tests for mean comparison, Chi-squared tests for independence
- Robust Validation: Comprehensive input validation and error handling
- Multiple Data Types: Support for NumPy arrays, pandas Series, and more
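As a reference point for what a two-sample Z-test computes, here is a minimal sketch using only NumPy and the standard library. The function `two_sample_z_test` is a hypothetical illustration, not statflow's API; it assumes large, independent samples and uses sample variances in place of known population variances.

```python
import math

import numpy as np


def two_sample_z_test(sample1, sample2, alpha=0.05):
    """Two-sided z-test for equal means (hypothetical helper, large samples)."""
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    # Standard error of the difference in means, using sample variances
    se = math.sqrt(x1.var(ddof=1) / x1.size + x2.var(ddof=1) / x2.size)
    z = (x1.mean() - x2.mean()) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value, p_value < alpha


rng = np.random.default_rng(42)
sample1 = rng.normal(10, 2, 100)
sample2 = rng.normal(12, 2, 100)
z_stat, p_value, reject_null = two_sample_z_test(sample1, sample2)
```

The third return value is `True` when the null hypothesis of equal means is rejected at significance level `alpha`.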
3. Signal Processing
- Filtering Suite: Low-pass, high-pass, and band-pass filters with multiple implementation methods
- Signal Enhancement: Whitening techniques for decorrelation and noise reduction
- Frequency Domain: Fourier transform-based processing for advanced filtering
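A Fourier-based band-pass filter can be sketched in a few lines of NumPy. This is an illustrative brick-wall filter under stated assumptions (a real-valued, evenly sampled 1-D signal); the function `fft_band_pass` and its parameter names are hypothetical and do not describe how statflow implements its filters.

```python
import numpy as np


def fft_band_pass(signal, timestep, low_freq, high_freq):
    """Zero all Fourier components outside [low_freq, high_freq] (hypothetical)."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=timestep)
    keep = (freqs >= low_freq) & (freqs <= high_freq)
    return np.fft.irfft(spectrum * keep, n=signal.size)


t = np.arange(1000) * 0.01            # 10 s sampled at 100 Hz
slow = np.sin(2 * np.pi * 0.2 * t)    # 0.2 Hz component
fast = np.sin(2 * np.pi * 5.0 * t)    # 5 Hz component
filtered = fft_band_pass(slow + fast, timestep=0.01, low_freq=1.0, high_freq=10.0)
```

With the pass band set to 1 to 10 Hz, the 0.2 Hz component is removed and the 5 Hz component is retained. A hard cut-off like this can cause ringing on signals whose frequencies do not fall on exact FFT bins, which is one reason to compare several filtering approaches.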
4. Climatological Indicators
- Standard Indices: WSDI, SU, CSU, FD, TN, RR, CWD following international standards
- Heat Wave Analysis: Comprehensive heat wave detection with intensity metrics
- Bioclimatic Variables: Complete set of 19 bioclimatic indicators for ecological studies
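Threshold-count indices such as summer days, frost days, and tropical nights reduce to counting days on one side of a threshold. The sketch below uses pandas directly with made-up sample values; the thresholds shown (25 °C, 0 °C, 20 °C) are commonly used defaults and are an assumption here, not a statement of statflow's defaults.

```python
import pandas as pd

dates = pd.date_range("2023-06-28", periods=6, freq="D")
tmax = pd.Series([24.0, 25.0, 26.5, 31.0, 22.0, 27.3], index=dates)
tmin = pd.Series([12.0, -0.5, 21.0, 20.0, -2.0, 15.0], index=dates)

summer_days = int((tmax > 25.0).sum())      # SU: days with Tmax above 25 °C
frost_days = int((tmin < 0.0).sum())        # FD: days with Tmin below 0 °C
tropical_nights = int((tmin > 20.0).sum())  # TN: nights with Tmin above 20 °C
```

Note that the comparisons are strict, so a day with exactly 25.0 °C is not counted as a summer day in this sketch.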
5. Meteorological Calculations
- Atmospheric Variables: Heat index, wind chill, dew point, specific humidity
- Magnus Formula: Accurate saturation vapor pressure calculations
- Multi-Unit Support: Celsius/Fahrenheit and metric/imperial unit systems
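For orientation, the Magnus approximation can be written as a short standalone sketch. The coefficients below (6.112 hPa, 17.62, 243.12 °C) are one common parameterisation over liquid water and are an assumption here; statflow may use different constants, and the function names are hypothetical.

```python
import math

# Magnus coefficients over liquid water (one common parameterisation; assumed here)
A = 6.112    # hPa
B = 17.62    # dimensionless
C = 243.12   # degrees Celsius


def saturation_vapour_pressure(temp_c: float) -> float:
    """Saturation vapour pressure in hPa for a temperature in degrees Celsius."""
    return A * math.exp(B * temp_c / (C + temp_c))


def dew_point(temp_c: float, rel_humidity: float) -> float:
    """Dew point in degrees Celsius from temperature and relative humidity (%)."""
    if not 0.0 < rel_humidity <= 100.0:
        raise ValueError("rel_humidity must be in the interval (0, 100]")
    gamma = math.log(rel_humidity / 100.0) + B * temp_c / (C + temp_c)
    return C * gamma / (B - gamma)


es_20 = saturation_vapour_pressure(20.0)
td_20_50 = dew_point(20.0, 50.0)
```

At 100% relative humidity the dew point equals the air temperature, which is a convenient sanity check for any implementation.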
6. Data Processing Excellence
- Multi-Format Support: Seamless handling of pandas, xarray, and NumPy data structures
- Type Safety: Modern PEP-604 type annotations throughout the codebase
- Error Handling: Comprehensive validation with descriptive error messages
Advanced Features
Professional Climatology Workflows
```python
# Complete climatological analysis workflow
from statflow.fields.climatology import *

# 1. Calculate basic climate indicators
indicators = {
    'summer_days': calculate_SU(daily_tmax, 25.0),
    'frost_days': calculate_FD(daily_tmin, 0.0),
    'tropical_nights': calculate_TN(daily_tmin, 20.0),
    'wet_days': calculate_RR(daily_precip, 1.0)
}

# 2. Generate climatological statistics
climat_stats = climat_periodic_statistics(
    climate_dataframe,
    statistic="mean",
    time_freq="seasonal",
    season_months=[6, 7, 8]  # Summer season
)

# 3. Apply bias correction
corrected_projections = calculate_and_apply_deltas(
    observed_data, model_data,
    time_freq="monthly",
    delta_type="relative",
    preference="observed"
)

# 4. Calculate meteorological variables
heat_stress = calculate_heat_index(temperature, humidity)
comfort_metrics = calculate_wind_chill(temperature, wind_speed)
```
High-Performance Time Series Processing
```python
# Optimised for large datasets
import xarray as xr
from statflow.core.time_series import periodic_statistics, consec_occurrences_maxdata

# Process multi-dimensional climate data
large_dataset = xr.open_dataset("large_climate_file.nc")

# Efficient periodic statistics with proper memory management
monthly_stats = periodic_statistics(
    large_dataset,
    statistic="mean",
    freq="M",
    groupby_dates=True
)

# Vectorised extreme event analysis
extreme_events = consec_occurrences_maxdata(
    temperature_array,
    max_threshold=35.0,
    min_consec=3,
    calc_max_consec=True
)
```
Dependencies
Core Dependencies
- numpy: Numerical computing and array operations
- pandas: Data manipulation and time series handling
- scipy: Statistical functions and signal processing
- xarray: Multi-dimensional data handling for climate data
Project Dependencies
- filewise: File operations and introspection utilities
- pygenutils: General-purpose utilities for arrays, strings, and time handling
- paramlib: Parameter management and global constants
Optional Dependencies
- scikit-learn: For advanced whitening techniques in signal processing
- matplotlib: For plotting and visualisation (user's choice)
Integration Examples
Climate Data Analysis Pipeline
```python
import statflow as sf
import xarray as xr
import pandas as pd

# Load climate model data
climate_data = xr.open_dataset("climate_model_output.nc")

# 1. Time series analysis
trend_analysis = sf.core.time_series.periodic_statistics(
    climate_data.temperature,
    statistic="mean",
    freq="Y"  # Annual trends
)

# 2. Calculate climate indicators
heat_waves = sf.fields.climatology.indicators.calculate_hwd(
    climate_data.tasmax.values,
    climate_data.tasmin.values,
    max_thresh=35.0,
    min_thresh=20.0,
    dates=climate_data.time,
    min_days=3
)

# 3. Signal processing for trend detection
filtered_temp = sf.core.signal_processing.low_pass_filter(
    climate_data.temperature.values,
    window_size=10
)

# 4. Statistical validation
temp_stats = sf.core.statistical_tests.z_test_two_means(
    historical_period,
    future_period
)
```
Multi-Scale Statistical Analysis
```python
# Analyse data across multiple temporal scales
scales = ['hourly', 'daily', 'monthly', 'seasonal']
results = {}

for scale in scales:
    results[scale] = sf.fields.climatology.climat_periodic_statistics(
        meteorological_data,
        statistic="mean",
        time_freq=scale,
        keep_std_dates=True
    )

# Cross-scale correlation analysis
correlations = {}
for i, scale1 in enumerate(scales):
    for scale2 in scales[i+1:]:
        corr_data = sf.core.time_series.autocorrelate(
            results[scale1].values.flatten()
        )
        correlations[f"{scale1}_{scale2}"] = corr_data
```
Best Practices
Data Preparation
- Ensure consistent datetime indexing for time series analysis
- Validate data quality and handle missing values appropriately
- Use appropriate data structures (pandas for tabular, xarray for multi-dimensional)
- Consider memory usage for large climate datasets
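The first two points can be sketched with pandas alone. The column names and values below are made up for illustration, and time-based linear interpolation is only one of several reasonable gap-filling choices.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"],
    "temperature": [3.1, np.nan, 4.0, 5.2],
})

series = (
    raw.assign(date=pd.to_datetime(raw["date"]))
       .set_index("date")["temperature"]
       .asfreq("D")                  # make the missing calendar day explicit
       .interpolate(method="time")   # fill gaps linearly in time
)
```

Calling `asfreq("D")` before interpolating matters: without it, the absent 3 January row would stay absent and the series would not have a regular daily index.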
Statistical Analysis
- Choose appropriate statistical tests based on data distribution and assumptions
- Use robust error handling and validate input parameters
- Consider multiple time scales for comprehensive climate analysis
- Apply proper bias correction techniques for model-observation comparisons
Performance Optimisation
- Leverage vectorised operations for large datasets
- Use appropriate interpolation methods based on data characteristics
- Consider parallel processing for independent calculations
- Monitor memory usage with large climate model outputs
Climatological Standards
- Follow international standards for climate indicator calculations
- Use appropriate thresholds for regional climate conditions
- Document methodology and parameter choices
- Validate results against established climatological references
Contributing
Contributions are welcome! Please feel free to submit a Pull Request for:
- New statistical methods or climate indicators
- Performance improvements and optimisations
- Enhanced documentation and examples
- Bug fixes and error handling improvements
Development Guidelines
- Follow Type Annotations: Use modern PEP-604 syntax for type hints
- Maintain Documentation: Comprehensive docstrings with examples
- Add Tests: Unit tests for new functionality
- Performance Considerations: Optimise for large scientific datasets
- Compatibility: Ensure compatibility with multiple data formats
```bash
git clone https://github.com/EusDancerDev/statflow.git
cd statflow
pip install -e ".[dev]"
pytest  # Run test suite
```
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Scientific Python Community for foundational libraries (NumPy, pandas, scipy, xarray)
- Climate Research Community for standard definitions of climate indicators
- International Standards (ISO 15927-4:2005) for representative weather data methodologies
- Open Source Contributors for continuous improvement and feedback
Citation
If you use statflow in your research, please cite:
```bibtex
@software{statflow2024,
  title={statflow: Statistical Analysis and Climatology Toolkit},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/statflow},
  version={3.5.0}
}
```
Contact
For questions, suggestions, or collaboration opportunities:
- Issues: Open an issue on GitHub for bug reports or feature requests
- Discussions: Use GitHub Discussions for general questions and ideas
- Email: Contact the maintainers for collaboration inquiries
Related Projects
- climalab: Climate data analysis and processing tools
- filewise: File operations and data manipulation utilities
- pygenutils: General-purpose Python utilities
- paramlib: Parameter management and configuration constants
Troubleshooting
Common Issues
- Memory Errors with Large Datasets:

  ```python
  # Use chunking for large xarray datasets
  large_data = xr.open_dataset("huge_file.nc", chunks={'time': 1000})
  ```

- Type Compatibility:

  ```python
  # Ensure consistent data types
  data = data.astype(np.float64)  # Convert to consistent numeric type
  ```

- Missing Dependencies:

  ```bash
  pip install scipy xarray  # Install missing scientific computing libraries
  ```

- Performance Issues:

  ```python
  # Use appropriate methods for data size
  if len(data) > 50000:
      autocorr = sf.core.time_series.autocorrelate(data, twosided=False)
  ```
Getting Help
- Check the CHANGELOG.md for recent updates and breaking changes
- Review function docstrings for parameter details and examples
- Consult the VERSIONING.md for version compatibility information
- Open an issue on GitHub with a minimal reproducible example
statflow - Professional statistical analysis and climatology toolkit for Python 🌡️📊
Owner
- Login: EusDancerDev
- Kind: user
- Repositories: 2
- Profile: https://github.com/EusDancerDev
GitHub Events
Total
- Push event: 34
- Create event: 1
Last Year
- Push event: 34
- Create event: 1