Recent Releases of TECA

TECA - TECA 6.0.0

TECA 6.0.0 Release Highlights

This is a major release that contains numerous improvements and fixes. TECA BARD is fully GPUized. Temporal reductions have been ported to C++ and optimized. The data and execution models have been extended for batching (processing multiple steps per request). New spatial parallel and space time parallel execution patterns allow the full space time extent of high resolution data to be processed in memory. The new spatial parallelism is used in the low, high, and band pass filters as well as in the temporal percentile calculation. Numerous I/O optimizations have been introduced, including the use of MPI collective buffering for spatial parallel execution.
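As a rough illustration of batching and space time partitioning, the sketch below splits index ranges into contiguous blocks and crosses spatial blocks with temporal batches. It is a minimal pure-Python sketch and not TECA's implementation; the function names `split_range`, `batch_steps`, and `space_time_partition` are hypothetical.

```python
def split_range(first, last, n_blocks):
    """Split the inclusive range [first, last] into at most n_blocks
    contiguous, nearly equal inclusive sub-ranges."""
    n = last - first + 1
    n_blocks = max(1, min(n_blocks, n))
    base, extra = divmod(n, n_blocks)
    blocks = []
    start = first
    for i in range(n_blocks):
        # the first `extra` blocks absorb the remainder, one index each
        size = base + (1 if i < extra else 0)
        blocks.append((start, start + size - 1))
        start += size
    return blocks


def batch_steps(n_steps, steps_per_request):
    """Group time steps 0..n_steps-1 into inclusive (first, last) batches."""
    return [(s, min(s + steps_per_request, n_steps) - 1)
            for s in range(0, n_steps, steps_per_request)]


def space_time_partition(nx, ny, n_steps, x_blocks, y_blocks, steps_per_request):
    """Cross the spatial blocks with the temporal batches. Each entry is
    (x extent, y extent, time extent), all inclusive index pairs."""
    return [(xb, yb, tb)
            for tb in batch_steps(n_steps, steps_per_request)
            for yb in split_range(0, ny - 1, y_blocks)
            for xb in split_range(0, nx - 1, x_blocks)]
```

Each tuple could then be assigned to a rank or thread, so that no single worker has to hold the full space time extent.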

Execution Model Improvements

e134264b add spatial executive
c6f9bc62 cfwriter add partitioning contraints
4d53de68 add spacetimeexecutive
97efc350 add cfspacetimetimestepmappper
98bbb97b adds cfspatialtimestepmapper
3d915ee9 cfspacetimetimestepmapper add partitioning contraints
a8fa4d8d cfspatialtimestepmapper add partitioning contraints
19f5e229 coordinateutil partition add contraints
4d2a8f1c indexreduce execution controls
c765bd51 cfwriter command line parsing of spatial parallel properties
7c0c8a32 spatialexecutive constrain partitioning
792b7f94 spacetimeexecutive constrain partitioning
f572c81e metadataprobe report number of intervals
b295896e mesh wrap temporal bounds and extent
daa684d7 indexrequestkey update
25bd3b44 indexexecutive clean up verbose report
a61ec638 test cfreader temporal extent handling
6e3323dc datasetdiff handle temporal extents
d5dad5eb test temporal reduction spatial parallelism
019dc836 cfwriter spatial parallelism
f3c14a09 cflayoutmanager spatial parallelism
ba50dd85 cftimestepmapper layout manager API
9aa17f18 interval time step mapper refactor
e3a25a8c block time step mapper refactor
1cfcc08f coordinate util spatial partitioning
e341f638 cfreader reads temporal extents
423a8da9 data model updates for multiple time steps per mesh

Data Model Improvements

03939e15 add and apply simplified dispatch macros
69e88df9 hamr update to latest
422f3835 hamr fully asynchronous by default
2cd9c8e7 hamr enforce const for read only data access
95de5935 hamr update to latest master
2927a95e HAMR update to latest master
adf56038 variantarrayutil add host synchronization helper
69760845 variantarray add synchronization method
c7b1b2d9 add tecavariantarrayutil
29897a4c variantarray better dispatch
1ce73a70 variantarray better dispatch
a70cdfe8 variantarray make test for accessibility virtual
1422ea30 variantarray provide direct access to internal memory
59111349 variantarray python construct from numpy scalar
c3562a76 cartesianmesh fall back to mesh extents
03143cb2 cartesianmeshsource spatial parallelism
b8615ed7 cartesianmeshregrid per array dimensions
ca4fcbb3 cartesianmesh per array extent and shape const
42446f27 cartesianmeshsource generate data on the assigned GPU
d3082de4 cartesianmeshsource include bounds metadata in output mesh
acf3fe2e cartesianmesh overload array shape to return a tuple
6a9f3ac0 cartesianmeshregrid pass array attributes from the source
e5e8e4a7 cartesianmesh array extent time dim and add shape
73b58ebb cartesianmesh fix Python bindings for array shape/extent
86ef5616 cartesianmeshsource fix calendaring metadata in output

New Algorithms

f730aa81 add tecasurfaceintegral alg
f79c2c8d add tecaregionalmoistureflux
dc66e328 add tecatablejoin
f2af4c41 add spectral filter
e439275e add tecavtkutil::partitionwriter to help debug space-time paritioning
0fe459e0 add temporalpercentile temporal reduction
140008c5 wrote temporalindex_select and tests

New Applications

acfcaffe add regionalmoistureflux app
cfd6ce85 Add the spectral filter app

GPUization

a64839b6 bayesianardetect add CUDA implementation
cf74102e 2dcomponentarea thrust use stream per thread stream
42d16f76 2dcomponentarea set cuda device before doing any work
e54e33b4 componentareafilter set cuda device before doing any work
c3efa90d connectedcomponents set cuda device before doing any work
45a87f1d bayeseianardetect set cuda device before doing any work
3791b67d latitudedamper set cuda device before doing any work
8993ed66 unpackdata set cuda device before doing any work
640ee577 indexexecutive explicitly assign device ids
79445b3b binarysegmentation use streams for sorting and data movement
23347358 cudautil add a 1d domain decomposition
9644b346 latitudedamper add CUDA implementation
a2432065 componentareafilter add CUDA implementation
5a2f6603 2dcomponentarea use restrict on kernels
ad65931f 2dcomponentarea GPU-ize the area calculation
96c59666 cfreader don't use page locked memory for cuda
7549e888 cudautil simplify device assignment
1b14777c connectedcomponents use 8 connetivity
52be3623 ha4 test code use 8 connectivity
2f4047f9 indexexecutive environment variable override CUDA device assignment
0919c784 connectedcomponents inetgrate CUDA ha4 implementation
77884268 shapefilemask add CUDA implementation
c44aded2 cudautil implement a container for cuda streams
edf6c588 geometryutil GPUize point in poly
693a7b2c threadutil threads per device behavior
ac2f59fe cuda warning cleanup
3f2ba7f7 spatialexecutive load balance across GPUs
5c082594 spacetimeexecutive load balance across GPUs

Threading Improvements

62410659 bayesianardetect fix thread safety issues
fa1c2099 threadutil warn about too few threads wo MPI
1d5f4158 threadutil clamp the number of threads
c9704448 threadutil report num threads when not binding
af1592a4 threadedalgorithm propagatedeviceassignment
81d4e2d0 threadedalgorithm expose ranksper_device in API

Optimizations

60c9e718 cfrestripe app add collective buffer mode
3dbc0e22 Added C++ version of the temporal reduction algorithm and application
9735209c cfreader open file in collective mode
5558ff66 spectralfilter app command line options for collective buffering
c0efea8f cf/multicfreader option to use collective buffering
f304f275 cfwriter use collective buffering

Documentation

d5eb0fcc cfreader fix copy paste error in documentation
e5306fac componentareafilter fix indent add comments
30adda58 algorithm fix a documentation typo
bb730837 shapefilemask improve documentation
d8fcade0 tablereduce improve documentation
b166667f integratedwatervapor improve documentation
ef2cd480 integratedvaportransport improve documentation
f3623803 threadedalgorithm improve documentation
e5a26ff2 doc doxygen style comments for programmablealgorithm
dc367728 doc doxygen style comments for tecatable
de5e8d68 doc data access developer tutorial
1d25525b intervaliterator subclasses fix units doxygen doc strings
dd5f1fee doc update temporalreduction user guide
c71e9057 cfwriter fix typo in docs
53effc02 doc update m1517 install locations for perlmutter
1b71d8eb coordinateutil improve documentation
ff383a0f rtd add section explaining execution model
ae237bd9 rtd docs fix doxygen install location
c51132bf rtd pin sphinx version as latest is incompatible with rtddocs
5ea6e10c rtd doc array access tutorial spell check
af9d2e6c doc rtd improve array access tutorial
b528ec9d rtd fix a rst warning
9a6e888b rtd updates to the install for mac os
1a7dc382 doc rtd exclude variantarray_oeprator from doxygen

Testing

bf97e954 test disable periodic bc in bard app test
238db9f6 test bayesian ar detect sort by label area
49e83a90 deeplabardetect remove tests
b7d14f17 testing update linux distributions
c38337f9 testing cleanup use of %e% in tests
d40d800a temporalreduction: added tests
80a01599 test add test for cpptemporalreduciton w. io
3b277b3a test temporal reduction stepsperrequest command line argument
9e614ea1 add testtemporalreduciton
3b338bf9 ha4 test code update ctests command
5dd84cb9 connectedcomponents test ignore component labels
6569a79f ha4 test code improvements
a1012ed6 ha4 test code handles periodic BC in x-direction
a380f62c ha4 test code works on images not divisible by 32
e6216c3b add ha4 connected component label test code
a769ff73 teststreamingreducethreads: specifying netcdf file name to avoid conflict with temporal reduction all iterator test
6e02fa62 test temporalreduction app python and C++
d79206a4 testing temporalreduction tests specify number of threads
709f6853 temporalreduction C++ impl improvements and regression test
5120006d update the DOI badge to point to the latest release
18533f8c Changed teca data revision from 149 to 151

General Improvement

24142094 bayesianardetectparameters add properties to select specific start row
be087dc5 bayesianardetect instrument the BARD app
37f4237e bayesianardetect app control writer thread pool size
176c1f6b connectedcomponents cleanup a warning
10eaf195 connectedcomponents minor improvements
ee8cbf23 temporalreduction: set stepsperrequest in python app; included definition in cpp app
27f3ef3e temporalreduction: standardized nthreads command line
b371bea9 temporalreduction construct output at end and others
494a3b42 temporalreduction: caching the intermediate result
07a119ae temporalreduction: any number of time steps per request is allowed
bd321844 descriptivestatistics remove debuging code
18768fd8 indexexecutive fix a compiler warning
ff551dce cpptemporalreduction algorithm errors are fatal
95bd6a88 temporalreduction: setthreadpoolsize [cfwriter] changed from -1 to 1 to fix intermittent bugs
7953cbbb temporalreduction: change the 1 time step per request to a run time specified number of steps
1bab4257 datasetdiff ignore specified arrays
03fc0bc7 tablesort sort either ascending or descending
b29c4fd7 coordinateutil wrap bounds to extent overload
d0ac7a98 integratedvaportransport handle ascending coordinates in the first order method
b593e57d integratedvaportransport app enable automatic z-axis unit conversion
78675ec9 integratedvaportransport warn if vertical axis units are incorrect
63087d20 normalizecoordinates check z-axis units
df9378e0 integratedvaportransport layer thickness
eb4853a7 evaluateexpression netcdf attributes for the cfwriter
4cccc26f table include dataset property for array attributes
98e0a891 tablejoin pass array attributes for NetCDF I/O
3b815827 integratedwatervapor reformat units string
f6eabe0f algorithm add a single value setter for vector properties
657ba214 indexreduce use std::vector instead of std::array
abac3f23 indexeddatasetcache override request index
eb86345c integratedvaportransport change format of units
b0a4390c datasetsource report variables from tables and meshes
31b748a2 datasetsource move to alg to access typed datasets
290db6bf coordinateutil improvements
20c5ad67 tablereduce report and request use default implementations
b878f89f programoptions support std::array in algorithm properties
06b52b2f shapefilemask improvements
9a506d7f dataset typed accessors
40ea89bd derived quantity improvements
ebb12862 arrayattributes include meshdimactive
254f9e7f temporalreduction app/alg cpp/python catch user errors
a1eb0f0e cfwriter improve error message
589f70c6 cfwriter improve collective buffering error message
d01da07f cfrestripe app runs in CPU only mode by default
19334ed6 cfreader improve collective buffering error
c4de2426 spectralfilter per-rank timing output in verbose mode
51a26703 spectralfilter add ideal butterworth frequency response
dadb4911 spectralfilter fixes issue found when processing real data
2a8816ff spectralfilter refactor regression tests
976482d6 spectral filter fix high pass kernel generation
57e3f31d tecatemporalreduction: added all iterator average test
9924d676 tecatemporalreduction: added all iterator
fbb866ea tecacalendarutil: added new class alliterator
c6704cdc temporalreduction: added flag to spatial parallelism
4b0d251d tecacalendarutil: added the new class nstepsiterator
ec98d675 added index selection to the temporal reduction
6f1ae9da metadata add support for std::array
6e99362f vorticity better identitiers in dispatch macro
4407b536 cudautil remove redundant error check
d11f4803 validvaluemask export mask type
296ec4a0 temporalreduction app command line option controlling threadpool size
48e32130 temporalreduction: rename the C++ implementation
bd6718ac temporalreduction: handle the case where the number of inputs < 2.
19ead29e temporalreduction: renamed the original python implementation
fbd22354 temporalreduction: resolving a warning
9ba6d353 temporalreduction clean up warnings with nvcc
3bcfcf43 tenporalreduction app integrate multi threading
ec71e980 Renamed python version of temporal reduction; python bindings
cd28e3e8 tecathreadedprogrammablealgorithm: increased the size of the classname variable from 64 to 96
ccfdba31 potentialintensity user provided masking threshold
e7c53c0c potentialintensity units checks and conversions
c674b798 potentialintensity app reduce verbosity
8f6a1ed3 tecapotentialintensity clean up runtime warnings
df50b49a python functions returing typed scalars
11814513 potentialintensity app use spatial partitioning
e18f4d49 potentialintensity app land mask from mcf file
839ef1c3 apputil error out with positional options

Bug fixes

6cb6cccf componentareafilter fix indentation
ee9a4d84 connectedcomponents fix 8-way connectivity accross periodic boundary
e85aa72f systeminterface fix double free in stack trace generation
d19e7270 testing fix the component are filter test
10e94597 temporalreduction: fix data access
d6b22e63 tecaprofiler: fixed convertion of hexstring to int
220587ad cputhreadpool fix bind argument position
bf99eff3 cpu/cudathreadpool fix streaming bug
3d5d4db2 cfwriter fix let threadedalgorithm process command line
80820ef1 threadedalgorithm fix set algorithm props from command line
59c42b53 threadedalgorithm fix threadsperdevice parameter name
bada5a60 cpptemporalreduciton fix thread safety issues
7de9224c cpptemporalreduction fix a typo in documentation
59eb4c79 ha4 test code fix race condition
2eaa71b6 connectedcomponents fix race condition
ddaf758f connectedcomponents fix compile w/o cuda
4c8032c1 connectedcomponents 8-way connectivity bug fixes
55b0908f ha4 test code 8-way connectivity bug fixes
18e0c92d renamevariables fix set variables in the output attributes
e7396820 fixes for cuda 12 and warning cleanup for gcc 12
9d13cd42 temporalreduction fix missing virtual destructor in base class
76ab59d8 arraycollection fix double move
f462ac83 normalizecoordinates fix a bug in the output extents
e8dcfca3 tests fix regex that picks up new file
e3dc08f3 cpptemporalreduction cleanup, fixes, and improvements
00ba2421 temporalreduction: included flag to choose python or c++ implementation; fixing the nsteps interval
bc43a364 temporalreduction: rename the python implementation; fixing name of two python tests
244f58e5 temporalreduction: fixing the parameter order in a test
79b36732 temporalreduction: added a new finalize function to fix a bug
942aa111 temporalreduction fix a warning and set strream size
7120ecb8 cpu/cudathreadpool fix thread safety issues
d2519402 threadedalgorithm fix indentation
26fba6d7 potentialintensity units checks and conversion fix
45dbd2b9 Fixed nstepsiterator class of python version of temporalreduction
86e6ea74 calendaring fix buffer overflow warnings
5ffc2d4e Fixing issue
98f04ee3 temporal_percentile fixes

Python

42ca1d80 python support wrapping API with fixed length C-arrays
61e9f34a remove numpy deprecated types

Build System

a76c7cf9 build cleanup cmake code
8f965035 added CMAKE_INSTALL_RPATH to CMakeLists.txt
3e43838f build define NDEBUG in CUDA release build
08b95f05 build always update the version descriptor
944a3f25 build system don't relink unless neccessary

- C++
Published by burlen over 2 years ago

TECA - TECA 5.0.0

Major features

The TECA data model now supports memory management on CPUs as well as on CUDA, OpenMP device offload, and HIP capable GPUs and accelerators. TECA's execution model was extended to support CUDA capable GPUs. This includes automated load balancing across multi GPU accelerated compute nodes on supercomputing systems as well as CUDA kernel launching and load balancing infrastructure. Support for zero-copy interoperability with Cupy and Numba on CUDA capable GPUs was added.

GPUized algorithms

  • tecabinarysegmentation
  • tecal2norm
  • tecavalidvaluemask
  • tecaunpackdata
  • tecaintegratedvaportransport
  • tecatemporalreduction
  • tecalapserate
  • tecacfreader
  • tecacfwriter

New algorithms and apps

  • tecalapserate
  • tecatcpotentialintensity
  • tecatimeaxisconvolution
  • tecashapefilemask
  • tecatempestremap
  • tecacartesianmeshcoordinatetransform
  • tecaarraycollectionreader
  • tecaarraycollectionwriter

Improvements

  • Make the tecaarraycollection a data set
  • Add user defined intervals and operators to the tecatemporalreduction
  • tecatemporalreduction handles integer data in the averaging reduction
  • tecatemporalreduction uses the valid value mask
  • Add a summation reduction to the tecatemporalreduction
  • Improved threading support on MacOS
  • Users can provide callbacks at runtime for custom error handling

Documentation

  • Numerous improvements to the user guide and Doxygen documentation, including documentation of new applications and installation on GPU enabled systems
  • Updated examples illustrating how to use Cupy in Python applications
  • New Perlmutter specific examples were added to TECA_Examples

- C++
Published by burlen over 3 years ago

TECA - TECA 4.1.0

4.1.0 is a feature release with a number of new and exciting features and a number of critical bug fixes.

  • new mask below surface algorithm that creates a point wise binary (0,1) mask identifying mesh points that are below the land surface, based on an externally provided DEM.
  • integrated the mask below surface stage into the BARD, IWV, and IVT apps
  • new unpack NetCDF packed data stage
  • new coordinate normalization stage that transforms longitude from the range -180 to 180 to the range 0 to 360
  • new IWV algorithm
  • new IWV command line application
  • new time based file layouts (daily, monthly, yearly, seasonal)
  • BARD app can now generate output fields weighted by AR probabilities
  • new rename variables stage
  • improvements to cartesianmeshsource for remeshing
  • cf_reader correctly detects centering and per field dimensionality
  • multicfreader MCF file format improvements. Add support for reader properties, globally and per reader.
  • cf_reader option to produce a 2D field when the 3rd dimension is length 1
  • Cartesian meshes can now contain both 2D and 3D arrays, metadata annotations are used to differentiate at run time
  • metadata probe improvements to report per-field centering
  • new remeshing capability deployed in cf_restripe and apps that utilize elevation mask
  • improvements to the user guide
  • refactored source code documentation to be compatible with Doxygen
  • published Doxygen on the rtd site: https://teca.readthedocs.io/en/integrating_breathe/doxygen/index.html
  • new capabilities in the cf_restripe command line application for remeshing
  • 25+ bug fixes
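
Two of the stages listed above lend themselves to a short illustration. The sketch below is pure Python and is not TECA's implementation: `unpack` applies the usual NetCDF packing convention (unpacked = packed * scale_factor + add_offset), and `normalize_longitude` maps longitudes to the range 0 to 360 and reorders the data so the coordinate stays ascending. Both function names and their argument layouts are hypothetical.

```python
def unpack(values, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """Unpack NetCDF packed values. Entries equal to fill_value are
    returned as None to mark them missing."""
    return [None if (fill_value is not None and v == fill_value)
            else v * scale_factor + add_offset
            for v in values]


def normalize_longitude(lon, field):
    """Map longitudes in [-180, 180) onto [0, 360) and reorder `field`
    (one entry per longitude) so longitudes remain ascending."""
    shifted = [x % 360.0 for x in lon]
    order = sorted(range(len(lon)), key=lambda i: shifted[i])
    return [shifted[i] for i in order], [field[i] for i in order]


lon, field = normalize_longitude([-180.0, -90.0, 0.0, 90.0], ["a", "b", "c", "d"])
print(lon)    # [0.0, 90.0, 180.0, 270.0]
print(field)  # ['c', 'd', 'a', 'b']
```

Reordering the data together with the coordinate is what keeps each value attached to its physical location after the shift.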

- C++
Published by burlen almost 5 years ago

TECA - TECA 4.0.0

Documentation

  1. A major overhaul of the command line application section of the user guide including the addition of examples.
  2. Publish batch scripts illustrating running TECA at scale in the new TECA_examples repo.
  3. Giving tutorials and publishing the materials in the new TECA_tutorials repo
  4. Updates to the installation section of the TECA User's Guide (https://teca.readthedocs.io/en/latest/installation.html)

Data Model Improvements

  1. Added support for Arakawa C Grids in teca_arakawa_c_grid
  2. Added support for logically Cartesian so called curvilinear grids in teca_curvilinear_mesh
  3. Refactored mesh related class hierarchy so that common codes such as array accessing and I/O live in teca_mesh
  4. Added support for face and edge centered mesh based data.

I/O Capabilities

  1. Added reader for WRF simulation teca_wrf_reader
  2. Add support for writing logically Cartesian curvilinear meshes in teca_cartesian_mesh_writer.
  3. Added a new NetCDF based output format for tabular data to the teca_table_writer.
  4. Added support for reading tabular CSV files to the teca_table_reader. This enables the tabular outputs such as TC tracks etc saved from TECA apps to be stored in a format ingestible by other tools such as Python and Excel without the need to convert from TECA's internal binary format.
  5. Added versioning and error checking to TECA's internal binary serialization format across all datasets. This enables us to catch version differences and handle bad or corrupted files gracefully.
  6. Use of parallel NetCDF 4 (i.e. MPI collective I/O) for writing results. This enables the use of any number of files with any number of ranks.
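
The versioning and error checking idea in item 5 can be sketched as a small header placed in front of the serialized payload. This is an illustrative layout only and not TECA's binary format: the signature `DEMO`, the version number, and the header fields are all hypothetical.

```python
import struct
import zlib

MAGIC = b"DEMO"   # hypothetical file signature, not TECA's
VERSION = 2       # hypothetical format version
# little endian: 4 byte signature, 16 bit version, 32 bit length, 32 bit CRC32
HEADER = struct.Struct("<4sHII")


def serialize(payload: bytes) -> bytes:
    """Prefix the payload with signature, version, length, and checksum."""
    return HEADER.pack(MAGIC, VERSION, len(payload), zlib.crc32(payload)) + payload


def deserialize(blob: bytes) -> bytes:
    """Validate the header and return the payload, or raise ValueError."""
    if len(blob) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, length, crc = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("unrecognized file signature")
    if version != VERSION:
        raise ValueError(f"unsupported format version {version}")
    payload = blob[HEADER.size:]
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("corrupted payload")
    return payload
```

Checking the version before touching the payload lets a reader report a version mismatch cleanly instead of misinterpreting bytes written by a different release.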

Execution Patterns

  1. Implement a new streaming mode reduction where data is incrementally reduced as it becomes available. This parallelizes the reduction step and reduces the memory overhead.
  2. Introducing a new MPI parallel approach to scan the time axis. This has substantial benefit when there are a large number of files.
  3. Expose MPI aware thread load balancing to Python. This was used in the teca_pytorch_algorithm to automatically load balance the OpenMP backend of PyTorch.
  4. Implement a GPU load balancing strategy in the teca_pytorch_algorithm.
  5. Enable process groups to be excluded from execution. This lets a pipeline run on a subset of MPI_COMM_WORLD.
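
The streaming reduction in item 1 can be pictured with a thread pool: each result is folded into a running total as soon as it completes, rather than after all inputs have been collected. This is a minimal sketch and not TECA's implementation; `load_step` is a hypothetical stand-in for reading one time step.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def load_step(step):
    """Hypothetical stand-in for reading the data of one time step."""
    return [float(step), 2.0 * float(step)]


def streaming_mean(steps, n_threads=4):
    """Average per-step arrays, folding each one in as it becomes available."""
    total = None
    count = 0
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        pending = {pool.submit(load_step, s) for s in steps}
        while pending:
            # returns as soon as at least one outstanding task has finished
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                data = fut.result()
                total = data if total is None else [a + b for a, b in zip(total, data)]
                count += 1
    if count == 0:
        raise ValueError("no steps to reduce")
    return [t / count for t in total]


print(streaming_mean(range(5)))  # [2.0, 4.0]
```

Because finished futures are dropped from `pending` once folded, only the running total and the in-flight results need to be held at any time.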

Algorithmic Capabilities

  1. Added teca_pytorch_algorithm, a base class that handles tasks common to interfacing with PyTorch when developing machine learning based detectors.
  2. Added teca_deeplab_ar_detect, a new PyTorch based machine learning AR detector.
  3. Added teca_valid_value_mask, an algorithm that generates a mask identifying the presence of NetCDF _FillValue values in arrays. Downstream algorithms use the mask to handle _FillValue values in an algorithm appropriate manner.
  4. Added teca_temporal_reduction an algorithm that implements transformations from one time resolution to another. The implementation includes min, max, and average operators and supports daily, monthly, and seasonal intervals.
  5. Added teca_vertical_reduction an algorithm that converts 3D data to 2D by applying a reduction in the vertical spatial dimension. This is a base class that contains code common to vertical reductions.
  6. Added teca_integrated_vapor_transport a vertical reduction that computes IVT from horizontal wind vector and specific humidity.
  7. An improved floating point differencing algorithm was developed and a number of codes were updated to use it.
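
To illustrate how a valid value mask (item 3) can feed a temporal reduction (item 4), the sketch below averages samples per month while skipping fill values. It is a pure-Python illustration and not TECA's implementation; the fill value `FILL` and the `((year, month), values)` input layout are assumptions made for the example.

```python
FILL = 1.0e20  # hypothetical _FillValue


def valid_value_mask(values, fill_value=FILL):
    """Return 1 where a value is valid and 0 where it equals fill_value."""
    return [0 if v == fill_value else 1 for v in values]


def monthly_average(samples, fill_value=FILL):
    """samples: iterable of ((year, month), values). Returns a dict mapping
    (year, month) to the point wise mean over the valid samples only. Points
    with no valid samples are set to fill_value."""
    sums, counts = {}, {}
    for key, values in samples:
        mask = valid_value_mask(values, fill_value)
        s = sums.setdefault(key, [0.0] * len(values))
        c = counts.setdefault(key, [0] * len(values))
        for i, (v, m) in enumerate(zip(values, mask)):
            if m:
                s[i] += v
                c[i] += 1
    return {k: [si / ci if ci else fill_value for si, ci in zip(sums[k], counts[k])]
            for k in sums}
```

Dividing by the per-point count of valid samples, rather than by the number of time steps, keeps missing data from biasing the average.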

Command Line Applications

  1. Added teca_integrated_vapor_transport command line application for computing IVT.
  2. Added teca_restripe command line application for re-organizing NetCDF datasets.
  3. Added teca_deeplab_ar_detector command line application for detecting ARs using machine learning.
  4. Integrated IVT calculations into the teca_bayesian_ar_detector.
  5. Normalized names and meaning of command line options across command line applications
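
For readers unfamiliar with IVT, the quantity is conventionally defined as the pressure integral of specific humidity times the horizontal wind, divided by gravitational acceleration. The sketch below evaluates that standard definition for one column with the trapezoid rule. It is a minimal illustration and not the code used by the TECA application; the function names are hypothetical.

```python
import math

G = 9.80665  # standard gravity in m s^-2


def trapezoid(y, x):
    """Trapezoid rule integral of samples y over coordinates x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def ivt(q, u, v, p):
    """q: specific humidity (kg/kg), u and v: wind components (m/s),
    p: pressure levels (Pa), all on the same levels in either order.
    Returns (ivt_u, ivt_v, magnitude) in kg m^-1 s^-1."""
    # orient the integral so that it always runs toward increasing pressure
    orient = 1.0 if p[-1] > p[0] else -1.0
    ivt_u = orient * trapezoid([qi * ui for qi, ui in zip(q, u)], p) / G
    ivt_v = orient * trapezoid([qi * vi for qi, vi in zip(q, v)], p) / G
    return ivt_u, ivt_v, math.hypot(ivt_u, ivt_v)
```

The orientation factor makes the result independent of whether the vertical coordinate is stored in ascending or descending order.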

Python Capabilities

  1. A polymorphic redesign of the teca_python_algorithm makes it easier to use.
  2. Handle numpy scalar types
  3. Expose more features such as MPI aware thread load balancing, calendaring, profiling, and file manipulation utilities.

Testing

  1. Added testing infrastructure and tests for command line applications
  2. Deployed testing on Ubuntu 18.04, Fedora 31, Fedora 32, and Mac OS with Xcode 12.2.

Bug fixes

More than 50 bug fixes were reported.

- C++
Published by burlen about 5 years ago

TECA - TECA 3.0.0

This is a major release in support of:

T.A. O'Brien et al, "Detection of Atmospheric Rivers with Inline Uncertainty Quantification: TECA-BARD v1.0", Geoscientific Model Development, submitted winter 2020

The pipeline internals were refactored to be more general. The assumption that time is the dimension across which the reduction is applied was removed, and changes were made that enable nested map-reduce.

The TECA User Guide was ported to "Read the Docs". https://teca.readthedocs.io

Our Travis CI test infrastructure was updated to use Docker, and two new OS images, Fedora 28 and Ubuntu 18.04, were deployed.

More than 40 bug fixes

New algorithms included in this release:

| Type | Name | Description |
|---------------------|--------------------------------|---------------------------------------------------------------------------------|
| general purpose | teca2dcomponentarea | Computes the areas of regions identified by the connected components filter. |
| general purpose | tecabayesianardetect | Detects atmospheric rivers using a Bayesian method. |
| general purpose | tecabayesianardetectparameters | Parameters used by the Bayesian AR detector. |
| general purpose | tecacartesianmeshsource | Used to create Cartesian meshes in memory and inject them into a pipeline. |
| general purpose | tecacomponentareafilter | Masks regions with area outside a user specified range |
| general purpose | tecacomponentstatistics | Gathers information about connected component regions into a tabular format |
| general purpose | tecalatitudedamper | Multiplies a field by an inverted Gaussian (user specified mean and HWHM) |
| general purpose | tecanormalizecoordinates | Transforms Cartesian meshes such that coordinates are always in ascending order |
| general purpose | tecapythonalgorithm | Base class for TECA algorithms written in Python. Handles internal plumbing |
| core infrastructure | tecamemoryprofiler | Supporting class that samples memory consumption during application execution |
| core infrastructure | tecaprofiler | Supporting class that logs start, stop, and duration of developer defined events |
| I/O | tecacartesianmeshreader | Reads TECA Cartesian meshes in TECA's internal binary format |
| I/O | tecacartesianmeshwriter | Writes TECA Cartesian meshes in TECA's internal binary format |
| I/O | tecacf_writer | Writes TECA Cartesian meshes in NetCDF CF2 conventions |
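
The latitude damper entry can be made concrete with a short sketch. It assumes that "inverted Gaussian" means one minus a unit height Gaussian, and it uses the standard relation between the half width at half maximum and the standard deviation, HWHM = sigma * sqrt(2 ln 2). The function names and default values are hypothetical, and this is not the code from the TECA class.

```python
import math


def inverted_gaussian(lat, center=0.0, hwhm=15.0):
    """Weight that is 0 at `center`, 0.5 at center +/- hwhm, and tends to 1
    far away (assumed form: 1 minus a unit height Gaussian)."""
    sigma = hwhm / math.sqrt(2.0 * math.log(2.0))
    return 1.0 - math.exp(-((lat - center) ** 2) / (2.0 * sigma ** 2))


def damp(field, lats, center=0.0, hwhm=15.0):
    """Multiply each row of `field` (one row per latitude) by its weight."""
    return [[x * inverted_gaussian(lat, center, hwhm) for x in row]
            for row, lat in zip(field, lats)]
```

With the center at the equator, a weighting of this shape suppresses values in the tropics while leaving higher latitudes nearly unchanged.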

New applications included in this release:

| Name | Description |
|----------------------|-------------------------------------------------------------------------|
| tecabayesianardetect | Command line application that can be used to detect ARs on HPC systems |
| tecaprofile_explorer | Interactive tool for exploring run time profiling data |

- C++
Published by burlen about 5 years ago