machexplorer

MACH data explorer application

https://github.com/k-sink/machexplorer

Basic Info

Host: GitHub
Owner: k-sink
License: mit
Language: R
Default Branch: main
Homepage: https://github.com/k-sink/MACHexplorer
Size: 3.63 MB

MACH Explorer

Overview

Developed by Katharine Sink, MACH Explorer uses Shiny to provide an interactive interface for evaluating and manipulating the MACH dataset (Sink, 2025). The MACH dataset contains hydrometeorological and attribute data for 1,014 watersheds. MACH Explorer simplifies data access with a dynamic interface that lets users filter and extract data based on a suite of temporal and spatial criteria. The interface is organized into specialized tabs, each designed to support data interaction and retrieval. Users can preview data subsets and export them as CSV files, which lowers the barrier to entry for those unfamiliar with programmatic data processing.

At the core of MACH Explorer is a DuckDB database file, full_dataset.duckdb, which consolidates MACH time series data (1980-2023), MOPEX time series data (1948-1979), and catchment attributes for 1,014 basins. This database integrates daily hydrometeorological variables and attributes into a unified structure that can be queried efficiently.

MACH Explorer incorporates a portable R environment that bundles all necessary R dependencies and packages within a self-contained executable, so users do not need to install or configure R independently. A bundled portable Chrome instance serves as the rendering engine, delivering a browser-based interface that behaves like a native application.

This repository contains the MACH_Explorer_Installer.exe executable and the full_dataset.duckdb database file under Releases. The DuckDB file was created from MACH data to streamline app efficiency. The MACH dataset CSV files are separately available for download on Zenodo.

If you use this software, please cite: Katharine Sink. (2025). k-sink/MACHexplorer: MACH Explorer version 1.0 (v1.0-machexplorer). Zenodo. https://doi.org/10.5281/zenodo.16585881

Installation

Download the MACH_Explorer_Installer.exe from the GitHub releases page.
Run the installer and follow the on-screen instructions.
The app will be installed to %LocalAppData%\MACH Explorer App.

Launch the installer to begin setup. Click "Next" to install in default location.

Check the box next to "Create a desktop shortcut" if you want one, then click "Next".

The application is ready to install. Confirm the folder location and additional tasks, then click "Install".

The application includes a portable R and portable Chrome browser, creating a self-contained Shiny executable. Click "Finish" to exit the setup. MACH Explorer is now ready!

Requirements

Windows operating system.
No additional software required (includes portable R and Chrome).
Approximately 1.02 GB total.

Support

Report issues or suggest features at https://github.com/k-sink/MACHexplorer/.
Contact: katharine.sink@utdallas.edu

Please note that I am not a software developer ... just a doctoral student who created an app. 😺

License

This project is licensed under the MIT License - see the LICENSE file for details.

Usage

To start MACH Explorer, locate the installation directory and double-click run.bat, or double-click the desktop shortcut if you elected to create one during app installation.

- What You’ll See: When you run run.bat, a black Command Prompt window will appear with a message like "Listening..." (Figure 1). This indicates that the app is starting and preparing to display its interface. You do not need to do anything in the window. The window must remain open for the app to work.
- Browser Opening: After a moment, the app will open in a web browser window (using a built-in Chrome instance). You can begin using the app from there.
- Keeping the Command Window Open: Do not close the Command Prompt window while using the app, as this will stop the app. To exit, simply close the browser window or click the "Stop" button if provided, then close the Command Prompt window.
- Troubleshooting: If the browser doesn’t open or the "Listening..." message doesn’t appear, ensure all files are correctly installed and try running run.bat again. Check the Command Prompt for any error messages if issues persist.

Figure 1: Command prompt window that connects to Chrome browser and launches the R script.

Getting Started

Users can browse and retrieve time series and/or attribute data for up to 1,014 catchments.

1. Connect the app to the database file on the Data Import tab.
2. Select sites on the Site Selection tab. All tabs in the app retrieve data for the sites selected here.
3. After selecting sites, any of the following tabs can be used, independently of the others.

- Retrieve time series data for selected sites using the Daily Data, Monthly Data, and/or Annual Data tabs.
- Retrieve attribute data for selected sites using the Attributes and/or Land Cover tabs.
- Retrieve original MOPEX data for selected sites using the MOPEX Data tab.

Note that the Retrieve and View Data buttons query the database and must be pressed any time changes are made to the filtering criteria or to the site selections.

Data Import

The app uses a database management system called DuckDB, which is portable and provides APIs for languages such as R. Before using the app, download the full_dataset.duckdb database file from the GitHub releases page. The database file contains the same information available in the Zenodo release, consolidated for efficient querying. When the app is launched, it will open on the Data Import tab (Figure 2). Use the Browse button to locate the database file on your local machine.

Figure 2: Data Import landing page.

All tabs will be disabled until a connection is established with the database file. A purple progress bar will be displayed while the app is connecting to the database file. Once the app has connected successfully, a green "Connected to Database" status message will be displayed (Figure 3).

Figure 3: Data Import page after database connection is established.

Site Selection

After the database connection is established, all additional tabs will be available. The Site Selection dashboard contains a location map for all 1,014 watersheds from the MACH dataset along with filtering options, including spatial criteria (state, latitude, longitude) and general catchment attributes (mean elevation, drainage area, slope). All subsequent tabs in the application depend on the sites selected on this page. Site selections are retained throughout the application. The dashboard consists of "Filter Sites", "Edit Individual Sites", "USGS Station Locations", "Discharge Record", and "Selected Sites" (Figure 4). The tables and map update in real time as filters are adjusted and represent the sites that will be used for data retrieval on subsequent tabs.

Figure 4: Site Selection dashboard.

The "Filter Sites" box (Figure 5) contains spatial filters. To enable a filter, click its checkbox. Clicking on the State box will display a drop-down menu. To add a state, click its name; to remove a state, click in the box and press backspace. More than one state can be selected at a time. Once a numeric attribute is selected, its slider bar can be used to set a range of values. The ranges default to cover all possible values on record. Latitude (N) is decimal degrees north, Longitude (W) is decimal degrees west, Mean Elevation (m) is mean elevation above sea level in meters, Drainage Area (km2) is basin drainage area in square kilometers, and Mean Slope (percent) is overall mean basin slope in percent. The Reset Filters button will clear all selected filters and display all 1,014 sites. This button can be pressed at any time.

Figure 5: Spatial filtering options include state, latitude, longitude, mean elevation, drainage area, and slope.
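The filtering behavior described above can be pictured with a small Python sketch. It is not the app's code: the site records are invented, the `filter_sites` helper is hypothetical, and it assumes that every enabled filter must match (logical AND) and that range bounds are inclusive. The keys reuse the column names of the "Selected Sites" table.

```python
# Hypothetical site records; keys mirror the "Selected Sites" columns,
# but the values are made up for illustration.
sites = [
    {"SITENO": "00000001", "STATE": "AZ", "ELEV": 1500.0, "AREA": 250.0},
    {"SITENO": "00000002", "STATE": "TX", "ELEV": 200.0, "AREA": 900.0},
    {"SITENO": "00000003", "STATE": "AZ", "ELEV": 2100.0, "AREA": 75.0},
]


def filter_sites(records, states=None, ranges=None):
    """Keep records that satisfy every enabled filter (assumed logical AND).

    states: optional set of state codes; None or empty disables the filter.
    ranges: optional dict mapping a column name to an inclusive (low, high) pair.
    """
    ranges = ranges or {}
    kept = []
    for rec in records:
        if states and rec["STATE"] not in states:
            continue
        if all(low <= rec[col] <= high for col, (low, high) in ranges.items()):
            kept.append(rec)
    return kept


# Arizona sites with a mean elevation between 1,000 and 2,000 m.
selected = filter_sites(sites, states={"AZ"}, ranges={"ELEV": (1000.0, 2000.0)})
print([rec["SITENO"] for rec in selected])
```

Calling `filter_sites(sites)` with no filters returns every record, which corresponds to the state after Reset Filters.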

Individual sites can be added or removed based on user preference using the "Edit Individual Sites" box. For example, if a watershed is missing streamflow data for a specific period, it can be deleted from the selections using the 8-digit site number, shown in Figure 6. To remove a site, manually type the site number into the bottom box and then click the Remove Site button. To add a site, manually type the site number into the top box and then click the Add Site button. Leading zeroes should be included, if applicable to the site number. The Remove All Sites button will clear all selections, resulting in a blank display and tables. Individual sites can then be manually added. The "Edit Individual Sites" box lets you customize exactly which basins you want to obtain data from.

Figure 6: Individual sites can be added or removed from the filtered selections.
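Leading zeros are easy to lose when site numbers pass through a spreadsheet or are stored as integers. The following Python sketch shows one way to restore them before typing a number into the app. The `normalize_site_number` helper and the example number are hypothetical, and the sketch assumes every site number is exactly 8 digits, as described above.

```python
def normalize_site_number(value):
    """Return an 8-character site number string, restoring leading zeros.

    Assumes site numbers are 8 digits long; anything non-numeric or longer
    than 8 digits is rejected rather than guessed at.
    """
    text = str(value).strip()
    if not text.isdigit() or len(text) > 8:
        raise ValueError(f"not a valid 8-digit site number: {value!r}")
    return text.zfill(8)


# An integer loses its leading zero; the helper puts it back.
print(normalize_site_number(1234567))
```

Keeping site numbers as text from the start avoids the problem entirely.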

The leaflet map "USGS Station Locations" displays the sites listed in the "Selected Sites" table and updates as filtering criteria are changed. The blue points represent the USGS stream gauging station locations. The default basemap is OpenStreetMap, which includes geographical features such as roads, buildings, and trails. The basemap can be changed to EsriTopo, which provides labelled topographic contours and hillshade. Basin delineations can be toggled on and off by checking the Basin Delineations box (Figure 7). The polygons represent the basin drainage area.

Figure 7: USGS stream gauging site locations along with basemap options.

Clicking a point on the map displays a pop-up box containing the site number, site name, and coordinates (Figure 8).

Figure 8: Pop-up information box.

Corresponding information for the filtered watersheds is included in two separate tables. Since not all basins have a complete streamflow record, the "Discharge Record" table (Figure 9) displays the selected sites and the number of daily streamflow observations on record. The columns include the site number (SITENO), total number of records (count_rec), the first available date (first_date), the last available date (last_date), and the number of streamflow records for each calendar year. Leap years will contain 366 days. All sites in the MACH dataset have a minimum of 10 years of streamflow data. A complete record (January 1, 1980 to December 31, 2023) will contain 16,071 days.

Figure 9: Streamflow records per site. The number of displayed records can be changed using the dropdown (5, 10, 20, 50). The search bar allows numbers and characters. Columns can be ordered using the diamond buttons near the header.
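The 16,071-day figure for a complete record can be checked with a few lines of Python. This is only an arithmetic illustration using the standard library; the `days_in_period` helper is a hypothetical name.

```python
import calendar
from datetime import date


def days_in_period(start, end):
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1


# Full MACH period of record.
mach_days = days_in_period(date(1980, 1, 1), date(2023, 12, 31))

# Expected count of daily records in each calendar year of a complete record.
days_per_year = {
    year: 366 if calendar.isleap(year) else 365 for year in range(1980, 2024)
}

print(mach_days)
print(sum(days_per_year.values()))
```

A site whose count_rec is below this total, or whose yearly counts fall below 365 or 366, has gaps in its streamflow record.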

The "Selected Sites" table (Figure 10) displays sites based on "Filter Sites" and "Edit Individual Sites" information. The table includes the USGS site number (SITENO), site name (NAME), state (STATE), latitude (LAT), longitude (LONG), mean elevation (ELEV), basin drainage area (AREA), and overall mean basin slope (SLOPE).

Figure 10: Selected sites with SITENO, stream gauging site name (NAME), latitude (LAT), longitude (LONG), elevation (ELEV), drainage area (AREA), and mean overall slope (SLOPE).

Daily, Monthly, and Annual Data

Once the filtered basin selections are made, you have the option to evaluate time series at the daily, monthly, and/or annual scale.

DAILY DATA

The Daily Data dashboard is shown in Figure 11.

Figure 11: Daily Data dashboard.

For daily data, shown in Figure 12, each variable can be filtered by a range of values after the variable is selected. To select a variable, check the box under "Select Variable(s)". Once selected, minimum and maximum value boxes will be displayed. If you want all possible values, you do not need to make any adjustments. The default values cover the full range across the 1,014 sites. Variable abbreviations and units are: precipitation (PRCP) in millimeters per day, mean air temperature (TAIR), minimum air temperature (TMIN), and maximum air temperature (TMAX) in degrees Celsius, potential evapotranspiration (PET) in millimeters per day, actual evapotranspiration (AET) in millimeters per day, stream discharge (OBSQ) in millimeters per day, snow water equivalent (SWE) in millimeters per day, shortwave radiation (SRAD) in watts per square meter, water vapor pressure (VP) in pascals, and day length (DAYL) in seconds per day.

The data can also be filtered temporally under "Select Time Period(s)". The Date Range enables you to select a beginning and ending date. The Calendar Year and Month can also be selected using the dropdown menus to further refine the temporal range. Calendar Year is January 1 to December 31. Multiple Calendar Year and/or Month selections can be made and subsequently removed by using backspace. If Date Range and Calendar Year are both selected, the year must fall within the selected date range. If it does not, an error message will be displayed, which you can dismiss.

No data will be displayed or queried from the database until the Retrieve and View Data button is pressed. The Reset Filters button will clear all selected variables and values.

Figure 12: Available climate variables in MACH. Daily data can be filtered by value range, date range, calendar year, and/or calendar month.

The "Filtered Daily Data" table header will show the selected number of sites (Figure 13). If this number is not consistent with the Site Selection tab, press the Retrieve and View Data button again to update the query. The table will display the selected variables and specified filters for the first 1,000 records only. All data will be available upon download. If you want to view data for a specific site before downloading, select that single site on the Site Selection tab. Again, only the first 1,000 records will be available for preview. The data preview is limited for this tab due to the large volume of available data (more than 16 million rows). If any streamflow data are missing, the cells in the OBSQ column will be blank, and availability will correspond to the "Discharge Record" results displayed on the Site Selection tab. Missing values in the downloaded csv files are indicated by NA. If no records fall within the ranges set for the selected variables, the table will display "No data available in table".

Figure 13: Data preview for selected sites and variable(s). The table will always include the SITENO and DATE columns.

Since the data preview is limited to 1,000 rows, the "Data Summary" table displays the selected sites and the number of rows (days) of data that will be returned upon download. In the example shown in Figure 14, precipitation was selected along with January for the entire period of record (1980 to 2023). Every January over the 44-year period totals 1,364 days (44 × 31).

Figure 14: Data available for the selected sites.
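The row count in the January example can be reproduced for any month selection. The Python sketch below is an illustration only, assuming a complete record with no missing days; `rows_for_months` is a hypothetical helper, not part of the app.

```python
import calendar


def rows_for_months(months, first_year, last_year):
    """Count the days in the given months over an inclusive range of years."""
    return sum(
        calendar.monthrange(year, month)[1]
        for year in range(first_year, last_year + 1)
        for month in months
    )


# January only, for the full MACH period of record.
print(rows_for_months([1], 1980, 2023))
```

Comparing a number like this with the "Data Summary" table is a quick way to confirm that the temporal filters are set as intended before a large download.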

Each time series data tab has two download options, as shown in Figure 15. The exported daily data will match what is depicted in the filtered table (SITENO, DATE, and variable columns). If more than one watershed is selected on the Site Selection tab, the Export as csv button will download all data as one single csv file, with each basin's record appended after the previous basin's record. The exported data file naming convention is MACHtimeYYYYMMDD, where YYYYMMDD is the four-digit year, month, and day of the current date. The time portion of the name corresponds to the tab name, either daily, monthly, or annual. If you would like to download data for each watershed separately, the Export as separate csv files button will produce a zip file containing individual csv files. The zip file naming convention is MACHtimeYYYYMMDD.zip and, once extracted, it will contain individual csv files, MACHtime00000000.csv, where the zeroes represent the 8-digit site number.

Figure 15: Download options for time series data.

A progress message will be displayed while data is downloading (Figure 16). Note that larger amounts of data may take a few minutes. Please wait until the download is complete before moving to a new query or tab; otherwise, the app may freeze. All csv files will contain SITENO as the first column (character) and DATE (MM/DD/YYYY) as the second, followed by the selected variables. All data tabs are independent of each other, meaning that the variables and filters selected on one tab, for example daily, will not carry over to monthly or annual.

Figure 16: Progress message during zip file creation.
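When an exported file is read back into another program, SITENO should stay as text so that leading zeros survive, DATE should be parsed as MM/DD/YYYY, and NA should be treated as missing. The Python sketch below shows one way to do this with the standard library. The sample rows and the `read_export` helper are hypothetical; only the column layout follows the description above.

```python
import csv
import io
from datetime import datetime

# Made-up sample that mimics the documented layout: SITENO, DATE, then variables.
sample = (
    "SITENO,DATE,PRCP,OBSQ\n"
    "01234567,01/01/1980,2.5,NA\n"
    "01234567,01/02/1980,0.0,1.2\n"
)


def read_export(handle):
    """Read an exported daily csv into a list of dictionaries.

    SITENO is kept as a string, DATE becomes a datetime.date, and every
    other column becomes a float, or None where the value is NA or blank.
    """
    rows = []
    for raw in csv.DictReader(handle):
        row = {
            "SITENO": raw["SITENO"],
            "DATE": datetime.strptime(raw["DATE"], "%m/%d/%Y").date(),
        }
        for name, value in raw.items():
            if name not in ("SITENO", "DATE"):
                row[name] = None if value in ("NA", "") else float(value)
        rows.append(row)
    return rows


rows = read_export(io.StringIO(sample))
print(rows[0]["SITENO"], rows[0]["DATE"], rows[0]["OBSQ"])
```

To read a real download, pass an open file instead of the in-memory sample, for example `open(path, newline="")`.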

MONTHLY DATA

The Monthly Data tab contains all climate variables and temporal filters. Rather than a range of values for each variable, you have a choice of Minimum, Maximum, Median, Mean, or Total, shown in Figure 17 in the "Select Statistic" box. Only one option can be selected at a time. The default is Mean. All statistics are based on the daily values over a month. For example, if Maximum is selected for precipitation, the maximum daily precipitation value for each individual month, January to December, will be returned for each calendar year. If Total is selected, all temperature variables will return the mean value.

Figure 17: Statistics for monthly data. Includes minimum, maximum, median, mean, or total of daily values.
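The statistic options can be illustrated with a short Python sketch that groups daily values by year and month and applies one statistic to each group. It is not the app's implementation: the data are invented, `monthly_statistic` is a hypothetical helper, and the treatment of Total for temperature variables simply mirrors the rule stated above.

```python
import statistics
from collections import defaultdict
from datetime import date

TEMPERATURE_VARS = {"TAIR", "TMIN", "TMAX"}
STATISTICS = {
    "Minimum": min,
    "Maximum": max,
    "Median": statistics.median,
    "Mean": statistics.fmean,
    "Total": sum,
}


def monthly_statistic(daily, variable, statistic="Mean"):
    """Aggregate (date, value) pairs into one value per (year, month)."""
    # A total is not meaningful for temperature, so fall back to the mean.
    if statistic == "Total" and variable in TEMPERATURE_VARS:
        statistic = "Mean"
    func = STATISTICS[statistic]
    groups = defaultdict(list)
    for day, value in daily:
        groups[(day.year, day.month)].append(value)
    return {key: func(values) for key, values in sorted(groups.items())}


# Made-up daily precipitation values (mm/day).
prcp = [(date(1980, 1, 1), 2.0), (date(1980, 1, 2), 0.0), (date(1980, 2, 1), 5.0)]
print(monthly_statistic(prcp, "PRCP", "Maximum"))
print(monthly_statistic(prcp, "PRCP", "Total"))
```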

Data can also be filtered by Calendar Year and/or Month under "Select Time Period(s)" as shown in Figure 18. The "Filtered Monthly Data" table will show all data returned and includes SITENO, YEAR, and MONTH columns along with any selected variables. These columns are what will be returned in downloaded monthly data csv files.

Figure 18: Monthly data table with statistic options, variables, temporal filters, and table display.

ANNUAL DATA

The Annual Data tab is similar to Monthly Data, except that an "Annual Aggregation" option must be selected, either Water Year or Calendar Year, as shown in Figure 19. The default selection is Water Year. A water year begins on October 1 and ends on September 30 of the following year; for example, water year 1981 begins on October 1, 1980 and ends on September 30, 1981. The "Select Statistic" default is Mean.

Figure 19: Annual aggregation selection.
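The water year definition translates directly into a one-line rule: dates from October through December belong to the following year's water year. A minimal Python sketch, with `water_year` as a hypothetical helper name:

```python
from datetime import date


def water_year(day):
    """Return the water year (October 1 to September 30) containing a date."""
    return day.year + 1 if day.month >= 10 else day.year


print(water_year(date(1980, 10, 1)))   # first day of water year 1981
print(water_year(date(1981, 9, 30)))   # last day of water year 1981
```

Under this rule, dates from January 1 to September 30, 1980 fall in water year 1980, which would be only a partial year because the MACH record starts on January 1, 1980.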

The Annual Data dashboard is shown in Figure 20.

Figure 20: Annual Data dashboard.

Annual data can be filtered by year under "Select Time Period(s)", which will reflect the aggregation option selected under "Annual Aggregation" (Figure 21). Water year options do not include 1980 because the data begin on January 1, 1980, so water year 1980 would be incomplete. Water year 2023 ends on September 30, 2023.

Figure 21: Annual temporal filter.

MOPEX Data

MACH Explorer also enables retrieval of MOPEX data for the 395 MACH basins that are also identified in the MOPEX dataset. If any filtered watersheds from the Site Selection tab are labelled as MOPEX in site_info.csv (available on Zenodo), they will appear on the MOPEX Data tab (Figure 22). In the example below, only 2 watersheds in Arizona (out of the 17 total sites) are also in MOPEX. The "Stream Discharge Record" table is the same as the "Discharge Record" table, except that it displays the total number of streamflow records per calendar year for the MOPEX sites (1948 to 1979). A complete record will have 11,688 days.

Figure 22: MOPEX Data dashboard.

There are two export options, shown in Figure 23. MOPEX only will download available stream discharge, precipitation, minimum temperature and maximum temperature for January 1, 1948 to December 31, 1979 for all identified MOPEX basins selected on the Site Selection tab. MOPEX & MACH will automatically append the MACH data beginning January 1, 1980 through to December 31, 2023 for stream discharge, precipitation, minimum temperature, and maximum temperature. Note that MOPEX data may have missing values for any variables.

Figure 23: MOPEX Data dashboard.
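The two export options cover adjoining periods, so a combined MOPEX & MACH record with no missing days runs continuously from 1948 through 2023. The Python sketch below checks the arithmetic with the standard library; the constant and function names are hypothetical.

```python
from datetime import date, timedelta

MOPEX_SPAN = (date(1948, 1, 1), date(1979, 12, 31))
MACH_SPAN = (date(1980, 1, 1), date(2023, 12, 31))


def span_days(span):
    """Number of calendar days in an inclusive (start, end) span."""
    start, end = span
    return (end - start).days + 1


# The MACH period starts the day after the MOPEX period ends.
contiguous = MOPEX_SPAN[1] + timedelta(days=1) == MACH_SPAN[0]

mopex_days = span_days(MOPEX_SPAN)
combined_days = mopex_days + span_days(MACH_SPAN)
print(contiguous, mopex_days, combined_days)
```

Because MOPEX data may have missing values, an actual download can contain NA entries even when every date in the span is present.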

The table results will not change based on the export option. After the export option is chosen, make sure to press the Retrieve and Confirm Data button to query the database. A confirmation message will be displayed that reflects the export option. If you change the export option, make sure to press the Retrieve and Confirm Data button again.

For the MOPEX only option, the message will indicate that MOPEX only data were retrieved (Figure 24), while for the MOPEX & MACH option it will indicate that MACH data have been appended (Figure 25).

Figure 24: MOPEX only confirmation message.

Figure 25: MOPEX & MACH data confirmation message.

When the data are downloaded, the file naming convention for Export as csv is MOPEXYYYYMMDD.csv. For Export as separate csv files, the zip file will be MOPEXYYYYMMDD.zip and the individual csv files will be MOPEX_00000000.csv, where the zeroes represent the 8-digit site number. The first column of the csv file will be SITENO (character), followed by DATE (MM/DD/YYYY), OBSQ (streamflow), PRCP (precipitation), TMAX (maximum air temperature), and TMIN (minimum air temperature).

Attributes

MACH Explorer also includes catchment attributes, which can be selected by the number of values per site (single, monthly, or annual) and by type. The retrieved attributes pertain only to the filtered sites from the Site Selection tab. Only one attribute type can be selected at a time under "Select Attribute Type". Selected attributes will be displayed in the "Selected Attributes" table with SITENO as the first column, followed by any selections. The table header will update based on the attribute type (single, monthly, annual). The Retrieve Attributes button must be pressed if any attribute selections are changed. A detailed README.csv file with attribute descriptions is located on Zenodo. The options available in the drop-down menus correspond to the column names in the various attribute tables (e.g., climate, soil, geology). A portion of the README.csv file is shown in Figure 26. The file name corresponds to the attribute type; for example, overall_climate is Climate under the overall (single attribute per site) option.

Additional attributes are available outside of the MACH Explorer application, on Zenodo, including detailed dam information, SSURGO soil data, and precipitation indices (daminfo.csv, soilssurgo.csv, Rx_prcp.csv). Their contents are too extensive for inclusion in the app.

Figure 26: Screenshot of a portion of the README file contents.

The Single Value per Site option displays attributes that have one overall value per site (Figure 27). The categories under "Select Overall Site Attribute(s)" include Catchment, Climate, Hydrology, Soil, Geology, Regional, and Anthropogenic.

Figure 27: Single value per site for overall attributes.

The Monthly Value per Site option pertains to climate attributes calculated using daily MACH data. The table for monthly attributes includes SITENO and MONTH columns, followed by attribute selections (Figure 28).

Figure 28: Monthly value per site for monthly climate attributes.

The Annual Value per Site option pertains to climate attributes calculated using daily MACH data (Figure 29).

Figure 29: Annual value per site for annual climate attributes.

The "Download Attributes" option name will change based on the attribute type selection. All export buttons will download a single csv file that resembles the table output shown in the dashboard. The file naming convention is MACHattYYYYMMDD.csv for overall site attributes, MACHmonthlyattYYYYMMDD.csv for monthly attributes, and MACHannualattYYYYMMDD.csv for annual attributes.

Land Cover

The Land Cover tab retrieves land cover data for the selected sites (Figure 30). Available data begin in 1985 and end in 2023. Values are percent coverage by basin area, and the 16 classes together sum to 100 percent. Specific years can be retrieved using "Select Calendar Year(s)", and any of the 16 classes can be chosen using "Select Land Cover Class(es)"; the classes follow the modified Anderson Level II classification system (Anderson et al., 1976). The Retrieve Land Cover Data button must be pressed each time the filters are changed. The results are displayed in the "Land Cover Data" table and include SITENO, YEAR, and the selected land cover classes.

Figure 30: Land Cover Data dashboard.

Documentation

MACH Explorer uses data and software downloaded from the following sources:

- R: The R Project for Statistical Computing version 4.3.3 (Angel Food Cake).
- Chrome: Google Chrome Portable.
- Daymet: Daymet Version 4 - climate variables (precipitation, minimum air temperature, maximum air temperature, snow water equivalent, vapor pressure, solar radiation, day length) for January 1, 1980 to December 31, 2023.
- GLEAM4: Global Land Evaporation Amsterdam Model - climate variables (actual evapotranspiration, potential evapotranspiration) for January 1, 1980 to December 31, 2023.
- MOPEX: Model Parameter Estimation Experiment - climate variables (precipitation, minimum air temperature, maximum air temperature) and streamflow for January 1, 1948 to December 31, 1979.
- USGS NWIS: United States Geological Survey National Water Information System - streamflow data for January 1, 1980 to December 31, 2023.
- NHDPlus Version 2.1: National Hydrography Dataset - catchment attributes.
- MRLC: Multi-Resolution Land Characteristics Consortium - land cover.
- NID: US Army Corps of Engineers National Inventory of Dams - dam attributes.
- NRCS: Web Soil Survey - soil attributes.
- MACH: https://zenodo.org/records/16414465 (Katharine Sink, 2025).

Owner

Name: Katharine Sink
Login: k-sink
Kind: user
Location: Dallas, TX

Repositories: 1
Profile: https://github.com/k-sink
