Recent Releases of 4CAT
4cat - v1.50 We're all going on a summer holiday
This update comprises various under-the-hood changes to make 4CAT faster and more robust. It also includes more features for annotating datasets with LLMs, processors, or manual input; easier extension management via the web interface; a number of new and updated processors; several new features for its web interface; and various bug fixes and updates to mapping code for datasets imported via Zeeschuimer.
⚠️ Docker users will need to rebuild their containers to benefit from some of the speed boosts implemented in this update. This may become required in a future release. For now rebuilding is optional and 4CAT will otherwise function normally (but sometimes slower than it could be).
⚠️ Please also follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
New processors
- New processor (Video Wall) to make video collages of video datasets (5af02b3d35bcd5a86e584a62216e0457e9caf89f)
- New processor (LLM Prompting) that replaces the 'OpenAI prompting' processor and can interface with various LLM providers for LLM annotation of datasets, and allows for batched prompts and requesting structured output (#509, #515)
- New processor (Confusion matrix) to generate a confusion matrix of the values of two columns of a dataset (e7a9a8e080cb43ee78b488d6646d8e94b0f39c85); see the illustrative sketch after this list
- New processor (Replace text) to replace text within a dataset, resulting in a new dataset with the changed values (#512)
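As a point of reference for the new confusion matrix processor: a confusion matrix is simply a cross-tabulation of the values in two columns. A minimal sketch of the idea in Python (file and column names are hypothetical; this is not the processor's actual code):

```python
import csv
from collections import Counter

def confusion_matrix(rows, col_a, col_b):
    """Cross-tabulate two columns: counts[(a, b)] = how often value a co-occurs with value b."""
    return Counter((row[col_a], row[col_b]) for row in rows)

# e.g. compare a manually coded label against an LLM-generated annotation
with open("dataset.csv", encoding="utf-8") as infile:
    counts = confusion_matrix(csv.DictReader(infile), "manual_label", "llm_label")

for (actual, predicted), n in sorted(counts.items()):
    print(f"{actual!r} vs {predicted!r}: {n}")
```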
Updated processors
- Updated various processors to now have an option to save their results as annotations, meaning they will be available as an additional column on CSV exports of the parent dataset and can be edited manually in the Explorer (#507)
- Updated filter processors to copy the original dataset's annotations when making a new filtered dataset (#512)
- Updated the 'Download images' and 'Fetch URL metadata' processors to more intelligently utilise HTTP proxies. Requests are now limited per host name even if multiple processors run at the same time. Proxy settings can be configured in the 4CAT Control Panel (#487); see the sketch after this list
- Updated the 'Download images' processor to be available for more types of datasets (580e51206e65a5d959c85a43a8078e3a41583006)
- Updated the English (Infochimps) and Dutch (Onzetaal) word lists used by, among others, the 'Tokenisation' processor (8886507e7e9242270f0d35e2b14bbdcf836963dc, e44f676fe2b4b9c6e92cc27c215f1afd58f2b240)
- Updated the 'Image wall' processor to now also run directly on video datasets and offer more options for sorting and sizing (#508)
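The per-host request limiting mentioned for the download processors can be pictured as a shared semaphore keyed by hostname. A minimal, purely illustrative sketch (the limit value and function names are assumptions, not 4CAT's actual proxy code):

```python
import threading
from collections import defaultdict
from urllib.parse import urlsplit

import requests

MAX_PER_HOST = 2  # assumed limit; in 4CAT this would follow the configured proxy settings
_host_slots = defaultdict(lambda: threading.Semaphore(MAX_PER_HOST))

def fetch(url, **kwargs):
    """Fetch a URL while allowing at most MAX_PER_HOST concurrent requests per hostname."""
    host = urlsplit(url).netloc
    with _host_slots[host]:  # blocks until a slot for this host is free
        return requests.get(url, timeout=30, **kwargs)
```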
Web interface
- More control over extensions, including the ability to install and uninstall them via the web interface and enable and disable them while installed (#463)
- A new 'Jobs' page in the Control Panel shows an overview of workers running in the backend and allows admins to stop specific workers (#501)
- 4CAT can now use memcached to speed up the loading of pages in the web interface (#393, #492, 35a02424a81fe1a37157f2d4c840d449f0079923)
- Optimisations for dataset pages to make loading faster for datasets on which a lot of processors have been run (35a02424a81fe1a37157f2d4c840d449f0079923, c6caa0ce5776c55c0b722cd09128754e75c4377f, 05fe139ea9d543e07e3ec752c7c81482a9c7aa09)
- Dataset parameters on dataset pages can now be clicked to copy their value to the clipboard (cf2dd849fd9e6d40923cadff6718d33a6c621956)
- The Control Panel's 'Logs' page now also shows the web interface's own log, if it is in a separate file (c0d009196e3d0ebbca6b20611079063d9aeb64af)
- Added various new endpoints to 4CAT's API for automated use (ee3d2d005cf816254ca36a524812f5c2264c525a)
Bug fixes
Data sources and mapping
- Fixed an issue with the Telegram data source where spaces before or after the API key would mean the API key would not be recognised (c3c57a8dd9125801bbae24ac65cd6f73641a75d2)
- Fixed an issue with the Instagram data source where mapped items would not always have the same columns (da4e29da3ef6839bcbed941ff93b2738aec6e3d4)
- Fixed an issue with the X/Twitter data source where imported items with quoted posts that could not be loaded would not map properly (dfab94e5bcf53f38fa254a9d4861eec48724294f)
- Fixed an issue with the Telegram data source where it would not map an item if the Markdown version of its contents was not included (580f43ef9345362439cb92ac6501c5a078531a2f)
Processors
- Fixed an issue with the 'Download images' processor where a limit on the amount of images that could be downloaded was not parsed properly (860625c064cb0689b422bd2baa06a4c7dd9b0d73)
- Fixed an issue with network processors that would make them crash when trying to create a time-slice network from a dataset with invalid or empty timestamps (09aa847376741b54ffadf888ec15ee51731553dc)
- Fixed an issue with the 'Word Tree' processor where it would not properly include all selected columns in its output (24c6391c6a1c66ae1c762a0391308f56a6dd2c6a)
- Fixed an issue with the 'Datasource metrics' worker where it could crash if files it was scanning were deleted during the worker's operation (46442205aaaac134822f7558b3dca21ef8a127a6)
- Fixed an issue where 4CAT could crash when workers would automatically add jobs with bad code (#511)
Web interface
- Fixed an issue with the 'Logs' page in the Control Panel where logs would not properly load if they contained certain UTF-8 sequences (694c0dfb14eeea659f2b1b11fdc5c6b088dcddf1)
- Fixed an issue where the 'Preview' button would sometimes be visible for unfinished or empty datasets on dataset pages (4a888a0ce7786418a417ff5e708168b3f8a20974)
- Fixed an issue where dataset pages could take extremely long to load for very large datasets that could not be properly mapped with `map_item()` (#393)
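For context: in 4CAT, data sources implement a `map_item()` function that turns a raw imported item (for example an NDJSON object captured with Zeeschuimer) into a flat dictionary of columns. A simplified, hypothetical mapper (field names are illustrative):

```python
def map_item(item):
    """Map one raw imported item to a flat dict of columns (illustrative only)."""
    user = item.get("user", {}) or {}
    return {
        "id": item.get("id", ""),
        "timestamp": item.get("taken_at", 0),
        "author": user.get("username", ""),
        "body": item.get("caption") or "",
    }
```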
Operation
- Fixed an issue with the 'create_user.py' helper script that would make it not run properly (f671374526ba65a09bf0d34e8d99037c81f7b814)
- Fix an issue where the front-end would sometimes not properly load user-specific settings (#473, #455, #503)
- Fixed an issue where 4CAT would become unresponsive after a failed database query. Queries are now retried upon failure to allow for short database outages without crashing 4CAT (#466); see the retry sketch after this list
- Fixed an issue when running 4CAT outside of Docker on Windows where a POSIX-only dependency could not be loaded (1719d45f13f3fd1deb8ee0b5708db376c414519c, 9cec441c36b5bb1ffff32300087a3b468396b993)
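The retry behaviour for failed queries amounts to catching the connection error and re-issuing the query after a short pause. A minimal, hypothetical version (4CAT's own implementation differs in the details):

```python
import time

import psycopg2

def query_with_retry(connect, query, params=None, retries=3, delay=5):
    """Run a query, reconnecting and retrying so a brief database outage does not crash the caller."""
    for attempt in range(retries):
        try:
            with connect() as connection, connection.cursor() as cursor:
                cursor.execute(query, params)
                return cursor.fetchall()
        except psycopg2.OperationalError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

# usage (connection parameters are hypothetical):
# rows = query_with_retry(lambda: psycopg2.connect(dbname="fourcat"), "SELECT * FROM jobs")
```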
Full Changelog: https://github.com/digitalmethodsinitiative/4cat/compare/v1.49...v1.50
Published by stijn-uva 7 months ago
4cat - v1.49 A smörgåsbord of fixes and updates
What's Changed
This update comprises a revamp of the Explorer that makes annotating data via its interface easier and more efficient, and facilitates the use of LLMs to create annotations.
There are also many fixes and updates to item mapping code for datasets imported via Zeeschuimer, as well as a number of other fixes.
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or see error messages in the log file when upgrading via the web interface.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
Explorer
- The explorer has been rebuilt to allow for more flexible annotation and in particular allow for annotation with the help of large language models, either via the DMI Service Manager, a locally running LM Studio, or the OpenAI API (#428)
- Sorting items in the Explorer is now more flexible for datasets that support it (i.e. aren't too large for sorting)
- More data sources now have Explorer interfaces reflecting what items would have looked like in their original context
- Annotations are now saved separately from the original data files, and added on-the-fly to files and interfaces where necessary, so that they can be manipulated more easily
Updates to interface, data sources, and processors
- Update Threads data source to use and refer to threads.com as the main domain instead of threads.net (5ae076d288c69b73bc02da6e687665b915868e99)
- Update Threads data source to properly extract image URLs again (0a8e2cf8ee0e308a59ebeb63badc4748d4ccb7c9)
- Update TikTok data source to put the 'stickers' column right after the 'body' column in CSV files (8347cacbd8319457b95e08ebe0b5f33421bad806)
- Update Bluesky data source to properly manage login sessions while collecting data (591b170f53c08b995fbccc487e4f3156ed12972e, ecb2147996e97673e209ddf0dbb63dbe1468742e)
- Update Douyin data source to properly handle a number of fields in the source data (dedac8becdaa5fc89792e40c074c1a7cecc17c0c)
- Update the Twitter data source to cope with changes to the Twitter object structure (e2a95111eb78a08b8328c9de70b399cc1fc552bc, da02d6003ba1c89d57842dce37c9ca6732c3a7da, ad69ffd6ce162c9a7fb6582a6b6d02dfa5b45c8a)
- Update 4chan scrapers to cope with Janitor-removed posts once more (a5eb1acf258b2f3196cbcbbf6750693483029fd5)
- Update the PixPlot interface to organise information more usefully when inspecting specific images (ec93e38a3447242f8ce80981c18cc8a4c73f4ca6)
- Update the OCR processor to more flexibly handle timeouts when doing image-to-text for large datasets (ecfa52271c8fd2e88ff1025eebc5c0aa33ac833e)
- Update the 'CSV import' data source to accept CSV files containing items with no timestamp or an empty timestamp (6320f30fcb07e7f90e84becdc2e9a4152be2731f, a60032c6b7b1faa3bd23a4e7ad6f9457d30dcf50)
- Update the 'about this service' field on the 4CAT home page to allow for Markdown formatting (2dc73ac74f492494e9388e64dcf404139283726b)
Bug fixes
- Fix an issue with the reading of the restart log file when running on Windows (4bdf59f7570a9e71be42bd9bc297e2c68d598e58)
- Fix an issue with Twitter data mapping when reading hashtags from TCAT-sourced datasets (19d24468dbdc4affbceae6f1df1c8a608a1579a2)
- Fix an issue with Twitter data mapping where it would crash on posts without an avatar URL (could occur for users with an NFT avatar) (f4995fd2c6da0c927f4882ede36f13b46ea9c32f)
- Fix an issue with Twitter data import where it would crash when encountering tweets with a missing timestamp (dcbcd84c9ef14ab2aaf5bac799f4dd9eaee82be1)
- Fix an issue where TCAT data imports could crash when encountering an erroneous item; these are now skipped and logged (ed6b5a8c70d906f8dc6e63e50891132787277976)
- Fix an issue with the TikTok (URLs) data source where it could run with an empty list of URLs (6948fe03ace62bc0324cde58fb5cc2f51c1c8913)
- Fix a crash in the `/api/get-standalone-processors/` endpoint (2b0d7500ab30d739029ac470431bb172ab108a59)
- Fix a front-end issue where the page could not be rendered when notifications had a specific format of expiration time (e225d554a522bd46aa7786ad22e2334695f670e7)
- Fix a front-end issue when creating a dataset imported from TCAT without selecting a query bin (b5c76588a9e926736557e2b1f7d44ddc80de46ca)
- Fix a front-end issue when trying to load a result page containing datasets with invalid references to parent datasets
Deployment
- Change Docker default configuration to bind front-end to localhost only rather than listening on the public interface (6a90ac0a458af9d36f24b94b3cc0ec47ac416a32)
Full Changelog: https://github.com/digitalmethodsinitiative/4cat/compare/v1.48...v1.49
Published by stijn-uva 9 months ago
4cat - v1.48 Spring cleaning 2025
What's Changed
This small update focuses on bug fixes that did not make it into the previous release, as well as fixes for issues introduced in v1.47. There are also a few smaller feature updates.
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or see error messages in the log file when upgrading via the web interface.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
New data sources and processors
- Add a data source for RedNote comments, to be imported with Zeeschuimer (352668bc5c9f711a85f4548cb0d428c2d850897a)
- Add a processor to convert GEXF files to CSV files (c6a03264e35d10a6c95d95297101955f5b5881fd)
- Add an option to the control panel to configure a rate limit for account requests (15e27a609e55c8a33cb7f1ad9d229dbbea650c26)
Updates to interface, data sources, and processors
- Updated the Bluesky data source to hide passwords when entered in the 'Create dataset' form (05d13f436933572445f36149ba1a9fa944beb7e6)
- Updated the 'Histogram' processor to allow plotting over other intervals than months (c0ab7282e8e004e8a2d6b00d42c7868d430745e9)
- Updated the web interface's logging code to also log HTTP errors (638b552ca0e091872189af07d55ceeb3725b2671)
- Updated internal logic to speed up dataset page loading (#484)
- Update the co-tag network processor to recognise more columns as containing tags (1cf012022f6adbf22c3fabea3aca7f7c430e3ebd)
- Update the 'Top hashtags' processor to allow choosing which column to read tags from (7a8c446d518f7b4cfc40e013015be594ebfd3ac7)
- Updated the 'Download TikTok videos' processor to be compatible with user-uploaded datasets (db302b980e45647eb8f2e5e509705d11942531ae)
- Update the 'Video hasher' processor with an option to generate a CSV file with results (9bbf3045fbfcc53836242193197e775751c6d037)
- Updated the data source information page with an example of what a dataset for that data source might look like, if available (#485)
Bug fixes
- Fix an issue with processor presets where datasets would not always be assigned to the correct owner (47b8c67bbdad7a9fa1b0f631a2110276ad198e6f)
- Fix an issue with the RedNote, Threads, and Pinterest data sources that made datasets lack a 'thread_id' column (536a7e1846753f2dbe62dc8f03174e8092487a9a, 1af775d15b7289391564ab455bd52ff6a16656c7, 2eccc09e984c69bff542bc304694f91922f0d33f)
- Fix an issue with the Bluesky data source when using it to create pseudonymised datasets (7e1f7fb80824600ce119a36c8d67600f370bfbb6)
- Fix an issue with the RedNote data source when parsing items with no 'video media' (cb74df6543a15520d567a25474e3b572e850d05f)
- Fix an issue with the RedNote data source when parsing items with nested images (7997d3972e1367107beb9a7907e570b191913b4f)
- Fix an issue with the LinkedIn data source when parsing items with no author description (adb174aee14daf1f92acf0a870484133614e4e41)
- Fix an issue with the Threads data source where it would not always recognise embedded links (e4361cbb7d236dae9a114643254bf978cf4edf86)
- Fix an issue with the Truth.social data source where it would not map items correctly (723719dfd91082a270e09674aee877e632446ea3)
- Fix an issue with the PixPlot processor that would make it crash when items had no associated timestamp (207e3613070d713826d10d428dd0b00256cbb9d7)
- Fix an issue with the Image Wall with Labels processor that would make it crash when encountering invalid images (3282baef12404677a2c0c58845bc74715a68a8d0)
- Fix an issue with the 'Nuke dataset' button that made it not work (2510cbd224ccb0f53b31349aefb3cb24e2457b70)
- Fix an issue with the web interface where occasionally an 'undefined' error would show when trying to start a processor (5bab21d833fce8a6b0534bd20a587dd8d12a015c)
- Fix an issue where Zeeschuimer-based data sources would not show up in the filter on the datasets overview page (a727a19c636c3f9d6ec9e574cb63cbc440a22227)
- Fix an issue with the web interface where the page would not load correctly after the first run when trying to 'phone home' (dd5eb013d6f98b56c50d08f8242f3f43f21f68c0)
- Fix an issue with the restart log in the control panel where it would emit some log messages twice (8e02449dd76d7164e8e812aa26433557cbb5c667)
- Fix an issue with the control panel where it would in some situations not display the correct checked out git branch (30af6a902be4ef7b3d3ace7be3dd293c7023ce72)
Full Changelog: https://github.com/digitalmethodsinitiative/4cat/compare/v1.47...v1.48
Published by stijn-uva 10 months ago
4cat - v1.47 Spring is in the air, 2025 edition
This update introduces three new data sources: two for data imported via Zeeschuimer, from Pinterest and RedNote/Xiaohongshu, and a third that allows for direct data capture from Bluesky, if provided with a Bluesky login.
There are also several new processors, focused on image analysis as well as on using LLMs. In the latter category, a processor that prompts the OpenAI API for text generation based on a dataset's content allows for, for example, LLM-based coding or categorisation of a dataset collected with 4CAT.
Additionally, this release includes many bug fixes to processors, data sources and the 4CAT web interface.
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or see error messages in the log file when upgrading via the web interface.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
New data sources and processors
- Added data source: Bluesky, allowing for the capture of Bluesky posts for a given query - requires a Bluesky login (115a3c155f1c5edbce3d6c62c1fbbf66fa6a51ba)
- Added data source: Pinterest, for importing data collected from the Pinterest website with Zeeschuimer (#478)
- Added data source: Xiaohongshu/RedNote, for importing data collected from the RedNote website with Zeeschuimer (34b8409abbc4055416bf57a6a29b50d66d92fab4)
- Added processor: Deduplicate images, filtering an image dataset for duplicates using a range of comparison methods (aad7d57e1b44f7c59a3eb89ba6282d37f10339fe)
- Added processor: Bipartite image-item network, which can be used with e.g. Gephi's "Image Preview" plugin to create visual networks (f98addc7a3c5173b0f531d07a0379e73204c1c61)
- Added processor: Vectorise by category, allowing for vectors of tokens grouped by some column from the parent dataset (aeb01f769f5709e5283aedf55980fd7fe14b8e76)
- Added processor: OpenAI prompting, to interface with the OpenAI API and generate text based on the combination of the prompt and a value from the parent dataset for each item. Requires an OpenAI API key. (c4052137969fa7c507a0a1bbfaba18744244361e)
Updates to interface, data sources, and processors
- Updated the 'import CSV' data source to better handle files of which the CSV dialect cannot be detected automatically (46b2805f6ec136895606860e1c316af023fc91f1)
- Updated the 'Media upload' data source to warn a user when trying to upload SVG files (which most processors will not handle) (d1192257428c554fd3d715af2500bcc5925bcfd3, 2987fd4f53f1e6949cf2344213ac62bafbb689a1)
- Updated TikTok dataset import to include issensitive and isphotosensitive columns in the CSV mapping (2f4211354c1b15d41f850ca9bace3fb9a69070e2)
- Updated TikTok image downloader processor to allow downloading author avatars (6881cbadf36f1ff28c39543ce25a6e8b8796e31e)
- Updated co-tag network processor to allow ignoring certain tags (e53b73f75a5acfa0373072d5786d07fa5d44a9bc)
- Updated the video downloader processor to better handle download errors and rate limits (d1d9347b57dcef48287c79f06c19585dbe0dad69, d4c43a7b8ca2b6a219f18a0e191d2935baa1f3c4)
- Updated the 'Count values' and 'Thread metadata' processors to better report their progress while processing large datasets (59a15465a7d31a051a4918ac703674e14ada1b0c)
- Update the 4CAT back-end to log a message when a dataset cannot be deleted due to file permission issues (638413a02ae3406b89e6b45a0b3e8ff8fb14d6fd)
- Update the Instagram imported data source to no longer consider a lack of geo-tags 'missing data' (3c62f37f139df2e397de411421c39e6ac944c695)
- Update the Instagram imported data source with a new column 'likeshidden' that indicates whether the number of likes is hidden by the post author; the 'numlikes' column will be empty if this is the case (79cb297d53519106203a5583bdac37c14ec6114f)
- Update the 'Image wall' processor to use the 'fit height' sizing option by default, instead of 'square' (a43c9aa65808af3de62466440beeaa94d41a97cd)
- Update the dataset status message after importing data from Zeeschuimer to provide clearer information about data fields missing from the imported file (711c8b4c26fdd606e720a04664f24e8142d26b70)
- Update the front-end to hide some processors from the list for a dataset if they are technically compatible but do not make sense to run in the given context (#472)
- Update datasets to keep track of when they finish being created; existing datasets take their 'finished at' date from the dataset log's last update (#462)
- Update the default 4CAT configuration to enable new data sources (4376b33932f6d68e0e368f7b8f8b1aaef90676d2)
- Update Twitter-related processors and data sources to reflect the platform's name change to X (187101926bb7c650ced3d14a9cac17560ddb27b6)
- Update Bluesky widget on 'About' page to show smaller link previews (da8328edbbcf64395e69d06b5abf3708e5d60cc8)
- Update interface footer to only show the 4CAT version when logged in (0792ef4dae41ddf4f282a82801fced557738e807)
- Update the look of CSV preview of datasets to be more readable and indicate missing data (cb2ef691153ba1c3b1c78e567ae070480df14e72, 8da18b397c28888160ae5e434390cb2b1f59547b, dd2ab725901e4f967d512d896e7c4e18de6905e8)
- Update the dataset overview page to now show empty/unfinished datasets by default (8261b25d62718f981163d85238df4b83642e8f27)
- Update the list of available processors when creating a follow-up dataset to always show the processor description (68db3159e18670cb224cc3d0a349935161ef6516)
Docker-related changes
- Updated the Docker version of 4CAT to use Python 3.11 (2600e55f4bb3297814bfb842ddeb0f8b2366a437)
Removals and deprecations
- Removed the 'FAQ' page from the web interface (d5c873ae4841c15b35ed9354137a0237e0d60c6f)
- Removed the 'Convert to Excel-compatible CSV file' processor - use Excel's CSV import wizard instead (6367500a1c4633b0eb7968fe48b19e3333a0391d)
Bug fixes
- Fix a crash when importing NDJSON files with invalid entries; 4CAT will now skip the item and warn about it instead (6aa7177edba61357638d858ad171468ffd51c945)
- Fix an issue where data sources that could be imported via Zeeschuimer would show as available even when disabled (e09e87502e2fcee8f2ad79b26e40c27f54842acf)
- Fix an issue with the Telegram image downloader processor that would stall when hitting a rate limit (5d5a0e30bb111a4096c22dc7929e79ca1a9d1f9c)
- Fix an issue with the Telegram image downloader where it would crash on a 'bad request' response (3df74c9be01118c354eb1457895052cace37cb9a)
- Fix an issue with the Telegram image downloader where it could end up in an infinite loop when encountering a deleted image (ac543cc8bf73d4ecb61ca4d93f162fa198f645a5)
- Fix an issue with image downloader processors where they would attempt to download all available images if that was set to be allowed, even when asking for fewer (99e8fd086e6062851097366ad41ce1cd6677c5b0, 0638ec21f72f261d041ee3f31217acd6b3db6299)
- Fix an issue with the TikTok image downloader processor where it would crash when encountering unexpected errors (b60e8cf51f58f89b0f1ae9793dd2e6ea6a035e97)
- Fix an issue where 4CAT log messages would be logged twice in some cases (ded8d3df49e7b3cf1e142bfc80019edb43f1adfe)
- Fix an issue with the video scene detection processor where it would crash when a video in the parent dataset had not been downloaded (9453b76099ec03b3cea859f812890adb36e13df9)
- Fix an issue with the video frame extraction processor where it would crash when no frames could be extracted (176905a6307e9c1afeb7b9f36083a26893ce9ec0)
- Fix an issue in the TikTok comments data source where comments without information on whether they had been pinned would be skipped (bfe30751d50f8b0a7ffaa9817b11b0143e8e8290)
- Fix TikTok data import to properly map the post author thumbnail URL (8e660a4674b5e570a51730a342c3336437ab9817)
- Fix an issue with the word trees processor where it would crash when trying to make word trees of numeric data (5021e85302fe8cf16b783496052929cb30287820)
- Fix an issue with the 'group by sentence' setting of the tokeniser where it would crash when choosing certain languages (3f06845a0e2dc63e772a071e77748e116eef896d)
- Fix an issue in the video downloading processor where it would crash when the connection broke before the downloading finished (1765e8066e74624cd4b89cf96737f90901655336)
- Fix an issue when trying to export unfinished or incomplete datasets (a296ff03c983103b902a830c585efd426e349ece)
- Fix an issue when trying to work with tokenised datasets from older 4CAT versions or with duplicate source data (490688791ba84edc6a0f3b12793d7b0efabd76dd)
- Fix an issue where importing the same dataset into 4CAT twice would lead to strange side-effects (ffd5c46b5bc439562672fa3e2ac0d208ce5f727b)
- Fix an issue where the front-end interface would crash when trying to display datasets made with processors that were removed from 4CAT (817b4ee5810048e727c355e1a5a3ac35a78b355f)
- Fix the Generate images with BLIP2 processor to better handle images with no metadata (e.g. when uploaded via the 'Media upload' data source) (d69a0c3c58d7dba29e9ad477de804cdd09564b95)
- Fix an issue with the image classification processors to not crash when encountering SVG files but skip them instead (033b716e53eb63873c6acc1509411a23b931ecbc, 4912ef4eecd808f1aa5fa58e6533d3f492e04f6e)
- Fix an issue with the image categorisation processor where it would not properly skip empty categories (9465cc24f289b2f7a403e9903a82aaa5ecf72aa4)
- Fix an issue with the Classify using LLMs processor where it would not properly read an uploaded few-shot examples file (14c9fae409de993b6dd3a1b0de1d227cfe00cd2f)
- Fix an issue with the 'Media upload' data source where only the first uploaded file would be validated properly when uploading multiple (75ae4b2870d91b695b757621c9962efe1fd04e11)
- Fix an issue with the 'Count values' processor when trying to count numeric data (977d887ccf99d38c4971876755538d54af92e577)
- Fix an issue with the Telegram data source where the name of the source of a forwarded message was not mapped properly (66d60e918e050bb85086408ea3f03863cdc3a186)
- Fix an issue with the 'Open with Gephi Lite' link in the front-end network preview to conform to Gephi Lite's new URL scheme (855d34e9f69a802fe5833ef977ed8f9938ebddd0)
- Fix the 'Consolidate URLs' processors to properly skip data that isn't actually a URL (ccaf114346f7837f8acff6bf43a984f6b4b3da77)
- Fix an issue in the front-end where tooltips would sometimes be positioned (partially) outside the viewport (a3e4f77e0f420e4335def18482e3c82606cc1047)
- Fix an issue with the Gab data source where data would be imported incompletely when collected from a certain set of Gab pages (1716c4bc42836ca19fab8ba9dd1f45c8c1f4a6be)
- Fix an issue where the 'can manipulate dataset' privilege would not take effect when set on a per-user level (#481)
- Fix an issue where jobs could get stuck in the job queue even if the dataset they belong to had been deleted (#468)
Published by stijn-uva about 1 year ago
4cat - v1.46 Autumn Additions
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or see error messages in the log file when upgrading via the web interface.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
- Added support for extensions, modular additions to 4CAT that can be put in the `/extensions/` folder in the 4CAT root (#451)
- Added a processor to download 4CAT datasets as a Zip file, and updated the 'Import dataset' data source to allow loading these zip files as new datasets (#452)
- Added a data source for Threads, to allow importing Threads data via Zeeschuimer (a68f5d64de14c02368ba7b9b7dd0a1118cd15417)
- Added a processor for LLM-powered text coding via the DMI Service Manager (693960f41b73e39eda0c2f23eb361c18bde632cd)
- Added an option to the Telegram data source to crawl based on mentions and links in addition to forwarded messages (8f2193cdcf0179ba34947861be87ec587e22e638)
- Added razdel as a tokeniser to the Tokenise processor for tokenising Russian text (0b74569280f8f87376a964a6b160ea1993cb3354)
- Added an option to the 'Word trees' processor to allow selecting which column(s) to read text from (e4c0099d75cdc27f0e1f3f3609a8af93c52b425c)
- Added more stopwords corpora to the Tokeniser and allow using multiple at the same time - by default the one for the chosen text language is used (b9a327abe99f2d9ede4f2747f34f20d1dc6803cb)
- Added more 'auto-fill' options when importing CSV files (empty values, or the current date and time) (9bd9da568f593085a8d54744836e3290a75b51a7)
- Added a warning to the 'Media upload' data source when trying to upload too many files at once (ffcb6a4239075ba190fb534b25b89507e09e5f56, e4f982b4550b352a5d1a131abd78d52e6c196e48, e30464964262870c54c73f65a3bce630d6576f45)
- Added more indicative dataset status updates when running DMI Service Manager-powered processors (eb7693780cb191403f107817ca30d90373929bf0)
- Added support for previewing HTML datasets in the web interface (203314ec83fb631d985926a0b5c5c440cfaba9aa)
- Added configuration settings to toggle display of Anonymisation controls on the 'Create dataset' page (0945d8c0a11803a6bb411f15099d50fea25f10ab)
- Added configuration setting to toggle display of the 'you can install 4CAT yourself' message in the login form (cd356f7a69d15e8ecc8efffc6d63a16368e62962)
- Added a feed of the official 4CAT BlueSky account to the 4CAT 'Home' page (3d94b666cedd0de4e0bee953cbf1d787fdc38854)
- Added a delay to the worker that cleans up expired and orphaned datasets to wait 7 days before actually deleting an orphaned dataset (bfaf23b1065f068276e0c6c49d610a8c57083ae3)
- Fix a crash in the 'Image category wall' processor (ebf39d8262d199895aedc4f7fa275c5685e58563)
- Fix a crash in the 'Google Vision API' processor when running it on an empty dataset (fb09162db902fa22fdf2d7a3ed171ce1489bd92f)
- Fix a crash in the 'Video hashes' processor when running it on a dataset with no .metadata.json file (d41fa34514e8177efdac7e64a31f2ee75c7d1652)
- Fix a crash in the 'Download images' processor when trying to download images from a malformed URL (579ff64e18fbdcda39ef3c2457ab7a4f01ce3d9d)
- Fix a crash in the 'Download videos' processor when trying to extract video URLs from a non-text data field (e9b5232a963be02c2e86dabacb607b2315a4e0e6)
- Fix a crash in the 'Hatebase' processor (4ba872bef2968f7f8bf5831fd3a4f413420b36ed)
- Fix a rare race condition when running 4CAT via Docker (#396)
- Fix an issue in the front-end where an incomplete list of available processors was shown in some situations (43239467db046eea5eb5268f91d1b63a1042238d)
- Fix an issue in the Telegram data source where it would indicate that the 'app' needs updating to log in (d2a787e2c1559417bb5401f3208c82954052504f, 346150bd9cc96ac099abd4d15fa3de39bd65e9d1)
- Fix an issue in the Telegram data source where crawl depth parameters would not be interpreted correctly (1c0bf5e580eb16d8a6f9afa415f9febce449a537, #444)
- Fix an issue in the Telegram data source where some post attributes were not read correctly (2c8c860fc5378113d1352016ac26ca761adecb32, 959710ab613bd201c5cf56bb01b9e1e7d6ee84e5, c67a046137d916df3bb707f2243542d289045a06)
- Fixed an issue where the link to a newly created dataset on the 'Create dataset' page would not always work (b542ded6f976809ec88445e7b04f2c81b900188e)
- Fixed an issue where configuration tags with no associated users could get deleted (d6064beaf31a6a85b0e34ed4f8126eb4c4fc07e3)
- Fix an issue in the LinkedIn data source where image URLs would not always be parsed correctly (c27fbbe44175740bffa959fc21d3d98cb42758ce)
- Fix an issue in the Douyin data source where stream URLs would not always be parsed correctly (d769be44adb920503c33f88777d2879dcca4b98c)
- Remove Spacy-powered text analysis processors (48c20c2ebac382bd41b92da4481ff7d832dc1538)
- Remove the Parler data source (ee7f4345478f923541536c86a5b06246deae03f6)
- Update dependencies (#450, a269f96ed0cf296400fc1d5b4252d0a6765dda52, d2a787e2c1559417bb5401f3208c82954052504f)
Full Changelog: https://github.com/digitalmethodsinitiative/4cat/compare/v1.45...v1.46
Published by stijn-uva over 1 year ago
4cat - v1.45 Summer 2024 Special Edition
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or see error messages in the log file when upgrading via the web interface.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
- Added a 'media upload' data source that allows uploading media for processing with various image/sound/video processors (#419)
- Added a 'Visualise images with text captions' processor that generates an image wall including captions for each image (e7e636b6b89b6163fa6976e67edba68e7d75b7ac)
- Updated dependencies for video hash processor (aad94f393de77cc9d4f578e1f5be66a3601a4c90)
- Updated 'Help' link in footer and the page it links to to give better information on how to get help with 4CAT (acf5de0ed02e144b920a80abfdfa35986dd0ed4c)
- Updated the in-page preview for datasets to more accurately make hyperlinks clickable (8d4f99b22e0308606c7f713ef704dfa939e85247)
- Updated the Telegram data source to optionally allow one to crawl channels (e8714b6fba72e00c690a8d643d8dc54d2250c94a)
- Updated the 'Count values' processor with an option to differentiate between missing and blank values (f2145bdeff1d68e46cdd3521ecbb61573f01a2f2)
- Updated the item mapping for X/Twitter data to include URLs for the author profile picture and banner in the CSV output (bcb914076760ea1fb0e277cdcd1782ffa101b535)
- Fixed a crash in the 'Download images' processor when setting the amount of images to download to 0 (e0c55a8ae132bedef5da27ecbbb9489a094d454c)
- Fixed an issue with upgrading a 4CAT running in a Docker container where pip could not properly run to update Python dependencies (2aaa972e6888743fc329d721c37fa626cf2eeae3)
- Fixed various bugs with the 'Visualise images by category' processor
- Fixed a bug in the 4chan data import helper script when processing posts from threads of which the OP had been deleted (d67cf440730ea1d4e124c76a4c21d65b56f39c68)
- Fixed a bug where the wrong worker would be used when converting Google Vision or Clarifai output to CSV (fd3ac238e60f052889d99c71588170570a384900)
- Fixed a bug in the tokeniser where it could crash when selecting 'other' as a language (f4f8e6621bd6f2504dc3afc2078280bf5edb6444)
- Fixed a bug where a job for the orphaned file cleanup worker would not always be properly added to the queue (1b9965d40aa33035a73f685c13a1ab50cc877f78)
- Fixed a bug in the 'visualise images by category' processor where setting the max images to 0 would not properly remove the image limit (3580fc9450501262badb8e61ef4b4df4b4c54322)
Full Changelog: https://github.com/digitalmethodsinitiative/4cat/compare/v1.44...v1.45
Published by stijn-uva over 1 year ago
4cat - v1.44 Dependency hotfix
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
While deploying the previous 4CAT release (1.43) an issue surfaced where the 'Count values' processor could not be loaded due to a dependency issue in a third-party library. We have updated 4CAT's dependency list to resolve this. Otherwise this release is identical to the previous one, save for this one additional feature:
- Added a progress bar to the list of active workers on the control panel's front page for workers where progress information is available.
Full Changelog: https://github.com/digitalmethodsinitiative/4cat/compare/v1.43...v1.44
Published by stijn-uva almost 2 years ago
4cat - v1.43 A small update with some fixes and new features
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
When updating a Docker-based 4CAT, the front-end interface may fail to restart properly, marked by an error message like 'Error upgrading front-end container' in the restart log. In this case please run an upgrade via Docker Desktop or the command line as indicated on this page.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
- Fix a crash in the 'Fetch URL Metadata' processor (90577982ac05019a7ac76818a62f91e84dd65902, 7eab746e944f1ababe3dcd6a5d25387a64c2237d)
- Fix an issue with uploading CSV files when Unix timestamps were formatted as floats rather than integers (27a568eca7f2f3742223fef6285eaf80583e0fc4)
- Fix a crash with metadata handling in the image-to-text processor (51e58dde6ca21278a80f252a8c22dc83d87ace1f, 51e58dde6ca21278a80f252a8c22dc83d87ace1f)
- Fix a potential crash in the LinkedIn data mapping code (e0e06686e78976f971aac620267d7e009eaaadff, ef9dd482b2258c428584997dc661156f63f68b91)
- Fix a potential crash in the Telegram image downloader that could be triggered if a download timed out (5727ff7230db42463a824f45d63f0b8343caac14)
- Fix a crash in video processors when processing Telegram data (060f2cd7f922e7fae337b0697f7c477442d21ef1, 661c42c2d083da7004335b0e14910935c3d392f6)
- Update Douyin data parser to handle new data format (289aa342c9912aceeca35887c079c72aa6ffbf52, 2d2bbb9fdb9b426b8f4a80782f04257721a97f2e)
- Update TikTok data parser to properly handle all imported data (d7561625b127573fbb0332fbb713be6a3cb3d953)
- Update Instagram data parser to properly handle all imported data (807ab77101d197ec897640480a2140439d570c05)
- Update video processors to be compatible with ffmpeg versions before 5.1 (1b51d224ca544d7e2913238adbff2049412bc41e)
- Update dependencies (5b9b23fb1696bc1b69e1d902c0a2ad4b7d168984)
- Update the video scene frame extractor to be much more efficient (572d03f1f368f0ad5f47e705a119b37646148d1d)
- Update order of shutting down workers when stopping the 4CAT backend to ensure the internal API remains available for as long as possible (4182c436e4fb5109c5e041dc729f77a58d877889)
- Update error handling for processors interfacing with the DMI Service Manager (baacc86b269612b4b0956345f8b9fa902df1b61f)
- Update Twitter-related code and text to reflect its name change to 'X' (ab34c415c9ada23763b45676639ce3e80a34f594)
- Add support for automatic pseudonymisation when importing data from Zeeschuimer (8b66ae7e467913f8e7571cf4b45493f63804266f, c973750c8cabb8698704c5997903e92d1de866d2); see the sketch after this list
- Add Gab data source (#401, 9b662e9f9b4f4ce194608c8e20a8fc50bc6d9ae3)
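Pseudonymisation here means replacing identifying fields, such as author names, with stable but non-reversible values at import time. A minimal sketch of the general idea using a salted hash (field names and the salt handling are illustrative assumptions, not 4CAT's exact approach):

```python
import hashlib

def pseudonymise(value, salt):
    """Replace an identifying value with a stable, salted hash."""
    return hashlib.blake2b(value.encode("utf-8"), key=salt, digest_size=16).hexdigest()

# hypothetical usage on a single imported item
salt = b"per-dataset-secret"
item = {"author": "some_username", "body": "hello world"}
item["author"] = pseudonymise(item["author"], salt)
print(item)
```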
Published by stijn-uva almost 2 years ago
4cat - v1.42 Fixing of bugs and imports
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
When updating a Docker-based 4CAT, the front-end interface may fail to restart properly, marked by an error message like 'Error upgrading front-end container' in the restart log. In this case please run an upgrade via Docker Desktop or the command line as indicated on this page.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
- Fix a bug in the restart procedure that could result in the front-end container failing to restart and upgrade when running 4CAT via Docker (765f29e9232afdf284ab1667b0f371951e0bf2f4)
- Fix a bug that could result in a processor crash when trying to filter datasets for a string on columns containing numeric values (537d76456e2826e8c4dd7026ec5b2d436370fad8)
- Fix a bug that could result in a worker crash when importing CrowdTangle-formatted CSV files (91c3da176fad90ba16871fa8892fac5a0df13785)
- Fix an issue with mapping Twitter data that could result in a crash (43c6ed646994111188bde66d5bcfe4ab602e8512)
- Added the possibility to create notifications for all users with a certain tag in the Control Panel (c43e76daae3c2e6ecdb218ee749315b985eccca4)
- Added a data source for importing TikTok comment data from Zeeschuimer (50a4434a37d71af6a9470c7fc4a236b043cbfb4d)
- Updated the default 4CAT configuration to enable the import of Gab and TikTok Comment data (342a4037411e7ccaa50b25a4686434bec39e2568)
- Updated Douyin item mapping to properly process items not assigned to a specific room (6918baeabc7a08b6a63495c5d38c86b2c88bca44, 1fd78b2362840299e80f5540c9fedc1be3b06da1)
Published by stijn-uva almost 2 years ago
4cat - v1.41 April does what it will
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or if you encounter issues when upgrading via the web UI.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
Processors
- Fixed an issue with the image downloading processor where it would not properly follow links to images for 4chan datasets (e5f1f703247a5763d3d0e03c44ee31ab60b8a8ed)
- Fixed an issue with the TF-IDF processor where results would be off if fewer results than the requested top n results were available (44848a8f4b9fea07e7f9ce03e4fe0d696d5f1d27)
- Fixed a rare crash that could occur when a processor would encounter a FileNotFound exception while a Slack webhook was configured for logging (131a0eca0ad514b1ee57803e5c560ab0e56de42d, #422)
- Updated dataset filters to give filtered datasets a more context-sensitive name, based on the original's name as well as the filter type (3ef3e5ec9adbd8ddd128ce2b3f8fa3b1de1297e3)
- Updated the PixPlot processor to allow for a longer run time (2582538303e31470ed6bf8a01645f7b45af15e5d)
- Added a dedicated processor for downloading Telegram videos, replacing the generic one for datasets from that data source (94c814b9cab2ae2be10d5c5d3f6cfe20898e349c, 3f15410af3a278f5644f41f49e25498a1fac3c76)
- Added 'emoji' count option to 'Count values' processor, to count how often emoji occur in a dataset (bb50fc946fb6cdd8454969514bdc6d5ecf3f3530)
- Added 'Fetch URL metadata' processor, to fetch details about URLs mentioned in a dataset (a0baae17d8f11e4cae7cc261f8d406b1b1ce628a)
- Added options to the Telegram image downloader to fetch link preview thumbnails (8a7da5317defdafb5bdbf74dcbeb68e464fa21f4)
Data sources
- Fixed an issue with Telegram datasets that made items not have unique IDs in certain situations (a8b36dc5682df7c16e25474ea8fdbfc4f12f9d46)
- Fixed an issue when mapping Instagram datasets where a crash could occur if the 'full_name' of a user could not be determined (fa3be93bafef17e95881207604efa1212d562d9e)
- Fixed an issue with the Telegram data source where the 'max messages to fetch' setting would not be parsed correctly (d749237ec5c103b286ba8086904e405e232fc14c)
- Updated the warnings shown to the user about items in imported datasets that could not be imported properly or for which some data was missing, to be more accurate (db05ae5e565248e865e67b8ea60e6653357bb1f4, #418)
- Added columns with reactions, link details and number of forwards to Telegram dataset CSV exports (e653e3d8fb9c01697d96316df6f7634454671191, e4a93442efb84d73d6a4c9af9bc46a8f3e3fdda2)
- Added support for image galleries to the Douyin data source (876f4a4b6df51ec4b30a048c32191438b6778f90)
- Added a 4CAT setting to control the amount of entities that can be fetched at a time via the Telegram dataset (cd2e74d251491a93bc66dc7a64e8b2a60b0ed8ae)
Web interface
- Fixed an issue where the UI would not prompt for confirmation when deleting a configuration tag (39f2ec40faa3b8493bd5525279aeaeb2e4f586e0)
- Fixed an issue where deleting a `user:tag` would delete all `user:tag`s (9b4981d8c7358f31ed65d9f161d556e578389801)
- Fixed an issue where the colour of the favicon would revert to pink in certain situations (073587efc581adca0608988573ac83ea8b0c93d0)
- Fixed an issue where the 'Request access' link would be visible on the login page even if requesting access was disabled (28d733d56204231f4089660ff61282174aac7aed, 1f2cb77e3cb0fc9b5403da52aaa925b33089d18f)
- Fixed an issue where the control panel could be unresponsive when 4CAT's data folder was very large; disk usage is now calculated every few hours in the background (c8ad90b3436cff600320d3b2efdf6144240ea59d)
- Fixed an issue where the configuration tag priority order could be edited via the Settings page; use the Configuration Tags page instead (ae1c00fb3a521a2c3258b2597b04322d202c3ee7)
- Updated the user filter on the user list page of the control panel to be case-insensitive (940bac72c7e53bec9e136867c13e2a0a355961a4)
- Updated the layout of the control panel's Settings page to make it easier to navigate (d36254a188947fff507e8df59f793e98b3be1570)
- Updated the 'Share' dialog on dataset pages to allow comma-separated multiple item entry (6d8cb067bc12f8be68749f74a7291e0849494225)
- Updated some processors to hide/show certain options depending on the value of other options chosen (#397)
- Updated the CSV preview in the web UI to make hyperlinks clickable (daa7291e813e62fed4600a4acb8430004836cb86)
- Added links to a list of users with the tag to the 'Configuration tags' page in the control panel (9b4981d8c7358f31ed65d9f161d556e578389801)
Full Changelog: https://github.com/digitalmethodsinitiative/4cat/compare/v1.40...v1.41
Published by stijn-uva almost 2 years ago
4cat - v1.40 Long Dutch Winter release
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
When updating a Docker-based 4CAT, upgrading to this version may fail or appear to not have made any changes the first time. This is due to a bug fixed in this version. If this happens to you, follow the 'Docker - how to upgrade with command' instructions via the link above.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
- Optimised Docker build (93ddb4b6bd4b4d17b959919f6ab944f8878b1895)
- Fix an issue with migrating/upgrading to a new version inside a Docker-based 4CAT (aeb8090d64c250e3b7473ce21cfc9eaff088cb19)
- Fix an issue where the video downloader could fail when a link redirected too often or a video lacked a content type header (97209cbaeb46014a0de8b1e86e304e2725a5f12c)
- Fix an issue where Twitter datasets exported as CSV could have different columns depending on the date the dataset was imported into 4CAT (ce2b2d5674881d850470ff49bcad22f87e7a45a0)
- Fix an issue where the 4CAT front-end log would not start properly when running 4CAT via gunicorn but not inside a Docker container (84168e945e2ecf963cfdac3409d60544b521f694)
- Update LinkedIn item mapper to handle recently collected datasets (38a865ef6eb3f487a5525de47a44c9e7048b4073)
- Update Douban capture module to properly collect comment like counts (e1211c73735374086ff9098a161f088c730b4e1c)
- Add various explanatory tooltips to dataset result pages (c0aa4c75a40c0f1316d1440cf61039c7371803ec)
- Make 4CAT more robust in how it maps content imported with Zeeschuimer into CSV files (#409)
Published by stijn-uva almost 2 years ago
4cat - v1.39 Bugfixes and maintenance
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:
- Fix several issues that could occur when trying to import CSV files (#404, 9963cd868a1eb8c7c4015152fe0ca5b51241d685, 02acd865415dd6a1d5b9e9675bcaab050fe2cf13, 0049de694f010e9d08f5c27658663c9799b93b94, 151d498568ce328848a85034a5eddce9d30ef5b8, cdfe75c7eca937bf5977cb26ee3d09ff25fce923)
- Fix an issue where the canonical host name for a Docker-based 4CAT front-end would not always be set correctly (#395, #403, 0d1dc05605d012ab2bb1d1ff25620ca6c82856b5)
- Fix an issue where tags would linger in the database after they were no longer associated with any user, or would be stored in the wrong order of preference (c38a2158524bfdb4469889bdbdc28b85535306af, 79f58bf703a1c7b1d2e068c5a8107c2d16329c13)
- Fix an issue where uploads from Zeeschuimer would fail for data sources whose ID starts with a number (e.g. `9gag`)
- Fix an issue where making a user would crash the front-end if a user with the same e-mail already exists (14c1f9ce12c8104418fa20318e109132e0a60d92)
- Fix a crash in the 'Post/topic matrix' processor (be6ea8c21a51334dc18b005dc942bca6482aa05c)
- Fix an issue where the user-specific setting for the max amount of downloaded TikTok videos would be ignored (df2462fc46f63118227c6f4c10522972495de6d2)
- Fix a crash when duplicating a dataset and copying the dataset's owners (a3fdbadfb64c32495f63535936475544002304b1)
- Fix an issue where the TikTok metadata fetcher would fail with a 'proxy unavailable' error when trying to run multiple TikTok processors at the same time (04faf2306b660f8a5b81f7a2fb869a0b50df152e)
- Fix an issue with the Telegram image downloader (23edf4473a4c7dcea21a17f9336182d75ded01b7)
- Fix an issue where network processors could crash with a divide by zero error if run on an empty dataset (e10cb2e4b8e56cb9d3791b1a51f3de8569cc910e)
- Add a clearer error message when trying to merge datasets that are not CSV or NDJSON (c5fbe02f59111050bc6c2be3d35cb6493e0eb93a)
- Add an explicit edge weight to generated networks that is properly recognised by Gephi (c41587079a44842a91e4c9f5539bbefb9a037765)
- Add the option to only capture the first frame of a video when extracting video frames (b3981c3ca4b5abbf672190ca0a6d46a5f3f9dc74)
- Add various image processors (e.g. image wall) as supported for video frame datasets (588290ad92856698b4fad883743b9840cfb2886f)
- Add improved compatibility between video hash processor and image classification processor so that video frames can be visually categorised (a4e6904fe21ed174ab6a6246a6008853d352309c)
- Add data source options to explicitly define the Tumblr API key to use if no 4CAT-wide keys have been configured (fdbdca94e3103b61f11d3ebab116a407dd960cad)
- Add a 4CAT setting that determines which proxy headers will be taken into account for URLs generated in the front-end (5e47dacbba62413ea566fe369a9b5f985824946b)
- Update the TikTok import processor to cope with the new data formats provided by Zeeschuimer (f24828b4ee7644cd73094a7652f5ebfd324ec9b6)
- Update code that determines place of a dataset in the queue to be more efficient (239726bde31b83e84ec53785d5cc4b5eab0b2ead)
- Update dataset importer to stream files, which should prevent issues with very large data files (7a1c4b92f6ff5f5546f3941faa36a6651981fb83)
- Update the type of the `jobid` column of the jobs table to BIGINT to avoid issues with long-running 4CAT servers (3414a964ebc19949897317c1af7600abdce26da6)
- Update jobs table to no longer have a useless 'status' column (9f493b2787dbd885ebac2e92d7114566b23daf29)
- Update processor presets so they do not linger in an unfinished state if one of their components crashes, but instead finish with an error (b815c5406b342e3dad980ab22be193850a0d1396, ddf9aabddd8ad718a83cb232d2ad7cb534c6e0a4)
- Update various image dataset processors to produce more compatible .metadata.json files (87ec4d0cb18efbe0bd112dfa40f0928064afabf5)
- Updated version for the videohash library dependency (5f5e10f90beb32a943c6f30347cdc95258897923)
Published by stijn-uva about 2 years ago
4cat - v1.38 CSV Upload hotfix
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fix for a bug introduced in the previous release, v1.37:
- Fixed an issue that made the CSV upload data source never get past the 'please define your columns' stage when uploading CSVs with a custom format.
Published by stijn-uva over 2 years ago
4cat - v1.37 A release coincidental with AoIR 2023
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following features and fixes:
Processors
- Added a 'top hashtags' processor for datasets that contain hashtags (this is a preset for the 'count values' processor)
- Added more configuration options for the image wall processors to limit how large datasets can be
- Fixed a bug that made the co-link processor crash when used with particularly small datasets
- Fixed various issues with processing data via the DMI Service Manager
Data sources
- Data parsers for data imported from Zeeschuimer have been updated to allow data captured from the current version of the supported platforms.
- Added a data source that allows importing datasets from other 4CAT servers (#352, #375). This is not enabled by default but can be enabled in the Control Panel.
- Fixed an issue where CSV files would erroneously be detected as having no header rows upon importing them (#392)
- Fixed a number of issues with Telegram data parsing (#368, #371)
Deployment and configuration
- 4CAT will now update Python libraries to their latest compatible version when running `migrate.py` or upgrading via the control panel.
- Docker images are now published for both ARM and x64 processor architectures (#392)
- Added a button to the 'Restart or upgrade' control panel page to restart only the front-end
- Added the option to migrate to a development branch of 4CAT via the control panel's "Upgrade" page. This requires enabling the 'Can upgrade to development branch' privilege in user settings before it is available.
- Fixed bugs with restarting the 4CAT front-end via the control panel when running via Apache, gunicorn or uwsgi
- Fixed a bug where generated URLs could have the wrong scheme when running 4CAT behind a reverse proxy
- Fixed a race condition that could cause the front-end container to crash on start-up when using 4CAT via Docker (#378)
- Fixed a potential issue when installing 4CAT via Docker with the latest version of the Postgres image (#382)
Interface
- Added a panel to the control panel which shows the active user tags for the currently logged in user
- Added a page to the control panel that allows creating many users at once by uploading a CSV file with user data
- Added a 'User Interface' category to the Settings panel to configure 4CAT's interface, for example to show in-line dataset previews and what to use as the 4CAT 'home page' (#380)
- Added the option for users to receive an e-mail alert when their dataset is completed, which can be enabled via the control panel through the 'Show email when complete' option in the 'User interface' settings (#329, #385)
- Added an indication of the precise place in the queue for queued datasets (#239)
- Added the option to force a particular configuration tag by passing a specific HTTP header. This can be used to serve a different configuration of 4CAT depending on, for example, the domain name used or other factors as determined by the reverse proxy serving 4CAT (#380). See the sketch after this list.
- Fixed an issue with the manipulation of user tags via the control panel (#383, #384)
- Fixed an issue with changing the ownership for many datasets at once via the 'Bulk dataset management' page
- Fixed an issue that allowed the 'About' page to appear twice in the site navigation
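To illustrate the configuration-tag header mentioned above: a minimal sketch of what a reverse proxy effectively does, sending a request with such a header set. The header name X-4CAT-Config-Tag, the instance URL and the tag value are assumptions for illustration only; the actual header and the tags it selects are determined by your own 4CAT configuration.

```python
import requests

# Hypothetical header name and tag value -- check your own 4CAT configuration
# for the actual header it expects; this only illustrates the mechanism.
CONFIG_TAG_HEADER = "X-4CAT-Config-Tag"

# A reverse proxy would normally inject this header itself, for instance based
# on the domain name a request came in on. Here we simulate that by requesting
# the front page with the header set, so that 4CAT serves the settings linked
# to the 'workshop' configuration tag instead of the defaults.
response = requests.get(
    "https://4cat.example.com/",
    headers={CONFIG_TAG_HEADER: "workshop"},
)
print(response.status_code)
```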
Published by stijn-uva over 2 years ago
4cat - v1.36 Summer School Actual Hotfix
⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes for bugs introduced in the previous release, v1.35:
- Fixed an issue where 4CAT would not be able to properly fetch the latest version from GitHub when upgrading
- Fixed an issue where admin users were not able to manage user notifications
- Fixed the date filter on the 'Dataset bulk management' page
- Fixed an issue with the expiration worker that would not actually delete expired datasets
Additionally, this release incorporates the following features and fixes:
- Fixed an issue where imported Twitter data with withheld quote tweets could not be parsed
- Fixed an issue where imported CrowdTangle datasets with Facebook data could not be parsed if they had an empty 'Post Created ' column
- Fixed an issue where imported Douyin datasets with incomplete post data could not be parsed
- Fixed an issue where the tokeniser would crash when running it on a dataset without a 'body' column if not specifying a column to extract tokens from
- The upgrade dialog in the control panel now links to the release notes of the latest available upgrade
Published by stijn-uva over 2 years ago
4cat - v1.35 Summer School Lukewarm Fix
This release of 4CAT fixes the following bugs that were introduced in the previous release, v1.34:
- Fixed an issue when upgrading from an older version where the datasets table would not correctly be upgraded to the new schema
- Fixed an issue where a dataset's parent key could be NULL or an empty string; it should now always be an empty string
- Fixed an issue where filtering datasets by user would have no effect
- Fixed an issue where datasets made with a filter processor would not have the same ownership as the original dataset
- Fixed an issue where deleting a child dataset would redirect to the dataset overview page instead of the parent dataset result page
- Fixed an issue where the CLIP image processor would not read the correct configuration values
- Fixed permalinks to processor results
- Fixed crash in the worker that deletes expired datasets and users
- Fixed the link to Gephi Lite in network previews
- Added POSTGRESTAG to .env, allowing users to choose which Postgres database image they wish to use. The Postgres 15 Docker image is incompatible with Postgres 14; users may wish to set POSTGRESTAG=14 to continue using version 14, or otherwise follow the Postgres instructions on how to upgrade. This should not be an issue if you are upgrading 4CAT through the web interface.
This release changes the way dataset expiration works, to avoid situations where it is ambiguous whether a dataset should 'expire' and be deleted automatically. Expiration can now only be configured per data source, not globally. To make this easier, controls have been added to the control panel to set expiration times for multiple data sources at once and to 'keep' or 'expire' datasets in bulk.
⚠️ If you are upgrading 4CAT and had datasets or data sources set to expire, 4CAT will automatically disable all expiration ⚠️ to avoid datasets expiring inadvertently. Please inspect the relevant settings and adjust as needed after upgrading. You can find the controls at the 'Data sources' tab in the 4CAT settings and the 'Dataset bulk management' page, both in the control panel.
Additionally, this release of 4CAT incorporates the following new features and fixes:
* Added a page to the control panel that shows the latest 250 lines of the backend log files
* Added a page to the control panel where datasets can be managed in bulk
* Added a column link_url to the CSV export of LinkedIn datasets containing the link embedded in a post
* Added a column is_withheld to the CSV export of Twitter datasets imported with Zeeschuimer, indicating whether a tweet was withheld (withheld tweets could previously crash data exports); see the sketch after this list
* Added a user privilege which controls whether users can manipulate datasets they do not own (disabled by default, enabled for admins)
* Added Explorer styling for Douyin datasets
* Added a setting controlling how many images the PixPlot processor can process
* Fixed an issue with the TikTok URLs downloader which would erroneously try to capture posts behind a login wall
* Streamlined annotating data via the Explorer and subsequent usage of annotations
* The co-link network processor will now no longer add redundant loops
* The 'data sources' setting in the control panel is now easier to manipulate and has more explanatory information
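The new export columns can be used directly in downstream scripts. Below is a minimal sketch that reads a Twitter CSV export from 4CAT and counts withheld tweets using the is_withheld column; the file name and the exact values the column can hold are assumptions here.

```python
import csv

# Assumptions: the 4CAT export is saved as 'twitter-dataset.csv' and the
# 'is_withheld' column marks withheld tweets (its exact values may differ).
with open("twitter-dataset.csv", newline="", encoding="utf-8") as infile:
    rows = list(csv.DictReader(infile))

withheld = [
    row for row in rows
    if row.get("is_withheld", "").strip().lower() not in ("", "no", "false", "0")
]
print(f"{len(withheld)} of {len(rows)} tweets in this export were withheld")
```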
Published by stijn-uva over 2 years ago
4cat - v1.34 Summer 2023 release
⚠️ Docker 4CAT releases v1.30 to v1.33 have a bug where upgrading 4CAT would never complete due to issues fetching the latest version. Please follow these instructions for upgrading if you are running one of these versions. You can find your 4CAT version by going to the Control Panel and clicking the 'Restart or Upgrade' button. ⚠️
We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up.
Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following new features and fixes:
Processors
- New processor to extract audio to files from downloaded videos
- New processors to work with video files (require ffmpeg)
- Processors to download videos generically (with yt-dlp) and specifically for TikTok (#343)
- Processors to detect scenes in videos and capture frames from videos to images (with ffmpeg)
- Processors to render captured frames to timelines
- Processor to merge multiple videos into one 'video stack'
- New Machine Learning-based processors via support for containerised processors run through the in-progress DMI Service Manager
- New processor to convert speech-to-text with Whisper AI
- New processor to categorise images with OpenAI's CLIP model and visualise results
- Fix an issue where the NDJSON to CSV processor would not include all NDJSON fields in the CSV output
- Fix an issue with the Word Embeddings processors that crashed when using certain prevalence thresholds (#353)
- The tokeniser and CSV converter can now run on NDJSON files
- The column filter processor can now ignore letter case
Data sources
- Add support for importing Douyin and Twitter data from Zeeschuimer
- Add a data source for VK/VKontakte, using the VK API (#358)
- Fix issue with the Tumblr data source which made it crash when failing to connect to the API
- Fix an issue where importing CSV files would crash if certain columns were not included or in the wrong format
- Fix an issue with the Word Trees processor which would make it crash for certain datasets
- Disabled Twitter and Reddit data sources by default as the relevant APIs are no longer functional
Deployment and configuration
- Add --branch command line argument to migrate.py to allow migrating to a specific git repository branch (see the sketch after this list)
- Add environment variable to allow configuring the database port used in Docker builds
- Fix issue with failing Docker builds due to improperly included dependencies
- Fix issue where data sources based on imports could not be disabled properly
- Fix an issue where it was not checked if the user was logged in when exporting datasets to 4CAT
- Fix an issue where upgrading 4CAT would never complete due to issues fetching the latest version from GitHub (#356)
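For those who script their upgrades rather than use the control panel, the new --branch argument can be passed to migrate.py. A minimal sketch, assuming it is run from the 4CAT root folder and that the branch name is a placeholder; any other flags your set-up requires are omitted.

```python
import subprocess

# Run migrate.py with the new --branch argument to switch the local checkout
# to a specific git branch before migrating. 'some-feature-branch' is a
# placeholder; substitute an actual branch of the 4CAT repository.
subprocess.run(
    ["python3", "migrate.py", "--branch", "some-feature-branch"],
    check=True,
)
```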
Interface
- Add a configuration option to toggle the availability of the 'Request account' page on the login page
- Add 'Open with Gephi Lite' button to network previews
- Add a separate control for the secondary interface colour in the 4CAT settings
- Add an option to the 'Create dataset' page to allow the user to choose how to anonymise/pseudonymise a dataset
- Add an icon to the dataset overview page indicating if a dataset is scheduled for deletion (#330)
- Add controls to the dataset page that allow sharing datasets with other users (#311)
- Add some statistics to the control panel front page and move notifications and user management to their own pages
- Add the option to assign 'tags' to users, where each tag can override certain configuration settings, so you can configure some users to have different privileges than others (#331)
- Add current version number to interface footer
- Add initial support for imported Twitter and TikTok data to the Explorer.
- Merge data source settings with general settings page in control panel
- Overhaul of settings page in control panel
- Fix an issue where options for data sources and processors that were a checkbox were not parsed properly (#336, #337)
- The 4CAT favicon now automatically matches the interface colour (#364)
- Upgrade Font Awesome to 6.4.0
- The default name for datasets imported from Zeeschuimer is now more descriptive
Published by stijn-uva over 2 years ago
4cat - v1.33 Hotfix for 1.32
This hotfix release fixes the following bug in the previously released v1.32:
- Somehow we ended up reintroducing the same bug that 1.30 needed a hotfix for. Sorry for the inconvenience!
Instructions for upgrading can be found on the wiki.
Published by stijn-uva almost 3 years ago
4cat - v1.32 Spring is in the air release
This release of 4CAT incorporates the following new features and fixes:
Processors
- Reworked TikTok video and image downloaders that can also download materials for posts of which the relevant link in the dataset has 'expired'
- Support for tokenisation of Chinese text in the Tokeniser and Word Tree processors
- Added a processor for extensive normalisation of URLs (e.g. youtu.be -> youtube.com) for easier link analysis
- The column filter processor can now be used on any dataset with (mapped) columns
- The user-tag and co-tag network processors can now be used on any dataset with hashtags
Data sources
- Twitter data collected with Zeeschuimer can now be imported into 4CAT
- CSV import can now parse Weibo data collected with Bazhuayu
- CSV import has an option for automatically filling the ID columns so that datasets without per-item IDs can be imported
- Fixed an issue where imported LinkedIn data could not be parsed properly
- Fixed an issue where Reddit collection would crash for posts with malformed image metadata
- Fixed an issue where datasets could be uploaded from Zeeschuimer even when the user was not logged in
Interface & management
- The main accent colour of the 4CAT interface can now be configured in the 4CAT settings and is randomised upon first accessing a 4CAT instance
- The name of the 4CAT instance (displayed on top of the interface) is randomised upon first accessing a 4CAT instance
- Updated jQuery dependency
- The log file link is now always displayed on dataset result pages, even if the dataset has not finished yet or is empty
- Clarify that the Docker version of 4CAT can, in fact, use HTTPS
Other
- Fixed an issue where 4CAT would mysteriously start crashing because of a lack of 'flavour' of Path objects
- Bumped versions of various dependencies
Published by stijn-uva almost 3 years ago
4cat - v1.31 Hotfix for 1.30
This hotfix release fixes the following bugs in the previously released v1.30:
- An issue that made it impossible to get out of the 'congratulations, you have updated!' dialog after updating
- An issue with the configuration of the back-end Docker container that prevented restarts from the 4CAT web interface
- Some small fixes and tweaks that were just too late for the previous release.
Instructions for upgrading can be found on the wiki.
Published by stijn-uva about 3 years ago
4cat - v1.30 January 2023 release
Snapshot of 4CAT as of January 2023. Many changes and fixes since the last official release, including:
New and updated processors
- A processor for downloading videos and a number of video analysis processors to analyse the downloaded videos with (#303)
- A processor to merge multiple datasets into a new combined dataset (#301)
- The datasets created with Filter processors now have the same type as the dataset that was filtered (#291, #292, #312)
- An enhanced and more flexible processor to expand shortened URLs in a dataset (#312)
- Processors to annotate downloaded images with Clarifai and visualise the results as a network.
- A processor to 'refresh' TikTok data, which attempts to update expired thumbnail and video links (among other things).
- The 'semantic frame extractor' processor has been removed.
- The 'pyLDAvis' processor has been removed (the package it relied on is unmaintained and intermittently broke builds).
New and updated data sources
- Four new data sources for which data can be imported from Zeeschuimer: 9gag, Imgur, Parler and LinkedIn (the Parler data source was reworked from the existing data source for that platform)
- Uploading your own CSV data for 4CAT to analyse is much more flexible now and allows you to indicate the CSV column format yourself (#214, #297); see the sketch after this list
- 4chan datasets now index whether a post was deleted or not and when creating a dataset it is possible to exclude deleted posts (#306, #309)
- The Reddit data source has been updated to conform with changes to the Pushshift API (#327)
- The 'parliament speeches' and 'The Guardian Climate Change comments' data sources have been removed.
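To illustrate the more flexible CSV upload: the sketch below writes a small CSV with a plausible column layout. The column names here are an assumption, since during upload you indicate yourself which column maps to which field.

```python
import csv

# Hypothetical column layout for a CSV to upload to 4CAT; the upload form
# lets you map these columns to the fields 4CAT expects.
rows = [
    {"id": "1", "author": "user_a", "body": "First post", "timestamp": "2023-01-01 12:00:00"},
    {"id": "2", "author": "user_b", "body": "A reply", "timestamp": "2023-01-01 12:05:00"},
]

with open("my-dataset.csv", "w", newline="", encoding="utf-8") as outfile:
    writer = csv.DictWriter(outfile, fieldnames=["id", "author", "body", "timestamp"])
    writer.writeheader()
    writer.writerows(rows)
```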
Interface updates, 4CAT control & management
- Switching between data sources on the 'Create dataset' page no longer shows erroneous "Invalid data source selected" popups (#314)
- The 'All datasets' page is no longer available to non-admin users.
- A separate control panel for toggling the availability of data sources and setting automatic expiration for datasets created with that data source (#310)
- When 4CAT is first installed, it now optionally tells us (the developers) that it has been installed (#284, #308)
- The port through which 4CAT should connect to a mail server can now be configured properly (#299, #302)
- The 4cat_backend container now logs directly to the container entrypoint's output stream so it can be viewed with docker logs.
And many smaller fixes & updates. If you are running at least 4CAT 1.29, you can update your 4CAT instance via the 'Restart & upgrade' button in the Control Panel.
Published by stijn-uva about 3 years ago
4cat - v1.29 Autumn release
Snapshot of 4CAT as of October 2022. Many changes and fixes since the last official release, including:
- Restart and upgrade 4CAT via the web interface (#181, #287, #288)
- Addition of several processors for Twitter datasets to increase inter-operability with DMI-TCAT
- DMI-TCAT data source, which can interface with a DMI-TCAT instance to create datasets from tweets stored therein (#226)
- LinkedIn data source, to be used together with Zeeschuimer
- Fixes & improvements to Docker container set-up and build process (#269, #270, #290)
- A number of processors have been updated to transparently filter NDJSON datasets instead of turning them into CSV datasets (#253, #282, #291, #292)
- And many smaller fixes & updates
From this release onwards, 4CAT can be upgraded to the latest release via the Control Panel in the web interface.
Published by stijn-uva over 3 years ago
4cat - v1.26 Spring release
Many updates:
- Configuration is now stored in the database and (mostly) editable via the web GUI
- The Telegram datasource now collects more data and stores the 'raw' message objects as NDJSON
- Dialogs in the web UI now use custom widgets instead of alert()
- Twitter datasets will retrieve the expected number of tweets before capturing and ask for confirmation if it is a high number
- Various fixes and tweaks to the Dockerfiles
- New extended data source information pages with details about limitations, caveats, useful links, etc
- And much more
Published by stijn-uva almost 4 years ago
4cat - v1.25 February 2022 Snapshot
Snapshot of 4CAT as of 24 February 2022. Many changes and fixes since the last official release, including:
- Explore and annotate your datasets interactively with the new Explorer (beta)
- Datasets can be set to automatically get deleted after a set amount of time, and can be made private
- Incremental refinement of the web interface
- Twitter datasets can be exported to a DMI-TCAT instance
- User accounts can now be deactivated (banned)
- Many smaller fixes and new features
Published by stijn-uva almost 4 years ago
4cat - v1.21 September Snapshot
Snapshot of 4CAT as of 28 September 2021. Many changes and fixes since the last official release, including:
- User management via control panel
- Improved Docker support
- Improved 4chan data dump import helper scripts
- Improved country code filtering for 4chan/pol/ datasets
- More robust and versatile network analysis processors
- Various new filter processors
- Topic modeling processor
- Support for non-academic Twitter API queries
- Option to download NDJSON datasets as CSV
- Support for hosting 4CAT with a non-root URL
- And many more
Published by stijn-uva over 4 years ago
4cat - v1.18 for Zenodo
A release to trigger publication on Zenodo.
Published by stijn-uva almost 5 years ago
4cat - Public release
First public release, licensed under the MPL 2.0
Published by stijn-uva about 6 years ago
4cat - 1.0 (Beta)
4CAT is now ready for wider use! It offers...
- An API that can be used to queue and manipulate queries programmatically
- Diverse analytical post-processors that may be combined to further analyse data sets
- A flexible interface for adding various data sources
- A robust scraper
- A very retro interface
Published by stijn-uva almost 7 years ago