Recent Releases of thesis

thesis - Changes between 10 July - 19 November

What's Changed

Empirical Study ⚗️

  • Switch to Poetry and use ruff for linting and formatting🎉 by @KarelZe in https://github.com/KarelZe/thesis/pull/444

Other Changes

  • Bump click from 8.1.5 to 8.1.6 by @dependabot in https://github.com/KarelZe/thesis/pull/431
  • Bump pyyaml from 6.0 to 6.0.1 by @dependabot in https://github.com/KarelZe/thesis/pull/430
  • Bump tqdm from 4.65.0 to 4.66.0 by @dependabot in https://github.com/KarelZe/thesis/pull/433
  • Bump tqdm from 4.66.0 to 4.66.1 by @dependabot in https://github.com/KarelZe/thesis/pull/434
  • Bump click from 8.1.6 to 8.1.7 by @dependabot in https://github.com/KarelZe/thesis/pull/435
  • Bump fastparquet from 2023.7.0 to 2023.8.0 by @dependabot in https://github.com/KarelZe/thesis/pull/436
  • Bump actions/checkout from 3 to 4 by @dependabot in https://github.com/KarelZe/thesis/pull/437
  • Bump gcsfs from 2023.6.0 to 2023.9.0 by @dependabot in https://github.com/KarelZe/thesis/pull/438
  • Bump google-auth from 2.22.0 to 2.23.0 by @dependabot in https://github.com/KarelZe/thesis/pull/439
  • Bump gcsfs from 2023.9.0 to 2023.9.1 by @dependabot in https://github.com/KarelZe/thesis/pull/440
  • Bump gcsfs from 2023.9.1 to 2023.9.2 by @dependabot in https://github.com/KarelZe/thesis/pull/441
  • Bump google-auth from 2.23.0 to 2.23.2 by @dependabot in https://github.com/KarelZe/thesis/pull/443
  • Bump psutil from 5.9.5 to 5.9.6 by @dependabot in https://github.com/KarelZe/thesis/pull/447
  • Bump google-auth from 2.23.2 to 2.23.3 by @dependabot in https://github.com/KarelZe/thesis/pull/446
  • Bump seaborn from 0.12.2 to 0.13.0 by @dependabot in https://github.com/KarelZe/thesis/pull/445
  • Bump schneegans/dynamic-badges-action from 1.6.0 to 1.7.0 by @dependabot in https://github.com/KarelZe/thesis/pull/448
  • Bump gcsfs from 2023.9.2 to 2023.10.0 by @dependabot in https://github.com/KarelZe/thesis/pull/449
  • Bump fastparquet from 2023.8.0 to 2023.10.1 by @dependabot in https://github.com/KarelZe/thesis/pull/451
  • Bump google-auth from 2.23.3 to 2.23.4 by @dependabot in https://github.com/KarelZe/thesis/pull/454
  • Bump schneegans/dynamic-badges-action from 1.6.0 to 1.7.0 by @dependabot in https://github.com/KarelZe/thesis/pull/453
  • Bump seaborn from 0.12.2 to 0.13.0 by @dependabot in https://github.com/KarelZe/thesis/pull/452
  • Bump ruff from 0.1.3 to 0.1.4 by @dependabot in https://github.com/KarelZe/thesis/pull/455
  • Bump ruff from 0.1.4 to 0.1.5 by @dependabot in https://github.com/KarelZe/thesis/pull/456
  • Bump mypy from 1.6.1 to 1.7.0 by @dependabot in https://github.com/KarelZe/thesis/pull/457
  • Bump pydantic-settings from 2.0.3 to 2.1.0 by @dependabot in https://github.com/KarelZe/thesis/pull/458

Full Changelog: https://github.com/KarelZe/thesis/compare/23-29...23-30

- TeX
Published by KarelZe over 2 years ago

thesis - Print-Version 🖨️

What's Changed

Empirical Study ⚗️

  • Final tweaks and appendix 🚩 by @KarelZe in https://github.com/KarelZe/thesis/pull/427

Other Changes

  • Bump google-auth from 2.21.0 to 2.22.0 by @dependabot in https://github.com/KarelZe/thesis/pull/428
  • Bump click from 8.1.4 to 8.1.5 by @dependabot in https://github.com/KarelZe/thesis/pull/429

Full Changelog: https://github.com/KarelZe/thesis/compare/23-28...23-29

- TeX
Published by KarelZe over 2 years ago

thesis - Changes between 3 July - 9 July

What's Changed

Writing 📖

  • Clean up thesis 🫁 by @KarelZe in https://github.com/KarelZe/thesis/pull/422

Other Changes

  • Bump fastparquet from 2023.4.0 to 2023.7.0 by @dependabot in https://github.com/KarelZe/thesis/pull/423
  • Bump click from 8.1.3 to 8.1.4 by @dependabot in https://github.com/KarelZe/thesis/pull/426

Full Changelog: https://github.com/KarelZe/thesis/compare/23-27...23-28

- TeX
Published by KarelZe over 2 years ago

thesis - Changes between 26 June - 2 July

What's Changed

Writing 📖

  • Cleanup🧹 by @KarelZe in https://github.com/KarelZe/thesis/pull/417
  • Rework / complete chapter on feature set definition🧙 by @KarelZe in https://github.com/KarelZe/thesis/pull/418
  • Add chapter on discussion + various fixes💬 by @KarelZe in https://github.com/KarelZe/thesis/pull/420
  • Fix 🇺🇸 dialect by @KarelZe in https://github.com/KarelZe/thesis/pull/421

Other Changes

  • Bump google-auth from 2.20.0 to 2.21.0 by @dependabot in https://github.com/KarelZe/thesis/pull/419

Outlook

  • Shorten, improve discussion, extend feature results, and proofread

Full Changelog: https://github.com/KarelZe/thesis/compare/23-26...23-27

- TeX
Published by KarelZe over 2 years ago

thesis - Changes between 19 June - 25 June

What's Changed

Writing 📖

  • Rework + complete introduction 👶 by @KarelZe in https://github.com/KarelZe/thesis/pull/413
  • Fix and extend chapter on feature importances 🧙 by @KarelZe in https://github.com/KarelZe/thesis/pull/411
  • Formulate paragraph on SAGE🥗 by @KarelZe in https://github.com/KarelZe/thesis/pull/414
  • Cleanup, simplify, extend chapters until [4.2 Selection of Approaches]⛩️ by @KarelZe in https://github.com/KarelZe/thesis/pull/415
  • Add chapter on conclusion 🔚 by @KarelZe in https://github.com/KarelZe/thesis/pull/416

Full Changelog: https://github.com/KarelZe/thesis/compare/23-25...23-26

- TeX
Published by KarelZe over 2 years ago

thesis - Changes between 12 June - 18 June

What's Changed

Empirical Study ⚗️

  • Transformer Pre-Training: Add early stopping + fix eval set👷‍♀️ by @KarelZe in https://github.com/KarelZe/thesis/pull/409

Writing 📖

  • Extend chapter on feature importance results / robustness checks / outlook👨‍💻 by @KarelZe in https://github.com/KarelZe/thesis/pull/405
  • Extend chapter on hyperparameter + results + robustness + feature sets + ...🧙 by @KarelZe in https://github.com/KarelZe/thesis/pull/410

Other Changes

  • Bump gcsfs from 2023.5.0 to 2023.6.0 by @dependabot in https://github.com/KarelZe/thesis/pull/406
  • Bump google-auth from 2.19.1 to 2.20.0 by @dependabot in https://github.com/KarelZe/thesis/pull/407
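The early stopping added for transformer pre-training above can be illustrated with a minimal sketch. This is a toy version with hypothetical names (`early_stop_epochs`, `patience`), not the implementation in the repository: training halts once the validation loss has not improved for `patience` consecutive epochs.

```python
def early_stop_epochs(val_losses, patience=3):
    """Return the number of epochs trained before early stopping triggers."""
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            since_best = 0  # improvement: reset the counter
        else:
            since_best += 1
        if since_best >= patience:
            return epoch  # stop: no improvement for `patience` epochs
    return len(val_losses)  # early stopping never triggered

losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64]
print(early_stop_epochs(losses, patience=3))  # → 6
```

In practice one would also restore the weights from the best epoch rather than the last one.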

Full Changelog: https://github.com/KarelZe/thesis/compare/23-24...23-25

- TeX
Published by KarelZe over 2 years ago

thesis - Changes between 5 June - 11 June

What's Changed

Empirical Study ⚗️

  • Add chapter on rule-based results + fix bug in classical classifier 👨‍💻 by @KarelZe in https://github.com/KarelZe/thesis/pull/402

Writing 📖

  • [WIP] Add chapter on results / feature importances / application study 👨‍💻 by @KarelZe in https://github.com/KarelZe/thesis/pull/403

Other Changes

  • Fix Github Action 👷‍♀️ by @KarelZe in https://github.com/KarelZe/thesis/pull/404

Full Changelog: https://github.com/KarelZe/thesis/compare/23-23...23-24

- TeX
Published by KarelZe over 2 years ago

thesis - Changes between 29 May - 4 June

What's Changed

Empirical Study ⚗️

  • Fix attention maps / embedding visualizations🐞 by @KarelZe in https://github.com/KarelZe/thesis/pull/398

Writing 📖

  • Rework chapter on hyperparameter search🪙 by @KarelZe in https://github.com/KarelZe/thesis/pull/385
  • Summarize related works in an appendix by @KarelZe in https://github.com/KarelZe/thesis/pull/390
  • Rework chapter on research framework👷‍♀️ by @KarelZe in https://github.com/KarelZe/thesis/pull/391
  • Explain accuracy / loss discrepancy👶 by @KarelZe in https://github.com/KarelZe/thesis/pull/392
  • Streamline hyperparameter chapter🚂 by @KarelZe in https://github.com/KarelZe/thesis/pull/393
  • Add chapter on introduction👶 by @KarelZe in https://github.com/KarelZe/thesis/pull/395
  • Edit in Feedback Nikolas👨‍💻 by @KarelZe in https://github.com/KarelZe/thesis/pull/400
  • Start with chapter results 👨‍💻 by @KarelZe in https://github.com/KarelZe/thesis/pull/401

Other Changes

  • Bump google-auth from 2.19.0 to 2.19.1 by @dependabot in https://github.com/KarelZe/thesis/pull/396

Outlook 🔭

See https://github.com/users/KarelZe/projects/1/views/4.

Full Changelog: https://github.com/KarelZe/thesis/compare/23-22...23-23

- TeX
Published by KarelZe over 2 years ago

thesis - Changes between 22 May - 28 May

What's Changed

Empirical Study ⚗️

  • Remove breaking changes for GitHub action🐙 by @KarelZe in https://github.com/KarelZe/thesis/pull/380
  • Clean up notebooks and code🚂 by @KarelZe in https://github.com/KarelZe/thesis/pull/383
  • Investigate sampling bias in unlabelled data🔎 by @KarelZe in https://github.com/KarelZe/thesis/pull/376
  • Implement Pre-Training🛝 by @KarelZe in https://github.com/KarelZe/thesis/pull/343
  • Fix pre-commit hooks🐙 by @KarelZe in https://github.com/KarelZe/thesis/pull/384

Writing 📖

  • Add in feedback from supervisor✒️ by @KarelZe in https://github.com/KarelZe/thesis/pull/379
  • Prepare chapter on feature importances, i.e., SAGE / categorical embeddings / smaller fixes🥬 by @KarelZe in https://github.com/KarelZe/thesis/pull/375

Other Changes

  • Bump requests from 2.30.0 to 2.31.0 by @dependabot in https://github.com/KarelZe/thesis/pull/377
  • Bump google-auth from 2.18.1 to 2.19.0 by @dependabot in https://github.com/KarelZe/thesis/pull/382

Outlook 🔭

See https://github.com/users/KarelZe/projects/1/views/4.

Full Changelog: https://github.com/KarelZe/thesis/compare/23-21...23-22

- TeX
Published by KarelZe over 2 years ago

thesis - Changes between 15 May - 21 May

What's Changed

Empirical Study ⚗️

  • Fix typos in 6.0g-mb-viz-fttransformer notebook🐛 by @KarelZe in https://github.com/KarelZe/thesis/pull/369
  • Implement Attention Maps and fix SAGE values 👨‍💻 by @KarelZe in https://github.com/KarelZe/thesis/pull/358

Writing 📖

  • Implement and visualize embeddings🪄 by @KarelZe in https://github.com/KarelZe/thesis/pull/322
  • Chapter on selection of supervised methods👩‍🎓 by @KarelZe in https://github.com/KarelZe/thesis/pull/353, https://github.com/KarelZe/thesis/pull/371, and https://github.com/KarelZe/thesis/pull/372
  • Add visualizations of embedding✅ by @KarelZe in https://github.com/KarelZe/thesis/pull/360
  • Add chapter on Research Framework🪀 by @KarelZe in https://github.com/KarelZe/thesis/pull/365
  • Improve coverage of Feature Sets🧃 by @KarelZe in https://github.com/KarelZe/thesis/pull/368
  • Start Feature Set Definition + add viz Research Framework🪅 by @KarelZe in https://github.com/KarelZe/thesis/pull/361
  • Fix 🇺🇸 spelling 🗽 by @KarelZe in https://github.com/KarelZe/thesis/pull/373
  • Shorten thesis ❌ by @KarelZe in https://github.com/KarelZe/thesis/pull/374

Other Changes

  • Bump google-auth from 2.18.0 to 2.18.1 by @dependabot in https://github.com/KarelZe/thesis/pull/367

Outlook 🔭

See https://github.com/users/KarelZe/projects/1/views/4.

Full Changelog: https://github.com/KarelZe/thesis/compare/23-20...23-21

- TeX
Published by KarelZe almost 3 years ago

thesis - Changes between 8 May - 14 May

Took Friday and Saturday off to get uni-related work done.

What's Changed

Empirical Study ⚗️

  • Clean up of outdated files ♻️ by @KarelZe in https://github.com/KarelZe/thesis/pull/355
  • Implemented correct feature importance measures (WIP) by @KarelZe in https://github.com/KarelZe/thesis/pull/322
    • includes SAGE values with zero-one loss and permutation in groups. Also opened an issue https://github.com/iancovert/sage/issues/18 to discuss the idea and implementation with the authors. (WIP)
    • includes visualizations of categorical embeddings with highly promising results.
    • includes a new approach to calculate attention maps (Chefer et al.) (WIP)

Writing 📖

  • Add paragraphs on label smoothing, lr warmup, optimizer, and viz🤖 by @KarelZe in https://github.com/KarelZe/thesis/pull/350
  • Add results hyperparameter search gradient-boosting 😺 by @KarelZe in https://github.com/KarelZe/thesis/pull/352
  • Rewrite chapter Hyperparameter Search with updated results 🗺️ by @KarelZe in https://github.com/KarelZe/thesis/pull/354
  • Chapter on the selection of supervised methods👩‍🎓 (WIP) by @KarelZe in https://github.com/KarelZe/thesis/pull/353

Other Changes

  • Bump gcsfs from 2023.4.0 to 2023.5.0 by @dependabot in https://github.com/KarelZe/thesis/pull/351
  • Bump google-auth from 2.17.3 to 2.18.0 by @dependabot in https://github.com/KarelZe/thesis/pull/357

Outlook 🔭

See https://github.com/users/KarelZe/projects/1/views/4.
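The grouped-permutation idea behind the SAGE experiments above (permuting correlated features jointly rather than one at a time, so shared information is not hidden) can be sketched with a toy example. All names here (`grouped_permutation_importance`, the toy `model`) are hypothetical illustrations, not the sage library's API or the thesis code:

```python
import random

def grouped_permutation_importance(model, X, y, group, n_repeats=10, seed=0):
    """Mean accuracy drop when the feature columns in `group` are permuted jointly."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(model(r) == t for r, t in zip(rows, y)) / len(y)

    base = accuracy(X)
    drops = []
    for _ in range(n_repeats):
        perm = list(range(len(X)))
        rng.shuffle(perm)
        X_perm = [list(row) for row in X]
        for j in group:  # apply the SAME row permutation to every column in the group
            for i, p in enumerate(perm):
                X_perm[i][j] = X[p][j]
        drops.append(base - accuracy(X_perm))
    return sum(drops) / n_repeats

# Toy check: feature 0 fully determines the label, feature 1 is pure noise.
model = lambda row: 1 if row[0] > 0 else 0
X = [[1, 5], [-1, 3], [2, 1], [-2, 4]]
y = [1, 0, 1, 0]
print(grouped_permutation_importance(model, X, y, group=[1]))  # → 0.0 (unused feature)
```

Permuting group `[0]` would instead produce a positive accuracy drop, since the model depends on that column.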

Full Changelog: https://github.com/KarelZe/thesis/compare/23-19...23-20

- TeX
Published by KarelZe almost 3 years ago

thesis - Changes between 1 May - 7 May

What's Changed

Empirical Study ⚗️

  • Implement Pre-Training🛝 (WIP) by @KarelZe in https://github.com/KarelZe/thesis/pull/343
  • Implement and Study Feature Importances🪄 (WIP) by @KarelZe in https://github.com/KarelZe/thesis/pull/322
    • identified that random feature permutation won't work as expected

Writing 📖

  • Extend chapter on hyperparameter tuning, training of supervised / semi-supervised methods 📖 by @KarelZe in https://github.com/KarelZe/thesis/pull/342
    • includes new insights in the training configuration of models
    • includes new insights on the hyperparameters and their necessity
    • identified smaller errors that led to largely fluctuating errors

Other Changes

  • Bump typer from 0.8.0 to 0.9.0 by @dependabot in https://github.com/KarelZe/thesis/pull/341
  • Bump requests from 2.29.0 to 2.30.0 by @dependabot in https://github.com/KarelZe/thesis/pull/344

Outlook 🔭

See https://github.com/users/KarelZe/projects/1/views/4.

Full Changelog: https://github.com/KarelZe/thesis/compare/23-18...23-19

- TeX
Published by KarelZe almost 3 years ago

thesis - Changes between 24 April - 30 April

What's Changed

Empirical Study ⚗️

  • Fix typo in train_model.py🐍 by @KarelZe in https://github.com/KarelZe/thesis/pull/323
  • Add label smoothing🍷 by @KarelZe in https://github.com/KarelZe/thesis/pull/328
  • Fix Transformer Implementation 🚑 by @KarelZe in https://github.com/KarelZe/thesis/pull/320
  • Add logistic regression🌉 by @KarelZe in https://github.com/KarelZe/thesis/pull/329

Writing 📖

  • Rewrite chapter on Pre-Training and Rewrite selection of semi-supervised methods🤖 by @KarelZe in https://github.com/KarelZe/thesis/pull/316
  • Add improved visualizations 🖼️ by @KarelZe in https://github.com/KarelZe/thesis/pull/318
  • Edit in review comments👩‍🎓 by @KarelZe in https://github.com/KarelZe/thesis/pull/319
  • Various writing improvements📖 by @KarelZe in https://github.com/KarelZe/thesis/pull/324
  • Discussion on computational demand + smaller fixes🏭 by @KarelZe in https://github.com/KarelZe/thesis/pull/338
  • Add in missing page numbers🔢 by @KarelZe in https://github.com/KarelZe/thesis/pull/340

Other Changes

  • Bump requests from 2.28.2 to 2.29.0 by @dependabot in https://github.com/KarelZe/thesis/pull/321
  • Bump fastparquet from 2023.2.0 to 2023.4.0 by @dependabot in https://github.com/KarelZe/thesis/pull/325
  • Bump typer from 0.7.0 to 0.8.0 by @dependabot in https://github.com/KarelZe/thesis/pull/339
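The label smoothing added above replaces the hard one-hot target with a mixture of that target and a uniform distribution over the classes, which discourages overconfident predictions. A minimal sketch for the two-class buy/sell setting; the `smooth_labels` helper is hypothetical, not the repository's code:

```python
def smooth_labels(one_hot, eps=0.1):
    """Mix a one-hot target with the uniform distribution over k classes."""
    k = len(one_hot)
    # each entry keeps (1 - eps) of its mass and receives eps/k uniformly
    return [(1 - eps) * y + eps / k for y in one_hot]

# binary buy/sell target: [1, 0] becomes ≈ [0.95, 0.05] (up to float rounding)
print(smooth_labels([1.0, 0.0], eps=0.1))
```

The smoothed targets still sum to 1, so they remain a valid probability distribution for a cross-entropy loss.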

Outlook 🔭

See https://github.com/users/KarelZe/projects/1/views/4.

Full Changelog: https://github.com/KarelZe/thesis/compare/23-17...23-18

- TeX
Published by KarelZe almost 3 years ago

thesis - Changes between 17 April - 23 April

What's Changed

Empirical Study ⚗️

  • Started with EDA on unlabelled data. Still have to make sense of the results.
  • Continued working on the invalid gradient problem. Haven't yet figured out how to reproduce it reliably.

Writing 📖

  • Various theory chapters🤖 by @KarelZe in https://github.com/KarelZe/thesis/pull/312
  • Reworked chapter on token embeddings
  • Reworked chapter on FT-Transformer
  • Reworked chapter on decision trees
  • Shortened several chapters
  • Added chapter on Attention Mechanism
  • Added chapter on Gradient Boosting Procedure
  • Added discussion on Selection of Semi-Supervised Approaches
  • Added chapter on Pre-Training of Transformers
  • Various other improvements: notation, viz, typos, 🇺🇸 / 🇬🇧 dialect, etc.
  • Edit in feedback🐥 by @KarelZe in https://github.com/KarelZe/thesis/pull/315

Other Changes

  • Bump psutil from 5.9.4 to 5.9.5 by @dependabot in https://github.com/KarelZe/thesis/pull/313

Outlook 🔭

See https://github.com/users/KarelZe/projects/1/views/4.

Full Changelog: https://github.com/KarelZe/thesis/compare/23-16...23-17

- TeX
Published by KarelZe almost 3 years ago

thesis - Changes between 10 April - 16 April

What's Changed

Empirical Study ⚗️

  • Implement proper training setup for transformers🤖 by @KarelZe in https://github.com/KarelZe/thesis/pull/292
  • Remove TabTransformer🤖 by @KarelZe in https://github.com/KarelZe/thesis/pull/305

Writing 📖

  • Fill in gap for trade initiator definition🧑‍🌾 by @KarelZe in https://github.com/KarelZe/thesis/pull/217
  • Chapter on Self-Training⭕ by @KarelZe in https://github.com/KarelZe/thesis/pull/296
  • Chapter on hyperparameter tuning🏎️ by @KarelZe in https://github.com/KarelZe/thesis/pull/300
  • Streamline thesis 🚈 by @KarelZe in https://github.com/KarelZe/thesis/pull/302
  • Add chapter on TokenEmbeddings💤 by @KarelZe in https://github.com/KarelZe/thesis/pull/307
  • Streamline writing of thesis🪜 by @KarelZe in https://github.com/KarelZe/thesis/pull/297
  • Paragraph on Random Feature Permutation / Partial Dependence Plots📑 by @KarelZe in https://github.com/KarelZe/thesis/pull/310
  • Edit in comments from Patrick🐥 by @KarelZe in https://github.com/KarelZe/thesis/pull/308

Other Changes

  • Bump gcsfs from 2023.3.0 to 2023.4.0 by @dependabot in https://github.com/KarelZe/thesis/pull/298
  • Add chapter on hyperparameter tuning (current state)🏎️ by @KarelZe in https://github.com/KarelZe/thesis/pull/295
  • Bump google-auth from 2.17.2 to 2.17.3 by @dependabot in https://github.com/KarelZe/thesis/pull/303

Outlook

  • see: https://github.com/users/KarelZe/projects/1/views/4

Full Changelog: https://github.com/KarelZe/thesis/compare/23-15...23-16

- TeX
Published by KarelZe almost 3 years ago

thesis - Changes between 3 April - 9 April

What's Changed

Empirical Study ⚗️

  • Fix totals in tables📊 by @KarelZe in https://github.com/KarelZe/thesis/pull/276
  • Add retraining / semi-supervised mode to gradient boosting😺 by @KarelZe in https://github.com/KarelZe/thesis/pull/278
  • Create summary statistics classical trade classification rules📊 by @KarelZe in https://github.com/KarelZe/thesis/pull/279
  • Code review of data preparation notebooks😈 by @KarelZe in https://github.com/KarelZe/thesis/pull/280
  • Run studies for SelfTrainingClassifier🅾️ by @KarelZe in https://github.com/KarelZe/thesis/pull/249
  • Fix statistical tests in effective spread calculation🌄 by @KarelZe in https://github.com/KarelZe/thesis/pull/281
  • Add transfer learning results🔄️ by @KarelZe in https://github.com/KarelZe/thesis/pull/285
  • Select benchmark on validation set🔧 by @KarelZe in https://github.com/KarelZe/thesis/pull/291
  • Delete references to Docker⚓ by @KarelZe in https://github.com/KarelZe/thesis/pull/294

Writing 📖

  • Chapter on evaluation metric🪙 by @KarelZe in https://github.com/KarelZe/thesis/pull/216
  • Delete outdated files and add questions for meeting❌ by @KarelZe in https://github.com/KarelZe/thesis/pull/283
  • Chapter on Semi-Supervised Learning🦯 by @KarelZe in https://github.com/KarelZe/thesis/pull/284
  • Various improvements: evaluation metric, hyperparameter tuning, and application study🎩 by @KarelZe in https://github.com/KarelZe/thesis/pull/286

Other Changes

  • Bump google-auth from 2.17.1 to 2.17.2 by @dependabot in https://github.com/KarelZe/thesis/pull/288

Full Changelog: https://github.com/KarelZe/thesis/compare/23-14...23-15

- TeX
Published by KarelZe almost 3 years ago

thesis - Changes between 27 March and 2 April

What's Changed

Empirical Study ⚗️

  • Allow unclassified in ClassicalClassifier🏦 by @KarelZe in https://github.com/KarelZe/thesis/pull/219
  • Implement Self-Training for CatBoost⭕ by @KarelZe in https://github.com/KarelZe/thesis/pull/215
  • Extend result generation🏁 by @KarelZe in https://github.com/KarelZe/thesis/pull/228
  • Improve Result Tables🖨️ by @KarelZe in https://github.com/KarelZe/thesis/pull/234
  • Fix midpoint/spread in ClassicalClassifier🐞 by @KarelZe in https://github.com/KarelZe/thesis/pull/235
  • Improve feature engineering notebook🤏 by @KarelZe in https://github.com/KarelZe/thesis/pull/236
  • Remove zero imputation from feature set mode "none"🐞 by @KarelZe in https://github.com/KarelZe/thesis/pull/239
  • Generate ISE / CBOE supervised results of Gradient Boosting🐈 by @KarelZe in https://github.com/KarelZe/thesis/pull/243
  • Improvement of resumable studies and SelfTrainingClassifier🅾️ by @KarelZe in https://github.com/KarelZe/thesis/pull/246 and in https://github.com/KarelZe/thesis/pull/224
  • Run studies for SelfTrainingClassifier🅾️ (WIP) by @KarelZe in https://github.com/KarelZe/thesis/pull/249
  • Add visualizations of hyperparameter search space and fix minor typos🌔 by @KarelZe in https://github.com/KarelZe/thesis/pull/248

Writing 📖

  • Chapter on Feature Engineering🪄 by @KarelZe in https://github.com/KarelZe/thesis/pull/212
  • Update chapter on dataset/results 📑 by @KarelZe in https://github.com/KarelZe/thesis/pull/237
  • Add chapter on random feature permutation🔀 (WIP) by @KarelZe in https://github.com/KarelZe/thesis/pull/217

Other Changes

  • Bump google-auth from 2.16.3 to 2.17.0 by @dependabot in https://github.com/KarelZe/thesis/pull/229
  • Bump google-auth from 2.17.0 to 2.17.1 by @dependabot in https://github.com/KarelZe/thesis/pull/242

Outlook 🔭

See https://github.com/users/KarelZe/projects/1/views/4.

Full Changelog: https://github.com/KarelZe/thesis/compare/23-13...23-14

- TeX
Published by KarelZe almost 3 years ago

thesis - Changes between 20 March and 26 March

Picked up work again on Thursday.

What's Changed

Empirical Study ⚗️

  • Fix NaN gradients🐞 by @KarelZe in https://github.com/KarelZe/thesis/pull/137
  • Implement Self-Training Classifier⭕ (WIP) by @KarelZe in https://github.com/KarelZe/thesis/pull/209
  • Allow unclassified in ClassicalClassifier🏦 (WIP) by @KarelZe in https://github.com/KarelZe/thesis/pull/218

Writing 📖

  • Add questions for tomorrow❓ by @KarelZe in https://github.com/KarelZe/thesis/pull/205
  • Create chapter on Data-Preprocessing🌋 by @KarelZe in https://github.com/KarelZe/thesis/pull/214
  • Section on Feature Engineering🪄 (WIP) by @KarelZe in https://github.com/KarelZe/thesis/pull/208
  • Chapter on Self-Training⭕ (WIP) by @KarelZe in https://github.com/KarelZe/thesis/pull/209
  • Paragraph on Random Feature Permutation📑(WIP) by @KarelZe in https://github.com/KarelZe/thesis/pull/210

Other Changes

  • Bump google-auth from 2.16.2 to 2.16.3 by @dependabot in https://github.com/KarelZe/thesis/pull/213

Outlook 🛩️

https://github.com/users/KarelZe/projects/1/views/4?filterQuery=status%3A%22In+Progress%22%2C%22todo%22+

Full Changelog: https://github.com/KarelZe/thesis/compare/23-12...23-13

- TeX
Published by KarelZe almost 3 years ago

thesis - Changes between 13 March and 19 March

Didn't work 100 % on thesis. Spent most time on exam prep.

The BwHPC cluster is down until Friday. Thus, I will spend my time after the exam on writing ✏️.

What's Changed

Empirical Study ⚗️

  • Generate results for classical classifier + effective spread👸 by @KarelZe in https://github.com/KarelZe/thesis/pull/200
  • Automatic generation of results tables🏇 by @KarelZe in https://github.com/KarelZe/thesis/pull/201
  • Automatic result / viz generation for gradient boosting🙀 by @KarelZe in https://github.com/KarelZe/thesis/pull/203
  • Add ROC / Recall curves to notebooks🦉 by @KarelZe in https://github.com/KarelZe/thesis/pull/204
  • Extended pipeline for result generation🛕 by @KarelZe in https://github.com/KarelZe/thesis/pull/202
  • Gathered some ideas on how to retrieve the feature importances / need to correct probabilities.

Outlook🎒

  • exam prep (Mo - Wed)
  • write the chapter on data preprocessing incl. viz
  • shorten / rewrite the chapter on feature engineering
  • prewrite the sub-chapter on random feature permutation. Make sure it is the best possible choice.
  • create prototype for grouped random feature permutation
  • review and test #137

Full Changelog: https://github.com/KarelZe/thesis/compare/23-11...23-12

- TeX
Published by KarelZe almost 3 years ago

thesis - Changes between 6 March and 12 March

Didn't work 100 % on thesis. Spent some time on exam prep.

What's Changed

Writing 📖

  • Chapter on environment🪐 by @KarelZe in https://github.com/KarelZe/thesis/pull/196
  • Add chapter on train-test split🥮 by @KarelZe in https://github.com/KarelZe/thesis/pull/193
  • Streamline chapter on train-test split and improved visualizations🚂 by @KarelZe in https://github.com/KarelZe/thesis/pull/197 and https://github.com/KarelZe/thesis/pull/198
  • Add chapter on trade initiator🥯 by @KarelZe in https://github.com/KarelZe/thesis/pull/199

Other Changes

  • Bump tqdm from 4.64.1 to 4.65.0 by @dependabot in https://github.com/KarelZe/thesis/pull/195
  • Bump gcsfs from 2023.1.0 to 2023.3.0 by @dependabot in https://github.com/KarelZe/thesis/pull/194

Outlook🎒

  • finish remaining tasks from last week
  • exam prep

Full Changelog: https://github.com/KarelZe/thesis/compare/23-10...23-11

- TeX
Published by KarelZe almost 3 years ago

thesis - Changes between 27 February and 5 March

What's Changed

Didn't work 100 % on thesis. Spent some time on exam prep.

Writing 📖

  • Incorporate review comments from Christian📬 by @KarelZe in https://github.com/KarelZe/thesis/pull/185
  • Add chapter on problem framing and notation⛺ by @KarelZe in https://github.com/KarelZe/thesis/pull/188
  • Add feature definition to appendix🪙 by @KarelZe in https://github.com/KarelZe/thesis/pull/189
  • Add questions for meeting🙆‍♀️ by @KarelZe in https://github.com/KarelZe/thesis/pull/190
  • Notes on dataset and improved viz⛺ by @KarelZe in https://github.com/KarelZe/thesis/pull/187
  • Add chapter on dataset🌏 by @KarelZe in https://github.com/KarelZe/thesis/pull/192

Other Changes

  • Bump google-auth from 2.16.1 to 2.16.2 by @dependabot in https://github.com/KarelZe/thesis/pull/191

Outlook🎒

  • finish remaining tasks from last week
  • exam prep

Full Changelog: https://github.com/KarelZe/thesis/compare/23-09...23-10

- TeX
Published by KarelZe almost 3 years ago

thesis - Changes between February, 20th and February, 26th

What's Changed

Empirical Study ⚗️

  • Add notes, code, tests, and chapter on effective spread🍕 by @KarelZe in https://github.com/KarelZe/thesis/pull/184

Writing 📖

  • Add chapter on Regression Trees🎄 by @KarelZe in https://github.com/KarelZe/thesis/pull/170
  • Add section on attention maps🧭 by @KarelZe in https://github.com/KarelZe/thesis/pull/172
  • Edit in review comments🎒 by @KarelZe in https://github.com/KarelZe/thesis/pull/174
  • Optimized citations/typesetting and extended check_formalia.py 🐍 by @KarelZe in https://github.com/KarelZe/thesis/pull/175
  • Edit in comments from second review👨‍🎓 by @KarelZe in https://github.com/KarelZe/thesis/pull/179
  • Add visualizations for layer norm🍇 by @KarelZe in https://github.com/KarelZe/thesis/pull/178
  • Add chapter on TabTransformer📑 by @KarelZe in https://github.com/KarelZe/thesis/pull/180
  • Add chapter on FT-Transformer🕹️ by @KarelZe in https://github.com/KarelZe/thesis/pull/181
  • Add notes and viz on train-test-split🍿 by @KarelZe in https://github.com/KarelZe/thesis/pull/182

Other Changes

  • Bump google-auth from 2.16.0 to 2.16.1 by @dependabot in https://github.com/KarelZe/thesis/pull/171
  • Bump actions/checkout from 1 to 3 by @dependabot in https://github.com/KarelZe/thesis/pull/183

Outlook🎒

  1. Write the chapter on the gradient boosting procedure
  2. Finish the attention and embeddings chapter. Add some nice visuals!
  3. Integrate feedback
  4. Resolve my small TODOs in LaTeX sources / go through warnings / fix overflows
  5. Loosely research how pre-training on unlabelled data can be implemented in PyTorch
  6. (merge and rework the Chapter on feature engineering)

Full Changelog: https://github.com/KarelZe/thesis/compare/23-08...23-09

- TeX
Published by KarelZe almost 3 years ago

thesis - Changes between February, 13th and February, 19th

What's Changed

Writing 📖

  • Refactor and enhance stacked hybrid rules to separate chapter 🔢 by @KarelZe in https://github.com/KarelZe/thesis/pull/155
  • Extend chapter on LR algorithm📖 by @KarelZe in https://github.com/KarelZe/thesis/pull/156
  • Research on trade initiator for CBOE / ISE 📑 by @KarelZe in https://github.com/KarelZe/thesis/pull/157
  • Improve readability of Overview of Transformers 🤖 by @KarelZe in https://github.com/KarelZe/thesis/pull/158
  • Rewrite chapter on positional encoding🧵 by @KarelZe in https://github.com/KarelZe/thesis/pull/159
  • Rewrite chapter position-wise FFN for clarity🎱 by @KarelZe in https://github.com/KarelZe/thesis/pull/160
  • Rewrite chapter on residual connections🔗 by @KarelZe in https://github.com/KarelZe/thesis/pull/161
  • Update citation style and table of symbols🎙️ by @KarelZe in https://github.com/KarelZe/thesis/pull/162
  • Add feature set definition to appendix🧃 by @KarelZe in https://github.com/KarelZe/thesis/pull/164
  • Add visualizations of Transformer for tabular data🖼️ by @KarelZe in https://github.com/KarelZe/thesis/pull/165
  • Improve captioning and transitions for Transformer chapters 🍞 by @KarelZe in https://github.com/KarelZe/thesis/pull/166
  • Fix and simplify formulas❤️‍🩹 by @KarelZe in https://github.com/KarelZe/thesis/pull/167
  • Streamline and extend the chapter on LR algorithm📑 by @KarelZe in https://github.com/KarelZe/thesis/pull/168
  • Rewrite layer norm chapter and fuse with residual connections 🍔 by @KarelZe in https://github.com/KarelZe/thesis/pull/169
  • Restructure chapter on trade initiator🪴 by @KarelZe in https://github.com/KarelZe/thesis/pull/163

Outlook 🏍️

  • Merge and rework chapters on FTTransformer, TabTransformer, token embeddings, feature engineering, and attention maps
  • Write a chapter on decision trees and gradient boosting as well as attention
  • Create nice visualizations for categorical embeddings and layer norm
  • Integrate feedback from @lxndrblz and @pheusel
  • Improve transformer implementation, e.g., by choosing different search spaces, using numerical embeddings, fixing sample weighting, completing experiments with PyTorch 2.0, etc.
  • Investigate results of current models, e.g., robustness, effective spread, spread, partial dependence plots, etc. (see https://github.com/KarelZe/thesis/issues/8)

Full Changelog: https://github.com/KarelZe/thesis/compare/23-07...23-08

- TeX
Published by KarelZe about 3 years ago

thesis - Changes between February, 6th and February, 12th

Due to the slow progress last week, I decided to switch plans and push ahead with writing. I wrote all chapters on classical trade classification rules (9 pages) and incorporated them into thesis.pdf. I also gathered several ideas on how to improve the transformer chapters.

What's Changed

Writing 📖

  • Add chapter on quote rule🔢 by @KarelZe in https://github.com/KarelZe/thesis/pull/146
  • Add chapter on depth rule🔢 by @KarelZe in https://github.com/KarelZe/thesis/pull/147
  • Add chapter on the EMO rule🔢 by @KarelZe in https://github.com/KarelZe/thesis/pull/149
  • Add chapter on trade size rule 📑 by @KarelZe in https://github.com/KarelZe/thesis/pull/150
  • Add chapter on CLNV method 🔢 by @KarelZe in https://github.com/KarelZe/thesis/pull/151
  • Add chapter on tick rule 🔢 by @KarelZe in https://github.com/KarelZe/thesis/pull/152
  • Add chapter on Lee and Ready algorithm + proofreading 🔢 by @KarelZe in https://github.com/KarelZe/thesis/pull/154
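The tick rule covered in the chapters above classifies a trade as buyer-initiated on an uptick, seller-initiated on a downtick, and falls back to the last observed price change on a zero tick. A minimal sketch (the function name is hypothetical; this is not the repository's ClassicalClassifier):

```python
def tick_rule(prices):
    """Classify each trade from its price series: +1 buy, -1 sell, 0 unknown."""
    labels = []
    last_move = 0  # direction of the last nonzero price change
    prev = None
    for p in prices:
        if prev is None:
            labels.append(0)  # no reference price yet
        elif p > prev:
            last_move = 1
            labels.append(1)  # uptick: buyer-initiated
        elif p < prev:
            last_move = -1
            labels.append(-1)  # downtick: seller-initiated
        else:
            labels.append(last_move)  # zero tick: reuse last price move
        prev = p
    return labels

print(tick_rule([100, 101, 101, 100]))  # → [0, 1, 1, -1]
```

The quote-based rules (quote rule, LR, EMO, CLNV) refine this by comparing the trade price against the prevailing bid/ask quotes instead of only past trade prices.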

Other Changes

  • Bump fastparquet from 2023.1.0 to 2023.2.0 by @dependabot in https://github.com/KarelZe/thesis/pull/153

Outlook 🐿️

(Same as last week, as I worked on the classical trade classification rules.)

  • Complete notes and write a draft on the selection of (semi-) supervised approaches
  • Rethink Transformer chapter. I'm still not happy with the overall quality. Will probably spend more time rewriting/rethinking.
  • Improve transformer implementation, e.g., by choosing different search spaces, using numerical embeddings, fixing sample weighting, and completing experiments with PyTorch 2.0
  • Investigate results of current models, e.g., robustness, effective spread, spread, partial dependence plots, etc. (see https://github.com/KarelZe/thesis/issues/8)

Full Changelog: https://github.com/KarelZe/thesis/compare/23-06...23-07

thesis - Changes between January, 30th and February, 5th

What's Changed

Writing 📖

  • Rewrite transformer chapters for clarity by @KarelZe in https://github.com/KarelZe/thesis/pull/139
  • Fix merge and build errors in reports 🐞 by @KarelZe in https://github.com/KarelZe/thesis/pull/140
  • Chapter on related works 👪 by @KarelZe in https://github.com/KarelZe/thesis/pull/141
  • Add notes on depth, trade size, and CLNV rule💸 by @KarelZe in https://github.com/KarelZe/thesis/pull/142
  • Improve notes on tick rule, quote rule, LR algorithm, and EMO rule💸 by @KarelZe in https://github.com/KarelZe/thesis/pull/144
  • Notes for meeting and misc pre-writing changes🐿️ by @KarelZe in https://github.com/KarelZe/thesis/pull/145

Other Changes

  • Bump docker/build-push-action from 3 to 4 by @dependabot in https://github.com/KarelZe/thesis/pull/138

Outlook 🧪

  • Complete notes and write a draft on the selection of (semi-) supervised approaches
  • Rethink Transformer chapter. I'm still not happy with the overall quality. Will probably spend more time rewriting/rethinking.
  • Improve transformer implementation, e.g., by choosing different search spaces, using numerical embeddings, fixing sample weighting, and completing experiments with PyTorch 2.0
  • Investigate results of current models e.g., robustness, effective spread, spread, partial dependence plots, etc. (see #8)

Full Changelog: https://github.com/KarelZe/thesis/compare/23-05...23-06

thesis - Changes between January, 23rd and January, 29th

What's Changed

Empirical Study ⚗️

  • Feature engineering for a very large dataset 🌌 by @KarelZe in https://github.com/KarelZe/thesis/pull/126
  • Add retraining for gradient boosting [+ 2 %] 🍾 by @KarelZe in https://github.com/KarelZe/thesis/pull/130
  • Improve accuracy of TabTransformer [+ 5 % from prev.]🪅 by @KarelZe in https://github.com/KarelZe/thesis/pull/129
  • Fix cardinalities of Transformer implementation🪲 by @KarelZe in https://github.com/KarelZe/thesis/pull/132

Writing 📖

  • Complete notes on layer norm🍔 by @KarelZe in https://github.com/KarelZe/thesis/pull/123
  • Chapter on layer norm + notes on SSL and embeddings for tabular data 🧲 by @KarelZe in https://github.com/KarelZe/thesis/pull/131
  • Add chapter on embeddings of tabular data💤 by @KarelZe in https://github.com/KarelZe/thesis/pull/133
  • Fix broken references in expose 🔗 by @KarelZe in https://github.com/KarelZe/thesis/pull/135
  • Rework chapters on transformer 🤖 by @KarelZe in https://github.com/KarelZe/thesis/pull/134
  • [WIP] Add a chapter on attention, self-attention, multi-headed attention, and cross attention🅰️ by @KarelZe in https://github.com/KarelZe/thesis/pull/136

Other Changes

  • Bump gcsfs from 2022.11.0 to 2023.1.0 by @dependabot in https://github.com/KarelZe/thesis/pull/127

Outlook 🚀

  • Complete notes and write a draft on related works
  • Complete notes and write a draft on the selection of (semi-) supervised approaches
  • Complete notes and write a draft on classical trade classification rules
  • Try to shorten/streamline the theoretical background by one page. Also, aim for understanding and improve the visualizations
  • Improve transformer implementation, e.g., by choosing different search spaces, using numerical embeddings, fixing sample weighting, and completing experiments with PyTorch 2.0

Full Changelog: https://github.com/KarelZe/thesis/compare/23-04...23-05

thesis - Changes between January, 16th and January, 22nd

What's Changed

Empirical Study ⚗️

  • Restore soft links 🔗 by @KarelZe in https://github.com/KarelZe/thesis/pull/120
  • Add current results⚡ by @KarelZe in https://github.com/KarelZe/thesis/pull/121
  • Change from code review 🧼 by @KarelZe in https://github.com/KarelZe/thesis/pull/124
  • Shared embeddings and pre-norm in TabTransformer 🤖 by @KarelZe in https://github.com/KarelZe/thesis/pull/118. After writing the TabTransformer chapter, I noticed that the open-source implementation I had used was too simplistic. I also notified Borisov et al. of the issue, who had used the same implementation in their study (https://arxiv.org/abs/2110.01889). See https://github.com/kathrinse/TabSurvey/issues/13 for details.
  • Automatically find maximum batch size🥐 by @KarelZe in https://github.com/KarelZe/thesis/pull/125

Writing 📖

  • Add chapter on point-wise FFN🎱 by @KarelZe in https://github.com/KarelZe/thesis/pull/117
  • Add chapter on residual connections🔗 by @KarelZe in https://github.com/KarelZe/thesis/pull/119
  • [WIP] Chapter on layer norm🍔 by @KarelZe in https://github.com/KarelZe/thesis/pull/123

Other Changes

  • Bump fastparquet from 2022.12.0 to 2023.1.0 by @dependabot in https://github.com/KarelZe/thesis/pull/122
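The shared (column) embeddings added to the TabTransformer above can be illustrated in a few lines of numpy. Dimensions and the split between the shared and per-category parts below are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

d_token = 32             # total embedding dimension per token (assumed)
d_shared = d_token // 8  # leading dimensions are shared per column (assumed split)
cardinalities = [3, 5]   # number of categories per categorical column

# Per-category part of each embedding ...
value_embeddings = [rng.normal(size=(c, d_token - d_shared)) for c in cardinalities]
# ... plus one shared vector per column, so the model can tell which
# column a category belongs to (the "column embedding" idea).
shared_embeddings = [rng.normal(size=d_shared) for _ in cardinalities]

def embed(row):
    """Embed one row of categorical indices into an (n_cols, d_token) array."""
    tokens = [
        np.concatenate([shared_embeddings[col], value_embeddings[col][cat]])
        for col, cat in enumerate(row)
    ]
    return np.stack(tokens)

print(embed([2, 4]).shape)  # (2, 32)
```

Because the shared slice is identical across all categories of a column, the attention layers can distinguish columns even when categories collide.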

Outlook 🧪

  • It wasn't easy to obtain Jupyter resources on the cluster last week, so training and improving the Transformer didn't progress as initially hoped. Two SLURM jobs are still pending. I had some success with small-scale experiments, though, with the FTTransformer reaching a performance similar to gradient boosting. The results from gradient boosting with option features also look promising. See readme.md.
  • I also decided to break down training and tuning into smaller chunks after reading https://github.com/google-research/tuning_playbook. I hope it will give us more insights. I have already experimented with gradient tracking and added the option to automatically find the maximum batch size. I also restructured my notes on how I want to progress with training and tuning. Will add the option to keep certain parameters static. Also plan to add a much simpler baseline such as logistic regression and simplify evaluation. Might experiment with retraining.
  • Writing progressed slower than I anticipated due to various reasons. Still have to write the chapters on attention and MHSA, as well as pre-training of transformers.
  • I'll use next week to clean up the remaining tasks. 💯

Full Changelog: https://github.com/KarelZe/thesis/compare/23-03...23-04

thesis - Changes between January, 9th and January, 15th

What's Changed

Empirical Study ⚗️

  • Run feature engineering on large scale (100 %) 💡 by @KarelZe in https://github.com/KarelZe/thesis/pull/109
  • Run exploratory data analysis on cluster (10 %) by @KarelZe in https://github.com/KarelZe/thesis/pull/108

Writing 📖

  • Add chapter on input embedding (finished) positional encoding (cont'd) 🛌 by @KarelZe in https://github.com/KarelZe/thesis/pull/107
  • Finish chapter on positional encoding🧵 by @KarelZe in https://github.com/KarelZe/thesis/pull/111
  • Add chapter on TabTransformer🔢 by @KarelZe in https://github.com/KarelZe/thesis/pull/112
  • Add chapter on FTTransformer 🤖 by @KarelZe in https://github.com/KarelZe/thesis/pull/113
  • Correction of column embedding in chapter TabTransformer 🤖 by @KarelZe in https://github.com/KarelZe/thesis/pull/115

Other Changes

  • Bump google-auth from 2.15.0 to 2.16.0 by @dependabot in https://github.com/KarelZe/thesis/pull/110
  • Bump requests from 2.28.1 to 2.28.2 by @dependabot in https://github.com/KarelZe/thesis/pull/114

Outlook💡

  • Perform a code review of all previously written code.
  • Continue with transformer week. 🤖 Mainly write remaining chapters on the classical transformer architecture, attention and MHSA, as well as pre-training of transformers.
  • Research additional tricks from literature to optimize training behaviour of transformers. Structure them for the chapter on training and tuning our models.
  • Increase performance of current transformer implementations by applying the tricks from above to match the performance of gradient-boosted trees.
  • Add shared embeddings to the TabTransformer implementation.
  • Restructure notes and draft chapter for model selection of supervised and semi-supervised models.

Full Changelog: https://github.com/KarelZe/thesis/compare/23-02...23-03

thesis - Changes between January, 2nd and January, 8th

What's Changed

Empirical Study ⚗️

  • Create sklearn-compatible estimators 🦜 by @KarelZe in https://github.com/KarelZe/thesis/pull/93. Having a common sklearn-like interface is necessary for further aspects of training and evaluation, like calculating SHAP values, creating learning curves, or simplifying hyperparameter tuning.
  • Interpretability with SHAP and attention maps 🐇 by @KarelZe in https://github.com/KarelZe/thesis/pull/85. Kernel SHAP values can now be calculated for all models (classical + ml-based). This was marketed as one of the contributions of my paper. Need to research how to handle high correlation between features in kernel SHAP. Attention maps can be calculated for transformer-based models.
  • Add sample weighting to TransformerClassifier 🏋️ by @KarelZe in https://github.com/KarelZe/thesis/pull/100. Weight samples in the training set similarly to how it's done in CatBoost, so that more recent observations become more important.
  • Early stopping based on accuracy for TransformerClassifier🧁 by @KarelZe in https://github.com/KarelZe/thesis/pull/102. Perform early stopping based on validation accuracy instead of log loss. Thus, early stopping is now implemented consistently for neural networks and gradient boosting.
  • Improve robustness and tests of TabDataset 🚀 by @KarelZe in https://github.com/KarelZe/thesis/pull/101
  • Add instructions on using SLURM 🐧 by @KarelZe in https://github.com/KarelZe/thesis/pull/103. SLURM enables us to run a script on multiple nodes of the bwHPC cluster for extended periods, which is required for the final training.
  • Finalize exploratory data analysis 🚏 by @KarelZe in https://github.com/KarelZe/thesis/pull/105.
  • Finalize feature engineering🪄 by @KarelZe in https://github.com/KarelZe/thesis/pull/104

Writing 📖

  • Pre-write feature engineering chapter 🪛 by @KarelZe in https://github.com/KarelZe/thesis/pull/88
  • Write chapter on attention maps (finished) and gbm (contd) 🧭 by @KarelZe in https://github.com/KarelZe/thesis/pull/99. I noticed while implementing attention maps (see #85) that the common practice for calculating attention maps in the tabular domain is myopic. I researched approaches for transformers from other domains, e.g., machine translation, and documented my findings in this chapter. The chosen approaches take into account all attention layers and can handle attention heads with varying importance.
  • Questions for bi-weekly meeting❓ by @KarelZe in https://github.com/KarelZe/thesis/pull/106

Other Changes

  • Bump seaborn from 0.12.1 to 0.12.2 by @dependabot in https://github.com/KarelZe/thesis/pull/98
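The sklearn-compatible wrappers mentioned above boil down to a small contract: hyperparameters stored untouched in __init__, fitted state marked with a trailing underscore, and fit() returning self. A minimal toy sketch of that contract, not the thesis estimators:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_is_fitted

class MajorityClassifier(BaseEstimator, ClassifierMixin):
    """Toy estimator following the sklearn interface conventions."""

    def __init__(self, fallback: int = 1):
        self.fallback = fallback  # hyperparameter, stored as-is

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        classes, counts = np.unique(y, return_counts=True)
        self.classes_ = classes  # trailing underscore marks fitted state
        self.majority_ = classes[np.argmax(counts)] if len(classes) else self.fallback
        return self  # fit() must return self so pipelines can chain calls

    def predict(self, X):
        check_is_fitted(self)  # raises if fit() was never called
        return np.full(len(X), self.majority_)

clf = MajorityClassifier().fit([[0], [1], [2]], [1, 1, -1])
print(clf.predict([[5], [6]]))  # [1 1]
```

Once an estimator honors this contract, SHAP explainers, learning-curve helpers, and hyperparameter searches can treat classical rules, gradient boosting, and transformers interchangeably.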

Outlook 🧪

  • Start into the Transformer week 🎉. I will spend next week and the week after improving the Transformer-based models. I want to dive into learning rate scheduling, learning rate warm-up, etc. Will also pre-write the chapters on FTTransformer, TabTransformer, the classical Transformer, and self-attention.

Full Changelog: https://github.com/KarelZe/thesis/compare/23-01...23-02

thesis - Changes between December, 26th and January, 1st

What's Changed

Christmas break🎄

Other Changes

  • Bump pydantic from 1.10.2 to 1.10.4 by @dependabot in https://github.com/KarelZe/thesis/pull/95

Full Changelog: https://github.com/KarelZe/thesis/compare/v0.2.7...cw-01

thesis - Changes between December, 19th and December, 25th

Took some time off for Christmas. 🎄

What's Changed

Empirical Study ⚗️

  • Add accuracy for rev lr on test set 🆕 by @KarelZe in https://github.com/KarelZe/thesis/pull/89
  • Add compliance to pdf/A-2B 👒 by @KarelZe in https://github.com/KarelZe/thesis/pull/90
  • Removal of TabNet ❎ by @KarelZe in https://github.com/KarelZe/thesis/pull/91
  • Improve gpu utilization 🚂 by @KarelZe in https://github.com/KarelZe/thesis/pull/87

Writing 📖

  • Pre-writing of feature engineering and questions🪛 by @KarelZe in https://github.com/KarelZe/thesis/pull/86.

Outlook 🧪

  • Will take some time off until New Year's Eve. 🧘‍♂️
  • Continue pre-writing of supervised approaches, i.e., transformers, ordered boosting, and #88.
  • Finalize feature sets. Make sure to add economic intuition and remove redundancies and transformations. Include feedback from discussions with @CaroGrau and @pheusel.
  • Continue work on #85, which is necessary to get insights on feature definitions. The branch will include attention activation and SHAP.
  • Continue work on #93. Having a common, sklearn-like interface is necessary for further aspects of training and evaluation, like calculating SHAP values, creating learning curves, or simplifying hyperparameter tuning.
  • Try out SLURM to train models overnight or for longer than 4 hours; jobs can run for up to 48 hours.

Full Changelog: https://github.com/KarelZe/thesis/compare/v0.2.6...v0.2.7

thesis - Changes between December, 12th and December, 18th

What's Changed

Empirical Study ⚗️

  • Add clnv results 🎯 by @KarelZe in https://github.com/KarelZe/thesis/pull/82. Add results for the CLNV method as discussed in the meeting with @CaroGrau.
  • Add learning curves for CatBoost 🐈 by @KarelZe in https://github.com/KarelZe/thesis/pull/83. Helps to detect overfitting/underfitting. Learning curves are now also logged/tracked.
  • Improve accuracy [~1.2 %] by @KarelZe in https://github.com/KarelZe/thesis/pull/79. Most of the time was spent on improving the first model's accuracy (gbm). I had planned to improve by 4 % and achieved an improvement of 1.2 % compared to the previous week. Obtaining this improvement required a deep dive into gradient boosting, the catboost library, and a bit of feature engineering. Roughly one third of the improvement in accuracy comes from improved feature engineering, one third from early stopping, and one third from larger ensembles, finer-grained quantization, and sample weighting. I tried to link the quantization found in gradient boosting with the quantile transformation from feature engineering, but it didn't work out. I did some sanity checks, like comparing the implementation with lightgbm, a time-consistency analysis, and an updated adversarial validation.
  • Also, I spent quite a bit of time researching feature engineering techniques, focusing on features that cannot be synthesized by neural nets or tree-based approaches.

Writing 📖

  • Add reworked TOC and drafts 🎆 by @KarelZe in https://github.com/KarelZe/thesis/pull/80 as requested by @CaroGrau.
  • Draft for chapters trees, ordered boosting, and imputation🌮 by @KarelZe in https://github.com/KarelZe/thesis/pull/81. Continued research and drafting chapters on decision trees, gradient boosting, and feature scaling and imputation. This requires more work; e.g., the derivation of the loss function in gradient boosting for classification was more involved than I expected. The draft is not as streamlined as it could be.
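The learning curves used to spot over-/underfitting can be sketched with sklearn's GradientBoostingClassifier standing in for CatBoost; the data, model, and metric choice here are synthetic assumptions, not the thesis setup:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic, nearly separable toy data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# staged_predict yields predictions after each boosting stage,
# giving one accuracy value per iteration -> a learning curve.
train_curve = [accuracy_score(y_tr, p) for p in model.staged_predict(X_tr)]
val_curve = [accuracy_score(y_val, p) for p in model.staged_predict(X_val)]

# A widening gap between the curves signals overfitting; a plateau on the
# validation curve suggests where early stopping would halt training.
print(len(val_curve))  # 50
```

CatBoost exposes the same information through its eval_set/metrics logging; the sklearn stand-in just makes the per-iteration idea explicit.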

Outlook 🎆

  • Focus on drafting chapters only on gradient boosting, basic transformer architectures and specialized architectures.
  • Train transformers until meeting with @CaroGrau, but spend no time optimizing/improving them.

Full Changelog: https://github.com/KarelZe/thesis/compare/v0.2.5...v0.2.6

thesis - Changes between December, 5th and December, 11th

What's Changed

Empirical Study ⚗️

  • Add implementation and tests for FTTransformer 🦾 by @KarelZe in https://github.com/KarelZe/thesis/pull/74. Adds a tuneable implementation of the FTTransformer from https://arxiv.org/abs/2106.11959. Most of the code is based on the author's code published by Yandex. Wrote additional tests and made the code work with our hyperparameter search.
  • Add implementation and tests for TabNet 🧠 by @KarelZe in https://github.com/KarelZe/thesis/pull/75. TabNet is another transformer-based architecture published in https://arxiv.org/abs/1908.07442 and the last model to be implemented. 🎉 Code is based on a popular PyTorch implementation. Made it work with our hyperparameter search and training pipeline and wrote additional tests.
  • Add tests for all objectives 🎯 by @KarelZe in https://github.com/KarelZe/thesis/pull/76. All training objectives defining the hyperparameter search space and training procedure now have tests.
  • Add intermediate results of TabTransformer and CatBoostClassifier 🐈 by @KarelZe in https://github.com/KarelZe/thesis/pull/71. Results as discussed in the last meeting with @CaroGrau.
  • Accelerate models with datapipes and torch.compile() 🚕 by @KarelZe in https://github.com/KarelZe/thesis/pull/64. Tested how the new features (datapipes and torch.compile()) could be used in my project. Still too early, as discussed in the meeting with @CaroGrau.
  • Make calculations data parallel 🛣️ by @KarelZe in https://github.com/KarelZe/thesis/pull/77. All models can now be trained on multiple GPUs in parallel, which should speed up training considerably. bwHPC provides up to four GPUs that we can use. For gradient boosting, features are split among devices; for neural nets, batches are split.
  • Add pruning support for Bayesian search 🧪 by @KarelZe in https://github.com/KarelZe/thesis/pull/78. I added support to prune unsuccessful trials in our Bayesian search. This should help with training and finding better solutions faster. In addition to the loss, the accuracy is also reported for all neural nets. Moreover, I integrated early stopping into the gradient boosting models, which should help to increase the performance. I also widened the hyperparameter search space for gradient-boosted trees, which should help to find better solutions. Still have to verify with large studies on the cluster.
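The idea behind pruning unpromising trials can be sketched library-free. This mirrors the median-pruning heuristic (as in optuna's MedianPruner) rather than the exact setup used in the search:

```python
from statistics import median

def should_prune(intermediate_value: float, step: int,
                 completed_histories: list[list[float]]) -> bool:
    """Prune if the trial's metric at `step` is worse than the median value
    that completed trials reported at the same step (higher is better)."""
    peers = [h[step] for h in completed_histories if len(h) > step]
    if not peers:  # nothing to compare against yet -> keep running
        return False
    return intermediate_value < median(peers)

# Validation accuracy per epoch for three finished trials (made-up numbers).
done = [[0.52, 0.60, 0.64], [0.55, 0.63, 0.66], [0.50, 0.58, 0.61]]

print(should_prune(0.62, 1, done))  # False: above the median at epoch 1
print(should_prune(0.53, 1, done))  # True: clearly below the median
```

In practice, the training loop reports an intermediate metric after each epoch and aborts the trial as soon as the pruner says so, freeing the GPU for more promising configurations.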

Writing 📖

  • Add questions for this week 📍 by @KarelZe in https://github.com/KarelZe/thesis/pull/70
  • Connect and expand notes 👩‍🚀 by @KarelZe in https://github.com/KarelZe/thesis/pull/65. Was able to slightly decrease the pile of papers. However, I also found several new ones, like the Linformer paper (https://arxiv.org/abs/2006.04768).

Other Changes

  • Bump google-auth from 2.14.1 to 2.15.0 by @dependabot in https://github.com/KarelZe/thesis/pull/66
  • Bump fastparquet from 2022.11.0 to 2022.12.0 by @dependabot in https://github.com/KarelZe/thesis/pull/69

Outlook 💪

  • Finalize notes on decision trees / gradient boosting. Prepare the first draft.
  • Update table of contents.
  • Go back to eda. Define new features based on papers. Revise existing ones based on kde plots.
  • Create a notebook to study feature transformations / scaling e. g., log transform / robust scaling, systematically.
  • Study learning curves for gradient boosting models and transformers with default configurations. Verify the settings for early stopping.
  • Perform adversarial validation more thoroughly. Answer questions like: which features drive the difference between training and test set? What aspect does time play? What would happen, if problematic features were excluded?
  • Increase test accuracy by 4 %.

Full Changelog: https://github.com/KarelZe/thesis/compare/v0.2.4...v0.2.5

thesis - Changes between November, 28th and December, 4th

What's Changed

Empirical Study ⚗️

  • Exploratory data analysis 🍁 by @KarelZe in https://github.com/KarelZe/thesis/pull/40, https://github.com/KarelZe/thesis/pull/50, and https://github.com/KarelZe/thesis/pull/51. Refactored the EDA to the training set only, included new features, and added new visualizations. Performed CV to study the results of a gradient-boosted model on different feature sets. Overall, the results look promising, as the untuned gradient-boosted tree outperforms the respective classical counterparts on all three subsets. Performance is relatively stable across folds and the validation and test set. Log transform and imputation make no difference for gradient-boosted trees, as expected. Problems like the high cardinality of some categorical values still need to be addressed. I also feel like more features or other transformations could help. Will do more research in the classical literature on this.
  • Add model tracking and add saving of optuna.Study to wandb through callback💫 by @KarelZe in https://github.com/KarelZe/thesis/pull/55 and https://github.com/KarelZe/thesis/pull/63. All major data (data sets, models, studies, and some diagrams) is now tracked in wandb and saved to GCS. The use of callbacks makes the implementation of other parts, like learning rate schedulers or stochastic weight averaging, much easier.
  • I experimented with further accelerating the TabTransformer through PyTorch 2.0 and datapipes in https://github.com/KarelZe/thesis/pull/64 (WIP) by @KarelZe. This is still in progress, as I wasn't able to compile the model yet. I got a vague idea of possible solutions, e.g., a stripped-down implementation or an upgrade of CUDA. Also, waiting could help, as PyTorch 2.0 is in early beta and was only announced at the weekend. I didn't look into datapipes yet, which could help close the gap in serving the GPU enough data. Also did some research on how high-cardinality features like ROOT can be handled in the model to avoid an explosion in parameters. The latter is necessary to train the model with reasonable performance.
  • Increased test coverage to 83 %, or 45 tests in total. Writing these tests helped me discover some minor bugs, e.g., in the depth rule, that I had previously overlooked. Tests were added for:
    • Add tests for logic of ClassicalClassifier 🚑 by @KarelZe in https://github.com/KarelZe/thesis/pull/45
    • Add tests for TabDataSet ⛑️ by @KarelZe in https://github.com/KarelZe/thesis/pull/46
    • Add tests for TabDataLoader by @KarelZe in https://github.com/KarelZe/thesis/pull/47
    • Add mixin and new tests for neural nets 🫗 by @KarelZe in https://github.com/KarelZe/thesis/pull/48
    • Add parameterized tests and migrate to PyTest🍭 by @KarelZe in https://github.com/KarelZe/thesis/pull/49
  • Add new features to train, validation and test set 🏖️ by @KarelZe in https://github.com/KarelZe/thesis/pull/51
  • Refactor redundant code ClassicalClassifier 🧫 by @KarelZe in https://github.com/KarelZe/thesis/pull/52 and https://github.com/KarelZe/thesis/pull/53.
  • Remove RDB support 🦣 by @KarelZe in https://github.com/KarelZe/thesis/pull/54
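The callback hook used to track studies (and later to plug in learning rate schedulers or stochastic weight averaging) follows a simple pattern. A plain-Python sketch with illustrative names, not the actual wandb/optuna integration:

```python
from dataclasses import dataclass, field

@dataclass
class Study:
    """Minimal stand-in for an optuna.Study: runs trials, fires callbacks."""
    best_value: float = float("-inf")
    callbacks: list = field(default_factory=list)

    def optimize(self, objective, n_trials: int):
        for trial in range(n_trials):
            value = objective(trial)
            self.best_value = max(self.best_value, value)
            for cb in self.callbacks:  # hook point: tracking, checkpoints, ...
                cb(self, trial, value)

log = []

def tracking_callback(study, trial, value):
    # Stand-in for logging the study artifact to wandb/GCS after each trial.
    log.append((trial, value, study.best_value))

study = Study(callbacks=[tracking_callback])
study.optimize(lambda t: t / 2, n_trials=3)
print(log[-1])  # (2, 1.0, 1.0)
```

Because the hook fires after every trial, no tracking code leaks into the objective itself, which is what makes adding further callbacks cheap.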

Writing

  • Added new notes and connected existing ideas. I was able to reduce the pile of unread papers to 30. https://github.com/KarelZe/thesis/pull/65 (WIP) by @KarelZe.

Other Changes

  • Add dependabot.yml for dependency bot 🦾 by @KarelZe in https://github.com/KarelZe/thesis/pull/56
  • Bump schneegans/dynamic-badges-action from 1.4.0 to 1.6.0 by @dependabot in https://github.com/KarelZe/thesis/pull/57
  • Bump typer from 0.6.1 to 0.7.0 by @dependabot in https://github.com/KarelZe/thesis/pull/62
  • Bump fastparquet from 0.8.3 to 2022.11.0 by @dependabot in https://github.com/KarelZe/thesis/pull/60

Outlook:

  • Pre-write first chapter of thesis
  • Set up plan for writing
  • Read 10 papers and add to zettelkasten
  • Turn eda notebook into scripts for feature generation and document with tests
  • Train TabTransformer and gradient boosted model until meeting with @CaroGrau
  • Further improve training performance of TabTransformer. Try out datapipes and adjust the implementation to be closer to the paper. Decrease cardinality through NLP techniques

New Contributors

  • @dependabot made their first contribution in https://github.com/KarelZe/thesis/pull/57

Full Changelog: https://github.com/KarelZe/thesis/compare/v0.2.3...v0.2.4

thesis - Changes between November, 21st and November, 27th

What's Changed

Empirical Study ⚗️

  • Add TabTransformer baseline 🤖 by @KarelZe in https://github.com/KarelZe/thesis/pull/34. Involved implementation and documentation of the model, early stopping, the data set, and the data loader. Most notably, I was able to speed up the implementation of https://github.com/kathrinse/TabSurvey/ by a factor of 9.8 (see notebook) through an improved data loader, decoupling of training and data loading, and mixed-precision support. Also tested were fused operations, pre-loading, and the use of pinned memory. An analysis with the PyTorch profiler reveals that the GPU is now less idle. Training on the entire data set is theoretically possible.
  • Fix classical rules🐞 by @KarelZe in https://github.com/KarelZe/thesis/pull/41. The issue came up during last week's discussion with @CaroGrau. The differences in accuracy are tiny. Usually < 1 %.
  • Add test cases for classical classifier ⛑️ by @KarelZe in https://github.com/KarelZe/thesis/pull/42. Tests are formal, e.g., correct shapes of predictions or fitting behaviour.
  • Add implementation of CLNV method 🏖️ by @KarelZe in https://github.com/KarelZe/thesis/pull/43
  • Add tests for TabTransformer ⛑️ by @KarelZe in https://github.com/KarelZe/thesis/pull/44. Tests for shapes of predictions, parameter updates, and convergence.
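Much of the data loader speedup above comes from slicing whole batches out of pre-materialized arrays instead of fetching rows one by one and collating them. A minimal numpy sketch of that idea (class name and details are illustrative, not the thesis code):

```python
import numpy as np

class TabDataLoader:
    """Yield contiguous batch slices from in-memory arrays. Slicing a whole
    batch at once avoids the per-row __getitem__ + collate overhead of a
    generic DataLoader, which is where much of the idle GPU time came from."""

    def __init__(self, X, y, batch_size, shuffle=False):
        self.X, self.y = X, y
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        X, y = self.X, self.y
        if self.shuffle:
            idx = np.random.default_rng(0).permutation(len(X))
            X, y = X[idx], y[idx]  # one gather up front, then cheap slices
        for start in range(0, len(X), self.batch_size):
            yield X[start:start + self.batch_size], y[start:start + self.batch_size]

X = np.arange(20, dtype=np.float32).reshape(10, 2)
y = np.arange(10)
batches = list(TabDataLoader(X, y, batch_size=4))
print([len(b[1]) for b in batches])  # [4, 4, 2]
```

Decoupling this slicing from the training step (plus pinned memory and mixed precision on the PyTorch side) is what keeps the GPU fed.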

Writing 📖

  • Add questions for this week's meeting ❓ by @KarelZe in https://github.com/KarelZe/thesis/pull/39
  • Researched techniques and new papers on speeding up transformers

Outlook 🔭

  • Read more again and minimize the stack of open papers (40+)
  • Better connect existing ideas in zettelkasten
  • Finish exploratory data analysis, i.e., include new features, refactor to training data only, and do CV to better understand features
  • Improve test coverage, i.e., data loader and classical rules

Full Changelog: https://github.com/KarelZe/thesis/compare/v0.2.2...v0.2.3

thesis - Changes between November, 14th and November, 20th

What's Changed

Empirical Study ⚗️

  • Add tuneable implementation of gbm and classical rules 🐈‍⬛ by @KarelZe in https://github.com/KarelZe/thesis/pull/35 and https://github.com/KarelZe/thesis/pull/27. Models can now be trained using parametrized scripts, i.e., python src/models/train_model.py --trials=5 --seed=42 --model=gbm --dataset=fbv/thesis/train_val_test_w_trade_size:v0 --features=ml. Data is loaded from versioned artefacts. Interrupted studies can now be continued at a later point in time. Added some tests for the search. Bayesian search is implemented for gradient-boosted trees and the TabTransformer, but also for the classical rules. Thereby, I was able to find combinations of classical rules not previously reported in the @CaroGrau paper.
  • Added TabTransformer implementation by @KarelZe in https://github.com/KarelZe/thesis/pull/34 (WIP). The current implementation performs binary classification and can be tuned using Bayesian search. Issues to resolve: improve utilization of the accelerator, speed up training, and improve code quality. I plan to address these with PyTorch profiling, the CUDA profiler, a custom PyTorch Dataset, and a bit of luck 🎲.
  • Add basic docker support 🐳 by @KarelZe in https://github.com/KarelZe/thesis/pull/28. Docker image now available on docker hub.
  • Add compliance to pre-commit hooks 🪝 by @KarelZe in https://github.com/KarelZe/thesis/pull/33. Pre-commit hooks help to avoid potential bugs. Code in main is now fully documented and annotated with type hints.
  • Simplified project and test setup 🧯 by @KarelZe in https://github.com/KarelZe/thesis/pull/38. This greatly improves reproducibility and ease of development.
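The parametrized invocation above maps to a small CLI. A stdlib-argparse sketch of such an interface; defaults and the model choices are assumptions, and the actual script may use a different CLI library:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring the train_model.py invocation shown above;
    defaults and choices here are illustrative, not the thesis config."""
    parser = argparse.ArgumentParser(description="Train a model on a versioned dataset.")
    parser.add_argument("--trials", type=int, default=100, help="number of search trials")
    parser.add_argument("--seed", type=int, default=42, help="random seed")
    parser.add_argument("--model", choices=["gbm", "tabtransformer", "classical"],
                        required=True, help="model family to tune")
    parser.add_argument("--dataset", required=True,
                        help="versioned wandb artifact, e.g. fbv/thesis/...:v0")
    parser.add_argument("--features", default="ml", help="feature set to train on")
    return parser

args = build_parser().parse_args(
    ["--trials", "5", "--seed", "42", "--model", "gbm",
     "--dataset", "fbv/thesis/train_val_test_w_trade_size:v0", "--features", "ml"]
)
print(args.model, args.trials)  # gbm 5
```

Keeping every run fully described by its flags is what makes interrupted studies resumable and results reproducible from the versioned artefacts.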

Writing 📖

  • Add notes to zettelkasten 🗃️ by @KarelZe in https://github.com/KarelZe/thesis/pull/29
  • Add proposal for feature sets 🧃 by @KarelZe in https://github.com/KarelZe/thesis/pull/31
  • Simplified and extended readme.md 🎍 by @KarelZe in https://github.com/KarelZe/thesis/pull/36
  • Finalized expose with numbers 🥳 by @KarelZe in https://github.com/KarelZe/thesis/pull/37

Outlook 🔭

  • Continue with exploratory data analysis and start with explanatory data analysis
  • Analyze low resource utilization and slow training of TabTransformer
  • Read more again and minimize the stack of open papers (30+)
  • Better connect existing ideas in zettelkasten
  • Improve test coverage

Full Changelog: https://github.com/KarelZe/thesis/compare/v0.2.1...v0.2.2

thesis - Changes between November, 7th and November, 13th

What's Changed

Empirical Study ⚗️

  • Added all classical rules from the @CaroGrau paper. 🎉 Any rule can now be stacked in an arbitrary order, e.g., predict_rules(layers=[(trade_size,"ex"), (quote,"best"), (quote, "ex")], name="Tradesize + Quote (NBBO) + Quote (ISE)"). Minor differences in accuracy still exist due to a different handling of missing values. By @KarelZe in https://github.com/KarelZe/thesis/pull/27 (WIP)
  • Started with an exploratory data analysis by @KarelZe in https://github.com/KarelZe/thesis/pull/25 (WIP)
  • Added proposal for feature set definition by @KarelZe in https://github.com/KarelZe/thesis/pull/29 (WIP)
  • Created a docker image to run code on bwUniCluster 2.0 and runpod by @KarelZe in https://github.com/KarelZe/thesis/pull/28 (WIP)
  • Did all the setup to connect to bwUniCluster 2.0

Writing 📖

  • Added new notes and updated questions 🗃️ by @KarelZe in https://github.com/KarelZe/thesis/pull/26 and https://github.com/KarelZe/thesis/pull/29 (WIP)
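The stacking behind predict_rules amounts to a chain of rules where each rule only classifies the trades its predecessors left undecided. A minimal sketch using textbook definitions of the quote and tick rules, not the thesis implementation:

```python
def quote_rule(price, bid, ask, prev_price):
    """Buy (1) above the quote midpoint, sell (-1) below, undecided (0) at it."""
    mid = (bid + ask) / 2
    if price > mid:
        return 1
    if price < mid:
        return -1
    return 0

def tick_rule(price, bid, ask, prev_price):
    """Buy on an uptick, sell on a downtick, undecided otherwise.
    (Quotes are ignored; the signature just matches the other rules.)"""
    if price > prev_price:
        return 1
    if price < prev_price:
        return -1
    return 0

def predict_stacked(trade, layers):
    """Apply rules in order; the first non-zero classification wins."""
    for rule in layers:
        label = rule(*trade)
        if label != 0:
            return label
    return 0  # every rule was undecided

# Trade at the midpoint: quote rule abstains, tick rule sees an uptick -> buy.
trade = (10.0, 9.5, 10.5, 9.8)  # price, bid, ask, previous price
print(predict_stacked(trade, [quote_rule, tick_rule]))  # 1
```

Hybrid algorithms like LR, EMO, or CLNV are just particular orderings of such layers, which is why arbitrary stacks can reproduce (and extend) them.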

Outlook for Upcoming Week 🔭

  • implement a transformer-based baseline
  • rework hyperparameter searches to be interruptable
  • continue with exploratory data analysis and start with explanatory data analysis
  • add more notes to zettelkasten
  • better connect existing ideas in zettelkasten
  • finish WIPs

Full Changelog: https://github.com/KarelZe/thesis/compare/v0.2.0...v0.2.1

thesis - Changes between October, 31st and November, 6th

What's Changed

Empirical Study ⚗️

  • Improved adversarial validation and memory-constrained CSV loading ⛑️ by @KarelZe in https://github.com/KarelZe/thesis/pull/17, https://github.com/KarelZe/thesis/pull/18 and https://github.com/KarelZe/thesis/pull/24
  • Implementation of (promising) GBM baseline, Bayesian search, some classical rules and robustness checks🧸 by @KarelZe in https://github.com/KarelZe/thesis/pull/19
  • Experimented with training models on runpods.io to mitigate severe performance issues with Google Colab
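Adversarial validation itself is compact: label each row by whether it comes from the training or test set and check how well a classifier can tell them apart. An AUC near 0.5 means the sets are hard to distinguish; an AUC near 1.0 signals drift. A sketch on synthetic data (model choice and the deliberate shift are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
X_train = rng.normal(0.0, 1.0, size=(400, 3))
X_test = rng.normal(0.5, 1.0, size=(400, 3))  # shifted -> detectable drift

# Label rows by origin and let a classifier try to tell them apart.
X = np.vstack([X_train, X_test])
y = np.concatenate([np.zeros(400), np.ones(400)])

proba = cross_val_predict(RandomForestClassifier(random_state=0), X, y,
                          cv=5, method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)  # ~0.5: indistinguishable; near 1.0: drift

print(round(auc, 2))
```

Feature importances of the origin classifier then reveal which features drive the train/test difference, which is exactly the follow-up question raised in a later outlook.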

Writing 📖

  • Restructured readme ☄️ by @KarelZe in https://github.com/KarelZe/thesis/pull/21 and https://github.com/KarelZe/thesis/pull/22
  • Added 15+ literature notes to zettelkasten 🗃️ by @KarelZe in https://github.com/KarelZe/thesis/pull/23 and https://github.com/KarelZe/thesis/pull/20
  • Researched additional 30+ papers to read in the next week

Outlook for Upcoming Week 🔭

  • start with explanatory data analysis
  • investigate differences in the accuracy of classical rules with regard to @CaroGrau paper
  • start implementing a transformer-based baseline
  • add more notes to zettelkasten
  • bundle training scripts in docker container

Full Changelog: https://github.com/KarelZe/thesis/compare/v0.1.9...v0.2.0

thesis - Changes between 27th October and 30th October

What's Changed

Empirical Study ⚗️

  • Set up Google Cloud Storage and Google Colab.
  • Loaded csv data into a pandas data frame, inferred dtypes, performed optimizations, and exported into .parquet chunks by @KarelZe in https://github.com/KarelZe/thesis/pull/12.
  • Added data set versioning using weights & biases.
  • Created subsamples, e.g., 2015, and the train, validation, and test sets.
  • Cleaned up requirements and fixed versions in requirements.txt.
  • Created tests/assertions against the @CaroGrau paper in 2.0-mb-data_preprocessing_loading_splitting.ipynb.
  • Ran adversarial validation in 2.0-mb-data_preprocessing_loading_splitting.ipynb.

Writing 📖

  • Added more notes to the Zettelkasten, e.g., 6f704ff9f6ee9d4333001c8e970cf1b091e568bb.

Other Changes

  • Added a script to auto-generate release notes 📯 by @KarelZe in https://github.com/KarelZe/thesis/pull/13.
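The dtype optimizations applied before the parquet export can be sketched with pandas downcasting; the helper name, the 0.5 cardinality threshold, and the columns below are illustrative assumptions:

```python
import pandas as pd

def optimize_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Shrink a freshly loaded frame: downcast numeric columns and turn
    low-cardinality string columns into categoricals."""
    out = df.copy()
    for col in out.select_dtypes(include="float").columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    for col in out.select_dtypes(include="integer").columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include="object").columns:
        if out[col].nunique() / len(out) < 0.5:  # heuristic threshold
            out[col] = out[col].astype("category")
    return out

df = pd.DataFrame({"price": [10.0, 10.5] * 500,
                   "size": [100, 200] * 500,
                   "exchange": ["ISE", "CBOE"] * 500})
small = optimize_dtypes(df)
print(df.memory_usage(deep=True).sum() > small.memory_usage(deep=True).sum())  # True
```

On a multi-gigabyte trade data set, shrinking dtypes like this is what makes chunked loading and the subsequent parquet export fit into memory.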

Full Changelog: https://github.com/KarelZe/thesis/compare/v.0.1.6...v0.1.9
