diff --git a/references/obsidian/chapters/environment.md b/references/obsidian/chapters/environment.md
index 6357519b..08976435 100644
--- a/references/obsidian/chapters/environment.md
+++ b/references/obsidian/chapters/environment.md
@@ -34,7 +34,6 @@ dependencies = [
 "wandb==0.13.5",
 ]
 ```
-
 - For presentation of chapter see: [[@prokhorenkovaCatBoostUnbiasedBoosting2018]]
 - source code of experiments and paper is available at https://github.com/KarelZe/thesis/
 - Get some inspiration from https://madewithml.com/#mlops
\ No newline at end of file
diff --git a/references/obsidian/chapters/feature_engineering.md b/references/obsidian/chapters/feature_engineering.md
index c54f86de..1e8ab827 100644
--- a/references/obsidian/chapters/feature_engineering.md
+++ b/references/obsidian/chapters/feature_engineering.md
@@ -72,7 +72,6 @@
 - In my dataset the previous or subsequent trade price is already added as feature and thus does not have to be searched recursively.
 - Motivation for scaling features to $[-1,1]$ range or zero mean. https://stats.stackexchange.com/questions/249378/is-scaling-data-0-1-necessary-when-batch-normalization-is-used
 - If needed tokenization support: https://github.com/google/sentencepiece
-
 - Visualize behaviour over time e. g., appearing `ROOT`s and calculate statistics. How many of the clients / percentage are in the train set and how many are just in the test set?
 ![[uuid_over_time.png]] (found at https://www.kaggle.com/competitions/ieee-fraud-detection/discussion/111284)
diff --git a/reports/Content/main.tex b/reports/Content/main.tex
index 3ce57228..faa47ee6 100644
--- a/reports/Content/main.tex
+++ b/reports/Content/main.tex
@@ -376,10 +376,11 @@ \subsubsection{ISE Data Set (0.5~p)}\label{sec:ise-data-set}
 \subsubsection{CBOE Data Set (0.5~p)}\label{sec:cboe-data-set}
-\subsubsection{Generation of True
- Labels (0.5~p)}\label{sec:generation-of-true-labels}
+\subsubsection{Exploratory Data Analysis (2~p)}\label{sec:exploratory-data-analysis}
-\subsubsection{Feature Engineering (4~p)}\label{sec:feature-engineering}
+\subsubsection{Data Pre-Processing (1~p)}\label{sec:data-preprocessing}
+
+\subsubsection{Feature Engineering (1.5~p)}\label{sec:feature-engineering}
 \subsubsection{Train-Test Split (0.5~p)}\label{sec:train-test-split}