Daily Note - 07-05-2024

Kaggle Environment
pip
PCA
pickle
python
latex
pandas
Author: JM Ascacibar

Published: May 7, 2024

Using Path, pip install in Kaggle, PCA resources, saving/loading data with pickle files, removing spaces in strings, \mathrm in LaTeX, pandas warnings


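1. Using Path from pathlib

I was trying to set up an if statement that lets the notebook detect whether it is running on Kaggle or locally, by checking for the KAGGLE_KERNEL_RUN_TYPE environment variable.

My mistake was joining the path variable and the filename with the / operator while the path was a plain string. Python does not define / for two strings, so this raises a TypeError.

I fixed it by wrapping each directory in a Path object from the pathlib module. Path overloads /, so path/"train.csv" builds the full file path.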

import os
from pathlib import Path
import pandas as pd
# Kaggle defines KAGGLE_KERNEL_RUN_TYPE inside its notebooks, so its presence tells us where we are
if 'KAGGLE_KERNEL_RUN_TYPE' in os.environ:
    print('Running in Kaggle')
    path = Path("/kaggle/input/playground-series-s4e5")
    org_path = Path("/kaggle/input/flood-prediction-factors")
else:
    print("Running Locally")
    path = Path(".")
    org_path = Path(".")
# Data: Path / "file" joins the directory and the file name
train = pd.read_csv(path/"train.csv", index_col='id')
test = pd.read_csv(path/"test.csv", index_col='id')
data = pd.concat([train, test], axis=0)
original = pd.read_csv(org_path/"flood.csv")
tr_ext = pd.concat([train, original], axis=0)
# Target variable
TARGET = "FloodProbability"
# Initial Features
init_feat = list(test.columns)
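The same environment check can be wrapped in a small helper that returns a Path. A minimal sketch: data_dir is a hypothetical name, and it relies on the same assumption as the code above, that KAGGLE_KERNEL_RUN_TYPE only exists in the environment inside Kaggle. The last lines reproduce the original mistake, since / works on Path objects but raises a TypeError on plain strings.

```python
import os
from pathlib import Path


def data_dir(kaggle_dir: str, local_dir: str = ".") -> Path:
    """Hypothetical helper: kaggle_dir inside a Kaggle kernel, otherwise local_dir, as a Path."""
    if "KAGGLE_KERNEL_RUN_TYPE" in os.environ:
        return Path(kaggle_dir)
    return Path(local_dir)


path = data_dir("/kaggle/input/playground-series-s4e5")
print(path / "train.csv")  # Path / str joins the two components

# With a plain string the same expression fails, which was the original error
base = "/kaggle/input/playground-series-s4e5"
try:
    base / "train.csv"
except TypeError as exc:
    print("TypeError:", exc)
```

Returning a Path (rather than a string) from the helper means every later read_csv call can keep using the / operator unchanged.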

2. pip in Kaggle

Sometimes you need a package that is not preinstalled in the Kaggle environment. You can install it from a notebook cell with !pip install <package>. To update a package that is already installed, use !pip install --upgrade <package>. Adding the -q flag makes the installation quiet, so pip suppresses most of its output.
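The ! prefix is an IPython shell escape, so !pip only works inside a notebook. A minimal sketch of a plain-Python equivalent that runs pip as python -m pip with sys.executable, so the package is installed into the same interpreter that runs the code; pip_command and pip_install are hypothetical helper names, and the flags are the same --upgrade and -q described above.

```python
import subprocess
import sys


def pip_command(package: str, upgrade: bool = False, quiet: bool = True) -> list[str]:
    """Build the argument list for `python -m pip install` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")  # update an already installed package
    if quiet:
        cmd.append("-q")  # reduce pip's output
    cmd.append(package)
    return cmd


def pip_install(package: str, upgrade: bool = False, quiet: bool = True) -> None:
    """Run pip and raise CalledProcessError if the installation fails."""
    subprocess.check_call(pip_command(package, upgrade=upgrade, quiet=quiet))
```

For example, pip_install("some-package", upgrade=True) would behave like !pip install -q --upgrade some-package in a notebook cell (the package name is a placeholder).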

3. Save and load data with pickle files

After importing the pickle module, we define one filename per dictionary. Each dictionary gets its own with statement: the file is opened in binary write mode ('wb') and pickle.dump() serializes the dictionary into it. To read the data back, we open each file in binary read mode ('rb') and pickle.load() deserializes the dictionary. Printing the saved filenames, and then inspecting the reloaded dictionaries, is a quick check that the data was saved and loaded correctly.

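import pickle
# oof and tst_pred are the prediction dictionaries built earlier in the notebook
oof_filename = 'oof_pred.pkl'
tst_pred_filename = 'tst_pred.pkl'
# Save: open each file in binary write mode and serialize the dictionary
with open(oof_filename, 'wb') as oof_file:
    pickle.dump(oof, oof_file)
with open(tst_pred_filename, 'wb') as tst_pred_file:
    pickle.dump(tst_pred, tst_pred_file)
print("OOF predictions saved to:", oof_filename)
print("Test predictions saved to:", tst_pred_filename)
# Load: open each file in binary read mode and deserialize the dictionary
with open(oof_filename, 'rb') as oof_file:
    oof = pickle.load(oof_file)
with open(tst_pred_filename, 'rb') as tst_pred_file:
    tst_pred = pickle.load(tst_pred_file)
print("Loaded OOF keys:", list(oof.keys()))
print("Loaded test prediction keys:", list(tst_pred.keys()))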
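The save/load pattern above can be factored into two small helpers so that each object takes one line to store or restore. A minimal sketch using only the standard library; save_pickle and load_pickle are hypothetical names, and the dictionaries are made-up stand-ins for the real out-of-fold and test predictions.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


def save_pickle(obj: Any, filename: Path) -> None:
    """Serialize obj into filename (binary write mode)."""
    with open(filename, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(filename: Path) -> Any:
    """Deserialize and return the object stored in filename (binary read mode)."""
    with open(filename, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-ins for the OOF and test prediction dictionaries
oof = {"lgbm": [0.51, 0.47], "xgb": [0.50, 0.48]}
tst_pred = {"lgbm": [0.49], "xgb": [0.52]}

# Write to a throwaway directory and check that each round trip preserves the data
with tempfile.TemporaryDirectory() as tmp:
    for name, obj in {"oof_pred.pkl": oof, "tst_pred.pkl": tst_pred}.items():
        target = Path(tmp) / name
        save_pickle(obj, target)
        print(name, "round trip OK:", load_pickle(target) == obj)
```

Keeping one file per object, as in the notes, lets each one be reloaded independently. pickle.load() can execute arbitrary code while deserializing, so only load pickle files you created yourself or otherwise trust.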

4. Remove spaces from a feature name

Python's str.strip() removes leading and trailing whitespace from a string, and pandas applies it to every column name through original.columns.str.strip(). In this case the column name had a single trailing space, so stripping removes it.

original.columns = original.columns.str.strip()
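Because .str.strip() applies str.strip() to each name, its behaviour can be checked on a plain list. A small sketch with hypothetical column names (not the real flood.csv headers) showing that only whitespace at the ends is removed, while inner spaces survive:

```python
# Hypothetical column names: one clean, one with a trailing space, one padded on both sides
raw = ["MonsoonIntensity", "FloodProbability ", " Land Slides "]

# str.strip() trims whitespace at both ends but leaves inner spaces alone
stripped = [name.strip() for name in raw]
print(stripped)  # ['MonsoonIntensity', 'FloodProbability', 'Land Slides']

# Removing every space, including inner ones, needs str.replace() instead
no_spaces = [name.replace(" ", "") for name in raw]
print(no_spaces)  # ['MonsoonIntensity', 'FloodProbability', 'LandSlides']
```

The pandas counterpart of the second approach would be original.columns.str.replace(" ", "").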

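5. Using \mathrm in LaTeX

In LaTeX math mode, letters are set in italic by default and treated as separate variables, so RMSE typed directly is typeset like the product of four variables R, M, S and E. Wrapping the name in \mathrm sets it in roman (upright) font, which is the usual way to write a multi-letter name such as RMSE:

\[\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}\]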
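Each symbol in the RMSE formula maps onto one line of code: n observations, true values y_i and predictions y_hat_i. A minimal pure-Python sketch of the formula; rmse is a hypothetical helper written for illustration.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the mean of the squared residuals."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # sum over i = 1..n of (y_i - y_hat_i)^2
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    return math.sqrt(squared / n)


print(rmse([3.0, 4.0], [0.0, 0.0]))  # sqrt((9 + 16) / 2) = sqrt(12.5), about 3.536
```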

6. Pandas warnings

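pandas emits FutureWarning messages when code relies on behaviour that is deprecated or will change in a future release. You can suppress them by adding the following code at the top of your notebook.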

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
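filterwarnings at the top of the notebook silences FutureWarning for the whole session. If you only want to hide it around one noisy call, warnings.catch_warnings() restores the previous filters when the with block exits. A minimal sketch; noisy_operation is a hypothetical stand-in for a pandas call that emits a FutureWarning, and run_quietly is a hypothetical helper name.

```python
import warnings


def noisy_operation() -> int:
    """Hypothetical stand-in for a pandas call that emits a FutureWarning."""
    warnings.warn("this behaviour will change in a future version", FutureWarning)
    return 42


def run_quietly(func, *args, **kwargs):
    """Call func with FutureWarning ignored only for the duration of the call."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        return func(*args, **kwargs)


result = run_quietly(noisy_operation)  # the FutureWarning is not shown here
noisy_operation()  # outside the with block the normal warning filters apply again
```

The scoped version keeps other FutureWarning messages visible, which can be useful when upgrading pandas.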

Findings and resources:
