2025 USA-NA-AIO Round 1, Problem 3, Part 1

USAAIO · March 28, 2025, 5:31am

Problem 3 (100 points)

Before starting this problem, make sure to run the following code first without any change:

# DO NOT CHANGE

import numpy as np
import pandas as pd
import copy
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

np.random.seed(2025)

""" END OF THIS PART """

WARNING!!!

Beyond the libraries, modules, classes, and functions imported in the preceding cell, you are NOT allowed to import anything else, whether:
- As a part of your final solution. For instance, if a problem asks you to build a model without using sklearn but you use it, you will not earn points.
- Temporarily, to help you reach a solution. For instance, if a problem asks you to manually compute eigenvalues but you temporarily use np.linalg.eig to get an answer and then delete your code, you violate the rule.
Rule of thumb: Each part is designed to test a specific skill. Do not look for a shortcut that circumvents the rule.
All coding tasks shall run on CPUs, not GPUs.

Part 1 (5 points, coding task)

We study the dataset USAAIO_2025_round1_prob3_train.csv provided in this contest.

The dataset can be found here:

url = "https://drive.google.com/file/d/125YsFPS2nCNRvYyy1tgnD8RhYIUglLX9/view?usp=sharing"

Do the following tasks in this part.

Load USAAIO_2025_round1_prob3_train.csv into a pandas DataFrame object called df_1.
Print the first 10 rows.
Define a function called data_summary that
- Takes a DataFrame object as an input.
- Prints the shape of the DataFrame.
- Prints the data type for each column.
- Prints the count of missing values for each column.
- Returns nothing (it only prints).
After defining the above function, call it by feeding df_1 to it.
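The share link given above opens a Google Drive preview page rather than the raw CSV, so the file has to be downloaded before it can be loaded. As a minimal sketch, assuming the file is shared publicly and small enough for Drive to serve without a confirmation page, the file ID can be pulled out of the link and placed into the commonly used `uc?export=download` URL form. The helper name `drive_direct_url` is hypothetical, and the final `pd.read_csv` call is left commented out because it needs network access.

```python
# Share link copied from the problem statement.
url = "https://drive.google.com/file/d/125YsFPS2nCNRvYyy1tgnD8RhYIUglLX9/view?usp=sharing"


def drive_direct_url(share_url):
    """Hypothetical helper: turn a Drive share link into a direct-download link."""
    # The file ID is the path segment that follows "/d/".
    file_id = share_url.split("/d/")[1].split("/")[0]
    return f"https://drive.google.com/uc?export=download&id={file_id}"


direct_url = drive_direct_url(url)
print(direct_url)

# Assumption: the direct link serves the CSV without a confirmation page.
# If it does, pandas can read it straight from the URL:
# df_1 = pd.read_csv(direct_url)
```

Only string operations are used here, so nothing beyond the allowed imports is needed. Downloading the file by hand and placing it next to the notebook works just as well.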

USAAIO · March 28, 2025, 5:32am

### WRITE YOUR SOLUTION HERE ###

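# Load the training data (assumes the CSV is in the working directory).
df_1 = pd.read_csv('USAAIO_2025_round1_prob3_train.csv')
print(df_1.head(10))

def data_summary(df):
    """Print the shape, per-column dtypes, and per-column missing-value counts."""
    print(f"Shape: {df.shape}")
    # Start each Series on a new line so its first row stays aligned.
    print(f"Data Types:\n{df.dtypes}")
    print(f"Missing Values per Column:\n{df.isnull().sum()}")

data_summary(df_1)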

""" END OF THIS PART """
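A quick way to sanity-check a function like data_summary without the contest file is to run it on a tiny DataFrame whose shape and missing values are known in advance. The sketch below restates the function so the block runs on its own. The toy data is made up for illustration and is not taken from the contest CSV.

```python
import numpy as np
import pandas as pd


def data_summary(df):
    """Mirrors the solution above: prints a summary and returns nothing."""
    print(f"Shape: {df.shape}")
    print(f"Data Types:\n{df.dtypes}")
    print(f"Missing Values per Column:\n{df.isnull().sum()}")


# Hypothetical toy data: 3 rows, 2 columns, one missing value in column "x".
toy_df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "label": ["a", "b", "c"]})

result = data_summary(toy_df)

# The function has no return statement, so the call evaluates to None.
print(result is None)
```

The printed shape should be (3, 2), and the missing-value count should be 1 for x and 0 for label, which matches how the toy data was built.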
