2025 USA-NA-AIO Round 1, Problem 3, Part 1

Problem 3 (100 points)

Before starting this problem, make sure to run the following code first without any change:

# DO NOT CHANGE

import numpy as np
import pandas as pd
import copy
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

np.random.seed(2025)

""" END OF THIS PART """

\color{red}{\text{WARNING !!!}}

  • Beyond importing libraries/modules/classes/functions in the preceding cell, you are NOT allowed to import anything else for the following purposes:

    • As a part of your final solution. For instance, if a problem asks you to build a model without using sklearn but you use it, then you will not earn points.

    • As a temporary aid in reaching a solution. For instance, if a problem asks you to manually compute eigenvalues but you temporarily use np.linalg.eig to get an answer and then delete your code, then you violate the rule.

    Rule of thumb: Each part is designed to test a specific skill. Do not look for a shortcut that circumvents the rule.

  • All coding tasks shall run on CPUs, not GPUs.

Part 1 (5 points, coding task)

We study the dataset USAAIO_2025_round1_prob3_train.csv provided in this contest.

The dataset can be found here:

url = "https://drive.google.com/file/d/125YsFPS2nCNRvYyy1tgnD8RhYIUglLX9/view?usp=sharing"
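The link above opens a Google Drive viewer page rather than the raw CSV, so passing it directly to pd.read_csv is unlikely to yield the table. The sketch below shows one possible workaround that uses only string operations (no extra imports). The helper name drive_share_to_direct and the uc?export=download URL pattern are assumptions, not part of the problem statement, and the pattern only applies if the file is publicly shared. Downloading the file manually and reading it from the working directory is an equally valid route.

```python
# Hypothetical helper (not from the problem statement): turn a Google Drive
# share link of the form .../file/d/<FILE_ID>/view?... into a direct-download URL.
def drive_share_to_direct(share_url):
    if "/d/" not in share_url:
        raise ValueError("Expected a link containing '/d/<FILE_ID>/'")
    # The file ID is the path segment immediately after '/d/'.
    file_id = share_url.split("/d/")[1].split("/")[0]
    # Assumed direct-download pattern; it may not work for private or very large files.
    return f"https://drive.google.com/uc?export=download&id={file_id}"


url = "https://drive.google.com/file/d/125YsFPS2nCNRvYyy1tgnD8RhYIUglLX9/view?usp=sharing"
direct_url = drive_share_to_direct(url)
print(direct_url)

# With network access, the DataFrame could then be loaded as:
# df_1 = pd.read_csv(direct_url)
```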

Do the following tasks in this part.

  1. Load USAAIO_2025_round1_prob3_train.csv into a pandas DataFrame object called df_1.

  2. Print the first 10 rows.

  3. Define a function called data_summary that

    • Takes a DataFrame object as an input.

    • Prints the shape of the DataFrame.

    • Prints the data type for each column.

    • Prints the count of missing values for each column.

    • Returns nothing (the function only prints; it has no return value).

  4. After defining the above function, call it by feeding df_1 to it.

### WRITE YOUR SOLUTION HERE ###

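# Assumes the CSV has been downloaded into the working directory.
df_1 = pd.read_csv('USAAIO_2025_round1_prob3_train.csv')
print(df_1.head(10))

def data_summary(df):
    """Print the shape, column dtypes, and per-column missing counts; return None."""
    print(f"Shape: {df.shape}")
    # Newline after each label so the printed Series starts on its own line.
    print(f"Data Types:\n{df.dtypes}")
    print(f"Missing Values per Column:\n{df.isnull().sum()}")

data_summary(df_1)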

""" END OF THIS PART """
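As a quick sanity check that does not depend on the contest file, a function of this kind can be run on a small hand-made DataFrame with known missing values. The sketch below is illustrative only: the toy data and its column names are made up, and the function is restated so the block runs on its own.

```python
import numpy as np
import pandas as pd


def data_summary(df):
    """Print the shape, column dtypes, and per-column missing counts; return None."""
    print(f"Shape: {df.shape}")
    print(f"Data Types:\n{df.dtypes}")
    print(f"Missing Values per Column:\n{df.isnull().sum()}")


# Toy data (hypothetical): one missing value in each column.
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

result = data_summary(toy)

# The function should only print, so its return value is None.
print(result is None)

# Per-column missing counts, computed the same way as inside the function.
missing = toy.isnull().sum()
```

Checking that the return value is None confirms the "no return value" requirement, and the known count of one missing entry per column gives a reference point for reading the printed summary.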