
First Steps in Modeling

A Beginner’s Shortcut to Making Predictions with Python

AI-Generated

April 28, 2025

You’re about to learn how to make your computer guess things for you—using Python. If you’ve ever wondered how to get started with machine learning, this is your shortcut. No jargon, just the steps you need to go from raw data to your first real predictions.



Getting Your Hands Dirty: Your First Model

Tiny predictions shape daily life. They steer your route, remind you to grab an umbrella, and help a cashier decide if you look old enough for a movie. Data-driven guesses keep things flowing.

Computers make the same calls but faster and with mountains of information your brain can’t hold. They rely on clear rules and don’t get tired, so their results stay steady.


Meet Your Data: Getting Ready to Model

Your computer can’t guess from nothing. It needs carefully collected examples—rows in a table that capture past situations.

Each column is a feature that describes the row, such as rooms, square footage, or age of a house. One column is the target, the value you hope to predict, like price.
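To make the feature/target split concrete, here is a minimal sketch on a tiny hand-made table. The column names (rooms, square_feet, age_years, price) and all values are invented for illustration and are not taken from any real dataset.

```python
import pandas as pd

# Hypothetical toy table: column names and values are made up for illustration.
houses = pd.DataFrame(
    {
        "rooms": [3, 4, 2, 5],
        "square_feet": [1200, 1800, 850, 2400],
        "age_years": [30, 12, 45, 5],
        "price": [210_000, 340_000, 150_000, 455_000],
    }
)

# Features: every column except the target.
X = houses.drop(columns=["price"])
# Target: the single column we want to predict.
y = houses["price"]

print(X.shape)  # (rows, feature columns)
print(y.shape)  # (rows,)
```

Keeping the target out of X matters: if price stayed among the features, the model could simply copy it and the score would mean nothing.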

You load data with pandas in Python.

import pandas as pd

data = pd.read_csv("house_prices.csv")  # pretend you have this file
print(data.head())

Calling data.head() returns the first five rows by default, so you can spot issues such as missing values or oddly typed columns early.
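A few more quick checks often help at this stage. The sketch below reads a small CSV from an in-memory string (a stand-in for a real file, with invented contents) and prints the table's size, column types, and missing-value counts.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the contents are invented for illustration.
csv_text = """rooms,square_feet,age_years,price
3,1200,30,210000
4,1800,,340000
2,850,45,150000
"""

data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)         # number of rows and columns
print(data.dtypes)        # column types as pandas inferred them
print(data.isna().sum())  # count of missing values per column
```

Here the second row has no age, so isna().sum() reports one missing value in age_years. Gaps like that usually need to be filled or dropped before a model can be fit.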


Splitting Up: Training and Testing Sets

A model should never judge itself on data it already saw. Fresh questions reveal real skill.

So you divide your table. One slice trains the model, the other measures its performance on unseen rows.

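from sklearn.model_selection import train_test_split

# Assumes the target column is named "price"; adjust to match your file.
X = data.drop(columns=["price"])  # features
y = data["price"]  # target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)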

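Setting test_size=0.2 holds out 20% of the rows for testing, and random_state=42 makes the split reproducible. On a table with at least a few hundred rows, that share usually leaves enough test samples for a meaningful score.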
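To see what the split actually produces, here is a small sketch on randomly generated numbers (not real house data). It checks the sizes of the two slices and shows that reusing the same random_state gives the same held-out rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 rows, 3 feature columns, one numeric target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # 80 training rows, 20 test rows

# Same random_state, same split: the held-out rows are identical.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(np.array_equal(X_test, X_test_again))
```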


Your First Model: Fitting and Predicting

A model is a learned equation built from your training slice. Fitting means adjusting the numbers inside that equation until its outputs match the training targets as closely as possible.

For prices, pick LinearRegression. For labels like spam or not spam, choose LogisticRegression.

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
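To make "learned equation" concrete, the sketch below fits LinearRegression to made-up points that follow y = 3x + 2 exactly. Because the true rule is known, you can compare it with what the model stores in coef_ and intercept_.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data that follows y = 3x + 2 exactly, so we know what should be learned.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 3.0 * X[:, 0] + 2.0

model = LinearRegression()
model.fit(X, y)

# The learned equation is y = coef_ * x + intercept_.
print(model.coef_, model.intercept_)

# Predicting plugs a new x into that equation.
new_prediction = model.predict(np.array([[10.0]]))
print(new_prediction)
```

The fitted slope and intercept come out at (numerically almost exactly) 3 and 2, so the prediction for x = 10 is about 32. Real data is noisier, and the learned numbers are then only a best approximation.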

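Then check the root mean squared error (RMSE). It measures the typical size of a prediction error in the same units as the target, so lower is better.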

from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, predictions) ** 0.5)
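If the formula feels opaque, it can help to compute RMSE by hand once. This sketch uses three invented prices and predictions, and compares the manual calculation with the scikit-learn route used above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up true prices and predictions, in the same units (say, dollars).
y_true = np.array([200_000.0, 300_000.0, 400_000.0])
y_hat = np.array([210_000.0, 290_000.0, 430_000.0])

# By hand: square each error, average the squares, then take the square root.
errors = y_hat - y_true
rmse_manual = np.sqrt(np.mean(errors ** 2))

# Same quantity via scikit-learn's mean squared error.
rmse_sklearn = mean_squared_error(y_true, y_hat) ** 0.5

print(rmse_manual, rmse_sklearn)
```

Both values agree at roughly 19,150. Note that the single 30,000 miss pulls the result well above the two 10,000 misses, because squaring weights large errors more heavily.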

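Switch to classification metrics when dealing with categories. The snippet here assumes X_train and y_train come from a dataset whose target holds class labels, such as spam or not spam, rather than prices. It reports accuracy, the share of test rows labeled correctly.

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(X_train, y_train)  # y_train must contain class labels here
y_pred = clf.predict(X_test)
print((y_pred == y_test).mean())  # accuracy: fraction of correct predictions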
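For a version you can run from start to finish, here is a sketch on synthetic data. The two features and the labeling rule (1 when the features sum to a positive number, else 0) are invented purely so the example is self-contained.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Made-up data: two features; the label is 1 when their sum is positive, else 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = LogisticRegression()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# accuracy_score is the fraction of test labels predicted correctly,
# the same number as (y_pred == y_test).mean().
acc = accuracy_score(y_test, y_pred)
print(acc)
```

Accuracy is easy to read but can mislead when one class is much more common than the other, so it is worth checking the class balance before trusting it.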

Tome Genius

Data Science with Python: From Data to Insights

Part 7
