
Getting Your Hands Dirty: Your First Model
Tiny predictions shape daily life. They steer your route, remind you to grab an umbrella, and help a cashier decide if you look old enough for a movie. Data-driven guesses keep things flowing.
Computers make the same kinds of calls, only faster and with far more information than your brain can hold. They apply the same rules every time and don't get tired, so their results stay consistent.

Meet Your Data: Getting Ready to Model
Your computer can’t guess from nothing. It needs carefully collected examples—rows in a table that capture past situations.
Each column is a feature that describes the row, such as rooms, square footage, or age of a house. One column is the target, the value you hope to predict, like price.
You load data with pandas in Python.
import pandas as pd
data = pd.read_csv("house_prices.csv") # pretend you have this file
print(data.head())
The head() method shows the first five rows by default, so you can spot issues such as missing values or misnamed columns early.
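Since house_prices.csv is only a placeholder, the sketch below builds a tiny stand-in table in memory to show what those early checks might look like. The column names (rooms, sqft, age, price) and every value are invented for illustration; a real file will differ.

```python
import pandas as pd

# Hypothetical stand-in for house_prices.csv; all values are made up.
data = pd.DataFrame({
    "rooms": [3, 4, 2, 5],
    "sqft": [1200, 1800, None, 2400],  # one missing value on purpose
    "age": [30, 12, 45, 5],
    "price": [200000, 310000, 150000, 420000],
})

n_rows, n_cols = data.shape   # (number of rows, number of columns)
missing = data.isna().sum()   # count of missing values per column

print(data.head())            # first five rows (only four exist here)
print(n_rows, n_cols)
print(missing)
```

A nonzero count in the missing-values summary is a hint that a column needs cleaning before you model with it.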

Splitting Up: Training and Testing Sets
A model should never judge itself on data it already saw. Fresh questions reveal real skill.
So you divide your table. One slice trains the model, the other measures its performance on unseen rows.
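from sklearn.model_selection import train_test_split
# Separate the features from the target; "price" is an assumed column name
X = data.drop(columns=["price"])
y = data["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)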
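With test_size=0.2, a fifth of the rows is held out for testing, a common starting point that leaves most of the data for training. Setting random_state fixes the shuffle so the split is reproducible. On a very small table, 20% may be too few rows to give a stable score.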
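To see what those two arguments do, here is a minimal sketch on synthetic data. The arrays are arbitrary stand-ins, not house prices; the point is only the sizes of the two slices and the effect of the seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 feature columns (values are arbitrary).
X = np.arange(300).reshape(100, 3)
y = np.arange(100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Splitting again with the same seed should reproduce the same test rows.
X_train2, X_test2, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))
print(np.array_equal(X_test, X_test2))
```

With 100 rows, 80 go to training and 20 to testing, and the repeated call returns the identical test slice because the seed is the same.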

Your First Model: Fitting and Predicting
A model is a learned equation built from your training slice. Fitting means adjusting the equation's coefficients until its outputs match the training targets as closely as possible.
For numeric targets such as price, LinearRegression is a reasonable first choice. For labels like spam or not spam, choose LogisticRegression.
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
Then check the root mean squared error (RMSE), the typical size of a prediction error expressed in the target's own units. Lower is better.
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, predictions) ** 0.5)
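The sketch below makes "learned equation" and RMSE concrete on made-up numbers. It assumes synthetic data that follows y = 3x + 2 exactly, and it skips the train/test split because the goal is only to show what fitting recovers. The second half computes RMSE by hand on three invented values and compares it with the scikit-learn calculation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data that follows y = 3x + 2 exactly (an assumption for this demo).
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + 2

model = LinearRegression()
model.fit(X, y)
slope, intercept = model.coef_[0], model.intercept_
print(slope, intercept)  # the learned equation: y ≈ slope * x + intercept

# RMSE by hand on a small made-up example.
y_true = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.0, 5.0, 9.0])
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))
rmse_sklearn = mean_squared_error(y_true, y_hat) ** 0.5
print(rmse_manual, rmse_sklearn)
```

In the hand-worked example the errors are 1, 0, and -2, so the squared errors are 1, 0, and 4, their mean is 5/3, and the square root is about 1.29.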
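For categories, switch to a classifier and a classification metric. Here y must hold class labels (such as spam or not spam) rather than prices, and the score printed is accuracy, the fraction of test rows predicted correctly.
from sklearn.linear_model import LogisticRegression
# X_train and y_train must come from a split whose target holds class labels
clf = LogisticRegression()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print((y_pred == y_test).mean())  # accuracy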
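As a self-contained illustration of the classification steps, the sketch below uses synthetic data in which the label is 1 whenever a single feature is positive. The data, the seed, and the variable names acc_manual and acc_sklearn are assumptions for the demo. It also shows that the comparison-and-mean idiom gives the same number as scikit-learn's accuracy_score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data: one feature, label is 1 when the feature is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X.ravel() > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = LogisticRegression()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc_manual = (y_pred == y_test).mean()        # fraction of correct predictions
acc_sklearn = accuracy_score(y_test, y_pred)  # same metric via scikit-learn
print(acc_manual, acc_sklearn)
```

Accuracy is easy to read but can mislead when one class is much more common than the other, so treat it as a first check rather than a final verdict.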
