Fetching the Facts

How to Pull Data into Python Without Losing Your Mind

AI-Generated

April 28, 2025

You want answers, not headaches. This tome shows you how to grab data from files, databases, and the web—without the usual mess. Learn the simple, reliable ways to get your data into Python, check it’s right, and get on with the fun part.


Files, Folders, and First Steps: Getting Data from Your Computer

Finding Your Way: Paths and Folders in Python

Finding a file on your computer feels simple once you know the rules. Python relies on clear paths—absolute or relative—to point it to the right place.

An absolute path is the full address. Think C:\Users\Maria\Downloads\data.csv on Windows or /Users/maria/Downloads/data.csv on macOS and Linux. A relative path starts from where your script runs. Write data.csv if the file sits beside your script, or ../data.csv if it lives one folder up.

Python’s os.path.abspath() turns any relative path into an absolute one. The newer pathlib module makes the same task smoother:

from pathlib import Path
file = Path("data.csv")
print(file.resolve())  # absolute path appears here

If a FileNotFoundError pops up, check your working directory with os.getcwd() or Path.cwd() and adjust the path.
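When a path misbehaves, it can help to print everything Python knows about it in one go. The snippet below is a minimal sketch: describe_path is a hypothetical helper (not part of pathlib), and data.csv is just the example file name used above.

```python
from pathlib import Path


def describe_path(name):
    """Collect facts that help explain a FileNotFoundError (hypothetical helper)."""
    path = Path(name)
    return {
        "working_directory": Path.cwd(),   # where Python starts looking
        "resolved": path.resolve(),        # the absolute path Python will try
        "is_absolute": path.is_absolute(),
        "exists": path.exists(),
    }


report = describe_path("data.csv")  # example file name, may not exist on your machine
for key, value in report.items():
    print(f"{key}: {value}")
```

If exists comes back False, compare the resolved path with where the file really lives.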

Prefer pathlib for every path because it works the same on Windows, macOS, and Linux. Join parts with / to avoid messy slashes:

root = Path("/Users/maria/Downloads")
file = root / "data.csv"

Print the result if you ever doubt where Python is looking.
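To see how the / operator picks the right separator, here is a small sketch that uses pathlib's "pure" path classes. They only manipulate text and never touch the disk, so both lines run on any operating system. The folder names are the illustrative ones from this section.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths never touch the file system, so this runs anywhere.
posix_file = PurePosixPath("/Users/maria/Downloads") / "data.csv"
windows_file = PureWindowsPath(r"C:\Users\Maria\Downloads") / "data.csv"

print(posix_file)    # /Users/maria/Downloads/data.csv
print(windows_file)  # C:\Users\Maria\Downloads\data.csv
```

In everyday scripts plain Path is usually enough, since it picks the flavor that matches your system.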

Reading the Usual Suspects: CSV, Excel, JSON, and Text

Most data arrives in a few common formats, and pandas opens them all. Reading a CSV takes one line:

import pandas as pd
df = pd.read_csv("data.csv")

If the file uses semicolons or tabs, add sep=";" or sep="\t".
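A quick way to see why sep matters is to read the same text with and without it. This sketch uses io.StringIO so it runs without any file on disk. The names and scores are made-up sample data.

```python
import io

import pandas as pd

# Made-up sample data, held in memory instead of a file.
semicolon_text = "name;score\nAda;91\nLin;78\n"
tab_text = "name\tscore\nAda\t91\nLin\t78\n"

df_semicolon = pd.read_csv(io.StringIO(semicolon_text), sep=";")
df_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")

# Without sep, pandas looks for commas and finds none,
# so everything lands in a single column.
df_wrong = pd.read_csv(io.StringIO(semicolon_text))

print(df_semicolon.shape)  # (2, 2)
print(df_wrong.shape)      # (2, 1)
```

A single wide column after loading is a strong hint that the separator is wrong.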

Pandas also reads Excel sheets:

df = pd.read_excel("data.xlsx")

Reading .xlsx files requires the openpyxl package, so install it if pandas raises an ImportError. Pass sheet_name to pick a specific tab.

For JSON, try:

df = pd.read_json("data.json")

Flat JSON files load fine this way. Deeply nested JSON usually needs the json module plus pd.json_normalize to flatten it into columns.
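Here is a minimal sketch of the nested case. The records are invented for illustration, and the JSON is parsed from a string so the example runs without a file. With a real file you would use json.load on an open file handle instead.

```python
import json

import pandas as pd

# Invented sample records with one level of nesting under "user".
raw = """
[
  {"id": 1, "user": {"name": "Ada", "city": "London"}},
  {"id": 2, "user": {"name": "Lin", "city": "Oslo"}}
]
"""

records = json.loads(raw)                 # plain Python lists and dicts
df_nested = pd.json_normalize(records)    # nested keys become dotted columns

print(df_nested.columns.tolist())  # ['id', 'user.name', 'user.city']
```

Each nested key turns into its own column, which is usually easier to work with than a column full of dictionaries.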

Plain text files vary. If each line holds a record with tab-separated fields, use pd.read_table, which splits on tabs by default; pass sep for any other delimiter. Otherwise, open the file and inspect the first lines before choosing a method.
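Inspecting the first lines does not require loading the whole file. The sketch below assumes a UTF-8 text file. The helper name peek and the sample file notes.txt are hypothetical, and the file is created in a temporary folder only so the example runs on its own.

```python
import itertools
import tempfile
from pathlib import Path


def peek(path, n=5):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in itertools.islice(handle, n)]


# Build a small throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "notes.txt"
    sample.write_text("id\tvalue\n1\t3.5\n2\t4.1\n", encoding="utf-8")
    first_lines = peek(sample, n=2)

print(first_lines)  # ['id\tvalue', '1\t3.5']
```

Seeing the raw \t characters here suggests pd.read_table would be a reasonable next step for this file.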

Quick Checks: Making Sure Your Data Makes Sense

Always verify your data right after loading it. Start with shape to see rows and columns:

print(df.shape)

Then peek at the first rows:

print(df.head())

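Call df.info() to catch missing columns or odd data types. A lone Unnamed: 0 column usually means the file was saved with its index, so pass index_col=0 to read_csv. If several Unnamed columns appear, the header row is probably not on the first line, so adjust header or skiprows.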
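One common source of an Unnamed: 0 column is a CSV that was written with its index included. The sketch below reproduces that with made-up data and shows index_col=0 as one way to handle it. Your file may need header or skiprows instead.

```python
import io

import pandas as pd

# Made-up data, written to CSV text with the index included (the default).
original = pd.DataFrame({"city": ["Oslo", "Lima"], "temp_c": [4, 19]})
csv_text = original.to_csv()

naive = pd.read_csv(io.StringIO(csv_text))               # index becomes a column
fixed = pd.read_csv(io.StringIO(csv_text), index_col=0)  # index restored

print(naive.columns.tolist())  # ['Unnamed: 0', 'city', 'temp_c']
print(fixed.columns.tolist())  # ['city', 'temp_c']
```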

Missing values? Count them:

print(df.isnull().sum())

Grab a random sample to spot anomalies:

print(df.sample(5))

Check duplicates before saving cleaned data:

print(df.duplicated().sum())
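To see what these checks report, try them on a tiny table where you already know the answer. The data below is invented: one name is missing and one row is repeated on purpose.

```python
import pandas as pd

# Invented data: row 2 repeats row 1, and the last name is missing.
df_demo = pd.DataFrame(
    {
        "name": ["Ada", "Lin", "Lin", None],
        "score": [91, 78, 78, 64],
    }
)

missing_per_column = df_demo.isnull().sum()       # missing values per column
duplicate_rows = int(df_demo.duplicated().sum())  # repeats of an earlier row
deduplicated = df_demo.drop_duplicates()          # keeps the first occurrence

print(missing_per_column["name"])  # 1
print(duplicate_rows)              # 1
print(deduplicated.shape)          # (3, 2)
```

Note that duplicated() only flags rows that match an earlier row in every column.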

Add a suffix like _cleaned to the file name when writing new files so the original data is never overwritten.
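pathlib can build the new file name for you. This is a small sketch in which cleaned_path is a hypothetical helper, not a pandas or pathlib function.

```python
from pathlib import Path


def cleaned_path(source):
    """Return the same path with _cleaned added before the extension (hypothetical helper)."""
    source = Path(source)
    return source.with_name(f"{source.stem}_cleaned{source.suffix}")


target = cleaned_path("data.csv")
print(target)  # data_cleaned.csv
```

You could then save with df.to_csv(target, index=False). Passing index=False also keeps an extra index column out of the file.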


Tome Genius

Data Science with Python: From Data to Insights

Part 2
