Files, Folders, and First Steps: Getting Data from Your Computer

Finding Your Way: Paths and Folders in Python
Finding a file on your computer feels simple once you know the rules. Python relies on clear paths, absolute or relative, to locate the right file.
An absolute path is the full address. Think C:\Users\Maria\Downloads\data.csv on Windows or /Users/maria/Downloads/data.csv on macOS and Linux. A relative path starts from the current working directory, usually the folder you launched Python from, not necessarily the folder holding your script. Write data.csv if the file sits in that working directory, or ../data.csv if it lives one folder up.

Python’s os.path.abspath() turns any relative path into an absolute one. The newer pathlib module makes the same task smoother:
from pathlib import Path
file = Path("data.csv")
print(file.resolve()) # absolute path appears here
If a FileNotFoundError pops up, check your working directory with os.getcwd() or Path.cwd() and adjust the path.
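A quick way to run that check, reusing the data.csv name from above:
import os
from pathlib import Path

print(os.getcwd())                # the folder Python is actually running in
print(Path.cwd())                 # the same information via pathlib
print(Path("data.csv").exists())  # True only if the path resolves from there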

Prefer pathlib for every path because it works the same on Windows, macOS, and Linux. Join parts with the / operator so you never hand-type separators:
root = Path("/Users/maria/Downloads")
file = root / "data.csv"
Print the result if you ever doubt where Python is looking.

Reading the Usual Suspects: CSV, Excel, JSON, and Text
Most data arrives in a few common formats, and pandas opens them all. Reading a CSV takes one line:
import pandas as pd
df = pd.read_csv("data.csv")
If the file uses semicolons or tabs, add sep=";" or sep="\t".
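For instance, with hypothetical files data_eu.csv (semicolons) and data.tsv (tabs):
import pandas as pd

df_semi = pd.read_csv("data_eu.csv", sep=";")  # semicolon-separated columns
df_tabs = pd.read_csv("data.tsv", sep="\t")    # tab-separated columns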

Pandas also reads Excel sheets:
df = pd.read_excel("data.xlsx")
You may need to install the openpyxl package for .xlsx files. Pass sheet_name to pick a specific tab.
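A short sketch, assuming the workbook has a tab named Sales:
import pandas as pd

df = pd.read_excel("data.xlsx", sheet_name="Sales")  # pick a tab by name
first = pd.read_excel("data.xlsx", sheet_name=0)     # or by position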
For JSON, try:
df = pd.read_json("data.json")
Deeply nested JSON may need the json module plus pd.json_normalize; flat files load fine as-is.
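A minimal sketch, assuming a hypothetical nested.json whose records live under a top-level users key:
import json
import pandas as pd

with open("nested.json") as f:
    raw = json.load(f)

df = pd.json_normalize(raw["users"])  # flattens nested fields into dotted column names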
Plain text files vary. If each line holds a record, use pd.read_table. Otherwise, open the file and inspect the first lines before choosing a method.
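A few lines of plain Python handle that inspection; notes.txt stands in for your file:
with open("notes.txt") as f:
    for _ in range(5):
        print(f.readline().rstrip())  # peek at the first five lines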

Quick Checks: Making Sure Your Data Makes Sense
Always verify your data right after loading it. Start with shape to see rows and columns:
print(df.shape)
Then peek at the first rows:
print(df.head())
Call df.info() to catch missing columns or odd data types. If a column like Unnamed: 0 appears, the file probably saved its old index; pass index_col=0, or adjust header or skiprows, in read_csv.
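A sketch of the usual fix, assuming the stray column came from a saved index:
import pandas as pd

df = pd.read_csv("data.csv", index_col=0)  # treat the first column as the index
df.info()                                  # dtypes and non-null counts per column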

Missing values? Count them:
print(df.isnull().sum())
Grab a random sample to spot anomalies:
print(df.sample(5))
Check duplicates before saving cleaned data:
print(df.duplicated().sum())
Add a suffix like _cleaned when writing new files so the original data stays untouched.
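One way to follow that habit, with data_cleaned.csv as an illustrative name:
df.to_csv("data_cleaned.csv", index=False)  # index=False skips writing the row index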
