Files, Folders, and First Steps: Getting Data from Your Computer

Finding Your Way: Paths and Folders in Python
Finding a file on your computer feels simple once you know the rules. Python relies on clear paths, absolute or relative, to locate the right file.
An absolute path is the full address. Think C:\Users\Maria\Downloads\data.csv on Windows or /Users/maria/Downloads/data.csv on macOS and Linux. A relative path starts from the current working directory, usually the folder you launched Python from, not necessarily the folder holding your script. Write data.csv if the file sits in that working directory, or ../data.csv if it lives one folder up.

Python’s os.path.abspath() turns any relative path into an absolute one. The newer pathlib module makes the same task smoother:
from pathlib import Path
file = Path("data.csv")
print(file.resolve()) # absolute path appears here
If a FileNotFoundError pops up, check your working directory with os.getcwd() or Path.cwd() and adjust the path.
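A quick way to run that check, reusing the data.csv name from above:
import os
from pathlib import Path

print(os.getcwd())                # the folder Python is actually running in
print(Path.cwd())                 # the same information via pathlib
print(Path("data.csv").exists())  # True only if the path resolves from there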

Prefer pathlib for every path because it works the same on Windows, macOS, and Linux. Join parts with the / operator so you never hand-type separators:
root = Path("/Users/maria/Downloads")
file = root / "data.csv"
Print the result if you ever doubt where Python is looking.

Reading the Usual Suspects: CSV, Excel, JSON, and Text
Most data arrives in a few common formats, and pandas opens them all. Reading a CSV takes one line:
import pandas as pd
df = pd.read_csv("data.csv")
If the file uses semicolons or tabs, add sep=";" or sep="\t".
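For instance, with hypothetical files data_eu.csv (semicolons) and data.tsv (tabs):
import pandas as pd

df_semi = pd.read_csv("data_eu.csv", sep=";")  # semicolon-separated columns
df_tabs = pd.read_csv("data.tsv", sep="\t")    # tab-separated columns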

Pandas also reads Excel sheets:
df = pd.read_excel("data.xlsx")
You may need to install the openpyxl package for .xlsx files. Pass sheet_name to pick a specific tab.
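A short sketch, assuming the workbook has a tab named Sales:
import pandas as pd

df = pd.read_excel("data.xlsx", sheet_name="Sales")  # pick a tab by name
first = pd.read_excel("data.xlsx", sheet_name=0)     # or by position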
For JSON, try:
df = pd.read_json("data.json")
Deeply nested JSON may need the json module plus pd.json_normalize; flat files load fine as-is.
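A minimal sketch, assuming a hypothetical nested.json whose records live under a top-level users key:
import json
import pandas as pd

with open("nested.json") as f:
    raw = json.load(f)

df = pd.json_normalize(raw["users"])  # flattens nested fields into dotted column names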
Plain text files vary. If each line holds a record, use pd.read_table. Otherwise, open the file and inspect the first lines before choosing a method.
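A few lines of plain Python handle that inspection; notes.txt stands in for your file:
with open("notes.txt") as f:
    for _ in range(5):
        print(f.readline().rstrip())  # peek at the first five lines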

Quick Checks: Making Sure Your Data Makes Sense
Always verify your data right after loading it. Start with shape to see rows and columns:
print(df.shape)
Then peek at the first rows:
print(df.head())
Call df.info() to catch missing columns or odd data types. If a column like Unnamed: 0 appears, the file probably saved its old index; pass index_col=0, or adjust header or skiprows, in read_csv.
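A sketch of the usual fix, assuming the stray column came from a saved index:
import pandas as pd

df = pd.read_csv("data.csv", index_col=0)  # treat the first column as the index
df.info()                                  # dtypes and non-null counts per column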

Missing values? Count them:
print(df.isnull().sum())
Grab a random sample to spot anomalies:
print(df.sample(5))
Check duplicates before saving cleaned data:
print(df.duplicated().sum())
Add a suffix like _cleaned when writing new files so the original data stays untouched.
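One way to follow that habit, with data_cleaned.csv as an illustrative name:
df.to_csv("data_cleaned.csv", index=False)  # index=False skips writing the row index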
