Data 100: Principles and Techniques of Data Science

Grade Received: A

Python | Pandas | Machine Learning

Link: Course Website

Introduction

Data 100 is UC Berkeley's intermediate-level data science course, bridging concepts from statistics, computer science, and data analysis. It emphasizes hands-on experience with data manipulation, visualization, modeling, and decision-making.

Key Topics Covered

1. Data Manipulation: Pandas DataFrames, SQL-style queries, and data cleaning techniques.

2. Visualization: Using Matplotlib, Seaborn, and Altair to create insightful charts and graphs.

3. Statistical Modeling: Linear regression, classification models, and evaluating model performance.

4. Machine Learning: Concepts of overfitting, bias-variance tradeoff, and decision-making under uncertainty.

5. Ethical Considerations: Bias in data and responsible use of predictive models.

Project B: Spam and Ham Classification

Developed a logistic regression classifier to distinguish spam from ham (legitimate) emails, using natural language processing techniques to analyze email content.

Employed Python libraries such as scikit-learn for machine learning, NLTK for text analysis, and Matplotlib for data visualization.

Achieved 98.5% accuracy in email classification, placing 15th out of 1,120 submissions on the leaderboard.
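
As a rough illustration of the approach (not the actual project code), the sketch below trains a logistic regression spam classifier on bag-of-words counts. The project itself used scikit-learn and NLTK; this version uses only the Python standard library so that the mechanics are visible. The tiny corpus, the helper names (`tokenize`, `featurize`, `train`, `predict_proba`), and the learning rate and epoch count are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Made-up toy corpus for illustration only (label 1 = spam, 0 = ham).
TRAIN = [
    ("win free money now", 1),
    ("free prize claim your reward", 1),
    ("cheap meds free offer", 1),
    ("claim your free money prize", 1),
    ("meeting agenda for monday", 0),
    ("lunch with the project team", 0),
    ("please review the attached report", 0),
    ("notes from the monday meeting", 0),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts):
    """Map every distinct training word to a feature index."""
    words = sorted({w for t in texts for w in tokenize(t)})
    return {w: i for i, w in enumerate(words)}


def featurize(text, vocab):
    """Turn a text into a bag-of-words count vector over the vocabulary."""
    x = [0.0] * len(vocab)
    for word, count in Counter(tokenize(text)).items():
        if word in vocab:  # words unseen in training are ignored
            x[vocab[word]] = float(count)
    return x


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: p(spam | x) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(text, vocab, w, b):
    """Estimated probability that the text is spam."""
    x = featurize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


texts = [t for t, _ in TRAIN]
labels = [y for _, y in TRAIN]
vocab = build_vocab(texts)
w, b = train([featurize(t, vocab) for t in texts], labels)

for message in ("free money prize", "monday meeting report"):
    p = predict_proba(message, vocab, w, b)
    print(f"{message!r}: p(spam) = {p:.2f}")
```

A real pipeline would add a held-out validation split, richer features, and regularization, which is where a library such as scikit-learn becomes the practical choice.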

Takeaways

This course deepened my ability to work with real-world data, from cleaning to deriving actionable insights.

I appreciated the emphasis on ethical decision-making, particularly when applying machine learning to socially impactful domains.

Project B was particularly rewarding, as it solidified my understanding of the end-to-end data science pipeline.

Brandon Wu
