  Data 100: Principles and Techniques of Data Science

    Grade Received: A

    Python | Pandas | Machine Learning

    Link: Course Website

    Overview

    Introduction

    Data 100 is UC Berkeley's intermediate-level data science course, bridging concepts from statistics, computer science, and data analysis. It emphasizes hands-on experience with data manipulation, visualization, modeling, and decision-making.

    Key Topics Covered

    1. Data Manipulation: Pandas DataFrames, SQL-style queries, and data cleaning techniques.

    2. Visualization: Using Matplotlib, Seaborn, and Altair to create insightful charts and graphs.

    3. Statistical Modeling: Linear regression, classification models, and evaluating model performance.

    4. Machine Learning: Overfitting, the bias-variance tradeoff, and decision-making under uncertainty.

    5. Ethical Considerations: Bias in data and responsible use of predictive models.
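    To make the data-manipulation topic concrete, here is a minimal pandas sketch. It assumes a small, made-up table of emails; the column names `sender`, `length`, and `spam` are hypothetical and not taken from the course. The sketch coerces a messy numeric column, drops incomplete rows, and then performs the pandas equivalents of a SQL `JOIN` and `GROUP BY`.

```python
import pandas as pd

# Hypothetical toy tables; all column names and values are illustrative only.
emails = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "sender": ["a@x.com", "b@y.com", None, "a@x.com", "c@z.com"],
    "length": ["120", "85", "40", "300", "n/a"],  # stored as messy strings
})
labels = pd.DataFrame({"id": [1, 2, 3, 4, 5], "spam": [1, 0, 0, 1, 0]})

# Cleaning: unparseable lengths ("n/a") become NaN, then incomplete rows are dropped.
emails["length"] = pd.to_numeric(emails["length"], errors="coerce")
clean = emails.dropna(subset=["sender", "length"])

# SQL-style INNER JOIN ... ON id, followed by GROUP BY sender with aggregates.
joined = clean.merge(labels, on="id", how="inner")
summary = (
    joined.groupby("sender")
    .agg(
        n_emails=("id", "count"),
        spam_rate=("spam", "mean"),
        avg_length=("length", "mean"),
    )
    .reset_index()
    .sort_values("n_emails", ascending=False)
)
print(summary)
```

    Here `merge(..., how="inner")` plays the role of `INNER JOIN`, and the named aggregations passed to `.agg()` mirror a `SELECT sender, COUNT(id), AVG(spam), AVG(length) ... GROUP BY sender` query.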

    Coursework Highlights

    Project B: Spam and Ham Classification

    Project Visualization

    Developed a logistic regression classifier that distinguishes spam from ham (legitimate) emails, using natural language processing techniques to extract features from email content.

    Employed Python libraries such as scikit-learn for machine learning, NLTK for text analysis, and Matplotlib for data visualization.

    Achieved 98.5% accuracy in email classification, placing me 15th on the leaderboard out of 1,120 submissions.
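    The classifier described above can be sketched as follows. This is a hedged illustration, not the project's code: the project used scikit-learn, whereas this version writes logistic regression out by hand with NumPy (batch gradient descent on the mean cross-entropy loss) so the mechanics are visible. The helper `words_in_texts`, the keyword list, and the six toy emails are all hypothetical, and the features are simple substring-presence indicators rather than any particular NLP pipeline.

```python
import numpy as np

def words_in_texts(words, texts):
    """Hypothetical helper: entry (i, j) is 1.0 if words[j] occurs as a
    substring of the lowercased texts[i], else 0.0."""
    return np.array([[float(w in t.lower()) for w in words] for t in texts])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_intercept(X):
    # Prepend a column of ones so theta[0] acts as the intercept.
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_logistic(X, y, lr=0.5, n_iter=2000, reg=0.0):
    """Batch gradient descent on mean cross-entropy loss, with an optional
    L2 penalty that leaves the intercept unpenalized."""
    Xb = add_intercept(X)
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ theta)             # predicted P(spam) per email
        grad = Xb.T @ (p - y) / len(y)      # gradient of mean cross-entropy
        grad[1:] += reg * theta[1:]         # L2 term (skips intercept)
        theta -= lr * grad
    return theta

def predict(theta, X, threshold=0.5):
    # Label 1 (spam) when the predicted probability reaches the threshold.
    return (sigmoid(add_intercept(X) @ theta) >= threshold).astype(int)

# Hypothetical keywords and toy emails (1 = spam, 0 = ham).
words = ["free", "prize", "money", "meeting", "project"]
emails = [
    "Win FREE money now",
    "Claim your free prize",
    "Meeting agenda for Monday",
    "Lunch tomorrow with the team",
    "Free prize money inside",
    "Project notes attached",
]
labels = [1, 1, 0, 0, 1, 0]

X = words_in_texts(words, emails)
theta = fit_logistic(X, np.array(labels, dtype=float))
print(predict(theta, X))  # training-set predictions
```

    In practice, `sklearn.linear_model.LogisticRegression` would replace `fit_logistic` and `predict`, and accuracy on a held-out validation set, not the training set shown here, is what indicates whether the chosen features generalize.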

    Takeaways

    This course deepened my ability to work with real-world data, from cleaning to deriving actionable insights.

    I appreciated the emphasis on ethical decision-making, particularly when applying machine learning to socially impactful domains.

    Project B was particularly rewarding, as it solidified my understanding of the end-to-end data science pipeline.