Data 100 is UC Berkeley's intermediate-level data science course, bridging concepts from statistics, computer science, and data analysis. It emphasizes hands-on experience with data manipulation, visualization, modeling, and decision-making.
1. Data Manipulation: Pandas DataFrames, SQL-style queries, and data cleaning techniques.
2. Visualization: Using Matplotlib, Seaborn, and Altair to create insightful charts and graphs.
3. Statistical Modeling: Linear regression, classification models, and evaluating model performance.
4. Machine Learning: Concepts of overfitting, bias-variance tradeoff, and decision-making under uncertainty.
5. Ethical Considerations: Bias in data and responsible use of predictive models.

Developed a logistic regression classifier to distinguish spam from ham (non-spam) emails, using natural language processing techniques to analyze email content.
Employed Python libraries such as scikit-learn for machine learning, NLTK for text analysis, and Matplotlib for data visualization.
Achieved 98.5% accuracy in email classification, placing 15th out of 1,120 leaderboard submissions.
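To give a feel for the technique, here is a minimal sketch of a bag-of-words logistic regression classifier. It is not the project's actual code. The project used scikit-learn and NLTK, while this sketch swaps in a small from-scratch implementation using only the Python standard library so it runs anywhere. The toy messages, the helper names (`tokenize`, `featurize`, `train`, `predict_proba`), and the hyperparameters are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Toy, made-up messages for illustration only (not the course dataset).
# Label 1 = spam, label 0 = ham.
TRAIN = [
    ("win a free prize now", 1),
    ("free money click now", 1),
    ("claim your free prize", 1),
    ("meeting agenda for monday", 0),
    ("lunch with the team on monday", 0),
    ("project notes for the meeting", 0),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def featurize(text, vocab):
    """Map a message to a bag-of-words count vector over `vocab`."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=500, lr=0.5):
    """Fit logistic regression with batch gradient descent on the log loss."""
    vocab = sorted({word for text, _ in data for word in tokenize(text)})
    X = [featurize(text, vocab) for text, _ in data]
    y = [label for _, label in data]
    weights = [0.0] * len(vocab)
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: p(spam) minus true label.
            score = sum(w * x for w, x in zip(weights, xi)) + bias
            err = sigmoid(score) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the average gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return vocab, weights, bias


def predict_proba(text, vocab, weights, bias):
    """Return the model's estimated probability that `text` is spam."""
    x = featurize(text, vocab)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


vocab, weights, bias = train(TRAIN)
for message in ["free prize now", "meeting on monday"]:
    p = predict_proba(message, vocab, weights, bias)
    print(f"{message!r}: p(spam) = {p:.2f}")
```

In practice, a library pipeline (for example a text vectorizer feeding scikit-learn's `LogisticRegression`) would replace the hand-written training loop, and accuracy would be measured on held-out data rather than on the training messages.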
This course deepened my ability to work with real-world data, from cleaning to deriving actionable insights.
I appreciated the emphasis on ethical decision-making, particularly when applying machine learning to socially impactful domains.
Project 2 was particularly rewarding because it solidified my understanding of the end-to-end data science pipeline.