Jaewon Shim
Portfolio

Data Science major with a Business and Industrial Analytics emphasis @ University of California, Berkeley

Education

University of California, Berkeley
Data Science B.A. | GPA: 3.9 / 4.0
Dec 2025 | Berkeley, CA

Work Experience

Data Scientist Intern, MKS Instruments (Jun 2024 – Dec 2024)

  • Designed and deployed an ensemble machine learning model (LightGBM, Random Forest) on 500K+ rows of laser product inspection data, achieving 92% prediction accuracy and reducing false positives by 35%.
  • Operated the model for real-time defect risk scoring, contributing to a 12% increase in first-pass yield and $750K annual scrap reduction.
  • Led migration of 20+ dashboards from Tableau to Power BI, optimizing DAX data models and ETL pipelines to cut data refresh times by 45% and improve stakeholder usability.
  • Automated reporting workflows using Python and Excel VBA, increasing operational efficiency by 30%, and supported quality initiatives with exploratory data analysis that reduced warranty incidents by 23%.

Python and Mathematics Tutor, Tublet (Feb 2023 – Mar 2025)

  • Facilitated online programming and statistics tutoring sessions, offering guidance to over 100 students.
  • Earned an Honorable IT Tutor Certificate, awarded to the top 1% of tutors in the company, for raising student grades from C or below to A in 96% of tutoring sessions.

Skills

  • Python: Pandas, Matplotlib, Seaborn, Scikit-learn, TensorFlow, Keras, LightGBM, PyTorch, Regex, APIs
  • Database Management: SQL, Query Optimization, ETL Processes, Data Marts, SAP HANA, Snowflake, Smartsheet
  • Machine Learning: Supervised/Unsupervised Learning, Ensemble Methods, Model Evaluation, Pipeline Automation
  • Deep Learning: Python Frameworks, CNN (Image Processing), RNN (Time-Series), Transfer Learning
  • Mathematics / Statistics: Linear Algebra, Probability, Hypothesis Testing, Regression Analysis, A/B Testing
  • Business Intelligence: Excel, Power BI, Tableau, Smartsheet

Projects

Defect Prediction Model for Laser Product Quality Optimization, MKS Instruments

  • Developed a LightGBM-based ensemble model to predict final-stage laser product defects using over 500K inspection records, achieving 85% recall and minimizing late-stage failures.
  • Automated SMOTE-based resampling and feature generation pipelines in Python, reducing preprocessing time by 40% and ensuring consistent model performance over multiple retraining cycles.
  • Integrated predictions into Power BI to visualize risk trends across product lines and inspection stages, enabling proactive quality control and reducing unexpected scrap events by 20%.

Root Cause Analysis Dashboard, MKS Instruments

  • Used Power BI to pinpoint manufacturing and design issues by analyzing trends in OBQ, AFR, and WIRR.
  • Reduced warranty incidents by 23% in 6 months through quality benchmarking and servicing.
  • Enabled data-driven decision-making by aligning OBQ insights with customer reviews to prioritize areas for improvement.

Samsung Stock Forecasting

  • Implemented LSTM and GRU models in TensorFlow, achieving an R-squared value of 0.95 with the GRU model.
  • Designed the forecasts to give stock investors actionable input for planning their investments.
  • Forecasted Samsung stock prices for the following 10 days and advised against investing based on the predicted price decline.

California Housing Cost Modeling

  • Performed exploratory data analysis and random forest regression modeling to predict house prices.
  • Applied regression algorithms, data preprocessing, model evaluation, and hyperparameter tuning throughout the project.
  • Built a final model that explains 80% of the variance in house prices and used it to predict the price of the target house.

COVID-19 Data Exploration

  • Used a PostgreSQL database and advanced SQL queries to perform multivariate analysis.
  • Identified the countries with the highest infection rates along with their corresponding death rates.
  • Calculated a global correlation coefficient of -0.751 between GDP and infection rate, indicating a strong negative association between GDP per capita and the spread of the pandemic.

Bike Ride Moving Average Dashboard

  • Built a moving average visualization of the London bike rides dataset with three customizable parameters.
  • Implemented a heatmap with two bar charts in the tooltip, displaying ride length and weather distribution.

Certificates

  • Google Data Analytics Certificate
  • DataCamp SQL Certificate
  • DataCamp Python Certificate
  • IBM Data Science Certificate

Languages

  • Korean: Native/Bilingual Proficiency
  • English: Native/Bilingual Proficiency