Jaewon Shim
Portfolio

Data Science major with Business and Industrial Analytics emphasis @ University of California, Berkeley

Education

University of California, Berkeley
Data Science B.A. | GPA: 3.9 / 4.0
Dec 2025 | Berkeley, CA

Work Experience

Data Scientist Intern, MKS Instruments

  • Led the migration of dashboards from Tableau to Power BI, including optimization of the data ETL process.
  • Used product quality data to develop a predictive analytics model and built a Power BI dashboard to present products to stakeholders.
  • Automated tasks such as report generation with Python and Excel VBA, increasing operational efficiency by 30%.
  • Conducted exploratory data analysis to draw insights and identify key trends, supporting data-driven decisions.

Python and Mathematics Tutor, Tublet

  • Facilitated online programming and statistics tutoring sessions, offering guidance to over 100 students.
  • Earned an Honorable IT Tutor Certificate, awarded to the top 1% of tutors in the company, for raising student grades from C or below to A in 96% of tutoring sessions.

Skills

  • Python: Pandas, Matplotlib, Seaborn, Scikit-learn, TensorFlow, Keras
  • Mathematics/Statistics: Linear Algebra, Applied Calculus, Statistical Modeling, A/B Testing
  • SQL: PostgreSQL, MySQL, Microsoft SQL Server, Advanced Queries
  • Others: SAP, Smartsheet, Excel VBA, Snowflake, Power Automate
  • Business Intelligence: Microsoft Power BI, Tableau
  • Machine Learning: Deep Learning, Regression, Classification, Clustering, Ensemble Methods

Projects

Diabetes Classification [Python / Machine Learning]

  • Performed a statistical analysis of a diabetes patient dataset and developed a classification model to predict whether a person is diabetic.
  • Identified BMI and age as key predictors of diabetes risk through two-sample t-tests.
  • Implemented six machine learning classifiers; the best-performing one, an Extra Trees model, correctly classified 84.5% of all samples.

Deep Learning for Samsung Stock Forecasting [Python / Deep Learning]

  • Implemented LSTM and GRU architectures in TensorFlow, achieving an R-squared value of 0.95 with the GRU model.
  • Provided actionable insights for stock investors, aiding in optimizing their investment plans.
  • Forecasted Samsung stock prices for the following 10 days, advising against investment due to predicted price decline.

Socioeconomic Factors Impacting COVID-19 [SQL]

  • Used a PostgreSQL database and advanced SQL queries to perform multivariate analysis.
  • Explored the countries with the highest infection rates along with their corresponding death rates.
  • Calculated a global correlation coefficient of -0.751 between GDP and infection rate, indicating a strong negative association and highlighting the link between GDP per capita and the spread of the pandemic.

Root Cause Analysis Dashboard [Power BI], MKS Instruments

  • Pinpointed issues in manufacturing or design by analyzing patterns in OBQ, AFR, and WIRR.
  • Reduced warranty incidents by 23% in 6 months through quality benchmarking and servicing.
  • Enabled data-driven decision-making by aligning OBQ insights with customer reviews to prioritize areas of improvement.

London Bike Rides Dashboard [Tableau]

  • Created a moving-average visualization with three customizable parameters using a London bike rides dataset.
  • Implemented a heatmap with two bar charts in the tooltip, displaying ride length and weather distribution.

Regression Modeling of California Housing Prices [Python / Machine Learning]

  • Performed exploratory data analysis and random forest regression modeling to predict house prices.
  • Demonstrated proficiency in regression algorithms, data preprocessing, model evaluation, and hyperparameter tuning.
  • Explained 80% of the variance in house prices, successfully predicting the price of the target house.

Certificates

  • Google Data Analytics Certificate
  • DataCamp SQL Certificate
  • DataCamp Python Certificate
  • IBM Data Science Certificate

Languages

  • Korean: Native/Bilingual Proficiency
  • English: Native/Bilingual Proficiency