Important Links 🤝
Email: mpeerboccus@ufl.edu
GitHub
LinkedIn
HuggingFace
Kaggle
Education 🎓
Dual Bachelor of Science Degrees in Data Science & Biology
University of Florida
Expected Graduation: 2025
- Honors College
- University Research Scholars Program (top 5% of the 2025 graduating class)
Relevant Coursework & Skills 📚
- Programming & Data Science:
Data Structures & Algorithms (C++), Intro to Programming (Object-Oriented Python), Computational Math in Python (NumPy, matplotlib, pandas, seaborn), Linear Algebra for Data Science (scikit-learn, Keras, PyTorch), Databases (SQL, MariaDB, HeidiSQL)
- Mathematics & Statistics:
Calculus I-III, Probability Theory, Linear Regression (R), Statistics Theory, Statistical Learning (R: Regression, Trees, Lasso/Ridge, PCA, CNNs, KNN, Support Vector Machines)
- Other Topics:
Principles of Microeconomics, Principles of Macroeconomics, Geographic Systems (ArcGIS Pro), Microsoft Office
Experience 🔨
Project: Biddify 🚘
GitHub Link
Skills: React, TypeScript, Qwen Vision Language Model, KNN, CNN, BeautifulSoup, Selenium | In Progress
- Developing a hybrid KNN-CNN framework to predict final auction prices for damaged vehicles based on image and metadata analysis
- Utilizing Qwen Vision Language Model (VLM) to extract key features from vehicle photos for prediction
- Building a React-based frontend for users to upload photos and select vehicle details
- Experimenting with web scraping tools like BeautifulSoup and Selenium to extract historical auction data from Copart
- Aiming to integrate real-time prediction capabilities and enhance data mining techniques for improved accuracy
Certification: NVIDIA - Building RAG Agents with LLMs 🧠
Skills: Retrieval-Augmented Generation (RAG), Docker, LangChain, Pydantic, Gradio, LangServe | January 2025
- Completed a workshop on designing and deploying LLM systems with retrieval-augmented generation
- Built modular RAG agents capable of answering dataset-specific queries without fine-tuning
- Gained hands-on experience with embedding techniques for semantic similarity and vector store construction
- Designed chatbot pipelines utilizing tools like Docker, LangChain, and Gradio to manage dialog states efficiently
- Learned advanced techniques for document retrieval and LLM integration for real-world applications
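The retrieval step behind the semantic-similarity bullet can be illustrated with a minimal sketch. It is not the workshop's code: the bag-of-words `embed` function is a stand-in for a learned embedding model, and the `retrieve` helper and sample documents are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a word-count vector (stand-in for a learned embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query (hypothetical helper)."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


# Illustrative documents only, not data from the workshop.
documents = [
    "prophet is a time series forecasting model",
    "merge sort splits a list and merges sorted halves",
    "retrieval augmented generation adds retrieved documents to the prompt",
]
top = retrieve("how does retrieval augmented generation work", documents)
print(top[0])
```

In a RAG pipeline, the retrieved text would then be placed into the LLM prompt, which is how dataset-specific questions can be answered without fine-tuning.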
Project: Car Sales Forecasting 🚗
GitHub Link
Skills: Facebook Prophet, XGBoost, scikit-learn (LASSO, Random Forest), Hyperparameter Tuning | December 2024
- Extracted 10 years of monthly sales data for a popular sedan using CV techniques and aggregated it with economic data from multiple sources
- Implemented Facebook’s Prophet time series forecasting model with one covariate
- Compared Prophet results with XGBoost, Random Forest, and LASSO models, all using a full lagged covariate set
- Performed iterative hyperparameter tuning for Prophet and LASSO models, reducing out-of-sample MAPE (average % deviation from true value) for Prophet from 87% to 16%
- Achieved out-of-sample MAPE ranging from 11% (Random Forest) to 22% (LASSO) when forecasting 2024 sales from 2013-2023 data
- Utilized the parsimonious LASSO model to identify the most important covariates impacting sales
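For reference, the MAPE metric quoted above can be computed as in the following minimal sketch. The `mape` helper and the sample figures are illustrative assumptions, not the project's code or data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is zero)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical monthly sales and forecasts, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
score = mape(actual, predicted)
print(f"MAPE: {score:.2f}%")
```

A lower value means the forecasts sit closer to the true monthly figures on average.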
Certification: NVIDIA & UFIT: Fundamentals of Deep Learning 🧠
November 2024
- Completed a 4-week in-person course delivered by NVIDIA & UFIT
- Covered the theory of neural networks and applications of Deep Learning
- Completed 7 guided projects and a final assessment using PyTorch
Project: COVID Model 🦠
GitHub Link
Skills: scikit-learn, Regression Analysis | May 2024
- Analyzed a dataset with health risk factors and COVID-19 infection status for 1 million patients
- Developed models (random forest, decision tree, logistic regression) using scikit-learn to:
  - Predict mortality given COVID infection status
  - Predict infection status given patient demographics
- Visualized cross-validation results using confusion matrices and ROC curves (matplotlib, seaborn)
Project: Nutrient Sorter 🍏
GitHub Link
Skills: C++, CSV Parsing, Big O Notation | December 2023
- Parsed an FDA CSV containing nutrient/vitamin values for over 5,000 foods into a vector of pairs (food name, nutrient/vitamin value)
- Compared the efficiency of quick-sort and merge-sort algorithms by tracking computation times
- Checked the measured run times against each algorithm's theoretical Big O complexity
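The timing comparison can be sketched as follows. The original project is written in C++; this Python version is only an illustration, and the synthetic `foods` data, function names, and pivot choice are all assumptions.

```python
import random
import time


def quick_sort(pairs):
    """Sort (name, value) pairs by value using a middle-element pivot."""
    if len(pairs) <= 1:
        return list(pairs)
    pivot = pairs[len(pairs) // 2][1]
    less = [p for p in pairs if p[1] < pivot]
    equal = [p for p in pairs if p[1] == pivot]
    greater = [p for p in pairs if p[1] > pivot]
    return quick_sort(less) + equal + quick_sort(greater)


def merge_sort(pairs):
    """Sort (name, value) pairs by value by merging sorted halves."""
    if len(pairs) <= 1:
        return list(pairs)
    mid = len(pairs) // 2
    left, right = merge_sort(pairs[:mid]), merge_sort(pairs[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][1] <= right[j][1]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def time_sort(sort_fn, pairs):
    """Return the sorted result and the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    result = sort_fn(pairs)
    return result, time.perf_counter() - start


# Synthetic stand-in for the parsed CSV: 5,000 (food name, nutrient value) pairs.
random.seed(0)
foods = [(f"food_{i}", random.uniform(0.0, 100.0)) for i in range(5000)]

quick_result, quick_seconds = time_sort(quick_sort, foods)
merge_result, merge_seconds = time_sort(merge_sort, foods)
print(f"quick sort: {quick_seconds:.4f}s, merge sort: {merge_seconds:.4f}s")
```

Both algorithms run in O(n log n) time on average, so repeating the measurement at several input sizes should show run times growing slightly faster than linearly.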