Important Links 🤝

Email: mpeerboccus@ufl.edu

Education 🎓

Double Bachelor of Science in Data Science & Biology
University of Florida
Expected Graduation: 2025

Programming & Data Science:
Data Structures & Algorithms (C++), Intro to Programming (Object-Oriented Python), Computational Math in Python (NumPy, matplotlib, pandas, seaborn), Linear Algebra for Data Science (scikit-learn, Keras, PyTorch), Databases (SQL, MariaDB, HeidiSQL)
Mathematics & Statistics:
Calculus I-III, Probability Theory, Linear Regression (R), Statistics Theory, Statistical Learning (R: Regression, Trees, Lasso/Ridge, PCA, CNNs, KNN, Support Vector Machines)
Other Topics:
Principles of Microeconomics, Principles of Macroeconomics, Geographic Systems (ArcGIS Pro), Microsoft Office

Skills: React, TypeScript, Qwen Vision Language Model, KNN, CNN, BeautifulSoup, Selenium | In Progress

Developing a hybrid KNN-CNN framework to predict final auction prices for damaged vehicles based on image and metadata analysis
Utilizing Qwen Vision Language Model (VLM) to extract key features from vehicle photos for prediction
Building a React-based frontend for users to upload photos and select vehicle details
Experimenting with web scraping tools like BeautifulSoup and Selenium to extract historical auction data from Copart
Planning to add real-time predictions and refine the data-mining pipeline to improve accuracy
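As an illustration of the KNN half of the hybrid model, the sketch below predicts a price as the average sale price of the k most similar past vehicles. It is a minimal sketch, not the project's code: it assumes each vehicle is already encoded as a tuple of numeric, comparably scaled metadata features (the VLM-extracted image features and the CNN branch are not shown), and the function name `knn_predict_price` and the toy values are hypothetical.

```python
import math


def knn_predict_price(train, query, k=3):
    """Predict a price as the mean price of the k nearest training vehicles.

    train: list of (features, price) pairs; features is a tuple of floats
           assumed to be scaled to comparable ranges.
    query: feature tuple for the vehicle to price, in the same units.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and len(train)")
    # Rank training vehicles by Euclidean distance to the query vehicle.
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    # Unweighted average of the neighbours' prices.
    return sum(price for _, price in nearest) / k
```

For example, with `train = [((0.9, 0.2), 12000.0), ((0.8, 0.3), 10000.0), ((0.2, 0.9), 2000.0)]`, calling `knn_predict_price(train, (0.85, 0.25), k=2)` averages the two nearby rows and returns `11000.0`. Features should be scaled (for example, min-max normalized) before computing distances; otherwise a large-valued feature such as an odometer reading would dominate the distance.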

Skills: Retrieval-Augmented Generation (RAG), Docker, LangChain, Pydantic, Gradio, LangServe | January 2025

Completed a workshop on designing and deploying LLM systems with retrieval-augmented generation
Built modular RAG agents capable of answering dataset-specific queries without fine-tuning
Gained hands-on experience with embedding techniques for semantic similarity and vector store construction
Designed chatbot pipelines with Docker, LangChain, and Gradio that manage dialog state across turns
Learned document-retrieval techniques and patterns for integrating LLMs into real-world applications
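The retrieval step in a RAG pipeline ranks stored document embeddings by their similarity to the query's embedding. A minimal sketch, assuming embeddings are plain lists of floats already produced by some embedding model and held in an in-memory dict standing in for a vector store; `retrieve` and the toy vectors are hypothetical, and real pipelines (for example, ones built with LangChain) delegate this step to a vector store library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, top_k=2):
    """Return the ids of the top_k stored documents most similar to the query.

    store: dict mapping document id -> embedding vector (stand-in for a vector store).
    """
    ranked = sorted(
        store,
        key=lambda doc_id: cosine_similarity(query_vec, store[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]
```

With `store = {"doc_a": [1.0, 0.0, 0.0], "doc_b": [0.0, 1.0, 0.0], "doc_c": [0.7, 0.7, 0.0]}`, the query `[1.0, 0.1, 0.0]` retrieves `["doc_a", "doc_c"]`; the retrieved documents would then be inserted into the LLM prompt as context.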

Skills: Facebook Prophet, scikit-learn (LASSO, Random Forest), XGBoost, Hyperparameter Tuning | December 2024

Extracted 10 years of monthly sales data for a popular sedan using computer vision (CV) techniques and merged it with economic data from multiple sources
Implemented Facebook’s Prophet time series forecasting model with one covariate
Compared Prophet results with XGBoost, Random Forest, and LASSO models, all using a full lagged covariate set
Performed iterative hyperparameter tuning for the Prophet and LASSO models, reducing Prophet's out-of-sample MAPE (mean absolute percentage error: the average absolute % deviation from the true value) from 87% to 16%
Achieved out-of-sample MAPEs ranging from 11% (Random Forest) to 22% (LASSO) when forecasting 2024 sales from 2013-2023 data
Used the sparse LASSO model to identify the covariates most strongly associated with sales
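MAPE averages the absolute percentage errors: MAPE = (100 / n) · Σ |y_t − ŷ_t| / |y_t|. The sketch below computes it, together with a helper that builds a lagged covariate matrix of the kind the XGBoost, Random Forest, and LASSO models were fit on. Both are minimal illustrations with hypothetical names (`mape`, `lagged_features`), not the project's code, and the lag layout (lags 1 through `max_lag` of a single series) is an assumption.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def lagged_features(series, max_lag):
    """Build rows [y[t-1], ..., y[t-max_lag]] for each t >= max_lag.

    Row i lines up with the target series[max_lag + i], so the first
    max_lag observations are used only as lags.
    """
    return [
        [series[t - lag] for lag in range(1, max_lag + 1)]
        for t in range(max_lag, len(series))
    ]
```

For instance, `mape([100, 200, 400], [110, 180, 400])` is about 6.67 (two forecasts off by 10% and one exact), and `lagged_features([1, 2, 3, 4], 2)` returns `[[2, 1], [3, 2]]`. With a time-based split (fit on 2013-2023, score on 2024), `mape` is applied only to the held-out 2024 months, which is what makes the figure out-of-sample.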

November 2024

Skills: scikit-learn, Regression Analysis | May 2024

Analyzed a dataset with health risk factors and COVID-19 infection status for 1 million patients
Developed models (random forest, decision tree, logistic regression) using scikit-learn to:
- Predict mortality given COVID infection status
- Predict infection status given patient demographics
Visualized cross-validation results using confusion matrices and ROC curves (matplotlib, seaborn)
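Confusion matrices and ROC curves summarize how a classifier's predictions line up with the true labels. The sketch below computes both from scratch to show what those plots encode; it is illustrative only (binary 0/1 labels, hypothetical function names), and in a scikit-learn workflow `sklearn.metrics.confusion_matrix`, `roc_curve`, and `roc_auc_score` would normally be used instead.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary 0/1 labels.

    Rows are true classes and columns are predicted classes, the same layout
    scikit-learn's confusion_matrix uses for labels [0, 1].
    """
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


def roc_points(y_true, scores):
    """(FPR, TPR) points from thresholding the scores at each distinct value."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        (_, fp), (_, tp) = confusion_matrix(y_true, y_pred)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

On the labels `[0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]`, the curve passes through (0.5, 1.0) and the trapezoidal area under it is 0.75.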

Skills: C++, CSV Parsing, Big O Notation | December 2023

Parsed an FDA CSV file containing nutrient/vitamin values for over 5,000 foods into a vector of (food name, nutrient/vitamin value) pairs
Compared the efficiency of quicksort and merge sort by measuring their run times
Checked the measured run times against each algorithm's expected Big O complexity
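The project itself was written in C++ over a vector of pairs; the sketch below restates the experiment in Python to match the other examples here. It is a hypothetical reconstruction, not the original code: random (name, value) pairs stand in for the parsed FDA file, and both sorts order the pairs by nutrient value. One way to make the Big O comparison concrete is to time each sort at several input sizes and check that the times grow roughly like n log n.

```python
import random
import time
from operator import itemgetter


def merge_sort(items, key):
    """Stable top-down merge sort; O(n log n) in every case."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid], key), merge_sort(items[mid:], key)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def quick_sort(items, key):
    """Out-of-place quicksort with a random pivot; O(n log n) average, O(n^2) worst case."""
    if len(items) <= 1:
        return list(items)
    pivot = key(random.choice(items))
    less = [x for x in items if key(x) < pivot]
    equal = [x for x in items if key(x) == pivot]
    greater = [x for x in items if key(x) > pivot]
    return quick_sort(less, key) + equal + quick_sort(greater, key)


def time_sort(sort_fn, items, key):
    """Run sort_fn once and return (sorted_items, elapsed_seconds)."""
    start = time.perf_counter()
    result = sort_fn(items, key)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for the parsed CSV: (food name, nutrient value) pairs.
    foods = [(f"food_{i}", random.uniform(0, 100)) for i in range(5000)]
    by_value = itemgetter(1)
    for sort_fn in (quick_sort, merge_sort):
        result, seconds = time_sort(sort_fn, foods, by_value)
        assert result == sorted(foods, key=by_value)
        print(f"{sort_fn.__name__}: {seconds:.4f} s")
```

Merge sort's O(n log n) bound holds for every input, while quicksort's depends on the pivot; choosing the pivot at random makes the O(n^2) case unlikely in practice.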