Random Forest-Based Recommendation System - Academic Staff and Administrative Staff Case

Random Forest-Based Recommedation System - Academic Staff Case

Status: 🔒 This project is complete but currently private.
A public demo is not available at this time.

🎯 Business Context & Goals

Support universities in making more objective, data-driven recommendations for academic and administrative staff.
Explore how historical performance and competency data can be translated into recommendation insights.
Build a robust, interpretable ML pipeline that can be adapted to similar HR/education contexts.

🧩 My Role & Responsibilities

Led the entire data science lifecycle from problem framing to evaluation and reporting.
Designed and implemented the preprocessing, modeling, and validation strategy.
Communicated findings via visualizations and structured notebooks for non-technical stakeholders.

📊 Impact & Outcomes

Produced a recommendation model with strong accuracy and balanced precision/recall.
Highlighted which features matter most, helping stakeholders understand drivers of recommendations.
Delivered a reusable framework that can be extended to other recommendation or ranking problems.

This project is a comprehensive machine learning solution focused on building and evaluating a Random Forest-based classification system for academic staff (dosen) and administrative staff (tendik) recommendations. The solution covers the full data science pipeline, from data preprocessing and feature engineering to model selection, hyperparameter optimization, and interpretability analysis.

Key Features & Technologies

Data Science & Machine Learning:
- Utilizes Python with libraries such as pandas, scikit-learn, and matplotlib.
- Implements Random Forest classifiers for multi-class recommendation tasks.
- Employs advanced techniques including K-Fold Cross Validation and Grid Search for hyperparameter tuning.
- Conducts ablation studies to analyze the impact of individual features and model parameters on classification performance.
- Visualizes decision trees for model interpretability using Graphviz.
Data Handling:
- Works with real-world datasets (dataset_dosen.csv and dataset_tendik.csv) containing dozens of features related to cognitive, social, and professional competencies.
- Applies robust preprocessing: missing value handling, categorical-to-numeric conversion, feature scaling, and data splitting.
Evaluation & Results:
- Achieves high accuracy (up to 88%) and strong precision/recall/F1-scores, demonstrating reliable classification performance.
- Provides detailed evaluation reports and confusion matrices for model assessment.
Software Engineering:
- Modular Jupyter Notebooks for experimentation, reproducibility, and reporting.
- Well-structured code for easy adaptation to similar classification problems or datasets.

Tech Stack

Programming Language: Python 3.x
Data Science Libraries: pandas, numpy, scikit-learn, imbalanced-learn
Visualization: matplotlib, seaborn, Graphviz
Development Environment: Jupyter Notebook
Data Formats: CSV, Excel (XLSX)
Version Control: Git

Project Outcomes

Developed a reliable, interpretable recommendation system for academic and administrative staff using Random Forests.
Identified key features and optimal model parameters through systematic ablation and grid search studies.
Delivered actionable insights and visualizations to support decision-making in academic environments.

Role: Machine Learning Engineer / Data Scientist / Python Developer

Skills Demonstrated:

End-to-end machine learning workflow
Data preprocessing and feature engineering
Model selection and hyperparameter tuning
Experimentation and ablation studies
Data visualization and interpretability
Reproducible research with Jupyter Notebooks