Nikolaos Kakonas

Data Scientist & Machine Learning Engineer

M.S. Analytics @ Georgia Tech | Technology Consultant @ EY
Atlanta, GA | kakonas.nikos@gmail.com


About

I’m a data scientist and machine learning engineer who enjoys building AI systems that solve real problems and actually get used. I worked as a technology consultant at Ernst & Young and am now pursuing an M.S. in Analytics at Georgia Tech, with a focus on applied machine learning.

I design end-to-end ML solutions, from LLM-powered tools and regression models to large-scale data pipelines, that automate decisions, improve model performance, and reduce operational costs. My work has helped teams move faster, rely less on manual analysis, and make data-driven decisions at scale.

Work Experience

Graduate Teaching Assistant

Georgia Institute of Technology

Jan 2026 – Present
  • Supporting graduate-level instruction through grading, office hours, and student mentoring

Technology Consultant – Data Science

Ernst & Young (EY)

Jun 2023 – Jul 2025
  • Built a multi-agent LLM system in Python with AutoGen and RAG that answered natural-language questions over 50+ Excel files (100K+ rows), automatically computing analytics metrics and generating visualizations; deployed it with FastAPI and Kubernetes, reducing analytics workload by 60%, saving $150K annually, and enabling real-time decision support
  • Created a Python AI tool using LLMs that converted 200+ business requirement documents into production-ready SQL queries across 30+ database tables, saving 15 developer hours weekly and $75K annually while reducing query errors by 95%
  • Deployed an AI-driven robot agent using Python-Java integration and LLMs that autonomously navigated venues to guide 500+ attendees across 3 EY corporate events, answering questions about agendas and directions, reducing staff workload by 30%
  • Benchmarked Google's Meridian model against internal Bayesian Marketing Mix Models, identified location-aware geomodeling as a key driver, and applied the findings to improve model accuracy by 15% and optimize $2M+ in marketing spend
  • Delivered end-to-end SQL data mart automation, consolidating 20+ databases and 100+ Excel files via SSIS, SQL procedures, and scheduled jobs into a unified analytics platform serving 50+ business users; cut manual reporting by 80%, eliminated 30+ recurring ad-hoc reports per week, and saved $200K annually in analyst time
  • Presented data mart architecture and ETL pipelines to stakeholders, securing $500K in project approval

Technology Consultant – Data Science Intern

Ernst & Young (EY)

Mar 2023 – May 2023
  • Increased fraud detection accuracy by 10% by applying Graph Neural Networks to model relational transaction patterns, uncovering fraud signals missed by traditional methods; showcased the work to 20+ prospective clients and secured 3 new banking contracts worth $500K+

Projects

U.S. City Neighborhood Archetypes

  • Developed multi-layer spatiotemporal models integrating 50+ years of census, environmental, and housing data (10+ GB, 500K+ records) using K-Means and GMM clustering to identify 8 distinct neighborhood evolution patterns across 100+ U.S. cities
  • Built interactive Mapbox/Deck.gl dashboard forecasting neighborhood changes through 2070, downloaded 2,000+ times by urban planners and researchers, and cited in 2 academic papers analyzing gentrification trends

Customer Churn Prediction

  • Built supervised churn-prediction models (XGBoost, CatBoost, Logistic Regression) on 200K+ telecom customer records, achieving an F1-score of 0.89, and applied K-Means and DBSCAN clustering to segment customers into 6 behavioral groups
  • Implemented SHAP interpretability to identify top 10 churn drivers (contract type, support calls, usage patterns), enabling targeted retention campaigns projected to reduce churn by 15% and save $800K+ annually in customer acquisition costs

Economic Connectedness Analysis

  • Applied machine learning and network analysis to 500K+ mobility records and income data across 100+ U.S. regions to quantify social capital disparities, revealing that low-income neighborhoods had 40% fewer cross-class connections than affluent areas
  • Designed 5 interactive Python and Plotly visualizations illustrating economic connectedness patterns, presented to 3 policy research organizations to inform community investment strategies targeting $10M+ in social mobility programs

Fine-Tuning Mistral for Prompt Evaluation

  • Fine-tuned Mistral 7B LLM on 10K+ prompt-response pairs using RLHF-style evaluation pipelines to automate prompt quality scoring, achieving 85% agreement with human evaluators and reducing manual review time by 70%
  • Deployed automated evaluation system processing 1,000+ AI responses daily, enabling rapid iteration on prompt engineering and improving average response quality scores by 25% across production applications

Big Data Analytics for Chicago Crime Patterns

  • Processed 7M+ crime records spanning 20 years using PySpark and SQL, finding that 60% of violent crimes occurred in just 15% of neighborhoods and that crime peaked between 8 PM and 2 AM on weekends
  • Built predictive models achieving 78% accuracy for high-risk areas and deployed Tableau dashboards used by Chicago PD analysts to optimize patrol allocation, contributing to a 12% reduction in response times in targeted zones

Graph Neural Networks for Recommender Systems

  • Compared traditional recommender baselines with state-of-the-art GNN models (ConsisRec, DMGCF, KGNN-LS) on large-scale datasets
  • Improved recommendation accuracy by 7% and reduced error by 4% using graph-based user–item modeling techniques

Open-Source Contribution – GitHub

  • Collaborated with an international developer to enhance a Python command-line correction tool with 5K+ GitHub stars, implementing cross-platform compatibility fixes and an automated test suite with 95% code coverage
  • Reduced bug reports by 40% and improved tool reliability across Windows, macOS, and Linux environments, benefiting 10K+ active users and earning contributor recognition in project documentation

BBC News Network Analysis

  • Collected and analyzed 50K+ Twitter interactions using the Twitter API to map a network of 5,000+ users engaging with BBC News content, identifying 8 distinct information clusters and measuring network centrality metrics
  • Uncovered coordinated information-spread patterns related to the Ethiopian civil conflict, with 300+ highly connected accounts driving 60% of engagement; presented findings that informed the BBC's understanding of audience segmentation and misinformation propagation

Skills

Programming Languages

Java, Python, R, JavaScript, SQL, HTML, CSS, MATLAB, PySpark, Excel

Libraries & Frameworks

Pandas, NumPy, Matplotlib, Scikit-Learn, PyTorch, TensorFlow, Keras, AutoGen, FastAPI, Kubernetes

Machine Learning & AI

Machine Learning, Forecasting, Customer Segmentation, LLMs, GenAI, RAG, AI Agents, Anomaly Detection, Deep Learning, Neural Networks, CNNs, RNNs, Transformers, Representation Learning, ETL

Tools & Cloud Platforms

AWS, Azure, Databricks, Google Cloud, Docker, Linux, Git, GitHub, Power BI, Tableau, LaTeX

Education

Master of Science in Analytics

Georgia Institute of Technology | GPA: 3.8

Aug 2025 – Dec 2026

Coursework: Machine Learning, Deep Learning, Conversational AI, Data and Visual Analytics

Bachelor of Science in Management Science and Technology

Athens University of Economics and Business | GPA: 8.51/10

Oct 2019 – Jun 2023

Major: Software Engineering and Data Science
Minor: Operations Research and Business Analytics
Coursework: Applied Machine Learning, Business Intelligence and Big Data Analytics, Database Systems, Social Network Analysis
