AYOUB BOUZ
Data Scientist/Engineer
- +212 642 72 59 16
- contact@ayoubbouz.me
- Casablanca, Morocco
- AyouBouz
- ayoub-bouz
- @BouzAyoub
As a Full Stack Data Scientist with 3+ years of experience, I excel at transforming business ideas into impactful data projects using advanced analysis, predictive modeling, and big data infrastructure. Proficient in Python, SQL and machine learning, I solve complex business challenges and drive actionable insights.
Experience
Data Scientist/Engineer
- Environmental risks: Flood, Tornadoes, Hurricanes, Storms, Earthquakes, Wildfires, Drought, Volcano Activity, Air pollution, Nuclear radiation.
- Societal risks: Socioeconomic risk, Health infrastructure, Crime.
Tasks:
- Processing and analyzing geospatial data.
- Selecting features, building and optimizing classifiers using machine learning techniques.
- Computing and deploying scientific models (Crime and other risks) on big data infrastructure involving clusters of virtual servers.
- Data mining using state-of-the-art methods.
- Extending the company's data with third-party sources of information when needed.
- Enhancing data collection procedures to include information that is relevant for building analytic systems (US Census Bureau, ACS, NIH, USGS, CODE...).
- Processing, cleansing, and verifying the integrity of data used for analysis.
- Doing ad-hoc analysis and presenting results in a clear manner.
- Creating automated anomaly detection systems and continuously tracking their performance.
- Building vector tileset maps from large collections of GeoJSON files (tippecanoe, OpenLayers, GDAL).
Tools: Python, JS, Numpy, Pandas, GeoPandas, QGIS, PyQGIS, GDAL, Scikit-Learn, Tensorflow, LightGBM, PySpark, AWS EC2, AWS S3, AWS EMR, AWS DynamoDB, PostgreSQL, PostGIS
Data Scientist, Intern
Main project: Recommendation system for the 2M Moroccan TV channel.
- Created a data pipeline for the 2M Moroccan TV channel databases (ETL with Python, MongoDB).
- Designed a model using a collaborative filtering approach (Python, Scikit-Learn).
- Served the prediction results through a REST API (Flask).
- Created dashboards and data pipelines for other projects (Python, PowerBI).
Data Engineer, Intern
Working for Data Factory & Labs:
- Created a relational database from multiple sources using web scraping and PDF parsing (Python, PostgreSQL, BeautifulSoup).
- Predicted missing emails from Salesforce France and verified their existence using Python / SMTP.
Python Developer, Intern
Developed several Python bots and scripts for Data Labs that provide the following functions:
- Scrape several websites that contain information about different companies (Selenium).
- Download and extract data from thousands of XML files (Python).
- Distribute this processing across a cluster-oriented architecture (PySpark).
- Store the results in a database (PostgreSQL).
Education
- Engineering Degree: Information Systems and Big Data, National School of Applied Sciences Berrechid, 2018 - 2021
- Preparatory Class, National School of Applied Sciences Tanger, 2016 - 2018
- High School Diploma, Mathematical Science B, High School Ibnou Mandour, Casablanca, 2015 - 2016
Projects
Power Consumption in Tetouan:
This project aims to predict the power consumption of three zones in the city of Tetouan, Morocco, using machine learning techniques. It involves data preprocessing, feature engineering, model training, and evaluation. MLflow is used for experiment tracking, and the final model is deployed on AWS for scalable and accessible predictions. Tools: Numpy, Pandas, Scikit-learn, Flask, Docker, MLflow, GitHub, AWS EC2, AWS ECR
CATCHIO Police Analysis Platform:
A forecasting and police analysis tool. The system has two parts:
- Operational: a management system for daily management tasks.
- Decision support: an analytical system for decision making and spatial coverage using Deep Learning.
Tools: NodeJs, ExpressJs, MySQL, ChartJs, LeafletJs, Talend, Tensorflow & Keras
TOPLACES:
Web app to share your favorite places. Tools: NodeJs, Express, ReactJs, MongoDB (Atlas)
Loan Predicting:
Built a model that predicts whether or not a borrower will repay their loan. Tools: Tensorflow, Keras
Fraud Detection in Bank:
Created a fraud detection system using a graph database and RandomForest. Tools: Neo4j, Scikit-Learn
USA House Prices:
Predicted house prices using Linear Regression. Tools: Pandas, Scikit-learn
Ads Clicks:
Predicted whether a user will click on an ad using Logistic Regression. Tools: Pandas, Scikit-learn
Brexit sentiment analysis for social media:
Conducted sentiment analysis about Brexit using the Twitter API and presented the findings with descriptive statistics, graphs, and a word cloud. Tools: Pandas, Scikit-learn, NLTK, Seaborn
Real-time data analysis of the stock market:
Created a data pipeline for the client to store real-time stock market data. Tools: Kafka, EC2, S3, Glue, Athena
AutoPost Instagram:
Created a web app for posting on Instagram (image + quote + hashtags) using keywords. Tools: Python, OpenAI, Flask
ChatBot With PDFs:
Created a chatbot to chat with your PDFs. Tools: Python, Langchain, HuggingFace, Streamlit