AYOUB BOUZ

Data Scientist/Engineer

As a Full Stack Data Scientist with 3+ years of experience, I excel at transforming business ideas into impactful data projects using advanced analysis, predictive modeling, and big data infrastructure. Proficient in Python, SQL and machine learning, I solve complex business challenges and drive actionable insights.


Experiences

Data Scientist/Engineer

Augurisk | 09.2021 - 06.2024 | New York, United States (Remote)
    Augurisk: a platform that helps individuals and businesses assess the climate risks associated with their properties, so they can better prepare for the future.
    Environmental and societal risk analysis and prediction:
  • Environmental risks: Flood, Tornadoes, Hurricanes, Storms, Earthquakes, Wildfires, Drought, Volcano Activity, Air pollution, Nuclear radiation.
  • Societal risks: Socioeconomic risk, Health infrastructure, Crime.
  • Tasks:
  • Processing and analyzing geospatial data.
  • Selecting features, building and optimizing classifiers using machine learning techniques.
  • Computing and deploying scientific models (Crime and other risks) on big data infrastructure involving clusters of virtual servers.
  • Data mining using state-of-the-art methods.
  • Extending the company's data with third-party sources of information when needed.
  • Enhancing data collection procedures to include information that is relevant for building analytic systems (US Census Bureau, ACS, NIH, USGS, CODE...).
  • Processing, cleansing, and verifying the integrity of data used for analysis.
  • Doing ad-hoc analysis and presenting results in a clear manner.
  • Creating automated anomaly detection systems and continuously tracking their performance.
  • Building vector tileset maps from large collections of GeoJSON files (tippecanoe, OpenLayers, GDAL).

  • Tools: Python, JS, NumPy, Pandas, GeoPandas, QGIS, PyQGIS, GDAL, Scikit-Learn, TensorFlow, LightGBM, PySpark, AWS EC2, AWS S3, AWS EMR, AWS DynamoDB, PostgreSQL, PostGIS

Data Scientist, Intern

Mobiblanc | 02.2021 - 08.2021 | Casablanca, Morocco
    Main project: recommendation system for the Moroccan TV channel 2M.
  • Create a data pipeline for the 2M channel's databases (ETL with Python, MongoDB).
  • Design a model using a collaborative filtering approach (Python, Scikit-Learn).
  • Serve the prediction results through a REST API (Flask).
  • Create dashboards and data pipelines for other projects (Python, PowerBI).

Data Engineer, Intern

Leyton Morocco | 06.2020 - 08.2020 | Casablanca, Morocco
    Working for Data Factory & Labs:
  • Create a relational database from multiple sources using web scraping and PDF parsing (Python, PostgreSQL, BeautifulSoup).
  • Predict missing emails from Salesforce France and verify their existence using Python / SMTP.

Python Developer, Intern

Leyton Morocco | 07.2019 - 08.2019 | Casablanca, Morocco
    Development of several Python bots and scripts for Data Labs that provide these functions:
  • Scrape several websites that contain information about different companies (Selenium).
  • Download and extract data from thousands of XML files (Python).
  • Distribute this processing using a cluster-oriented architecture (PySpark).
  • Store the results in a database (PostgreSQL).

Education

  • Engineering Degree: Information Systems and Big Data
    National School of Applied Sciences Berrechid
    2018 - 2021
  • Preparatory Class
    National School of Applied Sciences Tanger
    2016 - 2018
  • High School Diploma, Mathematical Science B
    High School Ibnou Mandour, Casablanca
    2015 - 2016

Projects

  • Power Consumption in Tetouan:

    This project aims to predict the power consumption of three zones in the city of Tetouan, Morocco, using machine learning techniques. It covers data preprocessing, feature engineering, model training, and evaluation. MLflow is used for experiment tracking, and the final model is deployed on AWS to serve predictions. Tools: NumPy, Pandas, Scikit-learn, Flask, Docker, MLflow, GitHub, AWS EC2, AWS ECR

  • CATCHIO Police Analysis Platform:

    An application that provides a forecasting and analysis tool for police work. The system has two parts:
  • Operational: a management system for daily management tasks.
  • Decision support: an analytical system for decision making and spatial coverage using Deep Learning.

  • Tools: Node.js, Express.js, MySQL, Chart.js, Leaflet, Talend, TensorFlow & Keras

  • TOPLACES:

    Web app for sharing your favorite places. Tools: Node.js, Express, React, MongoDB (Atlas)
  • Loan Repayment Prediction:

    Build a model that can predict whether or not a borrower will repay their loan. Tools: Tensorflow, Keras
  • Fraud Detection in Bank:

    Create a fraud detection system using a graph database and a random forest classifier. Tools: Neo4j, Scikit-Learn
  • USA House Prices:

    Predict house prices using Linear Regression. Tools: Pandas, Scikit-learn
  • Ads Clicks:

    Predict whether a user will click on an ad using Logistic Regression. Tools: Pandas, Scikit-learn
  • Brexit sentiment analysis for social media:

    Conduct sentiment analysis on Brexit using the Twitter API and present the findings with descriptive statistics, graphs, and a word cloud. Tools: Pandas, Scikit-learn, NLTK, Seaborn
  • Real-time stock market data analysis:

    Create a data pipeline for the client to store real-time data. Tools: Kafka, EC2, S3, Glue, Athena
  • AutoPost Instagram:

    Create a web app for posting on Instagram (image + quote + hashtags) using keywords. Tools: Python, OpenAI, Flask
  • ChatBot With PDFs:

    Create a chatbot to chat with your PDFs. Tools: Python, Langchain, HuggingFace, Streamlit