Summary

I hold a PhD in computer science and am based in Morlaix. Trained as a data scientist, I now focus on data engineering in the networks and telecommunications sector.

My career combines engineering skills with a passion for research and innovation, and particular expertise in data science and artificial intelligence.

Through academic and industrial projects, I have gained an end-to-end view of data projects, from proof of concept to industrialization and scaling.

Experience

R&D Data Engineer

2022 - Present
Ekinops, Lannion
  • Analysis of scalability challenges in the IP traffic supervision system (NetFlow) of an SD-WAN product
    • Leveraging the existing, in-production OLTP system (custom aggregation, time-partitioning)
    • In-depth study of alternative OLAP systems
  • Implementation of testing/simulation tools, generation of massive realistic network data
    • Upstreamed contributions to the Scapy NetFlow module
  • Architecture and implementation of the alerts/monitoring database of a network management solution (NMS)
  • DBMS optimization (index tailoring, query profiling)
  • Automation of embedded database integration pipelines for routers (traffic classification)

Python asyncio Galera Cassandra

Go DuckDB PostgreSQL MariaDB pola.rs scapy GitLab CI/CD

Data Scientist

2019 - 2022
See-d, Vannes
  • Regression, classification, and forecasting on various industrial topics
    • agronomy
    • logistics
  • Natural Language Processing (NLP)
    • topic segmentation
    • sentiment analysis
  • Image analysis (Computer Vision, anomaly detection, CNN)
    • production monitoring
    • quality control
  • Deployment and industrialization of Data Science projects (MLOps)
  • Report writing and project presentation to stakeholders
  • Audit of business operations, coordination with IT/Datalabs, writing of specifications
  • Designing and delivering training courses
    • “Lean 6-sigma black belt” level
    • Qualiopi certified on first session
  • Mentored:
    • 4 interns (BSc and MSc level)
    • 3 PhD candidates in a summer school program

Python PySpark PyTorch gensim Keras OpenCV

R PostgreSQL Snowflake Docker Anaconda

Data Scientist (CIFRE PhD candidate)

2016 - 2019
Avril + See-d + IRISA, Vannes & Rennes (Remote)
  • Machine learning applied to structure-activity relationships (QSAR)
  • Development of feature-learning algorithms for molecular subgraphs
  • Exploration of correlations between topological and macroscopic models
  • Contributed to the graph isomorphism problem; library published under the MIT license

Also worked on side projects as a junior data scientist.

Python PySpark PyTorch gensim R Docker

Education

PhD in Computer Science (CIFRE)

2016 - 2019
Université de Bretagne-Sud

Successfully led, with a high degree of autonomy, a three-year data science project involving:

  • academic partners: IRISA (Expression team) & LMBA
  • industrial partners: Avril group & See-d (small-scale research lab)

Manuscript available on TEL

MSc in Computer Science

2014 - 2016
Université de Bretagne-Sud

Projects

Scott - My thesis contribution to the graph isomorphism problem.
  • canonization algorithm working on fully labeled graphs (vertices and edges)
  • provides, for any graph, a tree representative of its isomorphism class
  • well suited to low-connectivity graphs (e.g. molecules)
RExpress - Low-latency REST middleware wrapping a pool of R interpreters.
  • Has been used to provide real-time access to a pricing model (retail) running in R
  • The key idea is to pipe incoming HTTP requests to a pool of interpreters kept open
  • Less than 20 ms of additional latency.
Cardigraph - A few theoretical calculations of the cardinalities of graph classes.

OSS Contributions

Some open-source projects I enjoyed contributing to:

Scapy - Python-based interactive packet manipulation program & library.

Publications

  • Scott: A method for representing graphs as rooted trees for graph canonization
    Nicolas Bloyet, Pierre-François Marteau, Emmanuel Frénod
    Complex Networks and Their Applications, 2019
  • Étude lexicographique de sous-graphes pour l’élaboration de modèles structures à activité–cas de la chimie organique
    Nicolas Bloyet, Pierre-François Marteau, Emmanuel Frénod
    Extraction et Gestion des Connaissances, 2019

Skills

Technical Stack

Python asyncio pola.rs pandas scikit HuggingFace Spark PyTorch

R SQL Go Node.js

bash Docker GitLab CI/CD MLflow Airflow Prometheus Grafana

DBMS

  • OLTP: PostgreSQL MariaDB Galera
  • OLAP: ClickHouse polars DuckDB Snowflake
  • NoSQL: Cassandra MongoDB Redis

Data Science

  • Regression, Classification, Forecasting
  • Feature Selection

Data Engineering

  • DBMS optimization (Indexing/Sharding, Profiling, High-Availability)
  • ETL (Airflow, Kafka, Spark)

Machine/Deep Learning

  • Transformers
  • Auto-Encoders
  • Generative Adversarial Networks
  • Convolutional Neural Networks
  • Q-Learning

Algorithmics

  • Graph Theory
  • Algorithmic Complexity
  • Distributed Computing
  • Asynchronous Programming

Natural Language Processing (NLP)

  • Topic segmentation
  • Document model
  • Semantic vectorization models (Word2Vec, GloVe)

Computer Vision (CV)

  • Object Detection/Segmentation
  • Pattern matching
  • Feature Extraction

Communication

  • Report writing & presentations
  • Training design & delivery