Research items

You will find an up-to-date list of publications on :

Below are some software projects related to these items.

Scott Algorithm

More details on the project page.

A canonization algorithm for fully labeled graphs (labels on both vertices and edges). For any graph, it produces a tree that represents its isomorphism class, from which we can derive:

a canonical trace (string)
a canonical adjacency matrix
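To make the idea of a canonical trace concrete, here is a minimal sketch. It is not the Scott algorithm: it is a brute-force illustration that tries every vertex ordering and keeps the lexicographically smallest string, so it only works on very small graphs. The function name `canonical_trace`, the input encoding, and the output format are assumptions made for this example, and labels are assumed not to contain `,` or `|`.

```python
from itertools import permutations


def canonical_trace(vertex_labels, edges):
    """Brute-force canonical string of a small labeled undirected graph.

    vertex_labels: list of str, one label per vertex (index = vertex id).
    edges: dict mapping frozenset({u, v}) to an edge label (str).
    Hypothetical helper for illustration only; runs in factorial time.
    """
    n = len(vertex_labels)
    best = None
    for perm in permutations(range(n)):
        # perm[i] is the original vertex placed at position i.
        labels = ",".join(vertex_labels[v] for v in perm)
        # Upper triangle of the permuted adjacency matrix, "-" for no edge.
        upper = [
            edges.get(frozenset((perm[i], perm[j])), "-")
            for i in range(n)
            for j in range(i + 1, n)
        ]
        candidate = labels + "|" + ",".join(upper)
        if best is None or candidate < best:
            best = candidate
    return best


# Two encodings of the same graph (an O vertex bonded to two H vertices).
g1 = canonical_trace(["O", "H", "H"], {frozenset((0, 1)): "1", frozenset((0, 2)): "1"})
g2 = canonical_trace(["H", "O", "H"], {frozenset((1, 0)): "1", frozenset((1, 2)): "1"})
# A different graph with the same labels (O-H and H-H edges).
g3 = canonical_trace(["O", "H", "H"], {frozenset((0, 1)): "1", frozenset((1, 2)): "1"})
print(g1 == g2, g1 == g3)
```

Isomorphic graphs produce the same string whatever the input vertex numbering, which is the property a canonical trace is meant to provide. The upper-triangle part of the string is a flattened adjacency matrix under the chosen ordering.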

Applications:

graph indexing and retrieval
graph fragmentation and embedding
classification and regression on graph data with deep neural networks (DNNs)

A Python implementation is provided here under the MIT license.

Doctorate in Computer Science (PhD) - CIFRE Thesis (2016-2019)

Thesis funded by an industrial group under a CIFRE agreement, supervised on the industrial side by See-d and on the academic side by IRISA (Expression team) and LMBA.

Topic: Machine learning applied to structure-activity relationships (QSAR). Development of Feature-Learning algorithms for topological data representation, with a focus on graphs. Exploration of correlations between topological and macroscopic models.

Abstract: In chemistry, it is useful to be able to estimate the physicochemical properties of molecules, especially for industrial applications. These properties are difficult to estimate by physical simulation, whose time complexity is often prohibitive. However, the emergence of data (public or private) opens new perspectives for treating these problems with statistical methods and machine learning. The main difficulty lies in the characterization of molecules: a molecule is more like a network of atoms (in other words, a colored graph) than a vector. Unfortunately, statistical modeling methods usually deal with observations encoded as vectors, hence the need for specific methods able to deal with graph-encoded observations, a problem known as structure-activity relationship modeling. The aim of this thesis is to take advantage of public corpora to learn the best possible representations of these structures, and to transfer this global knowledge to smaller datasets. We adapted methods from natural language processing to achieve this goal. Implementing them required more theoretical work, especially on the graph isomorphism problem. The results obtained on classification and regression tasks are at least competitive with the state of the art, and sometimes better, in particular on small datasets, which suggests opportunities for transfer learning in this field.

Available on TEL.

Keywords: Graph theory, Algorithmic Complexity, QSAR modeling, Statistical modeling, Data analysis, Data science.
