Julie Bourbeillon

Associate Professor of Computer Science

Department of Applied Mathematics and Computer Science

Teaching Unit: Statistics and Computer Sciences
Research Unit: UMR IRHS (Institut de recherche en horticulture et semences)

Bio

Career

Training

Teaching Activities

I am responsible for all computer science courses at the Angers site. My teaching contributions are listed below.

In addition, I regularly supervise students in different contexts (projects within various modules, internships, apprenticeships, etc.) and I regularly sit on examination panels from L1 to M2.

Licence

The topics covered include Information and Communication Technologies, scientific computing and business computing (general IT knowledge, office automation, programming, information systems, databases, data processing in connection with biology, horticulture or landscape). These courses correspond to 4 modules in two different curricula.

Licence courses - "Post-baccalaureate" curriculum

In the first year of the post-baccalaureate programme, the teaching is centred on the use of Information and Communication Technologies, both within the institution and from a professional perspective. The objectives are as follows:

  •     Basic knowledge of computers (hardware, operating system, networks),
  •     Setting up and configuring one's working environment,
  •     Principles of use of common software (organisation of personal computer data, office automation, information search on the Internet),
  •     Publication on the Internet,
  •     Ethical and legal context of the use of tools and in particular the Internet,
  •     Management of one's digital identity and protection of personal data.

Also in the first year of the post-baccalaureate programme, contributions to other modules introduce the first notions of scientific computing:

  •     Discovery of programming through electrical and electronic activities on Arduino in conjunction with the physics teacher
  •     First notions of data management within the framework of the Campus Biodiversity Analysis with the ecology teachers

In the second year of the post-baccalaureate curriculum, the Introduction to Programming course aims to give students tools to automate data processing tasks or to solve numerically problems that may have no analytical solution. The module therefore focuses on problem solving from a computer science point of view. The different steps are covered:

  • Introduction to the principles of problem analysis,
  • Basics of algorithmics as a method of problem solving,
  • Basics of a programming language, to express the algorithms in a form understandable by the computer,
  • Iterative error analysis: hypothesis on the origin of the malfunction, solutions, tests...

Application problems are provided by other disciplines: ecology, economics, physics, chemistry, etc.

In the third year of the post-baccalaureate curriculum, the “Databases and information system” course addresses information systems and the place of databases in these systems when they are computerized. Content includes:

  • Computerization process for the company, possible infrastructures and architectures,
  • Essential principles of database design,
  • Basics needed to move from a theoretical model to an implementation, together with the setup of graphical interfaces for end users.

Licence courses - "Post-preparatory classes" curriculum

In the first year of the post-preparatory classes curriculum, the “Computer Science” module aims to provide the same skills as the three modules of the post-baccalaureate curriculum. The objectives remain the same:

  • Basic computer skills (hardware, operating system, networks),
  • Principles of effective use of current tools (organization of personal computer data, office automation, information retrieval on the Internet),
  • Ethical and legal context of the use of tools and in particular the Internet,
  • Business IT: programming and databases.

Master

Teaching at master's level aims to professionalize students or provide more specialized skills related to their field of work.

I co-lead an optional module named "Initiation to Bioinformatics" for M1 students in Horticulture. The objective is to introduce the methods and tools of bioinformatics from the point of view of the user, i.e. the biologist, by:
     • Presenting the main principles of bioinformatics and the use of these approaches in plant industries (R&D, breeding, etc.),
     • Presenting the methods for obtaining and processing different types of biological data (genomics, transcriptomics, phenomics, etc.),
     • Providing a first overview of methods to exploit these data using standard R protocols, image analysis software or freely available bioinformatics tools.

I am co-responsible for an M2 module, "Experimentation and exploitation of massive data", intended for engineering students in horticulture in the "Plant science and engineering" specialty, "Seeds and plants" option. This module is shared with the Master "Plant's biology". The objective is to make students aware of the diversity and complexity of phenotyping and genotyping data in light of their analysis. This involves showing them the underlying biological problem and how the data are produced, through the design of experimental setups and visits to platforms, so that they can implement the appropriate processing tools (analysis, modelling).

Research Summary

Context

A notable trend in research, particularly in biology, is the increase in the scale at which studies are carried out. This results from the emergence of high-throughput experimental techniques and from the growing volume of publicly available data. An additional difficulty lies in the heterogeneity of both the sources of information (multiple remote databases with heterogeneous formats and interfaces, various file formats, etc.) and the data themselves (multiple scales: from the population to the molecule; multiple types: quantitative or qualitative; multiple modes: text or image; multiple levels of structuring: database fields, markup languages, free text). This makes data manipulation difficult and calls for specific, increasingly computerized approaches that adapt “Big Data” techniques to the biological domain.

Themes

In this context, a set of difficulties arises throughout the life cycle of scientific data.

My research activities aim to support biologists throughout this process, by developing methods and tools for non-computer scientists. This involves implementing mechanisms from disciplines such as image processing, data science, knowledge engineering, information visualization, human-computer interaction, etc. This objective leads me to take an interest in various themes:
    • Data creation: support for the conduct of experiments, to facilitate data acquisition in a context of automation, robotization and high-throughput approaches (in particular high-throughput phenotyping with imaging methods),
    • Data pre-processing: preparation of data sets (grouping, filtering, organizing, presenting, etc.),
    • Data analysis: use of data in response to a biological question,
    • Data storage: management of scientific data, associated with the problem of their representation to facilitate both their integration and their use,
    • Data sharing: provision of data in public banks, choice of characteristics to be shared, exchange formats,
    • Data integration: combination and exploitation of data from new perspectives.

Projects

I carry out this work within the ImHorPhen team at IRHS. It has been, or currently is, part of various research projects:

  • PAYTAL (2011-2015)
  • Verger de demain (2011-2015)
  • AI-Fruit (2012-2016)
  • CRB FraPeR et Apiacées (2014-2016)
  • GRIOTE (2014-2018)
  • ANANdb (2015-2016)
  • EUCLEG (2017-2021)
  • DIVIS (2018-2021)

Publications

Software

I am involved in the development of several software tools whose source code is available on ForgeMIA, the software forge of INRAe.

The IRHS bioinformatics team develops tools for the management of biological data, and I contribute to these developments. ELVIS (Experiment and Laboratory on Vegetal Information System) brings together the databases and the server layer common to the various data management and processing tools developed in the team. ELVIS takes the form of a PostgreSQL database and a web service layer for data access developed in Python. It is divided into a set of thematic modules. Several business applications developed by the team are based on ELVIS.

The ELVIS project page on ForgeMIA

PREMS is the ELVIS-based business application for laboratory management. It consists of a set of components including the management of projects, samples and experimental results.

The PREMS project page on ForgeMIA

Elterm is the terminology management application based on ELVIS.

In ELVIS, the content of many fields is controlled by lists of possible values, which are generally derived from terminologies:

  • recognized domain terminologies, possibly derived from publicly available taxonomies or ontologies (Plant Ontology, Crop Ontology, etc.)
  • specific terminologies that we can consider disseminating

We therefore store a set of terminologies, each covering a theme: morphology of organisms, development stages, growth conditions, etc. The general principle of what we store is similar to standard XML representations of terminologies such as TermBase Exchange, but in the form of a database. Elterm provides a set of graphical interfaces allowing users to manipulate the terminologies stored in ELVIS.

The ELTerm project page on ForgeMIA

Thanks to the wider spread of high-throughput experimental techniques, biologists are accumulating large numbers of datasets, which often mix quantitative and qualitative variables and are not always complete, in particular when they concern phenotypic traits. To get a first insight into these datasets and reduce the size of the data matrices, scientists often rely on multivariate analysis techniques. However, such approaches are not always easy to apply, in particular to mixed datasets. Moreover, displaying large numbers of individuals leads to cluttered visualisations that are difficult to interpret.

We developed a new methodology to overcome these limits. Its main feature is a new semantic distance tailored for both quantitative and qualitative variables, which allows for a realistic representation of the relationships between individuals (phenotypic descriptions in our case). This semantic distance is based on ontologies engineered to represent real-life knowledge regarding the underlying variables. For easier handling by biologists, we incorporated its use into a complete tool, from raw data file to visualisation. Following the distance calculation, the next steps performed by the tool consist of (i) grouping similar individuals, (ii) representing each group by emblematic individuals we call archetypes and (iii) building sparse visualisations based on these archetypes. Our approach is implemented as a Python pipeline and applied to a rosebush dataset including passport and phenotypic data.

The DIVIS project page on ForgeMIA
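
To make the three steps described above more concrete (distance, grouping, archetypes), here is a minimal Python sketch. It is not the DIVIS implementation: the ontology-based semantic distance is replaced by a simple Gower-style distance, groups are formed with a plain distance threshold, and archetypes are taken to be group medoids. The toy records, the threshold value and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy records: (height_cm, petal_colour, scent); None = missing value.
PLANTS = {
    "rose_a": (80.0, "red", "strong"),
    "rose_b": (85.0, "red", None),
    "rose_c": (150.0, "white", "weak"),
    "rose_d": (160.0, "white", "weak"),
    "rose_e": (155.0, "white", "weak"),
}

def variable_ranges(records):
    """Return max - min for numeric columns and None for qualitative ones."""
    ranges = []
    for column in zip(*records):
        values = [v for v in column if v is not None]
        if values and all(isinstance(v, (int, float)) for v in values):
            ranges.append(max(values) - min(values))
        else:
            ranges.append(None)
    return ranges

def mixed_distance(a, b, ranges):
    """Gower-style stand-in for the semantic distance (an assumption)."""
    total, used = 0.0, 0
    for x, y, rng in zip(a, b, ranges):
        if x is None or y is None:
            continue  # ignore variables missing in either record
        if rng is not None:
            total += abs(x - y) / rng if rng else 0.0  # range-scaled gap
        else:
            total += 0.0 if x == y else 1.0  # simple category mismatch
        used += 1
    return total / used if used else 1.0

def group_individuals(names, dist, threshold):
    """Link every pair closer than the threshold (single-linkage cut)."""
    group_of = {n: {n} for n in names}
    for a, b in combinations(names, 2):
        if dist[a, b] <= threshold and group_of[a] is not group_of[b]:
            merged = group_of[a] | group_of[b]
            for n in merged:
                group_of[n] = merged
    unique = {id(g): g for g in group_of.values()}
    return [sorted(g) for g in unique.values()]

def archetype(group, dist):
    """Medoid: the member with the smallest total distance to its group."""
    return min(group, key=lambda n: sum(dist[n, m] for m in group if m != n))

names = list(PLANTS)
ranges = variable_ranges(PLANTS.values())
dist = {}
for a, b in combinations(names, 2):
    dist[a, b] = dist[b, a] = mixed_distance(PLANTS[a], PLANTS[b], ranges)

groups = group_individuals(names, dist, threshold=0.2)
archetypes = [archetype(g, dist) for g in groups]
```

A sparse visualisation would then plot only the individuals listed in `archetypes`, optionally scaled by group size, instead of every record in the dataset.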

As part of the DIVIS project, we were faced with the need to characterize groups of individuals according to the values of variables in the dataset. Such a method has been developed by F. Husson et al. with the catdes() function as part of the FactoMineR R package. However, we were not completely satisfied with the output of this function regarding both the result data table and the visualisation. Therefore we developed our own Python implementation, with extras...

The QuaDS project page on ForgeMIA
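
As an illustration of what characterizing a group by a quantitative variable can look like, the sketch below computes a v-test-style statistic: the gap between the group mean and the overall mean, scaled by the standard deviation expected for a group of that size drawn without replacement. This is a generic textbook formulation, not code taken from QuaDS or from catdes(); the use of the sample variance, the function name and the toy data are assumptions.

```python
import math
import statistics

def v_test(values, in_group):
    """Return a v-test-style statistic for one variable and one group."""
    group = [v for v, flag in zip(values, in_group) if flag]
    n_total, n_group = len(values), len(group)
    overall_mean = statistics.mean(values)
    overall_var = statistics.variance(values)  # sample variance (n - 1): an assumption
    # Variance of the mean of n_group values drawn without replacement.
    mean_var = (overall_var / n_group) * (n_total - n_group) / (n_total - 1)
    return (statistics.mean(group) - overall_mean) / math.sqrt(mean_var)

# Hypothetical plant heights (cm) and membership flags for one group.
heights = [80.0, 85.0, 150.0, 160.0, 155.0]
tall_group = [False, False, True, True, True]

v_tall = v_test(heights, tall_group)
v_short = v_test(heights, [not flag for flag in tall_group])
```

A positive value indicates that the group mean lies above the overall mean and a negative value that it lies below. Under a normal approximation, absolute values above roughly 1.96 are usually read as significant at the 5% level.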

Publications

My productions on HAL