Nelissen data extraction
ICT & Software Engineering
Client company: Nelissen
Rens van Giersbergen
Jarno Weemen
Lukas Jansen
Timo Maas
Jordy Walraven
Project description
The main research question of our project is: How can the valuable information from Nelissen’s project files be efficiently extracted, stored, and made accessible?
The challenge is that the documents come in many different formats and are not structured for use in software.
Context
Nelissen is a leading engineering firm specializing in installation technology, building physics, acoustics, fire safety, and sustainability. With a team of 75 talented employees, Nelissen possesses a wide range of expertise and has successfully completed over 5040 projects to date. The organization is renowned for its smart and sustainable solutions for people in buildings and collaborates with top architects and a variety of clients.
Results
In conclusion, the valuable information from Nelissen’s project files can be efficiently extracted, stored, and made accessible through a structured, systematic approach. Identifying the relevant parameters in a detailed Excel document ensures that the data extraction software targets the most valuable information. Rust, chosen for its speed and efficiency, combined with a locally run AI model (Mistral), enables effective extraction of key data from text files. Storing the extracted data in a document-based database offers scalability and efficient data management. Although some sub-questions remain unanswered, the research provides a solid architecture for the data extraction process, so that Nelissen's project files can be processed efficiently and made accessible.
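To make this architecture more concrete, here is a minimal Rust sketch of how an extraction service could hand its results to a storage service. It is not the project's code. The message type `ExtractionResult`, the in-memory `DocumentStore` that stands in for the document-based database, and the standard-library channel used in place of the Kafka messages the real services exchange are all assumptions, made so that the example is self-contained.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical message an extraction service publishes for one parameter.
#[derive(Debug, Clone, PartialEq)]
struct ExtractionResult {
    project_id: String,
    parameter: String,
    value: String,
}

/// Hypothetical in-memory stand-in for the document-based database:
/// one "document" (parameter -> value) per project.
type DocumentStore = HashMap<String, HashMap<String, String>>;

/// Storage service: consumes messages until every sender is dropped and
/// upserts each parameter into its project's document (last value wins).
fn run_storage_service(rx: mpsc::Receiver<ExtractionResult>) -> DocumentStore {
    let mut store = DocumentStore::new();
    for msg in rx {
        store
            .entry(msg.project_id)
            .or_default()
            .insert(msg.parameter, msg.value);
    }
    store
}

fn main() {
    // The channel plays the role of a Kafka topic in this sketch.
    let (tx, rx) = mpsc::channel();

    // Extraction service: the real one would query the Mistral model; here it
    // publishes made-up placeholder values.
    let producer = thread::spawn(move || {
        let results = [
            ("project-001", "building_type", "office"),
            ("project-001", "storeys", "4"),
        ];
        for (project, parameter, value) in results {
            tx.send(ExtractionResult {
                project_id: project.to_string(),
                parameter: parameter.to_string(),
                value: value.to_string(),
            })
            .expect("storage service stopped");
        }
        // `tx` is dropped here, which ends the consumer loop.
    });

    let store = run_storage_service(rx);
    producer.join().expect("extraction thread panicked");
    println!("{:?}", store.get("project-001"));
}
```

Keeping one document per project makes it easy to add new parameters later without a schema migration, which is one reason a document-based database suits this kind of data.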
The extraction environment is currently at TRL 4 (functional verification). We have a complete system that works functionally and meets the stakeholders' expectations. The services we created can be run and are fully connected to each other through Kafka messages. The extraction results are still lacking, and most of them do not yet match what Nelissen expects, but they do show the best-case scenario for most parameters.
To move to the next level, the following activities need to be completed:
- Better extraction results for each parameter.
- Improved part finding through synonyms.
- Code restructuring and improvements.
- Deployment of the application in Nelissen's environment.
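Synonym-based part finding could start from something as simple as the following Rust sketch. It is for illustration only and assumes three things: a document has already been converted to plain text, its parts are blank-line-separated paragraphs, and synonyms come from a hand-maintained table. The function `find_parts` and the example terms are hypothetical. Only the matching paragraphs would then need to be passed to the model, rather than the whole file.

```rust
use std::collections::HashMap;

/// Returns the indices of the paragraphs (blank-line separated) in `text`
/// that mention `parameter` or any of its synonyms, case-insensitively.
fn find_parts(text: &str, parameter: &str, synonyms: &HashMap<&str, Vec<&str>>) -> Vec<usize> {
    // Always search for the parameter name itself, plus any listed synonyms.
    let mut terms: Vec<String> = vec![parameter.to_lowercase()];
    if let Some(extra) = synonyms.get(parameter) {
        terms.extend(extra.iter().map(|s| s.to_lowercase()));
    }
    text.split("\n\n")
        .enumerate()
        .filter(|(_, para)| {
            let lower = para.to_lowercase();
            terms.iter().any(|t| lower.contains(t.as_str()))
        })
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Hypothetical synonym table: canonical parameter -> alternative terms.
    let mut synonyms: HashMap<&str, Vec<&str>> = HashMap::new();
    synonyms.insert("floor area", vec!["gross floor area", "vloeroppervlak"]);

    // Paragraph 1 is Dutch: "The floor area of the building is described below."
    let text = "Project overview.\n\nHet vloeroppervlak van het gebouw wordt hieronder beschreven.\n\nFire safety notes.";
    let hits = find_parts(text, "floor area", &synonyms);
    println!("{:?}", hits); // prints [1]
}
```

Plain substring matching is deliberately naive because it also matches terms inside longer words. Word-boundary matching or stemming would be natural refinements.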
Video
Watch the video on Google Drive
About the project group
We are a group of Software Engineering students from semester 6.
Over the past months, we have worked every Thursday and Friday on our group project: Data Extraction.
We worked on this project using the Scrum methodology, delivering our products in three-week iterations (sprints).