How boobook and Ghent University collaborated to create a text analysis framework for business applications
Back in March, we introduced the SentEMO project, led by the linguistics department of Ghent University. The project is a collaboration between academic researchers and several business organisations that applies research on text and sentiment analysis to create a text analytics framework for business applications.
A few weeks ago, all the stakeholders met to review the progress and evolution of the project after its first six months. Frederik De Boeck, strategic analytics director at boobook, met with the rest of the consortium to discuss the work done and map out the next steps.
"Our role as business partners is to provide feedback during development stages and suggest features that are applicable and useful for businesses. The system itself tries to find aspects about which something is said, and the sentiment and emotion that goes with the statement. During the first six months, we extracted the main aspects and related sentiment. The first statistics on the overall performance of these models are good. One of the main learnings we've discovered is that those topics that are not mentioned a lot, are also harder to retrieve as the system doesn't have enough information to learn from. The more complex the codebook is, the harder it is to predict something. We could say our biggest learning is to keep it as simple as possible," explains Frederik.
"The data we are looking at consists of 7 000 to 8 000 verbatims across eight categories that have been manually tagged. Based on this tagged data, the system now extracts aspects and sentiments on new data being inserted. In the end, the system will also include a recursive loop, continuously learning based on new information coming in. The system will continuously be amended as it’s designed as a self-learning system," says Frederik.
The platform is intended to be multilingual, supporting four languages, which is another aspect that needs to be developed gradually. "In the first months, the focus was on developing the framework based on the Dutch language. The goal in the following months is to work on other languages, like French, English, and German," shares Frederik.
Currently, the project is more about how valid the data is, how good the prediction modelling is, and how well we are analysing the data. That is why the next big step is finalising the dashboard. "The dashboard will be a great asset to show to our clients and to give a better inside view of how the platform will ultimately look and work. After the first demo of the dashboard, we gave our feedback from the perspective of how we would be using it," says Frederik. And that’s exactly why such collaboration between industry and academia is important.
So, will the platform be helpful for all businesses?
"The system becomes interesting when you have loads of volume coming in, i.e. plenty of data and information. The platform will be useful for companies that have loads of reviews or other text data coming in. The system needs to feed on a large quantity of data to really make sense out of its AI. On the other hand, the benefit for the smaller samples is that you can reuse pre-trained category-specific models to help mine the information," explains Frederik.
As mentioned before, boobook is part of a consortium working hard on developing this concept. As a business and strategic consultant, boobook plays a multifaceted role and brings to the table the necessary knowledge of the current challenges and needs that companies face.
"This project is very much a learning process, for both academia and companies. We can see firsthand the newest innovation and trends in linguistic and sentiment data analysis. After the first six months, it's great to see we're on track with project objectives. The first results from the statistical point of view are good; now it's time to deepen this study and help form a final product," concludes Frederik.
The official launch of the SentEMO platform is scheduled for the end of 2022. Keep your eye on our future articles, as we’ll keep you updated on the progress!