MEGA
CS 4780 Information Retrieval Final Project | Spring 2021
A collaborative effort with Chenlin Liu,
My Role:
Back-end engineer, Researcher
Duration:
1 Month
Platform:
Django
Area:
Search Engine Development
Overview
Mega is a search engine built on BM25 that offers three search modes for different user needs. The Standard algorithm matches user queries against product titles and descriptions, which suits searches for a specific product or brand. The With Comment algorithm also takes user reviews into account, returning a broader range of options. Lastly, the Mega algorithm combines all of these factors and performs sentiment analysis on reviews, assigning higher weight to more positively reviewed products to support more informed decision-making. Overall, Mega aims to provide a tailored search experience by adapting to different search approaches.
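As a rough illustration of how the Standard and With Comment modes differ, the sketch below scores a few made-up products with a plain Okapi BM25 implementation over two different sets of fields. The product records, the field names (`title`, `description`, `reviews`), the helper names, and the parameter values `k1=1.5` and `b=0.75` are all assumptions chosen for the example, not the project's actual code or settings.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def build_docs(products, fields):
    """Concatenate the chosen fields of each product into one token list."""
    return [tokenize(" ".join(p[f] for f in fields)) for p in products]


# Hypothetical toy catalogue; the real data comes from the Amazon reviews.
products = [
    {"title": "Rose hand cream", "description": "Moisturizing cream for dry hands",
     "reviews": "smells great and absorbs fast"},
    {"title": "Mint lip balm", "description": "Cooling balm",
     "reviews": "best gift for dry lips"},
    {"title": "Charcoal face mask", "description": "Deep cleansing mask",
     "reviews": "left my skin smooth"},
]

STANDARD = ("title", "description")
WITH_COMMENT = ("title", "description", "reviews")

standard_scores = bm25_scores("dry lips", build_docs(products, STANDARD))
comment_scores = bm25_scores("dry lips", build_docs(products, WITH_COMMENT))
```

In this toy data the lip balm matches the query "dry lips" only through its review text, so it scores zero under the Standard fields and comes out on top once reviews are included.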
Motivation
E-commerce has had a profound impact on modern society. However, as the number of products available online has grown rapidly, it has become increasingly difficult for shoppers to find the most suitable item. When shoppers have a general idea of what they want but lack specific details, as in a search for the best science fiction books, current ranking algorithms might return only results whose product titles include the phrase "best science fiction", rather than the most highly rated or recommended books in the science fiction genre (Fig. 1).
Fig. 1 Search results on Amazon with the query "best science fiction"
Even when shoppers know exactly what they are searching for, current methods can still fall short of providing an efficient way to find the desired product. Many shoppers rely on reviews when making purchasing decisions, yet the current "filter by average customer review" feature often produces subpar results because of repetitive and irrelevant data and inconsistent review scores.
To address these issues, we propose a search engine tailored to the diverse search needs of users. The engine assigns different weights to various criteria and uses sentiment analysis on reviews to generate more reliable scores, providing a more sophisticated and comprehensive search experience.
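The idea of weighting results by review sentiment can be sketched as follows. This is only a minimal illustration under stated assumptions: the tiny word lists stand in for a real sentiment model, and the function `mega_score`, the multiplicative form `relevance * (1 + alpha * sentiment)`, and the value `alpha=0.5` are hypothetical choices for the example, not the formula used in the project.

```python
# Toy sentiment lexicon (assumption); a real system would use a trained model.
POSITIVE = {"great", "love", "best", "smooth", "excellent"}
NEGATIVE = {"bad", "broke", "worst", "sticky", "awful"}


def sentiment(review):
    """Return a score in [-1, 1] from counts of positive and negative words."""
    tokens = review.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def mega_score(relevance, reviews, alpha=0.5):
    """Scale a relevance score up or down by the mean review sentiment.

    Products without reviews keep their relevance score unchanged.
    """
    if not reviews:
        return relevance
    mean_sent = sum(sentiment(r) for r in reviews) / len(reviews)
    return relevance * (1 + alpha * mean_sent)


# Hypothetical candidates: (name, relevance score, review texts).
candidates = [
    ("Lip balm A", 2.0, ["love it", "great balm"]),
    ("Lip balm B", 2.0, ["awful and sticky", "worst purchase"]),
    ("Lip balm C", 2.0, []),
]

ranked = sorted(candidates, key=lambda c: mega_score(c[1], c[2]), reverse=True)
ranked_names = [name for name, _, _ in ranked]
```

All three candidates start with the same relevance score, so the final order depends only on review sentiment: the positively reviewed product rises, the negatively reviewed one falls, and the product without reviews stays in between.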
Dataset
UCSD Amazon Review Dataset: category “All Beauty”
Due to the limited capacity of our laptops, we could only run the tests on one category of the Amazon Review Dataset. After data preprocessing, we were left with 4080 product items.