Food Ingredients Similarity Based on Conceptual and Textual Similarity
DOI:
https://doi.org/10.12962/j22759970.v1i2.107
Keywords:
Halal, Jaccard distance, Jaro-Winkler distance, Levenshtein distance, Open Food Facts, WordNet
Abstract
Open Food Facts provides a database of food products, including product names, compositions, and additives, to which anyone can contribute data and from which anyone can reuse existing data. The Open Food Facts data are dirty and need to be processed before being stored in our system. To reduce redundancy in the food ingredient data, we measure the similarity of food ingredients using two kinds of similarity: conceptual and textual. Conceptual similarity measures the similarity between the two datasets by word meaning (synonyms), while textual similarity is based on fuzzy string matching, namely Levenshtein distance, Jaro-Winkler distance, and Jaccard distance. In our evaluation, combining textual similarity with WordNet-based (conceptual) similarity was the most effective method for measuring food ingredient similarity.
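To make the textual measures concrete, the following is a minimal Python sketch of two of the three fuzzy-matching measures named in the abstract, Levenshtein distance and Jaccard distance, applied to ingredient strings. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the length normalization of the Levenshtein score is one common choice, and computing Jaccard distance over lowercase word tokens (rather than characters or n-grams) is assumed. Jaro-Winkler distance, the WordNet-based measure, and the way the measures are combined are omitted because the abstract does not specify them.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 when
    every character of the longer string must change."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over lowercase word tokens
    (the tokenization is an assumption of this sketch)."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(tokens_a & tokens_b) / len(union)


if __name__ == "__main__":
    print(levenshtein("sugar", "suger"))
    print(levenshtein_similarity("sugar", "suger"))
    print(jaccard_distance("palm oil", "refined palm oil"))
```

Character-level measures such as Levenshtein distance suit misspellings like "suger", while a token-level Jaccard distance is more tolerant of added or reordered words such as "refined palm oil" versus "palm oil".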
License
Copyright (c) 2021 Halal Research Journal

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.