Food Ingredients Similarity Based on Conceptual and Textual Similarity

Authors

  • Nur Aini Rakhmawati
  • Miftahul Jannah ITS

DOI:

https://doi.org/10.12962/j22759970.v1i2.107

Keywords:

Halal, Jaccard distance, Jaro-Winkler distance, Levenshtein distance, Open Food Facts, WordNet

Abstract

Open Food Facts provides a database of food product information, such as product names, compositions, and additives, to which anyone can contribute new data and from which anyone can reuse existing data. The Open Food Facts data are dirty and need to be processed before being stored in our system. To reduce redundancy in the food ingredient data, we measure the similarity of food ingredients using two kinds of similarity: conceptual and textual. Conceptual similarity measures the similarity between the two datasets by word meaning (synonyms), while textual similarity is based on fuzzy string matching, namely Levenshtein distance, Jaro-Winkler distance, and Jaccard distance. Based on our evaluation, combining textual similarity with WordNet-based (conceptual) similarity was the most effective method for measuring similarity between food ingredients.
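Two of the textual measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how ingredient names are tokenized or normalized, so this sketch makes two assumptions: Levenshtein distance is computed on characters, and Jaccard distance on lowercased, whitespace-separated word tokens. The function names are hypothetical and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over word-token sets (assumed tokenization)."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0  # two empty names are treated as identical
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    # A one-letter misspelling is close under Levenshtein distance.
    print(levenshtein("sugar", "suger"))
    # Names that share most of their words are close under Jaccard distance.
    print(jaccard_distance("palm oil", "refined palm oil"))
```

Character-level edit distance is suited to catching spelling variants of the same ingredient, while token-level Jaccard distance is suited to catching names that differ by an added or dropped word. Neither captures synonyms, which is the gap that the conceptual (WordNet-based) similarity is meant to cover.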

Published

2021-10-27

How to Cite

[1]
N. A. Rakhmawati and M. Jannah, “Food Ingredients Similarity Based on Conceptual and Textual Similarity”, hr, vol. 1, no. 2, pp. 87–95, Oct. 2021.