Document Type: Original Article
Authors
1
Dept. of Computer Science, Faculty of Computers and Information, Menoufia University, Egypt
2
Faculty of Computers and Information, Menoufia University
3
Dept. of Computer Science, Faculty of Computers and Information, Menoufia University, Egypt
Abstract
Massive amounts of data are generated continuously, and approximately 20 to 30 percent of this data is text. This text is usually semi-structured and cannot be used directly. To make use of such huge amounts of textual data, the information it conveys must be detected, extracted, and structured in a fast and scalable manner. This can be done with information extraction techniques. However, information extraction is one of the main challenges in Natural Language Processing, and applying it to large-scale data has limitations. Open Information Extraction (OIE) is an open-domain, relation-independent paradigm that performs information extraction in an unsupervised manner, which can lead to high-speed and scalable performance. A review of previous research shows that OIE experiments have been conducted in several languages, including English, Portuguese, Spanish, Vietnamese, Chinese, and German. This paper reviews OIE techniques, compares their performance in some of these languages, and then relates the results to the languages' complexity levels to reveal the relationship between the suitable model and the language complexity level.
Keywords
Open Information Extraction; Natural Language Processing