| Name | Description | Size | Format |
|---|---|---|---|
| | | 4.69 MB | Adobe PDF |
Abstract(s)
The ever-growing Web contains a large amount of data. These data are valuable when combined with applications that can refine them and use them to improve their users’ lives. However, exploiting the available data is not easy, because most of it is not represented in machine-friendly formats. Instead, it is represented in formats designed for human readers, so additional effort is required for machines to interpret, extract, and integrate it while ensuring that information from different sources remains consistent. In this project, a solution that combines ontology-based integration with extraction by web robots automates the process of updating public transport schedule information. An existing application receives this information and uses it to calculate efficient routes for commuters. The proposed solution can extract information from multiple online sources, including PDF and HTML documents, and transform it into different formats. The system exposes these formats through a web service consumed by a route optimization system. This document details the design and construction of the integration system and describes the alternatives considered and the choices that led to the final application. Lastly, it evaluates the solution by extracting information from several sources relevant to the project’s domain.
Keywords
Information Retrieval; Web crawling; Information Integration; Ontology; PDF extraction
