Generalized Structure of the Algorithm for Automated Detection of Non Relevant and Wrong Information on Web Resources

Keywords

Semantic analysis of content
parsing
the architecture of software systems

How to Cite

Dyvak, M., Kovbasistyi, A., Stakhiv, P., & Lipiński, P. (2017). Generalized Structure of the Algorithm for Automated Detection of Non Relevant and Wrong Information on Web Resources. Journal of Applied Computer Science, 25(1), 23-37. https://doi.org/10.34658/jacs.2017.1.23-37

Abstract

This article introduces an algorithm for the automated detection of non-relevant or wrong information on websites. The algorithm extracts semantic information from a webpage using third-party software and compares it with reliable resources. The reliable information is either identified by majority voting or extracted from reliable databases.
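
The majority-voting step can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the value of a single fact has already been extracted from the page under test and from several other sources, and that a value counts as reliable when more than half of the sources agree on it. The function names, the 0.5 threshold, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Optional


def majority_value(values: list[str], min_share: float = 0.5) -> Optional[str]:
    """Return the value reported by more than min_share of the sources, else None."""
    if not values:
        return None
    # Assumption: values are compared after trimming whitespace and lowercasing.
    normalized = [v.strip().lower() for v in values]
    value, count = Counter(normalized).most_common(1)[0]
    return value if count / len(normalized) > min_share else None


def check_fact(page_value: str, source_values: list[str]) -> str:
    """Classify a page's value as 'consistent', 'suspect', or 'undetermined'."""
    reference = majority_value(source_values)
    if reference is None:
        # No strict majority among the sources, so nothing to compare against.
        return "undetermined"
    return "consistent" if page_value.strip().lower() == reference else "suspect"


# Hypothetical data: one fact as extracted from four other sources.
sources = ["Kyiv", "kyiv", "Kiev", "Kyiv "]
print(check_fact("Kyiv", sources))     # consistent
print(check_fact("Kharkiv", sources))  # suspect
```

Requiring a strict majority means that a tie between sources yields "undetermined" rather than an arbitrary winner. A real system would also need the semantic extraction and value matching that the abstract delegates to third-party software.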


