Generalized Structure of the Algorithm for Automated Detection of Non Relevant and Wrong Information on Web Resources

Mykola Dyvak; Andrii Kovbasistyi; Petro Stakhiv; Piotr Lipiński

doi:10.34658/jacs.2017.1.23-37

Vol. 25 No. 1 (2017), Articles


Published June 30, 2017

Mykola Dyvak

Ternopil National Economic University, Department of Computer Science

Andrii Kovbasistyi

Ternopil National Economic University, Department of Computer Science

Petro Stakhiv

Lodz University of Technology, Institute of Information Technology

Piotr Lipiński

Lodz University of Technology, Institute of Information Technology

Keywords

Semantic analysis of content
parsing
the architecture of software systems

How to Cite

Dyvak, M., Kovbasistyi, A., Stakhiv, P., & Lipiński, P. (2017). Generalized Structure of the Algorithm for Automated Detection of Non Relevant and Wrong Information on Web Resources. Journal of Applied Computer Science, 25(1), 23-37. https://doi.org/10.34658/jacs.2017.1.23-37

Abstract

This article introduces an algorithm for the automated detection of non-relevant or wrong information on websites. The algorithm extracts semantic information from a webpage using third-party software and compares it with reliable resources. Reliable information is either identified by majority voting or extracted from reliable databases.
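
The comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the voting rule, so a strict majority (more than half of the sources) is assumed here, values are compared as normalized strings rather than as extracted semantic information, and the names `normalize`, `majority_value`, and `check_fact` are hypothetical.

```python
from collections import Counter
from typing import Optional


def normalize(value: str) -> str:
    """Collapse case and whitespace so trivially different spellings match."""
    return " ".join(value.lower().split())


def majority_value(candidates: list[str], min_share: float = 0.5) -> Optional[str]:
    """Return the value reported by more than `min_share` of the sources.

    Assumption: a strict-majority rule. Returns None when no value wins.
    """
    if not candidates:
        return None
    counts = Counter(normalize(c) for c in candidates)
    value, votes = counts.most_common(1)[0]
    return value if votes / len(candidates) > min_share else None


def check_fact(page_value: str, reference_values: list[str]) -> str:
    """Classify a value found on a page against values from other sources."""
    reference = majority_value(reference_values)
    if reference is None:
        return "undecided"  # the sources disagree too much to pick a reference
    return "consistent" if normalize(page_value) == reference else "suspect"


if __name__ == "__main__":
    # Hypothetical opening hours reported by three other sources.
    sources = ["09:00-18:00", "09:00-18:00 ", "10:00-18:00"]
    print(check_fact("09:00-18:00", sources))
    print(check_fact("08:00-17:00", sources))
```

In this sketch a value is flagged as suspect only when the other sources agree on a different value; when no majority exists the result stays undecided, which is the point where a lookup in a reliable database, the second option the abstract mentions, would take over.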


