Browsing by Author "Algiriyage, N"
- item: Thesis-Abstract: Detecting access patterns through analysis of web logs (2015-09-16). Algiriyage, N. With the evolution of the Internet and the continuous growth of the global information infrastructure, the amount of data collected online from transactions and events has increased drastically. Web server access log files collect substantial data about web visitor access patterns. Data mining techniques can be applied to such data (a practice known as Web Mining) to reveal a lot of useful information about navigational patterns. In this research we analyze the patterns of web crawlers and human visitors through web server access log files. The objectives of this research are to detect web crawlers, identify suspicious crawlers, detect Googlebot impersonation and profile human visitors. During human visitor profiling we group similar web visitors into clusters based on their browsing patterns and profile them. We show that web crawlers can be identified and successfully classified using heuristics. We evaluated our proposed methodology using seven test crawler scenarios. We found that approximately 53.25% of web crawler sessions were from "known" crawlers and 34.16% exhibit suspicious behavior. We present an effective methodology to detect fake Googlebot crawlers by analyzing web access logs. We propose using Markov chain models to learn profiles of real and fake Googlebots based on their patterns of web resource access sequences. We have calculated log-odds ratios for a given set of crawler sessions and our results show that the higher the log-odds score, the higher the probability that a given sequence comes from the real Googlebot. Experimental results show that, at a threshold log-odds score, we can distinguish the real Googlebot from the fake. For the purpose of human visitor profiling, an improved similarity measure is proposed and used as the distance measure in an agglomerative hierarchical clustering for a data set from an e-commerce web site. To generate profiles, frequent item set mining is applied over the clusters. Our results show that proper visitor clustering can be achieved with the improved similarity measure.
- item: Conference-Abstract: Identification and characterization of crawlers through analysis of web logs (2014-06-18). Algiriyage, N; Jayasena, VSD; Dias, G; Perera, A; Dayananda, K; Sharma, K. Web crawlers are software programs that automatically traverse the hyperlink structure of the world-wide web in order to locate and retrieve information. In addition to crawlers from search engines, we observed many other crawlers which may gather business intelligence, confidential information or even execute attacks based on gathered information while camouflaging their identity. Therefore, it is important for a website owner to know who has crawled his site, and what they have done. In this study we have analyzed crawler patterns in web server logs, developed a methodology to identify crawlers and classified them into three categories. To evaluate our methodology we used seven test crawler scenarios. We found that approximately 53.25% of web crawler sessions were from 'known' crawlers and 34.16% exhibit suspicious behavior.
- item: Conference-Abstract: Identification and characterization of crawlers through analysis of web logs. Algiriyage, N; Jayasena, VSD; Dias, G; Perera, A; Dayananda, K. Web crawlers are software programs that automatically traverse the hyperlink structure of the world-wide web in order to locate and retrieve information. In addition to crawlers from search engines, we observed many other crawlers which may gather business intelligence, confidential information or even execute attacks based on gathered information while camouflaging their identity. Therefore, it is important for a website owner to know who has crawled his site, and what they have done. In this study we have analyzed crawler patterns in web server logs, developed a methodology to identify crawlers and classified them into three categories. To evaluate our methodology we used seven test crawler scenarios. We found that approximately 53.25% of web crawler sessions were from “known” crawlers and 34.16% exhibit suspicious behavior.
- item: Conference-Abstract: Web user profiling using hierarchical clustering with improved similarity measure. Algiriyage, N; Jayasena, S; Dias, G. Web user profiling aims to group users into clusters with similar interests. Web sites attract many visitors, and gaining insight into their access patterns yields a lot of information. Web server access log files record every single request made by web site visitors. Applying web usage mining techniques allows interesting patterns to be identified. In this paper we have improved the similarity measure proposed by Velásquez et al. [1] and used it as the distance measure in an agglomerative hierarchical clustering for a data set from an online banking web site. To generate profiles, frequent item set mining is applied over the clusters. Our results show that proper visitor clustering can be achieved with the improved similarity measure.
- item: Conference-Full-text: Web User Profiling using Hierarchical Clustering with Improved Similarity Measure (2015-08-03). Algiriyage, N; Jayasena, S; Dias, G. Web user profiling aims to group users into clusters with similar interests. Web sites attract many visitors, and gaining insight into their access patterns yields a lot of information. Web server access log files record every single request made by web site visitors. Applying web usage mining techniques allows interesting patterns to be identified. In this paper we have improved the similarity measure proposed by Velásquez et al. [1] and used it as the distance measure in an agglomerative hierarchical clustering for a data set from an online banking web site. To generate profiles, frequent item set mining is applied over the clusters. Our results show that proper visitor clustering can be achieved with the improved similarity measure.
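
The profiling abstracts above describe agglomerative hierarchical clustering of visitor sessions driven by a similarity measure. The following is a minimal sketch of that general idea only. The abstracts do not give the improved measure, so a plain Jaccard distance over the set of pages in each session stands in for it. The session data, the average-linkage rule, and the `max_distance` stopping threshold are likewise illustrative assumptions, not details from the listed works.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two sets of visited pages.

    ASSUMPTION: a stand-in for the improved similarity measure,
    which the abstracts do not specify.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, sessions):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(
        jaccard_distance(sessions[i], sessions[j]) for i in c1 for j in c2
    )
    return total / (len(c1) * len(c2))


def agglomerative(sessions, max_distance):
    """Repeatedly merge the two closest clusters.

    Stops when the closest pair is farther apart than max_distance
    (a hypothetical stopping rule chosen for this sketch).
    """
    clusters = [[i] for i in range(len(sessions))]
    while len(clusters) > 1:
        (x, y), dist = min(
            (
                ((x, y), average_linkage(clusters[x], clusters[y], sessions))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > max_distance:
            break
        # x < y always holds here, so deleting index y leaves x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical sessions: each is the set of pages one visitor requested.
sessions = [
    {"/login", "/accounts", "/transfer"},
    {"/login", "/accounts", "/statements"},
    {"/rates", "/loans", "/contact"},
    {"/rates", "/loans"},
]
clusters = agglomerative(sessions, max_distance=0.6)
print(clusters)
```

With these toy sessions, the two account-related visitors end up in one cluster and the two loan-related visitors in another. A profile for each cluster could then be derived by mining the page sets that occur frequently within it, which is the role frequent item set mining plays in the abstracts.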