Heavy path based super-sequence frequent pattern mining on web log dataset

Xinran Yu, Turgay Korkmaz

Abstract


Mining web log datasets has been extensively studied using Frequent Pattern Mining (FPM) and its variants. Identifying frequent patterns in different sequences can help in analyzing the most common sub-sequences (e.g., the pages visited together). However, this approach cannot identify general structures that span multiple sequences. To capture such general structures, we introduce a new form of sequential pattern mining called super-sequence frequent pattern mining (SS-FPM). In contrast to the sub-sequences determined by FPM, SS-FPM determines the super-sequences that can contain the common parts of different sequences. This can be useful for understanding the general behavior/flow of users in web usage mining, classifying web pages and users, making predictions, etc. In essence, finding frequent super-sequence patterns turns out to be related to the well-known heaviest (longest) path problem in graphs, which is known to be NP-hard. Accordingly, we transform a given sequential dataset into a sequence graph and formulate the problem as a k-hop heaviest path problem. We then propose an efficient heuristic, called the sequence matrix method, that uses dynamic programming techniques. We compared our method to the existing Heavypath method. The results show that our method is more efficient, especially on large datasets.
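
The graph formulation described above can be illustrated with a small sketch. It is not the sequence matrix method or the Heavypath method, whose details are not given in this abstract. It assumes that an edge weight counts how often one page directly follows another, and it relaxes the problem to walks (nodes may repeat), which a simple dynamic program over the hop count can solve. The function names `build_sequence_graph` and `heaviest_k_hop_walk` are hypothetical.

```python
from collections import defaultdict


def build_sequence_graph(sequences):
    """Count how often page b directly follows page a across all sequences.

    Assumption: edge weight = number of consecutive transitions a -> b.
    """
    weights = defaultdict(int)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            weights[(a, b)] += 1
    return dict(weights)


def heaviest_k_hop_walk(weights, k):
    """Return (total_weight, node_list) of a heaviest walk with exactly k edges.

    Nodes may repeat, so this is a relaxation of the simple-path problem,
    not a solution to the NP-hard heaviest path problem itself.
    Returns None if no walk with k edges exists.
    """
    nodes = {n for edge in weights for n in edge}
    # best[v] = (weight, walk) of a heaviest walk ending at v with h edges
    best = {v: (0, [v]) for v in nodes}
    for _ in range(k):
        nxt = {}
        for (a, b), w in weights.items():
            if a in best:
                cand = (best[a][0] + w, best[a][1] + [b])
                if b not in nxt or cand[0] > nxt[b][0]:
                    nxt[b] = cand
        best = nxt
    if not best:
        return None
    return max(best.values(), key=lambda t: t[0])


if __name__ == "__main__":
    # Toy sessions; the page names are made up for illustration.
    sessions = [["A", "B", "C"], ["A", "B", "D"], ["B", "C", "D"]]
    graph = build_sequence_graph(sessions)
    print(heaviest_k_hop_walk(graph, 3))
```

In the toy data, no single session contains the walk A, B, C, D, yet it emerges as the heaviest 3-hop walk because it stitches together transitions shared by several sessions. This is the intuition behind a super-sequence, under the assumptions stated above.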



DOI: https://doi.org/10.5430/air.v4n2p1


Artificial Intelligence Research

ISSN 1927-6974 (Print)   ISSN 1927-6982 (Online)

Copyright © Sciedu Press 