Methods for Setting the Prior Distribution of Each Attribute in the Naive Bayesian Classifier (設定簡易貝氏分類器中各屬性先驗分配之方法)


Bibliographic Details
Main Authors: 林琦芳, Lin, Chi-Fang
Other Authors: Department of Industrial and Information Management (master's and doctoral program), 翁慈宗, Wong, Tzu-Tsung
Format: Thesis
Language: Chinese, English
Published: 2007
Subjects: DML
Online Access: http://ir.lib.ncku.edu.tw/handle/987654321/32134
http://ir.lib.ncku.edu.tw/bitstream/987654321/32134/1/
Description
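Summary: Among Bayesian classification methods, the naive Bayesian classifier is widely used because it is fast to compute. When a naive Bayesian classifier is used, a Dirichlet distribution is generally taken as the prior distribution for the possible values of an attribute. Earlier work examined whether the Dirichlet distribution is a reasonable prior, but it adjusted the parameters of all attributes in the same way. Because the possible values of different attributes have different characteristics, applying the same adjustment merely because two attributes have the same number of possible values is not reasonable. This study therefore treats the parameter adjustments of different attributes as unrelated and searches for parameter values specific to each attribute. The generalized Dirichlet and Liouville distributions are more general than the Dirichlet distribution and can also serve as priors for the naive Bayesian classifier, so this study examines all three priors. Eighteen suitable data sets from the UCI data repository were analyzed. When the prior is a Dirichlet or a Liouville distribution, adjusting attributes individually gives slightly higher classification accuracy than adjusting them concurrently, or concurrently and then individually. When the prior is a generalized Dirichlet distribution, concurrent adjustment followed by individual adjustment gives the higher accuracy. Overall, the generalized Dirichlet distribution is recommended as the prior. The four attribute-ranking measures ([symbol missing], ADC, SU, and DML) produce attribute orderings that do not differ much; because [symbol missing] is more complicated to compute, choosing one of ADC, SU, and DML is recommended.

Naive Bayesian classifiers are a widely used classification tool because their computational complexity is low. In a naive Bayesian classifier, the prior distribution of an attribute is explicitly or implicitly assumed to be a Dirichlet distribution. A study proposed two alternative types of priors, generalized Dirichlet and Liouville distributions, and systematically and concurrently changed the parameters of the priors for all attributes to study the performance of the naive Bayesian classifier. Since every attribute is unique, it is unreasonable to adjust the parameters of all priors concurrently. In this study, we consider the parameter settings of the attribute priors to be independent. Three methods, named concurrent prior setting, individual prior setting, and concurrent followed by individual prior setting, are then proposed to study their impact on the prediction accuracy of the naive Bayesian classifier when the prior is a Dirichlet, a generalized Dirichlet, or a Liouville distribution. The experimental results on 18 data sets from the UCI data repository demonstrate that when the prior is either a Dirichlet or a Liouville distribution, individual prior setting generally has higher classification accuracy than the other two methods. However, when the prior is a generalized Dirichlet distribution, concurrent followed by individual prior setting has the highest performance. The generalized Dirichlet distribution is overall the best choice among the three distribution families. The impact of the four measures ([symbol missing], ADC, SU, and DML) for ranking attributes on the performance of the naive Bayesian classifier is insignificant. Since the computational cost of measure [symbol missing] is higher, any one of the other three measures can be used to rank attributes.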
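
Illustrative sketch (not taken from the thesis): the summary contrasts setting the prior parameters of all attributes concurrently with setting them per attribute. The Python below shows one way this could look for the Dirichlet case only, assuming a symmetric Dirichlet prior with one parameter per attribute, so that the estimate of P(X_j = v | c) is the standard posterior mean (n_cv + alpha_j) / (n_c + alpha_j * k_j). The names `train_nb`, `cond_prob`, `predict`, `accuracy`, and `tune_individually`, the candidate grid, and the greedy validation-set search are all assumptions made for illustration; the thesis's actual tuning procedure, and the generalized Dirichlet and Liouville priors, are not reproduced here.

```python
import math
from collections import Counter


def train_nb(X, y, alphas):
    """Count class and (class, value) frequencies for categorical data.

    X: list of tuples of categorical values; y: list of class labels;
    alphas: one symmetric Dirichlet parameter per attribute (assumed > 0).
    """
    n_attrs = len(X[0])
    # Distinct values observed for each attribute (k_j = len(values[j])).
    values = [sorted({row[j] for row in X}) for j in range(n_attrs)]
    # counts[j][(c, v)] = number of training rows of class c with X_j = v.
    counts = [Counter() for _ in range(n_attrs)]
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            counts[j][(c, v)] += 1
    return {
        "classes": sorted(set(y)),
        "class_counts": Counter(y),
        "values": values,
        "counts": counts,
        "alphas": list(alphas),
        "n": len(y),
    }


def cond_prob(model, j, v, c):
    """Posterior-mean estimate of P(X_j = v | c) under Dirichlet(alpha_j)."""
    a = model["alphas"][j]
    k = len(model["values"][j])
    return (model["counts"][j][(c, v)] + a) / (model["class_counts"][c] + a * k)


def predict(model, row):
    """Return the class with the highest log posterior score."""
    best, best_score = None, -math.inf
    for c in model["classes"]:
        score = math.log(model["class_counts"][c] / model["n"])
        for j, v in enumerate(row):
            score += math.log(cond_prob(model, j, v, c))
        if score > best_score:
            best, best_score = c, score
    return best


def accuracy(model, X, y):
    """Fraction of rows in X whose predicted class equals the label in y."""
    return sum(predict(model, row) == c for row, c in zip(X, y)) / len(y)


def tune_individually(X_tr, y_tr, X_val, y_val, grid=(0.5, 1.0, 2.0, 5.0)):
    """Hypothetical greedy search: pick alpha_j per attribute, one at a time,
    keeping the other attributes' parameters fixed."""
    alphas = [1.0] * len(X_tr[0])
    for j in range(len(alphas)):
        best_a, best_acc = alphas[j], -1.0
        for a in grid:
            trial = alphas[:j] + [a] + alphas[j + 1:]
            acc = accuracy(train_nb(X_tr, y_tr, trial), X_val, y_val)
            if acc > best_acc:
                best_a, best_acc = a, acc
        alphas[j] = best_a
    return alphas
```

Passing the same value for every entry of `alphas` corresponds loosely to a concurrent setting, while `tune_individually` lets each attribute end up with its own value. Starting `tune_individually` from a concurrently tuned vector instead of all ones would loosely mirror the "concurrent followed by individual" variant.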