Skip to content

Latest commit

 

History

History
 
 

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
  • ID3、C4.5的Python实现,其中C4.5有待完善,后续加入CART。

  • 依赖

    • NumPy
    • Matplotlib
  • 测试

     from id3_c45 import DecisionTree if __name__=='__main__': #Toy data X = [[1, 2, 0, 1, 0], [0, 1, 1, 0, 1], [1, 0, 0, 0, 1], [2, 1, 1, 0, 1], [1, 1, 0, 1, 1]] y = ['yes','yes','no','no','no'] clf = DecisionTree(mode='ID3') clf.fit(X,y) clf.show() print clf.predict(X) #['yes' 'yes' 'no' 'no' 'no'] clf_ = DecisionTree(mode='C4.5') clf_.fit(X,y).show() print clf_.predict(X) #['yes' 'yes' 'no' 'no' 'no'] 

    ID3:

    C4.5:

  • 存在的问题

    (1) 如果测试集中某个样本的某个特征的值在训练集中没出现,则会造成训练出来的树的某个分支,对该样本不能分类,出现KeyError:

     from sklearn.datasets import load_digits dataset = load_digits() X = dataset['data'] y = dataset['target'] clf.fit(X[0:1000],y[0:1000]) for i in range(1000,1500): try: print clf.predict(X[i])==y[i] except KeyError: print "KeyError" 

    (2)目前还不能对多个样本并行预测