
Python decision trees on iris

Published: 2022-05-06 19:40:29

Ⅰ Which algorithm does sklearn's decision tree use in Python?

sklearn's decision trees come as DecisionTreeClassifier and DecisionTreeRegressor, and both implement the CART algorithm (classification and regression tree). The default split criterion is therefore the Gini index; ID3 and C4.5, by contrast, split on information entropy. sklearn does not implement ID3 or C4.5 at all, so there is nothing to set them to; the closest you can get is switching the criterion to entropy.
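A minimal sketch of that switch on iris (both runs are still CART; only the impurity measure changes):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf_gini = DecisionTreeClassifier(criterion='gini').fit(iris.data, iris.target)        # the default
clf_entropy = DecisionTreeClassifier(criterion='entropy').fit(iris.data, iris.target)  # information-entropy splits
print(clf_gini.score(iris.data, iris.target), clf_entropy.score(iris.data, iris.target))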

Ⅱ Can a Python decision tree do multi-class classification?

Yes, decision trees support multi-class classification directly; the answer's hand-written implementation kept its main logic in the file tree.py. With sklearn it takes only a few lines, as the sketch below shows.
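A minimal sketch on the three-class iris data, assuming only that sklearn is installed:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()  # three classes: setosa, versicolor, virginica
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
print(clf.predict(iris.data[:5]))        # predicted class labels
print(clf.predict_proba(iris.data[:5]))  # per-class probabilities, shape (5, 3)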

Ⅲ With Python sklearn, how do you draw the decision tree using test-set data (non-development samples)?

#coding=utf-8

from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)

from io import StringIO  # sklearn.externals.six was removed from modern sklearn
import pydot

dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data)  # serialize the fitted tree to DOT text
graph = pydot.graph_from_dot_data(dot_data.getvalue())  # returns a list of graphs
graph[0].write_dot('iris_simple.dot')
graph[0].write_png('iris_simple.png')  # needs the Graphviz binaries on PATH
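The snippet above fits and draws on all of iris rather than a held-out set; to tie the picture to test samples specifically, one option (a sketch using decision_path, which reports the nodes each sample traverses) is:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import tree

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
clf = tree.DecisionTreeClassifier().fit(X_train, y_train)  # fit on development samples only
path = clf.decision_path(X_test)  # sparse matrix: node-visit indicators per test sample
print(path.toarray()[0])          # the path of the first test sample through the drawn tree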

Ⅳ How do you visualize a decision tree implemented in Python?

The common decision-tree algorithms are ID3, C4.5, and CART:
ID3: picks the feature with the largest information gain as each node, inducing a classification of the data.
C4.5: an improvement on ID3; more accurate and faster, and it can handle continuous features and features with missing values.
CART: splits on the Gini index, reducing impurity as much as possible at each step; it copes with outliers and can handle missing values. For the drawing itself, see the sketch below.
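Newer sklearn (0.21+) ships a matplotlib-based plot_tree that needs no Graphviz installation; a minimal sketch on iris:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3).fit(iris.data, iris.target)
plot_tree(clf, feature_names=iris.feature_names,
          class_names=iris.target_names, filled=True)  # draws onto the current figure
plt.savefig('iris_tree.png')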

Ⅴ How do you draw the diagram of a Python sklearn decision tree?

The code is identical to the answer for question Ⅲ above: fit DecisionTreeClassifier on iris, serialize the tree with tree.export_graphviz into a StringIO buffer, then use pydot to write iris_simple.dot and iris_simple.png.

Ⅵ Error when running scikit-learn's decision tree algorithm in Python

Hello, do you have read/write permission? Try a plain file write first:
f = open('a.txt', 'w')
If that does not run, try launching your program with administrator privileges. Alternatively, change the name of the saved file: rename 'iris.doct' to something else, such as 'abcd.dot'.
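If the permission problem persists, a workaround (a sketch, assuming the Graphviz export from the answers above) is to write into a directory the operating system guarantees is user-writable:

import os, tempfile
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree.DecisionTreeClassifier().fit(iris.data, iris.target)
out = os.path.join(tempfile.gettempdir(), 'iris.dot')  # the temp dir is always writable by the current user
with open(out, 'w') as f:
    tree.export_graphviz(clf, out_file=f)
print('wrote', out)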

Ⅶ How do you implement random forest classification in Python?

This is about using the class methods in the scikit-learn package to make predictions with the random forest algorithm; the most useful part of the reference is its explanation of what each parameter is for. Here I give my understanding and a partial translation:
Parameter notes:
The two most important parameters are n_estimators and max_features.
n_estimators: the number of trees in the forest. In theory more is better, but computation time grows with it, and beyond a reasonable number of trees the predictions stop improving.
max_features: the size of the random subset of features used to split a node. The smaller the subset, the faster the variance falls, but the faster the bias rises. By good practical experience: for regression problems use max_features=n_features, and for classification problems use max_features=sqrt(n_features).

For good results you usually want max_depth=None together with min_samples_split at its minimum (older docs said 1; current sklearn requires at least 2).
Also remember to cross-validate, and note that random forest uses bootstrap=True while extra-trees uses bootstrap=False.

Here is also an article by a foreign author on tuning your random forest model's parameters: http://www.analyticsvidhya.com/blog/2015/06/tuning-random-forest-model/


Here I use scikit-learn's built-in iris data for the random forest prediction:
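A minimal sketch of that prediction, wiring in the rules of thumb above (n_estimators kept moderate, max_features='sqrt' for classification, bootstrap=True); treat it as an illustration rather than the original listing:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()
# more trees cost more compute; past a reasonable count the scores flatten out
clf = RandomForestClassifier(n_estimators=100, max_features='sqrt',
                             max_depth=None, bootstrap=True, random_state=0)
print(cross_val_score(clf, iris.data, iris.target, cv=5).mean())  # mean CV accuracy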

Ⅷ Using Python + sklearn decision trees to predict credit risk

import numpy as np
import pandas as pd

names = ("Balance,Duration,History,Purpose,Credit amount,Savings,Employment,instPercent,sexMarried,Guarantors,Residence ration,Assets,Age,concCredit,Apartment,Credits,Occupation,Dependents,hasPhone,Foreign,lable").split(',')
data = pd.read_csv("Desktop/sunshengyun/data/german/german.data", sep='\s+', names=names)  # German credit data, whitespace-separated
data.head()

  Balance  Duration History Purpose  Credit amount Savings Employment  instPercent sexMarried Guarantors ...  Assets  Age concCredit Apartment  Credits Occupation  Dependents hasPhone Foreign  lable
0     A11         6     A34     A43           1169     A65        A75            4        A93       A101 ...    A121   67       A143      A152        2       A173           1     A192    A201      1
1     A12        48     A32     A43           5951     A61        A73            2        A92       A101 ...    A121   22       A143      A152        1       A173           1     A191    A201      2
2     A14        12     A34     A46           2096     A61        A74            2        A93       A101 ...    A121   49       A143      A152        1       A172           2     A191    A201      1
3     A11        42     A32     A42           7882     A61        A74            2        A93       A103 ...    A122   45       A143      A153        1       A173           2     A191    A201      1
4     A11        24     A33     A40           4870     A61        A73            3        A93       A101 ...    A124   53       A143      A153        2       A173           2     A191    A201      2

5 rows × 21 columns
data.Balance.unique()
array(['A11', 'A12', 'A14', 'A13'], dtype=object)

data.count()
Balance 1000  Duration 1000  History 1000  Purpose 1000  Credit amount 1000  Savings 1000  Employment 1000  instPercent 1000  sexMarried 1000  Guarantors 1000  Residence ration 1000  Assets 1000  Age 1000  concCredit 1000  Apartment 1000  Credits 1000  Occupation 1000  Dependents 1000  hasPhone 1000  Foreign 1000  lable 1000  dtype: int64

# descriptive statistics for some of the variables
data.describe()

          Duration  Credit amount  instPercent  Residence ration          Age      Credits   Dependents        lable
count  1000.000000    1000.000000  1000.000000       1000.000000  1000.000000  1000.000000  1000.000000  1000.000000
mean     20.903000    3271.258000     2.973000          2.845000    35.546000     1.407000     1.155000     1.300000
std      12.058814    2822.736876     1.118715          1.103718    11.375469     0.577654     0.362086     0.458487
min       4.000000     250.000000     1.000000          1.000000    19.000000     1.000000     1.000000     1.000000
25%      12.000000    1365.500000     2.000000          2.000000    27.000000     1.000000     1.000000     1.000000
50%      18.000000    2319.500000     3.000000          3.000000    33.000000     1.000000     1.000000     1.000000
75%      24.000000    3972.250000     4.000000          4.000000    42.000000     2.000000     1.000000     2.000000
max      72.000000   18424.000000     4.000000          4.000000    75.000000     4.000000     2.000000     2.000000

data.Duration.unique()
array([ 6, 48, 12, 42, 24, 36, 30, 15,  9, 10,  7, 60, 18, 45, 11, 27,  8,
       54, 20, 14, 33, 21, 16,  4, 47, 13, 22, 39, 28,  5, 26, 72, 40], dtype=int64)

data.History.unique()
array(['A34', 'A32', 'A33', 'A30', 'A31'], dtype=object)

data.groupby('Balance').size().order(ascending=False)
FutureWarning: order is deprecated, use sort_values(...)
Balance
A14    394
A11    274
A12    269
A13     63
dtype: int64

data.groupby('Purpose').size().order(ascending=False)
FutureWarning: order is deprecated, use sort_values(...)
Purpose
A43    280
A40    234
A42    181
A41    103
A49     97
A46     50
A45     22
A44     12
A410    12
A48      9
dtype: int64

data.groupby('Apartment').size().order(ascending=False)
FutureWarning: order is deprecated, use sort_values(...)
Apartment
A152    713
A151    179
A153    108
dtype: int64

import matplotlib.pyplot as plt
%matplotlib inline
data.plot(x='lable', y='Age', kind='scatter', alpha=0.02, s=50);
[scatter plot: Age against lable]

data.hist('Age', bins=15);
[histogram of Age, 15 bins]

target = data.lable
features_data = data.drop('lable', axis=1)
numeric_features = [c for c in features_data if features_data[c].dtype.kind in ('i', 'f')]  # keep variables whose dtype is integer or float
numeric_features
['Duration', 'Credit amount', 'instPercent', 'Residence ration', 'Age', 'Credits', 'Dependents']

numeric_data = features_data[numeric_features]
numeric_data.head()

   Duration  Credit amount  instPercent  Residence ration  Age  Credits  Dependents
0         6           1169            4                 4   67        2           1
1        48           5951            2                 2   22        1           1
2        12           2096            2                 3   49        1           2
3        42           7882            2                 4   45        1           2
4        24           4870            3                 4   53        2           2

categorical_data = features_data.drop(numeric_features, axis=1)
categorical_data.head()

  Balance History Purpose Savings Employment sexMarried Guarantors Assets concCredit Apartment Occupation hasPhone Foreign
0     A11     A34     A43     A65        A75        A93       A101   A121       A143      A152       A173     A192    A201
1     A12     A32     A43     A61        A73        A92       A101   A121       A143      A152       A173     A191    A201
2     A14     A34     A46     A61        A74        A93       A101   A121       A143      A152       A172     A191    A201
3     A11     A32     A42     A61        A74        A93       A103   A122       A143      A153       A173     A191    A201
4     A11     A33     A40     A61        A73        A93       A101   A124       A143      A153       A173     A191    A201

categorical_data_encoded = categorical_data.apply(lambda x: pd.factorize(x)[0])  # pd.factorize maps each categorical variable to integer codes
# apply runs the conversion function over every column
categorical_data_encoded.head(5)

   Balance  History  Purpose  Savings  Employment  sexMarried  Guarantors  Assets  concCredit  Apartment  Occupation  hasPhone  Foreign
0        0        0        0        0           0           0           0       0           0          0           0         0        0
1        1        1        0        1           1           1           0       0           0          0           0         1        0
2        2        0        1        1           2           0           0       0           0          0           1         1        0
3        0        1        2        1           2           0           1       1           0          1           0         1        0
4        0        2        3        1           1           0           0       2           0          1           0         1        0

features = pd.concat([numeric_data, categorical_data_encoded], axis=1)  # merge numeric and encoded categorical data
features.head()
# One-hot encoding is an alternative representation for the categorical variables:
# features = pd.get_dummies(features_data)
# features.head()

   Duration  Credit amount  instPercent  Residence ration  Age  Credits  Dependents  Balance  History  Purpose  Savings  Employment  sexMarried  Guarantors  Assets  concCredit  Apartment  Occupation  hasPhone  Foreign
0         6           1169            4                 4   67        2           1        0        0        0        0           0           0           0       0           0          0           0         0        0
1        48           5951            2                 2   22        1           1        1        1        0        1           1           1           0       0           0          0           0         1        0
2        12           2096            2                 3   49        1           2        2        0        1        1           2           0           0       0           0          0           1         1        0
3        42           7882            2                 4   45        1           2        0        1        2        1           2           0           1       1           0          1           0         1        0
4        24           4870            3                 4   53        2           2        0        2        3        1           1           0           0       2           0          1           0         1        0

X = features.values.astype(np.float32)  # convert the data type
y = (target.values == 1).astype(np.int32)  # 1: good, 2: bad

from sklearn.cross_validation import train_test_split  # sklearn.model_selection in modern sklearn

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # test_size sets the held-out test share (20%)

from sklearn.tree import DecisionTreeClassifier
from sklearn.cross_validation import cross_val_score  # sklearn.model_selection in modern sklearn

clf = DecisionTreeClassifier(max_depth=8)  # max_depth caps the depth of the tree

# Cross-validate to assess the classifier; the score here is the area under the ROC
# curve (AUC), and a larger AUC means a better classifier
scores = cross_val_score(clf, X_train, y_train, cv=3, scoring='roc_auc')
print("ROC AUC Decision Tree: {:.4f} +/-{:.4f}".format(
    np.mean(scores), np.std(scores)))
ROC AUC Decision Tree: 0.6866 +/-0.0105
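The held-out X_test from the split above is never scored in this walkthrough; a sketch of checking test-set AUC, assuming the variables already defined here:

from sklearn.metrics import roc_auc_score

clf = DecisionTreeClassifier(max_depth=8).fit(X_train, y_train)
test_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])  # AUC on the 20% held-out split
print("Test ROC AUC: {:.4f}".format(test_auc))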

# Learning curves: sample count on the x-axis, training and cross-validation scores on
# the y-axis, to compare trees of different depths (and judge over- or under-fitting)
from sklearn.learning_curve import learning_curve  # sklearn.model_selection in modern sklearn

def plot_learning_curve(estimator, X, y, ylim=(0, 1.1), cv=3,
                        n_jobs=1, train_sizes=np.linspace(.1, 1.0, 5),
                        scoring=None):
    plt.title("Learning curves for %s" % type(estimator).__name__)
    plt.ylim(*ylim); plt.grid()
    plt.xlabel("Training examples")
    plt.ylabel("Score")
    train_sizes, train_scores, validation_scores = learning_curve(
        estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes,
        scoring=scoring)
    train_scores_mean = np.mean(train_scores, axis=1)
    validation_scores_mean = np.mean(validation_scores, axis=1)

    plt.plot(train_sizes, train_scores_mean, 'o-', color="r",
             label="Training score")
    plt.plot(train_sizes, validation_scores_mean, 'o-', color="g",
             label="Cross-validation score")
    plt.legend(loc="best")
    print("Best validation score: {:.4f}".format(validation_scores_mean[-1]))
clf = DecisionTreeClassifier(max_depth=None)
plot_learning_curve(clf, X_train, y_train, scoring='roc_auc')
# note the large gap between the training and cross-validation scores, which suggests the tree overfits the training data
Best validation score: 0.6310

clf = DecisionTreeClassifier(max_depth=10)
plot_learning_curve(clf, X_train, y_train, scoring='roc_auc')
Best validation score: 0.6565

clf = DecisionTreeClassifier(max_depth=8)
plot_learning_curve(clf, X_train, y_train, scoring='roc_auc')
Best validation score: 0.6762

clf = DecisionTreeClassifier(max_depth=5)
plot_learning_curve(clf, X_train, y_train, scoring='roc_auc')
Best validation score: 0.7219

clf = DecisionTreeClassifier(max_depth=4)
plot_learning_curve(clf, X_train, y_train, scoring='roc_auc')
Best validation score: 0.7226
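The manual depth sweep above can be collapsed into one call; a sketch with validation_curve (modern sklearn.model_selection API, reusing X_train and y_train from above):

import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

depths = np.array([2, 4, 6, 8, 10, 12])
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(), X_train, y_train,
    param_name='max_depth', param_range=depths, cv=3, scoring='roc_auc')
best = depths[np.argmax(val_scores.mean(axis=1))]  # depth with the highest mean CV AUC
print('best max_depth:', best)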

Ⅸ What parameters can you tune on a Python decision tree?

Call this package:
sklearn.tree. sklearn (scikit-learn) can be downloaded, unzipped into C:\Python27\Lib\site-packages, and used directly. Download the numpy and scipy packages the same way, otherwise you will get errors.


Example:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
cross_val_score(clf, iris.data, iris.target, cv=10)
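The knobs worth tuning on DecisionTreeClassifier include criterion, max_depth, min_samples_split, min_samples_leaf, max_features, and random_state; a sketch searching a few of them with GridSearchCV:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    param_grid={'max_depth': [3, 5, None],
                                'min_samples_leaf': [1, 5, 10],
                                'criterion': ['gini', 'entropy']},
                    cv=10)  # exhaustive search over the grid with 10-fold CV
grid.fit(iris.data, iris.target)
print(grid.best_params_, grid.best_score_)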

Ⅹ How do you draw a Python-generated decision tree with graphviz?

# Here is an example you can look at:
# http://scikit-learn.org/stable/modules/tree.html
# (it assumes tree, clf and iris from the earlier answers, plus import pydotplus)
>>> from IPython.display import Image
>>> dot_data = tree.export_graphviz(clf, out_file=None,
                                    feature_names=iris.feature_names,
                                    class_names=iris.target_names,
                                    filled=True, rounded=True,
                                    special_characters=True)
>>> graph = pydotplus.graph_from_dot_data(dot_data)
>>> Image(graph.create_png())
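With current tooling the same picture can also be produced without pydotplus, through the graphviz package (a sketch assuming pip install graphviz, plus a fitted clf and iris as above):

import graphviz
from sklearn import tree

dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graphviz.Source(dot_data).render('iris', format='png')  # writes iris (DOT source) and iris.png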
