Terence Parr and Prince Grover
(Terence teaches in the University of San Francisco's MS in Data Science program and Prince is a student there. You may know Terence as the creator of the ANTLR parser generator.)
Please send comments, suggestions or corrections to Terence.
Decision trees are the key building blocks for gradient boosting machines and Random Forests™, probably the two most popular machine learning models for structured data. Visualizing decision trees is a huge help when learning how these models work and when interpreting models. Unfortunately, the current visualization packages are rudimentary and not immediately helpful to beginners. For example, we could not find a library that visualizes how decision nodes split up the feature space. Furthermore, it is rare for libraries to support visualizing a specific feature vector as it weaves down through a tree's decision nodes; we could only find one image showing this.
Therefore, we have created a general package for scikit-learn decision tree visualization and model interpretation, which we will use heavily in an upcoming machine learning book (written with Jeremy Howard). Here is a sample visualization for a small decision tree (click to enlarge):
This article illustrates the results of this work, describes in detail the specific choices we made for visualization, and outlines the tools and techniques used in the implementation. The visualization software is part of a nascent Python machine learning library called animl. We assume you are familiar with the basic mechanics of decision trees if you are interested in visualizing them, but let's start with a brief review so that we are all using the same terminology. (If you are not familiar with decision trees, check out fast.ai's Introduction to Machine Learning for Coders MOOC.)
A review of decision trees
A decision tree is a machine learning model based on binary trees (trees with at most a left and right child). A decision tree learns the relationship between observations in a training set, represented as feature vectors x and target values y, by examining and condensing training data into a binary tree of interior nodes and leaf nodes. (Notation: vectors are in bold and scalars are in italics.)
Each leaf in the decision tree is responsible for making a specific prediction. For regression trees, the prediction is a value, such as price. For classifier trees, the prediction is a target category (represented as an integer in scikit), such as cancer or not-cancer. A decision tree carves up the feature space into groups of observations that share similar target values, and each leaf represents one of these groups. For regression, similarity in a leaf means a low variance among the target values; for classification, it means that most or all targets are of a single class.
Any path from the root of the decision tree to a specific leaf predictor passes through a sequence of (internal) decision nodes. Each decision node compares the value of a single feature in x, xi, with a specific split point value learned during training. For example, in a model predicting apartment rent prices, decision nodes would test features such as the number of bedrooms and the number of bathrooms. (See Section 3, Visualizing tree interpretation of a single observation.) Even in a classifier with discrete target values, decision nodes still compare numeric feature values because the scikit decision tree implementation assumes that all features are numeric. Categorical variables must be one-hot encoded, binned, label encoded, etc.
To train a decision node, the model examines a subset of the training observations (or the full training set at the root). The node's feature and the split point within that feature's space are chosen during training to split the observations into left and right buckets (subsets) so as to maximize the similarity defined above. (This selection process is generally done through an exhaustive comparison of features and feature values.) The left bucket holds observations whose xi feature values are all less than the split point, and the right bucket holds observations whose xi is greater than the split point. Tree construction proceeds recursively by creating decision nodes for the left and right buckets. Construction stops when some stopping criterion is reached, such as having fewer than five observations in the node.
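To make the exhaustive split search concrete, here is a minimal sketch — not scikit's implementation, which is heavily optimized — of choosing a regression split point for a single feature by minimizing the weighted variance of the two buckets:

```python
import numpy as np

def best_split(x, y):
    """Exhaustively search candidate split points for one feature (x must be
    sorted), picking the split that minimizes the weighted variance of the
    left and right buckets -- the "similarity" measure for regression."""
    best_point, best_cost = None, np.inf
    for split in (x[:-1] + x[1:]) / 2:          # midpoints between neighbors
        left, right = y[x < split], y[x >= split]
        if len(left) == 0 or len(right) == 0:
            continue
        cost = len(left) * left.var() + len(right) * right.var()
        if cost < best_cost:
            best_point, best_cost = split, cost
    return best_point

# Two clusters of target values; the best split falls between them.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.0])
print(best_split(x, y))  # → 6.5
```

A real implementation repeats this search over every feature and picks the (feature, split) pair with the lowest overall cost.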
The key elements of decision tree visualization
Decision tree visualizations should highlight the following important elements, which we demonstrate below.
- Decision node feature versus target value distributions (which we call feature-target space in this article). We want to know how separable the target values are, based upon the feature and a split point.
- Decision node feature name and feature split value. We need to know which feature each decision node is testing and where in that feature's space the node splits the observations.
- Leaf node purity, which affects our confidence in predictions. Leaves with low variance among target values (regression) or an overwhelming majority target class (classification) are much more reliable predictors.
- Leaf node prediction value. What does this leaf actually predict from its collection of target values?
- Numbers of samples in decision nodes. Sometimes it is useful to know how the samples are routed through the decision nodes.
- Numbers of samples in leaf nodes. Our goal is a decision tree with fewer, larger, purer leaves. Nodes with too few samples are a possible indication of overfitting.
- How a specific feature vector runs down the tree to a leaf. This helps explain why a particular feature vector gets the prediction it does. For example, in a regression tree predicting apartment rent prices, we might find a feature vector routed into a high-price leaf because of a decision node that checks for more than three bedrooms.
Decision tree visualization gallery
Before digging into the previous state of the art, we would like to give a little spoiler of what is possible. This section highlights some sample visualizations we built from scikit regression and classification decision trees on a few datasets. You can also check out the full gallery and the code to generate all of the samples.
A comparison with previous state-of-the-art visualizations
If you search for "visualizing decision trees" you will quickly find a Python solution provided by the awesome scikit folks: sklearn.tree.export_graphviz. With a bit more work, you can find visualizations for R and even SAS and IBM. In this section, we collect the various decision tree visualizations we could find and compare them with the visualizations made by our own animl library. We give a more detailed discussion of our visualizations in the next section.
Let's start with the default scikit visualization of a decision tree on the well-known Iris dataset (click on images to enlarge them).
|Default scikit view of Iris||Our animl view of Iris|
The scikit tree does a good job of representing the tree structure, but we have a few quibbles. The colors are not the best and it is not immediately obvious why some of the nodes are colored and some are not. If the colors represent the predicted class for this classifier, one would think that only the leaves would be colored, because only leaves make predictions. (It turns out the uncolored nodes have no majority prediction.) Including the gini coefficient (certainty score) costs space and does not help with interpretation. The count of samples of the various target classes in each node is somewhat useful, but a histogram would be even better. A target class color legend would be nice. Finally, using True and False as edge labels is not as clear as labels that show the comparison itself. The most obvious difference is that our decision nodes show feature distributions as overlapping stacked histograms, one histogram per target class. Furthermore, our leaf size is proportional to the number of samples in that leaf.
Scikit uses the same visualization approach for decision tree regressors. For example, here is scikit's visualization using the Boston dataset, with the animl version for comparison (click to enlarge images):
|Default scikit view of Boston||Our animl view of Boston|
In the scikit tree, it is not immediately clear what the use of color implies but, after studying the image, darker colors indicate higher predicted target values. As before, our decision nodes show the feature space distribution, this time using a feature-versus-target scatterplot. The leaves use strip plots to show the target value distribution; leaves with more dots naturally have more samples.
R programmers also have access to a package for visualizing decision trees, which gives results similar to scikit's but with nicer edge labels:
SAS and IBM also provide (non-Python-based) decision tree visualizations. Starting with SAS, we see that their decision nodes include a bar chart of the node's sample target values and other details:
|SAS visualization||SAS visualization (the best image quality we could find with numeric features)|
Indicating the size of the left and right buckets via edge width is a nice touch. But those bar charts are hard to interpret because they have no horizontal axis. Decision nodes testing categorical variables (left image) have exactly one bar per category, so they must represent simple category counts rather than feature distributions. For numeric features (right image), SAS decision nodes show a histogram of either target or feature space (we cannot tell from the image). The SAS node bar charts/histograms appear to illustrate only target values, which tells us nothing about how the feature space was split.
The SAS tree on the right appears to highlight a path through the decision tree for a specific unknown feature vector, but we could not find other examples from other tools and libraries. The ability to visualize a specific vector run down the tree does not seem to be generally available.
Turning to IBM software, here is a nice visualization, which also shows decision node category counts as bar charts, from IBM's Watson analytics (on the Titanic dataset):
IBM's earlier SPSS product also had decision tree visualizations:
|SPSS display||SPSS display|
These SPSS decision nodes seem to give the same SAS-like bar chart of sample target class counts.
All of the visualizations we encountered from the major players were useful, but we were most inspired by the amazing visualizations in A Visual Introduction to Machine Learning, which shows an (animated) decision tree like this:
This visualization has three unique characteristics beyond the previous work, aside from the animation:
- decision nodes show how the feature space is divided
- the split points for decision nodes are shown visually (as a wedge) in the distribution
- the leaf size is proportional to the number of samples in that leaf
While that visualization is a hand-built animation for educational purposes, it points in the right direction.
Our decision tree visualizations
Aside from the educational animation in A Visual Introduction to Machine Learning, we could not find a decision tree visualization package that illustrates how the feature space is split up at decision nodes (feature-target space). This is the critical operation performed while training decision tree models and is what newcomers should focus on, so we will start by examining decision node visualizations for both classifier and regressor trees.
Visualizing feature-target space
Training of a decision node chooses feature xi and a split value within xi's range of values (feature space) to group samples with similar target values into two buckets. To be clear: training involves examining the relationship between features and target values. Unless decision nodes somehow show feature-target space, the viewer cannot see how and why training arrived at the split value. To highlight how decision nodes carve up feature space, we trained a regressor and a classifier with a single feature (AGE) (code to generate images). Here is a regressor decision tree trained using a single feature from the Boston data, AGE, and with node ID labeling turned on for discussion purposes here:
Horizontal dashed lines indicate the target mean for the left and right buckets in decision nodes; a vertical dashed line indicates the split point in feature space. The black wedge highlights the split point and identifies the exact split value. Leaf nodes indicate the target prediction (mean) with a dashed line.
As you can see, each AGE feature axis uses the same range, rather than zooming in, to make it easier to compare decision nodes. As we descend through decision nodes, the sample AGE values are boxed into narrower and narrower regions. For example, the AGE feature space in node 0 is split into the regions of AGE feature space shown in nodes 1 and 8. Node 1's feature space is then split into the chunks shown in nodes 2 and 5. The prediction leaves are not very pure because training a model on a single variable leads to a poor model, but this restricted example demonstrates how decision trees carve up feature space.
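Readers who want to reproduce a one-feature tree like this can start from a sketch such as the following; it substitutes synthetic data for the Boston AGE column (load_boston has since been removed from recent scikit-learn releases), so the split values will not match the figures:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the single AGE feature from the Boston data.
rng = np.random.default_rng(42)
age = rng.uniform(0, 100, size=(200, 1))                    # one feature column
price = 50 - 0.3 * age[:, 0] + rng.normal(0, 3, size=200)   # noisy linear target

regr = DecisionTreeRegressor(max_depth=3).fit(age, price)

# Each decision node narrows AGE into a smaller interval as we descend.
t = regr.tree_
print("root splits AGE at", round(t.threshold[0], 2))
```

Feeding a tree like this (plus the training data) to the visualization code produces the kind of single-feature diagrams shown above.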
While the decision tree implementation is virtually identical for classifier and regressor decision trees, the way we interpret them is very different, so our visualizations are distinct for the two cases. For a regressor, it is best to show feature-target space with a scatterplot of feature versus target. For classifiers, however, the target is a category rather than a number, so we chose to illustrate feature-target space using histograms as an indicator of the feature space distributions. Here is a classifier tree trained on the USER KNOWLEDGE data, again with a single feature (PEG), and with nodes labeled for discussion purposes:
Ignoring color, the histogram shows the PEG feature space distribution. Adding color gives us an indication of the relationship between feature space and target class. For example, in node 0 we can see that samples with very low target classes are clustered at the low end of PEG feature space and samples with high target classes are clustered at the high end. As with the regressor, a left child's feature space is everything to the left of the parent's split point in the same feature space; similarly for the right child. For example, combining the histograms of nodes 9 and 12 yields the histogram of node 8. We force the horizontal axis range to be the same for all PEG decision nodes so that decision nodes lower in the tree are clearly boxed into narrower regions that are increasingly pure.
We use a stacked histogram so that overlap in feature space between samples of different target classes is clear. Note that the Y axis height of the stacked histogram is the total number of samples across all classes; multiple class counts are stacked on top of one another.
When there are more than four or five classes, stacked histograms are hard to read, so we recommend setting the histogram type parameter to bar rather than barstacked in that case. With high-cardinality target categories, the overlapping distributions are harder to visualize and things break down, so we set a limit of 10 target classes. Here is an example of a shallow tree on the 10-class Digits dataset using non-stacked histograms:
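The bar versus barstacked distinction comes directly from matplotlib's hist(); a minimal standalone illustration with synthetic two-class data (not the library's own plotting code):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no window is required
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic "classes" with overlapping feature distributions.
class_a = rng.normal(0.0, 1.0, 300)
class_b = rng.normal(1.5, 1.0, 300)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist([class_a, class_b], bins=20, histtype="barstacked")
ax1.set_title("histtype='barstacked'")
ax2.hist([class_a, class_b], bins=20, histtype="bar")
ax2.set_title("histtype='bar' (side-by-side)")
fig.savefig("hist_types.png")
```

With barstacked, per-class counts sit on top of each other (the Y axis is the total); with bar, each class gets its own side-by-side bars, which stays readable as the class count grows.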
On to the details
So far we have skipped over many of the visual cues and details that we obsessed over while building the library, so we hit the key elements here.
Our classifier tree visualizations use node size to give visual cues about the number of samples associated with each node. Histograms get proportionally shorter as the number of samples in the node decreases, and leaf node diameter shrinks. The feature space (horizontal axis) is always the same width and the same range for a given feature, which makes it much easier to compare the feature-target spaces of different nodes. The bars of all histograms are the same width in pixels. We use just start/stop range labels for both horizontal and vertical axes to reduce the overall size.
We use a pie chart for classifier leaves, despite their bad reputation. For the purpose of indicating purity, the viewer only needs an indication of whether there is a single strong majority category. The viewer does not need to discern the exact relationship between pie chart elements, which is a key area where pie charts fail. The color of the majority pie slice gives the leaf's prediction.
Turning to regressor trees now, we make sure that the (vertical) target axis of all decision nodes is the same height and the same range, to make comparing nodes easier. Regressor feature space (horizontal axis) is always the same width and range for a given feature. We set a low alpha for all scatterplot dots so that increased target value density corresponds to darker color.
Regressor leaves also show the same range vertically for the target space. We use a strip plot rather than, say, a box plot because the strip plot shows the distribution explicitly while implicitly showing the number of samples by the number of dots. (We also write out the number of samples in text for leaves.) The leaf prediction is the distribution's center of mass (mean), which we highlight with a dashed line.
There are also a number of other details that we think improve the quality of the diagrams:
- The classifiers include a legend
- All colors were carefully selected from colorblind-safe palettes, with a different palette tailored to each number of target categories (from 2 to 10)
- We use gray rather than black for text because it is easier on the eyes
- Lines are thin
- We draw outlines around histogram bars and pie chart wedges
Visualizing tree interpretation of a single observation
To understand how model training arrives at a specific tree, all the action is in the decision nodes' feature space splits, which we just discussed. Now let's look at how a specific feature vector yields a specific prediction. The key here is to examine the decisions taken along the path from the root to the leaf predictor node.
The decision-making within a node is straightforward: take the left path if feature xi in test vector x is less than the split point, otherwise take the right path. To highlight the decision-making process, we have to highlight that comparison operation. For decision nodes along the path to the leaf predictor node, we show an orange wedge at position xi in the horizontal feature space. This makes the comparison easy to see: if the orange wedge is to the left of the black wedge, go left; otherwise go right. Decision nodes involved in the prediction process are surrounded by dashed outlines, and the child edges along the path are thicker and orange. Here are two sample trees showing test vectors (click images to expand):
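The per-node comparison can be sketched directly against scikit's underlying tree arrays; note that scikit actually sends equal values left, i.e. it tests xi ≤ split rather than a strict less-than. A minimal sketch (the function name is ours, not an animl API):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

def path_to_leaf(clf, x):
    """Walk from the root, going left when x[feature] <= threshold."""
    t = clf.tree_
    node, path = 0, []
    while t.children_left[node] != -1:            # -1 marks a leaf in scikit's arrays
        f, split = t.feature[node], t.threshold[node]
        goes_left = x[f] <= split                 # the comparison the wedges visualize
        path.append((node, f, split, goes_left))
        node = t.children_left[node] if goes_left else t.children_right[node]
    path.append((node, None, None, None))         # the leaf predictor node
    return path

path = path_to_leaf(clf, iris.data[0])
print([node_id for node_id, *_ in path])          # root ... leaf node ids
```

The highlighted decision nodes and thick orange edges in our diagrams correspond exactly to the entries of such a path.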
|KNOWLEDGE data with a test vector||Diabetes data with a test vector|
The test vector x, with feature names and values, is shown below the leaf predictor node (or to the right in left-to-right orientation). The test vector highlights the features used in one or more decision nodes. When the number of features hits a threshold of 20 (10 for left-to-right orientation), test vectors do not show unused features, to avoid overly long test vectors.
Left-to-right orientation
Some users prefer left-to-right orientation rather than top-down, and sometimes the nature of the tree simply flows better left-to-right. Sample feature vectors can still be run down the tree in left-to-right orientation. Here are some examples (click on images to enlarge them):
|Wine showing a prediction||Diabetes showing a prediction|
Simplified non-fancy layout
To judge the generality of a decision tree, it often helps to get a high-level overview of the tree. This generally means examining things like tree shape and size but, most importantly, it means looking at the leaves. We would like to know how many samples each leaf has, how pure the target values are and, in general, where most of the weight of the samples falls. Getting an overview is harder when the visualization is too large, so we provide a "non-fancy" option that generates smaller visualizations while retaining the key leaf information. Here is a sample non-fancy classifier and regressor in top-down orientation:
What we have tried and rejected
Those interested in these tree visualizations from a design point of view might find it interesting to read about what we tried and rejected. Starting with classifiers, we thought the histograms were a bit busy and that kernel density estimates might give a more accurate picture. We had decision nodes that looked like this:
The problem is that decision nodes with only one or two samples gave wildly misleading distributions:
We also experimented with using bubble charts instead of histograms for classifier decision nodes:
These look really cool but, in the end, histograms are easier to read.
Turning to regression trees, we considered using box plots to show the distribution of prediction values, alongside a simple bar chart to show the number of samples:
This dual plot per leaf is less satisfying than the strip plot we use now. The box plot also does not show the target value distribution nearly as well as a strip plot. Before the strip plot, we simply plotted target values using the sample index as the horizontal axis:
This is misleading because the horizontal axis is usually feature space. We scrapped it in favor of the strip plot.
This section gives sample visualizations for the Boston regression dataset and the Wine classification dataset. You can also check out the full gallery of sample visualizations and the code to generate the samples.
Boston regression tree visualization
Here is a code snippet to load Boston data and train a regression tree with a maximum depth of three decision nodes:
from sklearn.datasets import load_boston
from sklearn import tree

boston = load_boston()
X_train = boston.data
y_train = boston.target
testX = X_train[5,:]

regr = tree.DecisionTreeRegressor(max_depth=3)
regr = regr.fit(X_train, y_train)
The code to visualize the tree involves passing the tree model, the training data, the feature and target names, and a test vector (if desired):
viz = dtreeviz(regr, X_train, y_train, target_name='price',
               feature_names=boston.feature_names,
               X=testX)
viz.save("boston.svg")  # suffix determines the generated image format
viz.view()              # pop up window to display image
Wine classification tree visualization
Here is a code snippet to load the Wine data and train a classifier tree with a maximum depth of three decision nodes:
from sklearn.datasets import load_wine
from sklearn import tree

wine = load_wine()
clf = tree.DecisionTreeClassifier(max_depth=3)
clf.fit(wine.data, wine.target)
Visualizing a classifier is the same as visualizing a regressor, except that the visualization also needs the target class names:
viz = dtreeviz(clf, wine.data, wine.target, target_name='wine',
               feature_names=wine.feature_names,
               class_names=list(wine.target_names))
In Jupyter notebooks, the object returned from dtreeviz() has a _repr_svg_() function that Jupyter uses to display the object automatically. See the sample notebook.
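The _repr_svg_() hook is part of Jupyter's rich display protocol, not anything specific to animl: any cell result that defines it is rendered as inline SVG. A toy illustration (the class and SVG content are ours, purely for demonstration):

```python
class TinySvg:
    """Minimal object demonstrating the hook Jupyter uses for inline SVG.
    When a cell's result defines _repr_svg_, Jupyter calls it and renders
    the returned SVG string directly in the notebook output area."""
    def _repr_svg_(self):
        return ('<svg xmlns="http://www.w3.org/2000/svg" width="60" height="60">'
                '<circle cx="30" cy="30" r="25" fill="#4daf4a"/></svg>')

svg = TinySvg()._repr_svg_()
print(svg.startswith("<svg"))  # → True
```

Evaluating `TinySvg()` as the last expression of a notebook cell would draw the circle inline, just as evaluating a dtreeviz() result draws the tree.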
As of September 2018, Jupyter notebooks do not correctly display the SVG generated by this library; the fonts etc. are all messed up:
The good news is that GitHub displays them correctly, as does JupyterLab.
Use Image(viz.topng()) to view (poorly) in a Jupyter notebook, or simply call viz.view(), which will pop up a window that shows things properly.
This project was plagued with programming dead ends, parameter twiddling, working around bugs/limitations in tools and libraries, and creatively gluing together a motley set of existing tools. The only fun part was the (countless) sequence of visual design experiments. We pushed on because it seemed likely that the machine learning community would find these visualizations as useful as we do. This project represents about two months of clawing through stackoverflow, documentation, and horrid graphics programming.
At the highest level, we used matplotlib to generate images for decision and leaf nodes and combined them into a tree using the venerable graphviz. We also used HTML labels extensively in the graphviz tree description for layout and font purposes. The single biggest headache was convincing all the components of our solution to produce high-quality vector graphics.
Our first coding experiments led us to create a shadow tree wrapping the decision trees created by scikit, so let's start with that.
Shadow trees for scikit decision trees
The decision trees for scikit-learn classifiers and regressors are built for efficiency, not necessarily for ease of tree walking or extracting node information. We created animl.trees.ShadowDecTree and animl.trees.ShadowDecTreeNode classes as an easy-to-use (traditional binary tree) wrapper around all the tree information. Here is how to create a shadow tree from a scikit classifier or regressor tree model:
shadow_tree = ShadowDecTree(tree_model, X_train, y_train, feature_names, class_names)
The shadow tree/node classes have many methods that could be useful to other libraries and tools that need to walk scikit decision trees. For example, predict() not only runs a feature vector through the tree but also returns the path of visited nodes. The samples associated with any particular node can be obtained via node_samples().
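scikit-learn itself exposes some of this tree-walking via decision_path(); a sketch of recovering the visited-node path and per-node sample sets with plain scikit calls (the shadow tree just packages this more conveniently — the dict below approximates, not reproduces, what node_samples() returns):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Which nodes does sample 0 visit? decision_path() returns a sparse
# (n_samples x n_nodes) indicator matrix.
indicator = clf.decision_path(iris.data[:1])
visited = indicator.indices[indicator.indptr[0]:indicator.indptr[1]]
print(list(visited))  # node ids along the root-to-leaf path

# Which training samples reach each node?
all_paths = clf.decision_path(iris.data)
samples_in_node = {n: all_paths.getcol(n).nonzero()[0]
                   for n in range(clf.tree_.node_count)}
```

Walking the raw children_left/children_right arrays works too, but decision_path() handles the bookkeeping for whole batches of samples at once.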
The graphviz dot language for describing tree structure is very helpful for getting decent tree layouts, if you know all the tricks — such as getting left children to appear to the left of right children by using hidden connecting graph edges. For example, if you have two leaves, leaf4 and leaf5, that must appear left-to-right on the same level, here is the graphviz magic:
LSTAT3 -> leaf4 [penwidth=0.3 color="#444443" label=<>]
LSTAT3 -> leaf5 [penwidth=0.3 color="#444443" label=<>]
{
    rank=same;
    leaf4 -> leaf5 [style=invis]
}
We usually use HTML labels on graphviz nodes, rather than plain text labels, because they give much more control over text display and provide the ability to show tabular data as actual tables. For example, when displaying a test vector run down the tree, the test vector is shown using an HTML table:
To generate images from graphviz files, we use the graphviz python package, which ends up executing the dot binary via one of its utility routines (run()). Occasionally, we wanted slightly different parameters on the dot command, so we call run() directly ourselves for flexibility:
cmd = ["dot", "-Tpng", "-o", filename, dotfilename]
stdout, stderr = run(cmd, capture_output=True, check=True, quiet=False)
We also use the run() function to execute the pdf2svg (PDF-to-SVG conversion) tool, as described in the next section.
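The same invocation can be sketched with nothing but the standard library; a minimal version (the filenames are placeholders, and the graphviz dot binary must be on the PATH for render() to succeed):

```python
import subprocess

def build_dot_cmd(fmt, outfile, dotfile):
    # -T selects the output format (png, svg, pdf, ...)
    return ["dot", f"-T{fmt}", "-o", outfile, dotfile]

def render(dotfile, outfile, fmt="png"):
    """Run graphviz dot on a .dot file; raises CalledProcessError on failure."""
    subprocess.run(build_dot_cmd(fmt, outfile, dotfile),
                   check=True, capture_output=True)

cmd = build_dot_cmd("png", "boston.png", "boston.dot")
print(cmd)
```

Dropping down to subprocess like this is what makes it easy to swap in different dot flags, or to chain a second tool such as pdf2svg after dot.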
Vector graphics via SVG
We use matplotlib to generate the decision and leaf nodes and, to get those images into the overall graphviz/dot image, we use graphviz HTML labels and reference the generated images via img tags like this:
The number 94806 is the process ID, which serves to isolate multiple instances of animl running on the same machine. Without this, multiple processes could overwrite the same temporary files.
Since we wanted scalable vector graphics, we tried importing SVG images at first, but we could not get graphviz to accept those files (nor PDFs). It took us four hours to figure out that generating and importing SVG were two different things, and that we needed the following magic incantation on OS X using --with-librsvg:
$ brew install graphviz --with-librsvg --with-app --with-pango
Originally, when we resorted to generating PNG files from matplotlib, we set the dots-per-inch (dpi) to 450 so that they would look okay on high-resolution screens like the iMac's. Unfortunately, that meant we had to specify the overall size we wanted for the tree using an HTML table in graphviz, with width and height parameters.
Unfortunately, graphviz's SVG output simply referenced the node files we imported, rather than embedding the node images within the overall tree image. That is a very inconvenient form of output, because sending a single tree visualization means sending a zip file rather than a single file. We took the time to parse the SVG XML and embed all referenced images within a single large meta-SVG file. It worked great, and there was much celebration.
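That embedding step can be sketched with the standard library's XML tools; this is an illustrative reconstruction, not animl's actual code:

```python
import base64
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)

def inline_images(svg_text, load_bytes):
    """Replace <image xlink:href="..."> file references with embedded base64
    data URIs, yielding a single self-contained SVG string. `load_bytes`
    maps an href to the referenced file's raw bytes."""
    root = ET.fromstring(svg_text)
    for img in root.iter(f"{{{SVG_NS}}}image"):
        href = img.get(f"{{{XLINK_NS}}}href")
        if href and not href.startswith("data:"):
            data = base64.b64encode(load_bytes(href)).decode("ascii")
            img.set(f"{{{XLINK_NS}}}href", f"data:image/png;base64,{data}")
    return ET.tostring(root, encoding="unicode")

# Tiny demo: one referenced node image gets embedded in place.
demo = (f'<svg xmlns="{SVG_NS}" xmlns:xlink="{XLINK_NS}">'
        f'<image xlink:href="node3.png" width="10" height="10"/></svg>')
result = inline_images(demo, lambda href: b"\x89PNG fake bytes")
print("data:image/png;base64" in result)  # → True
```

The resulting single file can be mailed or published on its own, with no sidecar node images.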
Then we noticed that graphviz does not handle text properly in HTML labels when generating SVG. For example, the text in classifier tree legends was truncated and overlapping. Rats.
What finally worked to get a single clean SVG file was first generating a PDF from graphviz and then converting the PDF to SVG with pdf2svg (pdf2cairo also appears to work).
Then we noticed that the Jupyter notebook has a bug where it does not display those SVG files properly (see above). JupyterLab handles the SVG correctly, as does GitHub. We added a topng() method so that Jupyter notebook users can use Image(viz.topng()) to get inline images. Better yet, call viz.view(), which pops up a window that displays the images properly.
Sometimes solving a programming problem is less about algorithms and more about working within the constraints and capabilities of the programming ecosystem, such as tools and libraries. That is definitely the case with this decision tree visualization software. The programming was not hard; it was more a matter of fearlessly grinding our way to victory through an appropriate mashup of graphics tools and libraries.
Designing the actual visualization also required a seemingly infinite number of experiments and tweaks. Generating high-quality vector images also required pathological determination, with a trail of dead code left along the tortuous path to success.
We are definitely not visualization experts, but for this specific problem we hammered on it until we got effective diagrams. In Edward Tufte's seminar, I learned that you can pack a lot of information into a rich diagram, as long as it is not an arbitrary mishmash; the human eye can resolve lots of detail. We used a number of elements from the design palette to visualize decision trees: color, line thickness, line style, different kinds of plots, size (area, length, graph height, …), color transparency (alpha), text styles (color, font, bold, italics, size), graph annotations, and visual flow. All visual elements had to be motivated. For example, we did not use color just because colors are pretty. We used color to highlight an important dimension (target category) because humans quickly and easily spot color differences. Node size differences should also be easy for humans to pick out (is that a kitty cat or a lion?), so we used node size to indicate leaf size.
The visualizations described in this document are part of the animl machine learning library, which is just getting started. We will likely move the rfpimp permutation importance library into animl soon. At this point, we have tested the visualizations only on OS X. We would welcome instructions from programmers on other platforms so that we can include those installation steps in the documentation.
There are a couple of tweaks we would like to make, such as bottom-justifying the histograms of classifier trees so that it is easier to compare nodes. Also, some of the wedge labels overlap with the axis labels. Finally, it would be interesting to see what the trees look like with incoming edge widths proportional to the number of samples in that node.