Python Machine Learning with the KDD Cup 1999 Attack Data Set

by Security Dude



Screen Shot 2012-12-04 at 3.00.49 PM

Screen Shot 2012-12-04 at 3.33.23 PM

Screen Shot 2012-12-05 at 11.34.29 AM

Screen Shot 2012-12-05 at 6.38.49 PM

Training set and testing set

Machine learning is about learning some properties of a data set and applying them to new data. This is why a common practice in machine learning to evaluate an algorithm is to split the data at hand in two sets, one that we call a training set on which we learn data properties, and one that we call a testing set, on which we test these properties.(from sklearn website)


sklearn is a Python module integrating classic machine learning algorithms in the tightly-knit world of scientific Python packages (numpyscipymatplotlib).

Fitting data

The main API implemented by scikit-learn is that of the estimator. An estimator is any object that learns from data; it may a classification, regression or clustering algorithm or a transformer that extracts/filters useful features from raw data.

Addition Reading