The World's Oldest Intern

The story of a serial change artist

Simple Intro to Crypto that my brain likes – almost infographic





Python Machine Learning with the KDD Cup 1999 Attack Data Set




Training set and testing set

Machine learning is about learning some properties of a data set and applying them to new data. This is why a common practice in machine learning to evaluate an algorithm is to split the data at hand in two sets, one that we call a training set on which we learn data properties, and one that we call a testing set, on which we test these properties. (From the sklearn website.)
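
A minimal sketch of the split described above, using only the standard library. The `split_train_test` helper and the `records` list are hypothetical stand-ins for illustration; they are not the KDD Cup 1999 loader or scikit-learn's own splitting utility.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (training, testing) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-in for parsed records: (features, label) pairs.
records = [((i, i % 3), "attack" if i % 4 == 0 else "normal") for i in range(20)]
train, test = split_train_test(records, test_fraction=0.25, seed=42)
```

The model only ever sees `train` while learning; `test` is held back so the evaluation uses data the model has not seen.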


sklearn is a Python module integrating classic machine learning algorithms in the tightly-knit world of scientific Python packages (numpy, scipy, matplotlib).

Fitting data

The main API implemented by scikit-learn is that of the estimator. An estimator is any object that learns from data; it may be a classification, regression or clustering algorithm, or a transformer that extracts/filters useful features from raw data.
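
To make the estimator idea concrete, here is a toy classifier written from scratch that follows the same `fit` / `predict` convention scikit-learn estimators expose. The class name `NearestCentroidSketch` and the tiny data set are made up for illustration; this is not scikit-learn code, just a sketch of the shape of the API.

```python
import math


class NearestCentroidSketch:
    """Toy estimator: learns one mean point per label, predicts the closest one."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # Learned state: the average feature vector of each label.
        self.centroids_ = {
            label: [total / counts[label] for total in acc]
            for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        return [
            min(self.centroids_,
                key=lambda label: math.dist(features, self.centroids_[label]))
            for features in X
        ]


# Hypothetical two-feature samples, not real KDD Cup 1999 records.
X_train = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.2]]
y_train = ["normal", "normal", "attack", "attack"]
model = NearestCentroidSketch().fit(X_train, y_train)
predictions = model.predict([[0.1, 0.1], [5.0, 5.0]])
```

Because every estimator is driven the same way (`fit` on training data, then `predict` on new data), swapping one algorithm for another usually means changing only the line that constructs the object.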

Additional Reading




13 Signs that bad guys are using DNS Exfiltration to steal your data




UDP 53 Indicators of Exfiltration

  • encrypted payloads
  • MD5, SHA1, SHA256 hashed subdomains
  • lots of requests to restricted domain
  • lots of requests to one domain
  • lots of requests to fast flux domains
  • plain text requests of subdomains
  • DNS replies have private addresses
  • DNS replies have single IP address
  • lots of DNS traffic going to bad guy country
  • DNS replies have patterned encoding
  • Packet size outside the normal distribution
  • Pattern of many requests to specific domains in round robin pattern
  • Spike in DNS byte count across normal traffic patterns
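
A few of the indicators above (hashed subdomains, random-looking payloads, lots of requests to one domain) can be roughly sketched with the standard library. Everything here is an assumption for illustration: the function names are hypothetical, the "last two labels are the registered domain" rule is a simplification that breaks on names like `example.co.uk`, and any alerting thresholds would have to be tuned on real traffic.

```python
import math
import re
from collections import Counter

# Hex digest lengths for MD5, SHA-1 and SHA-256. A single DNS label is limited
# to 63 octets, so a 64-character digest has to be spread over several labels;
# that is why the subdomain labels are joined before checking.
HEX_DIGEST_LENGTHS = {32, 40, 64}
HEX_RE = re.compile(r"^[0-9a-f]+$")


def split_name(qname, base_labels=2):
    """Return (subdomain_labels, base_domain), assuming a two-label base domain."""
    labels = qname.rstrip(".").lower().split(".")
    return labels[:-base_labels], ".".join(labels[-base_labels:])


def shannon_entropy(text):
    """Bits per character; encoded or encrypted payloads tend to score high."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def looks_like_hash(qname):
    """True if the joined subdomain labels have the shape of a hex digest."""
    joined = "".join(split_name(qname)[0])
    return len(joined) in HEX_DIGEST_LENGTHS and bool(HEX_RE.match(joined))


def requests_per_domain(qnames):
    """Count queries per base domain to spot one domain getting lots of requests."""
    return Counter(split_name(q)[1] for q in qnames)


# Hypothetical query names, e.g. extracted from a packet capture.
queries = [
    "www.example.com",
    "d41d8cd98f00b204e9800998ecf8427e.evil.test",
    "aGVsbG8gd29ybGQ.evil.test",
]
flagged = [q for q in queries if looks_like_hash(q)]
counts = requests_per_domain(queries)
```

None of these checks proves exfiltration on its own; they are cheap signals to combine with the volume and packet-size indicators in the list.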

Packet Capture Creation

tcpdump -i en1 -w dns-file udp dst port 53


Python DNS Data Exfiltration Tool

Data Exfiltration SME job

Ruby Exfil

C Exfil

DNS RFC (not Real Fried Chicken)

Detection Tool

DNS Exfil Tool

DNS Tunnelling


More Reading

Protect your secrets with crypto – public-private key encryption python


Explanation of my projects at Hacker School


First project at Hacker School demonstrating my humble beginnings.

Scaling Website Code

Collection of concept ideas written in Python, such as memcache, round robin, Bloom filters and hashes.
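
As a flavor of one of those concepts, here is a minimal Bloom filter sketch. It is not the code from the repository; the class layout, the default sizes, and the salted SHA-256 hashing are assumptions chosen to keep the example short and dependency-free.

```python
import hashlib


class BloomFilter:
    """Set-like structure: no false negatives, a tunable rate of false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        data = item.encode("utf-8")
        # Derive several bit positions by salting one hash function.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(bytes([salt]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


seen = BloomFilter()
was_present_before = "example.com" in seen
seen.add("example.com")
is_present_after = "example.com" in seen
```

A lookup that returns `False` is always right; a `True` only means "probably seen", which is the trade that lets the filter stay at a fixed, small size.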

Persistent Storage

Collection of techniques and methods, including how to commit lists, dicts and objects to ZODB.

Hacker School first week toy projects

Variety of scripts created in the first week of Hacker School. Screenscraper, keylogger, iSight capture, Hacker School map, lambda, reduce brain teaser.

Learning Classes, Netflix API, Flask

Collection of scripts testing ideas about accessing the Netflix API, using Flask.

Twisted Examples

Collection of programs built with the Twisted framework: Finger Server, HTTP Server, UpperCase Server, Caching Proxy HTTP Server. I also contributed to Twisted and documented the contributions.

Python Intrusion Detection POC [work in progress]

Using the KDD Cup 1999 dataset, I built an IDS anomaly detection engine for identifying 4 categories of network security attacks. This Python system was built to learn the fundamentals of Python network programming, machine learning, and lexing and parsing. It is a slow Python POC of the kind of commercial systems available today, which are written in C.

Introduction to Python Network Programming

Using ideas and concepts from Python network programming, I built a bunch of tools in Python: network scanners, packet sniffers, network stress testing tools, DNS tools, a proxy caching server (later converted to Twisted), and a chat server.

Final Project – DFTP [work in progress]

Domain Name Service File Transfer Protocol (DFTP) Client and Server. Exfiltration of text files, PDFs, etc.

Based on

More Reading

Mind Exploding Learning of the Day – Async programming with python generators

Make your own packets and sniffer with Scapy

NMAP making noise, Scapy sniffing, EtherApe visualizing all on BT5R3

Generating ICMP traffic with Scapy, sniffing with Scapy, visualizing with EtherApe

Packets decoded by Wireshark – Notice HelloWorld data in ICMP

ICMP scanner with Scapy – 10 hosts with range(10)

Only a few weeks left. What have I done so far in 700 hours?

    1. Network Fundamentals and Socket Programming
    2. Client-side programming – making python clients
    3. RESTful API implementation – serving up data to JSON
    4. Internet Data Handling – HTML, XML, and JSON
    5. Web programming – FTP, Finger, Proxy, HTTP, DNS, and soon BitTorrent
    6. Website programming – Flask, JavaScript, d3.js and JQuery (intros)
    7. A little about thread programming. I wanna learn more about the GIL
    8. Some multiprocessing – queues, pipes, process pools, and shared memory regions
    9. Message passing and data serialization – like pickle, ZODB, select, reactor pattern and twisted
    10. Distributed programming – python streaming and Hadoop, PIG, RPC, and map-reduce
    11. Advanced I/O handling – Twisted and Tornado
    12. Generators and coroutines – chained generator for pipeline data, like log processing
    13. Decorators, list comprehensions, ordered dicts
    14. Magic functions and builtin functions
    15. Introspection with trace, pglogger, and Online Python Tutor scripts
    16. Debugging with pdb and print statements
    17. Unit testing with unittest
    18. Lexers and Parsers – PLY, pyparsing
    19. Web scraping – HTML, Text, and XML
    20. Bitarrays, bytearrays, struct and basic data structures
    21. Bloomfilters and basic algorithms
    22. Geolocation and exif data scraping
    23. Game bots, IRC bots, web scraping bots, tweet bots
    24. Password generators, brute force generators, random generators
    25. MD5, SHA1, vs. MurmurHash – crypto vs fast
    26. scikit-learn, pandas, matplotlib, ipython notebook, numba
    27. lprun, prun, static and dynamic typing, scope, classes, functions, staticmethods, timeit, logging, packaging software
    28. Abelian group or Commutative group ideas, set theory
    29. Maps, reducers, filters, super, lambdas, and python bridges
    30. NetworkX, Gephi, SQL, NOSQL, ElasticSearch, Hadoop, LAMP stack, Heroku, AWS
    31. Rudimentary memcache, round robin, in-memory data structures and database, web scaling issues
    32. Python forensic tools, Python penetration testing tools, dev tools, Python security tools
    33. Pushing code to Github daily and learning VIM, SVN, IRC, blogging, tweeting, linkedin, meetups
    34. Learning to read code and understand code
    35. Contributing to Open Source Software (OSS)
    36. Pairing, coding, presenting, giving workshops, learning, sharing one on one
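
Item 12 in the list above (chained generators for pipelining data, like log processing) can be sketched roughly as follows. The stage names and the space-separated log format are hypothetical; the point is only that each stage pulls one line at a time from the stage before it, so nothing is held in memory beyond the current line.

```python
def read_lines(lines):
    """First stage: strip trailing newlines from any iterable of lines."""
    for line in lines:
        yield line.rstrip("\n")


def grep(pattern, lines):
    """Middle stage: pass through only the lines containing pattern."""
    for line in lines:
        if pattern in line:
            yield line


def field(index, lines, sep=" "):
    """Last stage: pull one column out of each line."""
    for line in lines:
        parts = line.split(sep)
        if index < len(parts):
            yield parts[index]


# Hypothetical log lines; an open file object would work the same way.
sample_log = [
    "10.0.0.1 GET /index.html 200\n",
    "10.0.0.2 GET /missing 404\n",
    "10.0.0.1 GET /also-missing 404\n",
]
not_found_ips = list(field(0, grep(" 404", read_lines(sample_log))))
```

No work happens until `list(...)` starts consuming the last generator, which is what makes the same pipeline usable on log files too large to load at once.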

Friday night hacking with Josh’s BitTorrent client

git pull upstream master