Practical uses of bloom filter – SPAM filter and URL shortener and Weak Password Dict
by Security Dude
The process of building an URL shortener should be pretty simple. We will create a program that asks for the URL.
NOTE: A Bloom filter with just one hash function is equivalent to ordinary hashing.
- Generates a random string
- Check the bloom filter for uniqueness
- Adds the shortened address to the bloom filter
- Duplicates follow regenerate process
When you request a url, another script looks in the database for the random string, and if its found redirects you to the site. This is of course more complicated in production due to needed features like abuse prevention, URL filtering, spam prevention, URL verification, etc.
Other applications for bloom filters could be SPAM url checking, Weak Password Dictionary detection engine, network malware payload detection. Memory requirements in memory is small in comparison to storing hashes. For instance 14 million SHA1 hashes require 291 megabytes. RDS to almost 6GB. Forget about fast retrieve if you need to go across the network or to query a DB. Lookups will be limited to thousands for requests per second. Imagine instant access via memory search using Bloom filters.
I guess bloom filters can be used for streams of data. Log data. User behaviour data, or raw firehose analysis of user generated content. I have seen some research that deals with time reduction of query or the use of bloom filters on network packet analysis.