Greetings, readers! It’s been a little while since I made a post. I just wanted to give you a little heads-up about some of the projects I’m currently working on, so you know what to look for in the future!
As I’ve been working on various ideas and building new tools, I’ve come to find that a great many of them could benefit from distributed processing. I already implemented something like this with my TorSpider project a year ago (see below), but I found that other projects, such as FileRoulette (see below), could also benefit from such a framework. So I’ve decided to build a stand-alone, bare-bones distributed processing framework, called DisPro.
At the moment, it’s just an idea, and I haven’t started writing the code for it yet, but as I said, I’ve already done this once before. This time, I plan to craft DisPro as a module that can be imported into new projects, kind of like Twisted or Flask. I haven’t created a true Python module before, so this will be a learning experience for me. I believe this project can help make my life easier, and if others find it useful, even better!
One of the projects that I feel could benefit from distributed processing is the FileRoulette project, which scans the uploadfiles.io website for random files. The idea came from a friend on Discord, and a number of other users have expressed interest and excitement about the project. At the moment, we’ve got a working proof of concept that can find random files, and we’ve discovered all kinds of things, including résumés, viruses, a PDF of a Playboy from 1956, password dumps, high-resolution maps of prospective urban development, and even a .zip file full of pictures of men’s dress shoes.
So far, the project has been mostly just for entertainment, but we’ve found a few things that could be dangerous, and we’ve reported our findings to the uploadfiles.io administrators. The next step would be to clean up the code a bit, and use DisPro to allow multiple scanning nodes to communicate with a central repository. With over 3 million files on the site, and 60.4 million possible URL combinations, we don’t intend to find everything, but we might be able to catalogue a significant chunk of the website. (Don’t worry, we’re being mindful of the bandwidth we consume, so the impact on uploadfiles.io is minimal.)
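To put those numbers in perspective, here is a quick back-of-the-envelope sketch. It uses only the two figures quoted above (roughly 3 million files and 60.4 million possible URLs) and assumes files are spread evenly across the URL space, which may not be true in practice. As an aside, 60.4 million is close to 36^5 = 60,466,176, which is what five-character IDs drawn from lowercase letters and digits would give; that is only a guess at the URL scheme, not something confirmed here.

```python
# Rough estimate of how often a randomly chosen URL points at a real file.
# Both figures are the approximate numbers quoted in the post.
total_urls = 60_400_000   # possible URL combinations (approximate)
known_files = 3_000_000   # "over 3 million", so treat this as a lower bound

# Assumption: files are uniformly distributed over the URL space.
hit_rate = known_files / total_urls
tries_per_hit = 1 / hit_rate

# Guess (not confirmed): 5-character IDs over [a-z0-9] would give 36**5 URLs.
guessed_url_space = 36 ** 5

print(f"Estimated hit rate: {hit_rate:.2%}")
print(f"Average random tries per file found: {tries_per_hit:.1f}")
print(f"36**5 = {guessed_url_space:,}")
```

In other words, under those assumptions a scanner picking URLs at random would land on a file about once in every twenty tries.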
Once we’ve incorporated distributed processing into the application, we can expand its scope to cover additional services, such as hastebin. But that’s a ways down the road.
As I was working on TorSpider last year, I kept thinking how useful it would be to have the capability to perform a port scan on discovered services. Yet with the project growing in scale and complexity, I never got around to implementing the functionality. This year, having decided to revisit the TorSpider project, I concluded that one important way to improve the code would be to make it more modular, splitting some of the functionality off into smaller modules that could be imported into the project, making the code cleaner and more readable.
I saw this as an opportunity to tackle the port-scanning problem. By creating a Python module that can perform port scans through SOCKS proxies, I could add port-scanning functionality to the TorSpider project without cluttering up the code, and I could enable others to use this functionality in their own projects. So I created SockScan. At the moment it is a fully functional, SOCKS-enabled, TCP full-connect port scanner, but it hasn’t yet been packaged as a module that can be imported into other projects. This is the next step in the project, and as with DisPro, I expect I’ll learn a lot from the task.
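For readers unfamiliar with the term, a TCP full-connect scan simply tries to complete a normal TCP connection to each port and records whether it succeeded. The snippet below is a minimal sketch of that idea using only the standard library; it is not SockScan's actual code. The function name `scan_ports` and the `socket_factory` parameter are made up for illustration, and the sketch connects directly rather than through a proxy.

```python
import socket

def scan_ports(host, ports, timeout=2.0, socket_factory=socket.socket):
    """Try a full TCP connection to each port; return {port: is_open}.

    socket_factory is a hypothetical hook: any callable that accepts
    (family, type) and returns a socket-like object could be passed in,
    for example a SOCKS-aware socket class.
    """
    results = {}
    for port in ports:
        sock = socket_factory(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            # A completed handshake means something is listening.
            sock.connect((host, port))
            results[port] = True
        except OSError:
            # Refused, timed out, unreachable, etc. are all treated as closed.
            results[port] = False
        finally:
            sock.close()
    return results

if __name__ == "__main__":
    # Example: check a few common ports on the local machine.
    for port, is_open in scan_ports("127.0.0.1", [22, 80, 443], timeout=0.5).items():
        print(port, "open" if is_open else "closed")
```

To route the scan through a SOCKS proxy such as Tor, one option would be to pass a SOCKS-capable socket class as `socket_factory`, for instance `socks.socksocket` from the third-party PySocks package after configuring a default proxy with `socks.set_default_proxy`. That pairing is a suggestion for how the sketch could be extended, not a description of how SockScan does it.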
As mentioned above, I spent many months last year working on a project called TorSpider, which was a distributed application designed to spider and scan the Tor hidden services network, uncovering as much information as it could find. With the extensive help of an enthusiastic friend, I brought the project to a nearly complete first release, but as the project grew in complexity, it became harder and harder for me to understand what I was working on. We began using new tools, like Flask and SQLAlchemy, that I’d never seen before, and with my overly tight work schedule, I found it impossible to keep working on the code.
I never gave up on the project, however, and now, after a year of studying and improving my Python skills, I’m ready to jump back in and tackle the TorSpider project once more. As mentioned above, I’ve decided to take a new approach to the project, offloading a great deal of the code into smaller outside modules, such as SockScan and the DisPro framework. And I intend to take a much more considered, deliberate approach to the system’s design this time, with significantly cleaner and more readable code.
At the moment, the reboot of the TorSpider project is still closed-source, and it will likely remain so for a good while as I tinker away at it. However, it has always been my intention to open-source most if not all of the code, and as I work on the project I’ll be publishing blog posts explaining the various techniques I’m using. So keep an eye on this space!
There are a lot of exciting things on the horizon! I’m currently in a period of transition in my career, so for the next few weeks things will be a bit crazy, but before long I’ll be back on the road and writing code (and blog posts) more regularly.
Until next time, happy hacking!