[A Way In…]

Their perimeter was tighter than a clam with lockjaw. Sal knew she could crack it, with enough time, but time was in scarce supply. Her employer needed results, and fast. Sweat glistened on her forehead as she sat in her car outside the data center, wracking her brain to think of a way in.

Breathe, she told herself. Think this through. There's always a way.

A man stepped out of the building, slinging a messenger bag over his shoulder as he strolled toward the parking lot.

Nice bag, she thought. Black leather. Shiny silver clasp. I'd love a bag like that. Carry my laptop in style.

She glanced at her ratty old backpack with distaste, but the look suddenly faded as realization struck her: He takes his laptop home with him. And he's likely not the only one. In fact, there's a good chance plenty of employees work from home… Not such a rare thing these days. And nobody's home network is locked so tight as this place. Not even mine, she mused.

New game plan. Send a phishing email to the whole company after business hours. Snag some IPs and OS information. Gain access to their laptop while they're at home, gank their VPN credentials, maybe drop in a backdoor. Easy peasy.

Time to get to work.


0: Introduction

Whether you're a hacker exploiting vulnerable home networks to pivot into secure corporate data centers, a government agency running a sting operation on a terrorist organization, or just a curious admin interested in learning more about your website's visitors, being able to collect data from incoming HTTP requests is a useful skill to have. Hackers have used this skill for data extraction in XSS and phishing attacks. File hosts have used this skill to track which IPs download which files. Web developers have used this skill to custom-tailor their sites for specific browsers and operating systems. And today, I'm going to show you how you can employ this technique using Flask in Python 3.

Requirements

  • Python 3.5 or later
  • Flask
  • cheroot (WSGI-compliant Python web server module)

Set Up

Before we start development, let's create our virtual environment. In the console of your choice, navigate to the directory in which you wish to craft your code, then run the following:

python3 -m virtualenv venv

If that doesn't work, make sure you've got virtualenv installed. You can also try the following command:

virtualenv --python=python3 venv

This will create a virtual environment (virtualenv) in the venv folder. To activate the virtualenv on Windows, type:

venv\Scripts\activate.bat

If you're using OSX or Linux, instead type:

source venv/bin/activate

Once you've activated the virtualenv, you should see (venv) preceding your command prompt. Any time you're going to work on your Python source code in this project, you'll want to activate your virtualenv. To deactivate it, you can simply type deactivate on OSX or Linux, or use the venv\Scripts\deactivate.bat command in Windows. For now, though, let's leave it active.

The next step is to install the required Python libraries. To do this, use the following command:

pip install flask cheroot

This command will install the Flask and cheroot libraries into your active virtualenv. Once this process is complete, we are ready to begin development!

(You can learn more about Python virtual environments from the official Python documentation.)


1: Create a simple Flask application.

Flask is a powerful and robust library for creating Python microservices. For the scope of this tutorial, we'll only scratch the surface of this library, but you should definitely take a closer look if you're interested in using Python for web development.

To get started, let's create a simple Flask application. We'll call it flask_data_grabber.py:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def grab_data():
    return "Hello, world!", 200

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000)

Basically, this script says that when a visitor loads the root directory of this web server, the script will return the words Hello, world! with an HTTP status code of 200 (success). (For a more robust explanation, please read the Flask documentation.) To execute this script, simply type the following in your activated virtualenv command-line:

python flask_data_grabber.py

You'll see output like the following:

(venv) user@computer $ python flask_data_grabber.py
* Serving Flask app "flask_data_grabber" (lazy loading)
* Environment: production
  WARNING: Do not use the development server in a production environment.
  Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

To test this server, you can visit http://127.0.0.1:5000/, where you'll see Hello, world! on an otherwise blank page. If this is not the case, please revisit the previous steps and make sure you didn't miss anything. If you need further help, check the Flask documentation or Stack Overflow for assistance.

To stop your server, simply press CTRL-C.


2: Accessing the HTTP request data.

Now that we've got our Flask server up and running, we can get to work on extracting valuable visitor data from the HTTP request. First, we'll use Flask's request object to access the HTTP request data. The data we're looking for is stored in the request.environ dictionary. There are two primary entries in this dictionary that interest us: HTTP_USER_AGENT and REMOTE_ADDR. The former contains information about the user's browser and operating system, and the latter contains the IP address of whoever connected to the server. Let's see how we can access this data by modifying flask_data_grabber.py:

from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def grab_data():
    # Use .get() so a request without a User-Agent header doesn't raise KeyError.
    user_agent = request.environ.get("HTTP_USER_AGENT", "")
    ip = request.environ["REMOTE_ADDR"]
    print(" ! Connection logged:\n"
          "    User Agent: {}\n"
          "    IP: {}".format(user_agent, ip))
    return "Hello, world!", 200

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000)

We've made a few modifications to this script. First, we added the request object to our imports. Next, we defined the user_agent and ip variables, pulling from the request.environ dictionary. Finally, we've added a print statement that tells us all about our visitor.

When we run this updated code, the visitors to the page will see no difference from the previous version of the site. However, on the server side of things, you'll find that the server logs each visitor's user agent and IP:

[...]
 ! Connection logged:
    User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
    IP: 192.168.1.105
[...]

Looking at this information, we can see that this particular visitor came from 192.168.1.105 and is using the Chrome browser on a Mac. Now we're getting somewhere!


3: Pulling useful information from the user agent string.

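So now we've got the IP and the user agent string. But the latter is awfully long… Wouldn't it be nice if we could automatically parse it to extract the useful information? Fortunately, this is easily accomplished using Werkzeug, the library Flask is built on. (Note that this built-in parser ships only with Werkzeug releases before 2.1; it was deprecated in 2.0 and removed afterward, so on a newer install you'll need to pin an older Werkzeug or use a dedicated parsing library.) Here's our updated flask_data_grabber.py: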

from flask import Flask, request
from werkzeug.useragents import UserAgent

app = Flask(__name__)

@app.route("/")
def grab_data():
    # Use .get() so a request without a User-Agent header doesn't raise KeyError.
    user_agent = UserAgent(request.environ.get("HTTP_USER_AGENT", ""))
    os = user_agent.platform
    browser = user_agent.browser
    browser_version = user_agent.version
    ip = request.environ["REMOTE_ADDR"]
    print(" ! Connection logged:\n"
          "    OS: {}\n"
          "    Browser: {} (v{})\n"
          "    IP: {}".format(os, browser, browser_version, ip))
    return "Hello, world!", 200

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000)

Here you can see we've added an import for the UserAgent class from the werkzeug.useragents module. We've also updated our user_agent definition to reference this class, passing the user agent string to the class as an initialization variable. Finally, we've extracted the operating system, browser, and browser version information from the new user_agent object. When this is all output by the updated print statement, you'll see something like the following when a visitor loads the page:

 ! Connection logged:
    OS: macos
    Browser: chrome (v71.0.3578.98)
    IP: 192.168.1.105

Now we can see clearly that this user is running the macos operating system (OS X) and using the chrome browser, version 71.0.3578.98. And, as always, we can see their IP address.
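To get a feel for what a user agent parser does under the hood, here is a minimal sketch using only the standard library. This is not Werkzeug's implementation: the parse_user_agent function and both pattern tables are hypothetical names invented for illustration, and they only recognize a handful of common platforms and browsers.

```python
import re

# Order matters: iPhone UAs mention "Mac OS X" and Android UAs mention "Linux".
PLATFORM_PATTERNS = [
    ("windows", r"windows"),
    ("iphone", r"iphone"),
    ("android", r"android"),
    ("macos", r"mac os x|macintosh"),
    ("linux", r"linux"),
]

# Order matters here too: Edge UAs mention Chrome, and Chrome UAs mention Safari.
BROWSER_PATTERNS = [
    ("edge", r"edge?/([\d.]+)"),
    ("chrome", r"chrome/([\d.]+)"),
    ("firefox", r"firefox/([\d.]+)"),
    ("safari", r"version/([\d.]+).*safari"),
]


def parse_user_agent(ua):
    """Return (platform, browser, version); parts we can't identify are None."""
    platform = None
    for name, pattern in PLATFORM_PATTERNS:
        if re.search(pattern, ua, re.IGNORECASE):
            platform = name
            break
    for name, pattern in BROWSER_PATTERNS:
        match = re.search(pattern, ua, re.IGNORECASE)
        if match:
            return platform, name, match.group(1)
    return platform, None, None
```

The ordering of the tables is the interesting design choice: most user agent strings name several browsers for compatibility reasons, so the most specific pattern has to be tried first.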

Information like this is quite valuable! Software developers who release applications for various platforms can use this information to provide a link to the appropriate downloads for the user's OS. Web developers can custom-tailor their website to account for the user's OS and browser. And hackers, of course, can use this information to learn more about their targets, or even to custom-tailor attack code on the fly. For example, if you know that a particular version of Firefox is vulnerable to attack, you could detect whether that browser is being used and automatically choose whether to send the attack or hold back, depending on each visitor's browser and version.
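Tailoring a response this way mostly comes down to two small operations: comparing version strings numerically and looking up something by platform. Here's a rough sketch of both. The function names, the DOWNLOADS table, and the example.com URLs are all hypothetical placeholders, not part of Flask or Werkzeug.

```python
def version_tuple(version):
    """Turn '71.0.3578.98' into (71, 0, 3578, 98) for numeric comparison."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def is_older_than(version, minimum):
    """Compare dotted versions as numbers; as plain strings, '9.0' > '71.0'."""
    return version_tuple(version) < version_tuple(minimum)


# Hypothetical download locations, keyed by the platform names used above.
DOWNLOADS = {
    "windows": "https://example.com/downloads/app-setup.exe",
    "macos": "https://example.com/downloads/app.dmg",
    "linux": "https://example.com/downloads/app.tar.gz",
}


def download_link(platform):
    """Pick the download for this platform, or a generic page if unknown."""
    return DOWNLOADS.get(platform, "https://example.com/downloads/")
```

With this in place, something like is_older_than(browser_version, "72.0") could decide which branch of your handler runs for a given visitor.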


4: Working behind a proxy.

So far we've been logging direct connections to our Flask test server. In many cases, however, it is best to deploy your applications behind a proxy server such as nginx or lighttpd. Yet putting your script behind a proxy may have some unintended consequences! Let's take a look.

Working behind nginx.

Nginx is a fantastic proxy and load balancing server designed for production use. There are fairly simple instructions online about how to set up a reverse proxy, and this guide assumes that you've set everything up to pass request headers, as detailed in those instructions.

If we were to run our script behind this proxy as-is, we'd see the following output anytime someone visited the page:

! Connection logged:
   OS: macos
   Browser: chrome (v71.0.3578.98)
   IP: 127.0.0.1

Note that the IP is listed as 127.0.0.1. This is because the user isn't connecting to the Python script directly, but is actually routing their connection through the nginx proxy. When the proxy connects to the Python script, the script records the nginx IP instead of the user's IP. Thus, the script shows 127.0.0.1, the local host IP.

If you followed the nginx reverse proxy instructions linked above, however, you'll recall that nginx passes along the original client's IP in the X-Real-IP header. To access this header, we can modify our code quite simply:

[...]
    ip = request.environ["REMOTE_ADDR"]
    if request.environ.get("HTTP_X_REAL_IP") is not None:
        ip = request.environ["HTTP_X_REAL_IP"]
[...]

As you can see, we added a couple of lines after the ip declaration. These lines check to see whether the HTTP_X_REAL_IP value has been set, which corresponds to the X-Real-IP header passed by nginx. If that value is set, then we set the ip to that value instead. In this manner, we are able to get the user's true IP, even if we're behind the nginx reverse proxy.
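If you're wondering how X-Real-IP turns into HTTP_X_REAL_IP, that's the naming convention WSGI servers follow when they copy request headers into the environ dictionary. The sketch below shows the rule; environ_key is a hypothetical helper written for illustration, not something Flask provides.

```python
def environ_key(header_name):
    """Map an HTTP header name to the key a WSGI server uses in environ."""
    key = header_name.upper().replace("-", "_")
    # By convention, these two headers are stored without the HTTP_ prefix.
    if key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return key
    return "HTTP_" + key
```

So environ_key("User-Agent") gives the HTTP_USER_AGENT key we've been reading all along, and any other header a proxy adds can be found the same way.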

Working behind lighttpd.

Lighttpd is another great proxy and load balancing web server. Their website features instructions for setting up a reverse proxy. For our purposes in writing this tutorial, we used a fresh Debian install and installed a stock lighttpd instance, then simply made the following changes to /etc/lighttpd/lighttpd.conf:

  1. In the server.modules section, we added "mod_proxy", to the bottom of the list.
  2. At the bottom of the file, we added a new line: proxy.server = ( "" => (( "host" => "127.0.0.1", "port" => 5000 )) )

Once these changes were made, we restarted the lighttpd service, enabling the reverse proxy.

As with nginx, when a visitor tries to load your Python app through the proxy, the app will recognize the client's IP as 127.0.0.1 due to the connection being routed through the lighttpd proxy. However, much like nginx, lighttpd includes a special header which refers back to the client's real IP. By default, this header is called X-Forwarded-For, which shows up in request.environ as HTTP_X_FORWARDED_FOR. To add this to our script, we need only add two new lines:

[...]
    ip = request.environ["REMOTE_ADDR"]
    if request.environ.get("HTTP_X_REAL_IP") is not None:
        ip = request.environ["HTTP_X_REAL_IP"]
    elif request.environ.get("HTTP_X_FORWARDED_FOR") is not None:
        ip = request.environ["HTTP_X_FORWARDED_FOR"]
[...]

In this manner, we have created a script that will work regardless of whether it is run on its own or behind an nginx or lighttpd reverse proxy.
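One wrinkle worth knowing about: when a request passes through more than one proxy, X-Forwarded-For can carry a comma-separated list of addresses rather than a single IP. Below is a minimal sketch of a more defensive lookup, assuming the leftmost entry is the original client. The client_ip name is hypothetical, and keep in mind that a client can send these headers itself, so they're only trustworthy when the request really arrived through a proxy you control.

```python
import ipaddress


def client_ip(environ):
    """Best-effort client IP from proxy headers, falling back to REMOTE_ADDR."""
    candidates = [environ.get("HTTP_X_REAL_IP")]
    forwarded = environ.get("HTTP_X_FORWARDED_FOR")
    if forwarded:
        # The header may list several hops: "client, proxy1, proxy2".
        candidates.append(forwarded.split(",")[0])
    for candidate in candidates:
        if not candidate:
            continue
        try:
            # Reject anything that isn't a syntactically valid IPv4/IPv6 address.
            return str(ipaddress.ip_address(candidate.strip()))
        except ValueError:
            continue
    return environ.get("REMOTE_ADDR")
```

Inside grab_data(), this could stand in for the if/elif chain by calling client_ip(request.environ).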


5: Logging to a file.

Throughout this tutorial we've been printing all of our output to the command-line. But if you're going to be harvesting user data for any length of time, you probably don't want to have to sit around with your console open and watch for updates. A better solution would be to write our data to a logfile. Let's update our code to implement this change.

First, below the line app = Flask(__name__), add the following line:

json_form = '"ip": "{}", "os": "{}", "browser": "{}", "browser_version": "{}"'

Next, remove the print statement from the file and replace it with the following:

[...]
    with open("access.log", "a") as f:
        f.write(
            "{" + json_form.format(ip, os, browser, browser_version) + "}\n"
        )
[...]

With these changes made, your script will now log all connection information into a file called access.log, in JSON format.
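Building the line with string formatting works until one of the values contains a quote character, at which point the line is no longer parseable. As an alternative, here's a sketch that lets Python's json module handle the escaping and opens the file in append mode so earlier entries are kept. The log_visit and read_visits names are hypothetical helpers, not part of the script above.

```python
import json


def log_visit(path, ip, os_name, browser, browser_version):
    """Append one visit to the log as a single JSON object on its own line."""
    record = {
        "ip": ip,
        "os": os_name,
        "browser": browser,
        "browser_version": browser_version,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")


def read_visits(path):
    """Load every logged visit back into a list of dictionaries."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

Writing one object per line keeps the file easy to append to and easy to read back later, since each line can be parsed on its own.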


6: Preparing the code for production.

Flask comes with its own built-in web server for testing, but it is not intended for production use. In order to prepare our server for production use, we'll be employing the cheroot library, the web server that powers CherryPy, a minimalist Python web framework. Cheroot provides a simple, WSGI-compliant web server from which we can run our Flask application. (For more information about WSGI, check out this Wikipedia article.)

To update the code for use with cheroot, make the following changes:

First, below the werkzeug import, add the following two lines:

from cheroot.wsgi import PathInfoDispatcher
from cheroot.wsgi import Server as WSGIServer

Next, update the last section of the script to read as follows:

if __name__ == "__main__":
    dispatcher = PathInfoDispatcher({"/": app})
    server = WSGIServer(("0.0.0.0", 5000), dispatcher)
    try:
        print(" * Server starting:\thttp://0.0.0.0:5000/")
        server.start()
    except KeyboardInterrupt:
        print(" * Halting server...")
        server.stop()

These additions will employ cheroot as the webserver instead of Flask's built-in testing server, making this application a lot more stable in production.
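The reason we can hand our Flask app to cheroot at all is that both sides speak WSGI: the application is just a callable that takes an environ dictionary and a start_response function. Here's a bare-bones sketch of that interface using only the standard library. The hello_app and call_app names are hypothetical, and this is an illustration of the protocol rather than of how Flask or cheroot are implemented internally.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A tiny WSGI application: send a status, headers, then the body."""
    body = b"Hello, world!"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, environ=None):
    """Play the role of the server: call the app once and collect its reply."""
    environ = dict(environ or {})
    # Fill in the request keys a real server would normally provide.
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

A server like cheroot does essentially what call_app does here, once per incoming request, except that it builds environ from a real network connection.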


Conclusion

So there you have it! A complete Flask server in Python which can detect incoming connections and log their IPs, OSes, browsers and browser versions. Here is the complete code, ready for use (I've added comments for clarity):

"""Extract data from incoming HTTP requests."""

from cheroot.wsgi import PathInfoDispatcher
from cheroot.wsgi import Server as WSGIServer
from flask import Flask, request
from werkzeug.useragents import UserAgent

app = Flask(__name__)

json_form = '"ip": "{}", "os": "{}", "browser": "{}", "browser_version": "{}"'


@app.route("/")
def grab_data():
    """Extract important data from incoming HTTP requests."""
    # Extract the user agent using werkzeug's UserAgent class.
    user_agent = UserAgent(request.environ.get("HTTP_USER_AGENT", ""))
    # Determine the user's operating system.
    os = user_agent.platform
    # Determine what browser they're using.
    browser = user_agent.browser
    # Determine their browser version.
    browser_version = user_agent.version
    # Collect their IP address.
    ip = request.environ["REMOTE_ADDR"]
    if request.environ.get("HTTP_X_REAL_IP") is not None:
        # If we're behind nginx, grab the user's IP from passed headers.
        ip = request.environ["HTTP_X_REAL_IP"]
    elif request.environ.get("HTTP_X_FORWARDED_FOR") is not None:
        # If we're behind lighttpd, grab the user's IP from passed headers.
        ip = request.environ["HTTP_X_FORWARDED_FOR"]
    with open("access.log", "a") as f:
        # Append the pertinent information to the access log.
        f.write(
            "{" + json_form.format(ip, os, browser, browser_version) + "}\n"
        )
    # Return the Hello, world! message to the user.
    return "Hello, world!", 200


if __name__ == "__main__":
    # Set up the cheroot dispatcher and WSGI server.
    dispatcher = PathInfoDispatcher({"/": app})
    server = WSGIServer(("0.0.0.0", 5000), dispatcher)
    try:
        # Attempt to start the webserver.
        print(" * Server starting:\thttp://0.0.0.0:5000/")
        server.start()
    except KeyboardInterrupt:
        # Stop the server cleanly.
        print(" * Halting server...")
        server.stop()

There's still plenty of room for improvement here… You might want to make the server return something other than “Hello, world!” every time it's visited, or you might want to use the information right away rather than log it to a text file. Or you might want to change it so that the script only triggers when a particular file is downloaded. Perhaps you could update the script to send you a text message alert when credentials are collected. I'll leave it to you to decide where to go next.
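As one possible starting point for using the information right away, here's a sketch of an in-memory tally that counts visits per IP and per browser as they arrive. The VisitTally class is a hypothetical helper, and the lock is there on the assumption that the server handles requests on multiple threads.

```python
import threading
from collections import Counter


class VisitTally:
    """Hypothetical in-memory tally of visits, keyed by IP and by browser."""

    def __init__(self):
        self._lock = threading.Lock()
        self.by_ip = Counter()
        self.by_browser = Counter()

    def record(self, ip, browser):
        """Count one visit and return how many times this IP has been seen."""
        with self._lock:
            self.by_ip[ip] += 1
            self.by_browser[browser or "unknown"] += 1
            return self.by_ip[ip]
```

A return value of 1 from record() means this is the first time that IP has shown up, which is a natural place to hook in an alert.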

Happy hacking!