dirsearch.py

One of the first steps when pentesting a website should be scanning for hidden directories. It is essential for finding valuable information or potential attack vectors that might otherwise go unseen on the public-facing site. There are many tools out there that will perform the brute-forcing process, but not all are created equal.

Dirsearch is a tool written in Python used to brute-force hidden web directories and files. It can run on Windows, Linux, and macOS, and it offers a simple yet powerful command-line interface. With features such as multithreading, proxy support, request delaying, user agent randomization, and support for multiple extensions, dirsearch is a strong contender in the directory scanner arena.

DirBuster is often thought of as the de facto brute-force scanner, but it is written in Java and only offers a GUI, which can make it clunky to use. Dirsearch is command-line only, and because it is written in Python, it is easier to integrate into scripts and other existing projects. DIRB is another popular directory scanner, but it lacks multithreading, making dirsearch the clear winner when it comes to speed.

Dirsearch shines when it comes to recursive scanning: for every directory it finds, it goes back and scans that directory for any additional directories and files. Recursive scanning, along with its speed and simple command-line usage, makes dirsearch a powerful tool that every hacker and pentester should know how to use.

Below, we will be using DVWA on Metasploitable 2 as the target, and Kali Linux as our local machine. You can use a similar setup if you want to follow along.

Installing Dirsearch

The first thing we need to do is install dirsearch from GitHub. The easiest way to do this is with git. If git is not already installed on your system, install it with the following command in the terminal:

~# apt-get update && apt-get install git

Now we can use the git clone command to clone the repository where the tool is located:

~# git clone https://github.com/maurosoria/dirsearch

Cloning into 'dirsearch'...
remote: Enumerating objects: 26, done.
remote: Counting objects: 100% (26/26), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 1661 (delta 7), reused 13 (delta 4), pack-reused 1635
Receiving objects: 100% (1661/1661), 17.70 MiB | 7.04 MiB/s, done.
Resolving deltas: 100% (954/954), done.

Next, change into the newly created directory with the cd command:

~# cd dirsearch/

And use ls to verify everything is there:

~/dirsearch# ls

CHANGELOG.md  db  default.conf  dirsearch.py  lib  logs  README.md  reports  thirdparty

Configuring Dirsearch

With the installation out of the way, we can now run dirsearch, and we can do so in a few different ways.

1. Run Dirsearch Using Python

The first is to simply run it with Python, although it needs Python 3 to work correctly. We can see below that, when run without arguments, it prints a brief message telling us we need to specify a target URL (we’ll get to that soon).

~/dirsearch# python3 dirsearch.py

URL target is missing, try using -u <url>

2. Run Dirsearch Using Bash

The next way we can run dirsearch is to execute the script directly from the Bash prompt. Using ls -la will give us the permissions of everything in this directory, and we can see from the x bits on dirsearch.py that the tool is executable.

~/dirsearch# ls -la

total 52
drwxr-xr-x  8 root root 4096 Jul  8 12:35 .
drwxr-xr-x 31 root root 4096 Jul  8 12:36 ..
-rw-r--r--  1 root root 1426 Jul  8 12:35 CHANGELOG.md
drwxr-xr-x  2 root root 4096 Jul  8 12:35 db
-rw-r--r--  1 root root  403 Jul  8 12:35 default.conf
-rwxr-xr-x  1 root root 1352 Jul  8 12:35 dirsearch.py
drwxr-xr-x  8 root root 4096 Jul  8 12:35 .git
-rw-r--r--  1 root root  109 Jul  8 12:35 .gitignore
drwxr-xr-x  9 root root 4096 Jul  8 12:36 lib
drwxr-xr-x  2 root root 4096 Jul  8 12:35 logs
-rw-r--r--  1 root root 1376 Jul  8 12:35 README.md
drwxr-xr-x  2 root root 4096 Jul  8 12:35 reports
drwxr-xr-x  8 root root 4096 Jul  8 12:36 thirdparty

So all we have to do to run it is prefix the file name with dot-slash, which is the relative path to a file in the current directory:

~/dirsearch# ./dirsearch.py

URL target is missing, try using -u <url>

3. Run Dirsearch Using a Symbolic Link

The last way to run dirsearch, which is my preferred method, is to create a symbolic link in the /bin directory. This will allow us to run the tool from anywhere, as opposed to only in the directory cloned from GitHub.

First, change into the /bin directory:

~/dirsearch# cd /bin/

Then, create a symbolic link to the tool using the ln -s command:

/bin# ln -s ~/dirsearch/dirsearch.py dirsearch

Here I am naming the link dirsearch, so typing dirsearch in the terminal will now run the tool from any directory. Now, let’s go back to our home directory before we go any further:

/bin# cd

Scanning with Dirsearch

Now when we type dirsearch in the terminal, we get the same usage message from before:

~# dirsearch

URL target is missing, try using -u <url>

To get a more detailed usage example and the full help menu, use the -h flag:

~# dirsearch -h

Usage: dirsearch [-u|--url] target [-e|--extensions] extensions [options]

Options:
  -h, --help            show this help message and exit

  Mandatory:
    -u URL, --url=URL   URL target
    -L URLLIST, --url-list=URLLIST
                        URL list target
    -e EXTENSIONS, --extensions=EXTENSIONS
                        Extension list separated by comma (Example: php,asp)

  Dictionary Settings:
    -w WORDLIST, --wordlist=WORDLIST
    -l, --lowercase
    -f, --force-extensions
                        Force extensions for every wordlist entry (like in
                        DirBuster)

  General Settings:
    -s DELAY, --delay=DELAY
                        Delay between requests (float number)
    -r, --recursive     Bruteforce recursively
    -R RECURSIVE_LEVEL_MAX, --recursive-level-max=RECURSIVE_LEVEL_MAX
                        Max recursion level (subdirs) (Default: 1 [only
                        rootdir + 1 dir])
    --suppress-empty, --suppress-empty
    --scan-subdir=SCANSUBDIRS, --scan-subdirs=SCANSUBDIRS
                        Scan subdirectories of the given -u|--url (separated
                        by comma)
    --exclude-subdir=EXCLUDESUBDIRS, --exclude-subdirs=EXCLUDESUBDIRS
                        Exclude the following subdirectories during recursive
                        scan (separated by comma)
    -t THREADSCOUNT, --threads=THREADSCOUNT
                        Number of Threads
    -x EXCLUDESTATUSCODES, --exclude-status=EXCLUDESTATUSCODES
                        Exclude status code, separated by comma (example: 301,
                        500)
    -c COOKIE, --cookie=COOKIE
    --ua=USERAGENT, --user-agent=USERAGENT
    -F, --follow-redirects
    -H HEADERS, --header=HEADERS
                        Headers to add (example: --header "Referer:
                        example.com" --header "User-Agent: IE"
    --random-agents, --random-user-agents

  Connection Settings:
    --timeout=TIMEOUT   Connection timeout
    --ip=IP             Resolve name to IP address
    --proxy=HTTPPROXY, --http-proxy=HTTPPROXY
                        Http Proxy (example: localhost:8080
    --http-method=HTTPMETHOD
                        Method to use, default: GET, possible also: HEAD;POST
    --max-retries=MAXRETRIES
    -b, --request-by-hostname
                        By default dirsearch will request by IP for speed.
                        This forces requests by hostname

  Reports:
    --simple-report=SIMPLEOUTPUTFILE
                        Only found paths
    --plain-text-report=PLAINTEXTOUTPUTFILE
                        Found paths with status codes
    --json-report=JSONOUTPUTFILE

We can see that this tool has a ton of options and potential configuration settings, but in this tutorial, we will focus on some of the more important ones.
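Since everything is driven by command-line flags, dirsearch is straightforward to wrap from a script. The following is a minimal sketch, not part of dirsearch itself: build_dirsearch_cmd is a hypothetical helper name, it assumes the dirsearch symbolic link created earlier is on the PATH, and it only uses flags that appear in the help menu above (-u, -e, -x, -w, -r, -R, -t).

```python
import shlex


def build_dirsearch_cmd(url, extensions, exclude_status=(), wordlist=None,
                        recursive=False, max_depth=None, threads=None):
    """Return an argument list for dirsearch (hypothetical helper)."""
    cmd = ["dirsearch", "-u", url, "-e", ",".join(extensions)]
    if exclude_status:
        # -x takes a comma-separated list of status codes to leave out
        cmd += ["-x", ",".join(str(code) for code in exclude_status)]
    if wordlist:
        cmd += ["-w", wordlist]
    if recursive:
        cmd.append("-r")
        if max_depth is not None:
            cmd += ["-R", str(max_depth)]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


cmd = build_dirsearch_cmd(
    "http://10.10.0.50/dvwa", ["php"],
    exclude_status=[403, 301, 302], recursive=True, max_depth=3,
)
# Print the command rather than running it here.
print(shlex.join(cmd))
```

To actually launch the scan, the returned list could be handed to subprocess.run(cmd). Building a list rather than a single string avoids shell quoting problems with unusual URLs or wordlist paths.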

At a minimum, dirsearch needs a URL and at least one file extension to run. For example, we can specify a valid URL with the -u flag, and a file extension to search for with the -e flag:

~# dirsearch -u http://10.10.0.50/dvwa -e php

 _|. _ _  _  _  _ _|_    v0.3.8
(_||| _) (/_(_|| (_| )

Extensions: php | HTTP method: get | Threads: 10 | Wordlist size: 6009

Error Log: /root/dirsearch/logs/errors-19-07-08_12-51-20.log

Target: http://10.10.0.50/dvwa

[12:51:20] Starting:
[12:51:21] 403 -  299B  - /dvwa/.ht_wsr.txt
[12:51:21] 403 -  292B  - /dvwa/.hta
[12:51:21] 403 -  301B  - /dvwa/.htaccess-dev
[12:51:21] 403 -  303B  - /dvwa/.htaccess-marco
[12:51:21] 403 -  303B  - /dvwa/.htaccess-local
[12:51:21] 403 -  301B  - /dvwa/.htaccess.BAK
[12:51:21] 403 -  302B  - /dvwa/.htaccess.bak1
[12:51:21] 403 -  301B  - /dvwa/.htaccess.old
[12:51:21] 403 -  302B  - /dvwa/.htaccess.save
[12:51:21] 403 -  304B  - /dvwa/.htaccess.sample
[12:51:21] 403 -  302B  - /dvwa/.htaccess.orig
[12:51:21] 403 -  303B  - /dvwa/.htaccess_extra
[12:51:21] 403 -  300B  - /dvwa/.htaccess_sc
[12:51:21] 403 -  300B  - /dvwa/.htaccessBAK
[12:51:21] 403 -  300B  - /dvwa/.htaccessOLD
[12:51:21] 403 -  298B  - /dvwa/.htaccess~
[12:51:21] 403 -  296B  - /dvwa/.htgroup
[12:51:21] 403 -  302B  - /dvwa/.htaccess_orig
[12:51:21] 403 -  301B  - /dvwa/.htpasswd-old
[12:51:21] 403 -  301B  - /dvwa/.htaccess.txt
[12:51:21] 403 -  298B  - /dvwa/.htpasswds
[12:51:21] 403 -  301B  - /dvwa/.htaccessOLD2
[12:51:21] 403 -  302B  - /dvwa/.htpasswd_test
[12:51:21] 403 -  296B  - /dvwa/.htusers
[12:51:26] 302 -    0B  - /dvwa/about.php  ->  login.php
[12:51:26] 302 -    0B  - /dvwa/about  ->  login.php
[12:51:29] 200 -    5KB - /dvwa/CHANGELOG.txt
[12:51:29] 200 -    5KB - /dvwa/CHANGELOG
[12:51:30] 301 -  319B  - /dvwa/config  ->  http://10.10.0.50/dvwa/config/
[12:51:30] 200 -  907B  - /dvwa/config/
[12:51:30] 200 -   32KB - /dvwa/COPYING
[12:51:31] 301 -  317B  - /dvwa/docs  ->  http://10.10.0.50/dvwa/docs/
[12:51:31] 200 -  918B  - /dvwa/docs/
[12:51:32] 200 -    1KB - /dvwa/dvwa/
[12:51:32] 200 -    1KB - /dvwa/favicon.ico
[12:51:36] 302 -    0B  - /dvwa/ids_log.php  ->  login.php
[12:51:36] 302 -    0B  - /dvwa/index  ->  login.php
[12:51:36] 302 -    0B  - /dvwa/index.php  ->  login.php
[12:51:36] 302 -    0B  - /dvwa/index.php/login/  ->  login.php
[12:51:37] 200 -    1KB - /dvwa/login
[12:51:37] 200 -    1KB - /dvwa/login/cpanel.php
[12:51:38] 200 -    1KB - /dvwa/login.php
[12:51:38] 200 -    1KB - /dvwa/login/
[12:51:38] 200 -    1KB - /dvwa/login/administrator/
[12:51:38] 200 -    1KB - /dvwa/login/admin/
[12:51:38] 200 -    1KB - /dvwa/login/admin/admin.asp
[12:51:38] 200 -    1KB - /dvwa/login/cpanel/
[12:51:38] 200 -    1KB - /dvwa/login/login
[12:51:39] 200 -    1KB - /dvwa/login/index
[12:51:40] 200 -    1KB - /dvwa/login/super
[12:51:40] 302 -    0B  - /dvwa/logout  ->  login.php
[12:51:40] 302 -    0B  - /dvwa/logout/  ->  login.php
[12:51:41] 200 -  148B  - /dvwa/php.ini
[12:51:41] 200 -    1KB - /dvwa/login/oauth/
[12:51:42] 200 -    5KB - /dvwa/README
[12:51:42] 200 -    5KB - /dvwa/README.txt
[12:51:43] 200 -   26B  - /dvwa/robots.txt
[12:51:43] 302 -    0B  - /dvwa/phpinfo  ->  login.php
[12:51:44] 302 -    0B  - /dvwa/phpinfo.php  ->  login.php
[12:51:45] 302 -    0B  - /dvwa/security  ->  login.php
[12:51:46] 302 -    0B  - /dvwa/security/  ->  login.php
[12:51:46] 200 -    3KB - /dvwa/setup
[12:51:46] 200 -    3KB - /dvwa/setup.php
[12:51:46] 200 -    3KB - /dvwa/setup/

Task Completed

After it kicks off, dirsearch reports the extensions, HTTP method, number of threads, and wordlist size in use (in this case just the default wordlist). Then, it requests each candidate path and prints what it finds, including the timestamp, status code, response size, and path, plus the redirect target where there is one.
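If you want to post-process these results in a script, each result line can be split into its fields. The sketch below is an illustration based only on the line layout visible in the v0.3.8 output above; the regular expression and the parse_result name are my own assumptions, and other dirsearch versions may format lines differently.

```python
import re

# Matches result lines such as:
# [12:51:30] 301 -  319B  - /dvwa/config  ->  http://10.10.0.50/dvwa/config/
RESULT_LINE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2})\]\s+"
    r"(?P<status>\d{3})\s+-\s+"
    r"(?P<size>\d+(?:B|KB|MB))\s+-\s+"
    r"(?P<path>\S+)"
    r"(?:\s+->\s+(?P<redirect>\S+))?\s*$"
)


def parse_result(line):
    """Return a dict of fields, or None for banner and 'Starting:' lines."""
    match = RESULT_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


sample = "[12:51:26] 302 -    0B  - /dvwa/about.php  ->  login.php"
print(parse_result(sample))
print(parse_result("[12:51:20] Starting:"))
```

Lines without a redirect arrow come back with the redirect field set to None, so the same function handles both 200 and 30x entries.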

We can use the -x flag to exclude certain HTTP status codes. For example, let’s leave out any 403 codes:

~# dirsearch -u http://10.10.0.50/dvwa -e php -x 403

 _|. _ _  _  _  _ _|_    v0.3.8
(_||| _) (/_(_|| (_| )

Extensions: php | HTTP method: get | Threads: 10 | Wordlist size: 6009

Error Log: /root/dirsearch/logs/errors-19-07-08_12-53-21.log

Target: http://10.10.0.50/dvwa

[12:53:21] Starting:
[12:53:27] 302 -    0B  - /dvwa/about  ->  login.php
[12:53:27] 302 -    0B  - /dvwa/about.php  ->  login.php
[12:53:29] 200 -    5KB - /dvwa/CHANGELOG
[12:53:29] 200 -    5KB - /dvwa/CHANGELOG.txt
[12:53:30] 301 -  319B  - /dvwa/config  ->  http://10.10.0.50/dvwa/config/
[12:53:30] 200 -  907B  - /dvwa/config/
[12:53:30] 200 -   32KB - /dvwa/COPYING
[12:53:31] 301 -  317B  - /dvwa/docs  ->  http://10.10.0.50/dvwa/docs/
[12:53:31] 200 -  918B  - /dvwa/docs/
[12:53:31] 200 -    1KB - /dvwa/dvwa/
[12:53:32] 200 -    1KB - /dvwa/favicon.ico
[12:53:35] 302 -    0B  - /dvwa/index.php/login/  ->  login.php
[12:53:35] 302 -    0B  - /dvwa/ids_log.php  ->  login.php
[12:53:35] 302 -    0B  - /dvwa/index.php  ->  login.php
[12:53:35] 302 -    0B  - /dvwa/index  ->  login.php
[12:53:36] 200 -    1KB - /dvwa/login.php
[12:53:36] 200 -    1KB - /dvwa/login/admin/admin.asp
[12:53:36] 200 -    1KB - /dvwa/login
[12:53:36] 200 -    1KB - /dvwa/login/administrator/
[12:53:37] 200 -    1KB - /dvwa/login/admin/
[12:53:37] 200 -    1KB - /dvwa/login/cpanel/
[12:53:37] 200 -    1KB - /dvwa/login/
[12:53:37] 200 -    1KB - /dvwa/login/cpanel.php
[12:53:37] 200 -    1KB - /dvwa/login/login
[12:53:37] 200 -    1KB - /dvwa/login/index
[12:53:39] 200 -    1KB - /dvwa/login/super
[12:53:39] 200 -    1KB - /dvwa/login/oauth/
[12:53:39] 200 -  148B  - /dvwa/php.ini
[12:53:40] 302 -    0B  - /dvwa/logout  ->  login.php
[12:53:40] 302 -    0B  - /dvwa/logout/  ->  login.php
[12:53:41] 200 -    5KB - /dvwa/README
[12:53:41] 200 -    5KB - /dvwa/README.txt
[12:53:41] 200 -   26B  - /dvwa/robots.txt
[12:53:42] 302 -    0B  - /dvwa/phpinfo.php  ->  login.php
[12:53:43] 302 -    0B  - /dvwa/phpinfo  ->  login.php
[12:53:45] 302 -    0B  - /dvwa/security  ->  login.php
[12:53:45] 302 -    0B  - /dvwa/security/  ->  login.php
[12:53:45] 200 -    3KB - /dvwa/setup
[12:53:45] 200 -    3KB - /dvwa/setup.php
[12:53:46] 200 -    3KB - /dvwa/setup/

Task Completed

That can make the results a little cleaner, depending on what we are after. We can also specify multiple codes to exclude by separating them with commas.
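To see what a comma-separated exclusion list does to the results, here is a small sketch that models the effect of -x on a handful of entries from the first scan. It is an illustration of the filtering idea only, not dirsearch's own code; exclude_statuses is a hypothetical name.

```python
def exclude_statuses(results, excluded):
    """Drop (status, path) pairs whose status is in a comma-separated list."""
    codes = {int(code) for code in excluded.split(",") if code.strip()}
    return [(status, path) for status, path in results if status not in codes]


# A few (status, path) pairs taken from the first scan.
results = [
    (403, "/dvwa/.htaccess.bak1"),
    (302, "/dvwa/about.php"),
    (301, "/dvwa/config"),
    (200, "/dvwa/config/"),
    (200, "/dvwa/robots.txt"),
]

for status, path in exclude_statuses(results, "403,301,302"):
    print(status, path)
```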

We can tell dirsearch to use a wordlist of our choice by setting the -w flag:

~# dirsearch -u http://10.10.0.50/dvwa -e php -x 403,301,302 -w /usr/share/wordlists/wfuzz/general/common.txt

 _|. _ _  _  _  _ _|_    v0.3.8
(_||| _) (/_(_|| (_| )

Extensions: php | HTTP method: get | Threads: 10 | Wordlist size: 949

Error Log: /root/dirsearch/logs/errors-19-07-08_12-57-43.log

Target: http://10.10.0.50/dvwa

[12:57:43] Starting:
[12:57:47] 200 -    1KB - /dvwa/login
[12:57:47] 200 -    3KB - /dvwa/setup

Task Completed

We can see that it didn’t find as many results this time, which makes sense: this wordlist has only 949 entries compared to 6,009 in the default one, and we also excluded the 301 and 302 redirects.

The real power of dirsearch is its ability to perform recursive directory scanning. To run the recursive search, simply tack on the -r flag:

~# dirsearch -u http://10.10.0.50/dvwa -e php -x 403,301,302 -r

 _|. _ _  _  _  _ _|_    v0.3.8
(_||| _) (/_(_|| (_| )

Extensions: php | HTTP method: get | Threads: 10 | Wordlist size: 6009 | Recursion level: 1

Error Log: /root/dirsearch/logs/errors-19-07-08_13-00-35.log

Target: http://10.10.0.50/dvwa

[13:00:35] Starting:
[13:00:44] 200 -    5KB - /dvwa/CHANGELOG
[13:00:44] 200 -    5KB - /dvwa/CHANGELOG.txt
[13:00:44] 200 -  907B  - /dvwa/config/
[13:00:45] 200 -   32KB - /dvwa/COPYING
[13:00:45] 200 -  918B  - /dvwa/docs/
[13:00:46] 200 -    1KB - /dvwa/dvwa/
[13:00:46] 200 -    1KB - /dvwa/favicon.ico
[13:00:51] 200 -    1KB - /dvwa/login.php
[13:00:51] 200 -    1KB - /dvwa/login/admin/
[13:00:51] 200 -    1KB - /dvwa/login
[13:00:51] 200 -    1KB - /dvwa/login/administrator/
[13:00:51] 200 -    1KB - /dvwa/login/index
[13:00:51] 200 -    1KB - /dvwa/login/cpanel/
[13:00:52] 200 -    1KB - /dvwa/login/
[13:00:52] 200 -    1KB - /dvwa/login/admin/admin.asp
[13:00:52] 200 -    1KB - /dvwa/login/cpanel.php
[13:00:53] 200 -    1KB - /dvwa/login/login
[13:00:54] 200 -    1KB - /dvwa/login/oauth/
[13:00:55] 200 -  148B  - /dvwa/php.ini
[13:00:55] 200 -    1KB - /dvwa/login/super
[13:00:56] 200 -    5KB - /dvwa/README
[13:00:56] 200 -    5KB - /dvwa/README.txt
[13:00:57] 200 -   26B  - /dvwa/robots.txt
[13:01:00] 200 -    3KB - /dvwa/setup.php
[13:01:00] 200 -    3KB - /dvwa/setup/
[13:01:00] 200 -    3KB - /dvwa/setup
[13:01:02] Starting: config/
[13:01:10] 200 -  576B  - /dvwa/config/config.inc.php~
[13:01:12] 200 -    0B  - /dvwa/config/config.inc
[13:01:12] 200 -    0B  - /dvwa/config/config.inc.php

Once it completes the initial scan, it will go back through and scan each directory it found recursively. For instance, we can see it start scanning the docs directory:

[13:01:23] Starting: docs/
CTRL+C detected: Pausing threads, please wait...
[e]xit / [c]ontinue / [n]ext: n
[13:01:35] Starting: dvwa/
[13:01:47] 200 -    1KB - /dvwa/dvwa/includes/
[13:01:56] Starting: login/
CTRL+C detected: Pausing threads, please wait...
[e]xit / [c]ontinue / [n]ext: e

We can also pause the scan at any time with a keyboard interrupt (Ctrl+C). Pressing e will exit the scan completely, c will continue where it left off, and n will move on to the next directory. These options give us some control over the scan, since recursive scanning can often take quite some time.

Using it like this will only recursively search one level deep. To set the recursion level to a deeper value, use the -R flag followed by how many levels deep to go:

~# dirsearch -u http://10.10.0.50/dvwa -e php -x 403,301,302 -r -R 3

 _|. _ _  _  _  _ _|_    v0.3.8
(_||| _) (/_(_|| (_| )

Extensions: php | HTTP method: get | Threads: 10 | Wordlist size: 6009 | Recursion level: 3

Error Log: /root/dirsearch/logs/errors-19-07-08_13-04-30.log

Target: http://10.10.0.50/dvwa

[13:04:31] Starting:
[13:04:39] 200 -    5KB - /dvwa/CHANGELOG
[13:04:39] 200 -    5KB - /dvwa/CHANGELOG.txt
[13:04:40] 200 -  907B  - /dvwa/config/
[13:04:40] 200 -   32KB - /dvwa/COPYING
[13:04:41] 200 -  918B  - /dvwa/docs/
[13:04:41] 200 -    1KB - /dvwa/dvwa/
[13:04:41] 200 -    1KB - /dvwa/favicon.ico
[13:04:47] 200 -    1KB - /dvwa/login.php
[13:04:47] 200 -    1KB - /dvwa/login/cpanel.php
[13:04:47] 200 -    1KB - /dvwa/login
[13:04:47] 200 -    1KB - /dvwa/login/administrator/
[13:04:47] 200 -    1KB - /dvwa/login/admin/
[13:04:47] 200 -    1KB - /dvwa/login/
[13:04:47] 200 -    1KB - /dvwa/login/cpanel/
[13:04:47] 200 -    1KB - /dvwa/login/admin/admin.asp
[13:04:47] 200 -    1KB - /dvwa/login/index
[13:04:48] 200 -    1KB - /dvwa/login/login
[13:04:50] 200 -  148B  - /dvwa/php.ini
[13:04:50] 200 -    1KB - /dvwa/login/super
[13:04:50] 200 -    1KB - /dvwa/login/oauth/
[13:04:52] 200 -    5KB - /dvwa/README
[13:04:52] 200 -    5KB - /dvwa/README.txt
[13:04:52] 200 -   26B  - /dvwa/robots.txt
[13:04:55] 200 -    3KB - /dvwa/setup
[13:04:55] 200 -    3KB - /dvwa/setup.php
[13:04:56] 200 -    3KB - /dvwa/setup/
[13:04:57] Starting: config/
[13:05:06] 200 -  576B  - /dvwa/config/config.inc.php~
[13:05:08] 200 -    0B  - /dvwa/config/config.inc.php
[13:05:08] 200 -    0B  - /dvwa/config/config.inc
[13:05:18] Starting: docs/
[13:05:39] Starting: dvwa/
[13:05:51] 200 -    1KB - /dvwa/dvwa/includes/

We can see down the line, for example, that dirsearch starts scanning the buried includes directory:

[13:07:24] Starting: dvwa/includes/

Task Completed
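To make the recursion level concrete, here is a minimal sketch of a depth-limited recursive brute force. It is a simplified model rather than dirsearch's implementation: the fake site, the wordlist, and the function names are all made up for illustration, and a set lookup stands in for sending real HTTP requests. Level 1 means the root plus the directories found directly inside it, matching the default described in the help menu.

```python
from collections import deque

# Hypothetical stand-in for a web server: the set of paths that "exist".
FAKE_SITE = {
    "/app/admin/",
    "/app/admin/backup/",
    "/app/admin/backup/notes.txt",
    "/app/notes.txt",
}
WORDLIST = ["admin/", "backup/", "notes.txt"]


def exists(path):
    """Stand-in for requesting the path and checking the status code."""
    return path in FAKE_SITE


def recursive_scan(root, wordlist, max_depth):
    """Brute-force root, then rescan each found directory up to max_depth."""
    found = []
    queue = deque([(root, 0)])  # (directory to scan, recursion level)
    while queue:
        base, depth = queue.popleft()
        for word in wordlist:
            candidate = base + word
            if not exists(candidate):
                continue
            found.append(candidate)
            # Only directories are queued, and only while under the limit.
            if candidate.endswith("/") and depth < max_depth:
                queue.append((candidate, depth + 1))
    return found


print(recursive_scan("/app/", WORDLIST, max_depth=1))
print(recursive_scan("/app/", WORDLIST, max_depth=2))
```

With a limit of 1, the scan finds /app/admin/backup/ but never looks inside it; raising the limit to 2 is what uncovers the notes.txt file buried there. The number of requests grows with every directory queued, which is why deep recursive scans take so much longer.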

Wrapping Up

Today, we learned about dirsearch, a powerful brute-force web directory scanner, and some of the advantages it has over other similar tools. We installed dirsearch on our system and set up a symbolic link to allow us to run it from anywhere. We then went over some basic usage examples and showcased the power of the tool’s recursive scanning function. In the end, dirsearch makes it easy to discover hidden directories and files when scanning a website.