Download a Public Website via Git Errors
Are you using Git? Is the .git directory publicly reachable under your webroot? If developers clone a repository directly into the webroot when deploying an application or website, the metadata that Git leaves behind can be abused to download all of the application’s source code. This kind of issue can be identified by browsing to http://[website]/.git/config. If this returns any information at all, it’s likely that your application can be downloaded, and it may already have been.
How
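Three kinds of objects in a git repository matter here (a fourth kind, the annotated tag, is not needed for this walkthrough):
Blob - The actual data (e.g. source code)
Tree - Groups blobs and other trees together, like a directory
Commit - A specific state of a tree with more meta information (e.g. author/date/message)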
Git uses all of these under the hood to maintain the repository. The problem we face is that these objects are stored as files named .git/objects/[first-2-hex-chars]/[last-38-hex-chars], where [first-2-hex-chars][last-38-hex-chars] is the SHA-1 hash of the object written as 40 hexadecimal characters. We need to be smart and extract the filenames of all objects from other files to completely restore the repository, because brute forcing the SHA-1 keyspace is not feasible.
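As a small illustration of this layout, the sketch below maps a hash to the path of its loose object. The function name object_path is hypothetical; the example hash is the one used later in this post.

```python
import re


def object_path(sha1_hex: str) -> str:
    """Return the loose-object path for a 40-character hex SHA-1."""
    sha = sha1_hex.strip().lower()
    if not re.fullmatch(r"[0-9a-f]{40}", sha):
        raise ValueError(f"not a 40-character hex SHA-1: {sha1_hex!r}")
    # The first 2 hex characters name the directory, the remaining 38 the file.
    return f".git/objects/{sha[:2]}/{sha[2:]}"


print(object_path("6916ae52c0b20b04569c262275d27422fc4fcd34"))
# prints .git/objects/69/16ae52c0b20b04569c262275d27422fc4fcd34
```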
What helps us a lot is the fact that there are some standard files in a git repository:
HEAD
objects/info/packs
description
config
COMMIT_EDITMSG
index
packed-refs
refs/heads/master
refs/remotes/origin/HEAD
refs/stash
logs/HEAD
logs/refs/heads/master
logs/refs/remotes/origin/HEAD
info/refs
info/exclude
Each of these files either refers to an object by its hash or refers to another file that references an object, and so on. The easiest way to start is therefore to download and parse the files listed above, then use the hashes found in them to download the object files.
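A minimal sketch of this first step might look as follows. It builds the URLs for a few of the standard files and pulls 40-character hashes and symbolic "ref: ..." pointers out of their text. The names STANDARD_FILES, standard_urls and extract_references, and the example.com URL, are assumptions made for illustration.

```python
import re
from urllib.parse import urljoin

# A few of the standard files listed above (not exhaustive).
STANDARD_FILES = ["HEAD", "packed-refs", "refs/heads/master", "logs/HEAD", "info/refs"]

HASH_RE = re.compile(r"\b[0-9a-f]{40}\b")
REF_RE = re.compile(r"^ref: (\S+)$", re.MULTILINE)


def standard_urls(base_url: str) -> list:
    """Return candidate URLs for the standard files under base_url/.git/."""
    root = base_url.rstrip("/") + "/.git/"
    return [urljoin(root, name) for name in STANDARD_FILES]


def extract_references(text: str) -> tuple:
    """Return (object hashes, symbolic ref paths) found in a downloaded file."""
    return set(HASH_RE.findall(text)), set(REF_RE.findall(text))


print(standard_urls("http://example.com/project-path")[0])
print(extract_references("ref: refs/heads/master\n"))
```

Hashes found this way become object downloads, while symbolic refs (as found in HEAD) name further files to fetch.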
So for example, suppose we have downloaded the refs/heads/master file:
> cat .git/refs/heads/master
6916ae52c0b20b04569c262275d27422fc4fcd34
The reference master points to a commit with the hash 6916ae52c0b20b04569c262275d27422fc4fcd34. After downloading the commit object from the server (note that the URL path is .git/objects/69/16ae52c0b20b04569c262275d27422fc4fcd34), we can analyse it further:
> git cat-file -t 6916ae52c0b20b04569c262275d27422fc4fcd34
commit
This tells us that the downloaded object is indeed a commit. Let’s get some details about it:
> git cat-file -p 6916ae52c0b20b04569c262275d27422fc4fcd34
tree fa3887a0b798346c122afdd7c5ecc605bf3c18c0
parent 9264d57c621f66208d689ef653ce8a62c3bccfae
Okay, now we know the hashes of the related tree and of the parent commit. The full output also contains the author, the committer and the commit message, which are omitted here.
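To make clearer what git cat-file is reading, here is a rough sketch of decoding a loose object by hand. A loose object file is zlib-compressed and begins with a header of the form "type size" followed by a NUL byte. The helper names decode_loose_object and parse_commit are hypothetical, and the sample commit is synthetic (it has no author or committer lines), built only so the example can run.

```python
import zlib


def decode_loose_object(compressed: bytes) -> tuple:
    """Decompress a loose object and return (type, body)."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\x00")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode("ascii"), body


def parse_commit(body: bytes) -> dict:
    """Extract the tree hash, parent hashes and message from a commit body."""
    headers, _, message = body.partition(b"\n\n")
    info = {"tree": None, "parents": [], "message": message.decode("utf-8", "replace")}
    for line in headers.decode("utf-8", "replace").splitlines():
        key, _, value = line.partition(" ")
        if key == "tree":
            info["tree"] = value
        elif key == "parent":
            info["parents"].append(value)
    return info


# Synthetic commit body for demonstration only.
sample_body = (
    b"tree fa3887a0b798346c122afdd7c5ecc605bf3c18c0\n"
    b"parent 9264d57c621f66208d689ef653ce8a62c3bccfae\n"
    b"\nexample message\n"
)
sample_file = zlib.compress(b"commit %d\x00" % len(sample_body) + sample_body)

obj_type, body = decode_loose_object(sample_file)
print(obj_type, parse_commit(body))
```

The tree and parent hashes extracted here are the next objects to download.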
We download the tree-object and analyse it:
> git cat-file -p fa3887a0b798346c122afdd7c5ecc605bf3c18c0
040000 tree 532fc6055e09e0a2d5602f4b84c0dbadce1b5f3e Dumper
040000 tree 077ce769dedcf19d0f063246256e8ae0394fd8df Extractor
040000 tree d6e1bd4677a256e760cce5ddaa7db7ea6f9a8900 Finder
100644 blob 9670cf17dfeec351c395493058044b9f9dadbe2a README.md
This tells us which files are stored in that tree. Note that Dumper, Extractor and Finder are also trees (directories). The final step is to download the README.md blob object and cat its content:
> git cat-file -p 9670cf17dfeec351c395493058044b9f9dadbe2a
Git Tools
=============
[...]
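The tree listing shown earlier is decoded from a binary format: each entry is an octal mode, a space, the file name, a NUL byte and then the 20 raw bytes of the hash. The sketch below parses such a body once it has been decompressed and its header removed. The function name parse_tree is hypothetical, and the sample data is rebuilt from two of the entries shown above.

```python
def parse_tree(body: bytes) -> list:
    """Return (mode, kind, sha1_hex, name) for every entry in a tree body."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        # Git stores directory modes as "40000"; pad to match cat-file output.
        mode = body[pos:space].decode("ascii").zfill(6)
        name = body[space + 1:nul].decode("utf-8", "replace")
        sha = body[nul + 1:nul + 21].hex()
        if len(sha) != 40:
            raise ValueError("truncated tree entry")
        if mode == "040000":
            kind = "tree"
        elif mode == "160000":
            kind = "commit"  # submodule reference
        else:
            kind = "blob"
        entries.append((mode, kind, sha, name))
        pos = nul + 21
    return entries


sample_tree = (
    b"40000 Dumper\x00" + bytes.fromhex("532fc6055e09e0a2d5602f4b84c0dbadce1b5f3e")
    + b"100644 README.md\x00" + bytes.fromhex("9670cf17dfeec351c395493058044b9f9dadbe2a")
)
for entry in parse_tree(sample_tree):
    print(*entry)
```

Every hash returned is another object to download: trees are parsed again in the same way, and blobs hold the file contents.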
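We need to take special care of packed files, because git can store many objects together in a pack file instead of as individual loose objects. A list of all packs can be found in .git/objects/info/packs: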
> cat .git/objects/info/packs
P pack-e38660e6be24bb79d8d929ddea3d194e0dd3cd13.pack
The corresponding pack file is stored in .git/objects/pack/:
> /usr/bin/ls .git/objects/pack/
pack-e38660e6be24bb79d8d929ddea3d194e0dd3cd13.idx
pack-e38660e6be24bb79d8d929ddea3d194e0dd3cd13.pack
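In that case, we need to download both files and then run the following command to extract the packed data. Note that git unpack-objects skips objects that already exist in the repository, so if nothing is extracted, move the pack file out of .git/objects/pack/ and run the command on the moved copy: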
> git unpack-objects -r < .git/objects/pack/pack-e38660e6be24bb79d8d929ddea3d194e0dd3cd13.pack
Unpacking objects: 100% (15/15), done.
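Working out which pack files to download can be sketched as below: every line in objects/info/packs that starts with "P " names a pack, and the matching index has the same name with an .idx extension. The function name pack_files is an assumption for illustration.

```python
def pack_files(packs_text: str) -> list:
    """Return the .pack and .idx paths named in objects/info/packs."""
    paths = []
    for line in packs_text.splitlines():
        if not line.startswith("P "):
            continue  # skip blank or unrelated lines
        pack_name = line[2:].strip()
        stem = pack_name[:-len(".pack")] if pack_name.endswith(".pack") else pack_name
        paths.append(f".git/objects/pack/{stem}.pack")
        paths.append(f".git/objects/pack/{stem}.idx")
    return paths


listing = "P pack-e38660e6be24bb79d8d929ddea3d194e0dd3cd13.pack\n\n"
for path in pack_files(listing):
    print(path)
```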
As you can see, by repeating this procedure recursively for every hash we find in the already downloaded files, we can gradually restore the repository and extract its contents.
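The recursive procedure amounts to a worklist traversal, which could be sketched like this. The callbacks fetch (download one object, returning None on failure) and children (list the hashes an object refers to) are hypothetical placeholders, and the demo uses short labels in an in-memory dictionary instead of real hashes and a real server.

```python
from collections import deque


def crawl(start, fetch, children):
    """Fetch every object reachable from start; return (store, missing)."""
    start = list(start)
    seen = set(start)
    queue = deque(start)
    store, missing = {}, set()
    while queue:
        sha = queue.popleft()
        data = fetch(sha)
        if data is None:
            missing.add(sha)  # download failed; record it and carry on
            continue
        store[sha] = data
        for child in children(data):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return store, missing


# Fake repository: each object is represented by the list of objects it references.
fake_repo = {
    "commit1": ["tree1", "commit0"],
    "commit0": ["tree1"],
    "tree1": ["blob1", "blob2"],
    "blob1": [],
}
store, missing = crawl(["commit1"], fake_repo.get, lambda data: data)
print(sorted(store), sorted(missing))
```

The seen set keeps objects shared between commits from being downloaded twice, and the missing set collects the objects whose download failed.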
Sometimes downloading a specific object will fail, leaving us with an incomplete repository. In that case, we can use the git fsck command to find the missing or broken object files.
Testing
Enter ‘http://site.com/project-path/.git/config’ in your browser URL bar, where ‘project-path’ is the path to your version controlled directory. If you see something like this:
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[remote "origin"]
url = git@bitbucket.org:UserName/your-repo.git
fetch = +refs/heads/*:refs/remotes/origin/*
…you need to take action!
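If you want to automate this check, one rough heuristic is to look for the [core] section and the repositoryformatversion key in the response body, since both appear in the output above. The function name looks_like_git_config is hypothetical, and a match only suggests exposure rather than proving it.

```python
import re

SECTION_RE = re.compile(r"^\s*\[core\]\s*$", re.MULTILINE | re.IGNORECASE)
KEY_RE = re.compile(r"^\s*repositoryformatversion\s*=", re.MULTILINE | re.IGNORECASE)


def looks_like_git_config(body: str) -> bool:
    """Heuristically decide whether a response body is a git config file."""
    return bool(SECTION_RE.search(body) and KEY_RE.search(body))


exposed = "[core]\n\trepositoryformatversion = 0\n\tfilemode = true\n\tbare = false\n"
error_page = "<html><body><h1>404 Not Found</h1></body></html>"
print(looks_like_git_config(exposed), looks_like_git_config(error_page))
```

Checking the body rather than only the status code helps with servers that answer every request with a custom error page and a 200 status.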
Solution
For security reasons, you should never keep passwords, keys or other sensitive configuration files in version control; .gitignore is there for a reason.
Even if you keep sensitive files outside your git repo, it’s still important to restrict access to the .git directory for public-facing projects. You could move your .git directory outside the document root, so that it is not publicly accessible. This is quite a good solution, though I have found it a bit fiddly when a project uses git submodules.
Another alternative is to configure the web server to deny public access to all files under the .git directory.