
codescrape

Version 1.0

By: Nathan Adams

License: MIT

Description

This library is meant for archiving project data. With Google Code announcing that it is going archive-only, I wanted to create a library that can grab source data before it is gone forever.

Use cases include archiving projects because of:

  • A hosting service shutting down
  • Authorities sending a cease-and-desist to the provider or project
  • Historical, research, or educational purposes

Usage

Currently, srchub and Google Code are supported. To use srchub:

from services.srchub import srchub
shub = srchub()
projects = shub.getProjects()

Or, for Google Code:

from services.googlecode import googlecode
gcode = googlecode()
project = gcode.getProject("android-python27")

The srchub library pulls all public projects, since that list is easy to access. Google Code has no public project list per se, and I didn't want to scrape the search results, so getProject() requires you to pass in the project name. If you can get hold of a list of Google Code project names, you can easily loop through them:

from services.googlecode import googlecode
gcode = googlecode()
# someProjectList is a placeholder for your own list of project names
for name in someProjectList:
    project = gcode.getProject(name)
    # do something with project
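When looping over a long list of names, it helps to keep going when one project fails. Below is a minimal sketch of that loop. The helper names `read_project_names` and `fetch_all` and the one-name-per-line input format are hypothetical, not part of codescrape, and it assumes (unverified) that `getProject()` raises an exception for a project it cannot fetch.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def read_project_names(lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: one project name per line; blank lines and '#' comments are skipped."""
    names = []
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#"):
            names.append(name)
    return names


def fetch_all(
    get_project: Callable[[str], object], names: Iterable[str]
) -> Tuple[Dict[str, object], Dict[str, Exception]]:
    """Hypothetical helper: call get_project for each name, collecting successes and failures separately."""
    fetched: Dict[str, object] = {}
    failed: Dict[str, Exception] = {}
    for name in names:
        try:
            fetched[name] = get_project(name)
        except Exception as exc:  # assumption: failures surface as exceptions
            failed[name] = exc  # one dead project should not abort the whole run
    return fetched, failed
```

With codescrape installed, this might be used as `names = read_project_names(open("projects.txt"))` followed by `projects, errors = fetch_all(googlecode().getProject, names)`, where `projects.txt` is a file you supply. Passing the bound method rather than the service object keeps the helper usable with any callable that takes a project name.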

The project object exposes the following methods:

project

  • getRepoURL() -> Returns the URL of the repo
  • getRepoType() -> Returns the type of repo (git, hg, or SVN)
  • getReleases() -> Returns all downloads related to the project
  • getIssues() -> Returns open issues
  • getWikis() -> Returns wikis
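As an illustration of how these accessors might feed an archive, here is a minimal sketch that works on any object exposing the methods listed above. The names `clone_command`, `snapshot`, and `CLONE_COMMANDS` are hypothetical, and two things are assumptions: that `getRepoType()` returns a string such as "git", "hg", or "SVN" (matched case-insensitively here), and that the release, issue, and wiki getters return iterables.

```python
from typing import Any, Dict, List

# Hypothetical mapping from repo type to the command that mirrors it.
# Assumption: getRepoType() returns "git", "hg", or "svn" in some letter case.
CLONE_COMMANDS = {
    "git": ["git", "clone"],
    "hg": ["hg", "clone"],
    "svn": ["svn", "checkout"],
}


def clone_command(project: Any, dest: str) -> List[str]:
    """Build the argv list that would copy the project's repo into dest."""
    repo_type = str(project.getRepoType()).lower()
    if repo_type not in CLONE_COMMANDS:
        raise ValueError("unsupported repo type: %r" % repo_type)
    return CLONE_COMMANDS[repo_type] + [project.getRepoURL(), dest]


def snapshot(project: Any) -> Dict[str, Any]:
    """Collect everything the project object exposes into one plain dict."""
    return {
        "repo_url": project.getRepoURL(),
        "repo_type": project.getRepoType(),
        # list() assumes each getter returns an iterable
        "releases": list(project.getReleases()),
        "issues": list(project.getIssues()),
        "wikis": list(project.getWikis()),
    }
```

The command is returned as a list rather than run directly, so you can inspect it first and then pass it to `subprocess.check_call(...)` when you are ready to fetch the repository.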