indefero/README

monotone implementation notes
-----------------------------

1. general

    This branch contains an implementation of the monotone automation interface.
    It needs monotone version 0.47 (interface version 12.0) or newer.
    To set up a new project with monotone, all you need to do is
    create a new monotone database with

        $ mtn db init -d project.mtn

    in the configured repository path ('mtn_repositories').  For a usable
    setup, this database needs an initial commit on the configured
    master branch of the project.  This can be done easily with

        $ mkdir tmp && touch tmp/remove_me
        $ mtn import -d project.mtn -b master.branch.name \
            -m "initial commit" tmp
        $ rm -rf tmp
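
    The two steps above can be wrapped in a small helper.  This is only a
    sketch: the function name idf_mtn_bootstrap and the MTN variable (which
    lets you substitute another binary, e.g. "echo" for a dry run) are
    assumptions of this example and not part of indefero or monotone.  Only
    the "mtn db init" and "mtn import" calls are taken from the text above.

```shell
#!/bin/sh
# Hypothetical helper: create a monotone database and give it an initial
# commit on the given branch. MTN can be overridden, e.g. MTN=echo prints
# the arguments instead of running monotone.
MTN=${MTN:-mtn}

idf_mtn_bootstrap() {
    db=$1        # e.g. project.mtn inside the 'mtn_repositories' path
    branch=$2    # the configured master branch of the project

    if [ -z "$db" ] || [ -z "$branch" ]; then
        echo "usage: idf_mtn_bootstrap DATABASE BRANCH" >&2
        return 1
    fi

    # Throwaway directory holding the single file of the initial commit.
    tmp=$(mktemp -d) || return 1
    touch "$tmp/remove_me"

    "$MTN" db init -d "$db" &&
    "$MTN" import -d "$db" -b "$branch" -m "initial commit" "$tmp"
    status=$?

    rm -rf "$tmp"
    return $status
}
```

    A call would then look like "idf_mtn_bootstrap project.mtn
    master.branch.name", run from within the configured repository path.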


2. current state / internals

    The implementation should be fairly stable and fast, though some
    information, such as individual file sizes or last change information,
    won't scale well with the tree size.  It is expected that the mtn
    automation interface will improve in this area in the future and that
    these parts can then be rewritten with speed in mind.

    Another area of improvement is the access pattern to the monotone
    database.  While only one process is started per request, the time
    (and server resource) penalty for this could still be dramatic once
    many clients try to access the service.  Luckily, monotone has an
    easy way to deliver its stdio protocol for automation usage over the
    network (mtn au remote_stdio), so the following scenarios are possible:

    a) set up a single mtn server serving one database on a different
       (faster) server and let the stdio client connect to that

    b) set up usher (available from branch net.venge.monotone.contrib.usher
       from the official mtn repository on monotone.ca) as proxy in
       front of several local monotone databases mirroring themselves

    c) like b), but use usher as proxy in front of several other remote
       monotone databases (forwarding)

    The scenario in a) might be needed anyway in a shared hosting
    environment, because a database which is served via netsync cannot
    be accessed by another local process at the same time (it is locked).
    Ideally, both the network functionality and the indefero browsing
    functionality should therefore be delivered from one single database
    per project via netsync.

    The only alternative to this setup is a two-database approach, where one
    database acts as network node and the other as backend for indefero.
    The synchronization between these two would then have to happen via
    standard tools (cron...) or a sync request from one database to the other.
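
    The cron-driven variant of the two-database approach could be sketched
    as follows.  The function name idf_mtn_refresh, the MTN override and the
    default pattern "*" are assumptions of this example.  It only relies on
    "mtn pull" with a host name and a branch pattern, and it assumes that
    the network node is reachable via netsync from the indefero host.

```shell
#!/bin/sh
# Hypothetical cron job body: pull new revisions from the database that is
# served via netsync into the database that indefero reads from.
MTN=${MTN:-mtn}

idf_mtn_refresh() {
    backend_db=$1       # database used as backend for indefero
    server=$2           # host name of the netsync server (network node)
    pattern=${3:-'*'}   # branch pattern to fetch, defaults to everything

    if [ -z "$backend_db" ] || [ -z "$server" ]; then
        echo "usage: idf_mtn_refresh DATABASE HOSTNAME [PATTERN]" >&2
        return 1
    fi

    # The pattern is quoted so that the shell does not expand it as a glob.
    "$MTN" -d "$backend_db" pull "$server" "$pattern"
}
```

    A script calling this function could then be run from cron at whatever
    interval suits the project, for example every few minutes.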

    While the current implementation is ready for the two-database approach,
    some code and configuration changes are needed for remote stdio usage.
    Basically, this means replacing the initial call to

        mtn -d project.mtn au stdio     (Monotone.php, around line 74)

    with

        mtn au remote_stdio HOSTNAME

    which could be made configurable in conf/idf.php.  But again, this heavily
    depends on the exact anticipated server setup.
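
    To illustrate what such a switch amounts to, here is a sketch in shell
    (the real change would have to happen in Monotone.php).  The setting
    name IDF_MTN_REMOTE_HOST and the function name are invented for this
    example and merely stand in for a possible option in conf/idf.php.

```shell
#!/bin/sh
# Hypothetical helper: print the automation command line for either a local
# database or a remote stdio server, depending on one setting.

idf_mtn_stdio_command() {
    db=$1   # local database, only used when no remote host is configured

    if [ -n "${IDF_MTN_REMOTE_HOST:-}" ]; then
        # Remote setup: talk to a served database over the network.
        printf 'mtn au remote_stdio %s\n' "$IDF_MTN_REMOTE_HOST"
    else
        # Local setup: open the database file directly.
        printf 'mtn -d %s au stdio\n' "$db"
    fi
}
```

    With the setting empty, the local command is printed.  With it set to a
    host name, the remote variant is chosen and the database argument is
    ignored.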

    To scale things up a bit, multiple projects should of course use
    separate databases.  The main reason is that while read access can be
    granted on a branch level, write access gives full write permission
    on the whole database.  One approach would be to start one serve
    process for each database, but the obvious downside is that each of
    those processes would have to be bound to a different (non-standard)
    port, making it hard for users to "just clone" the project sources
    without knowing the exact port.

    Usher comes to the rescue here as well.  It has three ways
    to recognize the request for a particular database:

    a) by looking at the requested host name (similar to SNI for Apache)

    b) by evaluating the requested branch pattern

    c) by evaluating the path part from an mtn:// uri (new in mtn 0.48)

    The best way is probably to configure it with c) - instead of pulling
    a project like this

      $ mtn pull hostname branchname

    a user uses the URI syntax (which will, by the way, be the default from
    mtn 0.99 onwards):

      $ mtn pull mtn://hostname/database?branchname

    Here, the "/database" part is used by usher to determine which backend
    database should be used for the network action.  The "clone" command
    will also support this mtn:// URI syntax; this did not make it into
    0.48, but will be available from 0.99 onwards.
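
    The URI layout described above can be assembled mechanically.  The
    helper below is only a sketch and its name is made up for this example.
    It follows the mtn://hostname/database?branchname form shown above and
    does not escape special characters in its arguments.

```shell
#!/bin/sh
# Hypothetical helper: build an mtn:// URI whose path part tells usher
# which backend database to use.

idf_mtn_uri() {
    host=$1       # host name usher listens on
    database=$2   # name usher maps to a backend database
    branch=$3     # branch name or pattern to transfer

    printf 'mtn://%s/%s?%s\n' "$host" "$database" "$branch"
}
```

    When passing such a URI to mtn on the command line it should be quoted,
    e.g. mtn pull "$(idf_mtn_uri hostname database branchname)", because
    "?" is a glob character for the shell.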


3. indefero critique

    It was not always 100% clear what some of the abstract SCM API methods
    wanted in return.  While it helped a lot to have prior art in form of the
    SVN and git implementations, the documentation of the abstract IDF_Scm
    should probably still be improved.

    Since branch and tag names can be of arbitrary size, it was not possible
    to display them completely in the default layout.  This might be a problem
    in other SCMs as well.  For the monotone implementation I introduced a
    special filter called "IDF_Views_Source_ShortenString".

    The API methods getPathInfo() and getTree() return similar VCS "objects"
    which unfortunately do not have a well-defined structure - this should
    probably be addressed in future indefero releases.

    While the returned objects from getTree() contain all the needed
    information, indefero doesn't seem to use them to sort the output,
    e.g. alphabetically or in such a way that directories are listed
    before files.  It was unclear whether the SCM implementor should do
    this task and what the desired default sorting should be.
add a README file which contains the first steps and some hints for the monotone server configuration as well as some indefero API critique 2010-06-23 01:27:12 +02:00