indefero/README

monotone implementation notes
-----------------------------
1. general
This branch contains an implementation of the monotone automation interface.
It requires monotone version 0.47 (interface version 12.0) or newer.
To set up a new project with monotone, all you need to do is create a
new monotone database with
$ mtn db init -d project.mtn
in the configured repository path ('mtn_repositories'). For a workable
setup, this database also needs an initial commit on the configured
master branch of the project. This can be done with
$ mkdir tmp && touch tmp/remove_me
$ mtn import -d project.mtn -b master.branch.name \
-m "initial commit" tmp
$ rm -rf tmp
2. current state / internals
The implementation should be fairly stable and fast, though retrieving
some information, such as individual file sizes or last change
information, won't scale well with the tree size. It is expected that
the mtn automation interface will improve in this area in the future
and that these parts can then be rewritten with speed in mind.
Another area of improvement is the access pattern to the monotone
database. While only one process is started per request, the time
(and server resource) penalty for this could still be dramatic once
many clients try to access the service. Luckily, monotone has an
easy way to deliver its stdio protocol for automation usage over the
network (mtn au remote_stdio), so the following scenarios are possible:
a) set up a single mtn server serving one database on a different
(faster) server and let the stdio client connect to that
b) set up usher (available from branch net.venge.monotone.contrib.usher
in the official mtn repository on monotone.ca) as a proxy in
front of several local monotone databases mirroring themselves
c) like b), but use usher as a proxy in front of several other remote
monotone databases (forwarding)
The scenario in a) might be needed anyway for a shared hosting
environment, because a database that is served via netsync cannot
be accessed by another local process at the same time (it is locked
then). Ideally, both the network functionality and the indefero
browsing functionality should therefore be delivered from one single
database per project via netsync.
The only alternative to this setup is a two-database approach, where one
database acts as the network node and the other as the backend for indefero.
The synchronization between these two would then have to happen via
standard tools (cron...) or a sync request from one database to the other.
While the current implementation is ready for the two-database approach,
some code and configuration changes are still needed for remote
stdio usage. Basically this means replacing the initial call to
mtn -d project.mtn au stdio (Monotone.php, around line 74)
with
mtn au remote_stdio HOSTNAME
which could be made configurable in conf/idf.php. But again, this heavily
depends on the exact anticipated server setup.
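As a minimal sketch of how that choice could be made configurable, the
following shell function prints the command line for either setup. The
function name mtn_stdio_command and its arguments are hypothetical; the
real change would live in Monotone.php and read a setting from
conf/idf.php.

```shell
# Hypothetical helper, not part of indefero: print the automation
# command for a project. $1 is the local database path, $2 is an
# optional remote host name (assumed to come from configuration).
mtn_stdio_command() {
    db_path=$1
    remote_host=${2:-}
    if [ -n "$remote_host" ]; then
        # Remote setup: talk to a served database over the network.
        printf 'mtn au remote_stdio %s\n' "$remote_host"
    else
        # Local setup: open the database file directly.
        printf 'mtn -d %s au stdio\n' "$db_path"
    fi
}
```

Called with only a database path it prints the local command; with a
host name as second argument it prints the remote_stdio variant.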
To scale things up a bit, multiple projects should of course use
separate databases. The main reason is that while read access
can be granted on a branch level, write access grants write
permission on the whole database. One approach would be to start
one serve process for each database, but the obvious downside is
that each of those processes would need to be bound to a different
(non-standard) port, making it hard for users to "just clone" the
project sources without knowing the exact port.
Usher comes to the rescue here as well. It has three ways
to recognize the request for a particular database:
a) by looking at the requested host name (similar to SNI for Apache)
b) by evaluating the requested branch pattern
c) by evaluating the path part of an mtn:// URI (new in mtn 0.48)
The best way is probably to configure it with c) - instead of pulling
a project like this
$ mtn pull hostname branchname
a user uses the URI syntax (which will, by the way, be the default from
mtn 0.99 onwards):
$ mtn pull mtn://hostname/database?branchname
Here, the "/database" part is used by usher to determine which backend
database should be used for the network action. The "clone" command
will also support this mtn:// URI syntax; this did not make it into
0.48, but will be available from 0.99 onwards.
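To illustrate the URI layout, a small shell helper can assemble the
three parts. The helper name mtn_uri is hypothetical and not part of
monotone. Quoting matters when passing the result to mtn, because '?'
is a glob character in most shells.

```shell
# Hypothetical helper: assemble an mtn:// URI from host, database
# and branch, following the layout mtn://hostname/database?branchname.
mtn_uri() {
    host=$1
    database=$2    # path part that usher maps to a backend database
    branch=$3      # branch name or pattern, placed after the '?'
    printf 'mtn://%s/%s?%s\n' "$host" "$database" "$branch"
}

# Usage (quoted so the shell does not treat '?' as a wildcard):
#   mtn pull "$(mtn_uri hostname database branchname)"
```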
3. indefero critique
It was not always 100% clear what some of the abstract SCM API methods
were expected to return. While it helped a lot to have prior art in the
form of the SVN and git implementations, the documentation of the
abstract IDF_Scm should probably still be improved.
Since branch and tag names can be of arbitrary length, it was not possible
to display them completely in the default layout. This might be a problem
in other SCMs as well. For the monotone implementation I introduced a
special filter called "IDF_Views_Source_ShortenString".
The API methods getPathInfo() and getTree() return similar VCS "objects"
which unfortunately do not have a well-defined structure. This should
probably be addressed in future indefero releases.
While the objects returned from getTree() contain all the needed
information, indefero doesn't seem to use them to sort the output,
e.g. alphabetically or in such a way that directories are listed
before files. It was unclear whether the SCM implementor should do
this or not, and what the desired default sorting should be.
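One possible default, shown here as a shell sketch, lists directories
before files and sorts each group by name. The input format (one entry
per line, a type of "dir" or "file" followed by the name) is an
assumption made for this example and is not what getTree() returns.

```shell
# Sort tree entries read from stdin: directories first, then files,
# each group ordered by name in byte order. Relies on "dir" sorting
# before "file", so the C locale is forced for predictable results.
sort_tree_entries() {
    LC_ALL=C sort -k1,1 -k2
}

# Usage:
#   printf 'file b.txt\ndir src\nfile a.txt\ndir doc\n' | sort_tree_entries
```

Whether such sorting belongs in the SCM backend or in indefero's view
code is exactly the open question raised above.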