Archive for the ‘research’ Category

Git on MacOSX

Sunday, March 23rd, 2008

I've been playing with Git lately and wanted to share some instructions and gotchas. I've installed Git on 10.5 and 10.3 (which needs a few extras installed) but haven't tried these steps on 10.4 yet, so let me know how it goes. In my experience, version control is way underutilized (at least in engineering fields other than computer engineering) and would reduce the amount of duplicated effort, improve the ability to collaborate, and keep a detailed history of important text files (like, say, your thesis if you're using LaTeX, which you should be). Other version control systems I've used are Subversion and Perforce, but I like Git because it's super fast, it's pretty easy to manage remote repositories over ssh, and it's hot right now.

Initial Setup

As usual, you need to have the Developer tools installed.

Edit ~/.bash_login (for just you) or /etc/profile (for all users on your computer) to add /usr/local/bin and /usr/local/sbin to your $PATH variable, e.g. add the following line:

export PATH="/usr/local/bin:/usr/local/sbin:$PATH"
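
If you source your profile more than once, the line above keeps stacking the same directories onto $PATH. Here's a minimal sketch of a guard against that; path_prepend is a hypothetical helper name of mine, not something that ships with OS X or bash.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;     # otherwise put it at the front
    esac
}

# Add sbin first so that bin ends up ahead of it, matching the export line above.
path_prepend /usr/local/sbin
path_prepend /usr/local/bin
export PATH
```

Running it a second time should leave $PATH unchanged.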

Build and Install Git

curl -O
tar -xzvf git-
cd git-
./configure --prefix=/usr/local
make all
sudo make install
cd ..

If you get a compile error about po/de.msg, such as make[1]: *** [po/de.msg] Error 127, then you should just be able to run

export NO_MSGFMT=1
make all

assuming that you only need to install the English interface.
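
Error 127 is the shell's "command not found" status, so my guess is that the failure means msgfmt (part of gettext) isn't installed. Under that assumption, here's a small sketch that only sets the variable when msgfmt really is missing:

```shell
# Set NO_MSGFMT only when msgfmt cannot be found on PATH.
if command -v msgfmt >/dev/null 2>&1; then
    echo "msgfmt found, no workaround needed"
else
    export NO_MSGFMT=1
    echo "msgfmt missing, building with NO_MSGFMT=1"
fi
```

After that, run make all as before.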

If you get an error referring to expat.h, then you will need to build expat using the following commands

curl -O
tar xzvf expat-2.0.1.tar.gz
cd expat-2.0.1
./configure --prefix=/usr/local
make
make check
sudo make install
cd ..
source /etc/profile

To make sure that git is installed and working, try running git from the command line.
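
For a slightly more informative check, something like the following sketch reports where git was found and which version it is. check_tool is a hypothetical helper name I made up for illustration.

```shell
# Hypothetical helper: report whether a command is reachable on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Only ask for the version if git is actually there.
if check_tool git; then
    git --version
fi
```

If git turns up somewhere other than /usr/local/bin, an older copy is probably shadowing the one you just built.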

Setting up a remote repository

In my case, I was interested in setting up our lab server (OSX 10.3) to properly host git repositories. First, install git on the server. Next, add the following to /etc/profile; it's a handy function that makes repository creation easy. The script is great and is from here.

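newgit() {
    if [ -z "$1" ]; then
        echo "usage: $FUNCNAME project-name.git"
    else
        # assumed location, taken from the folder named in the text below
        gitdir="/Library/WebServer/Documents/git/$1"
        mkdir "$gitdir"
        pushd "$gitdir"
        git --bare init
        git --bare update-server-info
        chmod a+x hooks/post-update
        touch git-daemon-export-ok
        popd
    fi
}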

After creating the folder /Library/WebServer/Documents/git you can just run newgit repository.git to make an accessible repository.
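
Since the function will happily create a folder for whatever name you give it, a small guard can catch typos before anything hits the disk. This is only a sketch: valid_repo_name is a hypothetical helper of mine, not part of the original script, and the rule it enforces (no slashes, must end in .git) is just the convention from the usage message.

```shell
# Hypothetical guard: accept names like project.git, reject everything else.
valid_repo_name() {
    case "$1" in
        */*)     return 1 ;;   # no nested paths
        ?*.git)  return 0 ;;   # at least one character before .git
        *)       return 1 ;;   # empty, or missing the .git suffix
    esac
}

for name in project.git project .git a/b.git; do
    if valid_repo_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: rejected"
    fi
done
```

Of the four sample names, only project.git passes.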

Here's an important step: make sure that /etc/profile properly adds the /usr/local/bin directory to the path (see above). Otherwise, when you push, pull, or clone from your server, things will not go smoothly and you will get errors like git-receive-pack: command not found or git-upload-pack: command not found.
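
If fixing the server's path isn't an option, one workaround is to tell git where the remote helpers live: git push accepts --receive-pack and git clone accepts --upload-pack. The sketch below only prints the commands rather than running them. The remote_pack helper and the /usr/local/bin default are my own assumptions, based on the install prefix used earlier.

```shell
# Hypothetical helper: build the full path to a git helper on the server.
# $1 is "receive" or "upload"; $2 is the remote bin directory (assumed default).
remote_pack() {
    printf '%s/git-%s-pack' "${2:-/usr/local/bin}" "$1"
}

repo="USERNAME@YOURSERVER:/Library/WebServer/Documents/git/PROJECT.git"

# Print (not run) the commands with explicit helper paths.
echo git push --receive-pack="$(remote_pack receive)" "$repo" master
echo git clone --upload-pack="$(remote_pack upload)" "$repo"
```

Drop the echo once the printed commands look right for your server.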

Installing gitweb

Git includes a handy .cgi program for viewing your repositories through a browser. It's quick to install, assuming you have Apache set up with mod_perl installed and configured to serve .cgi files. Check out the gitweb directory in the git source folder after you built git earlier (you didn't delete it yet, right?)

At this point you can use git normally on your remote host. Here are some hastily written examples:

Creating a repository from a directory of existing files

git init
git add .
git commit -m "first commit"

Creating a new repository

git init
(create files, write code)
git add .
git commit -m "first commit"

Putting your code on your server

newgit PROJECT.git
git push USERNAME@YOURSERVER:/Library/WebServer/Documents/git/PROJECT.git master

Pulling code from your server

git clone USERNAME@YOURSERVER:/Library/WebServer/Documents/git/PROJECT.git
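
If you want to try the push and clone steps before touching a real server, a bare repository in a temporary folder can stand in for it. This is just a sketch under a few assumptions: the folder layout, the README file, and the example user name and email are all made up, and it needs a reasonably recent git (for the -c option).

```shell
# Local dry run of the push/clone workflow; a temp directory plays the server.
if command -v git >/dev/null 2>&1; then
    demo=$(mktemp -d)                        # scratch area (hypothetical layout)
    git init -q --bare "$demo/PROJECT.git"   # stands in for the server repository

    mkdir "$demo/work"
    (
        # Work in a subshell so the caller's directory is left alone.
        cd "$demo/work" &&
        git init -q &&
        echo "hello" > README &&
        git add . &&
        git -c user.name="Example User" -c user.email="user@example.com" \
            commit -q -m "first commit" &&
        git push -q "$demo/PROJECT.git" HEAD:refs/heads/master
    )

    # Clone it back out, asking for master explicitly.
    git clone -q -b master "$demo/PROJECT.git" "$demo/copy"
    ls "$demo/copy"
fi
```

The clone in $demo/copy should contain the README that was committed in $demo/work.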

Here are a few additional references to help get started (1, 2, 3, 4, and 5).

In Your Computers, Tracking Your Thoughts

Tuesday, September 18th, 2007

The most valuable thing I got out of Google this summer was getting in the habit of writing down what I'm doing on a regular basis. At the end of each day (or each week), writing down what you've done and what you're going to do next makes taking the next step that much easier. About a year ago I tried keeping monthly snippets after reading some motivational piece in Science, but a month is just long enough to get you out of the groove and to lose track of things. Writing more frequently takes the emphasis away from big, grand thoughts and turns it more into a source of reference material.

With that in mind, I decided to set up a system for myself tonight. The first choice was desktop, web, or paper. I chose web because it's searchable, easy to back up, and available from anywhere. First I checked out Backpack, but decided that paying for something when you can do just as well for free was silly.

With it narrowed down to free, open-source CMSs, I decided to go with a blogging setup because a daily-post, single-user format works better that way than with a wiki. The two main contenders then were WordPress and MovableType. I've used WordPress in the past, so I installed it and tried it out for a few minutes until I read about a sweet iPhone interface for MovableType. The upside of WordPress is that Dreamhost has a one-click install for it, but the slick iPhone setup could not be contained. Installing MovableType took just a few minutes, and a little .htaccess magic made it nicely hidden away. We'll see how it goes.

Cooperation Makes It Happen

Monday, June 25th, 2007

I saw a great tech talk at work today about Science Commons (by the way, I'm at Google over the summer coding it up). James Boyle from Duke talked about the current state of scientific research and a few ideas to improve upon the state of things.

The first thing that he talked about was a semantic search engine designed to pull together complex ideas from papers. Today, if I'm interested in Alzheimer's and want to identify possible drug targets for treatment or model a cell signaling pathway, the first thing that I'll need to do is start with the literature to find out what's been done. Sounds reasonable, right? It is, except for the hundred thousand papers related to a process like apoptosis (cell death) that I'll be digging through until I need to find a cure for myself. The problem is compounded by the fact that useful information is spread across so many different specialized disciplines that I can't just ask my Alzheimer's researcher buddy to break things down. Review papers can help this somewhat, but you won't necessarily find unexplored connections by reading them and someone has to know enough to write them. Their solution was to make a search engine that could find actual connections, identifying the relationships between cause and effect in the text. He showed some interesting results but you could tell that it has a very long way to go.

He talked about a few other things too, but what I found most interesting was the problem of sharing materials between labs. Being 'scooped' is on most people's minds, and it's definitely the most important problem limiting collaboration between labs. If I have a cell line that I think I can get two good papers out of, there is a lot of motivation to hold off on sharing it with other labs until I have finished my work on it. That's on top of the work required to prepare something to share. Their solution was to set up a formalized sharing system, so that in addition to things like publications and citations being considered in evaluations, the fact that lab X shared 800 embryos with other labs would also be a big plus on the CV.

I've thought a lot about this problem of giving people the motivation to share and haven't come up with a great solution, so it was especially interesting to hear James speak. If there is a good personal relationship where everyone will receive due credit it's easy, but how can someone with a great idea in Japan coordinate with a fabrication wizard at Cornell when they don't even know each other yet? There must be a way to provide an incentive for sharing ideas and materials, anything from mask layouts to microfluidic devices to cell lines, and to ensure their quality. Once that's accomplished and you can put together materials from many different people in a black-box fashion, output will really shoot off the charts. And then that perfect search engine will really be important...

The final thing that he touched on was the current debate over public access to journal articles that were funded by the federal government. The huge journal subscription fees are mainly justified by the journals' management of the peer review and editorial process. But when you think about it, why does it really take that much work? If an online system were set up that all grant awardees were required to enroll in with their specialties, it would be incredibly easy, and probably a lot less biased, to distribute articles for review that way. And then there's even the option of open pre-publication review, which Nature has flirted with lately, but that raises the specter of scooping again.

If you aren't familiar with it (or you've never used flickr), you should check out Creative Commons if you have a second. Also, the actual tech talk will hopefully be online in a few days.