Sunday, May 10, 2015

Using Git and GitHub to Archive Data

This blog post is for those of you who have never used Git or GitHub. I use Git and GitHub to archive my behavioral data. These data are uploaded to GitHub, an open web repository where they may be viewed by anyone at any time without any restrictions. This upload occurs nightly; that is, the data are available within 24 hours of their creation. The upload is automatic: no lab personnel are needed to start it or approve it. The upload is comprehensive in that all data files from all experiments are uploaded, even those that correspond to aborted experimental runs or pilot experiments. The data are uploaded with time stamps and with an automatically generated log. The system is versioned so that any changes to data files are logged, and the new and old versions are saved. In summary, if we collect it, it is there, and it is transparent. I call data generated this way Born Open Data.

Since setting up the born-open-data system, I have gotten a few queries about Git and GitHub, the heart of the system.  Git is the versioning software; GitHub is a place on the web (github.com) where the data are stored.  They work hand in hand.

In this post, I walk through a few steps of setting up GitHub for archiving.  I take the perspective of Kirby, my dog, who wishes to archive the following four photos of himself:







Here are Kirby's steps:


1. The first step is to create a repository on the GitHub server.

1a.  Kirby goes to GitHub (github.com)  and signs up for a free account (last option).  Once the account is set up (with user name KirbyHerby) he is given a screen with a lot of options for exploring GitHub.  He ignores these as they are not relevant for his task.

1b.  To create his first repository on the server, Kirby presses the green button that says "+ New repository" on the bottom left.


1c. Kirby now has to make some choices about the repository.  He names it "data," enters a description of the repository, makes it public, initializes it with a README, and does not specify which files to ignore or a license.  He then presses the green "Create repository" button on the bottom and is given his first view of the repository.



Kirby's repository is now at github.com/KirbyHerby/data, and he will bark out this URL to anyone interested.  The repository contains only the README.md file at this point.

2.  The next step is getting a linked copy of this repository on Kirby's local computer.

2a. Kirby downloads the GitHub application for his operating system (mac.github.com or windows.github.com), and on installation, chooses to install the command-line tools (trust me, you will use these some day).

2b.  Kirby enters his GitHub username ("KirbyHerby") and password.

2c. He next has to create a local repository and link it to the one on the server.  To do so, he chooses to "Add repository" and is given a choice to "Add," "Create," or "Clone."  Since the repository already exists at GitHub, he presses "Clone."  A list of his repositories shows up, and in this case, it is a short list of one repository, "data."  Kirby then selects "data" and presses the bottom button "Clone repository."  The repository now exists on the local computer under the folder "data."  There are two separate copies of the same repository: one on the GitHub server and one on Kirby's local machine.
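For readers who prefer the command line, the cloning step corresponds to a single "git clone" command. Here is a minimal Python sketch of it. The helper names clone_command and clone are made up for illustration, and the HTTPS URL form is an assumption built from Kirby's user name and repository name.

```python
import subprocess


def clone_command(user, repo, dest=None):
    """Build the argument list for cloning a GitHub repository over HTTPS."""
    # Assumed URL pattern: https://github.com/<user>/<repo>.git
    url = "https://github.com/{}/{}.git".format(user, repo)
    command = ["git", "clone", url]
    if dest is not None:
        command.append(dest)  # optional name for the local folder
    return command


def clone(user, repo, dest=None):
    """Run the clone. Needs Git installed and a network connection."""
    subprocess.run(clone_command(user, repo, dest), check=True)


# Show the command without running it; clone("KirbyHerby", "data") would run it.
print(" ".join(clone_command("KirbyHerby", "data")))
```

As with the GitHub application, the result is a local folder named "data" that is linked to the copy on the server.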




3. Kirby wishes to add files to the server repository so others may see them.

3a. Kirby first adds the photo files to the local repository as follows: Kirby copies the photos into the repository folder in the usual way, which for Mac OS X is by using the Finder.  The following screen shot shows the Finder window in the foreground and the GitHub client window in the background.  As can be seen, Kirby has added three files, and these show up in both applications.  Kirby has no more need for the Finder and closes it to get a better view of the local repository in the GitHub client window.




3b.  Kirby is now going to save the updated state of the local repository, which is called committing it.  Committing is a local action, and a commit can be thought of as a snapshot of the repository at this point in time.  Kirby turns his attention to the bottom part of the screen.  To commit, Kirby must add a log entry, which in this case is, "Added three great photos."  The log will contain not only this message, but a description of what files were added, when, and by whom.  This log message is enforced: one cannot make a commit without it.  Finally, Kirby presses "Commit to master."



3c.  Kirby now has to push his changes from the local repository to the GitHub server so everyone may see them.  He can do so by pressing the "sync" button.



That's it.  Kirby's additions are now available to everyone at github.com/KirbyHerby/data.
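The commit and sync steps also have command-line counterparts. The sketch below is a rough equivalent under some assumptions: the remote is named "origin" and the branch "master" (the defaults one would expect after cloning), all changed files are staged at once, and "sync" is treated as a pull followed by a push. The names publish_commands and publish are hypothetical.

```python
import subprocess


def publish_commands(message, remote="origin", branch="master"):
    """List the Git commands for one add, commit, and sync cycle."""
    if not message.strip():
        # Mirrors the rule above: no commit without a log message.
        raise ValueError("a commit needs a log message")
    return [
        ["git", "add", "--all"],           # stage new and changed files
        ["git", "commit", "-m", message],  # local snapshot with its log entry
        ["git", "pull", remote, branch],   # fetch any changes from the server
        ["git", "push", remote, branch],   # send the commit to GitHub
    ]


def publish(repo_dir, message):
    """Run the commands inside the local repository folder."""
    for command in publish_commands(message):
        subprocess.run(command, cwd=repo_dir, check=True)


# Example (not executed here): publish("data", "Added three great photos.")
```

Keeping the command list separate from the function that runs it makes it easy to inspect what would happen before anything is sent to the server.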

Suppose Kirby realizes that he had forgotten his absolutely favorite photo, the one of him hugging his favorite toy, Panda.  So he copies the photo over in Finder, commits a new version of the repository with a new message, and syncs the local repository with the GitHub server version.
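Because every update is the same copy, commit, and sync cycle, the cycle can be scripted so that it runs unattended. What follows is a hypothetical sketch only, not the born-open-data system itself: the function names, the message format, and the "origin" and "master" names are all assumptions. A scheduler such as cron could call archive once a night.

```python
import subprocess
from datetime import datetime


def archive_message(now):
    """Build a time-stamped log message for an automatic commit."""
    return "Automatic archive {:%Y-%m-%d %H:%M}".format(now)


def has_changes(porcelain_output):
    """True if 'git status --porcelain' listed any new or modified files."""
    return bool(porcelain_output.strip())


def archive(repo_dir):
    """Commit and push everything new in repo_dir; return True if work was done."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    if not has_changes(status.stdout):
        return False  # nothing new since the last run
    message = archive_message(datetime.now())
    for command in (
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", "master"],
    ):
        subprocess.run(command, cwd=repo_dir, check=True)
    return True
```

Checking the status first avoids an error from trying to commit when no files have changed.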

There is a lot more to Git and GitHub than this.  Git and GitHub are very powerful, so much so that they are the default for open-source software development worldwide.  Multiple people may work on multiple parts of the same project.  Git and GitHub have support for branches, tagging versions, merging files, and resolving conflicts.  More about the system may be learned by studying the wonderful Git Book at git-scm.com/book/en/v2.

Finally, you may wonder why Kirby wanted to post these photos.  Well, Kirby doesn't know anything about Bayesian statistics, but he is loyal.  He knows I advocate Bayes factors.  He also knows that others who advocate ROPEs and credible intervals sell their wares with photos of dogs.  Kirby happens to believe that by posting these, he is contributing to my Bayes-factor cause.  After all, he is cuter than Kruschke's puppies and perhaps he is more talented.  He does know Git and GitHub and has his own repository to prove it.
