How I used Git Subtree to create the Code Snippet Repository - automatically!
For this blog’s series “Code Snippet Repository”, I used git’s subtree command, nbconvert and cell tags to automatically extract Python code from Jupyter notebooks and push the resulting Python scripts to the separate csr_repo. In this post, I explain how you can set up a pipeline like this yourself! You can see a pic
Setup of your Git Repository
First, set up your main repository and link it to a remote repository like GitHub or GitLab. The code here is adapted from Cloudera:
    cd path/of/desired_location
Create your Subtree
To establish the subtree, we need to create a new folder for it with the git command git subtree add. Specify the folder name with --prefix=name_for_subfolder. Besides this folder, you also need the address of the subtree’s remote repository and the branch you want to check out.
    git subtree add --prefix=name_for_subfolder git@github.com:username/sub_repo.git branch
Using your new Architecture
We are finished! But how do we use this pipeline now?
Here, we need to keep in mind which repository we want to git push to.
If we just follow our standard procedure:
    git add .
the subsequent commit and push go to our main repository, including changes made to our subfolder.
However, if we want to push changes in the subfolder to our sub_repo, we need to use:
    git subtree push --prefix=name_for_subfolder git@github.com:username/sub_repo.git branch
So essentially, use the subtree command (not to be confused with git submodules) and specify prefix, repository and branch again!
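Since both subtree commands take the same three pieces (prefix, repository and branch), it can help to state them once and build the commands from there. The following is only a minimal sketch of that idea, not part of my actual pipeline: the helper names subtree_command and run_subtree are made up for illustration, the constants are the placeholders from the commands above, and by default nothing is executed, the command is only printed.

```python
import shlex
import subprocess

# Placeholder values copied from the commands above; replace with your own.
PREFIX = "name_for_subfolder"
REMOTE = "git@github.com:username/sub_repo.git"
BRANCH = "branch"


def subtree_command(action, prefix=PREFIX, remote=REMOTE, branch=BRANCH):
    """Return the argv list for `git subtree add` or `git subtree push`."""
    # This hypothetical helper only covers the two actions used in this post.
    if action not in ("add", "push"):
        raise ValueError(f"unsupported subtree action: {action!r}")
    return ["git", "subtree", action, f"--prefix={prefix}", remote, branch]


def run_subtree(action, dry_run=True, **kwargs):
    """Print the command; only call git when dry_run is False."""
    cmd = subtree_command(action, **kwargs)
    print(shlex.join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if git exits with an error.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    run_subtree("push")  # dry run: shows the command without calling git
```

Calling run_subtree("push", dry_run=False) from the root of the main repository would then run the push to the sub-repo, while the ordinary git push to the main repository stays untouched.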
Extracting Python Code from Jupyter-Notebooks
Now that we have set up the git pipeline, we need an automatic way to extract Python code from Jupyter notebooks. This can be done with Jupyter cell tags and the module nbconvert.
The following CLI command renders the cells of notebook.ipynb that have not been marked with the no_code cell tag into a Python script. This script is then stored in the csr_repo folder which is, you guessed it, my sub-repo folder. This folder is linked to the respective ds-econ_csr repository on GitHub.
    jupyter nbconvert --output-dir='../csr_repo'
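To make the tag filtering less of a black box, here is a rough sketch of what it boils down to, written with the standard library only. It is not nbconvert and its output will differ from nbconvert's (for example, markdown cells are simply skipped here). It assumes the notebook is stored in the usual JSON layout, with a top-level "cells" list and tags under each cell's metadata; the function names extract_code and notebook_to_script are hypothetical.

```python
import json
from pathlib import Path


def extract_code(notebook, skip_tag="no_code"):
    """Return the source of all code cells that do not carry skip_tag."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are ignored in this sketch
        tags = cell.get("metadata", {}).get("tags", [])
        if skip_tag in tags:
            continue  # cell was marked to stay out of the script
        source = cell.get("source", "")
        if isinstance(source, list):
            # Notebooks may store the source as a list of lines.
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


def notebook_to_script(ipynb_path, output_dir, skip_tag="no_code"):
    """Write the untagged code cells of ipynb_path to output_dir/<name>.py."""
    ipynb_path = Path(ipynb_path)
    notebook = json.loads(ipynb_path.read_text(encoding="utf-8"))
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / (ipynb_path.stem + ".py")
    target.write_text(extract_code(notebook, skip_tag), encoding="utf-8")
    return target


if __name__ == "__main__":
    demo = {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
            {"cell_type": "code", "metadata": {"tags": ["no_code"]},
             "source": ["print('scratch work')"]},
            {"cell_type": "code", "metadata": {},
             "source": ["import math\n", "print(math.pi)"]},
        ]
    }
    print(extract_code(demo))
```

In the demo, only the last cell ends up in the script: the markdown cell is not code, and the second cell carries the no_code tag.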