How I used Git Subtree to create the Code Snippet Repository - automatically!

For this blog’s series “Code Snippet Repository”, I used git’s subtree command, nbconvert and cell tags to automatically extract Python code from Jupyter notebooks and push the resulting Python scripts to the separate csr_repo. In this post, I explain how you can set up a pipeline like this yourself! You can see a pic

Graph of Git Subtree: A is the main repo, B is the sub-repo

Setup of your Git Repository

First, set up your main repository and link it to a remote repository on a host like GitHub or GitLab. The code below is adapted from Cloudera.

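cd path/of/desired_location

# Create the repository and record everything in a first commit
git init
git add .
git commit -m 'Initial commit'
# Link the local repository to its remote
git remote add origin git@github.com:username/repo.git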

Create your Subtree

To establish the subtree, we use the git command git subtree add, which creates a new folder for it. Specify the folder name with --prefix=name_for_subfolder. Besides this folder, you also need the address of the subtree’s remote repository and the branch you want to check out.

git subtree add --prefix=name_for_subfolder git@github.com:username/sub_repo.git branch
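
If you prefer to script this step, the following is a minimal Python sketch of the same call. The helper names `subtree_add_command` and `add_subtree` and the `dry_run` switch are assumptions made for illustration, not part of git. The sketch also checks that the prefix folder does not exist yet, since git subtree add expects to create that folder itself.

```python
import subprocess
from pathlib import Path


def subtree_add_command(prefix, remote, branch):
    """Build the argument list for `git subtree add` (nothing is executed here)."""
    return ["git", "subtree", "add", f"--prefix={prefix}", remote, branch]


def add_subtree(repo_dir, prefix, remote, branch, dry_run=True):
    """Run `git subtree add` inside repo_dir, or only return the command if dry_run is set."""
    target = Path(repo_dir) / prefix
    if target.exists():
        # git subtree add wants to create this folder itself
        raise FileExistsError(f"{target} already exists")
    cmd = subtree_add_command(prefix, remote, branch)
    if not dry_run:
        subprocess.run(cmd, cwd=repo_dir, check=True)
    return cmd


# Placeholder values, same as in the command above
print(" ".join(subtree_add_command(
    "name_for_subfolder", "git@github.com:username/sub_repo.git", "branch")))
```

With `dry_run=True` (the default in this sketch) nothing touches your repository, so you can inspect the command before running it for real.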

Using your new Architecture

We are finished! But how do we use this pipeline now?
Here, we need to keep in mind which repository we want to push to.

If we just use our standard procedure:

git add .
git commit -m "great commit message"
git push

we push the changes to our main repository, including changes made to our subfolder.

However, if we want to push changes in the subfolder to our sub_repo, we need to use:

git subtree push --prefix=name_for_subfolder git@github.com:username/sub_repo.git branch

So essentially, use the subtree command again (not to be confused with git submodule, which is a different feature) and specify prefix, repository and branch!
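
Since both pushes usually go together, they can be wrapped in one small helper. This is only a sketch under assumptions: the function names `publish_commands` and `publish` are made up for this example, and it assumes you run it from the top level of the main repository, which is where git subtree push needs to be called.

```python
import subprocess


def publish_commands(prefix, sub_remote, sub_branch):
    """Return both pushes in order: main repo first, then the subfolder to the sub-repo."""
    return [
        ["git", "push"],
        ["git", "subtree", "push", f"--prefix={prefix}", sub_remote, sub_branch],
    ]


def publish(prefix, sub_remote, sub_branch, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    commands = publish_commands(prefix, sub_remote, sub_branch)
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a push fails
            subprocess.run(cmd, check=True)
    return commands


# Dry run with the placeholder values used in this post
publish("name_for_subfolder", "git@github.com:username/sub_repo.git", "branch")
```

Pushing the main repository first means the sub-repo is only updated once the main push has gone through.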

Extracting Python Code from Jupyter-Notebooks

Now that we have set up the git pipeline, we need an automatic way to extract Python code from Jupyter notebooks. This can be done with Jupyter cell tags and the module nbconvert.

The following CLI command converts the cells of notebook.ipynb that have not been marked with the no_code cell tag into a Python script. This script is then stored in the csr_repo folder which is, you guessed it, my sub-repo folder. This folder is linked to the respective ds-econ_csr repository on GitHub.

jupyter nbconvert --output-dir='../csr_repo' \
  --TagRemovePreprocessor.enabled=True \
  --TagRemovePreprocessor.remove_cell_tags="['no_code',]" \
  --to script \
  notebook.ipynb
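
To see what the tag filter does conceptually, here is a small standard-library sketch. It is not how nbconvert is implemented, and the function name `extract_code` is an assumption for this example. It relies only on the notebook file being JSON, with each cell's tags stored under `metadata.tags`. Unlike nbconvert, it looks at code cells only and ignores markdown cells.

```python
import json


def extract_code(notebook_json, skip_tag="no_code"):
    """Return the source of all code cells that do not carry skip_tag."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        tags = cell.get("metadata", {}).get("tags", [])
        if skip_tag in tags:
            continue
        source = cell.get("source", "")
        # The notebook format stores source as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source)
    return "\n\n".join(chunks) + "\n"


# A tiny hand-written notebook: one tagged cell, one plain code cell, one markdown cell
example = json.dumps({
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["no_code"]},
         "source": ["print('hidden')"]},
        {"cell_type": "code", "metadata": {}, "source": ["x = 1\n", "print(x)"]},
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    ]
})
print(extract_code(example))
```

Only the untagged code cell ends up in the result; the cell tagged no_code and the markdown cell are skipped.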

Code Snippet Repository

This post is part of the Trial and Error series. This collection of posts describes challenges I encountered when building the ds-econ blog!

These topics might be more niche and are oftentimes not as easy to google.

Feel free to contact me if you have more elegant solutions to the problems I encountered! Write me an email: finn@ds-econ.com

I read every message!


https://www.ds-econ.com/2021/09/07/01_te_gitsubtree/

Author

Finn

Posted on

2021-09-07

Updated on

2022-03-09
