I finally decided to take the plunge, drink the Kool-Aid, or
whatever metaphor fits, and begin using git on some of my personal files.
I ended up learning a lot more about git in the
process, so this document may be a little long. If you want to skip ahead,
the summary at the end may be good enough for most people.
To do this right you need a server that can act as a
remote repository (so you have off-computer storage, if not also
offsite). I covered how to set one up in
this article http://www.whiteboardcoder.com/2012/08/installing-git-server-on-ubuntu-1204.html
If I had a simple situation I would just go into the folder
I want to have in a repository and run the following commands
> git init
> git add .
> git commit -m "initial commit"
|
Then, on your git server, init a bare repo that you can push your
repository to. In this example I am
sudo'ing to the git user (who does not have a normal shell, so you have to
designate one)
> sudo su git -s /bin/bash
> cd /git/repos
> mkdir project.git
> cd project.git
> git --bare init
|
Finally, push it up to the remote server (adjust the url
and path to match your remote server)
> git remote add origin git@example.com:/git/repos/project.git
> git push origin master
|
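As a quick sanity check (just my habit, not required) you can confirm the remote is wired up and see that your push actually landed on the server:
> git remote -v
> git ls-remote origin
|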
But… what if you are like me and have a combination of files and
folders you do not want to add to git, and you do not want to move them to a
different folder on your machine?
Some of these folders just make no sense to store in git, and
some are simply too large to bother tracking, like large video files.
First I had to find a few command line tools to help me calculate
how much space I would save by skipping certain folders or file types. Then, after figuring out what I want to skip and
how much it will save me, how do I properly set up my .gitignore file to make
it work?
Calculating sizes
First I will go over the easy, no-brainer command line tools
you can use.
du (disk usage)
The size of a folder and all its sub-files can be
found using the du command.
From within a folder run the following command.
> du -hs .
|
Or to check a subfolder replace the "." with the
folder name
> du -hs archive
|
Find & awk
I found this nice
little command line snippet at http://stackoverflow.com/questions/599027/calculate-size-of-files-in-shell [1] for calculating the size of all files of
a certain type.
First it's important
to know that this command depends on how your system formats the output when
you run find with -ls. As a test, run the
following command
> find . -iname "*" -ls
|
In my case the 7th
column has the size of the file in bytes;
we can use that to calculate the cumulative size of a file type.
Here is an example
that finds the total size, in MiB, of all .jpg files in this folder and all
its subfolders (the -iname makes the match case-insensitive, so it catches
.jpg and .JPG).
> find . -iname "*.jpg" -ls | awk '{total += $7} END {print total/(1024*1024)}'
|
This command will
return the size in MiB.
Run a few of these
on different file types to see how much space a particular file type is taking
up.
find -size
I almost forgot to add this one: find -size
> find . -size +100M
|
This will list all files that are larger than 100 MiB.
Or use this one, which will also output the size of each file it
finds over 128 MiB
> find "$PWD" -size
+128M | xargs -I {} ls -alh "{}"
|
Find size of folders
Find the size of
every folder and order them by size
> find . -type d | xargs -I {} du -s {} | sort -V
|
The size is listed in KiB (du's default 1024-byte blocks).
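If your du and sort are new enough to support human-readable sizes (GNU coreutils, which is my assumption here), a shorter variant that reads a bit easier is:
> du -h --max-depth=1 . | sort -h
|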
wc and tar
A file count can be a very important thing to know as well.
I happen to have a few old systems with 10,000+ files that I
really don't need to keep as loose files; I can archive them in a tar file. But first I need to find them. (Why do this?
Backing up 10,000 tiny files is far more time consuming than
backing up one file of the same size.)
> find . -type f | wc -l
|
This will list the number of files within this folder and
all its subdirectories.
I tried to find a one-liner that would list the number of
files within each folder (and its subfolders) but I was unsuccessful. If you know of one, please post it.
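The closest I have come is a small loop rather than a true one-liner; it prints each top-level folder followed by the number of files under it (recursively):
> for d in */ ; do printf "%s " "$d"; find "$d" -type f | wc -l; done
|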
Here is a simple example of tar'ing a folder up
> tar cvzf myfolder.tar.gz myfolder
|
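And, to get the files back out again later:
> tar xvzf myfolder.tar.gz
|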
I had one particular folder that contained 80,000 small
files. Tarring this folder saved
me about 800 MiB, and I was able to rsync all the directories in 2 min 30 sec vs
8 min.
gitignore
Hopefully, now you
have a list of files and folders you want to ignore and not add to your git
repository.
To ignore files you
need to use the .gitignore file.
In the base
directory create a .gitignore file.
> vi .gitignore
|
But, before I get
into my examples, how can you be sure git is really ignoring a file?
Here is how I
confirmed my .gitignore when creating a new repository (this assumes you have
run "git init ." and nothing else yet).
After running
> git init .
|
Run this
> git status
|
Here you can see the
files and folders, in the current folder, that are currently untracked but
would be added if you ran git add .
To check a subfolder
just run something like this
> git status img-folder
|
This will list the
same thing, but for the folder you designate.
If I edit the
.gitignore file
> vi .gitignore
|
And place the
following in it.
*.jpg
|
Now run
> git status img-folder
|
Now you can see that
the .jpg files are being ignored.
I found using status
very helpful in making sure my .gitignore file was correct.
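If your git is new enough (1.8.2 or later, if I remember right) there is also git check-ignore, which tells you which .gitignore rule matched a given file. The file name here is just a made-up example:
> git check-ignore -v img-folder/photo01.jpg
|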
gitignore examples
Here are some of my .gitignore one liners
#Ignore all directories and their contents
*/*
|
If this were the only rule you had, you would only get the
files in the top directory; everything else would be ignored.
#Ignore all directories that start with /Logo
/Logo*
|
#Ignore those pesky .DS_Store OS X files
.DS_Store
|
#Ignore all .png, .JPG, and .pdf files
*.png
*.JPG
*.pdf
|
Double asterisk. In
theory I should be able to do something like this
selling/**/*.JPG
|
The ** matches zero or more directories, so this would cause any .JPG file
within the selling directory, or any of its subdirectories, to be
ignored. However, this feature was not introduced until git 1.8.2, and a
quick git --version shows cygwin only has 1.7.9, so I can't use it on my
box.
I decided to update git on cygwin to 1.9 just to make sure I
don't get myself into trouble.
I found this site helpful: http://stackoverflow.com/questions/14330050/how-to-get-git-1-8-in-cygwin
[4]
Installing git v 1.9 on cygwin and ubuntu server
To install a newer version of git on cygwin follow this
procedure (I am going to go to version 1.9)
> git clone https://github.com/git/git.git
> cd git
> git checkout v1.9.0
|
Now that you have v 1.9 checked out do the following
> make configure
> ./configure --prefix=/usr/local
> make
> make -i install
|
After doing this I opened a new cygwin window and ran the following to confirm the update
> git --version
|
I also had to install git v 1.9.0 on my Ubuntu 10.04 server.
To do this I ran the following commands
> git clone https://github.com/git/git.git
> cd git
> git checkout v1.9.0
|
Now that you have v 1.9 checked out do the following
> make configure
> ./configure --prefix=/usr
> make
> make -i install
|
After doing this I opened a new shell on the server and ran the following to confirm the update
> git --version
|
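One note: if make configure or the build itself complains about missing tools, you probably need the usual build dependencies first. On Ubuntu something along these lines should cover it (the exact package names are my best guess for this setup, adjust as needed):
> sudo apt-get install build-essential autoconf zlib1g-dev libcurl4-openssl-dev libssl-dev libexpat1-dev gettext
|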
Issues I ran into
There were a couple of interesting issues I ran into while
trying to put my "normal" files into a git repository and push it
to a few remote repositories. Here are the issues and the fixes I came up with.
fatal: out of memory, malloc failed
In one particular repository I have a couple of large VMs:
one has a 2900 MiB virtual hard drive and the other a 3008 MiB virtual hard
drive.
Now I admit this type of file is not a really good candidate
for adding to a git repo. But in this
case it’s a historical VM I am never going to change, and I would like the
simple, effective backup and transfer mechanism that git provides. (It's an old VM I used for my Master's
Thesis.)
I am using a 32-bit version of cygwin on my windows box and
when I run
> git commit -m "Initial Commit"
|
I get the following error
fatal: Out of memory, malloc failed (tried to allocate 3041787905 bytes)
|
I ran this quick find command to list my biggest files
> find . -size +1000M -ls
|
My two largest files are 3041787904 bytes and 3154509824 bytes, or roughly
2.8 GiB and 2.9 GiB.
The failed allocation in the error is exactly the size of my first file
(plus one byte).
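(If you want to double check the bytes-to-GiB math at the command line, bc makes it quick:)
> echo "scale=2; 3041787904/1024/1024/1024" | bc
2.83
|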
One web site I found talking about this issue was https://github.com/hbons/SparkleShare/issues/519
[2]
They suggested updating the .git/config file.
Here is what I did (after wiping my .git folder and running
git init again)
> vi .git/config
|
And add
[pack]
deltaCacheSize = 3072m
packSizeLimit = 3072m
windowMemory = 3072m
[core]
packedGitLimit = 128m
packedGitWindowSize = 128m
|
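(If you would rather not hand-edit the file, the same settings can be made with git config from inside the repo; either way they end up in .git/config.)
> git config pack.deltaCacheSize 3072m
> git config pack.packSizeLimit 3072m
> git config pack.windowMemory 3072m
> git config core.packedGitLimit 128m
> git config core.packedGitWindowSize 128m
|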
Then I re-ran
> git add .
> git commit -m "Initial commit"
|
And I got the same
error… What the heck?
Maybe I need a
little bit more overhead room on my memory?
So I updated it to
[pack]
deltaCacheSize = 3600m
packSizeLimit = 3600m
windowMemory = 3600m
[core]
packedGitLimit = 128m
packedGitWindowSize = 128m
|
And tried again.
I still got the
malloc failed error…..
OK I am going to
update my cygwin to a 64 bit version and see if that fixes it.
After installing the 64-bit version of cygwin I edited the .git/config to the following.
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
ignorecase = true
[core]
packedGitLimit = 128m
packedGitWindowSize = 128m
[pack]
deltaCacheSize = 128m
packSizeLimit = 128m
windowMemory = 128m
|
This only has the pack limits set to 128m.
And it worked!
Looks like the
64-bit version fixed my problem.
I did notice that the .gitconfig file located
in my home directory (the one that contains my [user] information) also contained
[pack]
windowMemory = 64m
|
Maybe this created
the issue I saw on the 32-bit side? I
would guess the local config file would override this, but I am not sure.
But I thought I
would mention it in case someone sees the same issue.
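One way to see which value actually wins is to ask git for the effective setting from inside the repository; it reads the repo's .git/config on top of the global ~/.gitconfig:
> git config pack.windowMemory
|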
error: pack-objects died of signal 13
When trying to push
this repo up to a remote repo I got the following errors
Counting objects: 3231, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3081/3081), done.
error: inflate: data stream error (incorrect data check)
fatal: pack has bad object at offset 337748223: inflate returned -3
error: pack-objects died of signal 13
error: failed to push some refs to 'git@git.
Now how do I fix this?
After several
attempts to fix this I really was getting nowhere… I am sure I could eventually get it working,
but I have to admit to myself that git was not meant to do this (store very
large binary files in a repo) and I should not force a square peg into a round hole.
My Solution
So I gave up, threw
in the towel, surrendered… and found another way to deal with it.
Here is my idea…
Within each
potential repo I run the following command.
> find "$PWD"
-size +128M | xargs -I {} ls -alh "{}"
|
This will list all
the files over 128 MiB and show their location and actual size.
In addition, I ran du -hs {folder} on a few folders to
determine how large they are. I have a
few filled with images that I won't put in a git repo either.
Then I gathered up
all the files/folders that I do not want to add to my git repo and added them to
my .gitignore file. For example…
#not-git files and folders
/99_Thesis/VMs
|
rsync script
I still want to be
able to get the VMs folder and its contents easily, so I created a script to
download the folder using rsync.
On my ubuntu server
(which I am using as a git remote repo) I created a not-git folder
> sudo mkdir -p /not-git/rsync
|
This is where the
tricky part comes in. Since we are using
rsync we need to give access to the /not-git/rsync folders to any user who
needs it. In my case that is simple:
it's just me. But if you had other
users they would need permissions on this box in some way to read this folder. (Or you could easily put these files
somewhere else, even on an FTP server… just a thought.)
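In my case that just meant making sure my own user owns the folder on the server, something along these lines (adjust the user and group to yours):
> sudo chown -R patman:patman /not-git
|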
Create the script
> vi .rsync-not-git
|
Here is the script I
created
#!/bin/bash
#Check for an override username
name=""
if [ -n "$2" ]
then
  name="$2@"
fi

#=======================================
#
#Only spot you should be changing anything
#One array holds the subfolders (with a trailing /, or '' for the top level)
#and the other holds the files/folders to rsync
#I don't think bash supports arrays of arrays so I did it this way
loc="/not-git/rsync/01_folder/"
folders=("Folder1/" "Folder2/")
files=("file1" "file2")
#
#===============================================

flags="-avzr"

if [ "$1" == 'push' ]
then
  echo "Push it"
  #Need to make the directories on the remote side first
  for i in "${!folders[@]}"
  do
    ssh "$name"git.example.com mkdir -p "$loc${folders[$i]}"
    rsync $flags "${folders[$i]}${files[$i]}" "$name"git.example.com:"$loc${folders[$i]}"
  done
else
  echo "Pull it"
  #Create the local folder if it is not present
  for i in "${!folders[@]}"
  do
    if [ "${folders[$i]}" == '' ]
    then
      rsync $flags "$name"git.example.com:"$loc${files[$i]}" .
    else
      mkdir -p "${folders[$i]}"
      rsync $flags "$name"git.example.com:"$loc${folders[$i]}${files[$i]}" "${folders[$i]}"
    fi
  done
fi
|
Edit this to your needs.
Change the loc variable to the location on your remote server where you want
to place the files.
Change the folders and files arrays to match what you are ignoring in git but
want to rsync (note the trailing slash on each folder entry; use '' for a file
that lives at the top level).
For example, if you want to rsync /Jeff/move.mp4 and a folder /work/images
you would change it to
folders=("Jeff" "work")
files=("move.mp4"
"images")
|
Then to use it run
the script like this
> ./.rsync-not-git push
|
To pull
> ./.rsync-not-git pull
|
If you are like me
and are on a system that has a different username for you, you can add the
username to the push or pull. For example
> ./.rsync-not-git push patman
|
Summary
To sum it all up….
1 - Upgrade git to
at least version 1.9 on your local machine and any remote git server you will
use (if you are using cygwin, use 64-bit cygwin).
2 - edit .git/config
> vi .git/config
|
And add
[core]
packedGitLimit = 128m
packedGitWindowSize = 128m
[pack]
deltaCacheSize = 128m
packSizeLimit = 128m
windowMemory = 128m
|
3 - Ignore very
large files/folders
Run the following
command to find large files
> find "$PWD"
-size +128M | xargs -I {} ls -alh "{}"
|
Add any large files to the .gitignore file. Also add any
large folders, for example a folder of images, that don't make sense to put
into a repository.
4 - rsync the ignored
files
Write a script to
rsync the files and folders you are ignoring in git but want to have a simple
way to download/upload.
The script should
allow for a push and a pull (see the
example script I came up with above).
That’s it! Use git for all but your large
files/folders, and use git to store a script to rsync the files/folders you
ignored.
References
[1] Calculate size of files in shell - tpgould, http://stackoverflow.com/questions/599027/calculate-size-of-files-in-shell, Visited 7/2014
[2] Better git memory usage settings for huge files #519, https://github.com/hbons/SparkleShare/issues/519, Visited 7/2014
[3] Cygwin home page, https://www.cygwin.com/, Visited 7/2014
[4] How to get Git 1.8 in Cygwin?, http://stackoverflow.com/questions/14330050/how-to-get-git-1-8-in-cygwin, Visited 7/2014