Backing Up Photos from Google Photos

March 23, 2025

While preparing for World Backup Day (March 31st, the eve of April Fools’ Day), I revisited the topic of Google Takeout.

What happened#

This time I focused solely on photos: the 200 GB of storage I purchased is 90% full of photos and videos, and the annoying Gmail notification warning that my mail will stop working calls for some kind of solution.

What can be done#

There are essentially two options:

pay Google more money for more storage
reduce usage of the existing storage by deleting junk.

Since Google’s policies have long stopped suiting me (I really should write about that separately — strange I haven’t already), I don’t want to give them more money. I decided to declare videos as the junk:

they take up the most space
they’re not viewed as often as photos
and they’re not as useful
they’re backed up to Synology
- need to update the backup article
they’re also available in Google Takeout archives

Pitfalls#

If I delete videos from Google Photos, then every subsequent backup (takeout) could potentially be missing data that was previously in Photos. This complicates the backup — I can no longer simply keep the latest takeout copy and trust that it has everything; I’ll have to somehow repack it.

Google Takeout hasn’t gotten any better since the last time I dealt with it: it’s still impossible to automate the download (someone on Reddit got it working once somehow; I couldn’t). So I’m again manually downloading 8 files of 50 GB each via the browser to my laptop, then rsyncing them from the laptop to the server. I had to re-upload a few times because the laptop went to sleep: after waking up, rsync would continue, but in the end the md5 checksums of a couple of files didn’t match.
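The checksum comparison mentioned above can be scripted so that a bad transfer is caught before anything is unpacked. This is a minimal sketch, assuming GNU md5sum is available on both machines; the function names and the manifest file name checksums.md5 are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: record checksums where the archives were downloaded and verify
# them after the transfer. Both function names are hypothetical helpers.

# Write an md5 manifest for every .tgz in the given directory.
make_manifest() {
    local dir=$1
    ( cd "$dir" && md5sum -- *.tgz > checksums.md5 )
}

# Verify the archives in the given directory against the manifest.
# Returns non-zero if any file is missing or its checksum differs.
verify_manifest() {
    local dir=$1
    ( cd "$dir" && md5sum --check --quiet checksums.md5 )
}
```

The idea is to run make_manifest on the laptop, copy checksums.md5 along with the archives, and run verify_manifest on the server; with --quiet only the files that fail are printed.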

One thing did get better, though: the server is now (finally) connected via cable instead of WiFi, so the upload was noticeably faster.

The problem#

So I have two (and in the future more) sets of archive files that I need to merge into one folder without losing anything along the way. Some photos may have been deleted for quality reasons, but others (like the videos) only for lack of space. Some I might want to delete locally because they’re duplicates that Google put into several folders (although I probably won’t bother, since to do it properly you’d want to replace the deleted copies with relative symlinks… sounds like yet another project).
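For the duplicates, a first step that deletes nothing is simply to list which files are byte-identical. The sketch below groups files by md5 checksum; it assumes GNU md5sum and GNU uniq (for -w and --all-repeated), and list_duplicates is a hypothetical name.

```shell
#!/usr/bin/env bash
# List groups of byte-identical files under a directory, by md5 checksum.
# Relies on GNU uniq (-w, --all-repeated); the function name is hypothetical.
list_duplicates() {
    local dir=$1
    # md5sum prints "<32 hex chars>  <path>"; after sorting, uniq compares
    # only the first 32 characters and prints every line of a repeated group.
    find "$dir" -type f -exec md5sum {} + \
        | sort \
        | uniq -w32 --all-repeated=separate
}
```

Each group of identical files is separated by a blank line. On an archive of this size the run reads every byte once, so it will take a while.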

Why not git?#

Late at night, my sleepy brain produced this algorithm:

create a git repository in an empty folder with one initial commit
put the first version of photos from Takeout-01 there and commit
put (rsync) the second version of photos from Takeout-02 and look at the diff:
- how many files were added
- how many were changed
- no deletions expected at all
commit the second version
evaluate how much time and other additional resources this approach takes
reassess the decision
if the reassessment is positive, continue with Takeout-N+1
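The steps above could be sketched roughly as follows. This is an illustration, not the exact commands that were run: init_repo and snapshot_takeout are hypothetical helpers, cp -a stands in for the rsync step, and the committer identity is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch of the one-commit-per-takeout idea. Function names, the committer
# identity and all directory arguments are hypothetical.

# Create the repository with one empty initial commit.
init_repo() {
    local repo=$1
    git init --quiet "$repo"
    git -C "$repo" -c user.name=backup -c user.email=backup@localhost \
        commit --quiet --allow-empty -m "initial commit"
}

# Copy one extracted takeout into the repository, report what changed, commit.
snapshot_takeout() {
    local repo=$1 takeout=$2 label=$3
    # Overlay the new files; cp -a never deletes anything already in the repo.
    cp -a "$takeout"/. "$repo"/
    git -C "$repo" add --all
    # Count staged changes by status letter: A = added, M = modified, D = deleted.
    git -C "$repo" diff --cached --name-status | cut -f1 | sort | uniq -c
    git -C "$repo" -c user.name=backup -c user.email=backup@localhost \
        commit --quiet -m "$label"
}
```

Calling snapshot_takeout once per extracted takeout, oldest first, prints one count per status letter before each commit. Since files are only ever copied in, a D line would mean something went wrong.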

Unpacking#

The server isn’t very fast in terms of either storage or computing power: unpacking took a bit over 3.5 hours (217 minutes).

$ time for i in *.tgz; do echo "$i"; tar xf "$i" -C extract; done
takeout-20250313T040306Z-001.tgz
takeout-20250313T040306Z-002.tgz
takeout-20250313T040306Z-003.tgz
takeout-20250313T040306Z-004.tgz
takeout-20250313T040306Z-005.tgz
takeout-20250313T040306Z-006.tgz
takeout-20250313T040306Z-007.tgz

real    216m57.269s
user    47m39.655s
sys     113m16.870s

$ ncdu
...
Total disk usage: 316.7 GiB

git add#

I forgot to add time to the git add . command, so we won’t know exactly how long it took — I realized too late, and it’s a shame to waste time restarting.

Although maybe we will find out, because there’s a chance the partition will run out of space before I get to draw any conclusions.

Nine freaking hours#

$ time git add .

real    556m16.660s
user    338m58.527s
sys     77m12.387s

Two hundred freaking gigabytes#

---------------- .git ---------------
  211.4 GiB [###############] /objects
   15.7 MiB [               ]  index
  176.0 KiB [               ] /hooks
   24.0 KiB [               ] /logs
   12.0 KiB [               ] /refs
   12.0 KiB [               ] /info
   12.0 KiB [               ]  config
   12.0 KiB [               ]  description
   12.0 KiB [               ]  HEAD
   12.0 KiB [               ]  COMMIT_EDITMSG

git tuning#

I redid the git import, not because I ran out of space, but because: a) I was still curious how long git add actually takes, and b) I had forgotten to tweak the git settings. Not knowing at which point they take effect (during add or during commit), it was safer to start over, especially since I hadn’t gotten very far.

The git settings are as follows:

.gitattributes#

For all file types listed here: do not generate text diffs, do not attempt merges, do not convert line endings, and do not apply delta compression.

[attr]media -diff -merge -text -delta
*.jpg  media
*.jpeg media
*.png  media
*.gif  media
*.heic media
*.mov  media
*.mp4  media
*.webp media
*.m4v  media
*.avi  media
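To check that the macro really applies to a given file, git can be asked which attribute values it resolves for a path. A small sketch, where attr_value is a hypothetical helper and the file names in the usage note are made up:

```shell
#!/usr/bin/env bash
# Print the value git resolves for one attribute of one path.
# `attr_value` is a hypothetical helper; run it inside the repository
# that contains the .gitattributes file. The path does not have to exist.
attr_value() {
    local attr=$1 path=$2
    # `git check-attr` prints "<path>: <attr>: <value>"; keep only the value.
    git check-attr "$attr" -- "$path" | sed 's/.*: //'
}
```

With the .gitattributes above, attr_value delta photo.jpg should print unset, while a path that matches none of the patterns gives unspecified. As far as I know the patterns are matched case-sensitively unless core.ignorecase is enabled, so it may be worth checking a name like IMG_0001.JPG as well.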

Ukrainian filenames#

For some reason git was showing Ukrainian file and directory names as something like \312\313\554. These are octal escapes of the bytes in the name, which git uses by default when quoting non-ASCII paths, but I’d rather have them in the usual readable form. Fixed like this:

git config core.quotepath false
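To compare the two behaviours without changing the repository configuration, the setting can be overridden for a single command with git -c. A minimal sketch; list_paths is a hypothetical helper:

```shell
#!/usr/bin/env bash
# List the tracked paths of a repository with or without path quoting.
# `list_paths` is a hypothetical helper; -c overrides the setting for this
# one command only and leaves the repository config untouched.
list_paths() {
    local repo=$1 quote=$2   # quote: "true" or "false"
    git -C "$repo" -c core.quotepath="$quote" ls-files
}
```

With true, a Cyrillic file name comes out as a double-quoted string of octal escapes; with false it is printed as is.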

Conclusions#

The idea is interesting in theory and technically works, but it turned out to be impractical. Burying 240 GB underground (in .git) for the dubious benefit of knowing “what changed between one archival and the next”, at the cost of rather long git operations, seems like too high a price to pay. I’ll simply “repack” the contents of the takeout archives, overwriting changed files on top of the old ones and adding new ones.
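The “repack” could look roughly like this. It is a sketch under assumptions: overlay_takeout and its directory arguments are hypothetical, and the cp -a branch is only a fallback for machines without rsync.

```shell
#!/usr/bin/env bash
# Overlay one extracted takeout onto the merged folder: changed files are
# overwritten, new files are added, nothing is ever deleted.
# `overlay_takeout` and both directory arguments are hypothetical.
overlay_takeout() {
    local takeout=$1 merged=$2
    mkdir -p "$merged"
    if command -v rsync >/dev/null 2>&1; then
        # No --delete: files missing from the newer takeout stay in place.
        rsync -a "$takeout"/ "$merged"/
    else
        # Fallback with similar add-or-overwrite behaviour.
        cp -a "$takeout"/. "$merged"/
    fi
}
```

The takeouts have to be applied oldest first, so that the newest version of a changed file is the one that survives. Files that were deleted from Google Photos in the meantime simply remain in the merged folder.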

Statistics#

git commit#

Not that bad, actually

real 17m56.089s
user 0m7.987s
sys 0m37.107s

git status before adding files#

$ time git status
On branch master
nothing to commit, working tree clean

real 0m22.238s
user 0m0.398s
sys 0m10.211s

add files from the newest takeout#

rsync blablabla
sent 348,875,203,737 bytes  received 3,209,024 bytes  14,322,067.89 bytes/sec
total size is 356,587,688,292  speedup is 1.02

real    405m59.681s
user    37m38.025s
sys     102m19.912s

git status after adding files#

It took 16.44 seconds to enumerate untracked files.
See 'git help status' for information on how to improve this.

no changes added to commit (use "git add" and/or "git commit -a")

real    57m33.692s
user    31m15.525s
sys     12m55.463s