Set up a personal bookmark search engine with YaCy

· Christopher Hoelter's Blog

How to set up a personal bookmark search engine using YaCy.

Recently I was on the hunt for a cross-platform bookmarking solution. My criteria for the tool were:

I tried a variety of services, and most fell short on at least one criterion while providing many features I did not need. Eventually, I saw a Hacker News article referencing YaCy as a self-hosted version of historio.us, whose simplicity I liked, so I decided to give it a try. It met all of my criteria except the ability to append notes. Since it snapshots web pages and provides full-text search, that hasn't proved to be an issue for how I use it.

YaCy Setup on Debian/Ubuntu

Assuming you have Podman or Docker installed on the machine where you want to run YaCy, setup is a breeze.

  1. Install on an Intel CPU with Podman (what I used on Debian). This downloads the image from docker.io and starts YaCy on port 8090 (port 8443 is published as well).
podman run -d --name yacy_search_server -p 8090:8090 -p 8443:8443 -v yacy_search_server_data:/opt/yacy_search_server/DATA --restart unless-stopped --log-opt max-size=200m --log-opt max-file=2 docker.io/yacy/yacy_search_server:latest
  2. Open YaCy and modify the following options so that it acts as a bookmarking service rather than a P2P search engine. If you are asked for credentials, the defaults are username "admin" and password "yacy".
  3. Set up a bookmarklet that captures the current web page to YaCy when pressed. The crawl depth (crawlingDepth) is set to 0 so that only the intended page is bookmarked and YaCy doesn't follow any URLs on the page. Replace localiphere with the IP address of the machine running YaCy, then set the line below as the bookmark URL.
javascript: (() => { window.open(`http://localiphere:8090/Crawler_p.html?crawlingDomMaxPages=10000&range=wide&intention=&sitemapURL=&crawlingQ=on&crawlingMode=url&crawlingURL=${encodeURIComponent(window.location.href)}&crawlingFile=&mustnotmatch=&crawlingFile%24file=&crawlingstart=Neuen Crawl starten&mustmatch=.*&createBookmark=on&bookmarkFolder=/crawlStart&xsstopw=on&indexMedia=on&crawlingIfOlderUnit=hour&cachePolicy=iffresh&indexText=on&crawlingIfOlderCheck=on&bookmarkTitle=&crawlingDomFilterDepth=1&crawlingDomFilterCheck=on&crawlingIfOlderNumber=1&crawlingDepth=0`, "_blank"); })()

This second bookmark simply opens YaCy and shows all results ordered by capture date.

http://localiphere:8090/yacysearch.html?query=*+/date&maximumRecords=10&resource=local&verify=ifexist&prefermaskfilter=&cat=href&constraint=&contentdom=text&strictContentDom=false&meanCount=5&former=a&startRecord=0
  4. If you're running YaCy on a headless server and it isn't reachable from other machines, or the Podman container seems to stop on its own, enable lingering for your user so the container keeps running after you log out.
sudo loginctl enable-linger "$USER"
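
To check whether YaCy is actually answering after a reboot or logout, a small polling helper can be handy. This is only a sketch and not part of YaCy: wait_for_yacy is a hypothetical function name, and it assumes curl is installed and that the YaCy start page answers plain HTTP without credentials on the published port 8090.

```shell
#!/bin/sh
# Hypothetical helper (not part of YaCy): poll a URL until it answers
# or the attempts run out.
# Usage: wait_for_yacy URL [ATTEMPTS]
wait_for_yacy() {
  url=$1
  attempts=${2:-10}
  try=1
  while [ "$try" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "YaCy answered at $url"
      return 0
    fi
    try=$((try + 1))
    sleep 2
  done
  echo "No answer from $url after $attempts attempts" >&2
  return 1
}

# Example, run on the server itself:
#   wait_for_yacy http://localhost:8090/ 5
```

If the check fails, podman ps --filter name=yacy_search_server shows whether the container is still running.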

Backing up and restoring data

Once you have YaCy up and running, at some point you may want to back up or transfer your bookmarks. Here's how to do that.

Backup Data

These commands back up the YaCy data into the /home/chris/yacy-backups directory (make sure the directory already exists).

podman stop yacy_search_server
podman run --rm -v yacy_search_server_data:/opt/yacy_search_server/DATA -v /home/chris/yacy-backups:/tmp:z docker.io/openjdk:8-stretch bash -c "cd /opt/yacy_search_server && tar -cf - DATA | xz -q -3v -T0 > /tmp/YACYDATA-$(date +\"%Y-%m-%d\").tar.xz"
podman start yacy_search_server
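
The three commands above can be wrapped in a small script that creates the backup directory if it is missing and restarts YaCy even when the archive step fails. This is a sketch under assumptions: backup_name and backup_yacy are hypothetical helper names, while the container, volume, and image names are copied from the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper around the backup commands above.

# Print the archive name for a date (defaults to today's date).
backup_name() {
  printf 'YACYDATA-%s.tar.xz\n' "${1:-$(date +%Y-%m-%d)}"
}

# Usage: backup_yacy /absolute/path/to/backup-dir
backup_yacy() {
  backup_dir=$1
  archive=$(backup_name)
  mkdir -p "$backup_dir" || return 1
  podman stop yacy_search_server || return 1
  # $archive is expanded here, before the command line reaches the container.
  podman run --rm \
    -v yacy_search_server_data:/opt/yacy_search_server/DATA \
    -v "$backup_dir":/tmp:z \
    docker.io/openjdk:8-stretch \
    bash -c "cd /opt/yacy_search_server && tar -cf - DATA | xz -q -3v -T0 > /tmp/$archive"
  status=$?
  # Restart YaCy whether or not the archive step succeeded.
  podman start yacy_search_server
  return "$status"
}

# Example:
#   backup_yacy /home/chris/yacy-backups
```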

Restore Data

These are the commands to restore data into YaCy from a backup. I believe this restores settings as well as bookmarks. They assume the backup file is named YACYDATA.tar.xz and sits in the target directory (in this example, /home/chris/yacy-backups/).

podman stop yacy_search_server
podman run --rm -v yacy_search_server_data:/opt/yacy_search_server/DATA -v /home/chris/yacy-backups:/tmp:z docker.io/openjdk:8-stretch bash -c "cd /opt/yacy_search_server && rm -rf DATA/* && tar xf /tmp/YACYDATA.tar.xz"
podman start yacy_search_server
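
Because the backup step writes date-stamped archives while the restore step expects a file named YACYDATA.tar.xz, it can help to pick the newest archive automatically. This is a minimal sketch: latest_backup is a hypothetical helper name, and it relies on the YACYDATA-YYYY-MM-DD.tar.xz naming produced by the backup command, where alphabetical order matches date order.

```shell
#!/bin/sh
# Hypothetical helper: print the newest YACYDATA-YYYY-MM-DD.tar.xz in a directory.
# Returns non-zero when the directory holds no matching archive.
latest_backup() {
  latest=""
  for f in "$1"/YACYDATA-*.tar.xz; do
    [ -e "$f" ] || continue   # the pattern stays literal when nothing matches
    latest=$f                 # matches arrive sorted, so the last one is the newest date
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}

# Example (paths taken from the restore commands above):
#   cp "$(latest_backup /home/chris/yacy-backups)" /home/chris/yacy-backups/YACYDATA.tar.xz
```

After copying, the restore commands above can be run unchanged.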

Please leave a comment or drop a message at https://lists.sr.ht/~hoelter/public-inbox.