sear.c scrapes search results of popular engines, caches them and creates a simple HTML UI
Anton Luka Šijanec 0775495ea6 oops, that space is required. it'll wait till next release. 3 weeks ago


sear.c

sear.c is used as a lightweight replacement for SearX that proxies and caches search results from the Google web search engine. The main advantages over SearX are speed and simplicity.

packaging

debian and ubuntu

First add my software distribution repository prog.sijanec.eu to your APT sources list. See the instructions there.

apt install sear.c
systemctl enable sear.c
service sear.c start

gentoo

First add my ebuild overlay repository sijanec/ebuild to your portage repos.conf. See the instructions there. Also read the gentoo ebuild note in the notes section below.

emerge --ask www-apps/searc
rc-update add sear.c
rc-service sear.c start

requirements

  • a POSIX system
  • GNU C library (uses tdestroy(3) if compiled without SC_OLD_STORAGE). musl also provides tdestroy(3), but building with CC=musl-gcc does not work.
  • GNU Compiler Collection (sear.c is written in GNU C and uses nested functions).
  • GNU Make (needs to support .NOTPARALLEL:).
  • libxml2-dev (for the simple HTTP/1.0 client and the HTML parser).
  • libmicrohttpd-dev (for serving results - use a reverse proxy, such as nginx, for HTTPS).
  • xxd (for converting HTML pages into C arrays when compiling from source).
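
The GNU C requirement comes from nested functions, which gcc accepts and clang rejects. The sketch below is a hypothetical helper (count_longer is not part of sear.c) that only shows what such a function looks like:

```c
#include <stddef.h>
#include <string.h>

/* count_longer() is a hypothetical helper, not part of sear.c. It only
   illustrates a GNU C nested function, the extension that makes gcc a
   hard requirement (clang rejects this code). */
size_t count_longer(const char **strings, size_t n, size_t limit)
{
	size_t count = 0;

	/* nested function: it reads `limit` and modifies `count`, both locals
	   of the enclosing function. it is called directly, never through a
	   pointer, so gcc does not need an executable-stack trampoline. */
	void check(const char *s)
	{
		if (strlen(s) > limit)
			count++;
	}

	for (size_t i = 0; i < n; i++)
		check(strings[i]);
	return count;
}
```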

supported browsers

pages that sear.c generates were tested and are usable on the following www clients: ungoogled-chromium, icecat, links and many more

compiling from source

make prepare	# debian only, runs apt install (run as root)
make		# compiles
./sear.c	# runs the server
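
The build uses xxd to embed HTML pages as C arrays. As a rough sketch, assuming a hypothetical page hi.html containing `<b>hi</b>` followed by a newline, `xxd -i hi.html` emits an array plus a length variable like the ones below. The helper page_to_cstring is also hypothetical; it only illustrates that the generated array is not NUL-terminated:

```c
#include <stdlib.h>
#include <string.h>

/* what `xxd -i hi.html` prints for a hypothetical file containing
   "<b>hi</b>\n" (10 bytes); there is no terminating NUL byte */
unsigned char hi_html[] = {
	0x3c, 0x62, 0x3e, 0x68, 0x69, 0x3c, 0x2f, 0x62, 0x3e, 0x0a
};
unsigned int hi_html_len = 10;

/* hypothetical helper: copy an embedded page into a NUL-terminated,
   heap-allocated string; returns NULL if malloc fails */
char *page_to_cstring(const unsigned char *page, unsigned int len)
{
	char *s = malloc((size_t) len + 1);
	if (!s)
		return NULL;
	memcpy(s, page, len);
	s[len] = '\0'; /* add the terminator xxd does not emit */
	return s;
}
```

When a page is sent as an HTTP response body, hi_html_len can be used as the body length directly, so a copy like this is only needed where C string functions are involved.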

instructions

  • run the daemon: it starts listening on HTTP port 7327 (remember it by picturing the phone keypad buttons with the letters SEAR ;)
  • optional: create a reverse proxy for HTTPS
  • navigate to http://localhost:7327 and do a couple of searches to see if everything works
  • the horseshoe button redirects directly to the first result without wasting time on the results page. use if you feel lucky. (BP)
  • the painting button performs an image search. PRIVACY WARNING: images are loaded directly from the servers hosting them (not from google)
  • program writes all logs to standard error
  • setting the h parameter rewrites HTTPS links to HTTP
  • setting the l parameter to a number limits the number of displayed links to that number
  • upstream engines sometimes respond with a CAPTCHA after repeated requests. set the environment variable SC_FALLBACK to a URL prefix (such as http://fallback.example:7327/search?) to HTTP redirect clients there in case of such upstream errors.
  • the shipped systemd unit and openrc init file load environment variables in VAR=VAL format from /etc/sear.c, if it exists.
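
The h and l parameters can be illustrated with two small hypothetical helpers. This is a sketch under assumptions, not sear.c's code; in particular, treating a missing, non-numeric or zero l as "no limit" is a guess:

```c
#define _POSIX_C_SOURCE 200809L /* for strdup(3) */
#include <stdlib.h>
#include <string.h>

/* hypothetical helper for the l parameter: how many of `available` links
   to display. assumption: missing, non-numeric or zero means no limit */
size_t effective_limit(const char *l_param, size_t available)
{
	if (!l_param || !*l_param)
		return available;
	char *end;
	unsigned long l = strtoul(l_param, &end, 10);
	if (*end != '\0' || l == 0)
		return available;
	return l < available ? (size_t) l : available;
}

/* hypothetical helper for the h parameter: rewrite an https:// link to
   http://. returns a new heap string, or NULL if allocation fails */
char *rewrite_to_http(const char *url)
{
	if (strncmp(url, "https://", 8) != 0)
		return strdup(url);                   /* not an HTTPS link: copy as-is */
	const char *rest = url + 8;               /* part after "https://" */
	char *out = malloc(7 + strlen(rest) + 1); /* "http://" + rest + NUL */
	if (!out)
		return NULL;
	memcpy(out, "http://", 7);
	strcpy(out + 7, rest);
	return out;
}
```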

configuration

configuration is done with environment variables and with build time definitions:

  • environment variable SC_PORT containing a number defines the port, 7327 by default
  • preprocessor definition SC_LOGMEM, when set, causes the program to store all logs in memory and display them via the HTTP HTML UI on /logs.html
  • environment variable SC_FALLBACK defines a URL prefix of a search engine (possibly another sear.c instance) to which clients will be HTTP redirected when the upstream engine responds with a captcha. Example: http://fallback.example:7327/search?some=param&other=param. HTTP query parameters are appended.
  • environment variable SC_LOGLEVEL overrides the build time preprocessor definition SC_LOGLEVEL, which defaults to "SC_LOG_ERROR SC_LOG_WARNING SC_LOG_INFO SC_LOG_DEBUG" (all log levels) and, as the name implies, sets the log level for both /logs.html (if enabled) and stderr logging.
  • preprocessor definition SC_OLD_STORAGE selects the old O(n) query storage mechanism instead of the new tsearch(3)-based O(log n) one. This option is deprecated, but I'll leave it in for some time in case errors (perhaps scary security issues) show up with the new implementation.
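
The tsearch(3) storage that SC_OLD_STORAGE turns off can be sketched as a binary search tree keyed by query string, freed in one call with tdestroy(3) (the GNU extension behind the glibc requirement). The cached_query struct and the cache_get/cache_free helpers are hypothetical; sear.c's real query types are different:

```c
#define _GNU_SOURCE /* for tdestroy(3), a GNU extension */
#include <search.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical cache entry; sear.c's real query structures differ */
struct cached_query {
	char *query; /* search terms, used as the tree key */
	int hits;    /* hypothetical payload */
};

static void *cache_root = NULL; /* root of the tsearch(3) tree */

static int compare_queries(const void *a, const void *b)
{
	const struct cached_query *x = a, *y = b;
	return strcmp(x->query, y->query);
}

/* find a query in O(log n), inserting a new entry if it is missing.
   returns the stored entry, or NULL on allocation failure */
struct cached_query *cache_get(const char *query)
{
	struct cached_query key = { .query = (char *) query };
	struct cached_query **found = tfind(&key, &cache_root, compare_queries);
	if (found)
		return *found;

	struct cached_query *entry = malloc(sizeof *entry);
	if (!entry)
		return NULL;
	entry->query = strdup(query);
	if (!entry->query) {
		free(entry);
		return NULL;
	}
	entry->hits = 0;
	/* tsearch(3) returns NULL only if it cannot allocate a tree node */
	if (!tsearch(entry, &cache_root, compare_queries)) {
		free(entry->query);
		free(entry);
		return NULL;
	}
	return entry;
}

static void free_entry(void *p)
{
	struct cached_query *entry = p;
	free(entry->query);
	free(entry);
}

/* free every tree node and every entry in one call */
void cache_free(void)
{
	tdestroy(cache_root, free_entry);
	cache_root = NULL;
}
```

glibc implements these functions with a balanced (red-black) tree, which is where the O(log n) lookups come from; the old mechanism scanned entries linearly.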

when the openrc init script or the systemd unit file is used, environment variables in newline separated NAME=VALUE format are read from /etc/sear.c, should that file exist.
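
Reading these environment variables in C could look like the sketch below. The 7327 default comes from the list above; falling back to it on invalid values, and inserting an & when the SC_FALLBACK prefix does not already end in ? or &, are assumptions of this sketch rather than documented sear.c behavior:

```c
#define _POSIX_C_SOURCE 200809L /* POSIX.1-2008 declarations (setenv, unsetenv) */
#include <stdlib.h>
#include <string.h>

/* read SC_PORT, using the default 7327 when it is unset or invalid */
unsigned short sc_port(void)
{
	const char *value = getenv("SC_PORT");
	if (!value || !*value)
		return 7327;
	char *end;
	long port = strtol(value, &end, 10);
	if (*end != '\0' || port < 1 || port > 65535)
		return 7327; /* assumption: invalid values fall back to the default */
	return (unsigned short) port;
}

/* build a redirect target from the SC_FALLBACK prefix and the client's
   query string. returns a heap string, or NULL if SC_FALLBACK is unset
   or allocation fails */
char *fallback_location(const char *query_string)
{
	const char *prefix = getenv("SC_FALLBACK");
	if (!prefix)
		return NULL;
	size_t plen = strlen(prefix);
	/* assumption: separate with '&' unless the prefix ends in '?' or '&' */
	int need_amp = plen > 0 && prefix[plen - 1] != '?' && prefix[plen - 1] != '&';
	char *location = malloc(plen + need_amp + strlen(query_string) + 1);
	if (!location)
		return NULL;
	strcpy(location, prefix);
	if (need_amp)
		strcat(location, "&");
	strcat(location, query_string);
	return location;
}
```

For example, with SC_FALLBACK=http://fallback.example:7327/search? and the query string q=sear.c, fallback_location returns http://fallback.example:7327/search?q=sear.c.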

prebuilt binaries

apart from the usual debian distribution, there are also prebuilt dynamically linked binaries built for amd64, arm64, i386 and armel, as well as debian packages.

before downloading, check that the build passed, indicated below on the badge:

Build Status

screenshots

(screenshots of sear.c pages in chromium; images not included)

security

  • please email me if you find any (security) issues in the program.
  • always run sear.c as an unprivileged user in a chroot (gentoo and debian distribution services do that)
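
What running as an unprivileged user in a chroot amounts to can be sketched in C as follows. This is an illustration only: the shipped services set this up from the init system, and drop_into_chroot is a hypothetical helper. The order matters, because supplementary groups and the group ID can only be dropped while the process is still root:

```c
#define _DEFAULT_SOURCE /* for chroot(2) and setgroups(2) declarations */
#include <grp.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* hypothetical helper: confine the process to `jail` and drop root
   privileges to uid/gid. must start as root. returns 0 on success, or -1
   at the first failing step (errno is left set by that call) */
int drop_into_chroot(const char *jail, uid_t uid, gid_t gid)
{
	if (chroot(jail) == -1)
		return -1;
	if (chdir("/") == -1)         /* no working directory outside the jail */
		return -1;
	if (setgroups(0, NULL) == -1) /* drop supplementary groups */
		return -1;
	if (setgid(gid) == -1)        /* group first: impossible after setuid */
		return -1;
	if (setuid(uid) == -1)        /* finally give up root */
		return -1;
	return 0;
}
```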

additional information

  • valgrind reports a memory leak that grows with every API search query; run make valgrind and you'll see it. I was unable to find the bug, but it bothers me. I wrote a small bug PoC (test/bug), but I could not replicate the bug with it (cd test/bug; make; make valgrind; less valgrind-out.txt: the process exits with no leaks possible). Example output of sear.c under valgrind with one request done is included in test/bug/example-valgrind.txt. Such a small memory leak is not a problem, since we store all data extracted from queries indefinitely anyway, but it's still pretty dumb to leak memory.
  • memory allocations are not checked for failures. This needs to be fixed before -fanalyzer can be used.
  • __attribute__s such as nonnull are not set on struct members of query types or on functions such as htmlspecialchars; instead, they check if (!arg) return NULL, which is poor coding style and prevents analysis with -fanalyzer. This needs to be fixed before -fanalyzer can be used.
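
A sketch of the direction described in the last two items: a hypothetical escaping function (sketch_htmlspecialchars, not sear.c's htmlspecialchars) that declares its argument nonnull instead of testing it at run time, and that checks its malloc result. With the attribute, gcc can warn (-Wnonnull, and the -fanalyzer null-argument checks) when a NULL argument is detected at a call site:

```c
#include <stdlib.h>
#include <string.h>

/* hypothetical escaping helper in the spirit of htmlspecialchars; not
   sear.c's actual function. returns a heap string, or NULL if malloc
   fails */
__attribute__((nonnull))
char *sketch_htmlspecialchars(const char *in)
{
	size_t len = 0;
	for (const char *p = in; *p; p++) { /* first pass: output length */
		switch (*p) {
		case '&': len += 5; break;           /* &amp; */
		case '<': case '>': len += 4; break; /* &lt; &gt; */
		case '"': len += 6; break;           /* &quot; */
		default: len += 1;
		}
	}
	char *out = malloc(len + 1);
	if (!out)
		return NULL; /* allocation failure is reported, not ignored */
	char *o = out;
	for (const char *p = in; *p; p++) { /* second pass: copy and escape */
		const char *rep = NULL;
		switch (*p) {
		case '&': rep = "&amp;"; break;
		case '<': rep = "&lt;"; break;
		case '>': rep = "&gt;"; break;
		case '"': rep = "&quot;"; break;
		}
		if (rep) {
			size_t n = strlen(rep);
			memcpy(o, rep, n);
			o += n;
		} else {
			*o++ = *p;
		}
	}
	*o = '\0';
	return out;
}
```

Keeping an if (!in) check alongside nonnull is pointless: gcc may assume the argument is non-null and optimize such a check away.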

notes

  • gentoo ebuild: openrc's start-stop-daemon lacks support for easily creating unprivileged daemons in chrooted environments with logging enabled, which sear.c absolutely requires because it is in an early, unstable alpha stage. A pull request that adds such features was submitted to openrc; until it is merged and its changes reach gentoo, sear.c's init script is unusable.