Michael Scheiwiller
Published on

Working with cronjobs

Authors
  • avatar
    Name
    Michael Scheiwiller
    Twitter

Cronjobs are a way to schedule any jobs on your machine/server: run a script, remove files, copy data, make backups. It is a very convenient way to actually automate regular and tedious work. As explained motivation to do so, was the jobscraper of NextGenEnergyJobs.com.

I first thought this is going to be straight forward: I mean we have llm's these days. But i actually ended up running in troubles with the setup and then always postboning this. This is why i wanted to write it down so i get it right from the beginning the next time and maybe it is helpful for others as well.

So first things first: open the crontab file (on your server or unix machine with)

crontab -e

which opens the crontab file in edit mode in your defined texteditor (vim for me). The content then looks like this:

# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').
#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h  dom mon dow   command

So the content itself actually already tells us a lot about cron and cronjobs: The first 5 entries of the line are whitespace separated integers which indicate the

  • minutes
  • hour
  • day of month
  • month
  • and day of week

For any of these entries you also can use '*' for 'any'. After these schedule indicators come the actual command. What i first now was aware of, is that cronjobs run with a limited environment, and the PATH variable might not include the directories needed for e.g. python or even echo. What i first came up with is this line to change into the scraper directory, activate the virtual environement and start the scraper script. Since i had issues getting this to run i then did the following to test (as said cronjobs run in a limited environment, so its better to write the full paths):

* * * * * /bin/echo "Successful cronjob" >> /home/scraperuser/cronjobtest.log

This runs every minute and appends "Successful cronjob" to the specified cronjobtest.log file. With this running i then realised i need to call the bash shell specifically. So i the final command for the scraper was:

0 9 * * * /bin/bash -c 'cd /home/scraperuser/projects/best-jobscraper-in-the-world && source .venv/bin/activate && python3 src/main.py' >> /home/scraperuser/jobscraper.log 2>&1

This cronjob runs everyday at 09:00 am UTC and executes from the bash shell the command in the quotes' and appends it to the jobscraper.log file.