From Github

This assumes you have access to a terminal and have python and git installed on your system.

  • Clone or download the repository

    • git clone https://github.com/Crivella/ocr_translate.git

    • cd ocr_translate

  • (Optional) create and use a virtual environment

    • python -m venv venv

    • venv\Scripts\activate (or the equivalent for your OS)

  • Install the project and its dependencies - pip install .

The Github repo provides not only the Django app files, but also the already configured project files used to start the server.

Run the server

You can either use the run_server.py script that will bootstrap the server for you,

  • python run_server.py

or manually run the Django server.

  • Create/Initialize your database by running

    • python manage.py migrate

    inside your project folder.

  • Run the server:

    • With the Django development server. This is more oriented for developing than deploying, but is fine for a self-hosted single-user server accepting connections only on localhost

      • python manage.py runserver PORT

      • The suggested PORT would be 4000 as it is the one set by default in the extension

    • Nginx + Gunicorn (Linux only):

      • Check Dockerfile and run_server.py files, as this is what the provided docker image makes use of.

At least for the first time, it is suggested to run the server with the Environment variable AUTOCREATE_LANGUAGES set to “true” to automatically load the validated languages and models provided by the project.

Notes

  • Gunicorn workers will each spawn a separate instance of the loaded models, each taking its own space in the memory. This can quickly fill up the memory especially if running on GPU. Ideally set this to 1.

  • Django development server will spawn new threads for handling incoming requests (if no currently existing thread is free), which share the same memory. Running more than one worker per loaded model concurrently might slow down the actual computation and in some case also block the execution.