Ingest Files
File ingestion in surveilr imports and processes files from a file system into a structured database for monitoring and analysis. This process is called walking the filesystem.
In essence, it involves scanning directories and files, then transferring their metadata and content into an RSSD.
Preparing for Ingestion
Before initiating the ingestion process, it’s crucial to understand what files and directories will be processed. surveilr provides a powerful feature called --dry-run to simulate this process without making any changes. This step is essential for ensuring that only the desired files and directories are ingested into the target RSSD.
Example
# Preview files in the current working directory (CWD)$ surveilr ingest files --dry-run
# Preview files in specific directories$ surveilr ingest files --dry-run -r /other -r /other2Setting Up The RSSD
surveilr uses a default SQLite database named resource-surveillance.sqlite.db for storing file system state data. However, in environments with multiple surveillance databases, it’s beneficial to distinguish each RSSD by including unique identifiers in the filename, such as the hostname. This setup facilitates the merging of databases with the surveilr admin merge-sql command.
# Setting a custom RSSD path with a unique identifier$ export SURVEILR_STATEDB_FS_PATH="resource-surveillance-$(hostname).sqlite.db"
# or# Set the custom path by passing it as a value to the `-d` argument$ surveilr ingest -d "resource-surveillance-$(hostname).sqlite.db" filesPerforming File Ingestions
With surveilr, you can easily ingest files from the current working directory or any specified directories. This section covers the commands to perform these ingestions, including how to display statistics about the ingested data.
For a file tree represented below:
/my-files├── project-a│   ├── data.csv│   └── config.yml|   └── schema.json├── project-b│   ├── draft.docx│   └── references.pumlExamples
# Ingest files from the CWD$ cd my-files$ surveilr ingest files
# Ingest files from specific directories by specifying a regex combination$ surveilr ingest files -r my-files/project*
# Ingest files from the CWD and display statistics$ surveilr ingest files --stats