Ingest Files
File ingestion in surveilr imports and processes files from a file system into a structured database for monitoring and analysis. This process is called walking the filesystem. It scans directories and files, then transfers their metadata and content into an RSSD (Resource Surveillance State Database).
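To make the idea of walking a filesystem concrete, here is a minimal sketch that visits every regular file under a directory and prints one piece of metadata (its size in bytes) next to its path. It only illustrates the concept and is not how surveilr is implemented; the function name `walk_preview` is hypothetical.

```shell
# Hypothetical helper (not part of surveilr): walk a directory tree and
# print "<size in bytes><TAB><path>" for every regular file found.
walk_preview() {
  root="${1:-.}"   # default to the current working directory
  find "$root" -type f | sort | while IFS= read -r path; do
    # wc -c counts bytes; tr strips the padding some platforms add
    size=$(wc -c < "$path" | tr -d ' ')
    printf '%s\t%s\n' "$size" "$path"
  done
}

# Usage: walk_preview /my-files
```

surveilr goes further than this by also capturing file content and storing the results in the RSSD.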
Preparing for Ingestion
Before starting ingestion, check which files and directories will be processed. surveilr provides a --dry-run option that simulates the process without making any changes, so you can confirm that only the intended files and directories will be ingested into the target RSSD.
Example
# Preview files in the current working directory (CWD)
$ surveilr ingest files --dry-run

# Preview files in specific directories
$ surveilr ingest files --dry-run -r /other -r /other2
Setting Up The RSSD
surveilr uses a default SQLite database named resource-surveillance.sqlite.db for storing file system state data. However, in environments with multiple surveillance databases, it helps to distinguish each RSSD by including a unique identifier, such as the hostname, in the filename. This makes it easier to merge databases with the surveilr admin merge-sql command.
# Set a custom RSSD path with a unique identifier
$ export SURVEILR_STATEDB_FS_PATH="resource-surveillance-$(hostname).sqlite.db"

# Or set the custom path by passing it as a value to the `-d` argument
$ surveilr ingest -d "resource-surveillance-$(hostname).sqlite.db" files
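If several hosts share the same setup script, the naming convention can be kept in one place. The following is a small sketch under that assumption; `rssd_name` is a hypothetical helper, not a surveilr command, and it only builds the filename used in the examples above.

```shell
# Hypothetical helper (not part of surveilr): build a per-host RSSD filename.
# With no argument it falls back to the local hostname.
rssd_name() {
  printf 'resource-surveillance-%s.sqlite.db\n' "${1:-$(hostname)}"
}

# Export the path for the current host and show what surveilr would pick up
export SURVEILR_STATEDB_FS_PATH="$(rssd_name)"
echo "$SURVEILR_STATEDB_FS_PATH"
```

Passing an explicit identifier, as in `rssd_name web01`, yields `resource-surveillance-web01.sqlite.db`, which is handy when naming databases collected from other machines.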
Performing File Ingestions
With surveilr, you can ingest files from the current working directory or from any specified directories. This section covers the commands for these ingestions, including how to display statistics about the ingested data.
Consider the file tree below:
/my-files
├── project-a
│   ├── data.csv
│   ├── config.yml
│   └── schema.json
└── project-b
    ├── draft.docx
    └── references.puml
Examples
# Ingest files from the CWD
$ cd my-files
$ surveilr ingest files

# Ingest files from specific directories by specifying a regex combination
$ surveilr ingest files -r my-files/project*

# Ingest files from the CWD and display statistics
$ surveilr ingest files --stats