How to Access PILW.IO StorageVault with S3cmd

S3cmd is another easy-to-use command line client for accessing S3 protocol-based storage systems or backups. The tool itself is free for Linux and Mac-based systems; for Windows there is a paid version available called S3Express. Here is a quick tutorial on working with the pilw.io StorageVault using the S3cmd client.

This blog article describes the Linux-based setup (Ubuntu 18.04 LTS) and covers S3cmd only for now.

 

Installation

To install S3cmd, we have several options:

  • Download from the open-source, community-driven repository SourceForge
  • Download from GitHub
  • Or install from Linux repositories

If you choose the latter option, be aware that the version in the repositories is not the latest. It is recommended to install from one of the first two options. S3cmd requires Python to be installed, and one reason to go for the latest S3cmd version is to use Python version 3.
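If you are not sure which Python version is available on your system, you can check it before installing; a quick check, assuming python3 is already on your PATH:

~$ python3 --version

On Ubuntu 18.04 this should report a Python 3.6.x interpreter, which works fine with the latest S3cmd.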

 

SourceForge Installation

Let’s start with the SourceForge-based installation routine. I created a directory called s3cmd to download the software into and run the installation from. First, you need to download the latest version of S3cmd. The latest version available at the time this blog article was written is 2.0.2, so the command is:

~/s3cmd$ wget https://sourceforge.net/projects/s3tools/files/s3cmd/2.0.2/s3cmd-2.0.2.tar.gz

Next, extract the source archive:

~/s3cmd$ tar zxvf s3cmd-2.0.2.tar.gz

The files are extracted into the s3cmd-2.0.2 directory. We can now remove the archive, since we do not need it anymore, and change into the extracted directory:

~/s3cmd$ rm s3cmd-2.0.2.tar.gz
~/s3cmd$ cd s3cmd-2.0.2/

Once that is done, you need to install the S3cmd software:

~/s3cmd/s3cmd-2.0.2$ sudo python3 setup.py install
Using xml.etree.ElementTree for XML processing
running install
.
.
.
Finished processing dependencies for s3cmd==2.0.2

And we should be good to go with using S3cmd.
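To verify the installation, you can ask the client for its version; a quick sanity check:

~$ s3cmd --version

This should report version 2.0.2 if the installation above succeeded.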

 

GitHub Installation

For this example, I created a new directory called github_s3cmd. Just clone the latest master from GitHub:

~/github_s3cmd$ git clone git://github.com/s3tools/s3cmd

The command copies the latest master to the s3cmd directory. You need to run the installation from that directory:

~/github_s3cmd$ cd s3cmd
~/github_s3cmd/s3cmd$ sudo python3 setup.py install

And done.
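One nice side effect of the GitHub-based install is that updating later is straightforward; a minimal sketch, assuming you keep the cloned directory around:

~/github_s3cmd/s3cmd$ git pull
~/github_s3cmd/s3cmd$ sudo python3 setup.py install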

 

Linux Repository Based Installation

Ubuntu uses the Advanced Packaging Tool (apt) to manage software. Here is how s3cmd can be installed on Ubuntu or Debian systems:

~$ sudo apt-get update
~$ sudo apt-get install s3cmd

All necessary dependencies are installed during the process. At the time this blog article was written, the version of S3cmd installed from the Linux repository was 2.0.1.
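If you want to know which version the repository would give you before installing, apt can report it; a quick check on Ubuntu or Debian:

~$ apt-cache policy s3cmd

The output lists the installed and candidate versions of the package.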

 

Using S3cmd

Using S3cmd is fairly simple. We will not go through all the possible options and features; if needed, the documentation for the commands and options can be viewed with the s3cmd -h command. Instead, we show a couple of use cases here with explanations. First, we need to configure s3cmd with the following command:

~$ s3cmd --configure
...
Access Key: LY6PSWJKNZPPFAT8QVPP
Secret Key: pfJuhreDHPNrQhqCiyaGoDvWT3RmTHmuh8XLomLL
Default Region [US]:
S3 Endpoint [s3.amazonaws.com]: s3.pilw.io:8080
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: %(bucket)s.s3.pilw.io:8080
Encryption password:
Path to GPG program [/usr/bin/gpg]:
Use HTTPS protocol [Yes]: Yes
HTTP Proxy server name:

New settings:
  Access Key: LY6PSWJKNZPPFAT8QVPP
  Secret Key: pfJuhreDHPNrQhqCiyaGoDvWT3RmTHmuh8XLomLL
  Default Region: US
  S3 Endpoint: s3.pilw.io:8080
  DNS-style bucket+hostname:port template for accessing a bucket: %(bucket)s.s3.pilw.io:8080
  Encryption password:
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] y
Configuration saved to '~/.s3cfg'

For the sake of simplicity, the output of the configuration command has been shortened. For Access Key and Secret Key, use your own keys; the keys in this example are for explanation purposes only and are not valid for use. We have a blog article explaining how to get the keys. Also important is the S3 endpoint value s3.pilw.io:8080. The rest of the answers can be left at their defaults.

If you need to make changes to the configuration later, just run the command s3cmd --configure again.
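The settings are stored in a plain text file, so you can also inspect (or carefully edit) them directly; a quick look, assuming the default configuration path shown in the output above:

~$ cat ~/.s3cfg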

Once done, you can run the s3cmd ls command to test. Here is an example output:

~$ s3cmd ls
2018-08-20 04:57  s3://nextcloud-demo
2018-09-08 07:25  s3://orchesto
2018-08-15 15:25  s3://prodbucket
2018-06-15 08:01  s3://s3fsmount

If you already have some buckets, they will be listed. Otherwise, the list is empty, but you should not get any error messages if everything is properly configured.

New buckets can be created with the following command:

~$ s3cmd mb s3://s3cmd-test
Bucket 's3://s3cmd-test/' created

Now let's put some files into the newly created bucket:

~$ s3cmd put testset/*.txt s3://s3cmd-test/
upload: 'testset/read-test.txt' -> 's3://s3cmd-test/read-test.txt'  [1 of 3]
upload: 'testset/sfile_1.txt' -> 's3://s3cmd-test/sfile_1.txt'  [2 of 3]
upload: 'testset/sfile_2.txt' -> 's3://s3cmd-test/sfile_2.txt'  [3 of 3]

We can list the bucket contents like this:

 ~$ s3cmd ls s3://s3cmd-test
2018-10-06 17:44        58   s3://s3cmd-test/read-test.txt
2018-10-06 17:44      4717   s3://s3cmd-test/sfile_1.txt
2018-10-06 17:44     11700   s3://s3cmd-test/sfile_2.txt

To retrieve a file from the bucket:

~$ s3cmd get s3://s3cmd-test/read-test.txt
download: 's3://s3cmd-test/read-test.txt' -> './read-test.txt'  [1 of 1]

If you want to retrieve the file and save it locally under a different name, just add the new name at the end of the command above.
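For example, a short sketch where local-copy.txt is just an illustrative local file name:

~$ s3cmd get s3://s3cmd-test/read-test.txt local-copy.txt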

To delete a bucket, you must empty it first:

~$ s3cmd del s3://s3cmd-test/*
delete: 's3://s3cmd-test/read-test.txt'
delete: 's3://s3cmd-test/sfile_1.txt'
delete: 's3://s3cmd-test/sfile_2.txt'

~$ s3cmd rb s3://s3cmd-test/
Bucket 's3://s3cmd-test/' removed

Files can also be deleted by name, e.g. by replacing the wildcard (*) with the file name, or by matching part of the file name with a wildcard.
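For example, a minimal sketch using the object names from above, first deleting a single file and then deleting by pattern:

~$ s3cmd del s3://s3cmd-test/read-test.txt
~$ s3cmd del s3://s3cmd-test/sfile_*.txt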

 

Cool Stuff

Now that we have played with single files, s3cmd can also be used for more complicated tasks, like file syncing or even backup. This means you can upload only the files that exist in the source but not yet in the bucket, or remove files from the bucket that no longer exist in the source. In this way you can build a simple backup solution that can be automated, for example with a task scheduler. In other words, besides the simple get and put operations there is also a sync operation.

Here is the initial list of files in the source local directory:

./sfile_2.txt
./testdir1
./testdir1/sfile_1_1.txt
./testdir1/sfile_1_3.txt
./testdir1/sfile_1_2.txt
./sfile_1.txt
./testdir2
./testdir2/sfile_2_2.txt
./testdir2/sfile_2_3.txt
./testdir2/sfile_2_1.txt
./read-test.txt

Let's put some files into the bucket s3://s3cmd-sync; note that the name of the uploaded file is changed in the bucket. One directory is uploaded to the same bucket as well. To upload a directory, I need to use the --recursive flag with the command:

~/testset$ s3cmd put read-test.txt s3://s3cmd-sync/uploaded-dir/uploaded-read-test.txt
upload: 'read-test.txt' -> 's3://s3cmd-sync/uploaded-dir/uploaded-read-test.txt'  [1 of 1]
 
~/testset$ s3cmd put --recursive testdir1 s3://s3cmd-sync/uploaded-dir/
upload: 'testdir1/sfile_1_1.txt' -> 's3://s3cmd-sync/uploaded-dir/testdir1/sfile_1_1.txt'  [1 of 3]
upload: 'testdir1/sfile_1_2.txt' -> 's3://s3cmd-sync/uploaded-dir/testdir1/sfile_1_2.txt'  [2 of 3]
upload: 'testdir1/sfile_1_3.txt' -> 's3://s3cmd-sync/uploaded-dir/testdir1/sfile_1_3.txt'  [3 of 3]

As you can see, the name of the uploaded directory was not changed, but it was given a new location in the tree. Now, let's run the sync command:

~/testset$ s3cmd sync ./ s3://s3cmd-sync/uploaded-dir/
upload: './sfile_1.txt' -> 's3://s3cmd-sync/uploaded-dir/sfile_1.txt'  [1 of 5]
upload: './sfile_2.txt' -> 's3://s3cmd-sync/uploaded-dir/sfile_2.txt'  [2 of 5]
upload: './testdir2/sfile_2_1.txt' -> 's3://s3cmd-sync/uploaded-dir/testdir2/sfile_2_1.txt'  [3 of 5]
upload: './testdir2/sfile_2_2.txt' -> 's3://s3cmd-sync/uploaded-dir/testdir2/sfile_2_2.txt'  [4 of 5]
upload: './testdir2/sfile_2_3.txt' -> 's3://s3cmd-sync/uploaded-dir/testdir2/sfile_2_3.txt'  [5 of 5]
remote copy: 'uploaded-read-test.txt' -> 'read-test.txt'
Done. Uploaded 38850 bytes in 1.0 seconds, 37.94 kB/s.

As seen, only the files that had not been uploaded yet were synchronised to the bucket s3cmd-sync/uploaded-dir. S3cmd checks file sizes and checksums. To test it, make changes to the read-test.txt file and run the sync command again. Another neat feature is the --dry-run option, for when you are not sure what will be deleted or added. It prints all the changes without actually executing them.

~/testset$ ls -l read-test.txt
-rw-rw-r--  1 user user    58 Sep 30 15:53 read-test.txt
~/testset$ echo "this is a test" >> read-test.txt

~/testset$ ls -l read-test.txt
-rw-rw-r-- 1 user user 73 Oct  7 04:29 read-test.txt

~/testset$ rm sfile_1.txt

~/testset$ s3cmd sync --dry-run --delete-removed ./ s3://s3cmd-sync/uploaded-dir/
delete: 's3://s3cmd-sync/uploaded-dir/sfile_1.txt'
upload: './read-test.txt' -> 's3://s3cmd-sync/uploaded-dir/read-test.txt'
WARNING: Exiting now because of --dry-run

The result shows that read-test.txt will be uploaded to the bucket. There are also a couple of other interesting options:

  • --skip-existing – does not check files that are already present; no file checksums will be tested.
  • --delete-removed – deletes files from the bucket that no longer exist locally.

Sometimes there are files which you do not want to transfer at all, like temporary files, hidden files, etc. For that, there are a few options to exclude or include files:

  • --exclude / --include – describe files or directories to exclude from or include in the transfer to the remote S3 bucket. Standard shell wildcards work as well, for example *.jpg.
  • --rexclude / --rinclude – the excluded or included files can be defined with regular expression patterns.
  • --exclude-from / --include-from – while the options above take the patterns on the command line, the exclude and include patterns can also be listed in a file that is passed as an argument to these options. The file is a standard text file and can contain multiple lines describing which files will and will not be transferred. An example of such a file is shown below.

Assuming you have a file with all exclusion patterns passed via --exclude-from, but you would still like one particular file to be uploaded to S3, the --include or --rinclude option can be added to the command line with the file name or pattern that must be uploaded despite the exclusion list.
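As a quick illustration of the regular expression variant, here is a hedged sketch (kept safe with --dry-run) that excludes everything ending in .tmp via --rexclude; the directory and bucket names are the same ones used in the example below:

~$ s3cmd sync --dry-run --rexclude '\.tmp$' testset s3://s3cmd-sync/backup-demo/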

As a use case example, a set of various files was created. The test file list is:

./sfile_1.txt
./sfile_2.txt
./sfile_3.txt
./sfile_1.tmp
./sfile_2.tmp
./sfile_1.log
./read-test.txt
./testdir1
./testdir1/sfile_1_1.tmp
./testdir1/sfile_1_2.tmp
./testdir1/sfile_1_1.txt
./testdir1/sfile_1_2.txt
./testdir1/sfile_1_3.txt
./testdir1/sfile_1_1.log
./testdir2
./testdir2/sfile_2_1.txt
./testdir2/sfile_2_2.txt
./testdir2/sfile_2_3.txt
./testdir2/sfile_2_1.tmp
./testdir2/sfile_2_2.tmp
./testdir2/sfile_2_1.log

Here are the contents of the exclude-from file, named s3cmd_exclude.lst:

# These pattern matches will be excluded from my daily backup
*.log
*.tmp

And here is the command (with --dry-run enabled) to run the backup:

~$ s3cmd sync --dry-run --exclude-from s3cmd_exclude.lst --include 'sfile_2_*.log' testset s3://s3cmd-sync/backup-demo/
exclude: testset/sfile_1.log
exclude: testset/sfile_1.tmp
exclude: testset/sfile_2.tmp
exclude: testset/testdir1/sfile_1_1.log
exclude: testset/testdir1/sfile_1_1.tmp
exclude: testset/testdir1/sfile_1_2.tmp
exclude: testset/testdir2/sfile_2_1.tmp
exclude: testset/testdir2/sfile_2_2.tmp
upload: 'testset/read-test.txt' -> 's3://s3cmd-sync/backup-demo/testset/read-test.txt'
upload: 'testset/sfile_1.txt' -> 's3://s3cmd-sync/backup-demo/testset/sfile_1.txt'
upload: 'testset/sfile_2.txt' -> 's3://s3cmd-sync/backup-demo/testset/sfile_2.txt'
upload: 'testset/sfile_3.txt' -> 's3://s3cmd-sync/backup-demo/testset/sfile_3.txt'
upload: 'testset/testdir1/sfile_1_1.txt' -> 's3://s3cmd-sync/backup-demo/testset/testdir1/sfile_1_1.txt'
upload: 'testset/testdir1/sfile_1_2.txt' -> 's3://s3cmd-sync/backup-demo/testset/testdir1/sfile_1_2.txt'
upload: 'testset/testdir1/sfile_1_3.txt' -> 's3://s3cmd-sync/backup-demo/testset/testdir1/sfile_1_3.txt'
upload: 'testset/testdir2/sfile_2_1.log' -> 's3://s3cmd-sync/backup-demo/testset/testdir2/sfile_2_1.log'
upload: 'testset/testdir2/sfile_2_1.txt' -> 's3://s3cmd-sync/backup-demo/testset/testdir2/sfile_2_1.txt'
upload: 'testset/testdir2/sfile_2_2.txt' -> 's3://s3cmd-sync/backup-demo/testset/testdir2/sfile_2_2.txt'
upload: 'testset/testdir2/sfile_2_3.txt' -> 's3://s3cmd-sync/backup-demo/testset/testdir2/sfile_2_3.txt'
WARNING: Exiting now because of --dry-run

Just to explain: all .log and .tmp files will not be uploaded to the bucket. However, since --include 'sfile_2_*.log' was given on the command line, 'testset/testdir2/sfile_2_1.log' matched that shell-like pattern, meaning the file will still be uploaded.

This way you pretty much get low-cost backup software. You'd still need some kind of scheduler to automate the backups and you are set. It is still not comparable to the functionality of actual backup software, but for simple backup routines it might work. How cool is that?
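For example, a minimal cron-based sketch; the paths, bucket name and schedule here are assumptions for illustration only:

# run the sync every night at 02:00; add this line with 'crontab -e'
0 2 * * * s3cmd sync --delete-removed --exclude-from /home/user/s3cmd_exclude.lst /home/user/testset s3://s3cmd-sync/backup-demo/ >> /home/user/s3cmd-backup.log 2>&1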

 

 
