Housekeeping for JFrog Artifactory

July 15, 2019

In information technology, housekeeping refers to the standard routines that clean up a computer system after use, for example by freeing resources such as disk space. Typical activities are removing or archiving the logs that the system wrote as a result of the user's activities, or deleting temporary files that would otherwise take up space.

Starting Point

Housekeeping can be described as a necessary chore: it is required to keep a system's regular activity going, but it is not part of the system's main functionality.

JFrog Artifactory's main functionality is to act as a universal repository manager. Software developers produce Java artifacts and store them in Artifactory to provide the newest features and releases during the software development process. Snapshots (increments) and releases are published regularly.

Most of the time, when engineers set up a new component in their development process, they neglect the necessary housekeeping. Negligence is expensive.

While analyzing the situation of one of our customers, we discovered that this was the case with their JFrog Artifactory, run as a cloud service on Amazon Web Services (AWS). As a bonus, we helped our customer to reduce the used system resources (disk space) by 80% and the costs by 50%.

Previous Situation

The software development had produced a lot of snapshots and releases from the very beginning. Most of these artifacts were obsolete or outdated. Overall, the disk usage was around 300 to 350 GB, and the monthly invoice was around 570 $ with a rising tendency. See below a storage summary of Artifactory Cloud.

[Figure: Storage Summary]

Initial Housekeeping

JFrog Artifactory provides several possibilities to perform housekeeping. We used the REST API in the first step and the JFrog CLI in the second.

REST API

Before we start, we check that we can access the Artifactory service. We use example values since we have a non-disclosure agreement with our customer.

$ export ARTIFACTORY_URL=https://mimacom.jfrog.io/mimacom
$ curl -u "curator-user:basic-auth-password" "$ARTIFACTORY_URL/api/system/ping"
OK

The example above uses the system health check, and Artifactory responds with OK.

After the check, we can find out which artifacts have not been downloaded since 2019-03-31. The notUsedSince parameter expects a timestamp in milliseconds; a converter such as currentmillis provides one.

curl -XGET "$ARTIFACTORY_URL/api/search/usage?notUsedSince=1553986800000&repos=libs-snapshots-local" \
-u curator-user > not-downloaded.txt

The output is saved in not-downloaded.txt. See below an example entry:

{
  "results" : [ {
    "uri" : "https://mimacom.jfrog.io/mimacom/api/storage/libs-snapshots-local/ch/mimacom/components/components-persistence/1.0.0-SNAPSHOT/components-persistence-1.0.0-20140410.125449-1-sources.jar",
    "downloadCount" : 0,
    "lastDownloaded" : "2014-04-10T12:54:52.353Z",
    "remoteDownloadCount" : 0,
    "remoteLastDownloaded" : "1970-01-01T00:00:00.000Z"
  }, ... ]
}
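The millisecond timestamp for notUsedSince can also be computed on the command line instead of with an online converter. The following is a minimal sketch, not part of any JFrog tool: date_to_millis is a hypothetical helper name, and the function assumes either GNU date (most Linux distributions) or BSD date (macOS) and interprets the given day as midnight UTC.

```shell
# Convert a calendar date (YYYY-MM-DD, taken as midnight UTC) into epoch
# milliseconds, the unit expected by the notUsedSince parameter.
# date_to_millis is a hypothetical helper name used for illustration only.
date_to_millis() (
  day="$1"
  # GNU date understands -d; if that fails we assume BSD date (macOS),
  # which needs -j (do not set the clock) and an explicit input format.
  if ! seconds=$(date -u -d "$day 00:00:00" +%s 2>/dev/null); then
    seconds=$(date -j -u -f "%Y-%m-%d %H:%M:%S" "$day 00:00:00" +%s)
  fi
  echo "$((seconds * 1000))"
)
```

Calling date_to_millis 2019-03-31 prints 1553990400000. The value used in the search call above, 1553986800000, is one hour earlier, which corresponds to midnight in Central European Time rather than UTC.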

The REST API provides a delete call for a single artifact. In order to delete all of them, you need to parse the output and invoke a delete for each artifact. This is far too inconvenient. Fortunately, the JFrog CLI solves that problem.
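To illustrate why the REST-only route is cumbersome, here is a rough sketch of the parsing step. It only prints the delete commands and executes nothing. The helper name print_delete_commands is hypothetical, and the sketch assumes that an artifact is deleted with an HTTP DELETE on its own URL, that is, the uri from the search result without the /api/storage segment, and that every uri field sits on its own line as in the output above.

```shell
# Read an api/search/usage response on stdin and print one curl DELETE
# command per artifact. Nothing is deleted; the commands are only printed.
# print_delete_commands is a hypothetical helper; curator-user is the
# example account used in this article.
print_delete_commands() {
  # Step 1: extract the value of every "uri" field (one per line).
  # Step 2: drop the /api/storage segment to get the artifact's own URL
  #         (assumed to be the address the delete call expects).
  sed -n 's/.*"uri"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
    sed 's#/api/storage/#/#' |
    while IFS= read -r url; do
      printf 'curl -u curator-user -X DELETE "%s"\n' "$url"
    done
}
```

A possible use is print_delete_commands < not-downloaded.txt > delete-commands.sh, followed by a manual review of the generated file before anything is run. With thousands of artifacts this means thousands of individual requests, which is the inconvenience the JFrog CLI removes.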

JFrog CLI

The JFrog CLI is a command-line application written in the programming language Go.

Installation on macOS

brew install jfrog-cli-go

Installation on Linux

npm install -g jfrog-cli-go

On the first run, the application asks for default settings.

To avoid this message in the future, set the JFROG_CLI_OFFER_CONFIG environment variable to false.
The CLI commands require the Artifactory URL and authentication details
Configuring JFrog CLI with these parameters now will save you having to include them as command options.
You can also configure these parameters later using the 'config' command.
Configure now? (y/n): y
Artifactory server ID: https://mimacom.jfrog.io/mimacom
Artifactory URL [https://mimacom.jfrog.io/mimacom]: https://mimacom.jfrog.io/mimacom
Access token (Leave blank for username and password/API key):
User [curator-user]:

We can now use the Artifactory Query Language (AQL) to retrieve all items that are relevant for housekeeping. Our final query covers the relevant repositories and is stored in the file query-all.json. All artifacts older than three months are going to be deleted.

{
    "files": [{
        "aql": {
            "items.find": {
                "$or": [{
                    "$and": [{
                        "repo": "libs-snapshots-local",
                        "created": {
                            "$before": "3mo"
                        }
                    }],
                    "$and": [{
                        "repo": "dockerv2-local",
                        "created": {
                            "$before": "3mo"
                        }
                    }],
                    "$and": [{
                        "repo": "libs-releases-local",
                        "created": {
                            "$before": "3mo"
                        }
                    }]
                }]
            }
        }
    }]
}

Note that every element has its own and conjunction inside the outer or conjunction of the query. This is necessary; otherwise you will get false results.
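The query above repeats the "$and" key inside a single JSON object. Generic JSON tools usually keep only the last value of a repeated key, so the file can lose two of its three repositories if it is ever reformatted or validated by such a tool. The following sketch generates a spec with the same structure but with every repository in its own object inside the "$or" array. The helper name build_cleanup_spec is hypothetical, and we have not run this variant against the customer's instance.

```shell
# Print a file spec in which every repository gets its own {"$and": [...]}
# object inside the "$or" array, so no key is repeated within one object.
# build_cleanup_spec is a hypothetical helper: the first argument is the
# age threshold (for example 3mo), the remaining ones are repository names.
build_cleanup_spec() (
  age="$1"
  shift
  printf '{"files":[{"aql":{"items.find":{"$or":['
  sep=''
  for repo in "$@"; do
    printf '%s{"$and":[{"repo":"%s","created":{"$before":"%s"}}]}' "$sep" "$repo" "$age"
    sep=','
  done
  printf ']}}}]}\n'
)
```

For example, build_cleanup_spec 3mo libs-snapshots-local dockerv2-local libs-releases-local > query-all.json writes a compact, single-line spec for the three repositories used in this article.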

The AQL query acts as input for the delete command. You can test the delete command with the --dry-run option. The rt in the command is the CLI's shortcut for the product JFrog Artifactory.

$ jfrog rt delete --spec=query-all.json --dry-run
[Info] Searching artifacts...
[Info] Found 8760 artifacts.
Are you sure you want to delete the above paths? (y/n): y
...
[Info] [Dry run] Deleting: libs-snapshots-local/org/springframework/data/spring-data-elasticsearch/3.0.0.BUILD-SNAPSHOT/maven-metadata.xml
[Info] [Dry run] Deleting: libs-snapshots-local/org/springframework/data/spring-data-elasticsearch/3.0.0.BUILD-SNAPSHOT/spring-data-elasticsearch-3.0.0.BUILD-SNAPSHOT-DATAES-363-sources.jar
[Info] [Dry run] Deleting: libs-snapshots-local/org/springframework/data/spring-data-elasticsearch/3.0.0.BUILD-SNAPSHOT/spring-data-elasticsearch-3.0.0.BUILD-SNAPSHOT-DATAES-363.jar
[Info] [Dry run] Deleting: libs-snapshots-local/org/springframework/data/spring-data-elasticsearch/3.0.0.BUILD-SNAPSHOT/spring-data-elasticsearch-3.0.0.BUILD-SNAPSHOT-DATAES-363.pom
[Info] [Dry run] Deleting: libs-snapshots-local/org/springframework/data/spring-data-elasticsearch/3.0.0.BUILD-SNAPSHOT/spring-data-elasticsearch-3.0.0.BUILD-SNAPSHOT.jar
[Info] [Dry run] Deleting: libs-snapshots-local/org/springframework/data/spring-data-elasticsearch/3.0.0.BUILD-SNAPSHOT/spring-data-elasticsearch-3.0.0.BUILD-SNAPSHOT.pom
{
  "status": "failure",
  "totals": {
    "success": 0,
    "failure": 8760
  }
}

Running the delete command without the --dry-run option either puts the artifacts into the trash bin or deletes them, depending on the settings. Artifacts in the trash bin get deleted after 14 days (default setting). If you disable the trash bin, the artifacts are deleted immediately.
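Because a real delete is hard to undo once the trash bin is emptied or disabled, it can help to make the dry run the default. The wrapper below is a minimal sketch under stated assumptions: run_cleanup, its --really switch and the JFROG_BIN variable are hypothetical names, and the --quiet option is assumed to skip the interactive confirmation prompt of the delete command.

```shell
# Run the delete as a dry run unless the second argument is "--really".
# run_cleanup and the --really switch are hypothetical; JFROG_BIN lets you
# point at another binary (or at echo, to preview the command line).
run_cleanup() (
  spec="$1"
  mode="${2:---dry-run}"
  if [ "$mode" = "--really" ]; then
    # Assumption: --quiet skips the "Are you sure" confirmation prompt.
    "${JFROG_BIN:-jfrog}" rt delete --spec="$spec" --quiet
  else
    "${JFROG_BIN:-jfrog}" rt delete --spec="$spec" --dry-run
  fi
)
```

With this in place, run_cleanup query-all.json only reports what would be deleted, while run_cleanup query-all.json --really performs the deletion without asking, which is the form a scheduled job would need.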

After Housekeeping

Artifactory Cloud as Software as a Service has many advantages. To check the costs, use the pricing calculator.

The calculation before housekeeping:

[Figure: Cost Calculator before Housekeeping]

The calculation after housekeeping:

[Figure: New Cost Calculation]

Summary

The following picture illustrates the data usage over time with the housekeeping in place.

[Figure: Data Usage]

We bundled the JFrog CLI in a Docker container with a crontab entry that performs a daily cleanup. With the new automated procedure in place, we helped our customer to keep the costs low. We reduced the average daily storage from 350 GB to 60 GB (more than 80% less) and nearly halved the monthly bill from 570 $ to 290 $. Housekeeping is an essential duty, one that can make the difference between an outstanding and an inexperienced consultant. Your customer deserves a consultant who anticipates the game plan in the long run.

About the author: Vinh Nguyên

Loves to code, hike and mostly drink black coffee. Favors Apache Kafka, Elasticsearch, Java Development and 80's music.
