Interacting with Resources#

In Coscine, resources store all of your data and metadata. They are therefore a key data structure that you will interact with frequently.

The Resource object#

The Coscine Python SDK abstracts Coscine resources with a custom Resource datatype. An instance of type coscine.Resource has the following properties:

| Property | Datatype | Modifiable | Description |
|---|---|---|---|
| access_url | str | | The URL to request access to the Resource |
| archived | bool | | Set the Resource to archived |
| application_profile | ApplicationProfile | | The resource metadata application profile |
| created | date | | Timestamp when the resource was created |
| creator | str | | Coscine user ID of the resource creator |
| description | str | | Resource description |
| disciplines | list[Discipline] | | List of scientific disciplines |
| display_name | str | | Short resource name as displayed in the web interface |
| fixed_values | dict | | Resource default metadata |
| id | str | | Coscine Resource ID |
| keywords | list[str] | | List of related keywords |
| license | License | | Resource data license |
| name | str | | Full resource name |
| options | ResourceTypeOptions | | Settings regarding resource type |
| pid | str | | Persistent Identifier for the resource |
| quota | ResourceQuota | | Resource quota settings |
| type | ResourceType | | Resource type information (e.g. for RDS-S3) |
| url | str | | Project URL as in the web browser |
| usage_rights | str | | Metadata rightholders |
| visibility | Visibility | | Resource metadata visibility setting |

Properties can be accessed via the usual dot notation, e.g. print(resource.name). Certain properties can be modified (✔) locally and then updated remotely by calling Resource.update().
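
The modify-then-update flow can be sketched as follows. This is a minimal sketch, assuming that `description` is one of the modifiable properties; the helper name `set_description` is hypothetical and not part of the SDK.

```python
def set_description(resource, text: str) -> None:
    """Change the description locally, then push the change to Coscine."""
    # Assumption: `description` is a modifiable property.
    resource.description = text  # local change only, nothing is sent yet
    resource.update()            # sends the modified properties to Coscine


# Usage with a real resource (requires a valid API token):
# resource = client.project("My Project").resource("My Resource")
# set_description(resource, "Raw measurement data")
```

Until update() is called, the change exists only in the local Python object.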

Accessing resources#

The following snippet fetches the list of Coscine resources in a project and prints the full name of each one.

import coscine

client = coscine.ApiClient("My Coscine API Token")
project = client.project("My Project")
resources = project.resources()
for resource in resources:
    print(resource.name)

The list of resources can be filtered by certain unique resource properties. The following snippet returns exactly one resource, which saves us from going through the list manually when we want to access a specific resource.

try:
    resource = project.resource("My Resource Display Name")
    print(resource.name)
except coscine.TooManyResults:
    print("Found more than 1 resource with the same property!")
except coscine.NotFoundError:
    print("Failed to find a resource via the property!")

We can filter resources according to these properties:

  • project.resource("Display Name", Resource.display_name)

  • project.resource("Full Name", Resource.name)

  • project.resource("Resource ID", Resource.id)

  • project.resource("Persistent Identifier", Resource.pid)

  • project.resource("Web URL", Resource.url)

Other properties are rejected!
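
The lookup semantics (exactly one match, otherwise an error) can be mimicked locally over the list returned by project.resources(). This sketches the behaviour and is not the SDK's implementation. `pick_unique` is a hypothetical helper, and it raises the built-in LookupError instead of coscine.TooManyResults or coscine.NotFoundError so that it runs without a Coscine connection.

```python
from types import SimpleNamespace


def pick_unique(resources, attribute: str, value):
    """Return the single resource whose `attribute` equals `value`."""
    matches = [r for r in resources if getattr(r, attribute, None) == value]
    if not matches:
        raise LookupError(f"no resource with {attribute}={value!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} resources share {attribute}={value!r}")
    return matches[0]


# Stand-in objects instead of real coscine.Resource instances
demo = [
    SimpleNamespace(id="a1", display_name="Raw"),
    SimpleNamespace(id="b2", display_name="Raw"),
]
print(pick_unique(demo, "id", "b2").display_name)
```

Looking up by display name in this demo list would fail, because two entries share the same value. This is why an ID or persistent identifier is the safer key.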

Creating a new resource#

Resources can be created via a simple function call. The new resource object is returned immediately after the resource has been created.

import coscine

client = coscine.ApiClient("My Coscine API Token")
project = client.project("My Project")
resource = project.create_resource(
    "Full Resource Name",
    "Display Name",
    "Resource Description.",
    client.license("MIT License"),
    client.visibility("Project Members"),
    [client.discipline("Computer Science 409")],
    client.resource_type("linked"),
    0, # Quota
    client.application_profiles()[0] # The first available application profile (placeholder choice)
)
print(resource)

Deleting a resource#

If we create resources with the Coscine Python SDK, we may also need to delete some of them.

import coscine
client = coscine.ApiClient("My Coscine API Token")
resource = client.project("Some Project").resource("Some Resource I don't like")
try:
    resource.delete()
except coscine.CoscineException:
    print("Something went wrong. Maybe we are not authorized to delete that resource. :(")

Be aware that this function may fail due to insufficient privileges. Only project owners can delete a resource; members cannot! After calling delete(), you should no longer use the resource object in Python, because the resource no longer exists in Coscine (although the local object can still be interacted with).

Downloading a resource#

Downloading a resource and all of the files inside of that resource is easy with the Resource.download() method. It accepts an optional path where it should save the resource to. If no path is specified, the current local directory is chosen as path.

import coscine

client = coscine.ApiClient("My Coscine API Token")
resource = client.project("My Project").resource("Fancy resource - please don't steal")
resource.download(path="./")  # This is my resource now hehehe
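
When downloading into a dedicated folder, it can help to create that folder first. The sketch below wraps Resource.download() accordingly. `download_into` is a hypothetical helper, and it assumes (without confirmation from the SDK) that download() does not create missing directories on its own.

```python
import os
from pathlib import Path


def download_into(resource, target_dir: str) -> Path:
    """Create `target_dir` if needed and download the resource into it."""
    target = Path(target_dir)
    # Assumption: download() does not create missing directories itself.
    target.mkdir(parents=True, exist_ok=True)
    # Keep a trailing separator, mirroring the "./" form used above.
    resource.download(path=os.path.join(str(target), ""))
    return target


# Usage with a real resource (requires a valid API token):
# download_into(resource, "./downloads/my-resource")
```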

Accessing resource type specific settings#

Resources of type RDS-S3 have special options that we can use to directly access the underlying storage medium via different means.

import coscine

client = coscine.ApiClient("My Coscine API Token")
resource = client.project("My Project").resource("My resource")
print(resource.options.access_key_write)
print(resource.options.secret_key_write)
print(resource.options.access_key_read)
print(resource.options.secret_key_read)
print(resource.options.bucket_name)
print(resource.options.endpoint)
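
These options can be handed to any S3 client. The sketch below maps them onto the keyword arguments accepted by boto3.client("s3", ...). `s3_client_kwargs` is a hypothetical helper, the demo values are made up, and it is an assumption that the endpoint is given as a full URL including the scheme.

```python
from types import SimpleNamespace


def s3_client_kwargs(options, write: bool = False) -> dict:
    """Map Coscine RDS-S3 options onto boto3-style S3 client arguments."""
    access = options.access_key_write if write else options.access_key_read
    secret = options.secret_key_write if write else options.secret_key_read
    return {
        "endpoint_url": options.endpoint,  # assumed to be a full URL incl. scheme
        "aws_access_key_id": access,
        "aws_secret_access_key": secret,
    }


# Stand-in for resource.options; all values are made up
demo_options = SimpleNamespace(
    access_key_read="READ", secret_key_read="r-secret",
    access_key_write="WRITE", secret_key_write="w-secret",
    bucket_name="demo-bucket", endpoint="https://s3.example.org",
)
kwargs = s3_client_kwargs(demo_options)
# With boto3 installed:
#   s3 = boto3.client("s3", **kwargs)
#   s3.list_objects_v2(Bucket=demo_options.bucket_name)
```

Using the read-only key pair by default limits the damage a script can do if it misbehaves.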

Interacting with files#

Fetching files from a resource can be done in various ways.

import coscine

client = coscine.ApiClient("My Coscine API Token")
resource = client.project("My Project").resource("My resource")
for file in resource.files():
    print(file.path)

This will fetch all files in the root “directory” from the resource. Certain resource types allow for nested directories. To fetch all the files in these directories as well, we need to recurse through them:

import coscine

client = coscine.ApiClient("My Coscine API Token")
resource = client.project("My Project").resource("My resource")
for file in resource.files(recursive=True):
    print(file.path)

We can also gather all files for a specific directory:

import coscine

client = coscine.ApiClient("My Coscine API Token")
resource = client.project("My Project").resource("My resource")
for file in resource.files(path="Only this directory/"):
    print(file.path)

Files can be fetched together with their metadata by calling resource.files(with_metadata=True). This results in at least two API calls: one for the files and one for the metadata. It is nevertheless cheaper and preferable if we already know that we will work with a lot of metadata later on. If we fetch files without their metadata and access it later, every call to the FileObject.metadata() method results in a separate API call. For 100 files that is 100 extra API calls instead of just 2. Using resource.files(with_metadata=True) is therefore recommended in that case.
Without this parameter, fetched files have no metadata attached until we explicitly call the metadata method of the respective file. If we know that we do not need the metadata at all, we can fetch the files much more quickly by skipping the metadata API call, which is why it is an optional function parameter. The metadata endpoint of Coscine is unfortunately very slow, so this really makes a difference!
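
The recommended pattern can be sketched as a small helper that fetches files and metadata together and returns them keyed by path. `collect_metadata` is a hypothetical name; the calls it makes are the ones named in this section.

```python
def collect_metadata(resource) -> dict:
    """Return {file path: metadata}, fetching the metadata in bulk."""
    # with_metadata=True lets the SDK fetch metadata alongside the file list,
    # so the metadata() calls below should not need one request per file.
    files = resource.files(with_metadata=True)
    return {file.path: file.metadata() for file in files}


# Usage with a real resource (requires a valid API token):
# for path, metadata in collect_metadata(resource).items():
#     print(path, metadata)
```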

Uploading is relatively straightforward via the Resource.upload() method:

import coscine

client = coscine.ApiClient("My Coscine API Token")
resource = client.project("My Project").resource("My resource")
metadata = resource.metadata_form()
metadata["Title"] = "Bla bla bla"
metadata["Creator"] = "Me"
with open("Local file path on harddrive", "rb") as fp:
    resource.upload("My file as it would appear in Coscine.txt", fp, metadata)
# Alternative, i.e. for Linked Data resources:
resource.upload("My file as it would appear in Coscine.txt", "Linked Data Content as string or bytes", metadata)

For uploading via a file handle, the file must be opened in binary mode!

One could also use the Resource.stream() method for uploading without metadata if that is supported by the respective resource type (only S3).
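
Uploading several local files follows the same pattern in a loop. The sketch below uploads every regular file directly inside a local directory with Resource.upload(). `upload_directory` is a hypothetical helper. It reuses one metadata object for all files, which is a simplification, since real metadata usually differs per file.

```python
from pathlib import Path


def upload_directory(resource, local_dir: str, metadata) -> list:
    """Upload each regular file directly inside `local_dir`; return the names used."""
    uploaded = []
    for local_file in sorted(Path(local_dir).iterdir()):
        if not local_file.is_file():
            continue  # subdirectories are skipped in this simple sketch
        # Binary mode is required when uploading via a file handle
        with open(local_file, "rb") as fp:
            resource.upload(local_file.name, fp, metadata)
        uploaded.append(local_file.name)
    return uploaded


# Usage with a real resource (requires a valid API token):
# metadata = resource.metadata_form()
# metadata["Title"] = "Batch upload"
# upload_directory(resource, "./measurements", metadata)
```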

For downloading:

import coscine

client = coscine.ApiClient("My Coscine API Token")
resource = client.project("My Project").resource("My resource")
file = resource.file("My file.txt")
file.download()

If use_native is set to True on initialization of the ApiClient, uploads to and downloads from S3 resources go through a dedicated S3 client by default. This should be faster and more stable, and it should allow for larger file transfers than going via Coscine. Disable the use_native setting if you would like to always use Coscine or if the S3 connection results in errors. The setting can also be modified on the fly via client.native = True/False.
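
Switching the setting on the fly makes a simple fallback possible: try the native S3 transfer first and retry via Coscine if it fails. This is a sketch under assumptions. `download_with_fallback` is a hypothetical helper, it relies on the file object honouring client.native at download time, and it catches Exception broadly because the concrete error types raised by the S3 path are not known here.

```python
def download_with_fallback(client, file) -> bool:
    """Try the native S3 transfer first; retry via Coscine if it fails.

    Returns True if the native transfer succeeded, False if the fallback ran.
    """
    client.native = True
    try:
        file.download()
        return True
    except Exception:  # broad on purpose: concrete S3 error types are not assumed
        client.native = False
        file.download()  # a failure here propagates to the caller
        return False


# Usage with real objects (requires a valid API token):
# file = resource.file("My file.txt")
# used_native = download_with_fallback(client, file)
```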

RDF and SPARQL#

Behind the scenes the Coscine Python SDK is using rdflib to interact with metadata, application profiles and so on. It exposes some functionality to its users so that you can directly interact with the semantic components too.

You can run SPARQL queries directly on file metadata. This allows you to run advanced search queries on your metadata and get the files that match certain criteria as the search result. In all queries the ?path variable is mandatory! If it is missing, an exception will be raised.

import coscine
client = coscine.ApiClient("My API Token")
resource = client.project("My Project").resource("My Resource")
# The following query applies to resources that use the base application
# profile or a profile derived from it. It selects all files whose
# metadata contains the `dcterms:created` property (Date Created) and
# where the month part of that value is 2 (February), i.e. all files
# created in February of any year. The LIMIT clause caps the number of
# results at 20.
QUERY = (
    """
    SELECT ?path WHERE {
        ?path dcterms:created ?value .
        FILTER (month(?value) = 2)
    } LIMIT 20
    """
)
files = resource.query(QUERY)
for file in files:
    print(file.path)

How does this differ from using rdflib to query metadata? A SPARQL query with rdflib returns the metadata query results, not the matching file objects of the resource. That is essentially the only difference, but mapping results back to file objects by hand would be nontrivial, hence the extra method.
Depending on the number of files in the resource and the amount of metadata, certain queries can take a long time (from seconds up to a few minutes in extreme cases).
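
Queries like the February example can be built from parameters instead of being written out by hand. The sketch below generates the same query for an arbitrary month and result limit. `created_in_month_query` is a hypothetical helper. Like the example above, it assumes that the `dcterms` prefix is resolved by the SDK.

```python
def created_in_month_query(month: int, limit: int = 20) -> str:
    """Build a SPARQL query selecting files whose dcterms:created month matches."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    # ?path must be among the selected variables, otherwise the SDK raises
    return (
        "SELECT ?path WHERE {\n"
        "    ?path dcterms:created ?value .\n"
        f"    FILTER (month(?value) = {int(month)})\n"
        f"}} LIMIT {int(limit)}"
    )


print(created_in_month_query(2))

# Usage with a real resource (requires a valid API token):
# for file in resource.query(created_in_month_query(2)):
#     print(file.path)
```

Keeping the LIMIT small while experimenting helps to avoid the long-running queries mentioned above.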