Graph Analytics Engine API Documentation

graphanalyticsengine.proto


Basic Usage

GraphAnalyticsEngineService is an API for interacting with graph analytics engines (GAEs). Each engine corresponds to a deployment on ArangoGraph (AG), which grants it direct database access for loading graphs and storing results. A single database deployment can accommodate multiple GAEs.

Every call that can take a long time to complete is asynchronous: it returns a job id immediately, and the result must be retrieved separately. Since all results are stored in RAM, they must be deleted explicitly to free the memory they use.

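The following operations trigger asynchronous jobs, which might take longer to complete: GraphAnalyticsEngineLoadData, all GraphAnalyticsEngineRun* methods, and GraphAnalyticsEngineStoreResults.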
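The job life cycle described above can be sketched as a polling loop over the GET job API. This is a minimal sketch, not a client shipped with the engine: `get_job` is a hypothetical callable, supplied by the caller, that performs `GET /v1/jobs/{job_id}` and returns the decoded GraphAnalyticsEngineJob message as a dict, and the poll interval and timeout are arbitrary choices. It relies only on the documented `progress`, `total`, `error`, `error_code` and `error_message` fields.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    get_job: Callable[[int], Dict[str, Any]],
    job_id: int,
    poll_interval: float = 1.0,
    timeout: float = 600.0,
) -> Dict[str, Any]:
    """Poll a job until it is ready, reports an error, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)  # hypothetical wrapper around GET /v1/jobs/{job_id}
        if job.get("error"):
            raise RuntimeError(
                f"job {job_id} failed ({job.get('error_code')}): {job.get('error_message')}"
            )
        # `total` is guaranteed to be positive; progress == total means ready.
        if job["progress"] >= job["total"]:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not ready after {timeout} s")
        time.sleep(poll_interval)
```

Once the results have been stored or are no longer needed, delete the job with `DELETE /v1/jobs/{job_id}` to free the memory it occupies.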

GraphAnalyticsEngineService

Methods with HTTP bindings

Method Name Method Pattern Body
GraphAnalyticsEngineLoadData POST /v1/loaddata *
GraphAnalyticsEngineLoadDataAql POST /v1/loaddataaql *
GraphAnalyticsEngineRunWcc POST /v1/wcc *
GraphAnalyticsEngineRunScc POST /v1/scc *
GraphAnalyticsEngineRunCompAggregation POST /v1/aggregatecomponents *
GraphAnalyticsEngineRunPageRank POST /v1/pagerank *
GraphAnalyticsEngineRunPythonFunction POST /v1/python *
GraphAnalyticsEngineRunIRank POST /v1/irank *
GraphAnalyticsEngineRunLabelPropagation POST /v1/labelpropagation *
GraphAnalyticsEngineRunAttributePropagation POST /v1/attributepropagation *
GraphAnalyticsEngineRunBetweennessCentrality POST /v1/betweennesscentrality *
GraphAnalyticsEngineRunLineRank POST /v1/linerank *
GraphAnalyticsEngineStoreResults POST /v1/storeresults *
GraphAnalyticsEngineListGraphs GET /v1/graphs
GraphAnalyticsEngineGetGraph GET /v1/graphs/{graph_id}
GraphAnalyticsEngineDeleteGraph DELETE /v1/graphs/{graph_id}
GraphAnalyticsEngineListJobs GET /v1/jobs
GraphAnalyticsEngineGetJob GET /v1/jobs/{job_id}
GraphAnalyticsEngineDeleteJob DELETE /v1/jobs/{job_id}
GraphAnalticsEngineShutdown DELETE /v1/shutdown
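Each binding in the table above maps to a plain HTTP request. The sketch below only constructs `urllib.request.Request` objects and does not send them; `BASE_URL` is a hypothetical placeholder, and the bearer-token header is an assumption, since authentication is not described in this document. For the POST bindings the whole request message is sent as the JSON body (the `*` in the Body column).

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://gae.example.com"  # hypothetical engine address


def build_request(
    verb: str,
    path: str,
    body: Optional[Dict[str, Any]] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request for one of the HTTP bindings."""
    data = None
    headers: Dict[str, str] = {}
    if body is not None:
        # POST bindings ("*" in the Body column): the whole message is the JSON body.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"  # assumption: bearer-token auth
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=verb)


run_wcc = build_request("POST", "/v1/wcc", {"graph_id": 1})
get_job = build_request("GET", "/v1/jobs/42")
delete_job = build_request("DELETE", "/v1/jobs/42")
```

Passing one of these objects to `urllib.request.urlopen` would actually send it.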

Methods and Argument types

Method Name Request Type Response Type
GraphAnalyticsEngineLoadData GraphAnalyticsEngineLoadDataRequest GraphAnalyticsEngineLoadDataResponse

This API call fetches data from the deployment and loads it into the engine's memory for later processing. One can use either a named graph or a list of vertex collections and a list of edge collections. Currently, the call always loads all vertices and edges from these collections. However, one can select which attribute data is loaded alongside the vertices and the edge topology. These attribute values are stored in a column store, in which each column corresponds to an attribute and has as many rows as there are vertices in the graph. Each loaded graph gets a numerical ID, by which it can be referenced in computations. This is an asynchronous job which returns the job id and the graph ID immediately. Use the GET graph API with the returned graph ID to get information on errors and the outcome of the loading.
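A request body for `POST /v1/loaddata` can be assembled from the GraphAnalyticsEngineLoadDataRequest fields documented further down. This is a sketch under assumptions: the helper name `load_data_body` and the example database, collection and attribute names are hypothetical, and the checks only encode the rules stated in the field descriptions (collections required without `graph_name`, vertex collections required when attributes are loaded, and the four allowed attribute types).

```python
from typing import Any, Dict, List, Optional

ALLOWED_TYPES = {"string", "float", "integer", "unsigned"}


def load_data_body(
    database: str,
    graph_name: Optional[str] = None,
    vertex_collections: Optional[List[str]] = None,
    edge_collections: Optional[List[str]] = None,
    vertex_attributes: Optional[List[str]] = None,
    vertex_attribute_types: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a GraphAnalyticsEngineLoadDataRequest as a JSON-ready dict."""
    vertex_collections = vertex_collections or []
    edge_collections = edge_collections or []
    vertex_attributes = vertex_attributes or []
    vertex_attribute_types = vertex_attribute_types or []
    if graph_name is None and not edge_collections:
        raise ValueError("edge_collections must be set if graph_name is not given")
    # Vertex collections are needed without a graph name or when attributes are loaded.
    if (graph_name is None or vertex_attributes) and not vertex_collections:
        raise ValueError("vertex_collections must be set here")
    unsupported = set(vertex_attribute_types) - ALLOWED_TYPES
    if unsupported:
        raise ValueError(f"unsupported attribute types: {sorted(unsupported)}")
    body: Dict[str, Any] = {
        "database": database,
        "vertex_collections": vertex_collections,
        "edge_collections": edge_collections,
        "vertex_attributes": vertex_attributes,
        "vertex_attribute_types": vertex_attribute_types,
    }
    if graph_name is not None:
        body["graph_name"] = graph_name
    return body
```

The response (GraphAnalyticsEngineLoadDataResponse) carries both the `job_id` and the `graph_id` used by later calls.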

GraphAnalyticsEngineLoadDataAql GraphAnalyticsEngineLoadDataAqlRequest GraphAnalyticsEngineLoadDataResponse

This API call fetches data from the ArangoGraph deployment via AQL and loads it into the engine's memory for later processing. (NOT IMPLEMENTED YET)

GraphAnalyticsEngineRunWcc GraphAnalyticsEngineWccSccRequest GraphAnalyticsEngineProcessResponse

Process a previously loaded graph with the weakly connected components algorithm (WCC) and store the results in-memory. This essentially means that the direction of edges is ignored and then the connected components of the undirected graph are computed. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.
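To illustrate the semantics only (this is not the engine's implementation), weakly connected components can be computed locally with a union-find structure that ignores edge direction. The function name `wcc` and the convention of labeling each component by its smallest vertex are choices made for this sketch; vertices must be mutually comparable.

```python
from typing import Dict, Hashable, Iterable, Tuple


def wcc(
    vertices: Iterable[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, Hashable]:
    """Return a component label for every vertex, ignoring edge direction."""
    parent = {v: v for v in vertices}

    def find(v: Hashable) -> Hashable:
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:  # the direction of (u, v) plays no role
        ru, rv = find(u), find(v)
        if ru != rv:
            # Keep the smaller root so each component is labeled by its smallest vertex.
            if rv < ru:
                ru, rv = rv, ru
            parent[rv] = ru
    return {v: find(v) for v in parent}
```

With the edges `a -> b` and `c -> b`, all three vertices end up in one weak component, although no vertex can reach both of the others.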

GraphAnalyticsEngineRunScc GraphAnalyticsEngineWccSccRequest GraphAnalyticsEngineProcessResponse

Process a previously loaded graph with the strongly connected components algorithm (SCC) and store the results in-memory. This means that the direction of the edges is taken into account and two vertices A and B will be in the same strongly connected component if and only if there is a directed path from A to B and from B to A. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.
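For comparison, here is a local sketch of strongly connected components using Kosaraju's algorithm, one of several standard SCC algorithms; the engine's own method is not documented here. The first pass records vertices in order of DFS completion, and the second pass runs DFS on the reversed graph; both passes are iterative to avoid Python's recursion limit. Component numbers are arbitrary, and every vertex that occurs in an edge must be listed in `vertices`.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def scc(
    vertices: Iterable[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, int]:
    """Kosaraju's algorithm: return a component number for every vertex."""
    vertices = list(vertices)
    out, inc = defaultdict(list), defaultdict(list)
    for u, v in edges:
        out[u].append(v)
        inc[v].append(u)

    # Pass 1: iterative DFS, appending each vertex once all its successors are done.
    order: List[Hashable] = []
    seen = set()
    for s in vertices:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(out[s]))]
        while stack:
            v, successors = stack[-1]
            for w in successors:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(out[w])))
                    break
            else:  # all successors handled: v is finished
                stack.pop()
                order.append(v)

    # Pass 2: DFS on the reversed graph in reverse completion order.
    comp: Dict[Hashable, int] = {}
    n_comp = 0
    for s in reversed(order):
        if s in comp:
            continue
        comp[s] = n_comp
        stack2 = [s]
        while stack2:
            v = stack2.pop()
            for w in inc[v]:
                if w not in comp:
                    comp[w] = n_comp
                    stack2.append(w)
        n_comp += 1
    return comp
```

On the same edges `a -> b` and `c -> b` used for WCC, every vertex forms its own strongly connected component.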

GraphAnalyticsEngineRunCompAggregation GraphAnalyticsEngineAggregateComponentsRequest GraphAnalyticsEngineProcessResponse

Process a previously loaded graph together with a completed connected components computation (weakly or strongly) and aggregate some vertex data over each component found. The result is one distribution map per connected component and is stored in memory. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.
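The shape of the aggregation result can be illustrated locally. This sketch assumes that a "distribution map" counts how often each value of the aggregation attribute occurs within a component; the exact format produced by the engine is not specified here. `component_of` stands for the per-vertex output of a WCC or SCC job and `attribute` for the loaded column, both hypothetical inputs.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Hashable


def aggregate_components(
    component_of: Dict[Hashable, Hashable],
    attribute: Dict[Hashable, Any],
) -> Dict[Hashable, Counter]:
    """For each component, count how often each attribute value occurs."""
    result: Dict[Hashable, Counter] = defaultdict(Counter)
    for vertex, comp in component_of.items():
        if vertex in attribute:  # vertices without a value are skipped
            result[comp][attribute[vertex]] += 1
    return dict(result)
```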

GraphAnalyticsEngineRunPageRank GraphAnalyticsEnginePageRankRequest GraphAnalyticsEngineProcessResponse

Process a previously loaded graph with the pagerank algorithm and store the results in-memory. There are some parameters controlling the computation like the damping factor and the maximal number of supersteps. See the input message documentation for details. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.
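A local power-iteration sketch shows how the damping factor and the superstep limit interact; the defaults 0.85 and 64 mirror the documented defaults of GraphAnalyticsEnginePageRankRequest. Redistributing the rank of vertices without outgoing edges uniformly and stopping early below a tolerance are assumptions of this sketch, not documented engine behavior, and the optional seeding attribute is not modeled.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def pagerank(
    vertices: Iterable[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
    damping_factor: float = 0.85,
    maximum_supersteps: int = 64,
    tolerance: float = 1e-10,
) -> Dict[Hashable, float]:
    """Plain PageRank by power iteration; ranks sum to 1."""
    vertices = list(vertices)
    n = len(vertices)
    out: Dict[Hashable, List[Hashable]] = {v: [] for v in vertices}
    for u, v in edges:
        out[u].append(v)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(maximum_supersteps):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if not out[v])
        base = (1.0 - damping_factor) / n + damping_factor * dangling / n
        new = {v: base for v in vertices}
        for u in vertices:
            if out[u]:
                share = damping_factor * rank[u] / len(out[u])
                for w in out[u]:
                    new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in vertices)
        rank = new
        if delta < tolerance:  # converged before the superstep limit
            break
    return rank
```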

GraphAnalyticsEngineRunPythonFunction GraphAnalyticsEnginePythonFunctionRequest GraphAnalyticsEngineProcessResponse

Process a previously loaded graph with a custom Python-based algorithm and store the results in-memory. See the input message documentation for details. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.

GraphAnalyticsEngineRunIRank GraphAnalyticsEnginePageRankRequest GraphAnalyticsEngineProcessResponse

Process a previously loaded graph with the irank algorithm and store the results in-memory. The "irank" algorithm is a variant of PageRank that changes the initial weight of each vertex. Rather than being 1/N, where N is the number of vertices, the initial weight depends on the vertex collection the vertex belongs to: if vertex V is from vertex collection C and N is the number of vertices in C, then the initial weight of V is 1/N. As with PageRank, the total sum of ranks stays the same as an invariant of the algorithm. There are some parameters controlling the computation, like the damping factor and the maximal number of supersteps. See the input message documentation for details. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.
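The difference from PageRank described above lies in the initial weights. This sketch computes them from ArangoDB document IDs of the form `collection/key`; the function name is hypothetical, and the subsequent iteration is not shown. Each collection contributes a total initial weight of 1, so the initial ranks sum to the number of vertex collections rather than to 1.

```python
from collections import Counter
from typing import Dict, Iterable


def irank_initial_weights(vertex_ids: Iterable[str]) -> Dict[str, float]:
    """Initial weight of a vertex is 1/N, N = size of its vertex collection."""
    collection_of = {v: v.split("/", 1)[0] for v in vertex_ids}  # "persons/1" -> "persons"
    sizes = Counter(collection_of.values())
    return {v: 1.0 / sizes[c] for v, c in collection_of.items()}
```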

GraphAnalyticsEngineRunLabelPropagation GraphAnalyticsEngineLabelPropagationRequest GraphAnalyticsEngineProcessResponse

Process a previously loaded graph with the label propagation algorithm and store the results in-memory. There are some parameters controlling the computation like the name of the attribute to choose the start label from, a flag to indicate if the synchronous or the asynchronous variant is used and the maximal number of supersteps. See the input message documentation for details. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.
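Below is a minimal synchronous sketch with the deterministic tie-break documented for `random_tiebreak = false` (the smallest label among the most frequent ones). Treating edges as undirected when collecting neighbor labels is an assumption; the document does not say in which direction labels are read. Synchronous updates can oscillate on some graphs (for example bipartite ones), which is one reason for the superstep limit.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def label_propagation_sync(
    start_labels: Dict[Hashable, Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
    maximum_supersteps: int = 64,
) -> Dict[Hashable, Hashable]:
    """Synchronous label propagation with a deterministic tie-break."""
    neighbors = defaultdict(list)
    for u, v in edges:
        # Assumption: labels are read along edges in both directions.
        neighbors[u].append(v)
        neighbors[v].append(u)
    labels = dict(start_labels)
    for _ in range(maximum_supersteps):
        new = {}
        for v, own in labels.items():
            counts = Counter(labels[w] for w in neighbors[v])
            if not counts:  # isolated vertex keeps its label
                new[v] = own
                continue
            top = max(counts.values())
            # Smallest label among the most frequent ones.
            new[v] = min(lab for lab, c in counts.items() if c == top)
        if new == labels:  # no label changed: stop early
            break
        labels = new
    return labels
```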

GraphAnalyticsEngineRunAttributePropagation GraphAnalyticsEngineAttributePropagationRequest GraphAnalyticsEngineProcessResponse

Process a previously loaded graph with the attribute propagation algorithm and store the results in-memory. The algorithm reads a list of labels from a column for each vertex (see the loaddata operation, which configures which attributes are loaded into the column store). The value can be empty, a string, or a list of strings, and the set of labels for each vertex is initialized accordingly. The algorithm then propagates each label in each label set along the edges to all reachable vertices, thus computing a new set of labels for each vertex. It stops after a specified maximal number of steps or when no label set changes any more. BEWARE: If there are many labels in the system and the graph is well-connected, the result can be huge! There are some parameters controlling the computation, like the name of the attribute to choose the start labels from, whether the synchronous or the asynchronous variant is used, whether labels are propagated forwards or backwards along the edges, and the maximal number of supersteps. See the input message documentation for details. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.
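A synchronous sketch of the propagation loop: start label sets are built from the column value (no labels for an empty value, here `None`; one label for a string; one label per element for a list; any other value converted to a string), then every label set is pushed along the edges, or against them when `backwards` is true, until nothing changes or the superstep limit is reached. The function names are hypothetical, and every vertex must appear in `start_values`.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, Set, Tuple


def initial_labels(value: Any) -> Set[str]:
    """Turn a column value into a start label set."""
    if value is None:  # empty value: no labels
        return set()
    if isinstance(value, list):
        return {str(item) for item in value}
    return {str(value)}  # a string stays as-is, other values become strings


def attribute_propagation(
    start_values: Dict[Hashable, Any],
    edges: Iterable[Tuple[Hashable, Hashable]],
    backwards: bool = False,
    maximum_supersteps: int = 64,
) -> Dict[Hashable, Set[str]]:
    successors = defaultdict(list)
    for u, v in edges:
        if backwards:  # propagate against the edge direction
            u, v = v, u
        successors[u].append(v)
    labels = {v: initial_labels(val) for v, val in start_values.items()}
    for _ in range(maximum_supersteps):
        # Synchronous step: each vertex sends the set it had at the start of the step.
        new = {v: set(s) for v, s in labels.items()}
        for u, s in labels.items():
            for w in successors[u]:
                new[w] |= s
        if new == labels:  # no label set changed
            break
        labels = new
    return labels
```

Because label sets only grow, a well-connected graph with many distinct labels quickly gives every vertex a large set, which is the blow-up the warning above refers to.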

GraphAnalyticsEngineRunBetweennessCentrality GraphAnalyticsEngineBetweennessCentralityRequest GraphAnalyticsEngineProcessResponse

Process a previously loaded graph with the betweenness centrality algorithm and store the results in-memory. See https://snap.stanford.edu/class/cs224w-readings/brandes01centrality.pdf for details. There are some parameters controlling the computation, like the number of start vertices, whether edges are followed in both directions, and whether the result is normalized. See the input message documentation for details. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.
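For reference, here is a compact version of Brandes' algorithm for unweighted, directed graphs, using every vertex as a start vertex (the `k = 0` case). It omits the `undirected`, `normalized` and `parallelism` options and is only meant to make the computed quantity concrete; it is not the engine's implementation.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Tuple


def betweenness(
    vertices: Iterable[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, float]:
    """Brandes' algorithm on an unweighted, directed graph."""
    vertices = list(vertices)
    out: Dict[Hashable, List[Hashable]] = {v: [] for v in vertices}
    for u, v in edges:
        out[u].append(v)
    cb = {v: 0.0 for v in vertices}
    for s in vertices:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order: List[Hashable] = []
        pred: Dict[Hashable, List[Hashable]] = {v: [] for v in vertices}
        sigma = {v: 0 for v in vertices}
        dist = {v: -1 for v in vertices}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in out[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate pair dependencies in order of non-increasing distance from s.
        delta = {v: 0.0 for v in vertices}
        while order:
            w = order.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb
```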

GraphAnalyticsEngineRunLineRank GraphAnalyticsEngineLineRankRequest GraphAnalyticsEngineProcessResponse

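Process a previously loaded graph with the LineRank algorithm and store the results in-memory. The algorithm measures the importance of a vertex by aggregating the importance of its incident edges. This represents the amount of information that flows through the vertex, so the result can be taken as an approximation of betweenness centrality, which is much more computation-intensive. The importance of an edge is the probability that a random walker, moving from edge to edge via vertices with random restarts, is at that edge. The damping factor and the maximal number of supersteps can be set in the input message. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.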
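The description above can be turned into a small sketch: build the line graph in which edge `(u, v)` links to every edge leaving `v`, run a PageRank-style power iteration on it, and give each vertex the sum of the scores of its incident edges. Uniform redistribution from edges with no successors and the fixed number of iterations are assumptions of this sketch; the parameter names follow GraphAnalyticsEngineLineRankRequest.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def linerank(
    edges: List[Tuple[Hashable, Hashable]],
    damping_factor: float = 0.85,
    maximum_supersteps: int = 64,
) -> Dict[Hashable, float]:
    """Score vertices by the random-walk importance of their incident edges."""
    m = len(edges)
    if m == 0:
        return {}
    # Line graph: edge i = (u, v) links to every edge j that leaves v.
    leaving = defaultdict(list)
    for j, (u, _) in enumerate(edges):
        leaving[u].append(j)
    successors = [leaving[v] for (_, v) in edges]
    score = [1.0 / m] * m
    for _ in range(maximum_supersteps):
        # Random restart plus uniform redistribution from edges without successors.
        stuck = sum(score[i] for i in range(m) if not successors[i])
        base = (1.0 - damping_factor) / m + damping_factor * stuck / m
        new = [base] * m
        for i in range(m):
            if successors[i]:
                share = damping_factor * score[i] / len(successors[i])
                for j in successors[i]:
                    new[j] += share
        score = new
    # A vertex aggregates the scores of its incident (incoming and outgoing) edges.
    result: Dict[Hashable, float] = defaultdict(float)
    for (u, v), s in zip(edges, score):
        result[u] += s
        result[v] += s
    return dict(result)
```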

GraphAnalyticsEngineStoreResults GraphAnalyticsEngineStoreResultsRequest GraphAnalyticsEngineStoreResultsResponse

Stores the results of previous jobs in the deployment. Several job ids can be specified, provided that they all produce the same number of results. For example, results from different algorithms that each produce one result per vertex can be written to the database together. The target collection must already exist and must be writable. The job produces one document per result, and one can prescribe which attribute names are used for which result. There are some parameters controlling the computation; see the input message description for details. The computation will return a numerical job id, with which the progress can be monitored. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the job.
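A request body for `POST /v1/storeresults` might be built like this. Pairing `attribute_names` with `job_ids` by position is an assumption based on the description ("one can prescribe which attribute names should be used for which result"), and the helper name and example values are hypothetical. `vertex_collections` is a protobuf map, which is serialized as a JSON object.

```python
from typing import Any, Dict, List, Optional


def store_results_body(
    database: str,
    job_ids: List[int],
    attribute_names: List[str],
    vertex_collections: Dict[str, str],
    parallelism: Optional[int] = None,
    batch_size: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a GraphAnalyticsEngineStoreResultsRequest as a JSON-ready dict."""
    if len(job_ids) != len(attribute_names):
        # Assumption: one attribute name per job, in the same order.
        raise ValueError("expected one attribute name per job id")
    body: Dict[str, Any] = {
        "database": database,
        "job_ids": job_ids,
        "attribute_names": attribute_names,
        # Maps the collection part of each vertex _id to the collection written to.
        "vertex_collections": vertex_collections,
    }
    if parallelism is not None:
        body["parallelism"] = parallelism
    if batch_size is not None:
        body["batch_size"] = batch_size
    return body
```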

GraphAnalyticsEngineListGraphs Empty GraphAnalyticsEngineListGraphsResponse

List the graphs in the engine.

GraphAnalyticsEngineGetGraph GraphAnalyticsEngineGraphId GraphAnalyticsEngineGetGraphResponse

Get information about a specific graph.

GraphAnalyticsEngineDeleteGraph GraphAnalyticsEngineGraphId GraphAnalyticsEngineDeleteGraphResponse

Delete a specific graph from memory.

GraphAnalyticsEngineListJobs Empty GraphAnalyticsEngineListJobsResponse

List the jobs in the engine (loading, computing or storing).

GraphAnalyticsEngineGetJob GraphAnalyticsEngineJobId GraphAnalyticsEngineJob

Get information about a specific job (in particular progress and result when done).

GraphAnalyticsEngineDeleteJob GraphAnalyticsEngineJobId GraphAnalyticsEngineDeleteJobResponse

Delete a specific job.

GraphAnalticsEngineShutdown Empty GraphAnalyticsEngineShutdownResponse

Shut down the service.

Empty

Empty input.

GraphAnalyticsEngineAggregateComponentsRequest

Request arguments for GraphAnalyticsEngineRunCompAggregation.

Field Type Label Description
graph_id uint64

Graph ID

job_id uint64

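ID of the job that computed the connected components (WCC or SCC)

aggregation_attribute string

Name of the vertex attribute whose values are aggregated over each component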

GraphAnalyticsEngineAttributePropagationRequest

Request arguments for GraphAnalyticsEngineRunAttributePropagation.

Field Type Label Description
graph_id uint64

Graph ID. This attribute must be given.

start_label_attribute string

Start label attribute, which must be stored in one column of the column store of the graph. If set to "@id", the ID of the vertex is used. Values can be empty, a string, or a list of strings; all other values are converted to strings. This attribute must be given.

synchronous bool optional

Flag to indicate whether synchronous (true) or asynchronous label propagation is used. The default is asynchronous, i.e. `false`.

backwards bool optional

Flag to indicate whether the propagation happens forwards (along the directed edges) or backwards (in the opposite direction). The default is forwards, i.e. `false`.

maximum_supersteps uint32 optional

Maximum number of steps, default is 64

GraphAnalyticsEngineBetweennessCentralityRequest

Request arguments for GraphAnalyticsEngineRunBetweennessCentrality.

Field Type Label Description
graph_id uint64

Graph ID

k uint64 optional

Number of start vertices, use 0 to start from every single vertex in the graph for a complete result. 0 is the default.

undirected bool optional

Flag indicating whether edges are followed in both directions, default is false

normalized bool optional

Flag indicating whether a normalization with 1/((N-1)*(N-2)) is applied, where N is the size of the largest orbit found. Default is false.

parallelism uint32 optional

Number of threads to use

GraphAnalyticsEngineDeleteGraphResponse

Response for a delete graph request.

Field Type Label Description
graph_id uint64

ID of graph

error_code int32

Error code, 0 if no error

error_message string

Error message, empty if no error

GraphAnalyticsEngineDeleteJobResponse

Response for a delete job request.

Field Type Label Description
job_id uint64

ID of job

error bool

Flag indicating whether an error occurred

error_code int32

Error code, 0 if no error

error_message string

Error message, empty if no error

GraphAnalyticsEngineErrorResponse

Generic error

Field Type Label Description
error_code int32

Error code, 0 if no error

error_message string

Error message, empty if no error

GraphAnalyticsEngineGetGraphResponse

Field Type Label Description
error_code int32

Error code, 0 if no error

error_message string

Error message, empty if no error

graph GraphAnalyticsEngineGraph

The graph

GraphAnalyticsEngineGraph

Description of a graph.

Field Type Label Description
graph_id uint64

ID of graph

number_of_vertices uint64

Number of vertices

number_of_edges uint64

Number of edges

memory_usage uint64

Memory usage

memory_per_vertex uint64

Memory usage per vertex

memory_per_edge uint64

Memory usage per edge

GraphAnalyticsEngineGraphId

ID of an engine and id of a graph

Field Type Label Description
graph_id string

Graph ID (for path)

GraphAnalyticsEngineJob

Description of a job.

Field Type Label Description
job_id uint64

ID of the current job

graph_id uint64

Graph of the current job

total uint32

Total amount of progress; `progress` equals this value when the job is ready. Guaranteed to be positive, but may be 1

progress uint32

Progress (0: no progress, equal to total: ready)

error bool

Error flag

error_code int32

Error code

error_message string

Error message

source_job string

Optional source job

comp_type string

Computation type

memory_usage uint64

Memory usage

runtime_in_microseconds uint64

Runtime of job in microseconds

GraphAnalyticsEngineJobId

ID of an engine and id of a job

Field Type Label Description
job_id string

Job ID (for path)

GraphAnalyticsEngineLabelPropagationRequest

Request arguments for GraphAnalyticsEngineRunLabelPropagation.

Field Type Label Description
graph_id uint64

Graph ID

start_label_attribute string

Start label attribute, which must be stored in one column of the column store of the graph. If set to "@id", the ID of the vertex is used.

synchronous bool optional

Flag to indicate whether synchronous (true) or asynchronous label propagation is used, default is false

random_tiebreak bool optional

Flag indicating whether ties in the label choice are broken randomly (uniform distribution) or deterministically (smallest label amongst the most frequent ones), default is false

maximum_supersteps uint32 optional

Maximum number of steps, default is 64

GraphAnalyticsEngineLineRankRequest

Request arguments for GraphAnalyticsEngineRunLineRank.

Field Type Label Description
graph_id uint64

Graph ID

damping_factor double optional

Damping factor, default is 0.85

maximum_supersteps uint32 optional

Maximal number of supersteps, default is 64

GraphAnalyticsEngineListGraphsResponse

Response arguments from GraphAnalyticsEngineListGraphs.

Field Type Label Description
error_code int32

Error code, 0 if no error

error_message string

Error message, empty if no error

graphs GraphAnalyticsEngineGraph repeated

The graphs

GraphAnalyticsEngineListJobsResponse

Response arguments from GraphAnalyticsEngineListJobs.

Field Type Label Description
error_code int32

Error code, 0 if no error

error_message string

Error message, empty if no error

jobs GraphAnalyticsEngineJob repeated

The jobs

GraphAnalyticsEngineLoadDataAqlRequest

Request arguments for GraphAnalyticsEngineLoadDataAql.

Field Type Label Description
job_id uint64

Job ID for results

database string

Database to get graph from

vertex_query string

Vertex query

edge_query string

Edge query

batch_size uint64 optional

Optional batch size

custom_fields GraphAnalyticsEngineLoadDataAqlRequest.CustomFieldsEntry repeated

Map of engine-type specific custom fields (dynamic for this data-load operation)

GraphAnalyticsEngineLoadDataAqlRequest.CustomFieldsEntry

Field Type Label Description
key string

value string

GraphAnalyticsEngineLoadDataRequest

Request arguments for GraphAnalyticsEngineLoadData.

Field Type Label Description
database string

Retrieve graph from the specified database

graph_name string optional

Graph name. This is optional, because one can also use a list of vertex and edge collections.

vertex_collections string repeated

Optional list of vertex collections. Must be set if `graph_name` is not given, or if data other than the graph topology is to be loaded.

vertex_attributes string repeated

List of attributes to load into the column store for vertices. The column store of the graph will contain one column for each attribute listed here.

vertex_attribute_types string repeated

Types for the vertex attributes. These values are allowed: "string", "float", "integer", "unsigned".

edge_collections string repeated

List of edge collections. Must be set if `graph_name` is not given.

parallelism uint32 optional

Optional numeric value for thread parallelism. It is currently used in four places: the number of async jobs launched to get data, the number of threads launched to work synchronously on incoming data, the number of threads used on each DBServer to produce data, and the length of the prefetch queue on DBServers. More arguments may be added in the future to allow finer tuning.

batch_size uint64 optional

Optional batch size

custom_fields GraphAnalyticsEngineLoadDataRequest.CustomFieldsEntry repeated

Map of engine-type specific custom fields (dynamic for this data-load operation)

GraphAnalyticsEngineLoadDataRequest.CustomFieldsEntry

Field Type Label Description
key string

value string

GraphAnalyticsEngineLoadDataResponse

Response arguments from GraphAnalyticsEngineLoadData.

Field Type Label Description
job_id uint64

ID of the load data operation

graph_id uint64

Graph ID

error_code int32

Error code, 0 if no error

error_message string

Error message, empty if no error

GraphAnalyticsEnginePageRankRequest

Request arguments for GraphAnalyticsEngineRunPageRank and GraphAnalyticsEngineRunIRank.

Field Type Label Description
graph_id uint64

Graph ID

damping_factor double optional

Damping factor, default is 0.85

maximum_supersteps uint32 optional

Maximal number of supersteps, default is 64

seeding_attribute string optional

Optional seeding attribute for a seeded PageRank, default is empty (no seeding)

GraphAnalyticsEngineProcessResponse

Response arguments from the GraphAnalyticsEngineRun* methods.

Field Type Label Description
job_id uint64

ID of the job

error_code int32

Error code, 0 if no error

error_message string

Error message, empty if no error

GraphAnalyticsEnginePythonFunctionRequest

Request arguments for GraphAnalyticsEngineRunPythonFunction.

Field Type Label Description
graph_id uint64

Graph ID

function string

The Python code to be executed. It must define a function `worker(graph)` (i.e. `def worker(graph): ...`), which must return a dataframe or a dictionary with the results. In a dictionary, each key must be a vertex ID, and the value (the actual computation result) can be of any type.

use_cugraph bool optional

Whether to use cuGraph instead of regular pandas/pyarrow, default is false
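Here is an illustrative `function` payload for GraphAnalyticsEngineRunPythonFunction. The worker below assumes that `graph` offers NetworkX-style `nodes()` and `degree(v)` methods; the type of the object passed to `worker` is not documented here, so treat this as hypothetical. The helper executes the source locally only to check that it defines `worker` before the body is sent.

```python
import textwrap
from typing import Any, Dict

# Source sent in the `function` field. Assumption: `graph` offers
# NetworkX-style nodes() and degree(v); the real object type is not documented here.
WORKER_SOURCE = textwrap.dedent(
    """
    def worker(graph):
        # One result per vertex: key = vertex ID, value = any result type.
        return {v: graph.degree(v) for v in graph.nodes()}
    """
)


def python_function_body(graph_id: int, source: str, use_cugraph: bool = False) -> Dict[str, Any]:
    """Build a GraphAnalyticsEnginePythonFunctionRequest as a JSON-ready dict."""
    namespace: Dict[str, Any] = {}
    exec(source, namespace)  # local check only: runs the definitions in `source`
    if not callable(namespace.get("worker")):
        raise ValueError("the function source must define worker(graph)")
    return {"graph_id": graph_id, "function": source, "use_cugraph": use_cugraph}
```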

GraphAnalyticsEngineShutdownResponse

Response for a shutdown request.

Field Type Label Description
error bool

Flag indicating whether an error occurred

error_code int32

Error code, 0 if no error

error_message string

Error message, empty if no error

GraphAnalyticsEngineStoreResultsRequest

Request arguments for GraphAnalyticsEngineStoreResults.

Field Type Label Description
job_ids uint64 repeated

IDs of the jobs whose results are written

attribute_names string repeated

Attribute names to write the results to

database string

ArangoDB database to use

vertex_collections GraphAnalyticsEngineStoreResultsRequest.VertexCollectionsEntry repeated

Maps the collection names found in the `_id` attributes of vertices to the collections into which the result data is written. The results are written to the attributes listed in `attribute_names`. An insert operation with overwrite mode "update" is used.

parallelism uint32 optional

Optional numeric value for thread parallelism

batch_size uint64 optional

Optional batch size

target_collection string

Target collection for non-graph results

custom_fields GraphAnalyticsEngineStoreResultsRequest.CustomFieldsEntry repeated

Map of engine-type specific custom fields (dynamic for this store-results operation)

GraphAnalyticsEngineStoreResultsRequest.CustomFieldsEntry

Field Type Label Description
key string

value string

GraphAnalyticsEngineStoreResultsRequest.VertexCollectionsEntry

Field Type Label Description
key string

value string

GraphAnalyticsEngineStoreResultsResponse

Response arguments from GraphAnalyticsEngineStoreResults.

Field Type Label Description
job_id uint64

ID of the store results operation

error_code int32

Error code, 0 if no error

error_message string

Error message, empty if no error

GraphAnalyticsEngineWccSccRequest

Request arguments for GraphAnalyticsEngineRunWcc and GraphAnalyticsEngineRunScc.

Field Type Label Description
graph_id uint64

Graph ID

custom_fields GraphAnalyticsEngineWccSccRequest.CustomFieldsEntry repeated

Map of engine-type and algorithm-type specific custom fields (dynamic for this process operation)

GraphAnalyticsEngineWccSccRequest.CustomFieldsEntry

Field Type Label Description
key string

value string
