GraphAnalyticsEngineService is an API for interacting with graph analytics engines (GAEs). Each engine corresponds to a deployment on ArangoGraph (AG), which grants it direct database access for loading graphs and storing results. A single database deployment can accommodate multiple GAEs.
Every call that can take longer to complete is asynchronous: it returns a job ID immediately, and the result must be retrieved separately. Since all results are stored in RAM, they must be deleted explicitly to free the memory they use.
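
The asynchronous pattern can be sketched as a polling loop. This is a minimal sketch, assuming a caller-supplied `get_job` callable (hypothetical) that performs GET /v1/jobs/{job_id} and returns the decoded GraphAnalyticsEngineJob fields (`progress`, `total`, `error`, `error_code`, `error_message`) as a dict. A missing `progress` is treated as 0, because JSON encodings of protobuf messages often omit default values.

```python
import time
from typing import Any, Callable, Dict

Job = Dict[str, Any]  # decoded GraphAnalyticsEngineJob message


def wait_for_job(get_job: Callable[[int], Job], job_id: int,
                 poll_interval: float = 1.0, timeout: float = 3600.0) -> Job:
    """Poll a job until it is ready, has failed, or the timeout expires.

    `get_job` is a hypothetical helper wrapping GET /v1/jobs/{job_id}.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)
        if job.get("error"):
            raise RuntimeError(f"job {job_id} failed with code "
                               f"{job.get('error_code')}: {job.get('error_message')}")
        # The job is ready once progress equals total (total is always positive).
        if job.get("progress", 0) >= job["total"]:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not ready after {timeout} s")
        time.sleep(poll_interval)


# Usage with a fake job source whose progress advances on every poll.
_states = iter([{"progress": 0, "total": 2}, {"progress": 1, "total": 2},
                {"progress": 2, "total": 2}])
finished = wait_for_job(lambda _id: next(_states), job_id=7, poll_interval=0.0)
```

Once the results have been stored or are no longer needed, DELETE /v1/jobs/{job_id} releases the RAM they occupy.
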
The service exposes the following HTTP endpoints. The POST calls trigger asynchronous operations, which might take longer to complete:
| Method Name | Method | Pattern | Body |
|---|---|---|---|
| GraphAnalyticsEngineLoadData | POST | /v1/loaddata | * |
| GraphAnalyticsEngineLoadDataAql | POST | /v1/loaddataaql | * |
| GraphAnalyticsEngineRunWcc | POST | /v1/wcc | * |
| GraphAnalyticsEngineRunScc | POST | /v1/scc | * |
| GraphAnalyticsEngineRunCompAggregation | POST | /v1/aggregatecomponents | * |
| GraphAnalyticsEngineRunPageRank | POST | /v1/pagerank | * |
| GraphAnalyticsEngineRunPythonFunction | POST | /v1/python | * |
| GraphAnalyticsEngineRunIRank | POST | /v1/irank | * |
| GraphAnalyticsEngineRunLabelPropagation | POST | /v1/labelpropagation | * |
| GraphAnalyticsEngineRunAttributePropagation | POST | /v1/attributepropagation | * |
| GraphAnalyticsEngineRunBetweennessCentrality | POST | /v1/betweennesscentrality | * |
| GraphAnalyticsEngineRunLineRank | POST | /v1/linerank | * |
| GraphAnalyticsEngineStoreResults | POST | /v1/storeresults | * |
| GraphAnalyticsEngineListGraphs | GET | /v1/graphs | |
| GraphAnalyticsEngineGetGraph | GET | /v1/graphs/{graph_id} | |
| GraphAnalyticsEngineDeleteGraph | DELETE | /v1/graphs/{graph_id} | |
| GraphAnalyticsEngineListJobs | GET | /v1/jobs | |
| GraphAnalyticsEngineGetJob | GET | /v1/jobs/{job_id} | |
| GraphAnalyticsEngineDeleteJob | DELETE | /v1/jobs/{job_id} | |
| GraphAnalticsEngineShutdown | DELETE | /v1/shutdown | |

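
For illustration, the endpoint table maps onto plain HTTP calls as sketched below. The sketch only builds `urllib.request.Request` objects and never sends them. The base URL is a hypothetical placeholder, and authentication is left out because this section does not describe it. A Body of `*` means the whole request message is sent as the JSON body.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Hypothetical engine URL; substitute the URL of your engine.
BASE_URL = "https://gae.example.com"


def gae_request(method: str, path: str,
                body: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one of the endpoints in the table."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


# POST endpoints carry the request message as the JSON body.
run_wcc = gae_request("POST", "/v1/wcc", {"graph_id": 1})
# GET and DELETE endpoints carry the ID in the path.
get_job = gae_request("GET", f"/v1/jobs/{42}")
delete_job = gae_request("DELETE", f"/v1/jobs/{42}")
```

Sending a request is then a matter of `urllib.request.urlopen(req)` plus whatever authentication the deployment requires.
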
Method Name | Request Type | Response Type |
GraphAnalyticsEngineLoadData | GraphAnalyticsEngineLoadDataRequest | GraphAnalyticsEngineLoadDataResponse |
This API call fetches data from the deployment and loads it into the engine's memory for later processing. Either a named graph or a list of vertex collections and a list of edge collections can be used. Currently, the call always loads all vertices and edges from these collections. However, it is possible to select which attribute data is loaded alongside the vertices and the edge topology. These attribute values are stored in a column store, in which each column corresponds to an attribute and has as many rows as there are vertices in the graph. Each loaded graph gets a numerical ID, which is used to refer to it in computations. This is an asynchronous job which returns the job ID immediately. Use the GET graph API with the returned graph ID to get information on errors and the outcome of the loading. |
||
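
A LoadData request body might be assembled as below. This is a sketch under assumptions: the field names and allowed types come from the GraphAnalyticsEngineLoadDataRequest message, but the rule that `vertex_attribute_types` has one entry per `vertex_attributes` entry is inferred, not stated, and `load_data_body` as well as the collection names `persons` and `knows` are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Allowed values for vertex_attribute_types, as listed in the request message.
ALLOWED_TYPES = {"string", "float", "integer", "unsigned"}


def load_data_body(database: str, *, graph_name: Optional[str] = None,
                   vertex_collections: Optional[List[str]] = None,
                   edge_collections: Optional[List[str]] = None,
                   vertex_attributes: Optional[List[str]] = None,
                   vertex_attribute_types: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build a JSON body for POST /v1/loaddata (hypothetical helper)."""
    attrs = vertex_attributes or []
    types = vertex_attribute_types or []
    if graph_name is None and not (vertex_collections and edge_collections):
        raise ValueError("give graph_name or both vertex and edge collections")
    if attrs and not vertex_collections:
        # Loading more than the topology requires explicit vertex collections.
        raise ValueError("vertex_collections must be set to load vertex attributes")
    if types and len(types) != len(attrs):
        # Assumption: one type per attribute, in the same order.
        raise ValueError("give one type per vertex attribute")
    unknown = set(types) - ALLOWED_TYPES
    if unknown:
        raise ValueError(f"unsupported attribute types: {sorted(unknown)}")
    body: Dict[str, Any] = {"database": database}
    optional = {"graph_name": graph_name, "vertex_collections": vertex_collections,
                "edge_collections": edge_collections, "vertex_attributes": attrs,
                "vertex_attribute_types": types}
    # Omit unset fields instead of sending nulls or empty lists.
    body.update({k: v for k, v in optional.items() if v})
    return body


body = load_data_body("_system", vertex_collections=["persons"],
                      edge_collections=["knows"], vertex_attributes=["age"],
                      vertex_attribute_types=["integer"])
```

With this body, the column store of the loaded graph would contain a single column, `age`, with one row per vertex.
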
GraphAnalyticsEngineLoadDataAql | GraphAnalyticsEngineLoadDataAqlRequest | GraphAnalyticsEngineLoadDataResponse |
This API call fetches data from the ArangoGraph deployment via AQL and loads it into the engine's memory for later processing. (NOT IMPLEMENTED YET) |
||
GraphAnalyticsEngineRunWcc | GraphAnalyticsEngineWccSccRequest | GraphAnalyticsEngineProcessResponse |
Process a previously loaded graph with the weakly connected components algorithm (WCC) and store the results in memory. The direction of edges is ignored, and the connected components of the resulting undirected graph are computed. The computation returns a numerical job ID, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job ID immediately. Use the GET job API with the job ID to get information on progress, errors and the outcome of the computation. |
||
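
Conceptually, WCC ignores edge direction and merges the endpoints of every edge. The following local union-find sketch illustrates the shape of the result (one component ID per vertex). It is not the engine's implementation, and labeling each component with its smallest vertex ID is an arbitrary choice made here.

```python
from typing import Dict, Iterable, Tuple


def weakly_connected_components(vertices: Iterable[int],
                                edges: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Return a component label (the smallest vertex ID in it) for each vertex."""
    parent = {v: v for v in vertices}

    def find(v: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:  # the direction of (u, v) is ignored
        ru, rv = find(u), find(v)
        if ru != rv:
            # The smaller root survives, so every root is its component's minimum.
            parent[max(ru, rv)] = min(ru, rv)
    return {v: find(v) for v in parent}


components = weakly_connected_components(range(5), [(0, 1), (2, 1), (3, 4)])
```

Here vertices 0, 1 and 2 form one component even though the edges 0→1 and 2→1 do not form a directed path between 0 and 2.
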
GraphAnalyticsEngineRunScc | GraphAnalyticsEngineWccSccRequest | GraphAnalyticsEngineProcessResponse |
Process a previously loaded graph with the strongly connected components algorithm (SCC) and store the results in memory. The direction of the edges is taken into account: two vertices A and B are in the same strongly connected component if and only if there is a directed path from A to B and from B to A. The computation returns a numerical job ID, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job ID immediately. Use the GET job API with the job ID to get information on progress, errors and the outcome of the computation. |
||
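
The SCC definition above can be checked on small graphs with Kosaraju's algorithm, sketched here with iterative depth-first searches. This is a textbook technique chosen for illustration, not necessarily what the engine uses; labeling components by their smallest vertex ID is again an arbitrary choice.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def strongly_connected_components(vertices: Iterable[int],
                                  edges: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Kosaraju's algorithm; returns the smallest member of each vertex's SCC."""
    vertices = list(vertices)
    out: Dict[int, List[int]] = defaultdict(list)
    inc: Dict[int, List[int]] = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
        inc[v].append(u)

    # Pass 1: record vertices in order of DFS completion on the original graph.
    order: List[int] = []
    seen = set()
    for s in vertices:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(out[s]))]
        while stack:
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                stack.pop()
                order.append(node)
            elif nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, iter(out[nxt])))

    # Pass 2: DFS on the reversed graph in reverse completion order.
    comp: Dict[int, int] = {}
    for s in reversed(order):
        if s in comp:
            continue
        members, todo = [], [s]
        comp[s] = s
        while todo:
            node = todo.pop()
            members.append(node)
            for w in inc[node]:
                if w not in comp:
                    comp[w] = s
                    todo.append(w)
        label = min(members)
        for m in members:
            comp[m] = label
    return comp


sccs = strongly_connected_components(range(5), [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])
```

In this example, 0, 1 and 2 lie on a directed cycle and form one SCC, while 3 and 4 are only reachable from it and therefore form singleton components.
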
GraphAnalyticsEngineRunCompAggregation | GraphAnalyticsEngineAggregateComponentsRequest | GraphAnalyticsEngineProcessResponse |
Process a previously loaded graph together with a previous computation of its connected components (weakly or strongly) by aggregating vertex data over each component found. The result is one distribution map per connected component, stored in memory. The computation returns a numerical job ID, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job ID immediately. Use the GET job API with the job ID to get information on progress, errors and the outcome of the computation. |
||
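
The exact content of a "distribution map" is not spelled out here. A plausible reading, assumed in this sketch, is a count of each value of the aggregation attribute within each component. The hypothetical helper below computes that locally from a component assignment (such as the output of a WCC job) and one attribute column.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable


def aggregate_components(component: Dict[int, int],
                         attribute: Dict[int, Hashable]) -> Dict[int, Dict[Hashable, int]]:
    """Count how often each attribute value occurs in each component."""
    dist: Dict[int, Counter] = defaultdict(Counter)
    for vertex, comp_id in component.items():
        if vertex in attribute:  # vertices without a value are skipped
            dist[comp_id][attribute[vertex]] += 1
    return {c: dict(counts) for c, counts in dist.items()}


component = {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
color = {0: "red", 1: "red", 2: "blue", 3: "blue", 4: "blue"}
distribution = aggregate_components(component, color)
```
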
GraphAnalyticsEngineRunPageRank | GraphAnalyticsEnginePageRankRequest | GraphAnalyticsEngineProcessResponse |
Process a previously loaded graph with the PageRank algorithm and store the results in memory. Parameters such as the damping factor and the maximal number of supersteps control the computation; see the input message documentation for details. The computation returns a numerical job ID, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job ID immediately. Use the GET job API with the job ID to get information on progress, errors and the outcome of the computation. |
||
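
To show what `damping_factor` and `maximum_supersteps` control, here is a textbook power-iteration PageRank on an in-memory edge list. It is a sketch, not the engine's code: spreading the rank of dangling vertices (those without outgoing edges) evenly over all vertices and stopping early below a tolerance are choices made here. The total rank stays 1 in every superstep.

```python
from typing import List, Tuple


def pagerank(n: int, edges: List[Tuple[int, int]], damping_factor: float = 0.85,
             maximum_supersteps: int = 64, tol: float = 1e-10) -> List[float]:
    """Rank of vertices 0..n-1 for the directed edges (u, v)."""
    out_degree = [0] * n
    for u, _ in edges:
        out_degree[u] += 1
    rank = [1.0 / n] * n
    for _ in range(maximum_supersteps):
        # Rank held by dangling vertices is spread evenly over all vertices.
        dangling = sum(rank[v] for v in range(n) if out_degree[v] == 0)
        new = [(1.0 - damping_factor) / n + damping_factor * dangling / n] * n
        for u, v in edges:
            new[v] += damping_factor * rank[u] / out_degree[u]
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:  # converged before reaching maximum_supersteps
            break
    return rank


cycle = pagerank(3, [(0, 1), (1, 2), (2, 0)])
star = pagerank(3, [(0, 2), (1, 2)])
```

On a directed cycle, every vertex keeps rank 1/3; in the second graph, vertex 2 collects the rank of both other vertices.
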
GraphAnalyticsEngineRunPythonFunction | GraphAnalyticsEnginePythonFunctionRequest | GraphAnalyticsEngineProcessResponse |
Process a previously loaded graph with a custom Python function and store the results in memory. See the input message documentation for details. The computation returns a numerical job ID, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job ID immediately. Use the GET job API with the job ID to get information on progress, errors and the outcome of the computation. |
||
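
The `function` field carries Python source code as a string. The sketch below uses the standard `ast` module to check locally that the source defines a top-level `worker` function with exactly one parameter, then builds the request body. The helper `python_function_body` is hypothetical. So is the example worker: this section does not document the type of the `graph` argument, so the worker ignores it and returns a constant dictionary keyed by made-up vertex IDs.

```python
import ast
from typing import Any, Dict


def python_function_body(graph_id: int, source: str,
                         use_cugraph: bool = False) -> Dict[str, Any]:
    """Build a body for POST /v1/python after checking that worker(graph) exists."""
    tree = ast.parse(source)  # raises SyntaxError for invalid code
    workers = [node for node in tree.body
               if isinstance(node, ast.FunctionDef) and node.name == "worker"]
    if not workers:
        raise ValueError("source must define a top-level function worker(graph)")
    if len(workers[-1].args.args) != 1:
        raise ValueError("worker must take exactly one argument (the graph)")
    return {"graph_id": graph_id, "function": source, "use_cugraph": use_cugraph}


# Hypothetical worker: the graph object's API is not documented here, so it is unused.
WORKER_SOURCE = '''
def worker(graph):
    # Keys are vertex IDs, values are the computation results (any type).
    return {"persons/alice": 1, "persons/bob": 2}
'''

body = python_function_body(1, WORKER_SOURCE)
```
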
GraphAnalyticsEngineRunIRank | GraphAnalyticsEnginePageRankRequest | GraphAnalyticsEngineProcessResponse |
Process a previously loaded graph with the IRank algorithm and store the results in memory. The IRank algorithm is a variant of PageRank that changes the initial weight of each vertex. Rather than being 1/N, where N is the number of vertices, the initial weight depends on the vertex collection the vertex comes from: if vertex V is from vertex collection C and N is the number of vertices in C, then the initial weight of V is 1/N. As with PageRank, the total sum of ranks stays the same as an invariant of the algorithm. Parameters such as the damping factor and the maximal number of supersteps control the computation; see the input message documentation for details. The computation returns a numerical job ID, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job ID immediately. Use the GET job API with the job ID to get information on progress, errors and the outcome of the computation. |
||
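
The only difference from PageRank described above is the initial weight. The sketch below computes those starting weights from a hypothetical mapping of vertex IDs to collection names; each collection starts with a total weight of 1, regardless of its size.

```python
from collections import Counter
from typing import Dict


def irank_initial_weights(collection_of: Dict[str, str]) -> Dict[str, float]:
    """Initial weight 1/N for each vertex V of collection C, where N = |C|."""
    sizes = Counter(collection_of.values())
    return {v: 1.0 / sizes[c] for v, c in collection_of.items()}


weights = irank_initial_weights({"a/1": "a", "a/2": "a", "b/1": "b"})
```
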
GraphAnalyticsEngineRunLabelPropagation | GraphAnalyticsEngineLabelPropagationRequest | GraphAnalyticsEngineProcessResponse |
Process a previously loaded graph with the label propagation algorithm and store the results in memory. Parameters control the computation, such as the name of the attribute to take the start label from, a flag indicating whether the synchronous or the asynchronous variant is used, and the maximal number of supersteps. See the input message documentation for details. The computation returns a numerical job ID, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job ID immediately. Use the GET job API with the job ID to get information on progress, errors and the outcome of the computation. |
||
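
The sketch below shows the synchronous variant (as with `synchronous = true`) together with the deterministic tie-break used when `random_tiebreak` is false: the smallest label among the most frequent ones. Assumptions made here, not confirmed by this section: edges are used in both directions, a vertex without neighbors keeps its label, and the loop stops early once no label changes.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple


def label_propagation(start: Dict[int, Hashable], edges: List[Tuple[int, int]],
                      maximum_supersteps: int = 64) -> Dict[int, Hashable]:
    """Synchronous label propagation; labels must be mutually comparable."""
    neighbors: Dict[int, List[int]] = defaultdict(list)
    for u, v in edges:  # assumption: edges are used in both directions
        neighbors[u].append(v)
        neighbors[v].append(u)
    labels = dict(start)
    for _ in range(maximum_supersteps):
        new = {}
        for v, old in labels.items():
            counts = Counter(labels[w] for w in neighbors[v])
            if not counts:
                new[v] = old
                continue
            top = max(counts.values())
            # Deterministic tie-break: smallest label among the most frequent ones.
            new[v] = min(label for label, c in counts.items() if c == top)
        if new == labels:
            break
        labels = new
    return labels


triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
# Start labels are the vertex IDs, as with start_label_attribute = "@id".
result = label_propagation({v: v for v in range(6)}, triangles)
```

Each of the two triangles converges to a single label (0 and 3 respectively) after a few supersteps.
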
GraphAnalyticsEngineRunAttributePropagation | GraphAnalyticsEngineAttributePropagationRequest | GraphAnalyticsEngineProcessResponse |
Process a previously loaded graph with the attribute propagation algorithm and store the results in memory. The algorithm reads a list of labels from a column for each vertex (see the loaddata operation, which configures which attributes are loaded into the column store). The value can be empty, a string, or a list of strings, and the set of labels for each vertex is initialized accordingly. The algorithm then propagates each label in each label set along the edges to all reachable vertices, thus computing a new set of labels. The algorithm stops after a specified maximal number of steps or when no label set changes any more. BEWARE: If there are many labels in the system and the graph is well-connected, the result can be huge! Parameters control the computation, such as the name of the attribute to take the start labels from, whether the synchronous or the asynchronous variant is used, whether labels are propagated along the edges forwards or backwards, and the maximal number of supersteps. See the input message documentation for details. The computation returns a numerical job ID, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job ID immediately. Use the GET job API with the job ID to get information on progress, errors and the outcome of the computation. |
||
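
The following sketch mirrors the synchronous variant on a local edge list: each vertex starts with the labels from its start attribute and repeatedly adds the labels of the vertices that feed into it, until nothing changes or `maximum_supersteps` is reached. Treating `None`, `""` and `[]` as empty, and converting non-string list elements with `str()`, are assumptions; the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple


def initial_labels(value: Any) -> Set[str]:
    """Empty -> no labels, string -> one label, list -> its elements, else str()."""
    if value is None or value == "" or value == []:
        return set()
    if isinstance(value, str):
        return {value}
    if isinstance(value, list):
        return {v if isinstance(v, str) else str(v) for v in value}
    return {str(value)}


def attribute_propagation(start: Dict[int, Any], edges: List[Tuple[int, int]],
                          backwards: bool = False,
                          maximum_supersteps: int = 64) -> Dict[int, Set[str]]:
    sources: Dict[int, List[int]] = defaultdict(list)  # vertex -> label senders
    for u, v in edges:
        if backwards:
            sources[u].append(v)  # labels flow against the edge direction
        else:
            sources[v].append(u)  # labels flow along the edge direction
    labels = {v: initial_labels(val) for v, val in start.items()}
    for _ in range(maximum_supersteps):
        # Synchronous step: every vertex reads the label sets of the previous step.
        new = {v: labels[v].union(*(labels[w] for w in sources[v])) for v in labels}
        if new == labels:
            break
        labels = new
    return labels


forward = attribute_propagation({0: "a", 1: ["b", "c"], 2: None, 3: 7}, [(0, 1), (1, 2)])
```

Label sets only grow, which is why the result can become huge on well-connected graphs with many labels.
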
GraphAnalyticsEngineRunBetweennessCentrality | GraphAnalyticsEngineBetweennessCentralityRequest | GraphAnalyticsEngineProcessResponse |
Process a previously loaded graph with the betweenness-centrality algorithm and store the results in memory. See https://snap.stanford.edu/class/cs224w-readings/brandes01centrality.pdf for details. Parameters control the computation, such as the number of start vertices, whether edges are followed in both directions, and whether a normalization is applied. See the input message documentation for details. The computation returns a numerical job ID, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job ID immediately. Use the GET job API with the job ID to get information on progress, errors and the outcome of the computation. |
||
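
Brandes' algorithm from the cited paper can be sketched for unweighted graphs as follows. This local version always starts from every vertex (like `k = 0`), supports the `undirected` flag by following edges both ways, and omits normalization and parallelism. For undirected graphs it halves the scores so that each unordered pair of vertices is counted once; that is a common convention, but whether the engine does the same is an assumption.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple


def betweenness_centrality(vertices: List[int], edges: List[Tuple[int, int]],
                           undirected: bool = False) -> Dict[int, float]:
    adj: Dict[int, List[int]] = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        if undirected:
            adj[v].append(u)
    cb = {v: 0.0 for v in vertices}
    for s in vertices:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack: List[int] = []
        preds: Dict[int, List[int]] = defaultdict(list)
        sigma: Dict[int, int] = defaultdict(int)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta: Dict[int, float] = defaultdict(float)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    if undirected:
        cb = {v: c / 2.0 for v, c in cb.items()}
    return cb


path = betweenness_centrality([0, 1, 2], [(0, 1), (1, 2)], undirected=True)
```

On the path 0-1-2, only vertex 1 lies on a shortest path between two other vertices, so it is the only vertex with a non-zero score.
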
GraphAnalyticsEngineRunLineRank | GraphAnalyticsEngineLineRankRequest | GraphAnalyticsEngineProcessResponse |
Process a previously loaded graph with the LineRank algorithm and store the results in memory. The algorithm measures the importance of a vertex by aggregating the importance of its incident edges. This represents the amount of information that flows through the vertex, so the result can be taken as an approximation of betweenness centrality, which is much more computation-intensive. The importance of an edge is the probability that a random walker, visiting edges via vertices with random restarts, is found at that edge. The computation returns a numerical job ID, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job ID immediately. Use the GET job API with the job ID to get information on progress, errors and the outcome of the computation. |
||
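
A simplified LineRank can be sketched with the same power iteration as PageRank, run on the line graph: each edge becomes a node, and edge (u, v) links to every edge that leaves v. Each vertex is then scored by the sum of the importances of its incoming and outgoing edges. This follows the description above, but building the line graph explicitly, using `1 - damping_factor` as the restart probability, and restarting walkers stuck on edges without successors are choices made here, not necessarily what the engine does.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def line_rank(edges: List[Tuple[int, int]], damping_factor: float = 0.85,
              maximum_supersteps: int = 64) -> Dict[int, float]:
    if not edges:
        return {}
    m = len(edges)
    leaving: Dict[int, List[int]] = defaultdict(list)  # vertex -> its outgoing edges
    for i, (u, _) in enumerate(edges):
        leaving[u].append(i)
    score = [1.0 / m] * m  # importance of each edge, summing to 1
    for _ in range(maximum_supersteps):
        # Walkers on edges with no successor edge restart at a uniformly random edge.
        stuck = sum(score[i] for i, (_, v) in enumerate(edges) if not leaving[v])
        new = [(1.0 - damping_factor) / m + damping_factor * stuck / m] * m
        for i, (_, v) in enumerate(edges):
            successors = leaving[v]
            for j in successors:
                new[j] += damping_factor * score[i] / len(successors)
        score = new
    # A vertex's importance is the sum of the importances of its incident edges.
    vertex_score: Dict[int, float] = defaultdict(float)
    for i, (u, v) in enumerate(edges):
        vertex_score[u] += score[i]
        vertex_score[v] += score[i]
    return dict(vertex_score)


scores = line_rank([(0, 1), (1, 2)])
```

In the path 0→1→2, vertex 1 is incident to both edges and therefore receives the highest score, matching the intuition that all information between 0 and 2 flows through it.
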
GraphAnalyticsEngineStoreResults | GraphAnalyticsEngineStoreResultsRequest | GraphAnalyticsEngineStoreResultsResponse |
Store the results of previous jobs in the deployment. Several job IDs can be given, but they must produce the same number of results; for example, results of different algorithms that each produce one result per vertex can be written to the database together. The target collection must already exist and must be writable. The job produces one document per result, and the attribute name to use for each result can be prescribed. Parameters control the operation; see the input message description for details. The operation returns a numerical job ID, with which the progress can be monitored. This is an asynchronous job which returns the job ID immediately. Use the GET job API with the job ID to get information on progress, errors and the outcome of the job. |
||
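
How results from several jobs could be merged into documents is sketched below, under assumptions: each job's result is modeled as a mapping from vertex `_id` to value, documents are matched by `_key` (as an ArangoDB insert with `overwriteMode` "update" does), and `vertex_collections` maps the collection part of each `_id` to a target collection. The helper `build_result_documents` is hypothetical and only builds the documents; it does not write them.

```python
from collections import defaultdict
from typing import Any, Dict, List


def build_result_documents(results: List[Dict[str, Any]], attribute_names: List[str],
                           vertex_collections: Dict[str, str]) -> Dict[str, List[Dict[str, Any]]]:
    """Return target collection -> documents, one document per vertex."""
    if len(results) != len(attribute_names):
        raise ValueError("give one attribute name per job")
    if not results:
        return {}
    if len({len(r) for r in results}) > 1:
        raise ValueError("all jobs must produce the same number of results")
    docs: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for vertex_id in results[0]:
        collection, key = vertex_id.split("/", 1)
        doc: Dict[str, Any] = {"_key": key}
        for name, result in zip(attribute_names, results):
            doc[name] = result[vertex_id]
        docs[vertex_collections[collection]].append(doc)
    return dict(docs)


wcc = {"persons/a": 0, "persons/b": 0}
pr = {"persons/a": 0.4, "persons/b": 0.6}
out = build_result_documents([wcc, pr], ["component", "rank"], {"persons": "persons"})
```

Writing both results in one pass yields one update per vertex instead of one per job and vertex.
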
GraphAnalyticsEngineListGraphs | Empty | GraphAnalyticsEngineListGraphsResponse |
List the graphs in the engine. |
||
GraphAnalyticsEngineGetGraph | GraphAnalyticsEngineGraphId | GraphAnalyticsEngineGetGraphResponse |
Get information about a specific graph. |
||
GraphAnalyticsEngineDeleteGraph | GraphAnalyticsEngineGraphId | GraphAnalyticsEngineDeleteGraphResponse |
Delete a specific graph from memory. |
||
GraphAnalyticsEngineListJobs | Empty | GraphAnalyticsEngineListJobsResponse |
List the jobs in the engine (loading, computing or storing). |
||
GraphAnalyticsEngineGetJob | GraphAnalyticsEngineJobId | GraphAnalyticsEngineJob |
Get information about a specific job (in particular progress and result when done). |
||
GraphAnalyticsEngineDeleteJob | GraphAnalyticsEngineJobId | GraphAnalyticsEngineDeleteJobResponse |
Delete a specific job. |
||
GraphAnalticsEngineShutdown | Empty | GraphAnalyticsEngineShutdownResponse |
Shut down the service. |
||
Empty input:
Request arguments for GraphAnalyticsEngineRunCompAggregation:

| Field | Type | Label | Description |
|---|---|---|---|
| graph_id | uint64 | | Graph ID |
| job_id | uint64 | | ID of the job that computed the connected components |
| aggregation_attribute | string | | Aggregation attribute |

Request arguments for GraphAnalyticsEngineRunAttributePropagation.

| Field | Type | Label | Description |
|---|---|---|---|
| graph_id | uint64 | | Graph ID. This attribute must be given. |
| start_label_attribute | string | | Start label attribute, which must be stored in one column of the column store of the graph. If set to "@id", the ID of the vertex is used. Values can be empty, a string, or a list of strings; all other values are transformed into a string. This attribute must be given. |
| synchronous | bool | optional | Flag to indicate whether synchronous (true) or asynchronous label propagation is used. The default is asynchronous, i.e. `false`. |
| backwards | bool | optional | Flag to indicate whether the propagation happens forwards (along the directed edges) or backwards (in the opposite direction). The default is forwards, i.e. `false`. |
| maximum_supersteps | uint32 | optional | Maximum number of supersteps. The default is 64. |

Request arguments for GraphAnalyticsEngineRunBetweennessCentrality:

| Field | Type | Label | Description |
|---|---|---|---|
| graph_id | uint64 | | Graph ID |
| k | uint64 | optional | Number of start vertices. Use 0 (the default) to start from every single vertex in the graph for a complete result. |
| undirected | bool | optional | Whether edges are followed in both directions. The default is false. |
| normalized | bool | optional | Whether a normalization with 1/((N-1)*(N-2)) is applied, where N is the size of the largest orbit found. The default is false. |
| parallelism | uint32 | optional | Number of threads to use |

Response for a delete graph request.

| Field | Type | Label | Description |
|---|---|---|---|
| graph_id | uint64 | | ID of the graph |
| error_code | int32 | | Error code, 0 if no error |
| error_message | string | | Error message, empty if no error |

Response for a delete job request.

| Field | Type | Label | Description |
|---|---|---|---|
| job_id | uint64 | | ID of the job |
| error | bool | | Error flag |
| error_code | int32 | | Error code, 0 if no error |
| error_message | string | | Error message, empty if no error |

Generic error

| Field | Type | Label | Description |
|---|---|---|---|
| error_code | int32 | | Error code, 0 if no error |
| error_message | string | | Error message, empty if no error |

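
Every response carries `error_code` and `error_message`. A small check such as the one below turns a non-zero code (or a set `error` flag, where the message has one) into an exception. It assumes the response has already been decoded from JSON into a dict; treating a missing `error_code` as 0 is an assumption based on proto3's JSON mapping, which omits fields holding default values unless configured otherwise. `GaeError` and `check_response` are hypothetical names.

```python
from typing import Any, Dict


class GaeError(Exception):
    """Raised when an engine response reports an error."""

    def __init__(self, code: int, message: str):
        super().__init__(f"error {code}: {message}")
        self.code = code
        self.message = message


def check_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the response unchanged, or raise GaeError if it reports an error."""
    code = int(response.get("error_code", 0))
    if code != 0 or response.get("error"):
        raise GaeError(code, response.get("error_message", ""))
    return response
```
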
Response arguments from GraphAnalyticsEngineGetGraph.

| Field | Type | Label | Description |
|---|---|---|---|
| error_code | int32 | | Error code, 0 if no error |
| error_message | string | | Error message, empty if no error |
| graph | GraphAnalyticsEngineGraph | | The graph |

Description of a graph.

| Field | Type | Label | Description |
|---|---|---|---|
| graph_id | uint64 | | ID of the graph |
| number_of_vertices | uint64 | | Number of vertices |
| number_of_edges | uint64 | | Number of edges |
| memory_usage | uint64 | | Memory usage |
| memory_per_vertex | uint64 | | Memory usage per vertex |
| memory_per_edge | uint64 | | Memory usage per edge |

ID of an engine and ID of a graph

| Field | Type | Label | Description |
|---|---|---|---|
| graph_id | string | | Graph ID (for path) |

Description of a job.

| Field | Type | Label | Description |
|---|---|---|---|
| job_id | uint64 | | ID of the current job |
| graph_id | uint64 | | Graph of the current job |
| total | uint32 | | Total progress. Guaranteed to be positive, but can be 1 |
| progress | uint32 | | Progress (0: no progress; equal to `total`: ready) |
| error | bool | | Error flag |
| error_code | int32 | | Error code |
| error_message | string | | Error message |
| source_job | string | | Optional source job |
| comp_type | string | | Computation type |
| memory_usage | uint64 | | Memory usage |
| runtime_in_microseconds | uint64 | | Runtime of the job in microseconds |

ID of an engine and ID of a job

| Field | Type | Label | Description |
|---|---|---|---|
| job_id | string | | Job ID (for path) |

Request arguments for GraphAnalyticsEngineRunLabelPropagation.

| Field | Type | Label | Description |
|---|---|---|---|
| graph_id | uint64 | | Graph ID |
| start_label_attribute | string | | Start label attribute, which must be stored in one column of the column store of the graph. If set to "@id", the ID of the vertex is used. |
| synchronous | bool | optional | Flag to indicate whether synchronous (true) or asynchronous label propagation is used. The default is false. |
| random_tiebreak | bool | optional | Flag indicating whether ties in the label choice are broken randomly (uniform distribution) or deterministically (smallest label among the most frequent ones). The default is false. |
| maximum_supersteps | uint32 | optional | Maximum number of supersteps. The default is 64. |

Request arguments for GraphAnalyticsEngineRunLineRank:

| Field | Type | Label | Description |
|---|---|---|---|
| graph_id | uint64 | | Graph ID |
| damping_factor | double | optional | Damping factor. The default is 0.85. |
| maximum_supersteps | uint32 | optional | Maximal number of supersteps. The default is 64. |

Response arguments from GraphAnalyticsEngineListGraphs.

| Field | Type | Label | Description |
|---|---|---|---|
| error_code | int32 | | Error code, 0 if no error |
| error_message | string | | Error message, empty if no error |
| graphs | GraphAnalyticsEngineGraph | repeated | The graphs |

Response arguments from GraphAnalyticsEngineListJobs.

| Field | Type | Label | Description |
|---|---|---|---|
| error_code | int32 | | Error code, 0 if no error |
| error_message | string | | Error message, empty if no error |
| jobs | GraphAnalyticsEngineJob | repeated | The jobs |

Request arguments for GraphAnalyticsEngineLoadDataAql.

| Field | Type | Label | Description |
|---|---|---|---|
| job_id | uint64 | | Job ID for results |
| database | string | | Database to get the graph from |
| vertex_query | string | | Vertex query |
| edge_query | string | | Edge query |
| batch_size | uint64 | optional | Optional batch size |
| custom_fields | GraphAnalyticsEngineLoadDataAqlRequest.CustomFieldsEntry | repeated | Map of engine-type-specific custom fields (dynamic for this data-load operation) |

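GraphAnalyticsEngineLoadDataAqlRequest.CustomFieldsEntry (one key/value pair of the `custom_fields` map):

| Field | Type | Label | Description |
|---|---|---|---|
| key | string | | |
| value | string | | |
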
Request arguments for GraphAnalyticsEngineLoadData.

| Field | Type | Label | Description |
|---|---|---|---|
| database | string | | Database to retrieve the graph from |
| graph_name | string | optional | Graph name. This is optional, because a list of vertex and edge collections can be given instead. |
| vertex_collections | string | repeated | Optional list of vertex collections. Must be set if `graph_name` is not given, or if data other than the graph topology is to be loaded. |
| vertex_attributes | string | repeated | List of attributes to load into the column store for vertices. The column store of the graph contains one column for each attribute listed here. |
| vertex_attribute_types | string | repeated | Types for the vertex attributes. Allowed values: "string", "float", "integer", "unsigned". |
| edge_collections | string | repeated | List of edge collections. Must be set if `graph_name` is not given. |
| parallelism | uint32 | optional | Optional numeric value for thread parallelism. It is currently used in four places: the number of async jobs launched to get data, the number of threads launched to synchronously work on incoming data, the number of threads used on each DBServer to produce data, and the length of the prefetch queue on DBServers. More arguments may be added in the future to allow finer tuning. |
| batch_size | uint64 | optional | Optional batch size |
| custom_fields | GraphAnalyticsEngineLoadDataRequest.CustomFieldsEntry | repeated | Map of engine-type-specific custom fields (dynamic for this data-load operation) |

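GraphAnalyticsEngineLoadDataRequest.CustomFieldsEntry (one key/value pair of the `custom_fields` map):

| Field | Type | Label | Description |
|---|---|---|---|
| key | string | | |
| value | string | | |
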
Response arguments from GraphAnalyticsEngineLoadData.

| Field | Type | Label | Description |
|---|---|---|---|
| job_id | uint64 | | ID of the load data operation |
| graph_id | uint64 | | Graph ID |
| error_code | int32 | | Error code, 0 if no error |
| error_message | string | | Error message, empty if no error |

Request arguments for GraphAnalyticsEngineRunPageRank (also used by GraphAnalyticsEngineRunIRank):

| Field | Type | Label | Description |
|---|---|---|---|
| graph_id | uint64 | | Graph ID |
| damping_factor | double | optional | Damping factor. The default is 0.85. |
| maximum_supersteps | uint32 | optional | Maximal number of supersteps. The default is 64. |
| seeding_attribute | string | optional | Optional seeding attribute for a seeded PageRank. The default is empty (no seeding). |

Response arguments from the GraphAnalyticsEngineRun* methods (GraphAnalyticsEngineProcessResponse).

| Field | Type | Label | Description |
|---|---|---|---|
| job_id | uint64 | | ID of the job |
| error_code | int32 | | Error code, 0 if no error |
| error_message | string | | Error message, empty if no error |

Request arguments for GraphAnalyticsEngineRunPythonFunction:

| Field | Type | Label | Description |
|---|---|---|---|
| graph_id | uint64 | | Graph ID |
| function | string | | The Python code to be executed. It must define a function `def worker(graph)`, which must return a DataFrame or dictionary with the results. The keys of the dictionary must be vertex IDs; the values (the actual computation results) can be of any type. |
| use_cugraph | bool | optional | Whether to use cuGraph (true) or regular pandas/pyarrow (false). The default is false. |

Response for a shutdown request.

| Field | Type | Label | Description |
|---|---|---|---|
| error | bool | | Error flag |
| error_code | int32 | | Error code, 0 if no error |
| error_message | string | | Error message, empty if no error |

Request arguments for GraphAnalyticsEngineStoreResults.

| Field | Type | Label | Description |
|---|---|---|---|
| job_ids | uint64 | repeated | IDs of the jobs whose results are written |
| attribute_names | string | repeated | Attribute names to write the results to |
| database | string | | ArangoDB database to use |
| vertex_collections | GraphAnalyticsEngineStoreResultsRequest.VertexCollectionsEntry | repeated | Maps collection names, as found in the `_id` attributes of vertices, to the collections into which the result data is written. The results are written into the attributes given in `attribute_names`. An insert operation with `overwriteMode` "update" is used. |
| parallelism | uint32 | optional | Optional numeric value for thread parallelism |
| batch_size | uint64 | optional | Optional batch size |
| target_collection | string | | Target collection for non-graph results |
| custom_fields | GraphAnalyticsEngineStoreResultsRequest.CustomFieldsEntry | repeated | Map of engine-type-specific custom fields (dynamic for this store-results operation) |

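GraphAnalyticsEngineStoreResultsRequest.CustomFieldsEntry (one key/value pair of the `custom_fields` map):

| Field | Type | Label | Description |
|---|---|---|---|
| key | string | | |
| value | string | | |
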
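GraphAnalyticsEngineStoreResultsRequest.VertexCollectionsEntry (one key/value pair of the `vertex_collections` map):

| Field | Type | Label | Description |
|---|---|---|---|
| key | string | | |
| value | string | | |
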
Response arguments from GraphAnalyticsEngineStoreResults.

| Field | Type | Label | Description |
|---|---|---|---|
| job_id | uint64 | | ID of the store results operation |
| error_code | int32 | | Error code, 0 if no error |
| error_message | string | | Error message, empty if no error |

Request arguments for WCC or SCC:

| Field | Type | Label | Description |
|---|---|---|---|
| graph_id | uint64 | | Graph ID |
| custom_fields | GraphAnalyticsEngineWccSccRequest.CustomFieldsEntry | repeated | Map of engine-type-specific and algorithm-type-specific custom fields (dynamic for this process operation) |

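GraphAnalyticsEngineWccSccRequest.CustomFieldsEntry (one key/value pair of the `custom_fields` map):

| Field | Type | Label | Description |
|---|---|---|---|
| key | string | | |
| value | string | | |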