Graph Analytics Engine API Documentation

graphanalyticsengine.proto


Basic Usage

GraphAnalyticsEngineService is an API for interacting with graph analytics engines (GAEs). Each engine corresponds to a deployment on ArangoGraph (AG) and has direct database access for loading graphs and storing results. A single database deployment can accommodate multiple GAEs.

Every call that can take a long time to complete is asynchronous: it returns a job id immediately, and the result must be retrieved separately. Note that all results are held in RAM, so they must be deleted explicitly to free the memory they use.

The following calls trigger asynchronous operations, which may take a long time to complete:

Load a graph from the deployment via two AQL queries (one for vertices, one for edges) [POST]

Load a graph via the arangodump protocol [POST]

Various calls to start computation jobs [POST]

Write back result of computation job to deployment [POST]
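
The asynchronous pattern described above can be sketched as a small client. This is a minimal sketch and not an official client: the base URL is a placeholder, authentication is omitted, and it assumes that JSON bodies use the field names exactly as they appear in the message tables (job_id, progress, total, error). The helper names and the send parameter are hypothetical; send only exists so that the transport can be swapped out.

```python
import json
import time
import urllib.request


def http_json(method, url, body=None):
    """Send one JSON request and decode the JSON reply (no authentication)."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_job(base_url, path, body, send=http_json, poll_seconds=1.0):
    """POST a job, poll GET /v1/jobs/{job_id} until it is ready, return the job."""
    started = send("POST", base_url + path, body)
    if started.get("error_code", 0) != 0:
        raise RuntimeError(started.get("error_message", "job was not started"))
    job_url = f"{base_url}/v1/jobs/{started['job_id']}"
    while True:
        job = send("GET", job_url)
        if job.get("error"):
            raise RuntimeError(job.get("error_message", "job failed"))
        # A job is ready once progress has reached total.
        if job.get("progress", 0) >= job.get("total", 1):
            return job
        time.sleep(poll_seconds)


def delete_job(base_url, job_id, send=http_json):
    """Delete a finished job so that the engine can free its results."""
    return send("DELETE", f"{base_url}/v1/jobs/{job_id}")
```

A typical sequence would be run_job(base, "/v1/wcc", {"graph_id": 1}), followed by storing or reading the result and finally delete_job to release the memory.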

GraphAnalyticsEngineService

Methods with HTTP bindings

Method Name	Method	Pattern	Body
GraphAnalyticsEngineLoadData	POST	/v1/loaddata	*
GraphAnalyticsEngineLoadDataAql	POST	/v1/loaddataaql	*
GraphAnalyticsEngineRunWcc	POST	/v1/wcc	*
GraphAnalyticsEngineRunScc	POST	/v1/scc	*
GraphAnalyticsEngineRunCompAggregation	POST	/v1/aggregatecomponents	*
GraphAnalyticsEngineRunPageRank	POST	/v1/pagerank	*
GraphAnalyticsEngineRunPythonFunction	POST	/v1/python	*
GraphAnalyticsEngineRunIRank	POST	/v1/irank	*
GraphAnalyticsEngineRunLabelPropagation	POST	/v1/labelpropagation	*
GraphAnalyticsEngineRunAttributePropagation	POST	/v1/attributepropagation	*
GraphAnalyticsEngineRunBetweennessCentrality	POST	/v1/betweennesscentrality	*
GraphAnalyticsEngineRunLineRank	POST	/v1/linerank	*
GraphAnalyticsEngineStoreResults	POST	/v1/storeresults	*
GraphAnalyticsEngineListGraphs	GET	/v1/graphs
GraphAnalyticsEngineGetGraph	GET	/v1/graphs/{graph_id}
GraphAnalyticsEngineDeleteGraph	DELETE	/v1/graphs/{graph_id}
GraphAnalyticsEngineListJobs	GET	/v1/jobs
GraphAnalyticsEngineGetJob	GET	/v1/jobs/{job_id}
GraphAnalyticsEngineDeleteJob	DELETE	/v1/jobs/{job_id}
GraphAnalticsEngineShutdown	DELETE	/v1/shutdown
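
The bindings in this table can be turned into concrete requests by filling in the path parameters. The following is a small sketch that covers only a subset of the routes; the ROUTES dictionary and the endpoint helper are illustrative names, and the base URL is a placeholder.

```python
# A subset of the HTTP bindings listed in the table: name -> (verb, pattern).
ROUTES = {
    "GraphAnalyticsEngineLoadData": ("POST", "/v1/loaddata"),
    "GraphAnalyticsEngineRunWcc": ("POST", "/v1/wcc"),
    "GraphAnalyticsEngineStoreResults": ("POST", "/v1/storeresults"),
    "GraphAnalyticsEngineListGraphs": ("GET", "/v1/graphs"),
    "GraphAnalyticsEngineGetGraph": ("GET", "/v1/graphs/{graph_id}"),
    "GraphAnalyticsEngineDeleteGraph": ("DELETE", "/v1/graphs/{graph_id}"),
    "GraphAnalyticsEngineGetJob": ("GET", "/v1/jobs/{job_id}"),
    "GraphAnalyticsEngineDeleteJob": ("DELETE", "/v1/jobs/{job_id}"),
}


def endpoint(base_url, method_name, **path_params):
    """Return (HTTP verb, full URL) for a method, filling in path parameters."""
    verb, pattern = ROUTES[method_name]
    return verb, base_url.rstrip("/") + pattern.format(**path_params)
```

For example, endpoint("http://engine/", "GraphAnalyticsEngineGetJob", job_id=3) yields the verb GET and a URL ending in /v1/jobs/3.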

Methods and Argument types

Method Name	Request Type	Response Type
GraphAnalyticsEngineLoadData	GraphAnalyticsEngineLoadDataRequest	GraphAnalyticsEngineLoadDataResponse
This API call fetches data from the deployment and loads it into the memory of the engine for later processing. One can either use a named graph or a list of vertex collections and a list of edge collections. Currently, the API call always loads all vertices and edges from these collections. However, it is possible to select which attribute data is loaded alongside the vertices and the edge topology. These attribute values are stored in a column store, in which each column corresponds to an attribute and has as many rows as there are vertices in the graph. Each loaded graph gets a numerical ID, with which it can be used in computations. This is an asynchronous job which returns the job id immediately. Use the GET graph API with the returned graph ID to get information on errors and the outcome of the loading.

GraphAnalyticsEngineLoadDataAql	GraphAnalyticsEngineLoadDataAqlRequest	GraphAnalyticsEngineLoadDataResponse
This API call fetches data from the ArangoGraph deployment via AQL and loads it into the memory of the engine for later processing. (NOT IMPLEMENTED YET)

GraphAnalyticsEngineRunWcc	GraphAnalyticsEngineWccSccRequest	GraphAnalyticsEngineProcessResponse
Process a previously loaded graph with the weakly connected components algorithm (WCC) and store the results in-memory. This essentially means that the direction of edges is ignored and then the connected components of the undirected graph are computed. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.

GraphAnalyticsEngineRunScc	GraphAnalyticsEngineWccSccRequest	GraphAnalyticsEngineProcessResponse
Process a previously loaded graph with the strongly connected components algorithm (SCC) and store the results in-memory. This means that the direction of the edges is taken into account and two vertices A and B will be in the same strongly connected component if and only if there is a directed path from A to B and from B to A. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.

GraphAnalyticsEngineRunCompAggregation	GraphAnalyticsEngineAggregateComponentsRequest	GraphAnalyticsEngineProcessResponse
Process a previously loaded graph and a computation which has computed connected components (weakly or strongly) by aggregating some vertex data over each component found. The result will be one distribution map for each connected component. It is stored in memory. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.

GraphAnalyticsEngineRunPageRank	GraphAnalyticsEnginePageRankRequest	GraphAnalyticsEngineProcessResponse
Process a previously loaded graph with the pagerank algorithm and store the results in-memory. There are some parameters controlling the computation like the damping factor and the maximal number of supersteps. See the input message documentation for details. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.

GraphAnalyticsEngineRunPythonFunction	GraphAnalyticsEnginePythonFunctionRequest	GraphAnalyticsEngineProcessResponse
Process a previously loaded graph with a custom Python-based algorithm and store the results in-memory. See the input message documentation for details. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.

GraphAnalyticsEngineRunIRank	GraphAnalyticsEnginePageRankRequest	GraphAnalyticsEngineProcessResponse
Process a previously loaded graph with the irank algorithm and store the results in-memory. The "irank" algorithm is a variant of pagerank which changes the initial weight of each vertex. Rather than being 1/N, where N is the total number of vertices, the initial weight depends on the vertex collection the vertex comes from: if V is from vertex collection C and N is the number of vertices in C, then the initial weight of V is 1/N. As with pagerank, the total sum of ranks stays the same as an invariant of the algorithm. There are some parameters controlling the computation like the damping factor and the maximal number of supersteps. See the input message documentation for details. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.

GraphAnalyticsEngineRunLabelPropagation	GraphAnalyticsEngineLabelPropagationRequest	GraphAnalyticsEngineProcessResponse
Process a previously loaded graph with the label propagation algorithm and store the results in-memory. There are some parameters controlling the computation like the name of the attribute to choose the start label from, a flag to indicate if the synchronous or the asynchronous variant is used and the maximal number of supersteps. See the input message documentation for details. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.

GraphAnalyticsEngineRunAttributePropagation	GraphAnalyticsEngineAttributePropagationRequest	GraphAnalyticsEngineProcessResponse
Process a previously loaded graph with the attribute propagation algorithm and store the results in-memory. The algorithm reads a list of labels from a column for each vertex (see the loaddata operation, for which one can configure which attributes are loaded into the column store). The value can be empty, a string, or a list of strings, and the set of labels for each vertex is initialized accordingly. The algorithm then propagates each label in each label set along the edges to all reachable vertices and thus computes a new set of labels. The algorithm stops after a specified maximal number of steps or when no label set changes any more. BEWARE: If there are many labels in the system and the graph is well-connected, the result can be huge! There are some parameters controlling the computation, like the name of the attribute to choose the start label from, whether the synchronous or the asynchronous variant is used, whether labels are propagated along the edges forwards or backwards, and the maximal number of supersteps. See the input message documentation for details. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.

GraphAnalyticsEngineRunBetweennessCentrality	GraphAnalyticsEngineBetweennessCentralityRequest	GraphAnalyticsEngineProcessResponse
Process a previously loaded graph with the betweenness-centrality algorithm and store the results in-memory. See https://snap.stanford.edu/class/cs224w-readings/brandes01centrality.pdf for details. There are some parameters controlling the computation, like the number of start vertices, whether edges are followed in both directions, and whether a normalization is applied. See the input message documentation for details. The computation will return a numerical job id, with which the results can later be queried or written back to the database. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the computation.

GraphAnalyticsEngineRunLineRank	GraphAnalyticsEngineLineRankRequest	GraphAnalyticsEngineProcessResponse
Process a previously loaded graph with the linerank algorithm and store the results in-memory. The algorithm measures the importance of a vertex by aggregating the importance of its incident edges. This represents the amount of information that flows through the vertex, therefore the result of this algorithm can be taken as an approximation for betweenness centrality, which is much more computation-intensive. The edge importance is computed by the probability that a random walker, visiting edges via vertices with random restarts, will stay at the edge.

GraphAnalyticsEngineStoreResults	GraphAnalyticsEngineStoreResultsRequest	GraphAnalyticsEngineStoreResultsResponse
Stores the results from previous jobs into the deployment. One can specify a number of job ids but the requirement is that they produce the same number of results. For example, results from different algorithms which produce one result per vertex can be written to the database together. The target collection must already exist and must be writable. The job produces one document per result and one can prescribe which attribute names should be used for which result. There are some parameters controlling the computation. See the input message description for details. The computation will return a numerical job id, with which the progress can be monitored. This is an asynchronous job which returns the job id immediately. Use the GET job API with the job id to get information on progress, errors and the outcome of the job.

GraphAnalyticsEngineListGraphs	Empty	GraphAnalyticsEngineListGraphsResponse
List the graphs in the engine.

GraphAnalyticsEngineGetGraph	GraphAnalyticsEngineGraphId	GraphAnalyticsEngineGetGraphResponse
Get information about a specific graph.

GraphAnalyticsEngineDeleteGraph	GraphAnalyticsEngineGraphId	GraphAnalyticsEngineDeleteGraphResponse
Delete a specific graph from memory.

GraphAnalyticsEngineListJobs	Empty	GraphAnalyticsEngineListJobsResponse
List the jobs in the engine (loading, computing or storing).

GraphAnalyticsEngineGetJob	GraphAnalyticsEngineJobId	GraphAnalyticsEngineJob
Get information about a specific job (in particular progress and result when done).

GraphAnalyticsEngineDeleteJob	GraphAnalyticsEngineJobId	GraphAnalyticsEngineDeleteJobResponse
Delete a specific job.

GraphAnalticsEngineShutdown	Empty	GraphAnalyticsEngineShutdownResponse
Shutdown service.

Empty

Empty input.

GraphAnalyticsEngineAggregateComponentsRequest

Request arguments for GraphAnalyticsEngineRunCompAggregation.

Field	Type	Label	Description
graph_id	uint64		Graph ID
job_id	uint64		Job ID
aggregation_attribute	string		Aggregation attribute
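
To illustrate what aggregating vertex data over components can look like, here is a minimal local sketch. It assumes that a "distribution map" maps each attribute value to the number of vertices in the component that carry it; that reading, the function name, and the input dictionaries are assumptions made for illustration, not the engine's implementation.

```python
from collections import Counter, defaultdict


def aggregate_components(component_of, attribute_of):
    """Return {component id: {attribute value: number of vertices}}.

    component_of maps each vertex to its component id (for instance the
    outcome of a WCC or SCC job); attribute_of maps each vertex to the
    value of the aggregation attribute. Vertices without a value are
    counted under None.
    """
    distributions = defaultdict(Counter)
    for vertex, component in component_of.items():
        distributions[component][attribute_of.get(vertex)] += 1
    return {component: dict(counts) for component, counts in distributions.items()}
```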

GraphAnalyticsEngineAttributePropagationRequest

Request arguments for GraphAnalyticsEngineRunAttributePropagation.

Field	Type	Label	Description
graph_id	uint64		Graph ID. This attribute must be given.
start_label_attribute	string		Start label attribute, must be stored in one column of the column store of the graph. Use id of vertex if set to "@id". Values can be empty or a string or a list of strings. All other values are transformed into a string. This attribute must be given.
synchronous	bool	optional	Flag to indicate whether synchronous (true) or asynchronous label propagation is used. The default is asynchronous, i.e. `false`.
backwards	bool	optional	Flag to indicate whether the propagation happens forwards (along the directed edges) or backwards (in the opposite direction). The default is forwards, i.e. `false`.
maximum_supersteps	uint32	optional	Maximum number of steps to do. The default is 64.
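
The following sketch illustrates the propagation of label sets using the parameters from this table. It is a simplified, synchronous variant written for illustration only: the function names are hypothetical, None is assumed to stand for an empty start value, and the engine's actual implementation may differ.

```python
def initial_label_set(value):
    """Turn a start value into a label set: empty, one string, or a list."""
    if value is None:
        return set()
    if isinstance(value, (list, tuple)):
        return {str(item) for item in value}
    return {str(value)}  # any other value is transformed into a string


def propagate_attributes(edges, start_values, backwards=False, maximum_supersteps=64):
    """Propagate label sets along directed (source, target) edges until stable."""
    if backwards:
        edges = [(target, source) for source, target in edges]
    labels = {vertex: initial_label_set(value) for vertex, value in start_values.items()}
    for source, target in edges:
        labels.setdefault(source, set())
        labels.setdefault(target, set())
    for _ in range(maximum_supersteps):
        updated = {vertex: set(current) for vertex, current in labels.items()}
        for source, target in edges:
            updated[target] |= labels[source]
        if updated == labels:  # no label set changed: stop early
            break
        labels = updated
    return labels
```

On a chain a -> b -> c, a label that starts on a reaches c after two steps, which is why well-connected graphs with many labels can produce very large results.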

GraphAnalyticsEngineBetweennessCentralityRequest

Request arguments for GraphAnalyticsEngineRunBetweennessCentrality.

Field	Type	Label	Description
graph_id	uint64		Graph ID
k	uint64	optional	Number of start vertices, use 0 to start from every single vertex in the graph for a complete result. 0 is the default.
undirected	bool	optional	Flag indicating whether edges are used in both directions. The default is false.
normalized	bool	optional	Flag indicating whether a normalization with 1/((N-1)*(N-2)) is applied, where N is the size of the largest orbit found. The default is false.
parallelism	uint32	optional	Number of threads to use.
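
For orientation, this is a compact sketch of Brandes' algorithm for unweighted graphs, the method described in the paper referenced for this computation. It corresponds to the case k = 0 (every vertex is a start vertex), applies no normalization, and counts every ordered pair of endpoints, so with undirected=True each path contributes once per direction. The function name and input format are illustrative, and the engine's own implementation may differ in such conventions.

```python
from collections import deque


def betweenness_centrality(vertices, edges, undirected=False):
    """Brandes' algorithm on an unweighted graph given as (source, target) pairs."""
    adjacency = {v: [] for v in vertices}
    for source, target in edges:
        adjacency[source].append(target)
        if undirected:
            adjacency[target].append(source)
    score = {v: 0.0 for v in vertices}
    for start in vertices:
        # Breadth-first search that counts shortest paths from start.
        distance = {v: -1 for v in vertices}
        paths = {v: 0 for v in vertices}
        predecessors = {v: [] for v in vertices}
        distance[start], paths[start] = 0, 1
        order, queue = [], deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies in order of decreasing distance from start.
        dependency = {v: 0.0 for v in vertices}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1.0 + dependency[w])
            if w != start:
                score[w] += dependency[w]
    return score
```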

GraphAnalyticsEngineDeleteGraphResponse

Response for a delete graph request.

Field	Type	Label	Description
graph_id	uint64		ID of graph
error_code	int32		Error code, 0 if no error
error_message	string		Error message, empty if no error

GraphAnalyticsEngineDeleteJobResponse

Response for a delete job request.

Field	Type	Label	Description
job_id	uint64		ID of job
error	bool		Error?
error_code	int32		Error code, 0 if no error
error_message	string		Error message, empty if no error

GraphAnalyticsEngineErrorResponse

Generic error

Field	Type	Label	Description
error_code	int32		Error code, 0 if no error
error_message	string		Error message, empty if no error

GraphAnalyticsEngineGetGraphResponse

Field	Type	Label	Description
error_code	int32		Error code, 0 if no error
error_message	string		Error message, empty if no error
graph	GraphAnalyticsEngineGraph		The graph

GraphAnalyticsEngineGraph

Description of a graph.

Field	Type	Label	Description
graph_id	uint64		ID of graph
number_of_vertices	uint64		Number of vertices
number_of_edges	uint64		Number of edges
memory_usage	uint64		Memory usage
memory_per_vertex	uint64		Memory usage per vertex
memory_per_edge	uint64		Memory usage per edge

GraphAnalyticsEngineGraphId

ID of an engine and id of a graph

Field	Type	Label	Description
graph_id	string		Graph ID (for path)

GraphAnalyticsEngineJob

Description of a job.

Field	Type	Label	Description
job_id	uint64		ID of the current job
graph_id	uint64		Graph of the current job
total	uint32		Total progress. Guaranteed to be positive, but could be 1
progress	uint32		Progress (0: no progress, equal to total: ready)
error	bool		Error flag
error_code	int32		Error code
error_message	string		Error message
source_job	string		Optional source job
comp_type	string		Computation type
memory_usage	uint64		Memory usage
runtime_in_microseconds	uint64		Runtime of job in microseconds
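
The progress fields of a job description can be interpreted as in the following sketch. It assumes the job has already been decoded into a dictionary keyed by the field names above; summarize_job is a hypothetical helper, and the values are passed through int() because some JSON encodings transmit 64-bit integers as strings.

```python
def summarize_job(job):
    """Derive a state, a completion percentage and the runtime in seconds."""
    total = max(int(job.get("total", 1)), 1)  # total is documented as positive
    progress = min(int(job.get("progress", 0)), total)
    if job.get("error"):
        state = "failed"
    elif progress == total:  # progress equal to total means the job is ready
        state = "ready"
    else:
        state = "running"
    return {
        "state": state,
        "percent": 100.0 * progress / total,
        "runtime_seconds": int(job.get("runtime_in_microseconds", 0)) / 1_000_000,
    }
```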

GraphAnalyticsEngineJobId

ID of an engine and id of a job

Field	Type	Label	Description
job_id	string		Job ID (for path)

GraphAnalyticsEngineLabelPropagationRequest

Request arguments for GraphAnalyticsEngineRunLabelPropagation.

Field	Type	Label	Description
graph_id	uint64		Graph ID
start_label_attribute	string		Start label attribute, must be stored in one column of the column store of the graph. Use id of vertex if set to "@id".
synchronous	bool	optional	Flag to indicate whether synchronous (true) or asynchronous label propagation is used. The default is asynchronous, i.e. `false`.
random_tiebreak	bool	optional	Flag indicating whether ties in the label choice are broken randomly (uniform distribution) or deterministically (smallest label amongst the most frequent ones). The default is deterministic, i.e. `false`.
maximum_supersteps	uint32	optional	Maximum number of steps to do. The default is 64.
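
The sketch below illustrates synchronous label propagation with the deterministic tie-break described in this table (the smallest label amongst the most frequent ones). It assumes that a vertex adopts the most frequent label among the sources of its incoming edges and keeps its label if it has none; that neighbourhood choice and the function name are assumptions for illustration, not a statement about the engine's implementation.

```python
from collections import Counter


def label_propagation(edges, start_labels, maximum_supersteps=64):
    """Synchronous label propagation with a deterministic tie-break.

    start_labels must contain every vertex that appears in edges.
    """
    incoming = {vertex: [] for vertex in start_labels}
    for source, target in edges:
        incoming[target].append(source)
    labels = dict(start_labels)
    for _ in range(maximum_supersteps):
        updated = {}
        for vertex, sources in incoming.items():
            if not sources:
                updated[vertex] = labels[vertex]  # nothing to adopt
                continue
            counts = Counter(labels[source] for source in sources)
            highest = max(counts.values())
            # Tie-break: the smallest label amongst the most frequent ones.
            updated[vertex] = min(l for l, c in counts.items() if c == highest)
        if updated == labels:
            break
        labels = updated
    return labels
```

Because all vertices are updated from the labels of the previous step, the synchronous variant can oscillate on some graphs; the limit on supersteps bounds the run in that case.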

GraphAnalyticsEngineLineRankRequest

Request arguments for GraphAnalyticsEngineRunLineRank.

Field	Type	Label	Description
graph_id	uint64		Graph ID
damping_factor	double	optional	Damping factor. The default is 0.85.
maximum_supersteps	uint32	optional	Maximal number of supersteps. The default is 64.

GraphAnalyticsEngineListGraphsResponse

Response arguments from GraphAnalyticsEngineListGraphs.

Field	Type	Label	Description
error_code	int32		Error code, 0 if no error
error_message	string		Error message, empty if no error
graphs	GraphAnalyticsEngineGraph	repeated	The graphs

GraphAnalyticsEngineListJobsResponse

Response arguments from GraphAnalyticsEngineListJobs.

Field	Type	Label	Description
error_code	int32		Error code, 0 if no error
error_message	string		Error message, empty if no error
jobs	GraphAnalyticsEngineJob	repeated	The jobs

GraphAnalyticsEngineLoadDataAqlRequest

Request arguments for GraphAnalyticsEngineLoadDataAql.

Field	Type	Label	Description
job_id	uint64		Job ID for results
database	string		Database to get graph from
vertex_query	string		Vertex query
edge_query	string		Edge query
batch_size	uint64	optional	Optional batch size
custom_fields	GraphAnalyticsEngineLoadDataAqlRequest.CustomFieldsEntry	repeated	Map of engine-type specific custom fields (dynamic for this data-load operation)

GraphAnalyticsEngineLoadDataAqlRequest.CustomFieldsEntry

Field	Type	Label	Description
key	string
value	string

GraphAnalyticsEngineLoadDataRequest

Request arguments for GraphAnalyticsEngineLoadData.

Field	Type	Label	Description
database	string		Retrieve graph from the specified database
graph_name	string	optional	Graph name. This is optional, because one can also use a list of vertex and edge collections.
vertex_collections	string	repeated	Optional list of vertex collections. Must be set, if the `graph_name` is not given, or if data other than the graph topology is to be loaded.
vertex_attributes	string	repeated	List of attributes to load into the column store for vertices. The column store of the graph will contain one column for each attribute listed here.
vertex_attribute_types	string	repeated	Types for the vertex attributes. The allowed values are "string", "float", "integer" and "unsigned".
edge_collections	string	repeated	List of edge collections. Must be set, if `graph_name` is not given.
parallelism	uint32	optional	Optional numeric value for thread parallelism. This is currently used in four places. One is the number of async jobs launched to get data, another is the number of threads to be launched to synchronously work on incoming data. The third is the number of threads used on each DBServer to produce data. And the fourth is the length of the prefetch queue on DBServers. Potentially, we want to allow more arguments to be able to fine tune this better.
batch_size	uint64	optional	Optional batch size
custom_fields	GraphAnalyticsEngineLoadDataRequest.CustomFieldsEntry	repeated	Map of engine-type specific custom fields (dynamic for this data-load operation)
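
A request body for loading data can be assembled and checked against the rules in this table before it is sent. The helper below is a sketch under stated assumptions: load_data_request is a hypothetical name, the body uses the field names as listed above, and the check that there is exactly one type per attribute is an assumption that the table does not state explicitly.

```python
ALLOWED_ATTRIBUTE_TYPES = {"string", "float", "integer", "unsigned"}


def load_data_request(database, graph_name=None, vertex_collections=(),
                      edge_collections=(), vertex_attributes=(),
                      vertex_attribute_types=(), parallelism=None,
                      batch_size=None):
    """Build the JSON body for POST /v1/loaddata, omitting unset fields."""
    if graph_name is None and not (vertex_collections and edge_collections):
        raise ValueError("give graph_name or both collection lists")
    if vertex_attributes and not vertex_collections:
        raise ValueError("vertex_collections is needed to load attribute data")
    unknown = set(vertex_attribute_types) - ALLOWED_ATTRIBUTE_TYPES
    if unknown:
        raise ValueError(f"unsupported attribute types: {sorted(unknown)}")
    # ASSUMPTION: one type is given per attribute, in the same order.
    if len(vertex_attributes) != len(vertex_attribute_types):
        raise ValueError("need exactly one type per vertex attribute")
    body = {"database": database}
    if graph_name is not None:
        body["graph_name"] = graph_name
    for key, values in (("vertex_collections", vertex_collections),
                        ("edge_collections", edge_collections),
                        ("vertex_attributes", vertex_attributes),
                        ("vertex_attribute_types", vertex_attribute_types)):
        if values:
            body[key] = list(values)
    if parallelism is not None:
        body["parallelism"] = parallelism
    if batch_size is not None:
        body["batch_size"] = batch_size
    return body
```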

GraphAnalyticsEngineLoadDataRequest.CustomFieldsEntry

Field	Type	Label	Description
key	string
value	string

GraphAnalyticsEngineLoadDataResponse

Response arguments from GraphAnalyticsEngineLoadData.

Field	Type	Label	Description
job_id	uint64		ID of the load data operation
graph_id	uint64		Graph ID
error_code	int32		Error code, 0 if no error
error_message	string		Error message, empty if no error

GraphAnalyticsEnginePageRankRequest

Request arguments for GraphAnalyticsEngineRunPageRank and GraphAnalyticsEngineRunIRank.

Field	Type	Label	Description
graph_id	uint64		Graph ID
damping_factor	double	optional	Damping factor. The default is 0.85.
maximum_supersteps	uint32	optional	Maximal number of supersteps. The default is 64.
seeding_attribute	string	optional	Optional seeding attribute for a seeded pagerank. The default is empty, meaning none.
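
To show how the damping factor and the superstep limit enter the computation, here is a textbook power-iteration sketch of pagerank, together with the per-collection initial weights described for irank. It is not the engine's implementation: the function names are illustrative, the rank of vertices without outgoing edges is spread uniformly (one common convention, assumed here), and vertex ids are assumed to have the ArangoDB form "collection/key".

```python
from collections import Counter


def irank_initial_weights(vertex_ids):
    """Initial weight 1/N per vertex, N being the size of its own collection."""
    sizes = Counter(v.split("/", 1)[0] for v in vertex_ids)
    return {v: 1.0 / sizes[v.split("/", 1)[0]] for v in vertex_ids}


def pagerank(vertices, edges, damping_factor=0.85, maximum_supersteps=64,
             initial=None):
    """Run a fixed number of supersteps; the total sum of ranks is preserved."""
    n = len(vertices)
    rank = dict(initial) if initial is not None else {v: 1.0 / n for v in vertices}
    total = sum(rank.values())
    out_degree = {v: 0 for v in vertices}
    for source, _ in edges:
        out_degree[source] += 1
    for _ in range(maximum_supersteps):
        # ASSUMPTION: rank held by vertices without out-edges is spread uniformly.
        dangling = sum(rank[v] for v in vertices if out_degree[v] == 0)
        base = ((1.0 - damping_factor) * total + damping_factor * dangling) / n
        updated = {v: base for v in vertices}
        for source, target in edges:
            updated[target] += damping_factor * rank[source] / out_degree[source]
        rank = updated
    return rank
```

Passing initial=irank_initial_weights(vertices) starts the iteration from the irank weights; the sum of all ranks then equals the number of vertex collections and stays constant across supersteps.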

GraphAnalyticsEngineProcessResponse

Response arguments from GraphAnalyticsEngineProcess.

Field	Type	Label	Description
job_id	uint64		ID of the job
error_code	int32		Error code, 0 if no error
error_message	string		Error message, empty if no error

GraphAnalyticsEnginePythonFunctionRequest

Request arguments for GraphAnalyticsEngineRunPythonFunction.

Field	Type	Label	Description
graph_id	uint64		Graph ID
function	string		The python-based code to be executed. A method called `def worker(graph)` must be defined. The method must return a dataframe or dictionary with the results. The key inside that dict must represent the vertex id, the value (actual computation result) can be of any type.
use_cugraph	bool	optional	Use cugraph (or regular pandas/pyarrow). The default is false.
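
The following sketch shows one way to package a worker function into a request body. The type of the graph object passed to worker is not specified in this document; the example assumes, purely for illustration, that it offers a networkx-style degree() method. python_function_request is a hypothetical helper that only checks locally that the source defines a callable named worker.

```python
WORKER_SOURCE = '''
def worker(graph):
    # ASSUMPTION: graph.degree() yields (vertex id, degree) pairs, as a
    # networkx graph would. The object the engine passes in may differ.
    return {vertex_id: degree for vertex_id, degree in graph.degree()}
'''


def python_function_request(graph_id, source, use_cugraph=False):
    """Build a request body after checking that the source defines worker()."""
    namespace = {}
    exec(compile(source, "<function>", "exec"), namespace)
    if not callable(namespace.get("worker")):
        raise ValueError("the source must define a function named worker")
    return {"graph_id": graph_id, "function": source, "use_cugraph": use_cugraph}
```

The returned dictionary is keyed by vertex id, as the description of the function field requires; the values can be of any type.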

GraphAnalyticsEngineShutdownResponse

Response for a shutdown request.

Field	Type	Label	Description
error	bool		Error?
error_code	int32		Error code, 0 if no error
error_message	string		Error message, empty if no error

GraphAnalyticsEngineStoreResultsRequest

Request arguments for GraphAnalyticsEngineStoreResults.

Field	Type	Label	Description
job_ids	uint64	repeated	ID of the jobs of which results are written
attribute_names	string	repeated	Attribute names to write results to
database	string		Database in ArangoDB to use
vertex_collections	GraphAnalyticsEngineStoreResultsRequest.VertexCollectionsEntry	repeated	The following map maps collection names as found in the _id entries of vertices to the collections into which the result data should be written. The list of fields is the attributes into which the result is written. An insert operation with overwritemode "update" is used.
parallelism	uint32	optional	Optional numeric value for thread parallelism
batch_size	uint64	optional	Optional batch size
target_collection	string		Target collection for non-graph results
custom_fields	GraphAnalyticsEngineStoreResultsRequest.CustomFieldsEntry	repeated	Map of engine-type specific custom fields (dynamic for this store-results operation)
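
A request body for writing results back can be assembled as in the sketch below. store_results_request is a hypothetical helper; it assumes one attribute name per job id, in the same order, which is one plausible reading of the fields above and is not stated explicitly in this document.

```python
def store_results_request(database, target_collection, job_ids, attribute_names,
                          parallelism=None, batch_size=None):
    """Build the JSON body for POST /v1/storeresults, omitting unset fields."""
    if not job_ids:
        raise ValueError("at least one job id is required")
    # ASSUMPTION: attribute_names[i] receives the result of job_ids[i].
    if len(job_ids) != len(attribute_names):
        raise ValueError("need exactly one attribute name per job id")
    body = {
        "database": database,
        "target_collection": target_collection,
        "job_ids": list(job_ids),
        "attribute_names": list(attribute_names),
    }
    if parallelism is not None:
        body["parallelism"] = parallelism
    if batch_size is not None:
        body["batch_size"] = batch_size
    return body
```

The jobs listed together must produce the same number of results, for example two algorithms that each yield one value per vertex; the helper cannot verify that locally.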

GraphAnalyticsEngineStoreResultsRequest.CustomFieldsEntry

Field	Type	Label	Description
key	string
value	string

GraphAnalyticsEngineStoreResultsRequest.VertexCollectionsEntry

Field	Type	Label	Description
key	string
value	string

GraphAnalyticsEngineStoreResultsResponse

Response arguments from GraphAnalyticsEngineStoreResults.

Field	Type	Label	Description
job_id	uint64		ID of the store results operation
error_code	int32		Error code, 0 if no error
error_message	string		Error message, empty if no error

GraphAnalyticsEngineWccSccRequest

Request arguments for GraphAnalyticsEngineRunWcc and GraphAnalyticsEngineRunScc.

Field	Type	Label	Description
graph_id	uint64		Graph ID
custom_fields	GraphAnalyticsEngineWccSccRequest.CustomFieldsEntry	repeated	Map of engine-type and algorithm-type specific custom fields (dynamic for this process operation)
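
The difference between the two computations can be made concrete with a small local sketch: weakly connected components ignore edge direction (computed here with a union-find structure), while strongly connected components require directed paths in both directions (computed here with Kosaraju's two-pass algorithm). Both functions are illustrative stand-ins and make no claim about how the engine computes or numbers its components.

```python
def weakly_connected_components(vertices, edges):
    """Map each vertex to the smallest vertex of its weak component."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for source, target in edges:
        a, b = find(source), find(target)
        if a != b:
            parent[max(a, b)] = min(a, b)  # edge direction is ignored
    return {v: find(v) for v in vertices}


def strongly_connected_components(vertices, edges):
    """Map each vertex to a representative of its strong component."""
    forward = {v: [] for v in vertices}
    backward = {v: [] for v in vertices}
    for source, target in edges:
        forward[source].append(target)
        backward[target].append(source)
    # Pass 1: record vertices in order of DFS completion (iteratively).
    visited, finish_order = set(), []
    for root in vertices:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(forward[root]))]
        while stack:
            v, successors = stack[-1]
            for w in successors:
                if w not in visited:
                    visited.add(w)
                    stack.append((w, iter(forward[w])))
                    break
            else:
                finish_order.append(v)
                stack.pop()
    # Pass 2: sweep the reversed graph in decreasing completion order.
    component = {}
    for root in reversed(finish_order):
        if root in component:
            continue
        component[root] = root
        stack = [root]
        while stack:
            v = stack.pop()
            for w in backward[v]:
                if w not in component:
                    component[w] = root
                    stack.append(w)
    return component
```

On the graph a -> b, b -> a, b -> c with an isolated vertex d, the weak components are {a, b, c} and {d}, whereas the strong components are {a, b}, {c} and {d}.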

GraphAnalyticsEngineWccSccRequest.CustomFieldsEntry

Field	Type	Label	Description
key	string
value	string
