Accessing Data from Kafka

Apache Kafka is a distributed streaming platform that has three key capabilities:

  • Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.

  • Store streams of records in a fault-tolerant, durable way.

  • Process streams of records as they occur.

Kafka is generally used for two broad classes of applications:

  • Building real-time streaming data pipelines that reliably move data between systems or applications.

  • Building real-time streaming applications that transform or react to the streams of data.

A Kafka data source has a few key attributes that must be configured when it is created; see the Data Source JSON specification for details. The attributes are supplied as a JSON specification, either in the initial create statement or later through an ALTER statement. An example of a create command is shown below.

create data source {
  "name": "kafka_source",
  "sourceType": "Kafka",
  "bootstrapServers": [{
    "host": "my.kafka.host.com",
    "port": 9092
  }]
};
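If the statement is generated from a script rather than typed by hand, a minimal Python sketch such as the one below can assemble it. The helper name create_data_source_statement is hypothetical and not part of Magpie; it only builds the text of the command shown above and does not connect to Magpie or Kafka. Passing more than one broker assumes that the bootstrapServers array accepts several entries, which should be confirmed against the Data Source JSON specification.

```python
import json


def create_data_source_statement(name, brokers):
    """Render a `create data source` statement for a Kafka source.

    `brokers` is a list of (host, port) pairs. This helper is hypothetical:
    it only assembles text and never talks to Magpie or Kafka.
    """
    spec = {
        "name": name,
        "sourceType": "Kafka",
        "bootstrapServers": [{"host": host, "port": port} for host, port in brokers],
    }
    # json.dumps takes care of quoting and escaping, so the body is valid JSON.
    return "create data source " + json.dumps(spec, indent=2) + ";"


statement = create_data_source_statement(
    "kafka_source", [("my.kafka.host.com", 9092)]
)
print(statement)
```

Generating the specification with a JSON library avoids the hand-editing mistakes (missing quotes, trailing commas) that would otherwise make the statement fail to parse.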

Magpie can map a table to one or more Kafka topics. When a table maps to more than one topic, the topics should share the same key and value schemas within Kafka; otherwise, create a separate Magpie table for each topic for ease of analysis. Starting offsets and, optionally, ending offsets set the points at which loading of records begins and ends. See the Create Table from Kafka documentation for more information.
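The effect of starting and ending offsets can be pictured with a small Python sketch over a single partition. It is a conceptual illustration only, not Magpie's implementation: the function select_offsets is hypothetical, and the sketch assumes an inclusive starting offset and an exclusive ending offset, which should be checked against the Create Table from Kafka documentation.

```python
def select_offsets(records, starting_offset, ending_offset=None):
    """Return the records of one partition within the requested offset range.

    `records` is a list of (offset, value) pairs. The range is assumed to be
    [starting_offset, ending_offset); with no ending offset, every record
    from the starting offset onward is returned.
    """
    return [
        (offset, value)
        for offset, value in records
        if offset >= starting_offset
        and (ending_offset is None or offset < ending_offset)
    ]


partition = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]

bounded = select_offsets(partition, 1, 3)   # start and end both given
open_ended = select_offsets(partition, 2)   # ending offset omitted

print(bounded)
print(open_ended)
```

Because offsets are tracked per partition in Kafka, a real load would apply a range like this to each partition of each mapped topic.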

The following is an example of creating a Magpie table that references an underlying Kafka topic:

create table kafka_example
    from kafka data source kafka_source 
    topic "wikimedia-changes" 
    with key type json 
    with value type json;

This creates a table within the Magpie Context that references the source Kafka topic. When the table is queried, Magpie "reaches" into the source and pulls the data before combining it with other local data.
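To make the key type json and value type json clauses above more concrete, the following Python sketch shows what treating a record's key and value as JSON amounts to. It is an illustration under assumptions: the function decode_record, the "key" and "value" field names, and the sample payload are all made up for the example and do not describe Magpie's actual table layout.

```python
import json


def decode_record(key_bytes, value_bytes):
    """Parse a Kafka record's key and value, both assumed to be UTF-8 JSON.

    Kafka allows a record to have a null key or a null value, so None is
    passed through unchanged instead of being parsed.
    """
    key = json.loads(key_bytes.decode("utf-8")) if key_bytes is not None else None
    value = json.loads(value_bytes.decode("utf-8")) if value_bytes is not None else None
    return {"key": key, "value": value}


# Hypothetical record contents, for illustration only.
row = decode_record(b'{"id": 42}', b'{"title": "Apache Kafka", "bot": false}')
print(row["value"]["title"])
```

Records in a topic are stored as raw bytes, so the declared key and value types are what tell a reader how to interpret them; a topic whose payloads are not valid JSON would fail at the parsing step in this sketch.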

Security

To access this type of data source, you may need to adjust the security configuration of your cloud environment. Please reach out to a member of the Silectis team with any questions or support requests.
