10. Flink in Practice: An Introduction to DataStream Sinks and RichSinkFunction

A sink is responsible for writing the data that Flink has processed out to an external system.

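I. Flink ships with many ready-made sinks for DataStream, including:

1. writeAsText(): writes the elements line by line as strings, where each string is obtained by calling the element's toString() method;

2. print()/printToErr(): prints the toString() value of each element to standard output or to the standard error stream;

3. Custom output: addSink can be used to write data to third-party storage systems;

Flink supports the corresponding sinks through its built-in connectors and through Apache Bahir components.

For details, see: https://blog.csdn.net/zhuzuwei/article/details/107137295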

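II. Fault-tolerance guarantees of sink components

| Sink | Delivery guarantee | Notes |
|---|---|---|
| HDFS | Exactly-once | |
| Elasticsearch | At-least-once | |
| Kafka Producer | At-least-once / Exactly-once | Kafka 0.9 and 0.10 provide at-least-once; Kafka 0.11 provides exactly-once |
| File | At-least-once | |
| Redis | At-least-once | |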
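
To make the difference between these guarantees more concrete, here is a minimal plain-Java sketch. It is not Flink code, and the names `DeliverySemanticsSketch`, `appendAll`, `upsertAll` and `deliveredWithReplay` are hypothetical. It assumes that after a failure the last two records are delivered a second time, which is what at-least-once permits. An append-style sink then contains duplicates, while a keyed overwrite (the way the Redis `HSET` used in the example further down behaves) ends up in the same final state as if each record had arrived once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DeliverySemanticsSketch {

    // Append-style sink: every delivered record is stored, so replays appear as duplicates.
    static List<String> appendAll(List<String[]> delivered) {
        List<String> out = new ArrayList<>();
        for (String[] r : delivered) {
            out.add(r[0] + "=" + r[1]);
        }
        return out;
    }

    // Keyed overwrite sink: a replayed record overwrites the same key with the same value.
    static Map<String, String> upsertAll(List<String[]> delivered) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String[] r : delivered) {
            out.put(r[0], r[1]);
        }
        return out;
    }

    // Records are (word, running count); the last two repeat earlier ones to mimic a replay.
    static List<String[]> deliveredWithReplay() {
        List<String[]> d = new ArrayList<>();
        d.add(new String[] {"a", "1"});
        d.add(new String[] {"b", "1"});
        d.add(new String[] {"a", "2"});
        d.add(new String[] {"b", "1"}); // replayed after the assumed failure
        d.add(new String[] {"a", "2"}); // replayed after the assumed failure
        return d;
    }

    public static void main(String[] args) {
        List<String[]> delivered = deliveredWithReplay();
        System.out.println("append: " + appendAll(delivered)); // 5 entries, duplicates included
        System.out.println("upsert: " + upsertAll(delivered)); // one entry per word
    }
}
```

This is why an at-least-once sink can still produce correct results when the writes are idempotent, and why append-only targets need extra care.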

III. Worked example

The earlier examples all used print() as the sink. This one writes to a text file and to a Redis database.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class AddSinkReivew {
    public static void main(String[] args) throws Exception{
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<String> lines = env.socketTextStream("192.168.***.***", 8888);

        SingleOutputStreamOperator<Tuple2<String, Integer>> words = lines.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                String[] words = s.split(",");
                for (int i = 0; i < words.length; i++) {
                    collector.collect(Tuple2.of(words[i], 1));
                }
            }
        });

        SingleOutputStreamOperator<Tuple2<String, Integer>> summed = words.keyBy(0).sum(1);

        summed.print();

        summed.writeAsText("C:\\Users\\Dell\\Desktop\\flinkTest\\sinkout1.txt", FileSystem.WriteMode.OVERWRITE);

        SingleOutputStreamOperator<Tuple3<String,String, Integer>> words2 = lines.flatMap(new FlatMapFunction<String, Tuple3<String,String, Integer>>() {
            @Override
            public void flatMap(String s, Collector<Tuple3<String,String, Integer>> collector) throws Exception {
                String[] words = s.split(",");
                for (int i = 0; i < words.length; i++) {
                    collector.collect(Tuple3.of("wordscount",words[i], 1));
                }
            }
        });

        SingleOutputStreamOperator<Tuple3<String,String, Integer>> summed2 = words2.keyBy(1).sum(2);

        String configPath = "C:\\Users\\Dell\\Desktop\\flinkTest\\config.txt";
        ParameterTool parameters = ParameterTool.fromPropertiesFile(configPath);
        // Register the parameters as global job parameters so that rich functions can read them
        env.getConfig().setGlobalJobParameters(parameters);

        summed2.addSink(new MyRedisSinkFunction());

        env.execute("AddSinkReivew");
    }
}

// MyRedisSinkFunction.java (separate file)
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import redis.clients.jedis.Jedis;

public class MyRedisSinkFunction extends RichSinkFunction<Tuple3<String, String, Integer>>{
    private transient Jedis jedis;
    @Override
    public void open(Configuration config) {
        ParameterTool parameters = (ParameterTool)getRuntimeContext().getExecutionConfig().getGlobalJobParameters();
        String host = parameters.getRequired("redis.host");
        String password = parameters.get("redis.password", "");
        Integer port = parameters.getInt("redis.port", 6379);
        Integer timeout = parameters.getInt("redis.timeout", 5000);
        Integer db = parameters.getInt("redis.db", 0);
        jedis = new Jedis(host, port, timeout);
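        // Authenticate only when a password is configured; a server without one rejects AUTH
        if (!password.isEmpty()) {
            jedis.auth(password);
        }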
        jedis.select(db);
    }

    @Override
    public void invoke(Tuple3<String, String, Integer> value, Context context) throws Exception {
        if (!jedis.isConnected()) {
            jedis.connect();
        }
        // Store the count in a Redis hash: key = f0, field = word (f1), value = count (f2)
        jedis.hset(value.f0, value.f1, String.valueOf(value.f2));
    }

    @Override
    public void close() throws Exception {
        // open() may have failed before the client was created
        if (jedis != null) {
            jedis.close();
        }
    }
}
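
The job loads its Redis settings from config.txt via `ParameterTool.fromPropertiesFile`, but the file itself is not shown. The sketch below uses the JDK's `java.util.Properties` as a stand-in for `ParameterTool` to illustrate a file that would satisfy the keys read in `open()`. The class name `RedisConfigSketch` and the host value are hypothetical placeholders. Only `redis.host` is required; the other keys fall back to the defaults used in `open()`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class RedisConfigSketch {

    // Hypothetical contents of config.txt; the host is a placeholder value.
    static final String SAMPLE =
            "redis.host=127.0.0.1\n"
          + "redis.port=6379\n"
          + "redis.db=0\n";

    // Parses key=value lines in the standard properties format.
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Mimics a getInt(key, default) lookup: use the default when the key is absent.
    static int getInt(Properties p, String key, int defaultValue) {
        String v = p.getProperty(key);
        return v == null ? defaultValue : Integer.parseInt(v.trim());
    }

    public static void main(String[] args) {
        Properties p = load(SAMPLE);
        System.out.println("host    = " + p.getProperty("redis.host"));
        System.out.println("port    = " + getInt(p, "redis.port", 6379));
        System.out.println("timeout = " + getInt(p, "redis.timeout", 5000)); // key absent, default used
        System.out.println("db      = " + getInt(p, "redis.db", 0));
    }
}
```

Leaving `redis.password` out of the file corresponds to a Redis server that has no password configured.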

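The sinkout1.txt path passed to writeAsText does not end up as a single file; a folder with that name is created instead. It contains 4 files, one per parallel subtask (matching the parallelism), and each file holds the output of a different sink subtask.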
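
The one-file-per-subtask layout can be imitated with a small plain-Java sketch. This is not Flink code: the class `PerSubtaskFilesSketch` is hypothetical, the routing by `hashCode()` is a simplification that differs from Flink's real key-group assignment, and naming the files `1` to `4` is an assumption about how the output files are named.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class PerSubtaskFilesSketch {

    // Writes each record into the file owned by one of `parallelism` simulated subtasks.
    static Path write(List<String> records, int parallelism) {
        try {
            Path dir = Files.createTempDirectory("sinkout1.txt");
            // Every simulated subtask gets its own file, even if it receives no records.
            for (int i = 1; i <= parallelism; i++) {
                Files.createFile(dir.resolve(String.valueOf(i)));
            }
            for (String r : records) {
                // Simplified routing: equal records always land in the same file.
                int subtask = Math.floorMod(r.hashCode(), parallelism) + 1;
                Files.write(dir.resolve(String.valueOf(subtask)),
                        (r + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.APPEND);
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = write(Arrays.asList("flink", "spark", "hadoop", "flink"), 4);
        // List the files that were created inside the output directory.
        try (Stream<Path> files = Files.list(dir)) {
            files.sorted().forEach(f -> System.out.println(f.getFileName()));
        }
    }
}
```

If a single output file is preferred, one option is to give the sink a parallelism of 1, for example `summed.writeAsText(path, FileSystem.WriteMode.OVERWRITE).setParallelism(1);`, at the cost of funnelling all output through one subtask.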


[Figure: the results saved inside the output folder]

The sink results are also present in Redis.
