The Flow of TensorFlow

The Flow of TensorFlow from Jeongkyu Shin

1. The Flow of TensorFlow Jeongkyu Shin Lablup Inc. 2017. 11. 12 / GDG DevFest Nanjing 2017 2017. 11. 19 / GDG DevFest Seoul 2017

2. Descript.ion § CEO / Co-founder, Lablup Inc. § Develops Backend.AI § Open-source devotee § Google Developer Experts (Machine Learning) § Principal Researcher, KOSSLab., Korea § Textcube open-source project maintainer (10th anniversary!) § Physicist / Neuroscientist § Adj. professor (Dept. of CSE, Hanyang Univ.) § Ph.D in Statistical Physics (complex systems / computational neuroscience) Jeongkyu Shin / @inureyes

3. Machine Learning Era: All came from dust § Machine learning § “Field of study that gives computers the ability to learn without being explicitly programmed” Arthur Samuel (1959) § “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” Tom Mitchell (1997) § Types of Machine Learning § Supervised learning § Unsupervised learning § Reinforcement learning § Recommender system

4. Artificial Intelligence § Definition § Alan Turing, “The Imitation Game” (1950) => Turing test § John McCarthy, Dartmouth Artificial Intelligence Conference (1956) § Information Processing Language (1955) § From axiom to theory § Heuristics to reduce probing space § Birth of the LISP programming language § First approach: IF-THEN rules § Probe every possible case and choose the pathway with the highest fitness

5. Artificial Neural Network: Basics § Effect of layers § Source: A. K. Jain, J. Mao, K. M. Mohiuddin (1996), “Artificial Neural Networks: A Tutorial”, IEEE Computer 29

6. Winter was coming § First winter (1970s) § Complex problems: too difficult to construct logic models (by hand) § Second winter (1990s) § Overfitting problem → pre-training, supervised backpropagation → dropout (2013) § Convergence → vanishing gradient problem (1991) § Divergence problem → weight decay / sparsity regularization § Tedious training speed → IT evolution, mini-batch § And the spring: Environmental changes open the gate § Rise of big-data § Phenomenal computation cost reduction

7. Deep Learning: flower of the golden era § What if you have enough money to do (formerly) crazy experiments? Like § Increase the number of hidden layers § Pour in an unlimited amount of data § Breakthrough of deep learning § Geoffrey Hinton (2005) § Andrew Ng (2012) § Convolutional Neural Network § Pooling layer + weight § Recurrent Neural Network § Feedforward routine with (long/short) term memory § Deep Belief Network § Multipartite neural network with generative model § Deep Q-Network § Using deep learning for reinforcement learning

8. AlphaGo as a mixture of Machine Learning techniques § Reducing search space § Breadth reduction § And depth reduction § Prediction § 13 layer convolutional NN § Value network § Policy network § Principal variation

9. Flow of TensorFlow § Still, less than two years have passed.

10. TensorFlow § Open-source software library for machine learning across a range of tasks § Developed by Google (Dec. 2015~) § Characteristics § Python API (like Theano) § From 1.0, TensorFlow expands native API bindings to Java, C, etc. § Supports § Linux, macOS § NVIDIA GPUs (Pascal and above)

11. Before TensorFlow § User-friendly deep-learning toolkits § Caffe (2012) § Generalized programming method for researchers § Provides common NN blocks § Configuration file + training kernel program § Theano (2013~2017) § User code / configuration part is written in Python § Keras (2015~) § Meta-framework for deep learning programming § Supports various backends: § Theano (default) / TensorFlow (2016~) / MXNet (2017~) / CNTK (WIP) § Etc. § Paddle, Chainer, DL4J…

12. TensorFlow: Summary § Statistics § More than 24000 commits since Dec. 2015 § More than 1140 committers § More than 24000 forks in the last 12 months § Dominates Bootstrap! (15000) § More than 6400 TensorFlow-related repositories created on GitHub § Current § Complete ML model prototyping § Distributed training § CPU / GPU / TPU / Mobile support § TensorFlow Serving § Enables easier inference / model serving § XLA compiler (1.0~) § Supports various environments / speedups § Keras API Support (1.2~) § High-level programming API § Keras-compatible API § Eager Execution (1.4~) § Interactive mode of TensorFlow § Treat TensorFlow Python code as real Python code § Source: https://www.infoworld.com/article/3233283/javascript/at-github-javascript-rules-in-usage-tensorflow-leads-in-forks.html

13. TensorFlow: Summary § Timeline (2016 to 2017) ⏤ TensorFlow Serving ⏤ Keras API ⏤ Eager Execution ⏤ TensorFlow Lite ⏤ XLA ⏤ OpenCL w/ OpenCompute ⏤ Distributed TensorFlow ⏤ Multi GPU support ⏤ Mobile TensorFlow ⏤ TensorFlow Datasets ⏤ SKLearn (contrib) ⏤ TensorFlow Slim ⏤ SyntaxNet ⏤ DRAGNN ⏤ TFLearn (contrib) ⏤ TensorFlow TimeSeries

14. How TensorFlow works § CPU § Multiprocessor § AVX-based acceleration § GPU part in chip § OpenMP § GPU § CUDA (NVidia) ➜ cuDNN § OpenCL (AMD) ➜ ComputeCPP / ROCm § TPU (1st, 2nd gen.) § ASIC for accelerating matrix calculation § In-house development by Google https://www.tensorflow.org/get_started/graph_viz

15. How TensorFlow works § Python but not Python § The Python API is the default API for TensorFlow § However, the TF core is written in C++, with the cuDNN library (for GPU acceleration) § Computation Graph § User TF code is not ordinary code § It is a configuration that generates a computation graph § Session § Creates a computation graph and runs the training using the C++ core § Tedious debugging process
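The split between describing a computation and running it can be illustrated without TensorFlow at all. The sketch below is plain Python, not the TensorFlow API: `Placeholder`, `MatMul`, and `run` are hypothetical stand-ins, chosen only to show that wiring up a graph computes nothing until a separate run step feeds in values.

```python
# A toy deferred-execution graph in plain Python (NOT the TensorFlow API).
# All class and function names here are hypothetical illustrations.

class Placeholder:
    """A graph input whose value is supplied only at run time."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]


class MatMul:
    """A graph node that multiplies two matrices (lists of lists) when evaluated."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, feed):
        a = self.left.evaluate(feed)
        b = self.right.evaluate(feed)
        return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
                 for j in range(len(b[0]))]
                for i in range(len(a))]


def run(node, feed):
    """Plays the role of a session: walks the graph and computes values."""
    return node.evaluate(feed)


# Building the graph only wires nodes together; nothing is computed yet.
x = Placeholder("x")
m = MatMul(x, x)

# Values appear only when the graph is run with concrete inputs.
m_out = run(m, {"x": [[2.0]]})
print(m_out)  # [[4.0]]
```

Because `m` is only a description, inspecting it before `run` shows an object rather than a number, which is one reason debugging graph-style code feels indirect.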

16. Google I/O 2017 / TensorFlow Frontiers How TensorFlow works

17. TensorFlow Features § Recent TensorFlow core features § TensorFlow Estimators § Included in 1.4 (Oct. 2017) / high-level API for using, modeling well-known estimators § TensorFlow Serving (independent project) § TensorFlow Keras-compatible API (Sep. 2017) § Included in 1.3 (Sep. 2017) § TensorFlow Datasets § Included in 1.4 (Oct. 2017) § Upcoming/testing TensorFlow core features § TensorFlow eager execution § Introduced in 1.4 (Oct. 2017) § TensorFlow Lite § (Work-in-progress)

18. XLA: linear algebra compiler for TensorFlow Google I/O 2017 / TensorFlow Frontiers

19. TensorFlow Serving § Serving system for inference service § Components § Servables § Loaders § Managers § Features § Model building § Model versioning § Model saving / loading § Online inference support with RPC
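Model versioning is easier to picture with a small example. The following is a plain-Python sketch of the idea only, assuming a hypothetical `ModelManager` class; it does not use or mirror the actual TensorFlow Serving API, which is a separate server reached over RPC.

```python
# A toy model manager in plain Python (NOT the TensorFlow Serving API).
# Every name here is a hypothetical illustration of the versioning idea.

class ModelManager:
    """Keeps several loaded versions of one model and routes requests."""

    def __init__(self):
        self._versions = {}  # version number -> callable model

    def load(self, version, predict_fn):
        """Make a model version available for inference."""
        self._versions[version] = predict_fn

    def unload(self, version):
        """Retire a model version."""
        del self._versions[version]

    def predict(self, x, version=None):
        """Serve from the requested version, or the newest one by default."""
        if not self._versions:
            raise RuntimeError("no model version is loaded")
        if version is None:
            version = max(self._versions)
        return self._versions[version](x)


manager = ModelManager()
manager.load(1, lambda x: 2 * x)      # first "trained model"
manager.load(2, lambda x: 2 * x + 1)  # retrained model, rolled out later

latest = manager.predict(10)             # uses version 2
pinned = manager.predict(10, version=1)  # explicitly pinned to version 1
print(latest, pinned)  # 21 20
```

Keeping the old version loaded while the new one rolls out is what makes a rollback a one-line change on the serving side.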

20. Keras-compatible API for TensorFlow § Keras ( https://keras.io ) § High-level API § Focus on user experience § “Deep learning accessible to everyone” § History § Announced in Feb. 2017 § Bundled as a contrib package from TF 1.2 § Official core package since 1.4 § Characteristics § “Simplified workflow for TensorFlow users, more powerful features to Keras users” § Most Keras code can be used on TensorFlow (by changing keras. to tf.keras.) § Can mix Keras code with TensorFlow code

21. TensorFlow Datasets § New way to generate data pipeline § Dataset classes § TextLineDataset § TFRecordDataset § FixedLengthRecordDataset § Iterator

22. Example: Decoding and resizing image data

    # Reads an image from a file, decodes it into a dense tensor, and resizes it
    # to a fixed shape.
    def _parse_function(filename, label):
        image_string = tf.read_file(filename)
        image_decoded = tf.image.decode_image(image_string)
        image_resized = tf.image.resize_images(image_decoded, [28, 28])
        return image_resized, label

    # A vector of filenames.
    filenames = tf.constant(["/var/data/image1.jpg", "/var/data/image2.jpg", ...])

    # `labels[i]` is the label for the image in `filenames[i]`.
    labels = tf.constant([0, 37, ...])

    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
    dataset = dataset.map(_parse_function)
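The slice-then-map pattern above can be mimicked with ordinary Python iterators. This is a rough analogy rather than the tf.data API: `from_slices` and `parse` are hypothetical helpers, and the "image" is a placeholder dictionary instead of decoded pixels.

```python
# A toy input pipeline in plain Python (NOT the tf.data API).
# `from_slices` and `parse` are hypothetical helpers.

def from_slices(filenames, labels):
    """Yield one (filename, label) pair per element, like slicing two vectors."""
    for pair in zip(filenames, labels):
        yield pair


def parse(filename, label):
    """Stand-in for decoding and resizing: returns a fake 'image' record."""
    return {"source": filename, "size": (28, 28)}, label


filenames = ["/var/data/image1.jpg", "/var/data/image2.jpg"]
labels = [0, 37]

# map() is lazy in Python 3, so nothing is parsed until we iterate.
dataset = map(lambda pair: parse(*pair), from_slices(filenames, labels))

records = list(dataset)
for image, label in records:
    print(image["source"], image["size"], label)
```

The point of the analogy is laziness: the parse step is attached to the pipeline first and applied element by element only when something consumes it.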

23. Eager execution § Announced on Oct. 30, 2017 § Makes TensorFlow execute operations immediately § Returns concrete values § Provides § A NumPy-like library for numerical computation § Support for GPU acceleration and automatic differentiation § A flexible platform for machine learning research and experiments § Advantages § Python debugger tools § Immediate error reporting § Easy control flow § Python data structures

24. Example: Session

    # Graph mode: build the graph first, then run it in a session.
    x = tf.placeholder(tf.float32, shape=[1, 1])
    m = tf.matmul(x, x)
    print(m)
    # Tensor("MatMul:0", shape=(1, 1), dtype=float32)

    with tf.Session() as sess:
        m_out = sess.run(m, feed_dict={x: [[2.]]})
    print(m_out)
    # [[4.]]

    # Eager execution: the operation runs immediately.
    x = [[2.]]
    m = tf.matmul(x, x)
    print(m)
    # tf.Tensor([[4.]], dtype=float32, shape=(1,1))

25. Example: Instant error

    x = tf.gather([0, 1, 2], 7)
    # InvalidArgumentError: indices = 7 is not in [0, 3) [Op:Gather]

26. Example: removing metaprogramming

    # Graph mode: every element must be fetched through the session.
    x = tf.random_uniform([2, 2])
    with tf.Session() as sess:
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                print(sess.run(x[i, j]))

    # Eager execution: plain Python loops over concrete values.
    x = tf.random_uniform([2, 2])
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            print(x[i, j])

27. Eager execution: Python Control Flow

    a = tf.constant(6)
    while not tf.equal(a, 1):
        if tf.equal(a % 2, 0):
            a = a // 2  # floor division keeps the int32 dtype
        else:
            a = 3 * a + 1
        print(a)

    # Outputs
    # tf.Tensor(3, dtype=int32)
    # tf.Tensor(10, dtype=int32)
    # tf.Tensor(5, dtype=int32)
    # tf.Tensor(16, dtype=int32)
    # tf.Tensor(8, dtype=int32)
    # tf.Tensor(4, dtype=int32)
    # tf.Tensor(2, dtype=int32)
    # tf.Tensor(1, dtype=int32)

28. Eager execution: Gradients

    import tensorflow.contrib.eager as tfe

    def square(x):
        return tf.multiply(x, x)  # Or x * x

    grad = tfe.gradients_function(square)

    print(square(3.))  # tf.Tensor(9., dtype=tf.float32)
    print(grad(3.))    # [tf.Tensor(6., dtype=tf.float32)]

29. Eager execution: Gradients

    def square(x):
        return tf.multiply(x, x)  # Or x * x

    grad = tfe.gradients_function(square)
    gradgrad = tfe.gradients_function(lambda x: grad(x)[0])

    print(square(3.))    # tf.Tensor(9., dtype=tf.float32)
    print(grad(3.))      # [tf.Tensor(6., dtype=tf.float32)]
    print(gradgrad(3.))  # [tf.Tensor(2., dtype=tf.float32)]
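The values 6 and 2 on the gradient slides can be sanity-checked numerically. The sketch below is plain Python using central finite differences, which is a different technique from the automatic differentiation that eager execution performs; `numeric_grad` is a hypothetical helper introduced only for this check.

```python
# Checking the derivatives of square(x) numerically in plain Python.
# This uses central finite differences, not automatic differentiation.

def square(x):
    return x * x


def numeric_grad(f, h=1e-3):
    """Return a function approximating f'(x) with a central difference."""
    def df(x):
        return (f(x + h) - f(x - h)) / (2 * h)
    return df


grad = numeric_grad(square)    # approximates 2 * x
gradgrad = numeric_grad(grad)  # approximates the constant 2

first = grad(3.0)
second = gradgrad(3.0)
print(round(first, 6), round(second, 6))
```

Nesting `numeric_grad` mirrors how `gradgrad` is built from `grad` on the slide: a gradient function is just another function that can be differentiated again.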

30. Eager execution: Custom Gradients

    def log1pexp(x):
        return tf.log(1 + tf.exp(x))

    grad_log1pexp = tfe.gradients_function(log1pexp)

    print(grad_log1pexp(0.))
    # Works fine, prints [0.5]

31. Eager execution: Custom Gradients

    def log1pexp(x):
        return tf.log(1 + tf.exp(x))

    grad_log1pexp = tfe.gradients_function(log1pexp)

    print(grad_log1pexp(100.))
    # [nan] due to numeric instability

32. Eager execution: Custom Gradients

    @tfe.custom_gradient
    def log1pexp(x):
        e = tf.exp(x)
        def grad(dy):
            return dy * (1 - 1 / (1 + e))
        return tf.log(1 + e), grad

    grad_log1pexp = tfe.gradients_function(log1pexp)

    # Gradient at x = 0 works as before.
    print(grad_log1pexp(0.))
    # [0.5]

    # And now gradient computation at x=100 works as well.
    print(grad_log1pexp(100.))
    # [1.0]
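Why the rewritten gradient survives large inputs can be reproduced in plain Python. This sketch is not TensorFlow code, and it rests on two assumptions: Python floats are 64-bit, so the overflow is triggered at x = 1000 rather than the x = 100 of the float32 slide, and `exp_ieee` is a hypothetical helper that returns infinity on overflow the way IEEE arithmetic does.

```python
import math

# Plain-Python illustration of the instability (NOT TensorFlow code).
# Python floats are 64-bit, so overflow needs a larger input than the
# float32 case on the slide; x = 1000 is used here for that reason.

def exp_ieee(x):
    """math.exp that returns inf on overflow, as IEEE arithmetic would."""
    try:
        return math.exp(x)
    except OverflowError:
        return float("inf")


def naive_grad(x):
    """Chain rule applied literally: d/dx log(1 + e^x) = (1 / (1 + e^x)) * e^x."""
    e = exp_ieee(x)
    return (1.0 / (1.0 + e)) * e  # 0 * inf = nan once e overflows


def stable_grad(x):
    """Same derivative rewritten as 1 - 1 / (1 + e^x), as in the custom gradient."""
    e = exp_ieee(x)
    return 1.0 - 1.0 / (1.0 + e)  # 1 - 0 = 1 once e overflows


print(naive_grad(0.0), stable_grad(0.0))        # 0.5 0.5
print(naive_grad(1000.0), stable_grad(1000.0))  # nan 1.0
```

Both forms are the same function mathematically; only the order of operations decides whether an overflowed intermediate turns into nan or into a harmless zero.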

33. Eager execution: Using GPUs

tf.device() for manual placement

    with tf.device("/gpu:0"):
        x = tf.random_uniform([10, 10])
        y = tf.matmul(x, x)
        # x and y reside in GPU memory

34. Eager execution: Building Models

The same APIs as graph building (tf.layers, tf.train.Optimizer, tf.data etc.)

    model = tf.layers.Dense(units=1, use_bias=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

35. Eager execution: Building Models

    model = tf.layers.Dense(units=1, use_bias=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

    # Define a loss function
    def loss(x, y):
        return tf.reduce_mean(tf.square(y - model(x)))

36. Eager execution: Training Models

Compute and apply gradients

    for (x, y) in get_next_batch():
        optimizer.apply_gradients(grad_fn(x, y))

37. Eager execution: Training Models

Compute and apply gradients

    grad_fn = tfe.implicit_gradients(loss)

    for (x, y) in get_next_batch():
        optimizer.apply_gradients(grad_fn(x, y))
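What the compute-and-apply loop does to a one-unit linear model can be traced by hand. The sketch below is plain Python, not TensorFlow: the data, the step count, and the `gradients` helper are made up for illustration, and the gradients are derived manually instead of being computed automatically. Only the learning rate of 0.1 and the mean-squared-error loss are taken from the slides.

```python
# Plain-Python gradient descent for a one-unit linear model (NOT TensorFlow).
# The data, step count, and helper names are made up for illustration.

def get_next_batch():
    """Yield the same tiny batch repeatedly; targets follow y = 2x + 1."""
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [2.0 * x + 1.0 for x in xs]
    for _ in range(500):
        yield xs, ys


def gradients(w, b, xs, ys):
    """Hand-derived gradients of mean((y - (w*x + b))^2) w.r.t. w and b."""
    n = len(xs)
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2.0 * sum(errors) / n
    return grad_w, grad_b


w, b = 0.0, 0.0       # parameters of the single dense unit
learning_rate = 0.1   # same value as the optimizer on the slide

for xs, ys in get_next_batch():
    grad_w, grad_b = gradients(w, b, xs, ys)
    # The "apply gradients" step: move against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 4), round(b, 4))
```

After the loop, `w` and `b` should sit close to the slope 2 and intercept 1 that generated the data, which is the behavior the optimizer automates.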

38. Comparison

Framework | Difficulty | Extensibility | Interactive mode | Multi-CPU (NUMA) | Multi-CPU (Cluster) | Multi-GPU (single node) | Multi-GPU (Cluster)
TensorFlow | ■■■■ | ■■■■ | X | O | O | O | O
TFlearn | ■■■ | ■■■■ | X | O | O | O | O
TF Slim | ■■ | ■■■■ | X | X | O | O | O
TF Eager Execution | ■■ | ■■ | O | X | X | X | X
Keras (with TF backend) | ■■ | ■■ | X | O | O | O | O
Keras (with MXNet backend) | ■■■ | ■■ | X | O | O | O | O
PyTorch | ■ | ■ | O | O | X | ? (manual multi-batch) | X
CNTK | ■■■■ | ■■■■ | X | O | O | O | O
MXNet | ■■■■ | ■■■■ | X | O | O | O | O

39. TensorFlow Lite § TensorFlow Lite: Embedded TensorFlow § No additional environment installation required § OS level hardware acceleration § Leverages Android NN § XLA-based optimization support § Enables binding to various programming languages § Developer Preview (4 days ago) § Part of Android O-MR1 Google I/O 2017 / Android meets TensorFlow

40. TensorFlow Lite § Format § FlatBuffers instead of ProtocolBuffers § Provides converter § Models § InceptionV3 § MobileNets: vision-specific model family § API § Java § C++

41. TensorFlow Lite: Why and How § Why? Less traffic / faster response § Image / OCR, Speech <-> Text, Translation, NLP § Motion, GPS and more § ML can extract the meaning from raw data § Image recognition: Send raw image vs. send detected label § Motion detection: Send raw motion vs. send feature vector § How? Model compression § Graph freezing § Graph conversion tools § Quantization § Weight § Calculation § Memory mapping Google I/O 2017 / Android meets TensorFlow
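Weight quantization, one of the compression steps listed above, can be sketched in a few lines. This is plain Python showing one common 8-bit affine scheme; it is an assumption for illustration and not necessarily the scheme TensorFlow Lite uses. `quantize` and `dequantize` are hypothetical helpers, and the weights are made-up values.

```python
# A toy 8-bit affine quantizer in plain Python (NOT TensorFlow Lite code).
# It shows one common scheme; the scheme TensorFlow Lite uses may differ.

def quantize(weights):
    """Map floats onto integers 0..255 and return them with scale and offset."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # avoid a zero scale for constant weights
    quantized = [int(round((w - lo) / scale)) for w in weights]
    return quantized, scale, lo


def dequantize(quantized, scale, lo):
    """Recover approximate floats from the 8-bit integers."""
    return [q * scale + lo for q in quantized]


weights = [-1.2, -0.4, 0.0, 0.3, 0.95, 2.1]
q, scale, lo = quantize(weights)
restored = dequantize(q, scale, lo)

# Each weight now fits in one byte instead of four (float32).
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(max_error, 5))
```

The trade-off is visible in `max_error`: storage drops to a quarter, and each weight moves by at most half a quantization step.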

42. Android Neural Network API § New APIs for NeuralNet § Part of the Android Framework § Available from the next Android release § Reduces library duplication across apps § Supports hardware acceleration § GPU, DSP, ISP, NeuralNet chips, etc. Google I/O 2017 / Android meets TensorFlow

43. Flow goes to: market What is flowing through the stream?

44. Market: API-based (personalized) deep learning service § Service with pre-baked models via API § Focuses on fields that do not require real-time response § e.g. Microsoft Azure Cognitive Services § Pre-trained ANN + personalized data = personalized NN § Easy personalization: server-side training

45. Market: User-side deep learning services § Inference with trained models § Does not require heavy calculation § e.g. ARMv7 with ~512MB / 1GB RAM § Toys / light products § Smart toys for kidult (adult + kids) : Self-driving R/C car / drone § Home appliance and controllers § IoT + ML § Locality : Home (per room), Car, Office, etc. § E.g. Smart home resource management systems

46. Market: Deep Learning service for everyone § Digital assistant war § Digital assistant (with speakers): gateway to deep learning based services § Context extraction + inference + features § Echo (Amazon) / Google Home (Google) § Microsoft (Cortana in every MS product) / Apple (HomePod) § Korea? Also entering the battlefield § Naver: Wave / Friends § Kakao: Kakao mini § SK: Nugu

47. Flow goes to: tech. What is flowing through the stream?

48. Portability and extensibility § Training on § Mac / Windows § GPU server § GPU / TPU on Cloud § Prediction / Inference using § Android / iOS § Raspberry Pi and TPU § Android Things Google I/O 2017 / Android meets TensorFlow

49. Open-source Machine Learning Framework § Machine Learning Framework: (almost) open-source § Google: TensorFlow (2015~) § Microsoft: CNTK (2016~) § Amazon: MXNet (2015~) § Facebook: Caffe 2 (2017~) / PyTorch (2016~) § Baidu: PaddlePaddle (2016~) § Why? § 2017 § General goal of new versions: user-friendly syntax § The rise of Keras and PyTorch led to TensorFlow eager execution

50. Server-side machine learning § Machine learning workload characteristics § Training § Requires ultra-heavy computation resources § Needs to be fed big, indexed data § Or (reinforcement learning) needs a paired model / training environment to give feedback § Serving § Requires (relatively) light resources: § Low CPU cost § Moderate memory capacity (to load the NeuralNet)

51. TensorFlow: Multiverse § TensorFlow AMD GPU acceleration § OpenCL with ComputeCPP (Feb. 2017) § Accelerates C++ code (Codeplay) § Khronos support / SYCL standard § Still in early stage § Only supports Linux § ROCm (AMD) based TensorFlow (Sep. 2017) § First open-source HPC/Hyperscale-class platform for GPU computing § LLVM based / HCC C++ / GCN compiler § https://github.com/ROCmSoftwarePlatform/hiptensorflow

52. Hand-held machine learning: Why? § Issues from real-time models / apps § Autopilot § Real-time effects on photos / videos § Voice recognition § Automators § Privacy issues § Growing amount of private information § Etc. § Reduces network cost

53. Hand-held machine learning: How? § Apple’s approach § Keeping user privacy with Differential Privacy § Gather Anonymized user data § User-specific machine learning models: keep them in the phone § e.g. Photo face detection / voice recognition / smart keyboard § Core ML (iOS 11) § Support Machine Learning model as function (.mlmodel format) § Google’s approach § Ultra-large scale server side training using TPU (2nd gen.) § Mobile: Handles data compression and feature extraction (to reduce traffic) § On the mobile: § Android NeuralNet API (Android O) § TensorFlow Lite on Android (Android O) https://backchannel.com/an-exclusive-look-at-how-ai-and-machine-learning-work-at-apple-8dbfb131932b

54. Hand-held machine learning: How? § Train on server, Serve on smartphone § Enough to serve pre-trained models on smartphones § Both train and serve on smartphone § Keeping privacy / reduce traffic / personalization § Uses GPUs on recent smartphones § Working together § Feature extraction / compression / preprocessing ‒ Mobile side § Machine Learning model training / updating / streaming advanced models ‒ Server side

55. Hand-held machine learning: How? § TensorFlow § Supports both Android and iOS § Xcode and Android Studio § XLA compiler framework since TensorFlow 1.0: § Will support diverse languages / environments § Also, optimizing for smartphones and tablets § MobileNet (Apr. 2017) § Efficient Convolutional Neural Networks for Mobile Vision Applications § TensorFlow Lite (Nov. 2017): development focus § Built-in operators for both quantized models (int (8bit) / fixed point) and floating point models (FP10, FP16) § Support for embedded GPUs / ASICs

56. Browser-side machine learning § Machine Learning without hassle § Ingredients for machine learning: Computation, Data, Algorithm § XLA: provides binary-code level optimization for various environments § Do we have a cross-platform computation environment? § Java? § Browser! § Recent improvements in web browsers § WebGL § Unified programming environment for many GPU-enabled machines § WebAssembly § Binary-level optimization § Shipped to every mainstream browser! (just this week)

57. Convertible NeuralNet format § ONNX (Open Neural Network Exchange) § Microsoft / Facebook (Sep. 2017) § Caffe 2, PyTorch (by Facebook), CNTK (Microsoft) § MLMODEL (Core ML model, Machine Learning Model) § Apple (Aug. 2017) § Caffe, Keras, scikit-learn, LIBSVM (Open Source) § Provides Core ML converter / specification

58. Recap § Machine Learning / Artificial Intelligence § Flow of TensorFlow § TensorFlow Serving Project § Keras-compatible API § Datasets § Eager execution § TensorFlow Lite § Flow goes to § More user-friendly toolkits / frameworks § API-based / personalized § User-side inference / Hand-held ML § Convertible Machine Learning Model formats

59. End! Thank you for listening § Lablup Inc.: https://www.lablup.ai § Backend.AI: https://backend.ai § Backend.AI Cloud: https://cloud.backend.ai § CodeOnWeb Service: https://www.codeonweb.com § GitHub repository: https://github.com/lablup
