Hubert Wang

5th Annual Global Big Data Conference

Here are my notes after attending the conference.
Some of the presentation slides are here.

The Analytics Of Things

Bill Franks, Chief Analytics Officer, International Institute For Analytics

Problems with IoT data

  • No context.
    • E.g. location context such as gym, airport, or home has not been recorded so far, yet this kind of context information can be very useful.
  • Not enough dimensions of data.
    • E.g. to improve training, MLB enriched its baseball measurements by adding a "spin" dimension to the existing "speed" and "distance" dimensions.
  • The sheer number of sensors creates a data mess.
  • IoT data grows at an unpredictable and exponential rate.

What the IoT industry needs to do in the near future

  • Standards must be developed.
    • Some standards can be local (like in home devices), but others must be global (like smart city devices).
  • Governing IoT data.
    • The vast majority of the data is almost completely useless. E.g. most sensors send a reading every few minutes or even every few seconds, but most of those readings are unchanged and will never be used.
    • 👉 Solution: transmit only the changes.
  • Push processing closer to your things.
    • 👉 Solution: move from centralized analysis to distributed analysis by having the sensors themselves do some of the computing.
  • Be aware of hacking through IoT devices. Security is the major issue in IoT.
    • Alexa is listening to you.
    • A connected car knows where you are going.
    • Power plants' passwords are identical and easy to hack.
    • Baby monitors can be hacked and used for stalking.
    • 👉 Solution: develop protocols for transmitting IoT data, like HTTP and HTTPS on the web.
  • Be aware of data ownership.
    • Connected products make ownership less clear.
    • 👉 Solution: analyze the trees AND the forest. Don't only analyze the data assembled at a single point on one line; analyze all the lines connected to that point.
  • Don't let exceptions drive the rules. Let net gains do that.
    • E.g. should we reject self-driving cars just because they have been involved in accidents? No. Self-driving cars make fewer mistakes, so the total number of accidents will decrease and more people will be saved overall.
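The "transmit only the changes" idea can be sketched as a send-on-change (deadband) filter running on the device. This is a minimal illustration, not something shown in the talk: the class `SendOnChangeFilter`, its `offer` method, the helper `count_transmitted`, and the deadband threshold are all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical device-side filter: forward a reading only if it differs from
// the last transmitted value by more than `deadband`. The first reading is
// always forwarded so the backend has a starting value.
class SendOnChangeFilter {
public:
    explicit SendOnChangeFilter(double deadband) : deadband_(deadband) {
        assert(deadband >= 0.0);
    }

    // Returns the reading if it should be transmitted, std::nullopt otherwise.
    std::optional<double> offer(double reading) {
        if (!last_sent_ || std::fabs(reading - *last_sent_) > deadband_) {
            last_sent_ = reading;  // the backend now knows this value
            return reading;
        }
        return std::nullopt;  // within the deadband: nothing new to report
    }

private:
    double deadband_;
    std::optional<double> last_sent_;
};

// Counts how many readings in a series would actually be transmitted.
std::size_t count_transmitted(const std::vector<double>& readings, double deadband) {
    SendOnChangeFilter filter(deadband);
    std::size_t sent = 0;
    for (double r : readings)
        if (filter.offer(r)) ++sent;
    return sent;
}
```

For example, with a deadband of 0.5 the series 20.0, 20.0, 20.1, 20.0, 21.0, 21.0 is reduced to two transmissions (the first 20.0 and the first 21.0). Comparing against the last *sent* value rather than the last *seen* value means a slow drift is still reported once it accumulates past the deadband.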

Debunking the myth of 100x GPU vs. CPU: efficient training and inference of neural networks on commodity CPUs and embedded devices

Victor Jakubiuk, Chief Science Officer, OnSpecta, MIT
Related papers: A Multicore Path to Connectomics-on-Demand; master's paper

Why not GPU / Why CPU?

  • Why not GPU?
    • Complex software pipeline
    • At scale, would require multiple GPUs
      • Resulting in data distribution issues
      • Expensive (GPUs, storage, networking, space, etc.)
    • All neural network frameworks use cuDNN
      • Optimized matrix multiplications, but hard to modify
    • GPUs are harder to program
    • Titan X -> theoretical peak of 6 TFLOPS
  • Why CPU?
    • Readily available
    • Easy to program & integrate with the rest of the pipeline
    • 18-core Intel Haswell @ 2.5GHz -> theoretical peak of 1.44 TFLOPS

5 Tricks

#1 CNN - Sliding Window Theoretical Speedup

A patch-based CNN slides a window across the image and runs the convolutions inside each window. The window usually moves with a small stride (the distance the window moves at each step), like the red and pink boxes in the diagram below. Neighboring windows overlap heavily, so much of the computation is redundant.

Using a dynamic programming technique, we can speed this up significantly while preserving exactly the same output and reusing the same weights as the original (patch-based) classifier.
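To make the redundancy concrete, here is a minimal single-layer sketch: one channel, one k x k convolution, no pooling, and a made-up "sum of ReLU features" head standing in for the classifier (none of these details come from the talk). `scores_per_patch` recomputes the convolution inside every window, while `scores_shared` convolves the whole image once and lets each window read its slice of the shared feature map. Because every output position is computed with exactly the same arithmetic, both return identical scores. Deeper networks with strided pooling layers need more bookkeeping than shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// Valid 2D cross-correlation (what CNN libraries call "convolution") of the
// h x w region of img starting at (r0, c0) with a square k x k kernel.
Grid conv_region(const Grid& img, const Grid& kernel, std::size_t r0,
                 std::size_t c0, std::size_t h, std::size_t w) {
    const std::size_t k = kernel.size();
    assert(k > 0 && h >= k && w >= k);
    Grid out(h - k + 1, std::vector<double>(w - k + 1, 0.0));
    for (std::size_t i = 0; i + k <= h; ++i)
        for (std::size_t j = 0; j + k <= w; ++j) {
            double acc = 0.0;
            for (std::size_t a = 0; a < k; ++a)
                for (std::size_t b = 0; b < k; ++b)
                    acc += img[r0 + i + a][c0 + j + b] * kernel[a][b];
            out[i][j] = acc;
        }
    return out;
}

// Hypothetical classifier head: sum of ReLU(feature) over an f x f window
// of a feature map starting at (r0, c0).
double head(const Grid& fmap, std::size_t r0, std::size_t c0, std::size_t f) {
    double s = 0.0;
    for (std::size_t i = 0; i < f; ++i)
        for (std::size_t j = 0; j < f; ++j) {
            const double v = fmap[r0 + i][c0 + j];
            s += v > 0.0 ? v : 0.0;
        }
    return s;
}

// Naive: convolve each patch separately, recomputing the overlaps.
std::vector<double> scores_per_patch(const Grid& img, const Grid& kernel,
                                     std::size_t patch, std::size_t stride) {
    assert(!img.empty() && stride > 0 && patch >= kernel.size());
    const std::size_t f = patch - kernel.size() + 1;  // per-patch feature-map side
    std::vector<double> scores;
    for (std::size_t r = 0; r + patch <= img.size(); r += stride)
        for (std::size_t c = 0; c + patch <= img[0].size(); c += stride)
            scores.push_back(head(conv_region(img, kernel, r, c, patch, patch), 0, 0, f));
    return scores;
}

// Shared: convolve the whole image once; each patch reads its f x f slice.
std::vector<double> scores_shared(const Grid& img, const Grid& kernel,
                                  std::size_t patch, std::size_t stride) {
    assert(!img.empty() && stride > 0 && patch >= kernel.size());
    const std::size_t f = patch - kernel.size() + 1;
    const Grid full = conv_region(img, kernel, 0, 0, img.size(), img[0].size());
    std::vector<double> scores;
    for (std::size_t r = 0; r + patch <= img.size(); r += stride)
        for (std::size_t c = 0; c + patch <= img[0].size(); c += stride)
            scores.push_back(head(full, r, c, f));
    return scores;
}
```

With a 3x3 kernel, 5x5 patches, and stride 1, an interior feature value is recomputed by up to 9 overlapping windows in the naive version but computed only once in the shared version.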

#2 Network Architecture

  • The types of network layers affect speed.
  • Choosing which type of network to use is usually a trade-off between speed and accuracy.

#3 Fast CPU Framework

  • Intel CPU
    • AVX2 instruction set ("Advanced Vector Extensions")
    • AVX-512 (Intel Xeon Phi)
    • 2 FMA ("fused multiply-add") units
    • 32 floating-point operations per cycle
    • 18-core, 2.5GHz -> theoretical peak of 1.44 TFLOPS
  • NVIDIA Titan X:
    • Theoretical peak of 6 TFLOPS
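The CPU peak on the slide follows from multiplying cores, clock rate, and floating-point operations per cycle. The breakdown of the 32 operations per cycle below assumes single-precision floats in 256-bit AVX2 registers (8 lanes), with each FMA counted as 2 operations:

```latex
\begin{aligned}
\text{FLOPs/cycle} &= \underbrace{2}_{\text{FMA units}} \times \underbrace{8}_{\text{fp32 lanes}} \times \underbrace{2}_{\text{mul + add}} = 32 \\
\text{peak}_{\text{CPU}} &= 18 \times \left(2.5 \times 10^{9}\ \text{cycles/s}\right) \times 32 = 1.44 \times 10^{12}\ \text{FLOP/s} = 1.44\ \text{TFLOPS} \\
\frac{\text{peak}_{\text{Titan X}}}{\text{peak}_{\text{CPU}}} &= \frac{6}{1.44} \approx 4.2
\end{aligned}
```

So on paper the gap between these two parts is roughly 4x, not 100x; both numbers are theoretical peaks that real workloads only partly reach.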

#4 Efficient Concurrency - how to scale to multiple cores

  • Cilk (Cilk Plus/Cilk++) = "nice threads"
  • General-purpose work-stealing scheduler, simple "fork-join" primitive
  • Double-ended work queues
  • Supported by GCC and the Intel Compiler
  • Basic building blocks:
    • Hyperobjects
    • cilk_spawn f(x)
    • cilk_sync;
    • cilk_for (int i = 0; i < N; i++)
  • Linear scaling across multi-core CPUs...
  • As long as the data iterated over fits into L3 cache!
    • Large matrix multiplication must be executed over sub-matrices
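As a rough illustration of the last two points, here is a fork-join matrix multiply that works on tile x tile sub-matrices. It uses plain `std::thread` rather than Cilk, since Cilk Plus was deprecated in GCC 7 and removed in GCC 8. Each thread gets a static band of rows instead of being scheduled by a work-stealing runtime, so this only loosely mimics `cilk_for`/`cilk_sync`. The function names and the default tile size (64) and thread count (4) are illustrative assumptions, not tuned values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Rows [row_begin, row_end) of C += A * B for row-major n x n matrices,
// processed as tile x tile sub-matrices so each working set stays small.
void matmul_tiled_rows(const std::vector<float>& A, const std::vector<float>& B,
                       std::vector<float>& C, std::size_t n, std::size_t tile,
                       std::size_t row_begin, std::size_t row_end) {
    for (std::size_t ii = row_begin; ii < row_end; ii += tile)
        for (std::size_t kk = 0; kk < n; kk += tile)
            for (std::size_t jj = 0; jj < n; jj += tile)
                // Multiply one pair of sub-matrices and accumulate into C.
                for (std::size_t i = ii; i < std::min(ii + tile, row_end); ++i)
                    for (std::size_t k = kk; k < std::min(kk + tile, n); ++k) {
                        const float a = A[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

// Fork-join: each thread owns a disjoint band of rows of C (like the
// iterations of a cilk_for), and join() plays the role of cilk_sync.
void matmul_parallel(const std::vector<float>& A, const std::vector<float>& B,
                     std::vector<float>& C, std::size_t n,
                     std::size_t tile = 64, unsigned threads = 4) {
    assert(A.size() == n * n && B.size() == n * n && tile > 0 && threads > 0);
    C.assign(n * n, 0.0f);
    const std::size_t band = (n + threads - 1) / threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = t * band;
        const std::size_t end = std::min(n, begin + band);
        if (begin >= end) break;  // fewer rows than threads
        workers.emplace_back(matmul_tiled_rows, std::cref(A), std::cref(B),
                             std::ref(C), n, tile, begin, end);
    }
    for (auto& w : workers) w.join();  // wait for every band (the "sync")
}
```

Because each thread writes only its own rows of `C`, no locking is needed. The tile size is the knob that keeps the blocks being multiplied resident in cache instead of streaming whole rows and columns through it.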

#5 2-layer Memory Buffer

  • Throughput-oriented: forward pass only
  • Statically allocate two large matrix buffers (input & output)
    • Before the network's execution
  • Swap them between layers
  • Memory use is bounded by the largest layer
  • No dynamic allocation, less cache thrashing, and less OS paging
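A minimal sketch of the two-buffer scheme, assuming each layer can be described by its output size and a forward function; `Layer` and `run_forward` are hypothetical names, not from the talk. Both buffers are allocated once, sized for the largest activation, and the input/output pointers are swapped after every layer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical layer description: output size plus a function that reads
// `in_size` values from `in` and writes `out_size` values to `out`.
struct Layer {
    std::size_t out_size;
    std::function<void(const float* in, std::size_t in_size, float* out)> forward;
};

// Inference-only forward pass using two buffers allocated up front.
// Each buffer is sized for the largest activation, so memory is bounded by
// the largest layer and nothing is allocated inside the layer loop.
std::vector<float> run_forward(const std::vector<Layer>& layers,
                               const std::vector<float>& input) {
    std::size_t max_size = input.size();
    for (const auto& l : layers) max_size = std::max(max_size, l.out_size);
    assert(max_size > 0);

    std::vector<float> buf_a(max_size), buf_b(max_size);
    std::copy(input.begin(), input.end(), buf_a.begin());
    float* in = buf_a.data();
    float* out = buf_b.data();
    std::size_t in_size = input.size();

    for (const auto& l : layers) {
        l.forward(in, in_size, out);
        std::swap(in, out);  // this layer's output is the next layer's input
        in_size = l.out_size;
    }
    return std::vector<float>(in, in + in_size);
}
```

Since `in` and `out` always point at different buffers, a layer never reads and writes the same memory; the price is that each layer must produce its full output before the swap.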


  • Bind threads to cores (eliminates NUMA issues)


  • Writing multi-core code is hard: "one weird trick" doesn't exist.
  • Combining algorithmic speed-ups with "piercing" through logical abstraction layers saves computation
  • General purpose frameworks don't fully utilize peak FLOPS
  • Not polluting L3 cache is crucial
  • SIMD/vectorization is hard
  • Cilk simplifies parallelization
  • One GPU is not 100x faster than one CPU
    • With careful engineering, performance is on par!
    • These techniques apply to training as well

Data Architectures for AI/Bot driven applications

Venkata Tatavarty, CTO, Context360

Today, smart apps are making technology disappear into the background.

How to build a bot in 2 weeks?

  1. Fintech / Bank API
  2. NLU API
  3. Set rules: use context to create an AI-based predictive market, like the diagram below:

Bot architecture of C360

C360 Tech Platform

User Interaction Types for Bots

  1. User-initiated requests
    • Specific, historical data, budgeting, etc.
    • E.g.
      • How did I spend [Context] in January [Parameter: time]?
      • How much do I spend [Context] on Amazon [Parameter: business] every month [Parameter: time]?
  2. Assistant-initiated (through notifications)
    • Suggest tips, provide contextually relevant information
    • E.g.
      • How much will I save [Context] at the end of the year [Parameter: time]?
  3. Reminders (through notifications, reminders)
    • Set up by the user, triggered when an event happens
    • E.g.
      • Remind me to pay Manoj $30 when we are together [Context: set an IFTTT rule with Context360]
      • Remind me not to spend more than $20 when I am at Cheesecake Factory [Context: set an IFTTT rule with Context360]
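The reminder type boils down to an IFTTT-style rule: IF a matching context event occurs, THEN deliver the reminder. A minimal sketch follows; the event types (`arrives_at`, `together_with`) and the `ContextEvent`, `ReminderRule`, and `fire` names are hypothetical and do not reflect Context360's actual platform or API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical context event, e.g. emitted when the user arrives at a place
// or is detected together with a contact.
struct ContextEvent {
    std::string type;   // e.g. "arrives_at", "together_with"
    std::string value;  // e.g. "Cheesecake Factory", "Manoj"
};

// Hypothetical user-defined rule: IF an event with this type and value
// happens, THEN deliver `reminder`.
struct ReminderRule {
    std::string trigger_type;
    std::string trigger_value;
    std::string reminder;
};

// Returns the reminders fired by one incoming context event.
std::vector<std::string> fire(const std::vector<ReminderRule>& rules,
                              const ContextEvent& event) {
    std::vector<std::string> fired;
    for (const auto& r : rules)
        if (r.trigger_type == event.type && r.trigger_value == event.value)
            fired.push_back(r.reminder);
    assert(fired.size() <= rules.size());
    return fired;
}
```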

Potential Usages of Context Data

  • At a {place/category}
  • When leaves a {place/category}
  • When arrives at a {place/category}
  • Is on business travel
  • When together with {name}
  • Owns/Uses a {device}
  • In the market for buying {name}
  • Is a {person}
  • Set a walk-in geofence at {lat, lng}
  • And more: 200+ activities / 50+ personas @ Context360
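For the "walk-in geofence at {lat, lng}" usage, one simple approach (an assumption, not necessarily how Context360 implements it) is to model the geofence as a circle, measure distance with the haversine formula on a spherical Earth, and fire when consecutive location fixes go from outside to inside. `Geofence`, `inside`, and `walked_in` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

namespace {
constexpr double kPi = 3.14159265358979323846;
constexpr double kEarthRadiusM = 6371000.0;  // mean Earth radius (spherical model)

double to_rad(double deg) { return deg * kPi / 180.0; }
}  // namespace

// Hypothetical circular geofence around (lat_deg, lng_deg).
struct Geofence {
    double lat_deg;
    double lng_deg;
    double radius_m;
};

// Great-circle distance in meters via the haversine formula.
double haversine_m(double lat1, double lng1, double lat2, double lng2) {
    const double s1 = std::sin(to_rad(lat2 - lat1) / 2.0);
    const double s2 = std::sin(to_rad(lng2 - lng1) / 2.0);
    const double a = s1 * s1 + std::cos(to_rad(lat1)) * std::cos(to_rad(lat2)) * s2 * s2;
    return 2.0 * kEarthRadiusM * std::asin(std::sqrt(std::min(1.0, a)));  // clamp rounding
}

bool inside(const Geofence& g, double lat, double lng) {
    assert(g.radius_m >= 0.0);
    return haversine_m(g.lat_deg, g.lng_deg, lat, lng) <= g.radius_m;
}

// A "walk-in" fires on the transition from outside to inside between two fixes.
bool walked_in(const Geofence& g, double prev_lat, double prev_lng,
               double lat, double lng) {
    return !inside(g, prev_lat, prev_lng) && inside(g, lat, lng);
}
```

Requiring an outside-to-inside transition, rather than just "is inside", keeps the event from firing on every location update while the user stays within the fence.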

Bullet: A Real Time Data Query Engine

Michael Natkovich, Director, Yahoo Inc. & Akshai Sarma, Software Engineer, Yahoo/Oath


Bullet is a real-time query engine that lets you run queries on very large data streams. It's special because it is a look-forward query system: queries are submitted first, and they operate only on data that arrives after the query is submitted. The system is open source, with documentation.

This example gets the top 3 most popular type values. (More examples here.)
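To illustrate the look-forward semantics, here is a toy single-process version of a "top 3 most popular type values" query. It is not Bullet's API or query format: `Record` and `TopKQuery` are hypothetical names, and counts are kept exactly in a `std::map`, whereas an engine running over very large streams would likely need approximate sketches. The key point is that the query only counts records that arrive after it was submitted and before it expires.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stream record.
struct Record {
    long long ts_ms;   // arrival time in milliseconds
    std::string type;  // field being grouped on
};

// A look-forward query: it sees only records arriving in
// [submitted_ms, submitted_ms + duration_ms), never historical data.
class TopKQuery {
public:
    TopKQuery(long long submitted_ms, long long duration_ms, std::size_t k)
        : start_(submitted_ms), end_(submitted_ms + duration_ms), k_(k) {
        assert(duration_ms > 0 && k > 0);
    }

    // Called for every record flowing through the stream.
    void consume(const Record& r) {
        if (r.ts_ms >= start_ && r.ts_ms < end_) ++counts_[r.type];
    }

    // Top-k (type, count) pairs, most frequent first; ties broken by name.
    std::vector<std::pair<std::string, long long>> result() const {
        std::vector<std::pair<std::string, long long>> v(counts_.begin(), counts_.end());
        std::sort(v.begin(), v.end(), [](const auto& a, const auto& b) {
            return a.second != b.second ? a.second > b.second : a.first < b.first;
        });
        if (v.size() > k_) v.resize(k_);
        return v;
    }

private:
    long long start_, end_;
    std::size_t k_;
    std::map<std::string, long long> counts_;
};
```

A record that arrived before submission (or after the query's duration ends) never contributes, which is the opposite of a conventional database query over stored data.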


