Machine Learning Using Google Technology

Post on 22-May-2020


Machine Learning Using Google Technology https://machinelearning.group/

Thanks to our sponsors:

▸ About Us & Upcoming Events
▸ ML on the Google Cloud (Google)
▸ Kaggle and TensorFlow (MLG)
  • Vision Contest
▸ Q & A

Hands-on Knowledge Community

✓ Group Projects
✓ Kaggle Competitions

✓ Hackathons
✓ Individual Projects

✓ Local ML Community

✓ Personal Networking
✓ Industry Exposure
✓ Field Trips

✓ Study Groups
✓ Intros to concepts
✓ Detailed presentations
✓ Subject series
✓ Step-by-step tutorials

2017

▸ Create an algorithm to distinguish dogs from cats
  • https://www.kaggle.com/c/dogs-vs-cats

• Puppy vs. Cat (2006): https://www.youtube.com/watch?v=7bcV-TL9mho

• Surprised Kitty (2009): https://www.youtube.com/watch?v=0Bmhjf0rKe8

• Google’s Artificial Brain Learns to Find Cat Videos (2012): https://www.wired.com/2012/06/google-x-neural-network/

“The researchers started with a Google-developed algorithm primed to differentiate cats from dogs. Then they fed it images from medical databases …”

https://www.wired.com/2017/01/look-x-rays-moles-living-ai-coming-job/

▸ Datasets & Real-World Problems
▸ Prizes & Recruitment
  • Leaderboards
  • Rankings
  • Medals

▸ Google purchased Kaggle in March 2017
  • Now part of Google's Cloud division

▸ Example competitions:
  • Right Whale Detection (Cornell)
  • Predicting Seizures from Brain Waves (NIH)
  • Passenger Screening Algorithm (DHS)
  • Home Price Prediction (Zillow)
  • Hotel Recommendations (Expedia)
  • West Nile Virus Prediction
  • Shelter Animal Outcomes
  • Kobe Bryant Shot Selection

▸ Discussions
▸ Code sharing via Kernels
  • Python, R, Julia, SQLite

“Kaggle grandmasters say they’re driven as much by a compulsion to learn as to win.”

From: https://www.wired.com/story/solve-these-tough-data-problems-and-watch-job-offers-roll-in/

▸ A library for developing ML applications
▸ Open-sourced by Google's Brain team in Nov 2015
  • https://www.tensorflow.org/

▸ Fast & Flexible
  • Core is very fast (optimized C++ & CUDA)
  • Works on embedded devices

▸ Low-level
  1. Construct a graph
  2. Drive computation by feeding data
     (e.g. train the model or get results from it)

Step 1: Define your Graph/Model

How? Using:

A. Tensors
   - N-dimensional arrays

B. Operations (aka Ops)
   - Take 0 or more Tensors
   - Return 0 or more Tensors

C. Variables (they maintain state)
   - Must be explicitly initialized
   - Contain tensors
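The three building blocks above can be mimicked in plain Python. This toy sketch is an analogy, not the TensorFlow API; the class and function names are invented to illustrate the role each piece plays:

```python
# Toy illustration of TensorFlow's building blocks (not the real API).

# A. A "tensor" is just an N-dimensional array; here, nested lists.
tensor_1d = [1.0, 2.0, 3.0]           # shape (3,)
tensor_2d = [[1.0, 2.0], [3.0, 4.0]]  # shape (2, 2)

# B. An "op" takes zero or more tensors and returns zero or more tensors.
def add_op(a, b):
    """Element-wise addition of two 1-D tensors."""
    return [x + y for x, y in zip(a, b)]

# C. A "variable" maintains state and must be explicitly initialized
#    before use, as the slide notes.
class Variable:
    def __init__(self, initial_value):
        self._initial = initial_value
        self.value = None  # not usable until initialized

    def initialize(self):
        self.value = list(self._initial)

weights = Variable([0.5, 0.5, 0.5])
weights.initialize()  # explicit initialization, as in TensorFlow
print(add_op(tensor_1d, weights.value))  # -> [1.5, 2.5, 3.5]
```

The point of the analogy: ops are pure functions of tensors, while variables are the only stateful nodes, which is why TensorFlow requires initializing them before a session can run.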

Step 2: Drive Computation (e.g. Get a Classification Result)

a. Feeds (send custom input)
   • Patch a tensor into any operation
   • Placeholder operation (using tf.placeholder())

b. Fetches (get output)
   • Get the outputs of operations by passing in the tensors to retrieve (using session.run())
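The feed/fetch pattern above can be sketched with a toy "session" in plain Python. This mimics TensorFlow 1.x's deferred execution (build the graph first, run it later with tf.placeholder / session.run); it is not the real API, and the node classes are invented for illustration:

```python
# Toy deferred-execution sketch: build a graph first, run it later.
# This mimics the tf.placeholder / session.run pattern; it is NOT TensorFlow.

class Placeholder:
    """A node whose value is supplied at run time via a feed."""
    pass

class MulOp:
    """An op node: multiplies the values of its two input nodes."""
    def __init__(self, a, b):
        self.a, self.b = a, b

class Session:
    def run(self, fetch, feed_dict):
        """Evaluate the fetched node, pulling placeholder values
        from feed_dict (the 'feed')."""
        if isinstance(fetch, Placeholder):
            return feed_dict[fetch]
        if isinstance(fetch, MulOp):
            return (self.run(fetch.a, feed_dict) *
                    self.run(fetch.b, feed_dict))
        raise TypeError("unknown node type")

# Step 1: construct the graph (no computation happens yet).
x = Placeholder()
y = MulOp(x, x)  # y = x * x

# Step 2: drive computation by feeding data and fetching outputs.
sess = Session()
print(sess.run(y, feed_dict={x: 3.0}))  # -> 9.0
```

Note that building `y` computes nothing; all work is deferred until `run`, which is exactly the two-step graph/feed split the slides describe.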

If you try to ignore how it works under the hood because “TensorFlow automagically makes my networks learn”, you will not be ready to wrestle with the dangers it presents, and you will be much less effective at building and debugging neural networks.

- Andrej Karpathy
https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b

▸ Competition:
  • https://www.kaggle.com/c/dogs-vs-cats
▸ Our solution:
  • https://www.kaggle.com/kbhits/tensorflow-starter-kit-fixed

▸ Knowledge Based Kaggle (more cooperative)
▸ Good ConvNet baseline available
▸ Get hands-on experience with:
  • Python
  • Kaggle and community feedback
  • Data manipulation (normalization, splitting, augmentation)
  • ML concepts
  • TensorFlow library
  • ConvNets

▸ Standardize on
  • Python (3.5 required for Windows)
  • TensorFlow (Google’s ML library)
▸ Choice of
  • Operating system (Mac, Windows, Linux)
  • Python package manager (e.g. Anaconda)
  • Python IDE (PyCharm, LiClipse, VIM, …)

▸ Part of the free Udacity Deep Learning course by Google:
  • https://www.udacity.com/course/deep-learning--ud730

▸ Logistic Regression: ‘Name the Letter’ (A-J)
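The 'name the letter' exercise uses multinomial logistic regression: compute scores W·x + b, turn them into probabilities with softmax, and pick the most likely letter. A stdlib-only sketch follows; the pixel values, weights, and biases are made-up numbers, since a real model learns them from data:

```python
import math

# Logistic-regression sketch for a 'name the letter' style classifier.
# All numeric values below are invented for illustration.

LETTERS = "ABCDEFGHIJ"  # the ten notMNIST classes

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(pixels, weights, biases):
    """scores = W.x + b, then softmax, then argmax over letters."""
    scores = [sum(w * p for w, p in zip(row, pixels)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    return LETTERS[probs.index(max(probs))], probs

# Tiny fake example: 3 'pixels', 10 classes.
pixels = [0.2, 0.9, 0.1]
weights = [[0.1 * i, 0.2, -0.1] for i in range(10)]  # made-up weights
biases = [0.0] * 10
letter, probs = predict(pixels, weights, biases)
print(letter)
print(abs(sum(probs) - 1.0) < 1e-9)  # probabilities sum to 1
```

The same softmax layer reappears later as the final layer of the ConvNet; logistic regression is just that layer with no hidden layers in front of it.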

▸ Goal
  • Create a model that can accurately classify the images

▸ Data included:
  • 12,500 cat pictures in the training set
  • 12,500 dog pictures in the training set
  • 12,500 dog/cat pictures in the test set

▸ Noticeable differences from notMNIST:
  • Only two classes
  • Different sizes and aspect ratios
  • RGB instead of greyscale

Step 1: Create the Graph
  • Layers
    - 3x: ConvNet + Max Pooling
    - Feature maps: 32, 32, 64
    - 2x: Fully Connected Layers
  • Activation functions: ReLU & Softmax
  • Loss function: Cross-Entropy
  • Optimization: RMSPropOptimizer

Step 2: Normalize & Partition data

Step 3: Train model using training set

Step 4: Once trained, show accuracy using test set
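Steps 2-4 above (normalize, partition, then measure accuracy on held-out data) can be sketched with stdlib Python. The fake pixel data and the trivial majority-label "model" here are invented stand-ins, just to make the pipeline shape concrete:

```python
import random

random.seed(0)  # reproducible fake data

# Fake dataset: 100 'images', each 4 pixel values in 0..255,
# with a 0/1 label (cat/dog). All values are invented.
data = [([random.randint(0, 255) for _ in range(4)], random.randint(0, 1))
        for _ in range(100)]

# Step 2a: normalize pixel values into [0, 1].
def normalize(pixels):
    return [p / 255.0 for p in pixels]

data = [(normalize(px), label) for px, label in data]

# Step 2b: partition into training and validation splits (80/20).
random.shuffle(data)
split = int(0.8 * len(data))
train, valid = data[:split], data[split:]

# Steps 3-4: 'train' a trivial stand-in model (always predict the
# majority training label), then report accuracy on the held-out split.
majority = round(sum(label for _, label in train) / len(train))

def accuracy(dataset, predict):
    hits = sum(1 for px, label in dataset if predict(px) == label)
    return hits / len(dataset)

print(len(train), len(valid))  # -> 80 20
print(accuracy(valid, lambda px: majority))
```

A real run would replace the majority-label stand-in with the trained ConvNet from Step 1, but the normalize/split/evaluate scaffolding stays the same.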

▸ Competition:
  • https://www.kaggle.com/c/digit-recognizer
▸ Our solution:
  • https://github.com/DavidsonMachineLearningGroup/DigitRecognizer

http://imgaug.readthedocs.io/en/latest/index.html
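Data augmentation, the purpose of the imgaug library linked above, generates extra training examples by transforming existing ones. A minimal stdlib sketch of one such transform, a horizontal flip of an image stored as rows of pixels (imgaug itself offers many more: crops, rotations, noise, and so on):

```python
# Minimal data-augmentation sketch: horizontal flip of an image
# represented as a list of pixel rows.

def horizontal_flip(image):
    """Mirror each row left-to-right, producing a new training example."""
    return [list(reversed(row)) for row in image]

image = [
    [1, 2, 3],
    [4, 5, 6],
]
flipped = horizontal_flip(image)
print(flipped)  # -> [[3, 2, 1], [6, 5, 4]]
```

A flipped cat is still a cat, so the label is unchanged; this is why augmentation effectively enlarges the 12,500-image training sets for free.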

▸ Free Udacity Course
  • https://www.udacity.com/course/deep-learning--ud730
▸ Amazing ConvNet Blog
  • http://cs231n.github.io/convolutional-networks/

https://machinelearning.group/