Introduction to Data Science and Machine Learning Featuring Python
Overview
Data Science is an exciting discipline that leverages Machine Learning and Artificial Intelligence to help decision makers turn raw data into understanding, insight and actionable options. With the enormous volume and variety of data being created and collected daily, Data Science is one of today’s fastest-growing and most critically important fields for businesses, organizations and government. Data Scientists are in demand in both industry and the public sector, with robust job growth expected well into the next decade.
Course Description:
This hands-on, project-based course is an introduction to the Python programming language as well as to higher-level Microsoft Excel features. Students will learn the fundamentals of problem solving and algorithms, as well as how to use the leading development environments for tackling Data Science challenges and building real-world applications with Machine Learning capabilities. This course provides strong foundational knowledge for advancing to the Data Science and Artificial Intelligence Practitioner programs offered through our IBM Skills Academy.
Objective:
This course is for those who are interested in learning more about data science and/or pursuing a career in data science and machine learning. Learners will gain foundational knowledge of the technology's capabilities and insight into the benefits and power of Data Science and Machine Learning.
Additionally, this course provides a pathway and prerequisite to our IBM Skills Academy Data Science Practitioner program. Participants will learn ways to develop a competitive edge and how the right metrics can help to achieve strategic business goals.
Prerequisites:
Solid foundational knowledge of MS Excel and familiarity with a programming language.
Target Audience:
Information Architects, Data Analysts, Statisticians, Developers, Business Intelligence professionals, Business Analysts, Big Data specialists, Coders, Web Developers, learners interested in Predictive Analytics, and anyone looking to expand their skills and/or advance their career by learning these valuable and in-demand knowledge areas.
Course Outline:
Weeks 1 – 5
Week 1: Getting Started with Google Colab; Variables; Conditional Logic
- Session 1: Getting started with Jupyter Notebook in Google Colab; Python Variables: Character (String), Integer, Float and Logical (Boolean) Datatypes; Programming Best Practices
- Session 2: Conditional Logic (if, elif, else); Python Lists: Index and Item
Week 2: For Loops and While Loops; Math Operators; Random Numbers
- Session 3: For Loops; for range; Math Operators: +, -, *, /, % (modulus), min, max; Concatenation and changing variable datatypes
- Session 4: While Loops and Counter Variables; Skipping items in a loop and ending a loop early; Random number module
Week 3: List and String Manipulations; User Input
- Session 5: Manipulating Lists and Strings; List and String Methods: Append, Pop, Join, Split; Tuples and Sets vs. Lists
- Session 6: Working with User Input; Requiring User Input each time through a Loop
Week 4: Python Dictionaries & Functions
- Session 7: Python Dictionaries; Converting JSON format to Dictionary
- Session 8: Defining a Python Function; Adding Parameters when Defining a Function; Supplying Arguments when Calling a Function; Return Values
Week 5: Arrays and Data Frames with NumPy and Pandas Modules
- Session 9: NumPy Module for Array Manipulations; Multi-Dimensional (2D, 3D) Arrays
- Session 10: Pandas Module: Data Frames; Importing Data as CSV (Comma Separated Values)
Weeks 6 – 10
Week 6: Data Visualization with Matplotlib and Seaborn Modules
- Session 11: Matplotlib Module for Data Visualization; Plotting Data with Matplotlib Pyplot
- Session 12: Seaborn Module for Data Visualization
Week 7: Object Oriented Programming
- Session 13: Object Oriented Programming (OOP); Classes, Objects, Methods and Properties; Defining a Python Class
- Session 14: Defining a Python Class, continued; Importing a Class as a Module
Week 8: Data Science: Working with Sales and Customer Review Data
- Session 15: Data Science Project 1: Working with Sales Data; Loading Sales CSV file from shared Google Drive; Exploring and Visualizing the Data
- Session 16: Data Science Project 2: Working with Customer Review Data; Loading Customer Review CSV file from shared Google Drive; Exploring and Visualizing the Data
Week 9: Data Science: Advanced Data Exploration and Visualization
- Session 17: Data Science Project 3: Advanced Data Exploration; Plotting Multiple Variables on an x-y graph with color coding; Deriving Insight from Data
- Session 18: Data Science Project 4: Advanced Data Visualization; The art and science of visually compelling yet clear data visualizations
Week 10: Animated Bar Chart Race as Data Visualization
- Session 19: Animated Bar Chart Race: Population rankings changing over time
- Session 20: Animated Bar Chart Race: All-Time HR leaders changing over time
Weeks 11 – 15
Week 11: Introduction to Machine Learning & Neural Networks
- Session 21: Intro to principles of Artificial Neural Networks and Supervised Learning; MNIST hand-written digits dataset classifier, part 1
- Session 22: Models: Building, Training and Evaluating a Neural Network; MNIST hand-written digits dataset classifier, part 2
Week 12: Machine Learning Regression: Predicting Future Prices
- Session 23: Principles of Regression in Data Science and Machine Learning; Plotting regression lines in a scatterplot
- Session 24: Working with Facebook Prophet (fbprophet); Predicting future prices for commodities
Week 13: Machine Learning for Natural Language Processing (NLP)
- Session 25: Loading, Manipulating and Analyzing Sentiment in Text Strings; Removing Punctuation and Non-Keywords from Strings; The nltk module (Natural Language Toolkit)
- Session 26: Building an ML Model which evaluates text for Sentiment; Training, Testing and Evaluating the ML Model
Week 14: Machine Learning for Computer Vision (CV)
- Session 27: Computer Vision modules and image manipulations; The OpenCV and PIL (Python Imaging Library) modules
- Session 28: Building an Image Classifier using Supervised Learning; Is it a Dog or a Cat: Training and Testing an ML Model; Conclusion and direction for further study
Week 15: YOLO (You Only Look Once) Computer Vision Engine
- Session 29: An overview of the YOLO (You Only Look Once) object detection engine
- Session 30: Training, Testing and Evaluating a YOLO Model
FAQs
What is Data Science?
Data science is an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge. It uses analytics and machine learning to help users make predictions, enhance optimization, and improve operations and decision-making. The goal of “Introduction to Data Science and Machine Learning Featuring Python” is to help you learn the most important tools in Python that will allow you to do data science. As you progress through this course, you’ll learn how to approach a variety of data science challenges, using the best parts of Python.
Why is Data Science Important?
Data is one of the most important assets in every organization because it helps business leaders make decisions based on facts, statistics and trends. The importance of data science lies in its ability to take existing data that is not necessarily useful on its own and combine it with other data points to generate insights an organization can use to learn more about its customers and audience.
Today’s data science teams are expected to answer many questions. Businesses demand better prediction and optimization based on real-time insights.
With the volume and variety of social, mobile and device data, along with new technologies and tools, data science today plays a broader role than ever before. Businesses consider data science and AI to be a technology-enabled strategy.
Are there jobs available in Data Science?
The short answer is yes. Data science is one of the fastest-growing fields today, and its growth is expected to continue into the next decade. As new fields continue to emerge, the importance of data science is increasing rapidly. Its effect can be observed in multiple sectors such as retail, healthcare, government, finance and education.
Data science has become an important part of almost every sector, providing solutions that help meet the challenges of ever-increasing demand and a sustainable future. As the importance of data science grows, so does the need for data scientists. If you have the skills, there are jobs available. In addition, those already in technical careers (e.g. programming) can climb the career ladder by adding skills such as those of a data science practitioner.
What about non-technical or leadership roles in Data Science?
As the growth of data accelerates, so does the importance of data science and the teams of data scientists formed to turn this data into useful information, insight and knowledge. While companies prepare for big data integration, business leaders need to adapt their roles as team leaders for their data science employees. Your data science team should have the expertise to process data with freedom, but business leaders still need to understand the basic structures of what’s happening to create value from that data.
Why is this important for you or your organization? A New Era of Business Leader
In today’s business environment, there is no situation where it’s okay for a leader to say, “I don’t know what’s going on, but my team does, and that’s good enough.” Yet many business leaders don’t know the most basic principles of data science. Business leaders (managers, directors, executives, vice presidents, etc.) don’t need to know the intimate details of data science processes, but as the line between big data and business operations disappears, it’s more important than ever for them to speak (and understand) a little data science. This translates into having some basic foundational knowledge.
Why it’s important to understand the basics:
Data science can be good storytelling, but it is still science. Telling a story can often obscure the facts or make links where there aren’t any. Having the foundational knowledge or basic proficiency can help you avoid:
- Getting taken – manipulated data, an incomplete story or targeted information gaps can all make it easier to coerce or persuade you into a bad decision.
- Asking the wrong questions – data pulls are only as good as the questions you’re asking. Data must be evaluated regularly and that requires starting with the right question(s).
- Replicating bias – data is neutral, but its aggregation and results are often the product of our preconceived ideas. Understanding the basics of data science helps you sort out the messiness of data in the real world.
Summer/Fall 2022
Location: Online (virtual/remote)
Dates: July 19th – October 27th (Tuesday & Thursday evenings)
Time: 6:00 pm – 8:00 pm
Catalogue #: CE-COMP 2239
Class #: 9536
Cost: $2,450.00
How to Register:
Register over the phone using MC, Visa or Discover. Call 914-606-6830, press 1
You will need the Class # when speaking with a representative.
Office hours are Monday – Thursday 8:30 a.m. to 7:15 p.m.
Friday 8:30 a.m. to 4:30 p.m. (in summer, 9:00 a.m. – 12:00 noon)
Saturday 9:00 a.m. to 3:30 p.m. (in summer, closed some Saturdays)
For course questions, please contact:
Jim Irvine, Director of Corporate and Continuing Professional Education 914-606-6658 james.irvine@sunywcc.edu