Introduction to Data Science and Machine Learning Featuring Python
Overview
Data Science is an exciting discipline that leverages Machine Learning and Artificial Intelligence to help decision makers turn raw data into understanding, insight and actionable options. With the enormous volume and variety of data being created and collected daily, Data Science is one of today’s fastest-growing and most critically important fields for businesses, organizations and government. Data Scientists are in demand in both industry and the public sector, with robust job growth expected well into the next decade.
Course Description:
This hands-on, project-based course is an introduction to the Python programming language as well as to SQL database design. Students will learn the fundamentals of problem solving and algorithms, as well as how to use the leading development environments for tackling Data Science challenges and building real-world applications with Machine Learning capabilities. This course will provide strong foundational knowledge for starting a career as a data scientist.
Objective:
After taking this class, students are expected to:
- Understand the fundamentals of the Python programming language and create scripts that interact with data sets, machine learning models and databases
- Interact with data sets in various formats and create meaningful visualizations based on business requirements
- Understand the basics of Python-based machine learning models and how to select the appropriate algorithms based on business requirements
- Gain proficiency with the Google Colab programming tool
- Gain proficiency with the Anaconda Data Science Platform and Jupyter Notebooks
- Gain proficiency with installing SQL databases
- Gain proficiency with SQL database design and writing basic SQL queries
Target Audience:
Information Architects, Data Analysts, Statisticians, Developers, Business Intelligence professionals, Business Analysts, Big Data specialists, Coders, Web Developers, learners interested in Predictive Analytics and anyone looking to expand their skills and/or advance their career by learning these valuable and in-demand knowledge areas.
Course Outline:
Weeks 1 – 5
Week 1: Getting Started with Google Colab; Variables; Conditional Logic
- Session 1: Getting started with Jupyter Notebook in Google Colab; Python Variables: Character, Integers, Floats, Logicals (Booleans) Datatypes; Programming Best Practices
- Session 2: Conditional Logic (if, elif, else); Python Lists: Index and Item
Week 2: For Loops and While Loops; Math Operators; Random Numbers
- Session 3: For Loops; for range; Math Operators: +, -, *, /, % (modulus), min, max; Concatenation and changing variable datatypes
- Session 4: While Loops and Counter Variables; Skipping items in a loop and ending a loop early; random number module
Week 3: List and String Manipulations; User Input
- Session 5: Manipulating Lists and Strings; List Methods: Append, Pop, Join, Split; Tuples and Sets vs. Lists
- Session 6: Working with User Input; Requiring User Input each time through a Loop
Week 4: Python Dictionaries & Functions
- Session 7: Python Dictionaries; Converting JSON format to Dictionary
- Session 8: Defining a Python Function; Adding Parameters when Defining a Function; Supplying Arguments when calling a function; return values
Week 5: Arrays and Data Frames with Numpy and Pandas Modules
- Session 9: Numpy Module for Array Manipulations; Multi-Dimensional (2D, 3D) Arrays
- Session 10: Pandas Module: Data Frames; Importing Data as CSV (Comma Separated Values)
Weeks 6 – 10
Week 6: Data Visualization with Matplotlib and Seaborn Modules
- Session 11: Matplotlib Module for Data Visualization; Plotting Data with Matplotlib Pyplot
- Session 12: Seaborn Module for Data Visualization
Week 7: Object Oriented Programming
- Session 13: Object Oriented Programming (OOP); Classes, Objects, Methods and Properties; Defining a Python Class
- Session 14: Defining a Python Class, continued; Importing a Class as a Module
Week 8: Data Science: Working with Sales and Customer Review Data
- Session 15: Data Science Project 1: Working with Sales Data; Loading Sales CSV file from shared Google Drive; Exploring and Visualizing the Data
- Session 16: Data Science Project 2: Working with Customer Review Data; Loading Customer Review CSV file from shared Google Drive; Exploring and Visualizing the Data
Week 9: Data Science: Data Exploration and Visualization
- Session 17: Data Exploration; Plotting Multiple Variables on an x-y graph with color coding; Deriving Insight from Data
- Session 18: Data Visualization – creating visually compelling yet clear data visualizations
Week 10: SQL Databases
- Session 19: Install MS SQL Server and MySQL Databases (Windows and macOS)
- Session 20: Connecting Jupyter Notebooks to SQL Databases
Weeks 11 – 15
Week 11: Introduction to Machine Learning
- Session 21: Intro to Linear and Logistic Regression
- Session 22: Intro to principles of Artificial Neural Networks
Week 12: Machine Learning Regression: Predicting Future Prices
- Session 23: Principle of Regression in Data Science and Machine Learning; Plotting regression lines in a scatterplot
- Session 24: Predicting future prices for commodities
Week 13: Machine Learning for Natural Language Processing (NLP) and Image Classification
- Session 25: Loading, Manipulating and Analyzing Sentiment in Text Strings; Removing Punctuation and Non-Keywords from Strings
- Session 26: Building an Image Classifier using Supervised Learning
Week 14: Using SQL to Generate Data Sets
- Session 27: Write SQL Queries To Populate Data Sets
- Session 28: Revisit “Working with Sales and Customer Review Data” using SQL
Week 15: SQL Database Design Capstone Project Fundamentals
- Session 29: Design Database Tables Based on IMDB CSV Data
- Session 30: Implement SQL Database with Machine Learning Model
FAQs
What is Data Science?
Data science is an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge. It uses analytics and machine learning to help users make predictions, enhance optimization, and improve operations and decision-making. The goal of this course is to help you learn the most important tools in Python that will allow you to do data science. As you progress through the course, you’ll learn how to approach a variety of data science challenges, using the best parts of Python.
Why is Data Science Important?
Data is one of the most important assets in every organization because it helps business leaders make decisions based on facts, statistics and trends. The importance of data science lies in its ability to take existing data that is not necessarily useful on its own and combine it with other data points to generate insights an organization can use to learn more about its customers and audience.
Today’s data science teams are expected to answer many questions. Businesses demand better prediction and optimization based on real-time insights.
With the volume and variety of social, mobile and device data, along with new technologies and tools, data science today plays a broader role than ever before. Business considers data science and AI to be a technology-enabled strategy.
Are there jobs available in Data Science?
The short answer is yes. Data science is one of the fastest-growing fields today, and that growth is expected to continue into the next decade. As new fields continue to emerge, the importance of data science is increasing rapidly. Its effect can be observed in multiple sectors, such as retail, healthcare, government, finance and education.
It has become an important part of almost every sector, providing solutions that help meet the challenges of ever-increasing demand and a sustainable future. As the importance of data science grows, so does the need for data scientists. If you have the skills, there are jobs available. In addition, those already in technical careers (e.g. programming) can climb the career ladder by adding data science skills.
What about non-technical or leadership roles in Data Science?
As the growth of data accelerates, so does the importance of data science and the teams of data scientists formed to turn this data into useful information, insight and knowledge. While companies prepare for big data integration, business leaders need to adapt their roles as team leaders for their data science employees. Your data science team should have the expertise to process data with freedom, but business leaders still need to understand the basic structures of what’s happening to create value from that data.
Why is this important for you or your organization?
A New Era of Business Leader
In today’s business environment, there is no situation where it’s okay for a leader to say, “I don’t know what’s going on, but my team does, and that’s good enough.” Yet many business leaders don’t know the most basic principles of data science. Business leaders (managers, directors, executives, vice presidents, etc.) don’t need to know the intimate details of data science processes, but as the line between big data and business operations disappears, it’s more important than ever for them to speak (and understand) a little data science. This translates into having some basic foundational knowledge.
Why it’s important to understand the basics:
Data science can be good storytelling but it is still science. Telling a story can often obscure the facts or make links where there aren’t any. Having the foundational knowledge or basic proficiency can help you avoid:
- Getting taken: manipulated data, partial stories and targeted information gaps all make it easier to coerce or persuade you into a bad decision.
- Asking the wrong questions: data pulls are only as good as the questions you’re asking. Data must be evaluated regularly, and that requires starting with the right question(s).
- Replicating bias: data is neutral, but its aggregation and results are often the product of our preconceived ideas. Understanding the basics of data science helps you sort out the messiness of data in the real world.
Fall 2023
Location: Online (virtual/remote)
Dates: November 14 – February 15 (Tuesday / Thursday evenings) skip dates 11/21, 11/23, 12/26, 12/28
Time: 6:00 pm to 8:00 pm
Catalogue #: CE-COMP 2239
Class #: 92247
Cost: $ 2,450.00
How to Register:
Register over the phone using Mastercard, Visa or Discover. Call 914-606-6830 and press 1.
You will need the Class # when speaking with a representative.
Office hours for registration are:
Monday – Thursday 8:30 a.m. to 7:15 p.m.
Friday 8:30 a.m. to 4:30 p.m. (in summer, 9:00 a.m. – 12:00 noon)
Saturday 9:00 a.m. to 3:30 p.m. (in summer, closed some Saturdays)
For course questions, please contact:
Jim Irvine, Director of Corporate and Continuing Professional Education 914-606-6658 james.irvine@sunywcc.edu