Data Science

Data Science with Python Training

The Data Science with Python course is designed to impart in-depth knowledge of the libraries and packages required to perform data analysis, data visualization, web scraping, machine learning, and natural language processing using Python. The course is packed with real-life projects, assignments, demos, and case studies to give participants hands-on, practical experience.

Mastering advanced analytics techniques: The course also covers advanced analytics techniques such as clustering, decision trees, and regression, as well as time series and time series modelling.

As part of the course, you will work on four real-life industry projects on customer segmentation, macro calls, attrition analysis, and retail analysis.

    • Mon – Fri (6 Weeks) | 7:30 AM - 9:30 PM IST (any 2 hours)

    • Sat – Sun (8 Weeks) | 8:30 AM - 10:00 PM IST (any 3 hours)

Why Data Science with Python?

  • Avg. Salary for Data Science with Python Tester: Rs 5,270,432/- per year
  • Global software testing market to reach $50 billion by 2020 – NASSCOM
  • Data Science with Python has a market share of about 27.7%.
  • Used by top companies across various business verticals, e.g. Cognizant, Cigniti, Dell, Luxoft, and Viasat.
  • Data Science with Python Tester in United States can earn $87,000 –

Objective of the course

By the end of this Data Science with Python training course, you will be able to:

  • Gain an in-depth understanding of the data science process, data wrangling, data exploration, data visualization, and hypothesis building and testing. You will also learn the basics of statistics.
  • Install the required Python environment and other auxiliary tools and libraries
  • Understand the essential concepts of Python programming, such as data types, tuples, lists, dicts, basic operators, and functions.
  • Perform high-level mathematical computing using the NumPy package and its large library of mathematical functions
  • Perform scientific and technical computing using the SciPy package and its sub-packages such as integrate, optimize, stats, and io.
  • Perform data analysis and manipulation using the data structures and tools provided in the pandas package
  • Gain expertise in machine learning using the Scikit-Learn package
  • Gain an in-depth understanding of supervised and unsupervised learning techniques such as linear regression, logistic regression, clustering, dimensionality reduction, and K-NN, and learn to chain them in pipelines
  • Use the Scikit-Learn package for natural language processing
  • Use Python's matplotlib library for data visualization
  • Extract useful data from websites by performing web scraping using Python
  • Integrate Python with Hadoop, Spark, and MapReduce

Who should take the course?

This Data Science with Python certification training course is suitable for:

  • Analytics professionals who want to work with Python
  • Software professionals looking for a career switch in the field of analytics
  • IT professionals interested in pursuing a career in analytics
  • Graduates looking to build a career in Analytics and Data Science
  • Experienced professionals who would like to harness data science in their fields
  • Anyone with a genuine interest in the field of Data Science

Data Science with Python Training Course Syllabus

This course will introduce you to the field of data science and will prepare you for the next three courses in the MicroMasters: Statistics, Machine Learning, and Spark. To conduct data analysis, you'll learn a collection of powerful, open-source tools, including:

  • Python
  • Jupyter notebooks
  • pandas
  • numpy
  • matplotlib
  • scikit-learn
  • NLTK

You will learn:

  • The basic process of data science
  • Python and Jupyter notebooks
  • An applied understanding of how to manipulate and analyze uncurated datasets
  • Basic statistical analysis and machine learning methods
  • How to effectively visualize results

Introduction: Welcome and overview of the course. Introduction to the data science process and the value of learning data science.

Background: In this optional topic, we provide a brief background in Python or Unix to get you up and running.

Jupyter and Numpy: Jupyter notebooks are one of the most commonly used tools in data science because they let you combine your research notes with the code for the analysis. After getting started in Jupyter, we'll learn how to use numpy for data analysis. It offers many useful functions for processing data, as well as data structures that are efficient in both time and space.
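
As a rough illustration of what "time and space efficient" means in practice, the sketch below applies arithmetic and a filter to a whole numpy array at once instead of looping in Python. The temperature values and variable names are made up for the example.

```python
import numpy as np

# Hypothetical daily temperatures in Celsius (illustrative values only).
celsius = np.array([18.5, 21.0, 19.5, 24.0, 22.0])

# Vectorized arithmetic: the conversion is applied to every element at once,
# with no explicit Python loop.
fahrenheit = celsius * 9 / 5 + 32

# Built-in reductions summarize the array.
mean_c = celsius.mean()

# A comparison yields a boolean array; summing it counts the True values.
warm_days = int((celsius > 20).sum())

print(fahrenheit)
print(mean_c, warm_days)
```

The same code runs unchanged in a Jupyter notebook cell, where the printed output appears directly under the cell.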

Pandas: Pandas, built on top of numpy, adds data frames which offer critical data analysis functionality and features.
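
A minimal sketch of the kind of data frame work this topic refers to is shown here. The table, column names, and values are hypothetical and only serve to show column arithmetic, row filtering, and grouping.

```python
import pandas as pd

# Hypothetical sales records (illustrative values only).
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "units": [10, 4, 7, 12, 6],
    "unit_price": [2.5, 3.0, 2.5, 4.0, 3.0],
})

# Column arithmetic works element-wise, as with the underlying numpy arrays.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Boolean indexing selects rows that satisfy a condition.
large_orders = sales[sales["units"] >= 7]

# groupby aggregates a column by the labels in another column.
revenue_by_region = sales.groupby("region")["revenue"].sum()

print(large_orders)
print(revenue_by_region)
```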

Visualization: When working with large datasets, you often need to visualize your data to gain a better understanding of it. Also, when you reach conclusions about the data, you'll often wish to use visualizations to present your results.
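
The two uses mentioned above, exploring data and presenting results, might look like the following matplotlib sketch. The data is synthetic, and the output file name overview.png is an arbitrary choice for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: y depends linearly on x, plus some noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

# A histogram gives a quick view of how one variable is distributed.
ax_hist.hist(x, bins=20)
ax_hist.set_title("Distribution of x")
ax_hist.set_xlabel("x")
ax_hist.set_ylabel("count")

# A scatter plot shows the relationship between two variables.
ax_scatter.scatter(x, y, s=10)
ax_scatter.set_title("y against x")
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")

fig.tight_layout()
fig.savefig("overview.png")  # hypothetical output file name
```

In a Jupyter notebook the figure renders inline, so the matplotlib.use("Agg") line and the savefig call can be dropped.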

Mini Project: With Jupyter notebooks, numpy, pandas, and visualization tools in hand, you're ready to do sophisticated analysis on your own. For this first project, you'll pick a dataset we've already worked with and perform an analysis.

Machine Learning: To take your data analysis skills one step further, we'll introduce you to the basics of machine learning and how to use scikit-learn, a powerful library for machine learning.
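
To give a feel for the scikit-learn workflow, here is a small sketch that splits a dataset, fits a model, and scores it on held-out data. The iris dataset, the k-nearest neighbors classifier, and the 75/25 split are assumptions chosen for brevity, not necessarily what the course uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to estimate how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit on the training split only, then score on the unseen test split.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(f"test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators follow the same fit and score (or predict) pattern, so swapping in a different model usually means changing only the line that constructs it.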

Working with Text and Databases: You'll often find yourself working with text data or data from databases. This topic will give you the skills to access that data. For text data, we'll also give you a preview of how to analyze it using ideas from the field of Natural Language Processing, and how to apply those ideas using the Natural Language Toolkit (NLTK) library.
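
The sketch below combines both halves of this topic: it reads text out of a database and then counts words in it. To stay self-contained it uses Python's built-in sqlite3 module and a naive regular-expression tokenizer as a stand-in for NLTK. The reviews table and its contents are hypothetical.

```python
import re
import sqlite3
from collections import Counter

# In-memory database with a hypothetical "reviews" table (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO reviews (body) VALUES (?)",
    [
        ("Great course, great projects.",),
        ("The projects were hard but useful.",),
        ("Useful examples and clear notes.",),
    ],
)

# Pull the text column back out with a plain SQL query.
rows = conn.execute("SELECT body FROM reviews ORDER BY id").fetchall()
conn.close()

# Naive tokenization: lowercase the text, then keep runs of letters.
tokens = []
for (body,) in rows:
    tokens.extend(re.findall(r"[a-z]+", body.lower()))

# Count how often each word occurs across all reviews.
word_counts = Counter(tokens)
print(word_counts.most_common(3))
```

A library such as NLTK replaces the regular expression with proper tokenizers and adds steps like stop-word removal and stemming, but the overall flow of query, tokenize, and count stays the same.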

Final Project: This topic lets you showcase all your new skills in an end-to-end data analysis project. You'll pick the dataset, do the data munging, ask the research questions, visualize the data, draw conclusions, and present your results.


Machine Learning Using Python Syllabus


  • Machine Learning Languages, Types, and Examples
  • Machine Learning vs Statistical Modelling
  • Supervised vs Unsupervised Learning
  • Supervised Learning Classification
  • Unsupervised Learning
  • K-Nearest Neighbors
  • Decision Trees
  • Random Forests
  • Reliability of Random Forests
  • Advantages & Disadvantages of Decision Trees
  • Regression Algorithms
  • Model Evaluation
  • Model Evaluation: Overfitting & Underfitting
  • Understanding Different Evaluation Models
  • K-Means Clustering plus Advantages & Disadvantages
  • Hierarchical Clustering plus Advantages & Disadvantages
  • Measuring the Distances Between Clusters - Single Linkage Clustering
  • Measuring the Distances Between Clusters - Algorithms for Hierarchy Clustering
  • Density-Based Clustering
  • Dimensionality Reduction: Feature Extraction & Selection
  • Collaborative Filtering & Its Challenges
  • Classifying with k-Nearest Neighbors
  • Splitting datasets one feature at a time: decision trees
  • Classifying with probability theory: naïve Bayes
  • Logistic regression
  • Support vector machines
  • Improving classification with the AdaBoost meta-algorithm
  • Predicting numeric values: regression
  • Tree-based regression
  • Grouping unlabeled items using k-means clustering
  • Association analysis with the Apriori algorithm
  • Efficiently finding frequent itemsets with FP-growth
  • Using principal component analysis to simplify data
  • Simplifying data with the singular value decomposition
  • Big data and MapReduce
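
As one illustration of the topics listed above, principal component analysis can be sketched with the singular value decomposition in numpy. The data here is synthetic, the variable names are chosen for the example, and this is a minimal version rather than a full implementation.

```python
import numpy as np

# Synthetic 2-D data: 100 points that vary mostly along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.1, size=100)])

# PCA works on centered data, so subtract the mean of each column.
X_centered = X - X.mean(axis=0)

# The rows of Vt are the principal directions, ordered by importance.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Squared singular values are proportional to the variance along each direction.
explained_variance_ratio = S**2 / np.sum(S**2)

# Project onto the first principal direction to reduce 2 dimensions to 1.
X_reduced = X_centered @ Vt[0]

print(explained_variance_ratio)
print(X_reduced.shape)
```

Because almost all of the variance lies along the first direction, the one-dimensional projection keeps nearly all of the information in the original two columns.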