4.34 out of 5
1472 reviews on Udemy

Data Processing with Python

Learn how to use Python and Pandas for cleaning and reorganizing huge amounts of data.
Ardit Sulce
10,971 students enrolled
Build 10 advanced Python scripts which together make up a data analysis and visualization program.
Solve six exercises related to processing, analyzing and visualizing US income data with Python.
Learn the fundamental blocks of the Python programming language such as variables, datatypes, loops, conditionals, functions and more.
Use Python to batch download files from FTP sites, extract, rename and store remote files locally.
Import data into Python for analysis and visualization from various sources such as CSV and delimited TXT files.
Keep the data organized inside Python in easily manageable pandas dataframes.
Merge large datasets taken from various data file formats.
Create pivot tables in Python out of large datasets.
Perform various operations among data columns and rows.
Query data from Python pandas dataframes.
Export data from Python into various formats such as TXT, CSV, Excel, HTML and more.
Use Python to perform various visualizations such as time series, plots, heatmaps, and more.
Create KML Google Earth files out of CSV files.

Data scientists spend only 20 percent of their time building machine learning algorithms and 80 percent of their time finding, cleaning, and reorganizing huge amounts of data. That mostly happens because many of them use graphical tools such as Excel to process their data. However, if you use a programming language such as Python, you can drastically reduce the time it takes to process your data and make it ready for use in your project. This course will show how Python can be used to manage, clean, and organize huge amounts of data.

This course assumes you have basic knowledge of variables, functions, for loops, and conditionals. In the course you will be given access to a million records of raw historical weather data, and you will use Python in every single step to deal with that dataset. That includes learning how to use Python to batch download and extract the data files, load thousands of files in Python via pandas, clean the data, concatenate and join data from different sources, convert between fields, aggregate, apply conditions, and perform many more data processing operations. On top of that, you will also learn how to calculate statistics and visualize the final data. The course also covers a series of exercises where you will be given some sample data and then practice what you learned by cleaning and reorganizing that data using Python.

Getting Started

Installing Python and Python libraries

You will learn how to install Python through the Anaconda distribution, a complete package that installs not only Python on your computer but also the other libraries needed for data analysis and visualization, such as pandas, matplotlib, numpy, and scipy.

Python editors - Spyder and IPython

You will learn how to use the Spyder environment to write scripts of Python code and also learn how to use IPython, an enhanced interactive shell where you type in and execute Python code. IPython is tailored for data analysis applications.

Downloading Many Files with Python

Section introduction

Short lecture introducing you to this section of the course.

Navigating through FTP directory trees with Python

You will learn how to write Python code that establishes a connection to an FTP server and accesses the files of the FTP site.

Storing Python code

You will learn how to use the Spyder editor for executing complete scripts of Python code.

Creating an FTP function

You will learn how to create a custom FTP function that logs in to an FTP site and generates a list of file names contained in the site.
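As a rough illustration of what such a function might look like, here is a minimal sketch built on the standard-library ftplib module. The function name, its parameters, and the ftp_class hook (added only so the function can be tried without a real server) are assumptions for this example, not the course's own code.

```python
from ftplib import FTP


def list_ftp_files(host, directory="/", user="anonymous", passwd="", ftp_class=FTP):
    """Log in to an FTP site and return the names found in one directory.

    ftp_class is a hypothetical hook for this sketch; it defaults to ftplib.FTP.
    """
    ftp = ftp_class(host)            # connects to the host on creation
    try:
        ftp.login(user, passwd)      # anonymous login unless credentials are given
        ftp.cwd(directory)           # move to the directory of interest
        return ftp.nlst()            # list of names in the current directory
    finally:
        ftp.quit()                   # always end the session
```

Called as list_ftp_files("ftp.example.com", "/pub/data") with a placeholder host name, it would return the names as a Python list that a later step can loop over.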

Downloading an FTP file

You will learn the Python code that downloads a single file from an FTP site.
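One common way to do this with ftplib is the retrbinary method, sketched below. The function name and arguments are illustrative assumptions; the ftp argument is expected to be an ftplib.FTP object that is already connected and logged in.

```python
def download_ftp_file(ftp, remote_name, local_path):
    """Download one file over an already logged-in FTP connection."""
    with open(local_path, "wb") as local_file:
        # RETR streams the remote file; each received chunk is passed to write()
        ftp.retrbinary("RETR " + remote_name, local_file.write)
    return local_path
```

Opening the local file in binary mode matters here, because archive files would be corrupted by text-mode newline translation.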

About the next lecture

Something to keep in mind for the next lecture.

Practice No.1: Creating an FTP File Downloader

Here we start building our data analysis program.

In this particular lecture, we will build an FTP function that logs in to the FTP site and downloads a given range of files from the site.

Extracting Data from Archive Files

Extracting ZIP, TAR, GZ and other archive file formats

You will learn how to extract various types of archive files using the patool library and the for loop.
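The lecture uses patool. The sketch below shows the same loop-over-files idea but swaps in the standard library's shutil.unpack_archive, so it runs without extra installs. Note the limits of this substitute: it handles ZIP and TAR archives (including gzip, bzip2, and xz compressed TAR files), but not RAR or a bare .gz file. The function name and folder arguments are assumptions for illustration.

```python
import shutil
from pathlib import Path


def extract_all(archive_dir, output_dir):
    """Extract every archive in archive_dir that shutil can unpack."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(archive_dir).iterdir()):
        if not archive.is_file():
            continue
        try:
            # The archive format is inferred from the file extension
            shutil.unpack_archive(str(archive), str(output_dir))
        except shutil.ReadError:
            continue  # unknown or unreadable archive format, skip it
        extracted.append(archive.name)
    return extracted
```

The function returns the names of the archives it managed to extract, which makes it easy to spot files that were skipped.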

Extracting RAR files

You will learn how to extract RAR archive files.

Practice No.2: Creating a Batch Archive Extractor

Here you will write a function that fetches the archive files downloaded by the FTP function and extracts them all into a local directory.

Working with TXT and CSV Files

Section introduction

Short lecture introducing you to this section of the course.

Reading delimited TXT and CSV files

You will learn how to easily read CSV and delimited TXT files using the pandas library and use their data inside Python.
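For a feel of how this works, the sketch below reads the same small table twice with pandas.read_csv: once comma-separated and once semicolon-delimited. The sample data and column names are made up for the example, and in-memory text stands in for files on disk.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for files on disk
csv_text = "station,temp\nA,10.5\nB,12.0\n"
txt_text = "station;temp\nA;10.5\nB;12.0\n"

df_csv = pd.read_csv(io.StringIO(csv_text))            # comma is the default separator
df_txt = pd.read_csv(io.StringIO(txt_text), sep=";")   # other delimiters go in sep

# Both calls should yield the same dataframe
print(df_csv.equals(df_txt))
```

With real files, the io.StringIO(...) argument would simply be replaced by a file path.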

Reading Excel files
Exporting data from Python to files

You will learn how to export data from Python to CSV and TXT files.

Reading fixed width TXT files

You will learn how to open data from TXT files whose columns are defined by fixed character widths rather than by a delimiter.
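In pandas, fixed-width files can be read with read_fwf. The sketch below assumes a made-up layout of three columns that are 4, 5, and 4 characters wide; the column names are likewise invented for the example.

```python
import io

import pandas as pd

# Hypothetical fixed-width layout: station (4 chars), temp (5 chars), year (4 chars)
fwf_text = (
    "A001 10.52020\n"
    "B002 12.02021\n"
)

df_fwf = pd.read_fwf(
    io.StringIO(fwf_text),
    widths=[4, 5, 4],                    # character width of each column
    names=["station", "temp", "year"],   # the sample has no header row
    header=None,
)
print(df_fwf)
```

Because the columns touch each other with no separator, a delimiter-based reader could not split this data; the widths list is what tells pandas where each field ends.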

Exporting data back to HTML and other file formats

You will learn how to quickly export a pandas dataframe into an HTML file.
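As a small illustration, DataFrame.to_html returns the table markup as a string when no output target is given. The dataframe contents below are invented sample values.

```python
import pandas as pd

# Hypothetical sample dataframe
df_export = pd.DataFrame({"station": ["A", "B"], "temp": [10.5, 12.0]})

# index=False leaves the row numbers out of the generated table
html = df_export.to_html(index=False)
print(html)
```

Passing a file path as the first argument of to_html would write the markup to that file instead of returning it.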

Data Analysis Exercise 1
Data Analysis Exercise 1: Solution

Getting Started with Pandas

Get started with Pandas

We already used the pandas library in the previous section. Here you will be given a proper tour of the pandas data analysis library.

Practice No.3: Calculating and Adding Columns to CSV Files

You will create a function that grabs all the TXT files of a folder, opens each of them in Python as dataframes, adds a column in each dataframe and exports the updated dataframes back to CSV files.

Data Analysis Exercise 2
Data Analysis Exercise 2: Solution

Merging Data

Practice No.4: Concatenating multiple CSV files

You will write a function that gets all the CSV files and concatenates them vertically using the pandas concat function, producing a single CSV file that contains everything.
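A function along these lines could be sketched as follows. The function name and arguments are assumptions, and the sketch presumes all CSV files in the folder share the same columns.

```python
import glob
import os

import pandas as pd


def concatenate_csv_files(folder, output_path):
    """Stack every CSV file in folder vertically and write one combined CSV."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index=True renumbers the rows 0..n-1 in the combined dataframe
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(output_path, index=False)
    return combined
```

It is safest to write the output outside the input folder, so that a second run does not pick up the combined file as one of its inputs. With an empty folder, pd.concat raises a ValueError because there is nothing to concatenate.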

Data Analysis Exercise 3
Data Analysis Exercise 3: Solution
Practice No. 5: Joining Data Based on a Matching Column

You will write a function that will join columns of a pandas dataframe to another dataframe.
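The idea can be sketched with DataFrame.merge, which matches rows on a shared column. Both dataframes and the station_id key below are invented for the example.

```python
import pandas as pd

# Hypothetical data: readings reference stations by a shared station_id column
readings = pd.DataFrame({"station_id": [1, 2, 3], "temp": [10.5, 12.0, 9.1]})
stations = pd.DataFrame({"station_id": [1, 2], "name": ["North", "South"]})

# how="left" keeps every reading, even when no matching station exists
joined = readings.merge(stations, on="station_id", how="left")
print(joined)
```

Readings without a matching station keep their row and get a missing value in the name column; how="inner" would drop them instead.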

Data Analysis Exercise 4
Data Analysis Exercise 4: Solution
Data Analysis Exercise 5
Data Analysis Exercise 5: Solution

Data Aggregation

Practice No. 6: Pivoting Large Amounts of Data

You will learn how to use the pandas pivot function by creating a pivoted dataframe out of a large CSV file by aggregating the data values.
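pandas offers both pivot and pivot_table; only pivot_table aggregates when several rows fall into the same cell, so the sketch below uses it. The sample data and the choice of the mean as the aggregation are assumptions for illustration.

```python
import pandas as pd

# Hypothetical long-format data with two readings for station A in 2020
df_long = pd.DataFrame({
    "station": ["A", "A", "B", "B"],
    "year": [2020, 2020, 2020, 2021],
    "temp": [10.0, 12.0, 8.0, 9.0],
})

# One row per station, one column per year, cells hold the mean temperature
pivoted = df_long.pivot_table(
    index="station", columns="year", values="temp", aggfunc="mean"
)
print(pivoted)
```

Cells with no underlying rows, such as station A in 2021 here, come out as missing values.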

Visualizing Data

Data visualization with Python

You will learn how to use the visualization features available in Python and generate graphs using the matplotlib and the seaborn libraries.

More visualization techniques

You will expand your knowledge on performing visualizations of different kinds out of pandas dataframes and adding labels and legends to the generated graphs.

Practice No. 7: Producing Image Files

You will create a function that accesses the pivoted dataframe, generates a graph representing the data, and saves the graph to a PNG image file.
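A minimal sketch of such a function, assuming matplotlib is installed and the dataframe holds numeric columns. The function name, the line-chart style, and the axis label are choices made for this example only.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt


def save_line_chart(dataframe, output_path, title=""):
    """Plot each column of dataframe as a line and save the figure as a PNG."""
    ax = dataframe.plot(marker="o")   # one line per column, index on the x axis
    ax.set_title(title)
    ax.set_ylabel("value")
    fig = ax.get_figure()
    fig.savefig(output_path, format="png")
    plt.close(fig)                    # release the figure's memory
    return output_path
```

Closing the figure after saving keeps memory use flat when the function is called in a loop over many dataframes.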

Data Analysis Exercise 6
Data Analysis Exercise 6: Solution

Mapping Spatial Data

Programmatically creating KML Google Earth files with Python

You will learn how to create a point KML file using the simplekml library and display the file in Google Earth.

Practice No. 8: Creating KML Google Earth files from CSV data

You will create a function that grabs the data from a pandas dataframe and creates a KML file using the latitude and the longitude information contained in the dataframe.
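The course builds this with the simplekml library. To stay free of extra installs, the sketch below instead writes a minimal KML file by hand with the standard library's xml.etree.ElementTree; it shows the structure of the output rather than the course's approach. The function name and the name, lat, and lon column names are assumptions.

```python
import xml.etree.ElementTree as ET

import pandas as pd

KML_NS = "http://www.opengis.net/kml/2.2"


def dataframe_to_kml(df, output_path, name_col="name", lat_col="lat", lon_col="lon"):
    """Write one KML point placemark per dataframe row."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in zip(df[name_col], df[lat_col], df[lon_col]):
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = str(name)
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinates are written longitude first, then latitude
        coords = ET.SubElement(point, f"{{{KML_NS}}}coordinates")
        coords.text = f"{float(lon)},{float(lat)}"
    ET.ElementTree(kml).write(output_path, encoding="utf-8", xml_declaration=True)
    return output_path
```

The longitude-before-latitude order is the detail most worth remembering, since swapping the two places points in the wrong part of the world.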

Putting everything together

User interaction

You will learn how to make your script interact with a user who runs it.

Exercise: User interaction
Exercise: User interaction: Solution
Practice No. 9: Polishing the Program, I

You will learn how to execute all the functions of the program with a single click.

Practice No. 10: Polishing the Program, II

You will learn how to make your program more user friendly by integrating the user input functionality.

Practice No. 11: Creating Python Modules

You will learn how to convert your program into a Python module so you can import it in other scripts.

Bonus Section: Using Python in Jupyter Notebooks to Boost Productivity

Getting started with Jupyter Notebooks

Setting up Jupyter and learning how to use its keyboard shortcuts.

Data cleaning project, Part I

Learn how to handle the problem of joining raw data that has no key column to base the join on.

Data cleaning project, Part II

Learn to apply various operations, including in-line visualizations, in a browser-based Jupyter notebook.

Bonus Lecture
30-Day Money-Back Guarantee


4 hours on-demand video
17 articles
Full lifetime access
Access on mobile and TV
Certificate of Completion