Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython

Author: Wes McKinney | Language: English | ASIN: B009NLMB8Q | Format: PDF

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython Description

Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language.

Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical case studies. It’s ideal for analysts new to Python and for Python programmers new to scientific computing.

Use the IPython interactive shell as your primary development environment
Learn basic and advanced NumPy (Numerical Python) features
Get started with data analysis tools in the pandas library
Use high-performance tools to load, clean, transform, merge, and reshape data
Create scatter plots and static or interactive visualizations with matplotlib
Apply the pandas groupby facility to slice, dice, and summarize datasets
Measure data by points in time, whether it’s specific instances, fixed periods, or intervals
Learn how to solve problems in web analytics, social sciences, finance, and economics, through detailed examples

Product Details

File Size: 6443 KB
Print Length: 472 pages
Page Numbers Source ISBN: 1449319793
Simultaneous Device Usage: Unlimited
Publisher: O'Reilly Media; 1 edition (October 8, 2012)
Sold by: Amazon Digital Services, Inc.
Language: English
ASIN: B009NLMB8Q
Text-to-Speech: Enabled
X-Ray: Enabled
Lending: Not Enabled
Amazon Best Sellers Rank: #25,085 Paid in Kindle Store
- #6 in Kindle Store > Kindle eBooks > Computers & Technology > Programming > Python
- #22 in Books > Computers & Technology > Programming > Languages & Tools > Python
- #24 in Books > Computers & Technology > Databases > Data Mining

Wes McKinney's "Python for Data Analysis" (O'Reilly, 2012) is a tour of pandas and NumPy (mostly pandas) for folks looking to crunch "big-ish" data with Python. The target audience is not Pythonistas, but rather scientists, educators, statisticians, financial analysts, and the rest of the "non-programmer" cohort who are finding more and more these days that they need to do a little bit-sifting to get the rest of their jobs done.

First, two warnings:

1. **This book is not an introduction to Python.** While McKinney does not assume that you know *any* Python, he isn't exactly going to hold your hand on the language here. There is an appendix ("Python Language Essentials") that beginners will want to read before getting too far, but otherwise you're on your own. ("Lucky for you Python is executable pseudocode"?)

2. **This book is not about theories of data analysis.** What I mean by that is: if you're looking for a book that is going to tell you the *types* of analyses to do, this is not that book. McKinney assumes that you already know, through your "actual" training, what kinds of analyses you need to perform on your data, and how to go about the computations necessary for those analyses.

That being said: McKinney is the principal author of pandas, a Python package for doing data transformation and statistical analysis. The book is largely about pandas (and NumPy), offering overviews of the utilities in these packages and concrete examples of how to employ them to great effect. In examining these libraries, McKinney also delves into general methodologies for munging data and performing analytical operations on it (e.g., normalizing messy data and turning it into graphs and tables).

I think this book is genuinely trying to be helpful, by giving an extended tutorial on the pandas library; but the tutorial covers only selected topics, and needs to be supplemented with a comprehensive function reference. The narrative also needs to be cut with the help of a strict editor.

If you are trying to decide whether to learn to use the pandas library, this book is for you. It starts with an example of how Python and the pandas library can make it easy to do some basic analyses of data, and then develops more specialized chapters: summary statistics, data storage, data transformation (merging and joining), plotting, aggregation, time series, special considerations for financial or economic data, and advanced topics.

Once I decided to use the pandas library, the book suddenly became less useful. The author has a verbose pedagogical style, and the book never departs from its tutorial perspective. Functions are introduced with examples but no definitions, and it's hard to find the rare summaries of functions, function arguments, or discussion suggesting when to use one method instead of another.

If you want to do something very close to what's done in an example, it's easy to follow along. Once you want to do something not emphasized or covered by an example, there is no guidance: no reference or dictionary section hints at where to search next. Google will probably direct you to stackoverflow.com or the official pandas documentation site.

For example, suppose you have loaded your data into a DataFrame, and you want to use another column as the index. The book has several pages on the useful reindex() method, but that method is for resampling the data against a new set of index labels, not for promoting a column to the index.
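To make the distinction concrete, here is a minimal sketch that assumes a small made-up DataFrame (the column names `key` and `value` are hypothetical and not taken from the book). In current pandas, `set_index()` is the call that turns a column into the index, while `reindex()` keeps the existing index and conforms the rows to whatever labels you pass:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "key": ["a", "b", "c"],
    "value": [10.0, 20.0, 30.0],
})

# set_index promotes an existing column to the index and returns a new
# DataFrame; the "key" column is no longer a regular column afterwards.
by_key = df.set_index("key")

# reindex does not touch columns. It selects and reorders rows by index
# label, and inserts NaN-filled rows for labels the index does not contain.
conformed = by_key.reindex(["b", "a", "z"])

print(by_key)
print(conformed)
```

Here `conformed` has the rows for `b` and `a` in the requested order, plus a row for `z` whose `value` is missing, which is why `reindex()` is the wrong tool for the task the reviewer describes.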
