Data Science Essentials in Python: Collect - Organize - Explore - Predict - Value



Preview Data Science Essentials in Python: Collect - Organize - Explore - Predict - Value

Early praise for Data Science Essentials in Python

This book does a fantastic job at summarizing the various activities when wrangling data with Python. Each exercise serves an interesting challenge that is fun to pursue. This book should no doubt be on the reading list of every aspiring data scientist.
➤ Peter Hampton, Ulster University

Data Science Essentials in Python gets you up to speed with the most common tasks and tools in the data science field. It’s a quick introduction to many different techniques for fetching, cleaning, analyzing, and storing your data. This book helps you stay productive so you can spend less time on technology research and more on your intended research.
➤ Jason Montojo, Coauthor of Practical Programming: An Introduction to Computer Science Using Python 3

For those who are highly curious and passionate about problem solving and making data discoveries, Data Science Essentials in Python provides deep insights and the right set of tools and techniques to start with. Well-drafted examples and exercises make it practical and highly readable.
➤ Lokesh Kumar Makani, CASB expert, Skyhigh Networks

We've left this page blank to make the page numbers the same in the electronic and paper books. We tried just leaving it out, but then people wrote us to ask about the missing pages. Anyway, Eddy the Gerbil wanted to say “hello.”

Python Companion to Data Science
Collect → Organize → Explore → Predict → Value
Dmitry Zinoviev
The Pragmatic Bookshelf
Raleigh, North Carolina

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and The Pragmatic Programmers, LLC was aware of a trademark claim, the designations have been printed in initial capital letters or in all capitals. The Pragmatic Starter Kit, The Pragmatic Programmer, Pragmatic Programming, Pragmatic Bookshelf, PragProg and the linking g device are trademarks of The Pragmatic Programmers, LLC.

Every precaution was taken in the preparation of this book. However, the publisher assumes no responsibility for errors or omissions, or for damages that may result from the use of information (including program listings) contained herein.

Our Pragmatic books, screencasts, and audio books can help you and your team create better software and have more fun. Visit us at https://pragprog.com.

The team that produced this book includes:
Katharine Dvorak (editor)
Potomac Indexing, LLC (index)
Nicole Abramowitz (copyedit)
Gilson Graphics (layout)
Janet Furlow (producer)

For sales, volume licensing, and support, please contact [email protected]. For international rights, please contact [email protected].

Copyright © 2016 The Pragmatic Programmers, LLC. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher.

Printed in the United States of America.
ISBN-13: 978-1-68050-184-1
Encoded using the finest acid-free high-entropy binary digits.
Book version: P1.0 (August 2016)

To my beautiful and most intelligent wife Anna; to our children: graceful ballerina Eugenia and romantic gamer Roman; and to my first data science class of summer 2015.

Contents

Acknowledgments
Preface

1. What Is Data Science?
   Unit 1. Data Analysis Sequence
   Unit 2. Data Acquisition Pipeline
   Unit 3. Report Structure
   Your Turn

2. Core Python for Data Science
   Unit 4. Understanding Basic String Functions
   Unit 5. Choosing the Right Data Structure
   Unit 6. Comprehending Lists Through List Comprehension
   Unit 7. Counting with Counters
   Unit 8. Working with Files
   Unit 9. Reaching the Web
   Unit 10. Pattern Matching with Regular Expressions
   Unit 11. Globbing File Names and Other Strings
   Unit 12. Pickling and Unpickling Data
   Your Turn

3. Working with Text Data
   Unit 13. Processing HTML Files
   Unit 14. Handling CSV Files
   Unit 15. Reading JSON Files
   Unit 16. Processing Texts in Natural Languages
   Your Turn

4. Working with Databases
   Unit 17. Setting Up a MySQL Database
   Unit 18. Using a MySQL Database: Command Line
   Unit 19. Using a MySQL Database: pymysql
   Unit 20. Taming Document Stores: MongoDB
   Your Turn

5. Working with Tabular Numeric Data
   Unit 21. Creating Arrays
   Unit 22. Transposing and Reshaping
   Unit 23. Indexing and Slicing
   Unit 24. Broadcasting
   Unit 25. Demystifying Universal Functions
   Unit 26. Understanding Conditional Functions
   Unit 27. Aggregating and Ordering Arrays
   Unit 28. Treating Arrays as Sets
   Unit 29. Saving and Reading Arrays
   Unit 30. Generating a Synthetic Sine Wave
   Your Turn

6. Working with Data Series and Frames
   Unit 31. Getting Used to Pandas Data Structures
   Unit 32. Reshaping Data
   Unit 33. Handling Missing Data
   Unit 34. Combining Data
   Unit 35. Ordering and Describing Data
   Unit 36. Transforming Data
   Unit 37. Taming Pandas File I/O
   Your Turn

7. Working with Network Data
   Unit 38. Dissecting Graphs
   Unit 39. Network Analysis Sequence
   Unit 40. Harnessing Networkx
   Your Turn

8. Plotting
   Unit 41. Basic Plotting with PyPlot
   Unit 42. Getting to Know Other Plot Types
   Unit 43. Mastering Embellishments
   Unit 44. Plotting with Pandas
   Your Turn

Description:

Go from messy, unstructured artifacts stored in SQL and NoSQL databases to a neat, well-organized dataset with this quick reference for the busy data scientist. Understand text mining, machine learning, and network analysis; process numeric data with the NumPy and Pandas modules; describe and analyz