README
Last updated
Last updated
The csv
and itertools
Modules ... OR ... The pandas
Package
Create a Python application to automate a business process.
Practice using Python to process CSV files.
Practice researching and leveraging the capabilities provided by Python modules and third-party packages.
Assume you own and operate a successful small business, selling artisan clothing products through an online platform like Amazon, Etsy, or eBay.
At the end of each calendar month, the online platform makes available for download from its admin interface a CSV file representing all individual sales orders for that month. See the "monthly sales data" examples (row per day per product sold that day).
To aid your ability to make data-driven decisions, you decide to create a Python program which will automate the process of gleaning business insights from the monthly sales data.
Use the GitHub.com online interface to create a new remote project repository called something like "monthly-sales". When prompted by the GitHub.com online interface, let's get in the habit of adding a "README.md" file and a Python-flavored ".gitignore" file (and also optionally a "LICENSE") during the repo creation process. After this process is complete, you should be able to view the repo on GitHub.com at an address like https://github.com/YOUR_USERNAME/monthly-sales
.
After creating the remote repo, use GitHub Desktop software or the command-line to download or "clone" it onto your computer. Choose a familiar download location like the Desktop.
After cloning the repo, navigate there from the command-line:
Use your text editor or the command-line to create a file in that repo called "reporter.py", and then place the following contents inside:
Make sure to save Python files like this whenever you're done editing them. After setting up a virtual environment, we will be ready to run this file.
Finally, download one or more of the "monthly sales data" CSV files into your exercise repository, inside a new sub-directory called "data" (e.g. "monthly-sales/data/monthly-sales/sales-201803.csv"). You can assume that each of these CSV files will have a name resembling "sales-YYYYMM.csv" (where "YYYY" represents the four digit year and "MM" represents the zero-padded month). And you can assume each of these CSV files will have the same header row (date
, product
, unit price
, units sold
, sales price
).
Create and activate a new Anaconda virtual environment:
From within the virtual environment, install any packages you think you'll need:
From within the virtual environment, demonstrate your ability to run the Python script from the command-line:
If you see the "GENERATING SALES REPORT FOR MONTH OF OCTOBER 2013..." message, you're ready to move on to project development. This would be a great time to make any desired modifications to your project's "README.md" file (like adding instructions for how to setup and run the app like you've just done), and then make your first commit, with a message like "Setup the repo".
Adapt the contents of the "reporter.py" script to process the CSV file to display a human-readable representation of the given month and year and the total sales for that month.
Example output:
HINT: use either the
csv
module or thepandas
package to read the CSV file contents into memory for further processing
Adapt your script to also identify the top-selling products for that month and the total sales for each. Example output:
HINT: use the
groupby()
method provided by either the theitertools
module or thepandas
package to group the rows by product for further processing
There are many many ways to solve the challenges. Here are some example solutions via two different approaches (i.e. the csv
module vs the pandas
package):