Add a section to read and parse data into a DataFrame #610

anuradhawick · 2024-04-11T03:31:57Z

How could the content be improved?

The following section introduces how data can be processed using loops:

Automating data processing using For Loops

I believe it would also be advantageous to have a similar section on reading and parsing data into a DataFrame.

Here we can briefly introduce Python generators as well. For example, consider a CSV file whose entries are name, age, and location. We can parse this data into a DataFrame using a generator. Imagine location is a comma-separated string field and we want to read latitude and longitude separately.

name	age	location
John	50	123341,123321
Emily	25	321321,123321
Wick	35	123341,654789
Raj	40	987789,123321

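import csv
import pandas as pd

COLUMNS = ["Name", "Age", "Latitude", "Longitude"]


def transform_lines(csv_path):
    """Yield one parsed row at a time from the CSV file."""
    # newline="" is what the csv module recommends when opening files
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (name, age, location)
        for name, age, location in reader:
            # location holds "latitude,longitude" in a single field
            lat, lng = location.split(",")
            yield [name, int(age), float(lat), float(lng)]


lines = transform_lines("./data.csv")
# Pass the column names explicitly so they become the header, not a data row
df = pd.DataFrame(lines, columns=COLUMNS)

print(df.head())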

This is especially useful for large datasets, where first loading all of the data in text form consumes a lot of memory.
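To make the lazy behaviour concrete, here is a minimal sketch using only the standard library. It assumes the location field is quoted in the file so that `csv.reader` keeps it as one field; `SAMPLE` and the use of `io.StringIO` are stand-ins for the real `data.csv`, not part of the lesson.

```python
import csv
import io

# Stand-in for data.csv; the location field is quoted so the comma inside it
# is not treated as a column separator (an assumption about the file layout).
SAMPLE = '''name,age,location
John,50,"123341,123321"
Emily,25,"321321,123321"
'''


def transform_lines(handle):
    """Yield parsed rows one at a time from an open CSV file object."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    for name, age, location in reader:
        lat, lng = location.split(",")
        yield [name, int(age), float(lat), float(lng)]


rows = transform_lines(io.StringIO(SAMPLE))  # nothing has been read yet
first = next(rows)      # parses only the first data row
remaining = list(rows)  # parses whatever is left

print(first)
print(remaining)
```

Calling `transform_lines` only creates the generator; each row is read and converted when it is requested, so the unparsed text of the whole file never has to be held at once.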


quist00 · 2024-04-19T23:55:41Z

Hi, @anuradhawick
Thanks for taking the time to suggest this modification. It definitely addresses an issue that many researchers will likely encounter at some point. That said, what are your thoughts on this being a good match for potentially absolute beginners? I fear that if someone is brand new to all this, there is a lot of automagical stuff introduced by the yield keyword that might be a bridge too far for some to wrap their heads around. It might be a better fit for the instructor notes. Also, there is the potential for a community-based rewrite (https://carpentries.slack.com/archives/C03LE48AY/p1711535383742769), so I will likely table any major changes like this until that is settled one way or another. If you wanted to do a PR to put it into the instructor section prior to that, though, I would be happy to consider it.
