Python Trimming Scraped Data

Trimming scraped data means removing unwanted or unnecessary parts of the data that you have collected from a web page. For example, you may want to trim whitespace, HTML tags, punctuation marks, or irrelevant text from your scraped data.

One way to trim scraped data in Python is to use the Pandas library, which provides various methods for data cleaning and manipulation. Pandas can read HTML tables from a web page and convert them into DataFrame objects, which are tabular structures that can be easily filtered, sorted, and modified.
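As a rough sketch of the Pandas approach, the snippet below builds a small DataFrame by hand (standing in for a table that might come from pandas.read_html) and trims whitespace from the column names and the values. The column names and values here are made up for illustration.

```python
import pandas as pd

# Hypothetical scraped table; in practice this might come from pd.read_html(...)
df = pd.DataFrame({
    " item ": ["\n Burger \t", "\t Pizza \n"],
    "price\n": [" 5.99 ", " 8.50\t"],
})

# Trim whitespace from the column names
df.columns = df.columns.str.strip()

# Trim whitespace from the text values, then convert prices to numbers
df["item"] = df["item"].str.strip()
df["price"] = df["price"].str.strip().astype(float)

print(df)
```

The .str accessor applies the same strip() logic to every value in a column, so there is no need to loop over the rows yourself.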

Another way to trim scraped data in Python is to use the BeautifulSoup library, which lets you parse HTML documents and extract elements based on their tags, attributes, or content. BeautifulSoup can also help you remove HTML tags, convert text into different formats, and handle encoding issues.
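Here is a minimal sketch of the BeautifulSoup approach, assuming the bs4 package is installed. The HTML fragment is invented for the example; get_text(strip=True) drops the tags and trims the whitespace around each piece of text.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a scraped page
html = """
<ul>
  <li>
     <b>Burger</b>
  </li>
  <li>   Pizza   </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Extract the text of each <li>, without tags or surrounding whitespace
items = [li.get_text(strip=True) for li in soup.find_all("li")]
print(items)
```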

When we scrape text such as a heading, a lot of unwanted whitespace characters (\t, \n, etc.) often get scraped along with it. Trimming is a way of getting rid of that unwanted data. Python strings have a built-in method named strip() that removes leading and trailing whitespace from a string.

# Trimming A String
data = "\n\n\n \t  David's Food and Restaurant \t \n\n\n  "
print(data.strip())

# output
David's Food and Restaurant
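Besides plain strip(), Python strings also have lstrip() and rstrip() for trimming one side only, and all three accept an optional string of characters to remove. The sketch below uses a made-up sample string to show how this can trim punctuation as well as whitespace.

```python
raw = "\n\t ***Special Offer!!! \t\n"

# strip() with no argument removes leading and trailing whitespace only
no_whitespace = raw.strip()

# lstrip() and rstrip() trim only the left or the right side
left_trimmed = raw.lstrip()
right_trimmed = raw.rstrip()

# Passing characters strips any of them from both ends of the string
cleaned = no_whitespace.strip("*!")

print(repr(no_whitespace))
print(repr(cleaned))
```

Note that the argument is treated as a set of characters, not as a prefix or suffix, so strip("*!") removes any mix of * and ! from both ends.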

# Trimming List of Strings
data = ["\n\n\n Burger \t   ","\n\t Pizza \t  "]
cleaned_data = [i.strip() for i in data]
print(cleaned_data)

# output
['Burger', 'Pizza']
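strip() only touches the ends of a string, so tabs or newlines in the middle of a scraped value are left alone, and whitespace-only entries become empty strings. One possible way to handle both is sketched below; the helper name clean_strings and the sample list are made up for this example.

```python
def clean_strings(items):
    """Trim each string, collapse internal whitespace, and drop empty results."""
    cleaned = []
    for item in items:
        # split() with no argument splits on any run of whitespace,
        # so joining with a single space collapses tabs and newlines
        text = " ".join(item.split())
        if text:
            cleaned.append(text)
    return cleaned


scraped = ["\n\n Cheese \t\n Burger \t ", "   ", "\n\t Pizza \t "]
result = clean_strings(scraped)
print(result)
```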

If you have any questions about this code, feel free to leave a comment.

 
