
General report for the Applied Informatics course: Processing and creating reports on Excel


<span class="text_page_counter">Trang 1</span><div class="page_container" data-page="1">

MINISTRY OF EDUCATION AND TRAINING

<b>HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION </b>FACULTY FOR HIGH QUALITY TRAINING

<i><b>Student </b></i><b>author: </b>Trần Nguyễn Hiền Vân

Ho Chi Minh City, August 2022

</div><span class="text_page_counter">Trang 2</span><div class="page_container" data-page="2">

Table of contents

<b>Chapter 1: Searching for data </b>

1.1 Create and retrieve data from the World Bank DataBank page, Go to page

1.1.2 Choose 5 favorite countries: Bhutan, Singapore, Japan, Nepal, Vietnam

1.1.3 Selection of 10 criteria in the clustered index include

Access to clean fuels and technologies for cooking (% of population)

1.1.4 Selection of survey time: from 2006 to 2020

<b>Chapter 2: Processing and creating reports on Excel </b>

2.1 Processing data from Excel files

2.2 Create dashboards and reports that match existing data.

2.2.1 Create data reporting dashboard

2.2.2 Calculation of the remaining data (finding sum, max, min, average, vlookup, ...) is done in the Excel file "Data Analytics" and covers the following questions.

2.2.3 Use Conditional Formatting to highlight (Excel File “Data Conditional Formatting”)

<b>Chapter 3: Analyzing processed data on GOOGLE COLAB </b>

3.1 Upload, connect Google drive with Google Colab

3.2 Run the commands to read the file with pandas, then tell the size of the tuple

3.3 Description of the data fields.

3.4 Plot the distribution chart for at least 2 columns, comment

3.5 Draw at least 2 arbitrary graphs using seaborn, comment

</div><span class="text_page_counter">Trang 3</span><div class="page_container" data-page="3">

<b>Chapter 1: Searching for data </b>

<b>1.1 Create and retrieve data from the World Bank DataBank page, Go to page</b>

1.1.1 Select one database out of the 86 provided. I chose the topic: Sustainable Energy for All

<b>1.1.2 Choose 5 favorite countries: Bhutan, Singapore, Japan, Nepal, Vietnam </b>

<b>1.1.3 Selection of 10 criteria in the clustered index include</b>

• Access to clean fuels and technologies for cooking (% of population)

• Access to clean fuels and technologies for cooking, rural (% of rural population)

• Access to clean fuels and technologies for cooking, urban (% of urban population)

• Access to electricity (% of population)

• Access to electricity, rural (% of rural population)

• Access to electricity, urban (% of urban population)

• Adjusted savings: carbon dioxide damage (% of GNI)

• Adjusted savings: consumption of fixed capital (% of GNI)

• Adjusted net savings, including particulate emission damage (% of GNI)

• Adjusted net savings, excluding particulate emission damage (% of GNI)

</div><span class="text_page_counter">Trang 4</span><div class="page_container" data-page="4">

Criteria and indicators the student chose for the assignment

<b>Qualitative and quantitative criteria: </b>

I chose the topic of sustainable energy because Vietnam is a heavily polluted country and ranks 2nd in the world in terms of cancer incidence from such agents. I therefore want to analyze the data and compare it with Vietnam's data to show how developed countries manage clean energy sources, from which some interim solutions can be proposed for Vietnam.

- Bhutan is my country of choice because it is a place where many people would like to live: it has fresh air, and I generally agree with the metrics it uses to manage waste.

<b>Criteria for time </b>

Selected period: 2006 - 2020

- I want to analyze the longest available span, from the earliest year to the present. The purpose is to find how sustainable energy differs as a country transforms its technology, to measure the change from the past to the present, and to see how human activity has affected fossil fuel energy resources.

- The second reason for this time span is that I want to challenge myself and make full use of my abilities by analyzing a longer period than the minimum of 10 years required by the assignment.

<b>Chapter 2: Processing and creating reports on Excel </b>

</div><span class="text_page_counter">Trang 5</span><div class="page_container" data-page="5">

After selecting the requirements and criteria in DataBank, we export the data to an Excel file. The raw data looks as follows:

<b>2.1 Processing data from Excel files </b>

Step 1: Use the TRIM function to remove extra spaces before and after the text in the COUNTRY column

Column A: copied from column B. Note:

TRIM function syntax: cell A1 = TRIM(B1)

Inference: the data in column A has been cleaned of spaces before and after the text.
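For comparison, the same cleaning step can be sketched in pandas. This is only an illustration under assumptions: the column names COUNTRY_RAW and COUNTRY and the sample values are hypothetical, not taken from the report's workbook. Excel's TRIM also collapses repeated spaces inside the text, so the sketch does both.

```python
import pandas as pd

# Hypothetical sample: country names with stray spaces, standing in for raw column B.
raw = pd.DataFrame({"COUNTRY_RAW": ["  Bhutan", "Viet  Nam ", "Japan"]})

# Strip leading/trailing spaces, then collapse inner runs of spaces to one,
# which is what Excel's TRIM does.
raw["COUNTRY"] = (
    raw["COUNTRY_RAW"].str.strip().str.replace(r" +", " ", regex=True)
)

print(raw["COUNTRY"].tolist())
```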

Step 2: Select the empty cells in order to enter the desired data and format. Select the data area.

</div><span class="text_page_counter">Trang 6</span><div class="page_container" data-page="6">

Displayed result: no cells found

Inference: there are no blank cells, so there is no need to enter data or apply the desired format.

Step 3: Use the ISNUMBER function to check whether the data is stored as numbers or as text

If the cell is in NUMBER format, the ISNUMBER function displays TRUE, indicating that the format is correct.

If the cell is in TEXT format, the ISNUMBER function displays FALSE, indicating that the format is not correct.

Result: when the ISNUMBER function is applied, all cells display TRUE, indicating that the data is formatted as required, in NUMBER format.
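A rough pandas counterpart of the ISNUMBER check might look like the sketch below. The sample values are hypothetical; the point is only to show how a number stored as text is detected and then converted.

```python
import pandas as pd

# Hypothetical column mixing real numbers with a number stored as text.
values = pd.Series([99.5, "100", 87.2], dtype="object")

# Like ISNUMBER: True only where the cell holds a numeric type, not text.
is_number = values.map(
    lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
)
print(is_number.tolist())

# Convert text that looks numeric; anything unparseable would become NaN.
converted = pd.to_numeric(values, errors="coerce")
print(converted.tolist())
```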

Step 4: Color the error cells

Select the data area

Home -> Conditional Formatting -> New Rule -> Format only cells that contain. Under Format only cells with, select Errors, then choose a pink format

Result: the data range contains no pink cells (cells with error types such as #N/A, #VALUE!, #REF!, #NULL!)

Inference: no cell is colored, so the data does not contain error cells

Step 5: Display cells containing 0 as empty cells

Select File -> Options -> Advanced -> under Display options for this worksheet, clear the “Show a zero in cells that have zero value” check box

</div><span class="text_page_counter">Trang 7</span><div class="page_container" data-page="7">

Result: there are 2 cells containing 0, H45 and I45

Step 6: Replace empty cells with the character “-”

Select Format Cells -> Number -> Accounting -> Symbol (None)

Result: the empty cells are shown as “-”

Step 7: Filter with New Query

Select Data -> New Query -> From File -> From Excel Workbook -> select the Excel file -> a New Query table is displayed

Start filtering -> click the filter arrow -> remove null values -> OK

</div><span class="text_page_counter">Trang 8</span><div class="page_container" data-page="8">

The obtained data table has the following form

</div><span class="text_page_counter">Trang 9</span><div class="page_container" data-page="9">

Remove the null values from the remaining columns in the same way. Columns without null values stay unchanged.

After filtering is complete, export the file

Select Close & Load -> Close & Load to export the result to an Excel sheet
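The null filtering in Step 7 can also be sketched in pandas. The table below is a hypothetical stand-in with placeholder values, not the report's data; dropna is used here as an assumed equivalent of removing null values in every column.

```python
import pandas as pd

# Hypothetical slice of the exported table; None marks a missing value.
# The numbers are placeholders, not World Bank figures.
table = pd.DataFrame({
    "Country": ["Bhutan", "Nepal", "Japan"],
    "Year": [2006, 2006, 2006],
    "Access to electricity (% of population)": [61.0, None, 100.0],
})

# Drop every row that has a null in any column, similar to removing
# null values column by column in the query editor.
cleaned = table.dropna().reset_index(drop=True)
print(cleaned)
```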

</div><span class="text_page_counter">Trang 10</span><div class="page_container" data-page="10">

From there, the data table obtained has the following form:

<b>2.2 Create dashboards and reports that match existing data. </b>

<b>2.2.1 Create data reporting dashboard </b>

Question 1: Draw a column chart. From the Data file, create a pivot table from Access to clean fuels and technologies for cooking (% of population) and Access to clean fuels and technologies for cooking, rural (% of rural population)

</div><span class="text_page_counter">Trang 16</span><div class="page_container" data-page="16">

<b>2.2.2 Calculation of the remaining data (finding sum, max, min, average, vlookup, ...) is done in the Excel file "Data Analytics" and covers the following questions. </b>

1. Total access by country from 2006 - 2020

2. What is Bhutan's lowest access to clean fuels and technologies for cooking (% of population), and likewise for the other countries?

From there, find the country with the lowest rate among the countries

3. Which country has the highest access to clean fuels and technologies for cooking (% of population), what is the value, and likewise for the other countries?

4. What is each country's average access to clean fuels and technologies for cooking (% of population), and likewise for the other countries?

5. Find the area codes, abbreviations, and serial numbers of the countries

6. What is the total access to clean fuels and technologies for cooking (% of population) of Bhutan, Vietnam, and Singapore in 2020?
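The calculations in questions 1 to 6 have direct pandas counterparts. The sketch below assumes a long-format table with hypothetical column names and placeholder values; the merge stands in for VLOOKUP against a small lookup table of country codes.

```python
import pandas as pd

# Hypothetical long-format data; the numbers are placeholders, not World Bank values.
col = "Clean cooking (% of population)"
data = pd.DataFrame({
    "Country": ["Bhutan", "Bhutan", "Nepal", "Nepal"],
    "Year": [2019, 2020, 2019, 2020],
    col: [80.0, 82.0, 30.0, 34.0],
})

# SUM / MIN / MAX / AVERAGE per country in one pass.
summary = data.groupby("Country")[col].agg(["sum", "min", "max", "mean"])
print(summary)

# Country whose minimum is the lowest of all countries.
lowest_country = summary["min"].idxmin()

# VLOOKUP analogue: a left merge against a lookup table of country codes.
codes = pd.DataFrame({"Country": ["Bhutan", "Nepal"], "Code": ["BTN", "NPL"]})
with_codes = data.merge(codes, on="Country", how="left")
print(with_codes)
```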

</div><span class="text_page_counter">Trang 17</span><div class="page_container" data-page="17">

average and exponential smoothing in 2021

8. Use the exponential smoothing method with alpha = 0.2 to make the same time forecast in question 1.

</div><span class="text_page_counter">Trang 18</span><div class="page_container" data-page="18">

9. Compare the results of questions 7 and 8. Which method gives the better result?

The result in question 7 has a smaller MSE value than the result in question 8, so the method in question 7 gives the better result.
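To make the comparison concrete, here is a minimal sketch of a moving-average forecast, simple exponential smoothing with alpha = 0.2, and the MSE used to compare them. The series, the window of 2, and the choice to initialise the smoothed forecast with the first observation are all assumptions for illustration; they are not the settings used in the report's workbook.

```python
def exponential_smoothing(series, alpha):
    """One-step-ahead forecasts: forecasts[t] predicts series[t].

    The final element is the forecast for the next, unobserved period.
    """
    forecasts = [series[0]]  # assumption: initialise with the first observation
    for actual in series:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts


def moving_average(series, window):
    """forecasts[j] predicts period window + j; the last one is the next period."""
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series) + 1)
    ]


def mse(actuals, forecasts):
    """Mean squared error over the periods both sequences cover."""
    pairs = list(zip(actuals, forecasts))
    return sum((a - f) ** 2 for a, f in pairs) / len(pairs)


# Hypothetical yearly values, not taken from the report.
history = [10.0, 12.0, 14.0, 16.0]
window = 2

es = exponential_smoothing(history, alpha=0.2)
ma = moving_average(history, window)

# Compare both methods on the periods where each has a forecast.
mse_es = mse(history[window:], es[window:])
mse_ma = mse(history[window:], ma)
print(mse_es, mse_ma)
```

The method with the smaller MSE fits the past observations more closely, which is the criterion applied in question 9.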

</div><span class="text_page_counter">Trang 19</span><div class="page_container" data-page="19">

<b>2.2.3 Use Conditional Formatting to highlight (Excel File </b>“Data Conditional Formatting”)

1. Use Conditional Formatting to highlight Bhutan's cells with values greater than 20

</div><span class="text_page_counter">Trang 20</span><div class="page_container" data-page="20">

2020

5. Use Conditional Formatting to highlight Bhutan's cells equal to 100% for 2006 - 2020
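The rules behind these highlights are simple comparisons, which can be sketched as boolean masks in pandas. The column name and the values below are hypothetical placeholders; the masks only mark which cells a rule would highlight and do not apply any coloring.

```python
import pandas as pd

# Hypothetical Bhutan rows; the values are placeholders, not World Bank figures.
col = "Indicator value (%)"
bhutan = pd.DataFrame({
    "Year": [2006, 2007, 2008],
    col: [18.0, 21.5, 100.0],
})

# Rule "greater than 20": True marks a cell the rule would highlight.
above_20 = bhutan[col] > 20

# Rule "equal to 100%".
at_100 = bhutan[col] == 100

print(bhutan[above_20])
print(bhutan[at_100])
```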

<b>Chapter 3: Analyzing processed data on GOOGLE COLAB </b>

<b>3.1 Upload, connect Google drive with Google Colab </b>

After creating the Excel file in chapter 1 and processing it in chapter 2, upload the file to Google Drive in .csv format.

Open Google Drive, create a folder, create a Google Colab file, and upload the file from your computer as file.csv

Write the code to connect Google Drive with Google Colab
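A minimal sketch of the connection code is shown below. It uses the drive.mount call from the google.colab package, which is only available inside a Colab notebook; the IN_COLAB flag and the import guard are additions for illustration so that the cell does not fail elsewhere.

```python
# google.colab exists only inside a Colab runtime, so guard the import.
try:
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Mounts Google Drive under /content/drive after an authorization prompt.
    drive.mount("/content/drive")
else:
    print("Not running in Google Colab; skipping the Drive mount.")
```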

</div><span class="text_page_counter">Trang 21</span><div class="page_container" data-page="21">

<b>Copy the link into the code </b>

<b>Paste the link into the code </b>

<b>3.2 Run the commands to read the file with pandas, then tell the size of the tuple </b>

</div><span class="text_page_counter">Trang 22</span><div class="page_container" data-page="22">

The data set has 14 columns and 5 rows.

It includes columns containing the country, the year, and criteria such as:

• Access to clean fuels and technologies for cooking (% of population)

• Access to clean fuels and technologies for cooking, rural (% of rural population)

• Access to clean fuels and technologies for cooking, urban (% of urban population)

• Access to electricity (% of population)

• Access to electricity, rural (% of rural population)

• Access to electricity, urban (% of urban population)

• Adjusted savings: carbon dioxide damage (% of GNI)

• Adjusted savings: consumption of fixed capital (% of GNI)

• Adjusted net savings, including particulate emission damage (% of GNI)

• Adjusted net savings, excluding particulate emission damage (% of GNI)
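Reading the file and reporting its size can be sketched as follows. To keep the example self-contained, it reads a small hypothetical CSV from a string instead of the real file.csv on Google Drive; the values are placeholders. The size comes from df.shape, which is a (rows, columns) tuple.

```python
import io

import pandas as pd

# Hypothetical stand-in for file.csv; the values are placeholders.
csv_text = (
    "Country,Year,Access to electricity (% of population)\n"
    "Bhutan,2006,61.0\n"
    "Japan,2006,100.0\n"
)

# With the real file, pass its path on the mounted drive instead of StringIO.
df = pd.read_csv(io.StringIO(csv_text))

rows, columns = df.shape
print(f"{rows} rows, {columns} columns")
```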

<b>3.3 Description of the data fields. </b>

</div><span class="text_page_counter">Trang 23</span><div class="page_container" data-page="23">

From the above data we can see that there are many different columns and rows.

</div><span class="text_page_counter">Trang 24</span><div class="page_container" data-page="24">

They are also organized much more clearly according to the 10 criteria outlined above.
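A description of the data fields is typically obtained with info and describe. The sketch below runs them on a small hypothetical frame with placeholder values, not on the report's data.

```python
import pandas as pd

# Hypothetical frame; the values are placeholders, not World Bank figures.
col = "Access to electricity (% of population)"
df = pd.DataFrame({
    "Country": ["Bhutan", "Japan", "Nepal"],
    "Year": [2020, 2020, 2020],
    col: [100.0, 100.0, 90.0],
})

# Column names, non-null counts and dtypes.
df.info()

# Count, mean, std, min, quartiles and max for the numeric columns only.
stats = df.describe()
print(stats)
```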

<b>3.4 Plot the distribution chart for at least 2 columns, comment</b>

In general, the data input is simpler and the results are concise. With the cleaned data, by contrast, the code is more complex, but the results are easy to read: the plot is well balanced and the colors are consistent.

The data presented in chart form uses balanced colors, and the relative proportions are approximately the same.

</div><span class="text_page_counter">Trang 25</span><div class="page_container" data-page="25">

The chart has 5 columns showing access to clean fuels and technologies for 5 different countries. Singapore has the highest percentage at 100%, followed by Japan, Bhutan, and Vietnam; Nepal has the lowest rate, at 20%.
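A distribution chart for two columns can be sketched with matplotlib histograms as below. The frame, the column names, and the choice of 5 bins are assumptions for illustration; the values are placeholders and the report's own plotting code may differ.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cleaned data; the values are placeholders, not World Bank figures.
df = pd.DataFrame({
    "Access to electricity (% of population)": [60.0, 75.0, 90.0, 100.0, 100.0],
    "Clean cooking (% of population)": [20.0, 35.0, 60.0, 100.0, 100.0],
})

# One histogram per column, side by side.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
counts = []
for ax, column in zip(axes, df.columns):
    n, _, _ = ax.hist(df[column], bins=5)
    counts.append(n)
    ax.set_title(column)
    ax.set_xlabel("% of population")
    ax.set_ylabel("Number of observations")
fig.tight_layout()
```

In a notebook the figure is rendered at the end of the cell; elsewhere it can be saved with fig.savefig.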

<b>3.5 Draw at least 2 arbitrary graphs using seaborn, comment </b>

The data is drawn as a box plot using the command “sns.boxplot”. Although the data on access to electricity is fairly simple, we can see that the figures lie in the range of 90 - 100%.

</div><span class="text_page_counter">Trang 26</span><div class="page_container" data-page="26">

The data is shown by 4 dotted lines from bottom to top, drawn with the command “sns.lmplot”. The countries differ from year to year, but overall, access to clean fuels and technologies has changed and grown over the years.

The data is shown by 4 dotted lines from bottom to top, drawn with the command “sns.lmplot”. The countries differ from year to year, but overall, access to clean fuels and technologies has changed and grown unevenly over the years.

</div><span class="text_page_counter">Trang 27</span><div class="page_container" data-page="27">

From the command “sns.pairplot” we get an overview of many different shapes; most of them show small or medium differences.

With the command “sns.heatmap” applied to the matrix, we see that the criteria differ from one another; most of the values range from 0.1 to 1.
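The seaborn commands named in this section can be sketched together as follows. The frame and its values are hypothetical placeholders, and feeding the heat map a correlation matrix computed with DataFrame.corr is an assumption about what "matrix" refers to in the report.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data; the values are placeholders, not World Bank figures.
elec = "Access to electricity (% of population)"
cook = "Clean cooking (% of population)"
df = pd.DataFrame({
    "Country": ["Bhutan"] * 3 + ["Nepal"] * 3,
    "Year": [2018, 2019, 2020] * 2,
    elec: [96.0, 98.0, 100.0, 90.0, 92.0, 95.0],
    cook: [70.0, 75.0, 80.0, 25.0, 28.0, 32.0],
})

fig, (ax_box, ax_heat) = plt.subplots(1, 2, figsize=(11, 4))

# Box plot: spread of electricity access for each country.
sns.boxplot(data=df, x="Country", y=elec, ax=ax_box)

# Heat map of the correlation matrix between the two criteria.
corr = df[[elec, cook]].corr()
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, ax=ax_heat)

# lmplot creates its own figure: one fitted trend line per country.
grid = sns.lmplot(data=df, x="Year", y=cook, hue="Country")
```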

</div>
