2.2. bash challenge¶
How to approach the challenge¶
To solve the following problem sets try to find the answers by learning to use the tools that have been introduced to you. Start by reviewing the lectures and tutorials, then move on to google searching for tutorials on the tools or tasks that are being asked. Also try the class chatroom. Please do not turn to AI bots as a first approach. I encourage you to challenge yourself to try to understand the answers that you find. Only then should you optionally turn to a bot to ask if there is a better or faster way to answer the question, and/or to provide an explanation to you of why your answer works.
The assignment¶
Your assignment is to create a new Markdown document on GitHub within your
hack-2-shell
repository from tutorial 2.0. To do this, navigate to the
repository webpage and find the button that says assignment.md
. This will
bring you inside of an interactive Markdown editor where you can type in
code and text and also preview it as rendered text.
Enter your answers to the problems below into this document using Markdown, while following these instructions: (1) Label each question with a header; (2) write/copy the question as plain text; (3) write a description of how you found a solution to the problem in plain text; and finally (4) write a code-block with your solution. You will use your terminal to find solutions to the problems, and then enter your answers into the Markdown document. Below you can see an example response.
Example Response¶
Question X: sort the string¶
Parse the string X into a list of elements and then use the sort
command to
sort them in reverse alphanumeric order. X="apples,bananas,oranges,pancakes"
.
I first looked for shell command to sort items and found the man page of sort
.
I see that it sorts lines of text, so I realized I need to first split the elements of X
onto separate lines. Therefore, I looked for a string splitting command and found
the substitution tool sed
to replace "," with "\n", which effectively splits a
string into separate lines. This program works on files, so I need to feed my
string X to it as if it were a file. I did this using a pipe from the program
echo
which reads text to stdout. Finally, I sorted the substituted text using
sort
with the -r
option to reverse it.
echo $x | sed 's/,/\n/g' | sort -r
pancakes
oranges
bananas
apples
Note
Follow this same format to answer the 4 questions below and save your answers to your markdown document on github. If you get stuck, ask your fellow classmates for help on the shared gitter chatroom page.
1. Get the data files¶
Download the following data files from the internet using the curl command:
http://eaton-lab.org/pdsb/test.fastq.gz
and http://eaton-lab.org/pdsb/iris-data-dirty.csv
. Use the less
or zless
commands to look at each file.
Describe what these commands do.
Finally, use the head
command to print the first 5 lines of the file iris-data-dirty.csv
.
2. Clean the data¶
Use grep
, uniq
, and sed
for this question.
Check that all of the species names are spelled correctly in
the file iris-data-dirty.csv
. Also check for missing values stored as NA. Create a new
file where mispelled names are replaced with the correct values, and lines with NA are
excluded, and save it as iris-data-clean.csv
. Use cut
, sort
and uniq
to list the
number of data values there are for each species in the new cleaned data file. Describe your work.
3. Find a sequence¶
Find how many lines in the data file test.fastq.gz
start with "TGCAG" and end with "GAG". Describe your work.
4. Find a sequence chunk¶
Using grep
and other tools if necessary find all lines that contain the sequence "AAAACCCC" and for each print that line, the line above it, and two lines below it (so that a 4-line chunk around each search hit is printed). Describe your work.
Alert
Do not forget to save your results in the Markdown document in your GitHub hack-2-shell repository. We will be sharing these answers with each other in class next week. If you get stuck on the problems above ask for help from your class mates on the gitter chatroom page.