Skip to content

2.2. bash challenge

How to approach the challenge

To solve the following problem sets try to find the answers by learning to use the tools that have been introduced to you. Start by reviewing the lectures and tutorials, then move on to google searching for tutorials on the tools or tasks that are being asked. Also try the class chatroom. Please do not turn to AI bots as a first approach. I encourage you to challenge yourself to try to understand the answers that you find. Only then should you optionally turn to a bot to ask if there is a better or faster way to answer the question, and/or to provide an explanation to you of why your answer works.

The assignment

Your assignment is to create a new Markdown document on GitHub within your hack-2-shell repository from tutorial 2.0. To do this, navigate to the repository webpage and find the button that says Add file, and then select "Create new file". Name the file assignment.md. This will bring you inside of an interactive Markdown editor where you can type in code and text and also preview it as rendered text.

Enter your answers to the problems below into this document using Markdown, while following these instructions: (1) Label each question with a header; (2) write/copy the question as plain text; (3) write a description of how you found a solution to the problem in plain text; and finally (4) write a code-block with your solution. You will use your terminal to find solutions to the problems, and then enter your answers into the Markdown document. Below you can see an example response.

Example Response

Question X: sort the string

Parse the string X into a list of elements and then use the sort command to sort them in reverse alphanumeric order. X="apples,bananas,oranges,pancakes".

I first looked for shell command to sort items and found the man page of sort. I see that it sorts lines of text, so I realized I need to first split the elements of X onto separate lines. Therefore, I looked for a string splitting command and found the substitution tool sed to replace "," with "\n", which effectively splits a string into separate lines. This program works on files, so I need to feed my string X to it as if it were a file. I did this using a pipe from the program echo which reads text to stdout. Finally, I sorted the substituted text using sort with the -r option to reverse it.

echo $x | sed 's/,/\n/g' | sort -r
pancakes
oranges
bananas
apples

Note

Follow this same format to answer the 4 questions below and save your answers to your markdown document on github. If you get stuck, ask your fellow classmates for help on the shared gitter chatroom page.

1. Get the data files

Download the following data files from the internet using the curl command: http://eaton-lab.org/pdsb/test.fastq.gz and http://eaton-lab.org/pdsb/iris-data-dirty.csv. Use the less or zless commands to look at each file. Describe what these commands do. Finally, use the head command to print the first 5 lines of the file iris-data-dirty.csv.

2. Clean the data

Use grep, uniq, and sed for this question. Check that all of the species names are spelled correctly in the file iris-data-dirty.csv. Also check for missing values stored as NA. Create a new file where mispelled names are replaced with the correct values, and lines with NA are excluded, and save it as iris-data-clean.csv. Use cut, sort and uniq to list the number of data values there are for each species in the new cleaned data file. Describe your work.

3. Find a sequence

Find how many lines in the data file test.fastq.gz start with "TGCAG" and end with "GAG". Describe your work.

4. Find a sequence chunk

Using grep and other tools if necessary find all lines that contain the sequence "AAAACCCC" and for each print that line, the line above it, and two lines below it (so that a 4-line chunk around each search hit is printed). Describe your work.

Alert

Do not forget to save your results in the Markdown document in your GitHub hack-2-shell repository. We will be sharing these answers with each other in class next week. If you get stuck on the problems above ask for help from your class mates on the gitter chatroom page.